What Is Apache Hadoop?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
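The "simple programming model" here is MapReduce: a map function emits key/value pairs, the framework groups the pairs by key, and a reduce function combines each group. As a rough single-process illustration only (plain Java, no Hadoop API; the class and method names are hypothetical), a word count could be sketched like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of the MapReduce model (word count).
// This is NOT the Hadoop API; it only mimics the map -> shuffle -> reduce flow.
public class WordCountSimulation {

    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Maps every line, groups the values by key (the "shuffle"), then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data big clusters", "data on clusters");
        System.out.println(run(lines)); // prints {big=2, clusters=2, data=2, on=1}
    }
}
```

On a real cluster the same three steps run in parallel across many machines, and the map and reduce logic lives in subclasses of Hadoop's Mapper and Reducer classes rather than in static methods.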
NoClassDefFoundError in Hadoop while running a MapReduce job:
Hadoop is used to process big data sets.
To process big data sets, we need to write MapReduce classes using the Hadoop API.
To process data in Hadoop, we need a Hadoop cluster and a job file that contains the MapReduce classes and their dependent libraries.
Even if we include all the libraries and our class runs successfully from the command line without a Hadoop cluster, Hadoop may throw a NoClassDefFoundError when we run the job on the cluster.
Some tips to avoid this problem:
Never package your MapReduce classes as a jar inside the lib directory of the job file if those classes depend on other libraries in lib. Instead, keep the MapReduce classes as they are, in their package directory structure inside the job file.
If we package the MapReduce class files into a jar and put it in the lib directory of the job file, Hadoop will not load the other dependent jars from the job. The reason is that when we call the required MapReduce class from the job, Hadoop loads only the jar that contains the MapReduce code. The other jars are not loaded while the job runs on the Hadoop cluster, so we get a NoClassDefFoundError even though all the required jar files are included in the lib directory of our job file.
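To check whether a job file follows this tip, a small inspector can report where a MapReduce class is actually stored: at its package path at the top level of the job, or only inside a jar under lib/. This is a hypothetical helper built on java.util.zip, not a Hadoop tool, and it assumes the job file is an ordinary jar (zip) archive, as the jar xf and jar cf commands used with it imply.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Hadoop: reports where a class file is stored inside a job file.
public class JobFileChecker {

    // "com.test.abc.TestMapReduce" -> "com/test/abc/TestMapReduce.class"
    static String classEntry(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists every location of the class: its top-level package path, or
    // "lib/<name>.jar!/<package path>" when it is found inside a jar under lib/.
    // Closes jobFile when done.
    static List<String> locate(InputStream jobFile, String className) throws IOException {
        String target = classEntry(className);
        List<String> hits = new ArrayList<>();
        try (ZipInputStream job = new ZipInputStream(jobFile)) {
            ZipEntry entry;
            while ((entry = job.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals(target)) {
                    hits.add(name);
                } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    // On a ZipInputStream, readAllBytes() stops at the end of the current entry.
                    byte[] nestedJar = job.readAllBytes();
                    try (ZipInputStream nested = new ZipInputStream(new ByteArrayInputStream(nestedJar))) {
                        ZipEntry inner;
                        while ((inner = nested.getNextEntry()) != null) {
                            if (inner.getName().equals(target)) {
                                hits.add(name + "!/" + inner.getName());
                            }
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JobFileChecker <job-file> <fully.qualified.ClassName>");
            System.exit(2);
        }
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            List<String> hits = locate(in, args[1]);
            if (hits.contains(classEntry(args[1]))) {
                System.out.println("OK: class is in the package directory structure: " + hits);
            } else if (!hits.isEmpty()) {
                System.out.println("WARNING: class is only inside a lib jar: " + hits);
            } else {
                System.out.println("NOT FOUND: " + classEntry(args[1]));
            }
        }
    }
}
```

For example, java JobFileChecker Test.job com.test.abc.TestMapReduce. A WARNING result corresponds to the layout the tip above advises against.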
Structure of a Hadoop job file and how to create one:
For example, let's consider a job file named Test.job.
Copy Test.job into a temporary directory and extract its contents there with the jar command:
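$ mkdir temporary
$ cp Test.job temporary/
$ cd temporary
$ jar xf Test.job
$ rm Test.job (Remove the copied job file so that only the extracted content is repackaged later)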
The extracted directory contains a com directory, which is the package directory structure holding all required MapReduce classes, and a lib directory, which should contain all library files required by the MapReduce classes.
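Before repackaging, the extracted directory can be checked against this structure. The sketch below is a hypothetical helper (not part of Hadoop or the JDK tools); it assumes compiled .class files sit in their package directories and uses the example class name com.test.abc.TestMapReduce by default. It also flags any leftover .job file, since a * glob in the repackaging command would match it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical pre-packaging check (not a Hadoop tool): inspects an extracted job directory.
public class JobDirCheck {

    // Collects layout problems; an empty list means the layout matches the expected structure.
    static List<String> check(Path dir, String className) throws IOException {
        List<String> problems = new ArrayList<>();
        // 1. Check that the MapReduce class sits in its package directory,
        //    e.g. com/test/abc/TestMapReduce.class.
        Path classFile = dir.resolve(className.replace('.', '/') + ".class");
        if (!Files.isRegularFile(classFile)) {
            problems.add("missing class file: " + dir.relativize(classFile));
        }
        // 2. Check that lib/ exists and holds at least one .jar file.
        Path lib = dir.resolve("lib");
        if (!Files.isDirectory(lib)) {
            problems.add("missing lib directory");
        } else {
            try (Stream<Path> entries = Files.list(lib)) {
                if (entries.noneMatch(p -> p.getFileName().toString().endsWith(".jar"))) {
                    problems.add("no .jar files in lib");
                }
            }
        }
        // 3. Flag leftover *.job files at the top level; a * glob would match them.
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(p -> p.getFileName().toString().endsWith(".job"))
                   .forEach(p -> problems.add("leftover job file: " + p.getFileName()));
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        String className = args.length > 1 ? args[1] : "com.test.abc.TestMapReduce";
        List<String> problems = check(dir, className);
        if (problems.isEmpty()) {
            System.out.println("Layout looks fine; repackage with: jar cf Test.job *");
        } else {
            problems.forEach(p -> System.out.println("PROBLEM: " + p));
        }
    }
}
```

Run it from inside the temporary directory, for example java JobDirCheck . com.test.abc.TestMapReduce.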
After making any changes, update the job file with the jar command again:
$ jar cf Test.job * (This creates a new Test.job file in your temporary directory)
Let's say our MapReduce class is named TestMapReduce and is in the package com.test.abc.
To run the TestMapReduce class on a Hadoop cluster:
$ $HADOOP_HOME/bin/hadoop jar Test.job com.test.abc.TestMapReduce <InputPath> <OutputPath>