Top Five Hadoop Interview Questions
1. What do the four V’s of Big Data denote?
Volume: The sheer amount of data, which is growing at an exponential rate and is now measured in petabytes and exabytes.
Velocity: The speed at which data is generated and processed; today, yesterday’s data is already considered old. Social media is a major contributor to this velocity.
Variety: The heterogeneity of data types. In other words, the data gathered comes in many formats, such as video, audio, and CSV files.
Veracity: The uncertainty or trustworthiness of the available data, arising from data inconsistency and incompleteness.
2. What are the most common Input Formats in Hadoop?
The three most common input formats in Hadoop are (a usage sketch follows the list):
- Text Input Format: The default input format in Hadoop; each line of the file becomes a record, with the byte offset as the key and the line contents as the value.
- Key Value Input Format: Used for plain text files where each line is split into a key and a value by a separator character (a tab by default).
- Sequence File Input Format: Used for reading sequence files, Hadoop’s binary key-value file format.
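For example, here is a minimal sketch (using the org.apache.hadoop.mapreduce API) that switches a job from the default TextInputFormat to KeyValueTextInputFormat; the comma separator, class name, and job name are illustrative choices, not requirements:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class InputFormatDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // KeyValueTextInputFormat splits each line into (key, value) at the
        // first tab by default; the separator can be overridden like so:
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

        Job job = Job.getInstance(conf, "kv-input-demo");
        job.setJarByClass(InputFormatDemo.class);
        // Replace the default TextInputFormat with the key-value variant.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Mapper/Reducer and output settings omitted for brevity.
    }
}
```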
3. What is a checkpoint?
In brief, “checkpointing” is a process that takes the FsImage and edit log and compacts them into a new FsImage. Thus, instead of replaying a potentially long edit log, the NameNode can load its final in-memory state directly from the FsImage. This is a far more efficient operation and reduces NameNode startup time. Checkpointing is performed by the Secondary NameNode (or by the Standby NameNode in a high-availability setup).
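How often the checkpoint runs is tunable. As a minimal sketch, the two HDFS properties below control the trigger conditions; in practice they are set in hdfs-site.xml rather than in code, and the values shown here are the usual defaults:

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Checkpoint at least once per hour (value in seconds)...
        conf.setLong("dfs.namenode.checkpoint.period", 3600);
        // ...or as soon as one million uncheckpointed transactions have
        // accumulated in the edit log, whichever comes first.
        conf.setLong("dfs.namenode.checkpoint.txns", 1000000);
        System.out.println("Checkpoint period (s): "
                + conf.getLong("dfs.namenode.checkpoint.period", -1));
    }
}
```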
4. What are the main components of a Hadoop Application?
Hadoop applications draw on a wide range of technologies that provide a great advantage in solving complex business problems.
The core components of a Hadoop application are listed below; a sketch showing the core pieces working together follows the list.
- Hadoop Common
- HDFS
- Hadoop MapReduce
- YARN
- Data Access Components: Pig and Hive
- Data Storage Component: HBase
- Data Integration Components: Apache Flume, Sqoop, and Chukwa
- Data Management and Monitoring Components: Ambari, Oozie, and ZooKeeper
- Data Serialization Components: Thrift and Avro
- Data Intelligence Components: Apache Mahout and Drill
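To see the core components working together, here is a minimal sketch of the classic word-count job: Hadoop Common supplies the Configuration, MapReduce supplies the Mapper/Reducer programming model, the input and output paths live on HDFS, and YARN schedules the job when mapreduce.framework.name is set to yarn. The class names and logic are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Sums the counts for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // Hadoop Common
        Job job = Job.getInstance(conf, "word count"); // scheduled by YARN when configured
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```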
5. What are the core methods of a Reducer?
- setup(context): runs once before any reduce() call; used for configuring parameters such as the input data size and the distributed cache. Signature: public void setup(Context context)
- reduce(key, values, context): the heart of the Reducer; called once per key with all of the values associated with that key. Signature: public void reduce(Key key, Iterable<Value> values, Context context)
- cleanup(context): runs once at the end of the task to release resources. Signature: public void cleanup(Context context)
A sketch overriding all three methods follows.
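As a concrete sketch (the CountReducer name and the summing logic are illustrative, not prescribed), a Reducer overriding all three lifecycle methods looks like this:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer that sums integer counts per key and
// overrides all three lifecycle methods.
public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result;

    @Override
    protected void setup(Context context) {
        // Runs once before any reduce() call: initialize state,
        // read job configuration, load distributed-cache files, etc.
        result = new IntWritable();
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key with all of its associated values.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    @Override
    protected void cleanup(Context context) {
        // Runs once after the last reduce() call: release resources,
        // close connections, emit any final records if needed.
    }
}
```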