Top Five Hadoop Interview Questions
1. What do the four V’s of Big Data denote?
Volume: The sheer amount of data, which is growing at an exponential rate and is now measured in petabytes and exabytes.
Velocity: The speed at which data is generated and processed; today, yesterday’s data is already considered old. Social media is a major contributor to this velocity.
Variety: The heterogeneity of data types. In other words, the data gathered comes in many formats, such as video, audio, and CSV files.
Veracity: The uncertainty or trustworthiness of the available data, arising from data inconsistency and incompleteness.
2. What are the most common Input Formats in Hadoop?
The three most common input formats in Hadoop are (a usage sketch follows the list):
- Text Input Format: The default input format in Hadoop; each line of the file becomes a record, with the byte offset as the key and the line contents as the value.
- Key Value Input Format: Used for plain text files where each line is split into a key and a value by a separator character (a tab by default).
- Sequence File Input Format: Used for reading sequence files, Hadoop’s binary key-value file format.
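For example, here is a minimal sketch (using the org.apache.hadoop.mapreduce API) that switches a job from the default TextInputFormat to KeyValueTextInputFormat; the comma separator, class name, and job name are illustrative choices, not requirements:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class InputFormatDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // KeyValueTextInputFormat splits each line into (key, value) at the
        // first tab by default; the separator can be overridden like so:
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

        Job job = Job.getInstance(conf, "kv-input-demo");
        job.setJarByClass(InputFormatDemo.class);
        // Replace the default TextInputFormat with the key-value variant.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Mapper/Reducer and output settings omitted for brevity.
    }
}
```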
3. What is a checkpoint?
In brief, “checkpointing” is a process that takes the FsImage and edit log and compacts them into a new FsImage. Thus, instead of replaying a potentially long edit log, the NameNode can load its final in-memory state directly from the FsImage. This is a far more efficient operation and reduces NameNode startup time. Checkpointing is performed by the Secondary NameNode (or by the Standby NameNode in a high-availability setup).
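How often the checkpoint runs is tunable. As a minimal sketch, the two HDFS properties below control the trigger conditions; in practice they are set in hdfs-site.xml rather than in code, and the values shown here are the usual defaults:

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Checkpoint at least once per hour (value in seconds)...
        conf.setLong("dfs.namenode.checkpoint.period", 3600);
        // ...or as soon as one million uncheckpointed transactions have
        // accumulated in the edit log, whichever comes first.
        conf.setLong("dfs.namenode.checkpoint.txns", 1000000);
        System.out.println("Checkpoint period (s): "
                + conf.getLong("dfs.namenode.checkpoint.period", -1));
    }
}
```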
4. What are the main components of a Hadoop Application?
Hadoop applications draw on a wide range of technologies that provide a great advantage in solving complex business problems.
The core components of a Hadoop application are listed below; a sketch showing the core pieces working together follows the list.
- Hadoop Common
- HDFS
- Hadoop MapReduce
- YARN
- Data Access Components: Pig and Hive
- Data Storage Component: HBase
- Data Integration Components: Apache Flume, Sqoop, and Chukwa
- Data Management and Monitoring Components: Ambari, Oozie, and ZooKeeper
- Data Serialization Components: Thrift and Avro
- Data Intelligence Components: Apache Mahout and Drill
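To see the core components working together, here is a minimal sketch of the classic word-count job: Hadoop Common supplies the Configuration, MapReduce supplies the Mapper/Reducer programming model, the input and output paths live on HDFS, and YARN schedules the job when mapreduce.framework.name is set to yarn. The class names and logic are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Sums the counts for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // Hadoop Common
        Job job = Job.getInstance(conf, "word count"); // scheduled by YARN when configured
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```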
5. What are the core methods of a Reducer?
- setup(context): runs once before any reduce() call; used for configuring parameters such as the input data size and the distributed cache. Signature: public void setup(Context context)
- reduce(key, values, context): the heart of the Reducer; called once per key with all of the values associated with that key. Signature: public void reduce(Key key, Iterable<Value> values, Context context)
- cleanup(context): runs once at the end of the task to release resources. Signature: public void cleanup(Context context)
A sketch overriding all three methods follows.
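As a concrete sketch (the CountReducer name and the summing logic are illustrative, not prescribed), a Reducer overriding all three lifecycle methods looks like this:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer that sums integer counts per key and
// overrides all three lifecycle methods.
public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result;

    @Override
    protected void setup(Context context) {
        // Runs once before any reduce() call: initialize state,
        // read job configuration, load distributed-cache files, etc.
        result = new IntWritable();
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key with all of its associated values.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    @Override
    protected void cleanup(Context context) {
        // Runs once after the last reduce() call: release resources,
        // close connections, emit any final records if needed.
    }
}
```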