Posts

Showing posts from May, 2017
Top Ten Differences Between Apache Hive and HBase

S.No. 1
Apache Hive: A data warehousing tool used to process data stored in Hadoop and HDFS. It resembles SQL because it analyzes and processes data through a query language.
Apache HBase: An open-source framework and a NoSQL database.

S.No. 2
Apache Hive: Runs on top of Hadoop and uses MapReduce.
Apache HBase: Runs on top of HDFS.

S.No. 3
Apache Hive: Its main purpose is analyzing, querying, and processing datasets.
Apache HBase: Its main purpose is reading and writing large volumes of data.

S.No. 4
Apache Hive: Data storage is organized through the metastore, tables, partitions, and buckets.
Apache HBase: Data is stored in tables, organized by row and column.

S.No. 5
Apache Hive: Updating data is complicated.
Apache HBase: Data is easy to update.

S.No. 6
Apache Hive: Data is processed through queries written in HiveQL, as noted in row 1.
Apache HBase: Data is accessed mainly through read and write operations (get, put, scan) rather than an SQL-like query language.

S.No. 7
Apache Hive: The metastore, execution engine, and MapReduce are the main components.
Apache HBase: Master server, regions, ZooKeeper, and region server a
Apache Hadoop Flume Tutorial

What is Apache Flume?
Apache Flume is a tool for moving data from one place to another. It is a distributed system that transports data reliably, and it is an important part of the Hadoop ecosystem. In Apache Flume, each unit of data is treated as one event. Flume collects log data from various web servers and delivers it to HDFS.

Features of Apache Flume:
The main feature of Flume is collecting data from multiple web servers.
It can import large amounts of data produced by sites such as Facebook and Twitter.
It supports fan-in and fan-out flows and a wide range of source and destination types.
It collects data from multiple sources and moves it to a destination.

Main Components of Apache Flume:
Event
Agent
Source
Sink
Channel
Client
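To make the roles of the components listed above more concrete, here is a minimal sketch in plain Python. It is not Flume's API: the class and function names (`Event`, `MemoryChannel`, `log_source`, `list_sink`) and the `origin` header are hypothetical, and a plain list stands in for HDFS. It only illustrates the idea that a source wraps data in events, a channel buffers them, and a sink delivers them to the destination.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """One unit of data: a byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Buffer that holds events between the source and the sink."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event: Event) -> bool:
        if len(self._queue) >= self.capacity:
            return False  # channel is full; the event is not accepted
        self._queue.append(event)
        return True

    def take(self):
        # Return the oldest buffered event, or None when the channel is empty.
        return self._queue.popleft() if self._queue else None


def log_source(lines, channel):
    """Source: wrap each log line in an Event and put it on the channel."""
    accepted = 0
    for line in lines:
        event = Event(body=line.encode("utf-8"), headers={"origin": "web01"})
        if channel.put(event):
            accepted += 1
    return accepted


def list_sink(channel, destination):
    """Sink: drain the channel and append each event body to the destination."""
    delivered = 0
    while (event := channel.take()) is not None:
        destination.append(event.body.decode("utf-8"))
        delivered += 1
    return delivered


channel = MemoryChannel(capacity=100)
hdfs_stand_in = []  # a plain list standing in for an HDFS file
accepted = log_source(["GET /index.html 200", "GET /missing 404"], channel)
delivered = list_sink(channel, hdfs_stand_in)
print(accepted, delivered, hdfs_stand_in)
```

In this toy model, the agent would be the process that hosts the source, channel, and sink together, and the client would be whatever produces the log lines handed to the source.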
Apache Hadoop Hive Tutorial

What is Hive?
Hive is a data warehousing tool used to process data stored in Hadoop and HDFS. It resembles SQL because it analyzes and processes data through a query language, which is known as HiveQL. Hive runs on top of Hadoop and uses MapReduce. The main functions of Hive are data summarization, querying, and analysis.

Features of Hive:
Hive processes data stored in Hadoop and HDFS.
Hive is designed for OLAP.
It supports SQL and HiveQL.

Data Storage of Hive:
Metastore
Tables
Partitions
Buckets
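The storage units listed above can be pictured with a small sketch. This is a simplified model in plain Python, not Hive code: it assumes a partition corresponds to a directory named `column=value` under the table's directory, and that a row's bucket is chosen by hashing the bucketing column modulo the number of buckets (for an integer column the hash is taken to be the value itself). The table path, column names, and helper functions are hypothetical.

```python
def partition_dir(table_root: str, partition_col: str, value: str) -> str:
    """Assumed layout: one sub-directory per partition value."""
    return f"{table_root}/{partition_col}={value}"


def bucket_index(key: int, num_buckets: int) -> int:
    """Simplified bucketing: non-negative integer hash modulo bucket count."""
    return (key & 0x7FFFFFFF) % num_buckets


rows = [
    {"user_id": 7, "country": "IN"},
    {"user_id": 12, "country": "US"},
    {"user_id": 15, "country": "IN"},
]

# Map (partition directory, bucket number) -> user ids stored there.
layout = {}
for row in rows:
    directory = partition_dir("/warehouse/users", "country", row["country"])
    bucket = bucket_index(row["user_id"], num_buckets=4)
    layout.setdefault((directory, bucket), []).append(row["user_id"])

for (directory, bucket), user_ids in sorted(layout.items()):
    print(directory, "bucket", bucket, user_ids)
```

Partitioning lets a query that filters on `country` read only the matching directory, while bucketing spreads the rows inside each partition across a fixed number of files.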
Top Five Hadoop Interview Questions

1. What do the four V’s of Big Data denote?
Volume: The amount of data, which is growing at an exponential rate, i.e. into petabytes and exabytes.
Velocity: The rate at which data is growing, which is very fast. Today, yesterday’s data is already considered old. Social media is now a major contributor to the velocity of data growth.
Variety: The heterogeneity of data types. In other words, the data gathered comes in a variety of formats, such as video, audio, and CSV.
Veracity: The doubt or uncertainty about the available data, caused by inconsistency and incompleteness.

2. What are the most common Input Formats in Hadoop?
The three most common input formats in Hadoop are:
Text Input Format: The default input format in Hadoop.
Key Value Input Format: Used for plain text files wh