Posts

Showing posts from April, 2017
Difference Between NoSql Cassandra and Apache Hadoop

What is Cassandra? Cassandra is a NoSQL database that handles large amounts of data across multiple servers. Being open source, it serves data to both online transactional applications and business intelligence tools. Cassandra was created by Facebook and is designed around peer-to-peer nodes. It partitions data across the cluster and keeps replicated copies of the data.

What is Hadoop? Hadoop is an open-source framework used to store large data sets. Hadoop provides data storage, data access, data processing, and security operations. Many organizations use Hadoop for storage because it can store large amounts of data quickly.

Difference Between NoSql Cassandra and Apache Hadoop:

S.No | NoSql Cassandra | Apache Hadoop
1 | Cassandra is a NoSQL database, mainly used for its architecture and for handling large amounts of data between multiple…
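The partitioning and replication described above can be illustrated with a minimal hash-partitioning sketch in Python. This is only a toy model of the idea: the node names and replica-selection rule here are made up for illustration, and real Cassandra uses a token ring with configurable partitioners.

```python
import hashlib

# Hypothetical cluster of peer nodes (names are invented for this sketch).
NODES = ["node-a", "node-b", "node-c"]

def partition(row_key: str, nodes=NODES) -> str:
    """Map a row key to a node by hashing it, the basic idea behind
    partitioning data across a peer-to-peer cluster."""
    digest = hashlib.md5(row_key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def replicas(row_key: str, replication_factor: int = 2, nodes=NODES):
    """Return the primary node plus the next nodes in ring order,
    sketching how extra copies of a row are placed for fault tolerance."""
    start = nodes.index(partition(row_key, nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]
```

With this sketch, every key always lands on the same node, and its replicas are the following nodes in ring order, so losing one machine never loses the only copy of a row.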
CAP Theorem in Hadoop

What is CAP Theorem? The CAP theorem applies to distributed systems (collections of interconnected nodes). Also known as Brewer's theorem, it describes the trade-offs of distributed consistency. It involves the following three properties of distributed systems:

C – Consistency
A – Availability
P – Partition Tolerance

Consistency: Every read returns the same data no matter how many times it is repeated, and the server responds to every request; the system stays consistent on reads (all nodes hold the same data).

Availability: Every request receives a response, and no error occurs in the system.

Partition Tolerance: The system keeps running even when some nodes become unresponsive and communication between two nodes breaks down.

A distributed system can satisfy only two of these three properties, never all three.

Selecting Two options in CAP Theorem: CP – Consistency/Partition Tolerance: The system waits for a response from the partitioned nodes, and that data are timeo…
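The CP-versus-AP choice above can be made concrete with a toy Python replica. During a partition, a replica that has lost contact with its peers must either refuse the read (choosing Consistency) or answer with possibly stale data (choosing Availability). Everything here is a hypothetical illustration, not code from any real database.

```python
class Replica:
    """A toy replica that behaves as CP or AP during a network partition."""

    def __init__(self, mode: str):
        self.mode = mode          # "CP" or "AP"
        self.data = {"x": 1}      # last value this replica saw
        self.partitioned = False  # True when cut off from its peers

    def read(self, key: str):
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than risk stale data.
            raise TimeoutError("partitioned: cannot confirm latest value")
        # Availability over consistency: answer, possibly with stale data.
        return self.data[key]

cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True

print(ap.read("x"))  # the AP replica still answers during the partition
```

Calling `cp.read("x")` during the partition raises `TimeoutError` instead, which is exactly the "wait and time out" behavior the CP option describes.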
How to Install Hadoop on Ubuntu

What is Hadoop? Hadoop is an open-source, Java-based framework. It is used to store large amounts of data and has many components for accessing that data. Installing Java is the most important prerequisite, because Hadoop is a Java-based framework. Here we discuss how to install Hadoop on the Ubuntu operating system.

Hadoop has the following three main layers:
1. HDFS – Stores large amounts of data in a file system that runs across the Hadoop cluster machines.
2. MapReduce – Processes large data sets in the form of key/value pairs.
3. Yarn – Responsible for managing cluster resources and scheduling applications.

Steps to Install Java:
Step 1: Click here to download Java. Hadoop programs are written in Java, so installing Java is essential for Hadoop.
Step 2: Commands to install Java:
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jre
$ sudo apt-get insta…
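Since the excerpt above cuts off mid-command, here is one plausible shape for the full Java setup on Ubuntu. This is a hedged sketch, not the article's exact commands: it assumes Ubuntu's `openjdk-8` packages are available in the configured repositories.

```shell
# Refresh the package lists, then install the OpenJDK 8 runtime and compiler.
sudo apt-get update
sudo apt-get install -y openjdk-8-jre openjdk-8-jdk

# Verify the installation before moving on to Hadoop itself.
java -version
javac -version
```

If `java -version` reports a 1.8 version string, Java is ready and the Hadoop installation steps can proceed.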
Hadoop Ecosystem Tutorial

Meaning of Hadoop Ecosystem: The Hadoop ecosystem is not a single service or program; it is a platform used to process large amounts of Hadoop data. It uses HDFS and MapReduce for storing and processing large amounts of data, and Hive for querying the data. The Hadoop ecosystem handles the following three types of data:

Structured Data – Data with a clear structure that can be stored in tabular form.
Semi-Structured Data – Data with some structure that cannot be stored in tabular form.
Unstructured Data – Data without any structure that cannot be stored in tabular form.

Click Here to read Main Components of Hadoop Ecosystem
How to Get Hadoop Developer Jobs

Who is a Hadoop Developer? The Hadoop Developer role is similar to that of a software developer. A Hadoop developer is responsible for programming and developing Hadoop applications and for working with all components of the Hadoop ecosystem. Here we discuss the main roles and responsibilities involved in becoming a Hadoop developer.

Click Here to Read Full Article
What is MapReduce in Hadoop?

MapReduce is one of Hadoop's processing tools, used to process large amounts of data. It divides a main task into subtasks and processes them in parallel. Programmers write programs with MapReduce, and those programs are automatically parallelized. MapReduce has a component called the driver, which is used to initialize a MapReduce job. MapReduce consists of the following tasks:
1. Map
2. Reduce

Click Here to Read Full Article
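The Map and Reduce tasks described above can be sketched with a minimal word count in plain Python. This mimics the key/value flow of MapReduce (map emits pairs, the framework groups them by key, reduce aggregates each group); it is an illustration of the model, not Hadoop's actual Java API.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit a (word, 1) key/value pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group the pairs by key; Reduce: sum the counts per key.
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["big data big cluster", "big data"])))
print(counts)  # {'big': 3, 'cluster': 1, 'data': 2}
```

In real Hadoop the map and reduce functions run on many machines at once, and the sort-and-group step between them is performed by the framework; the driver mentioned above is what submits this job to the cluster.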
Top Two Use Cases of Hadoop

Introduction: Many companies around the world use Hadoop for data storage because Hadoop can store and analyze large amounts of data. One of Hadoop's main strengths is that it can store any type of data from any source. Here we discuss the main use cases of Hadoop.

The top two use cases of Hadoop are:
1. Financial Service Use Case
2. Healthcare Use Case

Click Here to Read Full Article
Hadoop Cluster Architecture and Core Components

What is a Hadoop Cluster? A cluster means many computers working together as one system; a Hadoop cluster is a computer cluster used for Hadoop. A Hadoop cluster is mainly designed for storing large amounts of unstructured data in a distributed file system. It is referred to as a "shared nothing" system, and data is shared between nodes. Hadoop clusters are arranged in racks and have three kinds of nodes: worker nodes, master nodes, and client nodes.

Hadoop Cluster Architecture: A Hadoop cluster has 110 racks, and those racks hold the slave machines. One rack switch is placed on top of each rack. The slave machines are connected by cables to the rack switch. A rack switch contains 80 ports. The NameNode and JobTracker of a Hadoop cluster are master daemons, not slaves.

Components of Hadoop: A Hadoop cluster has three core components:
Client
Master
Slave

Client: The main purpose of the client is to submit the Ma…