How to Import Data From MySql to Hadoop Using Sqoop

Sqoop is the basic data transfer tool and used to import/export data from Relational Database into hadoop. Sqoop is able to import Teradata and Other JDBC Databases. For Hadoop integration sqoop installation is most important so first install the sqoop on Hadoop.

Before accessing the MySql you have to make two changes in MySql db
1. Enable remote access for database:
Step 1: Open the file
vim /etc/mysql/my.cnf
Step 2: Change Bind address localhost to IP address
bind-address = <IP Address>
Step 3: Restart MySql
/etc/init.d/mysqld restart
2. Create user for all nodes:
Step 1: Connect with root password
mysql -uroot -p<root_pass>
Step 2: Create users
create user  ‘hadoop’@'<ip of master>’ IDENTIFIED BY ‘hadoop’;
(First ‘hadoop’ is username and second ‘hadoop’ is password)
Sqoop Installation:
Step 1: Sqoop install from Apche sqoop Link – http://www.apache.org/dyn/closer.cgi/sqoop/
Step 2: Set the Hadoop_Home
Step 3: Start the Sqoop
$SQOOP_HOME/bin/sqoop
Sqoop Import:
Sqoop import requires connection string (username and password) for accessing the mysql data. In Sqoop import command line are used to accessing the database value.
Step 1: Connect Database
connect jdbc:mysql://<ip address of mysql server> :<port of mysql>/<database_name>
Step 2: Import Data using Database username and password
–username hadoop –password hadoop
Step 3: Import MySql Table
sqoop import-all-tables –connect jdbc:mysql:
Table name is which table is imported. In sqoop you can access MySql queries also.
Step 4: Split the table data
–split-by <field to split the data> -m 8
Split by option is used to import the data in parallel. Split by is splits the data for all mapper and in this code m8 is the number of mappers. Sqoop calculate the MIN and MAX value field and create the mappers for this values.
Above process are runs correctly you can integrate the database successfully..

Comments

Popular posts from this blog