Bda 11
&
DATASCIENCE
Experiment No. 11
Title: Use of Sqoop tool to transfer data between Hadoop and relational database servers.
a) Sqoop – Installation
Objectives:
To gain hands-on experience with the Sqoop tool, including its installation and commands.
Theory:
What is Sqoop?
Apache Sqoop is a tool designed for efficiently transferring bulk data between
Apache Hadoop and external datastores such as relational databases and enterprise data
warehouses.
Sqoop is used to import data from external datastores into the Hadoop Distributed File
System (HDFS) or related Hadoop ecosystem components such as Hive and HBase. Similarly,
Sqoop can be used to extract data from Hadoop or its ecosystem and export it to external
datastores such as relational databases and enterprise data warehouses.
Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, and
PostgreSQL.
Why is Sqoop used?
For Hadoop developers, the interesting work starts after the data is loaded into HDFS.
Developers explore the data to find the insights concealed in it. For this, data residing
in relational database management systems (RDBMS) needs to be transferred to HDFS,
processed there, and possibly transferred back to the RDBMS.
Sqoop automates most of this process and relies on the database to describe the schema of
the data to be imported.
Sqoop uses the MapReduce framework to import and export data, which provides parallel
operation as well as fault tolerance. Sqoop makes developers' lives easier by providing a
command-line interface. Developers only need to supply basic information such as the
source, destination, and database authentication details in the sqoop command; Sqoop takes
care of the rest.
Basic Commands:
The Sqoop import command imports a table from an RDBMS into HDFS. Each row of the table is
represented as a separate record in HDFS. Records can be stored as text files, or in
binary representation as Avro data files or SequenceFiles.
Generic Syntax:
$ sqoop import (generic args) (import args)
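As an illustration of the generic syntax above, the following is a minimal sketch of a single-table import. The JDBC URL, user name, table name, and HDFS directory are hypothetical values chosen for this example and are not taken from the experiment; the script only prints the command unless Sqoop is installed.

```shell
# Hypothetical connection details (assumptions); replace them with your own.
JDBC_URL="jdbc:mysql://localhost:3306/testdb"
DB_USER="labuser"
TABLE="employees"
TARGET_DIR="/user/hadoop/employees"

# Assemble the command as positional parameters so it can be printed and run.
# -P prompts for the password instead of exposing it on the command line.
set -- sqoop import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table "$TABLE" \
  --target-dir "$TARGET_DIR" \
  --as-textfile \
  --num-mappers 1

echo "Command: $*"

# Run the import only when Sqoop is actually installed on this machine.
if command -v sqoop >/dev/null 2>&1; then
  "$@"
fi
```

Replacing --as-textfile with --as-avrodatafile or --as-sequencefile selects one of the binary formats mentioned above.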
5. Incremental Imports
Syntax:
$ sqoop import --connect <jdbc-uri> --table <table-name> --username <username> --password <password> --incremental <mode> --check-column <column> --last-value <value>
Sqoop import supports two types of incremental imports:
1. Append
2. Lastmodified.
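The incremental options can be sketched as follows for append mode, in which Sqoop imports only the rows whose check column is greater than the given last value. The connection details, the id column, and the last value of 100 are assumptions made for this example; the script only prints the command unless Sqoop is installed.

```shell
# Hypothetical connection details (assumptions); replace them with your own.
JDBC_URL="jdbc:mysql://localhost:3306/testdb"
DB_USER="labuser"
TABLE="employees"
TARGET_DIR="/user/hadoop/employees"

# Import only rows whose "id" value is greater than 100 (both are assumed values).
set -- sqoop import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table "$TABLE" \
  --target-dir "$TARGET_DIR" \
  --incremental append \
  --check-column id \
  --last-value 100

echo "Command: $*"

# Run the import only when Sqoop is actually installed on this machine.
if command -v sqoop >/dev/null 2>&1; then
  "$@"
fi
```

For lastmodified mode, the check column would instead be a timestamp column and the last value a timestamp.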
6. Sqoop-Eval
The sqoop-eval command allows users to quickly run simple SQL queries against a database;
the results are printed to the console.
Generic Syntax:
$ sqoop eval (generic args) (eval args)
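A minimal sketch of the eval command is shown here, previewing a few rows of a table before importing it. The connection details and the employees table are hypothetical; the script only prints the command unless Sqoop is installed.

```shell
# Hypothetical connection details (assumptions); replace them with your own.
JDBC_URL="jdbc:mysql://localhost:3306/testdb"
DB_USER="labuser"

# --query passes the SQL statement to be executed on the database server.
set -- sqoop eval \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --query "SELECT * FROM employees LIMIT 5"

echo "Command: $*"

# Run the query only when Sqoop is actually installed on this machine.
if command -v sqoop >/dev/null 2>&1; then
  "$@"
fi
```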
7. Sqoop-List-Databases
Used to list all the databases available on an RDBMS server.
Generic Syntax:
$ sqoop list-databases (generic args) (list databases args)
$ sqoop-list-databases (generic args) (list databases args)
Syntax:
$ sqoop list-databases --connect <jdbc-uri>
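For example, listing the databases on a local MySQL server might look like the sketch below. The server address and user name are assumptions; the script only prints the command unless Sqoop is installed.

```shell
# Hypothetical server URL and user (assumptions); no database name is needed here.
SERVER_URL="jdbc:mysql://localhost:3306/"
DB_USER="labuser"

set -- sqoop list-databases \
  --connect "$SERVER_URL" \
  --username "$DB_USER" -P

echo "Command: $*"

# Query the server only when Sqoop is actually installed on this machine.
if command -v sqoop >/dev/null 2>&1; then
  "$@"
fi
```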
8. Sqoop-List-Tables
Used to list all the tables in a specified database.
Generic Syntax:
$ sqoop list-tables (generic args) (list tables args)
$ sqoop-list-tables (generic args) (list tables args)
Syntax:
$ sqoop list-tables --connect <jdbc-uri>
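As a sketch, listing the tables of one database differs from listing databases only in that the JDBC URL names the database. The testdb database and the user name are hypothetical; the script only prints the command unless Sqoop is installed.

```shell
# Hypothetical database URL and user (assumptions); the URL ends with the database name.
JDBC_URL="jdbc:mysql://localhost:3306/testdb"
DB_USER="labuser"

set -- sqoop list-tables \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P

echo "Command: $*"

# Query the database only when Sqoop is actually installed on this machine.
if command -v sqoop >/dev/null 2>&1; then
  "$@"
fi
```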
Conclusion: Thus, the use of the Sqoop tool to transfer data between Hadoop and relational database servers was studied successfully.
R1 R2 R3
DOP DOS Conduction File Record Viva-Voce Total Signature
5 Marks 5 Marks 5 Marks 15 Marks