How to Install Sqoop on Amazon Elastic Map Reduce (EMR)

It is possible to install Sqoop on Amazon EMR. You can use Sqoop to import and export data from a relational database such as MySQL. Here’s how I did it with MySQL. If you are using a different database, you’ll probably need a different JDBC connector for that database.

I’m using Amazon’s Hadoop version 0.20.205, which, I think, was the default. You can see all supported versions of Amazon’s Hadoop here:

http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/usingemr_config_supportedversions.html

I downloaded sqoop-1.4.1-incubating__hadoop-0.20.tar.gz from here: http://www.apache.org/dyn/closer.cgi/sqoop/

I downloaded mysql-connector-java-5.1.19.tar.gz from here: http://www.mysql.com/downloads/connector/j/

Once I downloaded these two tar.gz files, I uploaded them to an S3 bucket. I also put this script below in the S3 bucket. Make sure to replace <BUCKET_NAME> with your own bucket name.

#!/bin/bash
#Install Sqoop - s3://<BUCKET_NAME>/install_sqoop.sh
cd
hadoop fs -copyToLocal s3://<BUCKET_NAME>/sqoop-1.4.1-incubating__hadoop-0.20.tar.gz sqoop-1.4.1-incubating__hadoop-0.20.tar.gz
tar -xzf sqoop-1.4.1-incubating__hadoop-0.20.tar.gz
hadoop fs -copyToLocal s3://<BUCKET_NAME>/mysql-connector-java-5.1.19.tar.gz mysql-connector-java-5.1.19.tar.gz
tar -xzf mysql-connector-java-5.1.19.tar.gz
cp mysql-connector-java-5.1.19/mysql-connector-java-5.1.19-bin.jar sqoop-1.4.1-incubating__hadoop-0.20/lib/

After I started a job flow, I added this script as a step to the job flow. You can do this via the API, or the CLI like this:

./elastic-mapreduce -j <JOBFLOW_ID> --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg s3://<BUCKET_NAME>/install_sqoop.sh

Once the step completes, you can run sqoop imports and exports. Here’s an example of a sqoop export:

./sqoop-1.4.1-incubating__hadoop-0.20/bin/sqoop export --connect jdbc:mysql://<MYSQL_HOST>/<DATABASE_NAME> --table <TABLE_NAME> --export-dir <HDFS_PATH> --fields-terminated-by , --input-null-non-string '\\N' --username <USERNAME> --password <PASSWORD>

Hope that helped. Let me know if you have any questions.

16 Responses to “How to Install Sqoop on Amazon Elastic Map Reduce (EMR)”

  1. Bhavesh Shah Says:

    Hello,
    I have followed followed your link and it qorks for me. Sqoop gets installed on Amazon EMR. But there is one problem, when I run the sqoop-import query I am getting errors as:

    12/04/27 09:29:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/45d0bca50cb78c50c20acf18fcd64f90/QualityMeasureComputation.jar
    Exception in thread “main” java.lang.NoSuchMethodError: com.cloudera.sqoop.manager.ImportJobContext.setConnManager(Lcom/cloudera/sqoop/manager/ConnManager;)V
    at com.microsoft.sqoop.SqlServer.MSSQLServerManager.importTable(MSSQLServerManager.java:142)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

  2. sumit Says:

    We have tried the all the steps to install SQOOP for MS SQL db connectivity but giving error while importing data from MS SQL 2008 R2 db to HDFS.
    Error:
    Exception in thread “main” java.lang.NoSuchMethodError: com.cloudera.sqoop.manager.ImportJobContext.setConnManager(Lcom/cloudera/sqoop/manager/ConnManager;)V
    at com.microsoft.sqoop.SqlServer.MSSQLServerManager.importTable(MSSQLServerManager.java:142)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

    we are using sqoop-1.4.1

    It worked for only one time, now it is not. Please let us how to resolve above error..

  3. Kyle Mulka Says:

    Bhavesh, Sumit, my instructions will probably only work for MySQL. You will probably need to install a JDBC driver for MS SQL to use MS SQL.

  4. BigDataBlog » Some Links and more on Sqoop installation Says:

    [...] to some other great blogs – if any of them work better than my instructions let me know http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/ https://ccp.cloudera.com/display/CDH4B2/Sqoop+Installation [...]

  5. Brendan Benson Says:

    This is a huge help. It sucks that this has to be such a hacky process through AWS. Thanks for the example.

    One typo: in the script runner command, it should be –args, not –arg.

  6. meble Says:

    helpfull information’s about how to install sqoop thx!:)

  7. ryan Says:

    Got any examples of how to take data from Hive table and make a CSV file on HDFS?

    Also, have you tried the ‘–num-mappers’ param to sqoop (http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_syntax_3)? Wondering if/how much that speeds things up. I’m going to be trying an export from hive into a mysql table and doing an update on a multi-col key (via –update-key field1,field2,field3) where my original source is a dynamodb table. Using EMR to aggregate metrics and push the aggregation numbers into a mysql table.

  8. Siyin Says:

    Hi,

    I followed your steps exactly and got the following error running sqoop:

    Exception in thread “main” java.lang.NoClassDefFoundError: com/cloudera/sqoop/Sqoop
    Caused by: java.lang.ClassNotFoundException: com.cloudera.sqoop.Sqoop
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    Could not find the main class: com.cloudera.sqoop.Sqoop. Program will exit.

    Any ideas on why it’s looking for Cloudera’s Sqoop library and how do I get past this? Thanks.

  9. ranjith Says:

    The last sqoop import and export statement
    ./sqoop-1.4.1-incubating__hadoop-0.20/bin/sqoop export –connect jdbc:mysql:/// –table –export-dir –fields-terminated-by , –input-null-non-string ‘\\N’ –username –password

    Is this a part of the emr job flow ? Can we add this as a step to the job flow that which runs the export of data from hive to mysql on the emr instance? Or do I need to run this from terminal?

  10. Rupinder Says:

    Thanks. This was helpful.
    Is it possible to configure the script to install Sqoop as a bootstrap action in EMR Console ?

  11. Rupinder Says:

    I just checked and successfully installed sqoop as a bootstrap action in EMR.

  12. Justin Says:

    This worked for me. One obstacles I had was that I wrote my .sh file on Windows so the “End Of Line” format was incorrect.

    You can use Notepad++ to fix this, go to Edit > EOL Conversion > UNIX Format

    Thanks Kyle.

  13. Sqoop via an ssh tunnel on Amazon EMR « Getting there is 99.5% the fun Says:

    [...] turns out other people have solved this problem, but here’s our solution for posterity. Note that it’s [...]

  14. sqoop installation on Amazon EMRQueryBy | QueryBy, ejjuit, query, query by, queryby.com, android doubt, ios question, sql query, sqlite query, nodejsquery, dns query, update query, insert query, kony, mobilesecurity, postquery, queryposts.com, sapquery, j Says:

    [...] sqoop installation on Amazon EMRI am trying to install Sqoop on Amazon EMR cluster, as per the steps described in Kyle Mulka’s blog “http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/#comments&#8220;. [...]

  15. sqoop installation on Amazon EMRCopyQuery CopyQuery | Question & Answer Tool for your Technical Queries,CopyQuery, ejjuit, query, copyquery, copyquery.com, android doubt, ios question, sql query, sqlite query, nodejsquery, dns query, update query, ins Says:

    [...] I am trying to install Sqoop on Amazon EMR cluster, as per the steps described in Kyle Mulka’s blog “http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/#comments&#8220;. [...]

  16. nowoczesne meble pokojowe Says:

    Pretty! This was an extremely wonderful article. Thank you for providing this info.

Leave a Reply