Kyle Mulka's Blog

How to Install Sqoop on Amazon Elastic Map Reduce (EMR)

It is possible to install Sqoop on Amazon EMR. You can use Sqoop to import and export data from a relational database such as MySQL. Here’s how I did it with MySQL. If you are using a different database, you’ll probably need a different JDBC connector for that database.

I’m using Amazon’s Hadoop version 0.20.205, which, I think, was the default. You can see all supported versions of Amazon’s Hadoop here:

http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/usingemr_config_supportedversions.html

I downloaded sqoop-1.4.1-incubating__hadoop-0.20.tar.gz from here: http://www.apache.org/dyn/closer.cgi/sqoop/

I downloaded mysql-connector-java-5.1.19.tar.gz from here: http://www.mysql.com/downloads/connector/j/

Once I downloaded these two tar.gz files, I uploaded them to an S3 bucket. I also put this script below in the S3 bucket. Make sure to replace <BUCKET_NAME> with your own bucket name.

#!/bin/bash #Install Sqoop - s3://<BUCKET_NAME>/install_sqoop.sh cd hadoop fs -copyToLocal s3://<BUCKET_NAME>/sqoop-1.4.1-incubating__hadoop-0.20.tar.gz sqoop-1.4.1-incubating__hadoop-0.20.tar.gz tar -xzf sqoop-1.4.1-incubating__hadoop-0.20.tar.gz hadoop fs -copyToLocal s3://<BUCKET_NAME>/mysql-connector-java-5.1.19.tar.gz mysql-connector-java-5.1.19.tar.gz tar -xzf mysql-connector-java-5.1.19.tar.gz cp mysql-connector-java-5.1.19/mysql-connector-java-5.1.19-bin.jar sqoop-1.4.1-incubating__hadoop-0.20/lib/

After I started a job flow, I added this script as a step to the job flow. You can do this via the API, or the CLI like this:

./elastic-mapreduce -j <JOBFLOW_ID> --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg s3://<BUCKET_NAME>/install_sqoop.sh

Once the step completes, you can run sqoop imports and exports. Here’s an example of a sqoop export:

./sqoop-1.4.1-incubating__hadoop-0.20/bin/sqoop export --connect jdbc:mysql://<MYSQL_HOST>/<DATABASE_NAME> --table <TABLE_NAME> --export-dir <HDFS_PATH> --fields-terminated-by , --input-null-non-string '\\N' --username <USERNAME> --password <PASSWORD>

Hope that helped. Let me know if you have any questions.

This entry was posted on Monday, April 23rd, 2012 at 8:54 pm and is filed under General. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

14 Responses to “How to Install Sqoop on Amazon Elastic Map Reduce (EMR)”

Bhavesh Shah Says:
April 27th, 2012 at 3:05 am
Hello,
I have followed followed your link and it qorks for me. Sqoop gets installed on Amazon EMR. But there is one problem, when I run the sqoop-import query I am getting errors as:

12/04/27 09:29:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/45d0bca50cb78c50c20acf18fcd64f90/QualityMeasureComputation.jar
Exception in thread “main” java.lang.NoSuchMethodError: com.cloudera.sqoop.manager.ImportJobContext.setConnManager(Lcom/cloudera/sqoop/manager/ConnManager;)V
at com.microsoft.sqoop.SqlServer.MSSQLServerManager.importTable(MSSQLServerManager.java:142)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
sumit Says:
April 30th, 2012 at 4:04 am
We have tried the all the steps to install SQOOP for MS SQL db connectivity but giving error while importing data from MS SQL 2008 R2 db to HDFS.
Error:
Exception in thread “main” java.lang.NoSuchMethodError: com.cloudera.sqoop.manager.ImportJobContext.setConnManager(Lcom/cloudera/sqoop/manager/ConnManager;)V
at com.microsoft.sqoop.SqlServer.MSSQLServerManager.importTable(MSSQLServerManager.java:142)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

we are using sqoop-1.4.1

It worked for only one time, now it is not. Please let us how to resolve above error..
Kyle Mulka Says:
May 8th, 2012 at 11:41 am
Bhavesh, Sumit, my instructions will probably only work for MySQL. You will probably need to install a JDBC driver for MS SQL to use MS SQL.
BigDataBlog » Some Links and more on Sqoop installation Says:
May 31st, 2012 at 2:21 am
[…] to some other great blogs – if any of them work better than my instructions let me know http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/ https://ccp.cloudera.com/display/CDH4B2/Sqoop+Installation […]
Brendan Benson Says:
June 9th, 2012 at 4:40 pm
This is a huge help. It sucks that this has to be such a hacky process through AWS. Thanks for the example.

One typo: in the script runner command, it should be –args, not –arg.
meble Says:
June 20th, 2012 at 8:42 am
helpfull information’s about how to install sqoop thx!:)
ryan Says:
June 27th, 2012 at 3:15 pm
Got any examples of how to take data from Hive table and make a CSV file on HDFS?

Also, have you tried the ‘–num-mappers’ param to sqoop (http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_syntax_3)? Wondering if/how much that speeds things up. I’m going to be trying an export from hive into a mysql table and doing an update on a multi-col key (via –update-key field1,field2,field3) where my original source is a dynamodb table. Using EMR to aggregate metrics and push the aggregation numbers into a mysql table.
Siyin Says:
July 17th, 2012 at 9:01 am
Hi,

I followed your steps exactly and got the following error running sqoop:

Exception in thread “main” java.lang.NoClassDefFoundError: com/cloudera/sqoop/Sqoop
Caused by: java.lang.ClassNotFoundException: com.cloudera.sqoop.Sqoop
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: com.cloudera.sqoop.Sqoop. Program will exit.

Any ideas on why it’s looking for Cloudera’s Sqoop library and how do I get past this? Thanks.
ranjith Says:
August 10th, 2012 at 2:37 pm
The last sqoop import and export statement
./sqoop-1.4.1-incubating__hadoop-0.20/bin/sqoop export –connect jdbc:mysql:/// –table –export-dir –fields-terminated-by , –input-null-non-string ‘\\N’ –username –password

Is this a part of the emr job flow ? Can we add this as a step to the job flow that which runs the export of data from hive to mysql on the emr instance? Or do I need to run this from terminal?
Rupinder Says:
November 21st, 2012 at 6:26 am
Thanks. This was helpful.
Is it possible to configure the script to install Sqoop as a bootstrap action in EMR Console ?
Rupinder Says:
November 21st, 2012 at 7:11 am
I just checked and successfully installed sqoop as a bootstrap action in EMR.
Justin Says:
November 21st, 2012 at 7:30 pm
This worked for me. One obstacles I had was that I wrote my .sh file on Windows so the “End Of Line” format was incorrect.

You can use Notepad++ to fix this, go to Edit > EOL Conversion > UNIX Format

Thanks Kyle.
Sqoop via an ssh tunnel on Amazon EMR « Getting there is 99.5% the fun Says:
March 28th, 2013 at 7:07 pm
[…] turns out other people have solved this problem, but here’s our solution for posterity. Note that it’s […]
SN Says:
December 31st, 2015 at 12:03 pm
I tried these steps to install SQOOP but I get the following error message. Can someone please tell me why?

Exception in thread “main” java.lang.RuntimeException: java.io.IOException: Cannot run program “/mnt/var/lib/hadoop/steps/s-2EZU086RJ5ABY/./install-sqoop-SN.sh” (in directory “.”): error=2, No such file or directory
at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:143)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Cannot run program “/mnt/var/lib/hadoop/steps/s-2EZU086RJ5ABY/./install-sqoop-SN.sh” (in directory “.”): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at com.amazon.elasticmapreduce.scriptrunner.ProcessRunner.exec(ProcessRunner.java:96)
… 6 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
… 7 more

How to Install Sqoop on Amazon Elastic Map Reduce (EMR)

14 Responses to “How to Install Sqoop on Amazon Elastic Map Reduce (EMR)”

Leave a Reply

Archives

Categories