This repository is meant to help you run an example Hadoop job easily and successfully, so you can better understand how Hadoop works. Below are commands you can copy and paste into your terminal.
The commands below install Docker on your machine using Docker's convenience script, which is the quickest way to download and install it.
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
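Before moving on, you may want to confirm that Docker actually ended up on your PATH. The helper below is a minimal sketch: the name 'require_cmd' is hypothetical and not part of Docker or this repository, and it relies only on the shell builtin 'command -v'.

```bash
# Hypothetical helper: succeed if a command is on PATH, otherwise print a hint and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; is it installed?" >&2
  return 1
}
```

For example, 'require_cmd docker && sudo docker run hello-world' checks for the binary and then runs Docker's small test image.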
Once Docker is installed, pull the image we're going to use:
sudo docker pull sequenceiq/hadoop-docker:2.7.0
Once the pull has finished, start the container:
sudo docker run -it sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash
You should now be inside the Docker container, with a prompt that reads 'bash-4.1#'.
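A quick way to confirm you are inside the container is to check that HADOOP_PREFIX is set and that the examples jar used by the 'hadoop jar' command below exists. This is a sketch: 'check_hadoop_env' is a hypothetical name, and it assumes the jar sits at the same relative path under HADOOP_PREFIX as in that command.

```bash
# Hypothetical sanity check: print the examples jar path, or fail with a hint.
check_hadoop_env() {
  local jar
  if [ -z "${HADOOP_PREFIX:-}" ]; then
    echo "HADOOP_PREFIX is not set; are you inside the container?" >&2
    return 1
  fi
  # Same relative path as the wordcount command in this README
  jar="$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar"
  if [ ! -f "$jar" ]; then
    echo "examples jar not found at $jar" >&2
    return 1
  fi
  echo "$jar"
}
```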
Inside the container, start the wordcount job with the commands below:
cd $HADOOP_PREFIX
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount input output
To see the job's output, run:
bin/hdfs dfs -cat output/*
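To get a feel for what the job computes, the pipeline below approximates wordcount locally on any text. It is a sketch, not Hadoop's code: it assumes the example splits each line on whitespace, sums a count per word, and writes "word<TAB>count" lines sorted by word. Here 'tr' plays the map step, 'sort' the shuffle, and 'uniq -c' the reduce; 'wordcount_local' is a hypothetical name.

```bash
# Hypothetical local approximation of the wordcount job.
wordcount_local() {
  # Map: put every whitespace-separated token on its own line
  tr -s ' \t\n\r\f' '\n' |
    # Drop the empty line left by leading whitespace, if any
    grep -v '^$' |
    # Shuffle: bring identical words together, sorted byte-wise
    LC_ALL=C sort |
    # Reduce: count each run of identical words
    uniq -c |
    # Emit "word<TAB>count"
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, printf 'b a\nb\n' | wordcount_local prints "a" with count 1 and "b" with count 2.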
The source code for the wordcount example can be found here.
If you run the job again, it fails with an error saying that the output directory already exists. Use the command below to remove that directory before rerunning:
bin/hadoop fs -rm -r -skipTrash /user/root/output/
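If you rerun the job often, you can wrap the remove, run and print steps in a single function. This is a sketch under assumptions: 'rerun_wordcount' is a hypothetical name, it reuses the HADOOP_PREFIX, jar and 'input' directory from the steps above, and it adds the '-f' flag to '-rm' so the very first run does not fail when the output directory is missing.

```bash
# Hypothetical helper: clear the output dir, rerun wordcount, print the result.
# Usage: rerun_wordcount [output_dir]   (defaults to "output")
rerun_wordcount() (
  out="${1:-output}"
  [ -n "${HADOOP_PREFIX:-}" ] && cd "$HADOOP_PREFIX" || exit 1
  # -f: do not fail if the output directory does not exist yet
  bin/hdfs dfs -rm -r -f -skipTrash "$out" || exit 1
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar \
    wordcount input "$out" || exit 1
  # Quote the glob so HDFS, not the local shell, expands it
  bin/hdfs dfs -cat "$out/*"
)
```

Run it as 'rerun_wordcount', or pass another HDFS directory name such as 'rerun_wordcount output2'. The body runs in a subshell, so its 'cd' does not change your current directory.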