
Hadoop Example

This repository is meant to make it easy to run a Hadoop example successfully, as a way to better understand how Hadoop works. Below is some code you can copy and paste into your terminal.

It walks through the following steps:

Table of Contents

Install Docker
Pull the Hadoop instance
Run an example using word count
Example code
Remove file

Install Docker

Linux

These commands download and run Docker's convenience install script, which is the quickest way to install Docker on Linux.

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
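To confirm the install worked before moving on, a small pre-flight check can look for the docker binary on your PATH. This is a minimal sketch; `require_cmd` is a hypothetical helper name, not something provided by Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a command is on PATH, otherwise print a hint and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; install it before continuing" >&2
  return 1
}

# Usage (assumes the install script above has finished):
# require_cmd docker && docker --version
```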

Pull the Hadoop instance

Once Docker is installed, pull the Hadoop image we're going to use:

sudo su
docker pull sequenceiq/hadoop-docker:2.7.0

Once the pull finishes, start a container from the image:

docker run -it sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash

You should now be inside the Docker container, with a prompt that reads 'bash-4.1#'.

Run an example using word count

Inside the container, run the word count job:

cd $HADOOP_PREFIX
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount input output
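The wordcount example follows the classic MapReduce pattern: the map phase splits each line into words and emits a (word, 1) pair for each, the framework sorts and groups the pairs by word, and the reduce phase sums the counts per word. As a rough mental model only, the same computation can be sketched on a single machine with standard Unix tools. `local_wordcount` is a hypothetical name, and this sketch does not use Hadoop at all.

```bash
#!/usr/bin/env bash
# Hypothetical single-machine analogue of the wordcount job (no Hadoop involved).
# Reads text on stdin and prints "word<TAB>count" lines sorted by word.
local_wordcount() {
  # map: split on whitespace, one word per line, dropping empty lines
  tr -s '[:space:]' '\n' |
    grep -v '^$' |
    # shuffle/sort: bring identical words together (plain byte order)
    LC_ALL=C sort |
    # reduce: count each run of identical words, then print word<TAB>count
    uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, `printf 'b a a\nc b a\n' | local_wordcount` prints three tab-separated lines: a 3, b 2, c 1.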

To see the output of the job, run:

bin/hdfs dfs -cat output/*
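The job writes one "word<TAB>count" line per distinct word, ordered by word. To list the most frequent words instead, the output can be piped through sort. This is a minimal sketch; `top_words` is a hypothetical helper, and N defaults to 10.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the N most frequent.
# Ties on count are broken alphabetically so the result is deterministic.
top_words() {
  local n="${1:-10}"
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Usage inside the container (from $HADOOP_PREFIX):
# bin/hdfs dfs -cat output/* | top_words 5
```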

Example code

The source code for the word count job can be found here.

Remove file

If you run the job again, it will fail with an error saying that the output directory already exists. Remove the directory with the command below before running the job again:

bin/hadoop fs -rm -r -skipTrash /user/root/output/
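If you rerun the job often, the removal and the rerun can be combined so the output directory is only deleted when it exists. This is a minimal sketch, assuming it runs inside the container where HADOOP_PREFIX is set; `rerun_wordcount` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rerun the wordcount job, deleting any previous output first.
# Assumes HADOOP_PREFIX points at the Hadoop install (as it does inside the container).
# The body is a subshell ( ... ) so the cd does not change your current directory.
rerun_wordcount() (
  cd "$HADOOP_PREFIX" || exit 1
  # "hdfs dfs -test -d" exits 0 only if the path exists and is a directory
  if bin/hdfs dfs -test -d output; then
    bin/hadoop fs -rm -r -skipTrash output || exit 1
  fi
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount input output
)
```

The relative path output resolves to /user/root/output when running as root, the same directory removed by the command above.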

About

Used for COMP 440 (Design of Databases).

