#IBMSOE: Hadoop#
IBMSOE Hadoop hosts buildable Hadoop source trees optimized for Linux on POWER. The trees support both Red Hat Enterprise Linux (RHEL) 6.5 on big-endian POWER with PowerVM and Ubuntu 14.04 on little-endian POWER with PowerKVM.
The repository is a work in progress: it will evolve over time with new versions and additional patches. The source trees are provided 'as is' without any warranty whatsoever.
The tree is a clone from Apache Hadoop. For more information about Hadoop visit the Apache site or the Hadoop Wiki.
This software carries the same Apache License, Version 2.0 as the original, without modification. See the license for details.
Information regarding Linux on Power can be found at the developerWorks Linux on Power Community.
You can follow IBM Power Linux on Twitter.
##Cryptography## Be aware that this code contains cryptographic libraries whose use, import or export is controlled in many jurisdictions.
##Current version## The tree currently matches the Hadoop 2.4.1 release.
##Prerequisites## The following components are required prior to building Hadoop:
###Build tool chain###
- Maven. Use Maven 3, NOT Maven 2. This is confusing because Maven 3 is referred to simply as maven in yum and apt, so it is tempting to think that Maven 2 is the more recent version.
- Ant
- cmake
- gcc and g++ (XLC/XLC++ may work but haven't been tested)
- IBM Java JDK. OpenJDK is not usable at the time of writing.
- automake
- autoconf
- git
- protobuf (To be locally built)
- libsnappy, libsnappy-dev (Ubuntu)
- snappy, snappy-devel (RHEL)
- openssl, openssl-dev (Ubuntu)
- openssl, openssl-devel (RHEL)
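As noted above, protobuf must be built locally. A sketch of that build follows; Hadoop 2.4.x expects protobuf 2.5.0, but the download URL and install prefix here are assumptions you should check against your environment:

```shell
# Sketch: build and install protobuf 2.5.0 from source.
# The release URL is an assumption and may have moved.
wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr/local
make
sudo make install
protoc --version   # should report libprotoc 2.5.0
```

Make sure the resulting protoc is on your PATH before starting the Hadoop build, since Maven invokes it during code generation.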
The gcc compiler and other build tools are available from the IBM Advance Toolchain.
IBM SDKs for big-endian and little-endian Linux are available from IBM Hursley.
The remaining tools can be installed with a Linux package manager (e.g. yum or apt).
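As a sketch, the package-manager installs might look like the following; the package names are taken from the lists above and exact names can vary by distribution release:

```shell
# Ubuntu: install the build tool chain (names as listed above; verify per release).
sudo apt-get install maven ant cmake gcc g++ automake autoconf git \
    libsnappy1 libsnappy-dev openssl libssl-dev

# RHEL: equivalent packages via yum.
sudo yum install maven ant cmake gcc gcc-c++ automake autoconf git \
    snappy snappy-devel openssl openssl-devel
```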
###Runtime requisites###
- openssl
- openssh
- zlib
- IBM Java run-time. OpenJDK is not usable at the time of writing.
#Building#
Install the prerequisite packages. On Ubuntu, apt-get install build-essential provides much of what is required; on Red Hat it is not quite as simple.
Set the JAVA_HOME environment variable to point to your Java installation.
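For example (the installation path below is a placeholder; substitute the actual location of your IBM JDK):

```shell
# Point JAVA_HOME at the IBM JDK; the path is a placeholder, adjust to your install.
export JAVA_HOME=/opt/ibm/java-ppc64le-71
# Put the JDK's tools (java, javac) first on the PATH.
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"
```

Setting this in your shell profile avoids having to re-export it in every new session.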
Clone the github tree:
> git clone git://github.com/ibmsoe/hadoop-common/
Detailed instructions for building Hadoop are given in the BUILDING.txt file in the root directory. For the impatient, here's the short version:
##Compile and Install into Maven cache## To compile and install Hadoop into the Maven cache with JNI and snappy support, use the following build command from the root of the Hadoop source directory:
> mvn install -Pnative -DskipTests -Drequire.snappy
The build process on its own takes a few minutes. Without -DskipTests it will take the best part of a day to run.
##Create a tar distribution package## To create a tar file containing a Hadoop distribution, use the following command:
> mvn package -Pnative,dist -Drequire.snappy -DskipTests -Dtar
##To run the built-in tests## To run just the tests, use:
> mvn test -Pnative -Drequire.snappy
Alternatively, omit -DskipTests during the compilation, installation or packaging phases.
And be prepared to wait.
The -Pnative profile switch sets the build for JNI. To build without JNI omit this option.
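As a sketch, a pure-Java build without the native libraries would drop both options; -Drequire.snappy is also omitted here on the assumption that snappy support belongs to the native code:

```shell
# Build without JNI/native code: omit -Pnative and the snappy requirement.
mvn install -DskipTests
```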
#Configuring and running Hadoop#
Hadoop cluster configuration is beyond the scope of this README. Refer to the Apache Hadoop wiki; several articles can also be found on the web.