MapReduce
IGFS and the Ignite Hadoop Accelerator have been discontinued. Instead, use the architecture described on the page below to accelerate Hadoop deployments with Ignite in-memory clusters: https://ignite.apache.org/use-cases/hadoop-acceleration.html
Ignite In-Memory MapReduce allows you to effectively parallelize the processing of data stored in any Hadoop file system. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing. In-memory MapReduce provides dramatic performance boosts for CPU-intensive tasks while requiring only minimal changes to existing applications.
Configure Ignite
The Apache Ignite Hadoop Accelerator MapReduce engine processes Hadoop jobs within an Ignite cluster. Several prerequisites must be satisfied:
- The IGNITE_HOME environment variable must be set and point to the root of the Ignite installation directory.
- Each cluster node must have the Hadoop JARs in its CLASSPATH. See the Ignite installation guide for your Hadoop distribution for details.
- Cluster nodes accept job execution requests by listening on a particular socket. By default, each Ignite node listens for incoming requests on 127.0.0.1:11211. You can override the host and port using the ConnectorConfiguration class:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  ...
  <property name="connectorConfiguration">
    <bean class="org.apache.ignite.configuration.ConnectorConfiguration">
      <property name="host" value="myHost" />
      <property name="port" value="12345" />
    </bean>
  </property>
</bean>
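If you prefer to configure the node in code rather than through Spring XML, a roughly equivalent sketch looks like the following. The class name is arbitrary, and the host and port are the same placeholders used in the XML above:
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.ConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartIgniteNode {
    public static void main(String[] args) {
        // Mirror the Spring XML above: override the connector host and port.
        ConnectorConfiguration connCfg = new ConnectorConfiguration();
        connCfg.setHost("myHost"); // placeholder, same as in the XML example
        connCfg.setPort(12345);    // placeholder, same as in the XML example

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setConnectorConfiguration(connCfg);

        // Start an Ignite node with this configuration.
        Ignition.start(cfg);
    }
}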
Run Ignite
When the Ignite node is configured, start it using the following command:
$ bin/ignite.sh
Configure Hadoop
To run a Hadoop job using the Ignite job tracker, several prerequisites must be satisfied:
- The IGNITE_HOME environment variable must be set and point to the root of the Ignite installation directory.
- Hadoop must have the Ignite JARs ${IGNITE_HOME}/libs/ignite-core-[version].jar and ${IGNITE_HOME}/libs/hadoop/ignite-hadoop-[version].jar in its CLASSPATH. This can be achieved in several ways:
  - Add these JARs to the HADOOP_CLASSPATH environment variable.
  - Copy or symlink these JARs to the folder where your Hadoop installation stores shared libraries.
  See the Ignite installation guide for your Hadoop distribution for details.
- Your Hadoop job must be configured to use the Ignite job tracker. Two configuration properties are responsible for this:
  - mapreduce.framework.name must be set to ignite.
  - mapreduce.jobtracker.address must be set to the host/port your Ignite nodes are listening on.
This can also be achieved in several ways. First, you may create a separate mapred-site.xml file with these configuration properties and use it for job runs:
<configuration>
  ...
  <property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>127.0.0.1:11211</value>
  </property>
  ...
</configuration>
Second, you may override the default mapred-site.xml of your Hadoop installation. This will force all Hadoop jobs to use the Ignite job tracker by default unless it is overridden at the job level.
Third, you may set these properties for a particular job programmatically:
Configuration conf = new Configuration();
...
conf.set(MRConfig.FRAMEWORK_NAME, IgniteHadoopClientProtocolProvider.FRAMEWORK_NAME);
conf.set(MRConfig.MASTER_ADDRESS, "127.0.0.1:11211");
...
Job job = new Job(conf, "word count");
...
Run Hadoop
How you run a job depends on how you have configured your Hadoop installation.
If you created a separate mapred-site.xml:
hadoop --config [path_to_config] [arguments]
If you modified the default mapred-site.xml, then the --config option is not necessary:
hadoop [arguments]
If you start the job programmatically, then submit it:
...
Job job = new Job(conf, "word count");
...
job.submit();
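For reference, below is a minimal, self-contained word-count sketch that wires a standard Hadoop job to the Ignite job tracker via the two properties discussed above. The class name is arbitrary, the host/port assumes the default 127.0.0.1:11211, and the input/output paths are taken from the command line:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IgniteWordCount {
    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values)
                sum += v.get();
            result.set(sum);
            ctx.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Route the job to the Ignite in-memory MapReduce engine
        // (assumes an Ignite node is listening on the default host/port).
        conf.set("mapreduce.framework.name", "ignite");
        conf.set("mapreduce.jobtracker.address", "127.0.0.1:11211");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(IgniteWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}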
