MapReduce
IGFS and the Ignite Hadoop Accelerator have been discontinued. Instead, use the architecture described on the page below to accelerate Hadoop deployments with Ignite in-memory clusters: https://ignite.apache.org/use-cases/hadoop-acceleration.html
Ignite In-Memory MapReduce allows you to effectively parallelize the processing of data stored in any Hadoop file system. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing. In-memory MapReduce provides dramatic performance boosts for CPU-intensive tasks while requiring only minimal changes to existing applications.
Configure Ignite
The Apache Ignite Hadoop Accelerator MapReduce engine processes Hadoop jobs within an Ignite cluster. Several prerequisites must be satisfied:
- The IGNITE_HOME environment variable must be set and point to the root of the Ignite installation directory.
- Each cluster node must have the Hadoop JARs in its CLASSPATH. See the Ignite installation guide for your Hadoop distribution for details.
- Cluster nodes accept job execution requests by listening on a particular socket. By default, each Ignite node listens for incoming requests on 127.0.0.1:11211. You can override the host and port using the ConnectorConfiguration class:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  ...
  <property name="connectorConfiguration">
    <bean class="org.apache.ignite.configuration.ConnectorConfiguration">
      <property name="host" value="myHost" />
      <property name="port" value="12345" />
    </bean>
  </property>
</bean>
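If you prefer to configure the node in code rather than through Spring XML, a roughly equivalent sketch looks like the following. The class name is arbitrary, and the host and port are the same placeholders used in the XML above:
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.ConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartIgniteNode {
    public static void main(String[] args) {
        // Mirror the Spring XML above: override the connector host and port.
        ConnectorConfiguration connCfg = new ConnectorConfiguration();
        connCfg.setHost("myHost"); // placeholder, same as in the XML example
        connCfg.setPort(12345);    // placeholder, same as in the XML example

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setConnectorConfiguration(connCfg);

        // Start an Ignite node with this configuration.
        Ignition.start(cfg);
    }
}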
Run Ignite
When the Ignite node is configured, start it using the following command:
$ bin/ignite.sh
Configure Hadoop
To run a Hadoop job using the Ignite job tracker, several prerequisites must be satisfied:
- The IGNITE_HOME environment variable must be set and point to the root of the Ignite installation directory.
- Hadoop must have the Ignite JARs ${IGNITE_HOME}/libs/ignite-core-[version].jar and ${IGNITE_HOME}/libs/hadoop/ignite-hadoop-[version].jar in its CLASSPATH. This can be achieved in several ways:
  - Add these JARs to the HADOOP_CLASSPATH environment variable.
  - Copy or symlink these JARs to the folder where your Hadoop installation stores shared libraries.
  See the Ignite installation guide for your Hadoop distribution for details.
- Your Hadoop job must be configured to use the Ignite job tracker. Two configuration properties are responsible for this:
  - mapreduce.framework.name must be set to ignite.
  - mapreduce.jobtracker.address must be set to the host/port your Ignite nodes are listening on.
This can also be achieved in several ways. First, you may create a separate mapred-site.xml file with these configuration properties and use it for job runs:
<configuration>
  ...
  <property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>127.0.0.1:11211</value>
  </property>
  ...
</configuration>
Second, you may override the default mapred-site.xml of your Hadoop installation. This will force all Hadoop jobs to use the Ignite job tracker by default unless it is overridden at the job level.
Third, you may set these properties for a particular job programmatically:
Configuration conf = new Configuration();
...
conf.set(MRConfig.FRAMEWORK_NAME, IgniteHadoopClientProtocolProvider.FRAMEWORK_NAME);
conf.set(MRConfig.MASTER_ADDRESS, "127.0.0.1:11211");
...
Job job = new Job(conf, "word count");
...
Run Hadoop
How you run a job depends on how you have configured your Hadoop installation.
If you created a separate mapred-site.xml:
hadoop --config [path_to_config] [arguments]
If you modified the default mapred-site.xml, then the --config option is not necessary:
hadoop [arguments]
If you start the job programmatically, then submit it:
...
Job job = new Job(conf, "word count");
...
job.submit();
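For reference, below is a minimal, self-contained word-count sketch that wires a standard Hadoop job to the Ignite job tracker via the two properties discussed above. The class name is arbitrary, the host/port assumes the default 127.0.0.1:11211, and the input/output paths are taken from the command line:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IgniteWordCount {
    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values)
                sum += v.get();
            result.set(sum);
            ctx.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Route the job to the Ignite in-memory MapReduce engine
        // (assumes an Ignite node is listening on the default host/port).
        conf.set("mapreduce.framework.name", "ignite");
        conf.set("mapreduce.jobtracker.address", "127.0.0.1:11211");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(IgniteWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}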
