The document outlines a practical exercise involving the execution of a Hadoop MapReduce job using a WordCount program. It details the steps taken to set up the environment, create input files, run the job, and retrieve output results. Additionally, it includes the Java code for the WordCount program, which processes text input to count word occurrences.
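The map → shuffle → reduce pipeline this exercise demonstrates can be sketched in plain Java with no Hadoop dependency; the two sample lines below are hypothetical, purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A local, single-process sketch of the MapReduce word-count dataflow
// (map -> shuffle/group -> reduce). Illustration only, no Hadoop involved.
public class WordCountSketch {
    public static void main(String[] args) {
        String[] lines = { "Hello world", "Hello Hadoop" }; // hypothetical input

        // Map phase: emit a (word, 1) pair for every token, uppercased.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String w : line.trim().split("\\s+"))
                pairs.add(Map.entry(w.toUpperCase(), 1));

        // Shuffle + reduce phase: group the pairs by key and sum the values.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);

        System.out.println(counts); // {HADOOP=1, HELLO=2, WORLD=1}
    }
}
```

On a cluster, the "group by key and sum" step is what Hadoop performs between the Mapper and Reducer tasks shown in the WordCount.java source later in this document.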
DSBDA Group B 1
PRACTICAL-11
Name: Vishal Dattatraya Doke
Roll No: 16 Batch: T1
Microsoft Windows [Version 10.0.19045.5608]
(c) Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>start-all.cmd
This script is Deprecated. Instead use start-dfs.cmd and start-yarn.cmd
starting yarn daemons

C:\WINDOWS\system32>jps
2656 NodeManager
7216 ResourceManager
6724 NameNode
6836 DataNode
10952 Jps

C:\WINDOWS\system32>hadoop fs -mkdir /input

C:\WINDOWS\system32>hadoop fs -put C:\Users\Vishal\Documents\FILES\input1.txt /input

C:\WINDOWS\system32>hadoop fs -ls /input
Found 1 items
-rw-r--r--   1 VISHAL DOKE supergroup         80 2025-04-09 03:45 /input/input1.txt

C:\WINDOWS\system32>hadoop jar C:\Users\Vishal\Documents\JARFILE\MapReduceWordCount.jar com.mapreduce.wc.WordCount /input/input1.txt /output
2025-04-07 13:45:33,092 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2025-04-07 13:45:34,309 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/Admin/.staging/job_1744008556181_0001
2025-04-07 13:45:34,924 INFO input.FileInputFormat: Total input files to process : 1
2025-04-07 13:45:35,462 INFO mapreduce.JobSubmitter: number of splits:1
2025-04-07 13:45:35,965 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1744008556181_0001
2025-04-07 13:45:35,967 INFO mapreduce.JobSubmitter: Executing with tokens: []
2025-04-07 13:45:36,183 INFO conf.Configuration: resource-types.xml not found
2025-04-07 13:45:36,183 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2025-04-07 13:45:36,626 INFO impl.YarnClientImpl: Submitted application application_1744008556181_0001
2025-04-07 13:45:36,673 INFO mapreduce.Job: The url to track the job: http://DESKTOP-0729C31:8088/proxy/application_1744008556181_0001/
2025-04-07 13:45:36,675 INFO mapreduce.Job: Running job: job_1744008556181_0001
2025-04-07 13:45:48,957 INFO mapreduce.Job: Job job_1744008556181_0001 running in uber mode : false
2025-04-07 13:45:48,962 INFO mapreduce.Job:  map 0% reduce 0%
2025-04-07 13:45:54,080 INFO mapreduce.Job:  map 100% reduce 0%
2025-04-07 13:46:08,266 INFO mapreduce.Job:  map 100% reduce 100%
2025-04-07 13:46:09,292 INFO mapreduce.Job: Job job_1744008556181_0001 completed successfully
2025-04-07 13:46:09,391 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=129
		FILE: Number of bytes written=478023
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=183
		HDFS: Number of bytes written=68
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3174
		Total time spent by all reduces in occupied slots (ms)=9861
		Total time spent by all map tasks (ms)=3174
		Total time spent by all reduce tasks (ms)=9861
		Total vcore-milliseconds taken by all map tasks=3174
		Total vcore-milliseconds taken by all reduce tasks=9861
		Total megabyte-milliseconds taken by all map tasks=3250176
		Total megabyte-milliseconds taken by all reduce tasks=10097664
	Map-Reduce Framework
		Map input records=7
		Map output records=8
		Map output bytes=107
		Map output materialized bytes=129
		Input split bytes=103
		Combine input records=0
		Combine output records=0
		Reduce input groups=6
		Reduce shuffle bytes=129
		Reduce input records=8
		Reduce output records=6
		Spilled Records=16
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=70
		CPU time spent (ms)=996
		Physical memory (bytes) snapshot=508809216
		Virtual memory (bytes) snapshot=749785088
		Total committed heap usage (bytes)=362283008
		Peak Map Physical memory (bytes)=304926720
		Peak Map Virtual memory (bytes)=426901504
		Peak Reduce Physical memory (bytes)=203882496
		Peak Reduce Virtual memory (bytes)=322883584
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=80
	File Output Format Counters
		Bytes Written=68

C:\Windows\system32>hadoop dfs -cat /output/*
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
LAPTOP	1
MAHARASHTRA	2
SUBSCRIBERS	1
TECHNICAL	1
VISHAL	2
WINDOWS	1

C:\Windows\system32>hadoop dfs -get /output/part-r-00000 C:\Users\Admin\Documents\FILES\textfile.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

C:\Windows\system32>hadoop fs -rm -r /input/input1.txt
Deleted /input/input1.txt

C:\Windows\system32>hadoop fs -rm -r /output
Deleted /output

C:\Windows\system32>stop-all.cmd
This script is Deprecated. Instead use stop-dfs.cmd and stop-yarn.cmd
SUCCESS: Sent termination signal to the process with PID 696.
SUCCESS: Sent termination signal to the process with PID 14080.
stopping yarn daemons
SUCCESS: Sent termination signal to the process with PID 7240.
SUCCESS: Sent termination signal to the process with PID 10956.
INFO: No tasks running with the specified criteria.
C:\Windows\system32>

**********************************************************************************
input1.txt

Technical Windows Vishal Subscribers Maharashtra laptop Vishal Maharashtra

**********************************************************************************
WordCount.java

package com.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();

        // Ensure correct input arguments
        if (files.length < 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }

        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        Job j = Job.getInstance(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);

        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper Class
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text wordText = new Text();

        @Override
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            String[] words = line.split("\\s+"); // Handles multiple spaces
            for (String word : words) {
                if (!word.isEmpty()) { // Avoid empty strings
                    wordText.set(word.trim().toUpperCase());
                    con.write(wordText, one);
                }
            }
        }
    }

    // Reducer Class
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}

**********************************************************************************
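As a sanity check, the mapper's tokenization (trim, split on whitespace, uppercase) can be replayed on the words of input1.txt in plain Java, with no Hadoop installation. The result agrees with the `hadoop fs -cat /output/*` listing above and with the job counters (8 map output records, 6 reduce output records):

```java
import java.util.Map;
import java.util.TreeMap;

// Local re-check of the job's result: apply the same tokenization rules as
// MapForWordCount to the words of input1.txt and print the summed counts.
public class VerifyWordCount {
    public static void main(String[] args) {
        // The 8 words of input1.txt as listed above; the file's exact line
        // layout is not shown, but line breaks do not change the counts.
        String input = "Technical Windows Vishal Subscribers Maharashtra laptop Vishal Maharashtra";

        Map<String, Integer> counts = new TreeMap<>(); // sorted, like part-r-00000
        for (String w : input.trim().split("\\s+"))
            counts.merge(w.toUpperCase(), 1, Integer::sum);

        // Prints LAPTOP 1, MAHARASHTRA 2, SUBSCRIBERS 1, TECHNICAL 1,
        // VISHAL 2, WINDOWS 1 -- matching the HDFS output above.
        counts.forEach((w, n) -> System.out.println(w + "\t" + n));
    }
}
```

Reduce output keys arrive sorted because the MapReduce framework sorts keys during the shuffle, which is why a TreeMap reproduces the order of part-r-00000 here.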