Environment: VMware 8.0 and Ubuntu 11.04
Hadoop in Action: Preprocessing with Chained MapReduce Jobs
Step 1: Create a project named HadoopTest. The directory structure is shown in the figure below:
Step 2: Create a start.sh script in /home/tanglg1987. Each time the VM starts, it deletes everything under /tmp, reformats the namenode, and brings the cluster up. The script is:
sudo rm -rf /tmp/*
rm -rf /home/tanglg1987/hadoop-0.20.2/logs
hadoop namenode -format
hadoop datanode -format
start-all.sh
hadoop fs -mkdir input
hadoop dfsadmin -safemode leave
Step 3: Make start.sh executable and start the Hadoop pseudo-distributed cluster:
chmod 777 /home/tanglg1987/start.sh
./start.sh
The execution output is as follows:
Step 4: Upload a local file to HDFS.
In /home/tanglg1987, create a file named ChainMapper.txt (the name used by the commands and code below) with the following tab-separated content:
100	tom	90
101	mary	85
102	kate	60
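KeyValueTextInputFormat (used in Step 5) splits each input line at the first tab: everything before it becomes the key, everything after it the value. A minimal plain-Java sketch of that split, with no Hadoop dependencies and an illustrative class name:

```java
public class KeyValueSplitDemo {
    public static void main(String[] args) {
        // The three sample records, tab-separated as in ChainMapper.txt
        String[] lines = { "100\ttom\t90", "101\tmary\t85", "102\tkate\t60" };
        for (String line : lines) {
            int tab = line.indexOf('\t');           // position of the FIRST tab
            String key = line.substring(0, tab);    // e.g. "100"
            String value = line.substring(tab + 1); // e.g. "tom\t90" (later tabs stay in the value)
            System.out.println(key + " -> " + value.replace('\t', ' '));
        }
    }
}
```

So the mappers below see keys "100", "101", "102" and values like "tom	90", which is why the filters compare against the key only.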
Upload the local file to HDFS:

hadoop fs -put /home/tanglg1987/ChainMapper.txt input
Step 5: Create ChainMapperDemo.java with the following code:
package com.baison.action;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.ChainMapper;

public class ChainMapperDemo {

    // First mapper in the chain: drops records whose key is "100"
    public static class Map00 extends MapReduceBase implements
            Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            Text ft = new Text("100");
            if (!key.equals(ft)) {
                output.collect(key, value);
            }
        }
    }

    // Second mapper in the chain: drops records whose key is "101"
    public static class Map01 extends MapReduceBase implements
            Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value, OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            Text ft = new Text("101");
            if (!key.equals(ft)) {
                output.collect(key, value);
            }
        }
    }

    // Identity reducer: passes every surviving record through unchanged
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                output.collect(key, values.next());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String[] arg = { "hdfs://localhost:9100/user/tanglg1987/input/ChainMapper.txt",
                "hdfs://localhost:9100/user/tanglg1987/output" };
        JobConf conf = new JobConf(ChainMapperDemo.class);
        conf.setJobName("ChainMapperDemo");
        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Chain Map00 and Map01: the output of Map00 becomes the input of Map01
        JobConf mapAConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map00.class, Text.class, Text.class,
                Text.class, Text.class, true, mapAConf);
        JobConf mapBConf = new JobConf(false);
        ChainMapper.addMapper(conf, Map01.class, Text.class, Text.class,
                Text.class, Text.class, true, mapBConf);

        conf.setReducerClass(Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(arg[0]));
        FileOutputFormat.setOutputPath(conf, new Path(arg[1]));
        JobClient.runJob(conf);
    }
}
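The effect of the chain on the sample data can be traced with a plain-Java sketch that applies the same two filters in order (no Hadoop dependencies; class and variable names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ChainFilterDemo {
    public static void main(String[] args) {
        // key/value pairs as KeyValueTextInputFormat would produce them
        String[][] records = { {"100", "tom\t90"}, {"101", "mary\t85"}, {"102", "kate\t60"} };

        // Map00: drop records whose key is "100"
        List<String[]> afterMap00 = new ArrayList<>();
        for (String[] r : records)
            if (!r[0].equals("100")) afterMap00.add(r);

        // Map01: drop records whose key is "101"
        List<String[]> afterMap01 = new ArrayList<>();
        for (String[] r : afterMap00)
            if (!r[0].equals("101")) afterMap01.add(r);

        // Only the record with key "102" survives both filters
        for (String[] r : afterMap01)
            System.out.println(r[0] + "\t" + r[1]);
    }
}
```

This matches the job counters below: three map input records go in, one map output record comes out.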
Step 6: Run on Hadoop. The run output is as follows:
12/10/17 21:05:53 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/10/17 21:05:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/10/17 21:05:53 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/10/17 21:05:54 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/17 21:05:54 INFO mapred.JobClient: Running job: job_local_0001
12/10/17 21:05:54 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/17 21:05:54 INFO mapred.MapTask: numReduceTasks: 1
12/10/17 21:05:54 INFO mapred.MapTask: io.sort.mb = 100
12/10/17 21:05:54 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/17 21:05:54 INFO mapred.MapTask: record buffer = 262144/327680
12/10/17 21:05:54 INFO mapred.MapTask: Starting flush of map output
12/10/17 21:05:54 INFO mapred.MapTask: Finished spill 0
12/10/17 21:05:54 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/17 21:05:54 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/ChainMapper.txt:0+35
12/10/17 21:05:54 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/10/17 21:05:54 INFO mapred.LocalJobRunner:
12/10/17 21:05:54 INFO mapred.Merger: Merging 1 sorted segments
12/10/17 21:05:54 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 16 bytes
12/10/17 21:05:54 INFO mapred.LocalJobRunner:
12/10/17 21:05:54 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/17 21:05:54 INFO mapred.LocalJobRunner:
12/10/17 21:05:54 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/17 21:05:54 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9100/user/tanglg1987/output
12/10/17 21:05:54 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:05:54 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/10/17 21:05:55 INFO mapred.JobClient: map 100% reduce 100%
12/10/17 21:05:55 INFO mapred.JobClient: Job complete: job_local_0001
12/10/17 21:05:55 INFO mapred.JobClient: Counters: 15
12/10/17 21:05:55 INFO mapred.JobClient: FileSystemCounters
12/10/17 21:05:55 INFO mapred.JobClient: FILE_BYTES_READ=36152
12/10/17 21:05:55 INFO mapred.JobClient: HDFS_BYTES_READ=70
12/10/17 21:05:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=73202
12/10/17 21:05:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=12
12/10/17 21:05:55 INFO mapred.JobClient: Map-Reduce Framework
12/10/17 21:05:55 INFO mapred.JobClient: Reduce input groups=1
12/10/17 21:05:55 INFO mapred.JobClient: Combine output records=0
12/10/17 21:05:55 INFO mapred.JobClient: Map input records=3
12/10/17 21:05:55 INFO mapred.JobClient: Reduce shuffle bytes=0
12/10/17 21:05:55 INFO mapred.JobClient: Reduce output records=1
12/10/17 21:05:55 INFO mapred.JobClient: Spilled Records=2
12/10/17 21:05:55 INFO mapred.JobClient: Map output bytes=12
12/10/17 21:05:55 INFO mapred.JobClient: Map input bytes=35
12/10/17 21:05:55 INFO mapred.JobClient: Combine input records=0
12/10/17 21:05:55 INFO mapred.JobClient: Map output records=1
12/10/17 21:05:55 INFO mapred.JobClient: Reduce input records=1
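As a sanity check on the counters above: assuming the single surviving record is key "102" with value "kate	60" (the one neither filter drops), TextOutputFormat writes key + tab + value + newline, which is exactly 12 bytes, matching HDFS_BYTES_WRITTEN=12. A quick sketch:

```java
public class CounterCheck {
    public static void main(String[] args) {
        // TextOutputFormat emits: key '\t' value '\n'
        String outputLine = "102" + "\t" + "kate\t60" + "\n";
        // 3 + 1 + 7 + 1 = 12 bytes (all ASCII)
        System.out.println(outputLine.getBytes().length); // prints 12
    }
}
```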
Step 7: View the result set (e.g. with hadoop fs -cat output/*). Only the record that passed both filters remains:

102	kate	60