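1. JDK installation
Download URL:
http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html
If you already have the installer locally, connect to the Linux machine with SecureCRT and upload the file with the rz command.
The download is the file jdk-6u29-linux-i586-rpm.bin. Install it with sh jdk-6u29-linux-i586-rpm.bin
and wait for the installation to finish. By default Java is installed under /usr/java.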
On the command line, run vi /etc/profile and add the following lines:
export JAVA_HOME=/usr/java/jdk1.6.0_29
export JAVA_BIN=/usr/java/jdk1.6.0_29/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH
Go to the /usr/bin directory and create symbolic links to java and javac there:
cd /usr/bin
ln -s -f /usr/java/jdk1.6.0_29/jre/bin/java
ln -s -f /usr/java/jdk1.6.0_29/bin/javac
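On the command line, run java -version. Output like the following (the version string reflects the build that is installed) means that JDK 1.6 is installed:
java version "jdk1.6.0_02"
Java(TM) 2 Runtime Environment, Standard Edition (build jdk1.6.0_02)
Java HotSpot(TM) Client VM (build jdk1.6.0_02, mixed mode)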
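2. Hadoop installation
Download URL: http://www.apache.org/dyn/closer.cgi/hadoop/common/
If you already have the package locally, connect to the Linux machine with SecureCRT and upload the file with the rz command.
The download is the file hadoop-0.21.0.tar.gz.
Extract it: tar zxvf hadoop-0.21.0.tar.gz
For reference, the reverse operation (creating such an archive) is: tar zcvf hadoop-0.21.0.tar.gz <directory name>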
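On the command line, run vi /etc/profile and edit it so that it contains the following lines. Note that a shell assignment must not have spaces around =, and that PATH now also includes $hadoop_home/bin:
export hadoop_home=/usr/george/dev/install/hadoop-0.21.0
export JAVA_HOME=/usr/java/jdk1.6.0_29
export JAVA_BIN=/usr/java/jdk1.6.0_29/bin
export PATH=$PATH:$JAVA_HOME/bin:$hadoop_home/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH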
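Log out and back in, or restart the VM, and the hadoop command can then be run directly. Running source /etc/profile in the current shell also applies the change.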
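One way to confirm that the profile changes took effect is to inspect the environment from Java itself. The following is only a sketch: the class name EnvCheck and the helper onPath are hypothetical, and it assumes the variable names JAVA_HOME and hadoop_home used in /etc/profile above.

```java
import java.io.File;
import java.util.regex.Pattern;

class EnvCheck {
    // Returns true if dir is one of the entries of a PATH-style string.
    static boolean onPath(String path, String dir, String separator) {
        if (path == null || dir == null) {
            return false;
        }
        for (String entry : path.split(Pattern.quote(separator))) {
            if (entry.equals(dir)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        // Variable names as exported in /etc/profile (assumed, see lead-in).
        for (String name : new String[] {"JAVA_HOME", "hadoop_home"}) {
            String home = System.getenv(name);
            String bin = (home == null) ? null : home + "/bin";
            System.out.println(name + "=" + home
                    + ", bin directory on PATH: "
                    + onPath(path, bin, File.pathSeparator));
        }
    }
}
```

If either line reports false, the shell running the program most likely has not reloaded /etc/profile yet.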
3. WordCount example code
3.1 Java code:
// No package declaration: the compile, jar and run steps below refer to the
// class simply as WordCount, so it must live in the default package.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token of a line.
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts of each word.
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // args[0] is the input directory, args[1] the output directory.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
3.2 Compile:
javac -classpath /usr/george/dev/install/hadoop-0.21.0/hadoop-hdfs-0.21.0.jar:/usr/george/dev/install/hadoop-0.21.0/hadoop-mapred-0.21.0.jar:/usr/george/dev/install/hadoop-0.21.0/hadoop-common-0.21.0.jar WordCount.java -d /usr/george/dev/wkspace/hadoop/wordcount/classes
On Windows, multiple classpath entries are separated with ; and on Linux with :
After compiling, three class files are generated in /usr/george/dev/wkspace/hadoop/wordcount/classes:
WordCount.class WordCount$Map.class WordCount$Reduce.class
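The note on classpath separators can be illustrated from Java, which exposes the platform's separator as File.pathSeparator. The sketch below builds the same classpath string as the javac command above; the class name ClasspathBuilder and its join helper are hypothetical names chosen for illustration.

```java
import java.io.File;

class ClasspathBuilder {
    // Joins classpath entries with the given separator (":" or ";").
    static String join(String separator, String... entries) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < entries.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(entries[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String base = "/usr/george/dev/install/hadoop-0.21.0/";
        // File.pathSeparator is ":" on Linux and ";" on Windows.
        String classpath = join(File.pathSeparator,
                base + "hadoop-hdfs-0.21.0.jar",
                base + "hadoop-mapred-0.21.0.jar",
                base + "hadoop-common-0.21.0.jar");
        System.out.println(classpath);
    }
}
```

Using File.pathSeparator instead of a hard-coded character keeps a build script portable between the two systems.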
3.3 Package the class files into a jar
Go to the /usr/george/dev/wkspace/hadoop/wordcount/classes directory and run jar cvf WordCount.jar *.class. The directory then contains:
WordCount.class WordCount.jar WordCount$Map.class WordCount$Reduce.class
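3.4 Create the input data:
Create the directory /usr/george/dev/wkspace/hadoop/wordcount/datas and create the files input1.txt and input2.txt in it:
touch input1.txt
vi input1.txt
The file content is:
i love china
are you ok?
Create input2.txt the same way, with the following content (there is no space after the comma, which is why hello,i shows up as a single word in the job output):
hello,i love word
You are ok
Once the files are created, you can check their content with cat input1.txt and cat input2.txt.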
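3.5 Create the Hadoop input and output directories (run the put commands from the datas directory):
hadoop fs -mkdir wordcount/input
hadoop fs -mkdir wordcount/output
hadoop fs -put input1.txt wordcount/input/
hadoop fs -put input2.txt wordcount/input/
Note: you can skip creating the output directory. If it already exists, the WordCount job fails with an error saying that the output directory already exists, which is why the command below uses output1 as the output directory.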
3.6 Run
Go to the /usr/george/dev/wkspace/hadoop/wordcount/classes directory and run:
[root@localhost classes]# hadoop jar WordCount.jar WordCount wordcount/input wordcount/output1
11/12/02 05:53:59 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/02 05:53:59 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/12/02 05:53:59 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/12/02 05:53:59 INFO mapred.FileInputFormat: Total input paths to process : 2
11/12/02 05:54:00 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/12/02 05:54:00 INFO mapreduce.JobSubmitter: number of splits:2
11/12/02 05:54:00 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/12/02 05:54:00 INFO mapreduce.Job: Running job: job_201112020429_0003
11/12/02 05:54:01 INFO mapreduce.Job: map 0% reduce 0%
11/12/02 05:54:20 INFO mapreduce.Job: map 50% reduce 0%
11/12/02 05:54:23 INFO mapreduce.Job: map 100% reduce 0%
11/12/02 05:54:29 INFO mapreduce.Job: map 100% reduce 100%
11/12/02 05:54:32 INFO mapreduce.Job: Job complete: job_201112020429_0003
11/12/02 05:54:32 INFO mapreduce.Job: Counters: 33
FileInputFormatCounters
BYTES_READ=54
FileSystemCounters
FILE_BYTES_READ=132
FILE_BYTES_WRITTEN=334
HDFS_BYTES_READ=274
HDFS_BYTES_WRITTEN=65
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Job Counters
Data-local map tasks=2
Total time spent by all maps waiting after reserving slots (ms)=0
Total time spent by all reduces waiting after reserving slots (ms)=0
SLOTS_MILLIS_MAPS=24824
SLOTS_MILLIS_REDUCES=6870
Launched map tasks=2
Launched reduce tasks=1
Map-Reduce Framework
Combine input records=12
Combine output records=12
Failed Shuffles=0
GC time elapsed (ms)=291
Map input records=4
Map output bytes=102
Map output records=12
Merged Map outputs=2
Reduce input groups=10
Reduce input records=12
Reduce output records=10
Reduce shuffle bytes=138
Shuffled Maps =2
Spilled Records=24
SPLIT_RAW_BYTES=220
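One possible reading of the counters above (an interpretation, not something the log states): the combiner received 12 records and emitted 12 because no word repeats within a single input file, so there was nothing to merge per map task, while the reducer collapsed them into 10 groups. The plain-Java sketch below mimics that flow without Hadoop. It assumes each file is one split handled by one map task (the log reports 2 splits and 2 map tasks) and that input2.txt starts with "hello,i love word"; CombinerSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class CombinerSketch {
    // Counts the words of one split: a map task followed by its combiner.
    static Map<String, Integer> combine(String... lines) {
        Map<String, Integer> partial = new TreeMap<String, Integer>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                Integer old = partial.get(word);
                partial.put(word, old == null ? 1 : old + 1);
            }
        }
        return partial;
    }

    // Sums the partial counts of all splits, like the reduce phase.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<String, Integer>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                Integer old = total.get(e.getKey());
                total.put(e.getKey(),
                        old == null ? e.getValue() : old + e.getValue());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials =
                new ArrayList<Map<String, Integer>>();
        partials.add(combine("i love china", "are you ok?"));
        partials.add(combine("hello,i love word", "You are ok"));

        int combineOutput = 0;
        for (Map<String, Integer> partial : partials) {
            combineOutput += partial.size();
        }
        System.out.println("combine output records: " + combineOutput);
        System.out.println("reduce output records: "
                + reduce(partials).size());
    }
}
```

Reusing the reducer as the combiner is safe here because addition is associative and commutative, so summing partial sums gives the same totals as summing the raw ones.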
3.7 Inspect the output directory
[root@localhost classes]# hadoop fs -ls wordcount/output1
11/12/02 05:54:59 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/02 05:55:00 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 2 items
-rw-r--r-- 1 root supergroup 0 2011-12-02 05:54 /user/root/wordcount/output1/_SUCCESS
-rw-r--r-- 1 root supergroup 65 2011-12-02 05:54 /user/root/wordcount/output1/part-00000
[root@localhost classes]# hadoop fs -cat /user/root/wordcount/output1/part-00000
11/12/02 05:56:05 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/02 05:56:05 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
You 1
are 2
china 1
hello,i 1
i 1
love 2
ok 1
ok? 1
word 1
you 1
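The result above can be cross-checked locally without a cluster. The sketch below applies the same StringTokenizer splitting as the mapper and keeps the counts in a TreeMap, whose ordering matches the listing for this ASCII input. LocalWordCount is a hypothetical helper, not part of Hadoop, and the input lines assume input2.txt begins with "hello,i love word", as the hello,i key in the output suggests.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalWordCount {
    // Counts whitespace-separated tokens across all lines, sorted by key.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
                "i love china", "are you ok?",      // input1.txt
                "hello,i love word", "You are ok"); // input2.txt (assumed)
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The tokenizer splits on whitespace only, which is why ok and ok? are counted separately and You is distinct from you.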