
Controlling Output File Names in Hadoop

Date: 2023-06-02 16:16:41

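Normally, each Hadoop Reducer produces one output file, and the files are named part-r-00000, part-r-00001, and so on. To control the output file names yourself, or to have a single Reducer write several output files, use the MultipleOutputs class. MultipleOutputs builds the output file name from the output record's key and value (output key and output value) or from an arbitrary string. Files are generally named name-r-nnnnn, where name is any name set by the program and nnnnn is the partition number.

Using MultipleOutputs

Using MultipleOutputs takes the following four steps:

1. Declare a MultipleOutputs variable in the Reducer

    private MultipleOutputs<NullWritable, Text> multipleOutputs;

2. Initialize MultipleOutputs in the Reducer's setup function

    protected void setup(Context context) throws IOException, InterruptedException {
        multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
    }

3. Control the output in the reduce function

    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The third argument (here, the key) is the base output path.
            multipleOutputs.write(NullWritable.get(), value, key.toString());
        }
    }

4. Close MultipleOutputs in the cleanup function

    protected void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }

Note: the third parameter of multipleOutputs.write(key, value, baseOutputPath) specifies where the output goes, relative to the output directory configured by the user. If baseOutputPath does not contain the file separator "/", the output file is named baseOutputPath-r-nnnnn (that is, name-r-nnnnn). If it does contain the file separator "/", for example baseOutputPath = "029070-99999/1901/part", then the output file is ...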
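The name-r-nnnnn pattern for reducer output can be illustrated with a small standalone sketch. It does not call Hadoop. The class OutputNameSketch and the method reducerFileName are hypothetical names introduced here for illustration, and the sketch assumes the partition number is zero-padded to five digits, as in part-r-00000. It only covers names without a file separator.

```java
import java.util.Locale;

// Hypothetical illustration class; it is not part of the Hadoop API.
class OutputNameSketch {

    // Builds a reducer-side file name of the form <name>-r-<nnnnn>,
    // where nnnnn is the partition number padded to five digits.
    static String reducerFileName(String name, int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be non-negative");
        }
        return String.format(Locale.ROOT, "%s-r-%05d", name, partition);
    }

    public static void main(String[] args) {
        // Default naming: one file per Reducer.
        System.out.println(reducerFileName("part", 0));
        System.out.println(reducerFileName("part", 1));
        // A program-chosen name ("summary" is a made-up example).
        System.out.println(reducerFileName("summary", 12));
    }
}
```

With the default name part, partitions 0 and 1 give part-r-00000 and part-r-00001, matching the file names mentioned at the start of the article.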
