线上job报错:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /home/vipshop/hard_disk/1/yarn/local/usercache/hdfs/appcache/application_1420458339569_0548/container_1420458339569_0548_01_000005/Stage-5.tar.gz/MapJoin-mapfile12--.hashtable (No such file or directory) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:160) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:155)Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /home/vipshop/hard_disk/1/yarn/local/usercache/hdfs/appcache/application_1420458339569_0548/container_1420458339569_0548_01_000005/Stage-5.tar.gz/MapJoin-mapfile12--.hashtable (No such file or directory) at org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:104) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:152) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:178) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1029) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1033) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1033) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1033) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:505) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) ... 8 moreCaused by: java.io.FileNotFoundException: /home/vipshop/hard_disk/1/yarn/local/usercache/hdfs/appcache/application_1420458339569_0548/container_1420458339569_0548_01_000005/Stage-5.tar.gz/MapJoin-mapfile12--.hashtable (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:146) at java.io.FileInputStream. (FileInputStream.java:101) at org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:95) ... 16 more
这其实是mapjoin的一个bug,mapjoin时会通过小表生成hashtable,然后放到distributecache中,后面的task会通过distributecache下载到本地使用。
这里是由于job含有两个mapjoin但是在HashTableSinkOperator中只生成了第一个hashtable,导致在HashTableLoader中进行load hashtable时报错。 bug触发条件:1.两个以上的mapjoin2.其中一个表为空Bugid:https://issues.apache.org/jira/browse/HIVE-6913这个bug hive0.14已经fix解决方法:./ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.javaOperator forwardOp = work.getAliasToWork().get(alias);if (fetchOp.isEmptyTable()) { //generate empty hashtable for empty table this.generateDummyHashTable(alias, bigTableBucket); forwardOp.close(false); continue;}
关于mapjoin的整个流程和触发条件放在后面写。