Cascading on Ubuntu 10.04
Prerequisites
- Hadoop (this walkthrough assumes an installation under /usr/lib/hadoop)
- the Cascading library & the wordcount example from http://www.cascading.org/downloads.html
$ wget http://files.cascading.org/cascading/1.2/cascading-1.2.4-hadoop-0.19.2%2B.tgz
$ wget http://files.cascading.org/samples/wordcount-20101201.tgz
$ tar xfz cascading-1.2.4-hadoop-0.19.2+.tgz
$ tar xfz wordcount-20101201.tgz
$ cd wordcount/
/wordcount$ ant -Dhadoop.home=/usr/lib/hadoop -Dcascading.home=/home/hchung/programming/CASCADING/cascading-1.2.4-hadoop-0.19.2+
Buildfile: build.xml
build:
[echo] building...
[mkdir] Created dir: /home/hchung/programming/CASCADING/wordcount/build/classes
[mkdir] Created dir: /home/hchung/programming/CASCADING/wordcount/lib
[javac] Compiling 1 source file to /home/hchung/programming/CASCADING/wordcount/build/classes
[copy] Copying 1 file to /home/hchung/programming/CASCADING/wordcount/build/classes
BUILD SUCCESSFUL
Total time: 1 second
/wordcount$ ant -Dhadoop.home=/usr/lib/hadoop -Dcascading.home=/home/hchung/programming/CASCADING/cascading-1.2.4-hadoop-0.19.2+ jar
Buildfile: build.xml
build:
[echo] building...
jar:
[copy] Copying 6 files to /home/hchung/programming/CASCADING/wordcount/build/classes/lib
[jar] Building jar: /home/hchung/programming/CASCADING/wordcount/build/wordcount.jar
BUILD SUCCESSFUL
Total time: 0 seconds
/wordcount$ hadoop jar ./build/wordcount.jar ./data/url+page.200.txt output local
11/08/12 16:16:44 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar
11/08/12 16:16:44 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar
11/08/12 16:16:44 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar
11/08/12 16:16:45 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar
11/08/12 16:16:45 INFO cascade.Cascade: Concurrent, Inc - Cascading 1.2.4 [hadoop-0.19.2+]
11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] starting
11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] parallel execution is enabled: true
11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] starting flows: 4
11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] allocating threads: 4
11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: import pages
11/08/12 16:16:45 INFO flow.Flow: [import pages] atleast one sink does not exist
11/08/12 16:16:45 INFO util.Util: unable to find and remove client hdfs shutdown hook, received exception: java.lang.NoSuchFieldException
11/08/12 16:16:45 INFO flow.Flow: [import pages] starting
11/08/12 16:16:45 INFO flow.Flow: [import pages] source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["./data/url+page.200.txt"]"]
11/08/12 16:16:45 INFO flow.Flow: [import pages] sink: Hfs["SequenceFile[['url', 'page']]"]["output/pages/"]"]
11/08/12 16:16:45 INFO tap.Hfs: forcing job to local mode, via source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["./data/url+page.200.txt"]"]
11/08/12 16:16:45 INFO flow.Flow: [import pages] parallel execution is enabled: true
11/08/12 16:16:45 INFO flow.Flow: [import pages] starting jobs: 1
11/08/12 16:16:45 INFO flow.Flow: [import pages] allocating threads: 1
11/08/12 16:16:45 INFO flow.FlowStep: [import pages] starting step: (1/1) Hfs["SequenceFile[['url', 'page']]"]["output/pages/"]"]
11/08/12 16:16:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/08/12 16:16:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/08/12 16:16:45 WARN flow.Flow: stopping jobs
11/08/12 16:16:45 INFO flow.FlowStep: [import pages] stopping: (1/1) Hfs["SequenceFile[['url', 'page']]"]["output/pages/"]"]
11/08/12 16:16:45 WARN flow.Flow: stopped jobs
11/08/12 16:16:45 WARN flow.Flow: shutting down job executor
11/08/12 16:16:45 WARN flow.Flow: shutdown complete
11/08/12 16:16:45 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: import pages
cascading.flow.FlowException: unhandled exception
at cascading.flow.Flow.complete(Flow.java:821)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)
at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)
... 5 more
11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export word
11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export url
11/08/12 16:16:46 INFO flow.Flow: [export word] atleast one sink does not exist
11/08/12 16:16:46 INFO flow.Flow: [export url] atleast one sink does not exist
11/08/12 16:16:46 INFO util.Util: unable to find and remove client hdfs shutdown hook, received exception: java.lang.NoSuchFieldException
11/08/12 16:16:46 INFO flow.Flow: [export word] starting
11/08/12 16:16:46 INFO flow.Flow: [export word] source: Hfs["SequenceFile[['word', 'count']]"]["output/words/"]"]
11/08/12 16:16:46 INFO flow.Flow: [export word] sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]
11/08/12 16:16:46 INFO tap.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]
11/08/12 16:16:46 INFO flow.Flow: [export word] parallel execution is enabled: true
11/08/12 16:16:46 INFO flow.Flow: [export word] starting jobs: 1
11/08/12 16:16:46 INFO flow.Flow: [export word] allocating threads: 1
11/08/12 16:16:46 INFO flow.FlowStep: [export word] starting step: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]
11/08/12 16:16:46 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/08/12 16:16:46 INFO util.Util: unable to find and remove client hdfs shutdown hook, received exception: java.lang.NoSuchFieldException
11/08/12 16:16:46 INFO flow.Flow: [export url] starting
11/08/12 16:16:46 INFO flow.Flow: [export url] source: Hfs["SequenceFile[['url', 'word', 'count']]"]["output/urls/"]"]
11/08/12 16:16:46 INFO flow.Flow: [export url] sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]
11/08/12 16:16:46 INFO tap.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]
11/08/12 16:16:46 INFO flow.Flow: [export url] parallel execution is enabled: true
11/08/12 16:16:46 INFO flow.Flow: [export url] starting jobs: 1
11/08/12 16:16:46 INFO flow.Flow: [export url] allocating threads: 1
11/08/12 16:16:46 INFO flow.FlowStep: [export url] starting step: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]
11/08/12 16:16:46 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/08/12 16:16:46 WARN flow.Flow: stopping jobs
11/08/12 16:16:46 INFO flow.FlowStep: [export word] stopping: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]
11/08/12 16:16:46 WARN flow.Flow: stopping jobs
11/08/12 16:16:46 WARN flow.Flow: stopped jobs
11/08/12 16:16:46 INFO flow.FlowStep: [export url] stopping: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]
11/08/12 16:16:46 WARN flow.Flow: shutting down job executor
11/08/12 16:16:46 WARN flow.Flow: stopped jobs
11/08/12 16:16:46 WARN flow.Flow: shutdown complete
11/08/12 16:16:46 WARN flow.Flow: shutting down job executor
11/08/12 16:16:46 WARN flow.Flow: shutdown complete
11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export word
cascading.flow.FlowException: unhandled exception
at cascading.flow.Flow.complete(Flow.java:821)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)
at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)
... 5 more
11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export url
cascading.flow.FlowException: unhandled exception
at cascading.flow.Flow.complete(Flow.java:821)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)
at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)
... 5 more
11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] stopping flows
11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export url
11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export word
11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: url pipe+word pipe
11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: import pages
11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] stopped flows
11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] shutting down flow executor
11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] shutdown complete
Exception in thread "main" cascading.cascade.CascadeException: flow failed: import pages
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:714)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: cascading.flow.FlowException: unhandled exception
at cascading.flow.Flow.complete(Flow.java:821)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)
... 6 more
Caused by: ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)
at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)
at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)
... 5 more
The failure looks permissions-related: every flow dies with `ENOENT: No such file or directory` thrown from `NativeIO.chmod` while the JobClient is creating its staging directory (`JobSubmissionFiles.getStagingDir` → `RawLocalFileSystem.mkdirs`). I haven't been able to pin down the exact cause, though, because the documentation on this is sparse.
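Since the chmod fails while Hadoop sets up its local job staging directory, one workaround worth trying (an assumption based on the stack trace, not a verified fix) is to pre-create that directory yourself so the local job runner never has to mkdir/chmod it. The `/tmp/hadoop-$USER` path below is Hadoop's default `hadoop.tmp.dir` layout; check your core-site.xml in case it overrides it:

```shell
# Hypothetical workaround: pre-create the local MapReduce staging directory
# so JobSubmissionFiles.getStagingDir does not have to create it itself.
# The path follows Hadoop's default hadoop.tmp.dir layout -- verify it
# against your core-site.xml before relying on it.
STAGING="/tmp/hadoop-$USER/mapred/staging"
mkdir -p "$STAGING"
chmod 700 "$STAGING"
ls -ld "$STAGING"
```

If the directory already exists but is owned by another user (e.g. a previous run as root left `/tmp/hadoop-root` behind), removing or chown-ing it may be needed instead.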