Cascading on Ubuntu 10.04

halatha 2011. 8. 13. 05:19

Prerequisites

$ wget http://files.cascading.org/cascading/1.2/cascading-1.2.4-hadoop-0.19.2%2B.tgz

$ wget http://files.cascading.org/samples/wordcount-20101201.tgz

$ tar xfz cascading-1.2.4-hadoop-0.19.2+.tgz

$ tar xfz wordcount-20101201.tgz

$ cd wordcount/
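Before building, it may help to double-check where Hadoop and Cascading actually live, since the Ant build below takes both locations as -Dhadoop.home and -Dcascading.home. A quick sanity check, assuming a packaged Hadoop install under /usr/lib/hadoop as used here (exact paths and jar names may differ on your machine):

$ hadoop version
# the Hadoop core jar should be somewhere under hadoop.home
$ ls /usr/lib/hadoop/hadoop-*core*.jar
# the Cascading jar and its lib/ directory should be in cascading.home
$ ls ~/programming/CASCADING/cascading-1.2.4-hadoop-0.19.2+/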

/wordcount$ ant -Dhadoop.home=/usr/lib/hadoop -Dcascading.home=/home/hchung/programming/CASCADING/cascading-1.2.4-hadoop-0.19.2+

Buildfile: build.xml

 

build:

[echo] building...

[mkdir] Created dir: /home/hchung/programming/CASCADING/wordcount/build/classes

[mkdir] Created dir: /home/hchung/programming/CASCADING/wordcount/lib

[javac] Compiling 1 source file to /home/hchung/programming/CASCADING/wordcount/build/classes

[copy] Copying 1 file to /home/hchung/programming/CASCADING/wordcount/build/classes

 

BUILD SUCCESSFUL

Total time: 1 second

/wordcount$ ant -Dhadoop.home=/usr/lib/hadoop -Dcascading.home=/home/hchung/programming/CASCADING/cascading-1.2.4-hadoop-0.19.2+ jar

Buildfile: build.xml

 

build:

[echo] building...

 

jar:

[copy] Copying 6 files to /home/hchung/programming/CASCADING/wordcount/build/classes/lib

[jar] Building jar: /home/hchung/programming/CASCADING/wordcount/build/wordcount.jar

 

BUILD SUCCESSFUL

Total time: 0 seconds
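For reference, the sample's main class takes three path arguments; my reading of the source and sink taps in the log below is that they are the input text file, a working directory for the intermediate SequenceFile output, and a directory for the final TextLine export:

# hypothetical usage, inferred from the taps logged below
$ hadoop jar ./build/wordcount.jar <input-file> <output-dir> <export-dir>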

/wordcount$ hadoop jar ./build/wordcount.jar ./data/url+page.200.txt output local

11/08/12 16:16:44 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar

11/08/12 16:16:44 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar

11/08/12 16:16:44 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar

11/08/12 16:16:45 INFO flow.MultiMapReducePlanner: using application jar: /home/hchung/programming/CASCADING/wordcount/./build/wordcount.jar

11/08/12 16:16:45 INFO cascade.Cascade: Concurrent, Inc - Cascading 1.2.4 [hadoop-0.19.2+]

11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] starting

11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] parallel execution is enabled: true

11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] starting flows: 4

11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] allocating threads: 4

11/08/12 16:16:45 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: import pages

11/08/12 16:16:45 INFO flow.Flow: [import pages] atleast one sink does not exist

11/08/12 16:16:45 INFO util.Util: unable to find and remove client hdfs shutdown hook, received exception: java.lang.NoSuchFieldException

11/08/12 16:16:45 INFO flow.Flow: [import pages] starting

11/08/12 16:16:45 INFO flow.Flow: [import pages] source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["./data/url+page.200.txt"]"]

11/08/12 16:16:45 INFO flow.Flow: [import pages] sink: Hfs["SequenceFile[['url', 'page']]"]["output/pages/"]"]

11/08/12 16:16:45 INFO tap.Hfs: forcing job to local mode, via source: Lfs["TextLine[['offset', 'line']->[ALL]]"]["./data/url+page.200.txt"]"]

11/08/12 16:16:45 INFO flow.Flow: [import pages] parallel execution is enabled: true

11/08/12 16:16:45 INFO flow.Flow: [import pages] starting jobs: 1

11/08/12 16:16:45 INFO flow.Flow: [import pages] allocating threads: 1

11/08/12 16:16:45 INFO flow.FlowStep: [import pages] starting step: (1/1) Hfs["SequenceFile[['url', 'page']]"]["output/pages/"]"]

11/08/12 16:16:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

11/08/12 16:16:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library

11/08/12 16:16:45 WARN flow.Flow: stopping jobs

11/08/12 16:16:45 INFO flow.FlowStep: [import pages] stopping: (1/1) Hfs["SequenceFile[['url', 'page']]"]["output/pages/"]"]

11/08/12 16:16:45 WARN flow.Flow: stopped jobs

11/08/12 16:16:45 WARN flow.Flow: shutting down job executor

11/08/12 16:16:45 WARN flow.Flow: shutdown complete

11/08/12 16:16:45 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: import pages

cascading.flow.FlowException: unhandled exception

at cascading.flow.Flow.complete(Flow.java:821)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

Caused by: ENOENT: No such file or directory

at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)

at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)

at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)

at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)

at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)

at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)

at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)

... 5 more

11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export word

11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] starting flow: export url

11/08/12 16:16:46 INFO flow.Flow: [export word] atleast one sink does not exist

11/08/12 16:16:46 INFO flow.Flow: [export url] atleast one sink does not exist

11/08/12 16:16:46 INFO util.Util: unable to find and remove client hdfs shutdown hook, received exception: java.lang.NoSuchFieldException

11/08/12 16:16:46 INFO flow.Flow: [export word] starting

11/08/12 16:16:46 INFO flow.Flow: [export word] source: Hfs["SequenceFile[['word', 'count']]"]["output/words/"]"]

11/08/12 16:16:46 INFO flow.Flow: [export word] sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]

11/08/12 16:16:46 INFO tap.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]

11/08/12 16:16:46 INFO flow.Flow: [export word] parallel execution is enabled: true

11/08/12 16:16:46 INFO flow.Flow: [export word] starting jobs: 1

11/08/12 16:16:46 INFO flow.Flow: [export word] allocating threads: 1

11/08/12 16:16:46 INFO flow.FlowStep: [export word] starting step: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]

11/08/12 16:16:46 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

11/08/12 16:16:46 INFO util.Util: unable to find and remove client hdfs shutdown hook, received exception: java.lang.NoSuchFieldException

11/08/12 16:16:46 INFO flow.Flow: [export url] starting

11/08/12 16:16:46 INFO flow.Flow: [export url] source: Hfs["SequenceFile[['url', 'word', 'count']]"]["output/urls/"]"]

11/08/12 16:16:46 INFO flow.Flow: [export url] sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]

11/08/12 16:16:46 INFO tap.Hfs: forcing job to local mode, via sink: Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]

11/08/12 16:16:46 INFO flow.Flow: [export url] parallel execution is enabled: true

11/08/12 16:16:46 INFO flow.Flow: [export url] starting jobs: 1

11/08/12 16:16:46 INFO flow.Flow: [export url] allocating threads: 1

11/08/12 16:16:46 INFO flow.FlowStep: [export url] starting step: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]

11/08/12 16:16:46 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized

11/08/12 16:16:46 WARN flow.Flow: stopping jobs

11/08/12 16:16:46 INFO flow.FlowStep: [export word] stopping: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/words/"]"]

11/08/12 16:16:46 WARN flow.Flow: stopping jobs

11/08/12 16:16:46 WARN flow.Flow: stopped jobs

11/08/12 16:16:46 INFO flow.FlowStep: [export url] stopping: (1/1) Lfs["TextLine[['offset', 'line']->[ALL]]"]["local/urls/"]"]

11/08/12 16:16:46 WARN flow.Flow: shutting down job executor

11/08/12 16:16:46 WARN flow.Flow: stopped jobs

11/08/12 16:16:46 WARN flow.Flow: shutdown complete

11/08/12 16:16:46 WARN flow.Flow: shutting down job executor

11/08/12 16:16:46 WARN flow.Flow: shutdown complete

11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export word

cascading.flow.FlowException: unhandled exception

at cascading.flow.Flow.complete(Flow.java:821)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

Caused by: ENOENT: No such file or directory

at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)

at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)

at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)

at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)

at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)

at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)

at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)

... 5 more

11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] flow failed: export url

cascading.flow.FlowException: unhandled exception

at cascading.flow.Flow.complete(Flow.java:821)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

Caused by: ENOENT: No such file or directory

at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)

at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)

at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)

at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)

at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)

at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)

at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)

... 5 more

11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] stopping flows

11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export url

11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: export word

11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: url pipe+word pipe

11/08/12 16:16:46 INFO cascade.Cascade: [import pages+url pipe+...] stopping flow: import pages

11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] stopped flows

11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] shutting down flow executor

11/08/12 16:16:46 WARN cascade.Cascade: [import pages+url pipe+...] shutdown complete

Exception in thread "main" cascading.cascade.CascadeException: flow failed: import pages

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:714)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:653)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

Caused by: cascading.flow.FlowException: unhandled exception

at cascading.flow.Flow.complete(Flow.java:821)

at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:705)

... 6 more

Caused by: ENOENT: No such file or directory

at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)

at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319)

at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)

at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)

at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)

at cascading.flow.FlowStepJob.blockOnJob(FlowStepJob.java:164)

at cascading.flow.FlowStepJob.start(FlowStepJob.java:140)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:129)

at cascading.flow.FlowStepJob.call(FlowStepJob.java:39)

... 5 more

 

The problem seems to be permission-related (the job client fails with ENOENT while trying to chmod its local staging directory), but I can't tell exactly what is wrong because of the lack of documentation.
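For what it's worth, the stack trace shows JobClient failing while creating its job staging directory on the local filesystem (RawLocalFileSystem.mkdirs calls NativeIO.chmod, which returns ENOENT), which usually means some parent of hadoop.tmp.dir either does not exist or is not writable by the current user. A possible workaround sketch, assuming CDH-style defaults on Ubuntu; the paths below are guesses, so adjust them to your install:

# create the per-user cache directory Hadoop stages jobs under and
# hand it to the current user (path is an assumption for CDH3 on Ubuntu)
$ sudo mkdir -p /var/lib/hadoop-0.20/cache/$USER
$ sudo chown -R $USER:$USER /var/lib/hadoop-0.20/cache/$USER

# alternatively, override hadoop.tmp.dir in /etc/hadoop/conf/core-site.xml
# to point at a directory the current user can write, e.g. /tmp/hadoop-$USER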
