List: Programming/Hadoop (12)
1. Set up ssh passwordless login and /etc/hosts; configure /etc/hosts.allow if needed.
2. Prepare the repository file (it differs between rhel5 and rhel6) and copy it to /etc/yum.repos.d. External access is required, so add a proxy to /etc/yum.conf.
3. Remove any existing postgresql: yum erase postgresql -y
 * If the old version is not removed, leftover state appears to cause conflicts on reinstall.
 * 8.1 conflicts with cloudera-manager.
 rhel5: yum install postgresql84 -y / rhel6: yum install postgresql -y
4. chmod u+x cloudera-manager-installer.bin
 ./cloudera..
http://impactcore.blogspot.kr/2011/03/can-not-remove-package-with-yum.html
Symptom: while installing the Cloudera packages, cloudera-scm-agent did not install properly, and the cloudera-manager installation eventually failed; the error message said /etc/cloudera-scm-agent/config.ini does not exist.
Cause: presumably the yum state got corrupted while installing and removing the Cloudera packages several times. After dozens of reinstall attempts went nowhere, I noticed that cloudera-manager-agent and cloudera-manager-daemons had not been removed cleanly.
# yum erase clouder..
https://github.com/alanfgates/programmingpig/blob/master/udfs/java/com/acme/math/Pow.java

$ hadoop version
Hadoop 2.0.0-cdh4.3.0
Subversion file:///data/1/jenkins/workspace/generic-package-centos64-5-5/topdir/BUILD/hadoop-2.0.0-cdh4.3.0/src/hadoop-common-project/hadoop-common -r 48a9315b342ca16de92fcc5be95ae3650629155a
Compiled by jenkins on Mon May 27 19:45:28 PDT 2013
From source with checksum..
2013/09/13 - [Programming/Hadoop] - read json from pig
http://pig.apache.org/docs/r0.8.1/udf.html#Load+Functions
http://gethue.tumblr.com/post/60376973455/hadoop-tutorials-ii-1-prepare-the-data-for-analysis
http://opensource.xhaus.com/projects/jyson/wiki/JysonFaq
https://github.com/romainr/yelp-data-analysis
http://stackoverflow.com/questions/16705259/parsing-text-file-of-one-line-json-objects-u..
When writing a file to HDFS in fixed-size chunks, use FSDataOutputStream (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample); FileSystem#append is said to still be unstable.
Even with the Xmx option it only runs a little longer before failing: http://stackoverflow.com/questions/15609909/error-java-heap-space
I collected Strings with StringBuffer's append, wrote them to the file once a certain size was exceeded, and reused the existing object, but StringBuffer threw a heap OutOfMemory; sb.length(0)..
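The buffer-and-flush pattern described above can be sketched in plain Python. This is an illustrative sketch only: a local text stream stands in for the HDFS FSDataOutputStream, and all names and the 64 KB threshold are made up here, not taken from the original post.

```python
import io

FLUSH_THRESHOLD = 64 * 1024  # flush once the buffer holds ~64 KB (illustrative value)

def write_in_chunks(lines, out):
    """Accumulate strings and write them out in large batches.

    `out` is any writable text stream; in the post this role is played
    by an HDFS FSDataOutputStream instead of a local file object.
    """
    buf = []        # buffered strings, joined once per flush
    buf_size = 0    # running total of buffered characters
    for line in lines:
        buf.append(line)
        buf_size += len(line)
        if buf_size >= FLUSH_THRESHOLD:
            out.write("".join(buf))
            buf.clear()   # reuse the same list instead of allocating a new one
            buf_size = 0
    if buf:               # flush whatever remains at the end
        out.write("".join(buf))

out = io.StringIO()
write_in_chunks((f"record {i}\n" for i in range(10000)), out)
```

The key point mirrors the post: the buffer is emptied and reused after every flush, so memory stays bounded by the threshold rather than growing with the input.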
$ hadoop version
Hadoop 2.0.0-cdh4.3.0
Subversion file:///data/1/jenkins/workspace/generic-package-centos64-5-5/topdir/BUILD/hadoop-2.0.0-cdh4.3.0/src/hadoop-common-project/hadoop-common -r 48a9315b342ca16de92fcc5be95ae3650629155a
Compiled by jenkins on Mon May 27 19:45:28 PDT 2013
From source with checksum a4218d77f9b12df4e3e49ef96f9d357d
This command was run using /usr/lib/hadoop/hadoop-common-2.0...
# sudo -u hdfs hdfs fsck /...
/user/oozie/share/lib/sqoop/libthrift-0.9.0.jar: MISSING 1 blocks of total size 347531 B.
/user/oozie/share/lib/sqoop/metrics-core-2.1.2.jar: CORRUPT blockpool BP-766882569-10.15.86.206-1376438928219 block blk_-3018521587264545106
/user/oozie/share/lib/sqoop/metrics-core-2.1.2.jar: MISSING 1 blocks of total size 82445 B.
/user/oozie/share/lib/sqoop/oozie-sharelib-sq..
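When an fsck run flags many files, it can help to reduce the output to the distinct paths affected. The sample text below is abridged from the fsck output above; the parsing script itself is an illustrative sketch, not part of the original post.

```python
# Abridged sample of `hdfs fsck` output, taken from the run shown above.
fsck_output = """\
/user/oozie/share/lib/sqoop/libthrift-0.9.0.jar: MISSING 1 blocks of total size 347531 B.
/user/oozie/share/lib/sqoop/metrics-core-2.1.2.jar: CORRUPT blockpool BP-766882569-10.15.86.206-1376438928219 block blk_-3018521587264545106
/user/oozie/share/lib/sqoop/metrics-core-2.1.2.jar: MISSING 1 blocks of total size 82445 B.
"""

def broken_files(output):
    """Return the sorted set of paths fsck flagged as CORRUPT or MISSING."""
    paths = set()
    for line in output.splitlines():
        path, sep, rest = line.partition(": ")
        if sep and rest.split()[0] in ("CORRUPT", "MISSING"):
            paths.add(path)
    return sorted(paths)

print(broken_files(fsck_output))
```

Note that one file can appear under both CORRUPT and MISSING, so deduplicating via a set gives the actual list of jars to restore.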
Since the default replication factor is 3, usage differs by a factor of 3: even when hadoop fs -df shows plenty of free space, df on the local disks may show otherwise. To check the replication factor, use the hdfs fsck command:
# sudo -u hdfs hdfs fsck /...
................................................................................
Status: HEALTHY
 Total size: 1151862045171 B
 Total dirs: 148
 Total files: 16312
 Total blocks (validated): 1742..
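The 3x gap is just arithmetic: the logical size HDFS reports, times the replication factor, is what the local disks actually pay for. A small check using the Total size from the fsck output above (the factor of 3 is the HDFS default):

```python
REPLICATION_FACTOR = 3          # HDFS default replication

logical_bytes = 1151862045171   # "Total size" reported by hdfs fsck above
raw_bytes = logical_bytes * REPLICATION_FACTOR  # bytes consumed across local disks

print(f"logical: {logical_bytes / 1e12:.2f} TB")  # ~1.15 TB seen by hadoop fs -df
print(f"raw:     {raw_bytes / 1e12:.2f} TB")      # ~3.46 TB seen by local df
```

So a cluster that looks only one-third full from the HDFS side can already be pressing against local disk capacity.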
Job: read hdfs files line by line (= json string) -> extract some key-values. Pig + Python.
Error message:
which: no hbase in (:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin)
2013-09-06 10:26:43,316 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:40:51
2013-09-06 10:26:43,316 [main] INFO org.apache.pig.Main - Logging error messages to: /data1/han..
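The core of that job (read a file of one-line JSON strings, pull out a few key-values) can be sketched in plain Python with the standard json module. The field names and sample records below are invented for illustration; the real input format is not shown in the excerpt.

```python
import json

# One JSON object per line, as in the HDFS files described above.
# These records and field names are hypothetical examples.
lines = [
    '{"user": "alice", "action": "view", "ts": 1378972003}',
    '{"user": "bob", "action": "click", "ts": 1378972010}',
]

def extract(lines, keys=("user", "action")):
    """Parse each one-line JSON record and keep only the wanted keys."""
    for line in lines:
        record = json.loads(line)
        yield {k: record[k] for k in keys}

rows = list(extract(lines))
print(rows)
```

In the actual job this per-line logic would run inside a Pig Python UDF rather than a standalone script, but the extraction step is the same.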
Without time synchronization, the SCM console cannot start properly. To keep clocks in sync I installed ntpdate as described in: 2011/10/07 - [Programming] - CentOS: time syncronization in local network / setting timezone
However, the time sync broke again after two months, so to prevent this I added a crontab entry on every server:
0 0 * * * /sbin/ntpdate [ntp server] > /dev/null 2>&1