read json from pig


halatha 2013. 9. 13. 14:15

$ hadoop version

$ pig -version

$ hadoop fs -cat basic.json

{"a":0,"b":{"b1":1,"b2":"test"},"c":2}

grunt> A = LOAD '/user/nlp/basic.json' USING JsonLoader('a:int,b:map[],c:int');

grunt> B = FOREACH A GENERATE (int) a, b#'b1', b#'b2', (int) c;

grunt> DUMP B;

(0,1,test,2)

grunt>
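The Python sketch below makes the expected result concrete. It is not Pig and not part of the original post. It applies the same projection to the sample record. `b` is loaded as a map, so `b#'b1'` becomes a dictionary lookup. A missing key yields None, which is roughly what Pig's null is. The function name `project` is hypothetical.

```python
import json

# The single record shown by `hadoop fs -cat basic.json` above.
LINE = '{"a":0,"b":{"b1":1,"b2":"test"},"c":2}'


def project(line):
    """Mimic: FOREACH A GENERATE (int) a, b#'b1', b#'b2', (int) c;"""
    rec = json.loads(line)
    b = rec["b"]  # the map[] field; b#'key' is a plain key lookup
    # .get() returns None for a missing key, similar to Pig returning null
    return (int(rec["a"]), b.get("b1"), b.get("b2"), int(rec["c"]))


if __name__ == "__main__":
    print(project(LINE))  # (0, 1, 'test', 2)
```

Pig's DUMP prints the same values as (0,1,test,2).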

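Simple JSON works fine. Some values are not found correctly, though, when the JSON is nested several levels deep or the data is messy (for example, strings containing assorted whitespace).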

grunt> fs -cat basic.json

{"a":0,"b":{"b1":1,"b2":{"b21":11,"b22":"inner"},"b3":"test"},"c":2}

grunt> A = LOAD '/user/nlp/basic.json' USING JsonLoader('a:int,b:map[],c:int');

grunt> B = FOREACH A GENERATE (int) a, (int) b#'b1', (int) c;

grunt> DUMP B;

error (the DUMP fails on the nested record)
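One possible workaround is to flatten nested objects before loading, so that every value sits at the top level. The original post does not try this, and it is only an assumption here. The Python sketch below turns the nested sample record into a one-level object. The function `flatten` is hypothetical. It uses an underscore as the separator because Pig aliases are limited to letters, digits and underscores.

```python
import json

# The nested record from `fs -cat basic.json` above.
NESTED = '{"a":0,"b":{"b1":1,"b2":{"b21":11,"b22":"inner"},"b3":"test"},"c":2}'


def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts into one level: {"b": {"b1": 1}} -> {"b_b1": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into inner objects, extending the key prefix
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    print(json.dumps(flatten(json.loads(NESTED))))
    # {"a": 0, "b_b1": 1, "b_b2_b21": 11, "b_b2_b22": "inner", "b_b3": "test", "c": 2}
```

In principle, the flattened lines could then be loaded with a flat schema such as 'a:int,b_b1:int,b_b2_b21:int,b_b2_b22:chararray,b_b3:chararray,c:int'. This has not been checked against the Pig version used in the post.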

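* Pros (+)
  - Files compressed with .gz are read the same way: LOAD ... USING JsonLoader(...);
  - For simple JSON, processing is fast even on large volumes of data.

* Cons (-)
  - Nested JSON is not handled well.
  - If an exception occurs, the whole job simply fails.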
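A single bad record fails the whole job. One hedged option, which is not from the original post, is to pre-check the input outside Pig and set malformed lines aside before loading. The Python sketch below reads a plain or .gz JSON-lines file and splits parsed records from lines that fail to parse. The names `open_text` and `split_valid` are hypothetical, and Hadoop is not involved here.

```python
import gzip
import json
import os
import tempfile


def open_text(path):
    """Open a plain or .gz file as UTF-8 text (choice based on the file extension)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def split_valid(lines):
    """Return (good, bad): parsed JSON objects, and (line_no, raw) for unparsable lines."""
    good, bad = [], []
    for no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            bad.append((no, raw))
            continue
        # Only JSON objects map onto a JsonLoader row; anything else is set aside
        if isinstance(rec, dict):
            good.append(rec)
        else:
            bad.append((no, raw))
    return good, bad


if __name__ == "__main__":
    sample = ['{"a":0,"b":{"b1":1,"b2":"test"},"c":2}', '{"a":1, broken', ""]
    path = os.path.join(tempfile.mkdtemp(), "basic.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("\n".join(sample) + "\n")
    with open_text(path) as f:
        good, bad = split_valid(f)
    print(len(good), bad)  # 1 [(2, '{"a":1, broken')]
```

Only the good records would then be written back for Pig to load. The bad ones can be inspected separately instead of killing the job.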
