앞서 만든 deserialize 메소드를 테스트 해보겠습니다.
1. 샘플데이터 만들기
샘플데이터는 Mysql의 sakila를 활용해서 만들어 보도록 하겠습니다.
skila download http://dev.mysql.com/doc/index-other.html
SELECT concat(X.customer_id, '|', X.first_name, '|', X.last_name, '|', x.rental, '|', Y.rental_sum, '|', Z.address) AS DAT from ( select A.customer_id, A.first_name, A.last_name, A.address_id, group_concat(D.title) as rental from customer A, rental B, inventory C, film D where A.customer_id = B.customer_id and B.inventory_id = C.inventory_id AND C.film_id = D.film_id group by A.customer_id, A.first_name, A.last_name, A.address_id ) X, ( select X.customer_id, group_concat(MON) as rental_sum from ( select customer_id, concat(DATE_FORMAT(payment_date, '%m'), ':', sum(amount)) AS MON from payment group by customer_id, DATE_FORMAT(payment_date, '%m') ) X group by X.customer_id ) Y, ( select address_id, concat(address, ',', district, ',', postal_code) AS address from address ) Z where X.customer_id = Y.customer_id AND X.address_id = Z.address_id ;
위 쿼리를 실행하여 샘플 데이터를 생성하여 파일로 저장 후 HDFS로 업로드 합니다.
2. Hive 테이블을 생성
CREATE EXTERNAL TABLE sakila( customer_id int, first_name string, last_name string, rental array<string>, rental_sum map<string,float>, address struct<address:string,district:string,postal_code:int>') ROW FORMAT SERDE 'org.serde.CustomSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION ‘/sakila'
3. Hive Table에 데이터 넣기
LOAD DATA INPATH '/sample.dat' OVERWRITE INTO TABLE sakila;
4. 데이터 확인해보기
select * from sakila;
'Hive' 카테고리의 다른 글
Hive Serde 만들기 2 (0) | 2014.03.21 |
---|---|
Hive Serde 만들기 1 (0) | 2014.03.21 |