본문 바로가기

Hive

Hive Serde 만들기 3

앞서 만든 deserialize 메소드를 테스트 해보겠습니다.

1. 샘플데이터 만들기

샘플데이터는 Mysql의 sakila를 활용해서 만들어 보도록 하겠습니다.

skila download http://dev.mysql.com/doc/index-other.html

SELECT concat(X.customer_id, '|', X.first_name, '|', X.last_name, '|', x.rental, '|', Y.rental_sum, '|', Z.address) AS DAT from ( select A.customer_id, A.first_name, A.last_name, A.address_id, group_concat(D.title) as rental from customer A, rental B, inventory C, film D where A.customer_id = B.customer_id and B.inventory_id = C.inventory_id AND C.film_id = D.film_id group by A.customer_id, A.first_name, A.last_name, A.address_id ) X, ( select X.customer_id, group_concat(MON) as rental_sum from ( select customer_id, concat(DATE_FORMAT(payment_date, '%m'), ':', sum(amount)) AS MON from payment group by customer_id, DATE_FORMAT(payment_date, '%m') ) X group by X.customer_id ) Y, ( select address_id, concat(address, ',', district, ',', postal_code) AS address from address ) Z where X.customer_id = Y.customer_id AND X.address_id = Z.address_id ;

위 쿼리를 실행하여 샘플 데이터를 생성하여 파일로 저장 후 HDFS로 업로드 합니다.

2. Hive 테이블을 생성

CREATE EXTERNAL TABLE sakila(
  customer_id int,
  first_name string,
  last_name string,
  rental array<string>,
  rental_sum map<string,float>,
  address struct<address:string,district:string,postal_code:int>')
ROW FORMAT SERDE
  'org.serde.CustomSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  ‘/sakila'


3. Hive Table에 데이터 넣기

LOAD DATA INPATH '/sample.dat' OVERWRITE INTO TABLE sakila;


4. 데이터 확인해보기

select * from sakila;


'Hive' 카테고리의 다른 글

Hive Serde 만들기 2  (0) 2014.03.21
Hive Serde 만들기 1  (0) 2014.03.21