Tuesday, November 17, 2015

load csv data to hive partitioned tables

1.Create an External partitioned Table to some location

CREATE EXTERNAL TABLE `csvtable`( 
`col1` string COMMENT , 
`col2` string COMMENT , 
`col3` string COMMENT , 
`col4` string COMMENT ) 
PARTITIONED BY ( 
`year` string) 
ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
STORED AS INPUTFORMAT 
'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
LOCATION 
'/mycsvtable' 

2.create a directory under it with the name of your partition
hadoop fs -mkdir /mycsvtable/year=2001

3.copy some csv files to the directory
hadoop fs -put sample.csv /mycsvtable/year=2001

4. hive still does not know about this partition as metastore has not been updated yet, if you try to query
csvtable it will fetch you no result.to resolve this issue alter the table so that metastore updated accordingly
alter table csvtable add partition (year='2001')

Now you are good to Go.

No comments: