Friday, November 20, 2015

Apache Drill : Saving query output to csv, parquet and json output

Quick way to save query result in different format (json,parquet,csv) depending upon the needs
0: jdbc:drill:zk=> alter session set `store.format`='csv';
+-------+------------------------+
|  ok   |        summary         |
+-------+------------------------+
| true  | store.format updated.  |
+-------+------------------------+
1 row selected (0.13 seconds)
0: jdbc:drill:zk=> create table dfs.tmp.`emp_csv` as select * from cp.`default`.`employee.json`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1155                       |
+-----------+----------------------------+
1 row selected (0.444 seconds)
0: jdbc:drill:zk=> alter session set `store.format`='parquet';
+-------+------------------------+
|  ok   |        summary         |
+-------+------------------------+
| true  | store.format updated.  |
+-------+------------------------+
1 row selected (0.133 seconds)
0: jdbc:drill:zk=> create table dfs.tmp.`emp_parquet` as select * from cp.`default`.`employee.json`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1155                       |
+-----------+----------------------------+
1 row selected (0.826 seconds)
0: jdbc:drill:zk=> alter session set `store.format`='json';
+-------+------------------------+
|  ok   |        summary         |
+-------+------------------------+
| true  | store.format updated.  |
+-------+------------------------+
1 row selected (0.116 seconds)
0: jdbc:drill:zk=> create table dfs.tmp.`emp_json` as select * from cp.`default`.`employee.json`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1155                       |
+-----------+----------------------------+
1 row selected (0.322 seconds)
0: jdbc:drill:zk=> 

//output
root@RKS ~]# hadoop fs -ls /tmp/emp*
Found 1 items
-rwxr-xr-x   3 mapr mapr     175674 2015-11-21 01:58 /tmp/emp_csv/0_0_0.csv
Found 1 items
-rwxr-xr-x   3 mapr mapr     563564 2015-11-21 02:02 /tmp/emp_json/0_0_0.json
Found 1 items
-rwxr-xr-x   3 mapr mapr      52179 2015-11-21 02:01 /tmp/emp_parquet/0_0_0.parquet

1 comment:

azri said...

hello, thank you for this example its helpful for me.
i wanna now check the location of my genrated file.
i cant find it.

any help please