Saturday, January 24, 2015

Apache Hive : Indexes

Indexes are special lookup structures that the database search engine can use to speed up data retrieval: an index is a pointer to data in a table, and indexes are implemented using various data structures, e.g. B-Tree or bitmap. Hive has limited indexing capabilities, but you can speed up some data-retrieval operations using indexes; the index data itself is stored in another table in Hive.

Other ways to speed up your queries besides indexing are partitioning and bucketing, which we will cover in coming posts.
Indexing is also a good alternative to partitioning when the logical partitions would be too numerous and small to be useful. Indexes in Hive, like those in relational databases, need to be evaluated carefully.

Maintaining an index requires extra disk space and building an index has a processing cost.

Creating an Index
hive> create index customer_index on table customers(custid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with deferred rebuild
> in table customer_index_table
> COMMENT 'Customer Index on custid';
OK
Time taken: 5.713 seconds

Creating a Bitmap Index
hive> create index customer_index_bitmap on table customers(custid)
> as 'BITMAP'
> with deferred rebuild
> COMMENT 'Index on customer table on custid using bitmap';
OK
Time taken: 0.603 seconds

View an Index
hive> show formatted index on customers;                       
OK
idx_name tab_name col_names idx_tab_name idx_type comment


customer_index customers custid customer_index_table compact Customer Index on custid
customer_index_bitmap customers custid orderdb__customers_customer_index_bitmap__ bitmap Index on customer table on custid using bitmap
Time taken: 0.988 seconds, Fetched: 5 row(s)

Rebuild an Index
hive> alter index customer_index on customers rebuild;      
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201501241609_0022, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0022
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job -kill job_201501241609_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-01-24 18:32:39,926 Stage-1 map = 0%, reduce = 0%
2015-01-24 18:32:41,975 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:42,978 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:43,982 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:45,011 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:46,071 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:47,078 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:48,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:49,159 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.81 sec
2015-01-24 18:32:50,245 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 0.81 sec
2015-01-24 18:32:51,305 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.24 sec
2015-01-24 18:32:52,309 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.24 sec
2015-01-24 18:32:53,385 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.24 sec
MapReduce Total cumulative CPU time: 2 seconds 240 msec
Ended Job = job_201501241609_0022
Loading data to table orderdb.customer_index_table
Deleted hdfs://RKS-PC:54310/user/hive/warehouse/orderdb.db/customer_index_table
Table orderdb.customer_index_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 8567, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.24 sec HDFS Read: 8569 HDFS Write: 8567 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 240 msec
OK
Time taken: 42.379 seconds

Drop an Index
hive> drop index if exists customer_index on customers;
OK
Time taken: 3.267 seconds




Apache Hive : HiveQL View

A view is a virtual table based on the result set of a SQL statement. A view allows a query to be saved and treated like a table.

A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE clauses, and JOINs to a view and present the data as if it were coming from one single table. Currently Hive does not support materialized views.

Let's create a view on the join of the customers and orders tables.

hive> create view customer_order as select c.custid,c.custname,o.orderid,o.tracking_id from customers c inner join orders o on c.custid=o.fk_cust_id;
OK
Time taken: 1.352 seconds
hive> select * from customer_order;
Total MapReduce jobs = 1
setting HADOOP_USER_NAME rks
Execution log at: /tmp/rks/.log
2015-01-24 05:40:55 Starting to launch local task to process map join; maximum memory = 932118528
2015-01-24 05:40:56 Processing rows: 101 Hashtable size: 101 Memory usage: 7985904 rate: 0.009
2015-01-24 05:40:56 Dump the hashtable into file: file:/tmp/rks/hive_2015-01-24_17-40-47_808_6049446203223532344/-local-10002/HashTable-Stage-3/MapJoin-mapfile31--.hashtable
2015-01-24 05:40:56 Upload 1 File to: file:/tmp/rks/hive_2015-01-24_17-40-47_808_6049446203223532344/-local-10002/HashTable-Stage-3/MapJoin-mapfile31--.hashtable File size: 6249
2015-01-24 05:40:56 End of local task; Time Taken: 0.895 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501241609_0021, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0021
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job -kill job_201501241609_0021
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-01-24 17:41:02,863 Stage-3 map = 0%, reduce = 0%
2015-01-24 17:41:05,872 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2015-01-24 17:41:06,875 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
2015-01-24 17:41:07,883 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 0.94 sec
MapReduce Total cumulative CPU time: 940 msec
Ended Job = job_201501241609_0021
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.94 sec HDFS Read: 8569 HDFS Write: 425 SUCCESS
Total MapReduce CPU Time Spent: 940 msec
OK
1009 Declan Hooper 745651 ULQ37MGX7MW
1018 Nathan Mcintyre 745652 NJJ84QEM7GO
1027 Troy Griffith 745653 UHX76SFB1EP
1036 Clark Frazier 745654 ZZH74UDJ6IC
1045 Tad Cross 745655 TFV46VBX1ZI
1054 Gannon Bradshaw 745656 EHJ68BHA6UU
1063 Walter Figueroa 745657 BNU38NFJ6FO
1072 Brady Mcclure 745658 NBK17XMP9XC
1081 Porter Bowers 745659 XHB61DLY6IK
1090 Jakeem Knight 745660 WNN67FXM2NC
1099 Macaulay Armstrong 745661 VXI39DIZ3HU
Time taken: 20.208 seconds, Fetched: 11 row(s)
hive>


Apache Hive : Joining datasets

Hive supports SQL joins, but only equi-joins. A SQL join is used to combine two or more tables based on some criteria; the most commonly used join is the inner join, which returns all the rows from two datasets where the join condition is met.
Let's see in an example how the inner join works in Hive. To demonstrate it we have two datasets; the first one is the customers dataset, which holds the information about the customers.
customers.csv
cust_id,cust_name,ship_address,phone,email
1001,Sawyer Thompson,"-16.44456  115.92975",1-808-310-6814,faucibus@lacinia.net
1002,Xenos Campos,"5.69702  -164.57551",1-872-151-8210,dolor.Fusce@Nunc.com
1003,Brandon Castro,"-25.12774  -151.86179",1-283-827-7635,parturient@aliquameu.org
1004,Evan Gordon,"-20.12089  -85.3661",1-801-885-3833,Fusce.fermentum@Integereu.ca
1005,Macon Hopper,"22.30371  134.10815",1-354-448-6576,est.congue@acturpisegestas.net
1006,Christian Tucker,"73.86819  114.71156",1-106-627-3799,at.egestas.a@Fuscealiquam.net
The other dataset is the orders dataset, which holds the information about the orders placed by the customers.
orders.csv
orderid,cust_id,item,order_dt,track_id
745651,1009,Cephalexin,08/09/2013,ULQ37MGX7MW
745652,1018,Hydrochlorothiazide,01/01/2015,NJJ84QEM7GO
745653,1027,Sertraline HCl,07/13/2014,UHX76SFB1EP
745654,1036,Simvastatin,01/05/2014,ZZH74UDJ6IC
745655,1045,Lisinopril,04/22/2014,TFV46VBX1ZI
745656,1054,Ibuprofen (Rx),08/22/2015,EHJ68BHA6UU
745657,1063,Suboxone,12/10/2014,BNU38NFJ6FO

We have already created the two tables in Hive, named customers and orders, which hold the data for the customers and the orders.
customers table
hive> describe customers;
OK
custid              	int                 	customer id         
custname            	string              	customer name       
address             	string              	customer Address    
phone               	string              	customer phone      
email               	string              	customer email      
Time taken: 0.524 seconds, Fetched: 5 row(s)
orders table
hive> describe orders; 
OK
orderid             	int                 	Order ID            
fk_cust_id          	int                 	Cust ID reffering to customers
item                	string              	Order Item          
order_dt            	string              	Order Date          
tracking_id         	string              	Tracking ID for Order
Time taken: 0.732 seconds, Fetched: 5 row(s)

INNER JOIN
The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns in both tables.
hive> select c.custid,c.custname,o.orderid,o.tracking_id from customers c inner join orders o on c.custid=o.fk_cust_id
> ;
Total MapReduce jobs = 1
setting HADOOP_USER_NAME	rks
Execution log at: /tmp/rks/.log
2015-01-24 05:03:25	Starting to launch local task to process map join;	maximum memory = 932118528
2015-01-24 05:03:25	Processing rows:	101	Hashtable size:	101	Memory usage:	8029040	rate:	0.009
2015-01-24 05:03:25	Dump the hashtable into file: file:/tmp/rks/hive_2015-01-24_17-03-19_651_4336759746543942005/-local-10002/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2015-01-24 05:03:25	Upload 1 File to: file:/tmp/rks/hive_2015-01-24_17-03-19_651_4336759746543942005/-local-10002/HashTable-Stage-3/MapJoin-mapfile01--.hashtable File size: 6249
2015-01-24 05:03:25	End of local task; Time Taken: 0.751 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501241609_0017, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0017
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job  -kill job_201501241609_0017
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-01-24 17:03:31,889 Stage-3 map = 0%,  reduce = 0%
2015-01-24 17:03:34,902 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.92 sec
2015-01-24 17:03:35,906 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.92 sec
2015-01-24 17:03:36,916 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 0.92 sec
MapReduce Total cumulative CPU time: 920 msec
Ended Job = job_201501241609_0017
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.92 sec   HDFS Read: 8569 HDFS Write: 425 SUCCESS
Total MapReduce CPU Time Spent: 920 msec
OK
1009	Declan Hooper	745651	ULQ37MGX7MW
1018	Nathan Mcintyre	745652	NJJ84QEM7GO
1027	Troy Griffith	745653	UHX76SFB1EP
1036	Clark Frazier	745654	ZZH74UDJ6IC
1045	Tad Cross	745655	TFV46VBX1ZI
1054	Gannon Bradshaw	745656	EHJ68BHA6UU
1063	Walter Figueroa	745657	BNU38NFJ6FO
1072	Brady Mcclure	745658	NBK17XMP9XC
1081	Porter Bowers	745659	XHB61DLY6IK
1090	Jakeem Knight	745660	WNN67FXM2NC
1099	Macaulay Armstrong	745661	VXI39DIZ3HU
Time taken: 17.391 seconds, Fetched: 11 row(s)
hive>

LEFT OUTER JOIN
The LEFT OUTER JOIN keyword returns all rows from the left table, with the matching rows from the right table; the result is NULL on the right side when there is no match.
hive> select c.custid,c.custname,o.orderid,o.tracking_id from customers c left outer join orders o on c.custid=o.fk_cust_id;
Total MapReduce jobs = 1
setting HADOOP_USER_NAME	rks
Execution log at: /tmp/rks/.log
2015-01-24 05:08:40	Starting to launch local task to process map join;	maximum memory = 932118528
2015-01-24 05:08:41	Processing rows:	101	Hashtable size:	101	Memory usage:	8133752	rate:	0.009
2015-01-24 05:08:41	Dump the hashtable into file: file:/tmp/rks/hive_2015-01-24_17-08-34_361_1900203016678725125/-local-10002/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2015-01-24 05:08:41	Upload 1 File to: file:/tmp/rks/hive_2015-01-24_17-08-34_361_1900203016678725125/-local-10002/HashTable-Stage-3/MapJoin-mapfile11--.hashtable File size: 6249
2015-01-24 05:08:41	End of local task; Time Taken: 0.908 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501241609_0018, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0018
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job  -kill job_201501241609_0018
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-01-24 17:08:48,387 Stage-3 map = 0%,  reduce = 0%
2015-01-24 17:08:51,396 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.88 sec
2015-01-24 17:08:52,400 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.88 sec
2015-01-24 17:08:53,408 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 0.88 sec
MapReduce Total cumulative CPU time: 880 msec
Ended Job = job_201501241609_0018
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 8569 HDFS Write: 2629 SUCCESS
Total MapReduce CPU Time Spent: 880 msec
OK
NULL	cust_name	NULL	NULL
1001	Sawyer Thompson	NULL	NULL
1002	Xenos Campos	NULL	NULL
1003	Brandon Castro	NULL	NULL
1004	Evan Gordon	NULL	NULL
1005	Macon Hopper	NULL	NULL
1006	Christian Tucker	NULL	NULL
1007	Rafael Erickson	NULL	NULL
1008	Brent Roth	NULL	NULL
1009	Declan Hooper	745651	ULQ37MGX7MW
1010	Neil Leon	NULL	NULL

RIGHT OUTER JOIN
The RIGHT OUTER JOIN keyword returns all rows from the right table, with the matching rows from the left table; the result is NULL on the left side when there is no match.
hive> select c.custid,c.custname,o.orderid,o.tracking_id from customers c right outer join orders o on c.custid=o.fk_cust_id;
Total MapReduce jobs = 1
setting HADOOP_USER_NAME	rks
Execution log at: /tmp/rks/.log
2015-01-24 05:10:50	Starting to launch local task to process map join;	maximum memory = 932118528
2015-01-24 05:10:51	Processing rows:	101	Hashtable size:	101	Memory usage:	7971568	rate:	0.009
2015-01-24 05:10:51	Dump the hashtable into file: file:/tmp/rks/hive_2015-01-24_17-10-44_697_521683568687053567/-local-10002/HashTable-Stage-3/MapJoin-mapfile20--.hashtable
2015-01-24 05:10:51	Upload 1 File to: file:/tmp/rks/hive_2015-01-24_17-10-44_697_521683568687053567/-local-10002/HashTable-Stage-3/MapJoin-mapfile20--.hashtable File size: 6317
2015-01-24 05:10:51	End of local task; Time Taken: 0.712 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201501241609_0019, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0019
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job  -kill job_201501241609_0019
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-01-24 17:10:58,019 Stage-3 map = 0%,  reduce = 0%
2015-01-24 17:11:01,064 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.91 sec
2015-01-24 17:11:02,067 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.91 sec
2015-01-24 17:11:03,073 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 0.91 sec
MapReduce Total cumulative CPU time: 910 msec
Ended Job = job_201501241609_0019
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.91 sec   HDFS Read: 5205 HDFS Write: 2668 SUCCESS
Total MapReduce CPU Time Spent: 910 msec
OK
NULL	NULL	NULL	track_id
1009	Declan Hooper	745651	ULQ37MGX7MW
1018	Nathan Mcintyre	745652	NJJ84QEM7GO
1027	Troy Griffith	745653	UHX76SFB1EP
1036	Clark Frazier	745654	ZZH74UDJ6IC
1045	Tad Cross	745655	TFV46VBX1ZI
1054	Gannon Bradshaw	745656	EHJ68BHA6UU
1063	Walter Figueroa	745657	BNU38NFJ6FO
1072	Brady Mcclure	745658	NBK17XMP9XC
1081	Porter Bowers	745659	XHB61DLY6IK
1090	Jakeem Knight	745660	WNN67FXM2NC
1099	Macaulay Armstrong	745661	VXI39DIZ3HU
NULL	NULL	745662	DKP00ZCS6FU
NULL	NULL	745663	YSJ42ZXP5ZG
NULL	NULL	745664	OBT90SWM3FN
NULL	NULL	745665	YVJ22BYO5DT
NULL	NULL	745666	DXY85QAL1BE
NULL	NULL	745667	THJ12NCF3KR

FULL OUTER JOIN
The FULL OUTER JOIN returns all rows from both the left table and the right table; it combines the results of the LEFT and RIGHT outer joins.
hive> 
> select c.custid,c.custname,o.orderid,o.tracking_id from customers c full outer join orders o on c.custid=o.fk_cust_id; 
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201501241609_0020, Tracking URL = http://RKS-PC:50030/jobdetails.jsp?jobid=job_201501241609_0020
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job  -kill job_201501241609_0020
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2015-01-24 17:12:46,443 Stage-1 map = 0%,  reduce = 0%
2015-01-24 17:12:50,465 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.08 sec
2015-01-24 17:12:51,470 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.08 sec
2015-01-24 17:12:52,478 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
2015-01-24 17:12:53,488 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
2015-01-24 17:12:54,498 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
2015-01-24 17:12:55,504 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
2015-01-24 17:12:56,512 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
2015-01-24 17:12:57,521 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
2015-01-24 17:12:58,531 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 2.89 sec
2015-01-24 17:12:59,538 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 2.89 sec
2015-01-24 17:13:00,545 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.85 sec
2015-01-24 17:13:01,551 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.85 sec
2015-01-24 17:13:02,560 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.85 sec
MapReduce Total cumulative CPU time: 3 seconds 850 msec
Ended Job = job_201501241609_0020
MapReduce Jobs Launched: 
Job 0: Map: 2  Reduce: 1   Cumulative CPU: 3.85 sec   HDFS Read: 13774 HDFS Write: 4872 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 850 msec
OK
NULL	cust_name	NULL	NULL
NULL	NULL	NULL	track_id
1001	Sawyer Thompson	NULL	NULL
1002	Xenos Campos	NULL	NULL
1003	Brandon Castro	NULL	NULL
1004	Evan Gordon	NULL	NULL
1005	Macon Hopper	NULL	NULL
1006	Christian Tucker	NULL	NULL
1007	Rafael Erickson	NULL	NULL
1008	Brent Roth	NULL	NULL
1009	Declan Hooper	745651	ULQ37MGX7MW
1010	Neil Leon	NULL	NULL
1011	Lionel Vaughan	NULL	NULL
1012	Dillon Johns	NULL	NULL
1013	Davis Fisher	NULL	NULL
1014	Isaac Fields	NULL	NULL
1015	Micah Figueroa	NULL	NULL
1016	Burke Merrill	NULL	NULL
1017	Felix Ward	NULL	NULL
1018	Nathan Mcintyre	745652	NJJ84QEM7GO
1019	Perry Bullock	NULL	NULL
1020	Ali Kramer	NULL	NULL
1021	Timothy Avila	NULL	NULL
1022	Jason Wolfe	NULL	NULL

Monday, January 12, 2015

Apache Hive : HiveQL loading a data into the table and Query it

Hive has no row-level insert, update, or delete operations, so the only way to put data into a table is to use one of the "bulk" load operations, or to just write files into the correct directories by other means. In this example we will see how you can load data into a Hive table.

Create a managed table first, defining ROW FORMAT and FIELDS TERMINATED BY ',' so that the data is parsed correctly when loaded into the table; then load the data using LOAD DATA LOCAL INPATH.
Running a query with a WHERE predicate will launch a MapReduce job and give you the results.
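Putting those steps together, a minimal sketch (the table name, columns, and file path are illustrative, not from the original post):

```sql
-- parse each line of the input file as comma-separated fields
CREATE TABLE employees (
  empid  INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- bulk-load a local file; OVERWRITE replaces any existing data
LOAD DATA LOCAL INPATH '/tmp/employees.csv'
OVERWRITE INTO TABLE employees;

-- a WHERE predicate launches a MapReduce job to compute the result
SELECT name, salary FROM employees WHERE salary > 50000;
```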

Sunday, January 11, 2015

Apache Hive : Getting started with HiveQL

HiveQL is the Hive query language. It does not fully conform to ANSI SQL, unlike the query languages of most other databases. Hive does not support row-level insert, update, and delete, and does not support transactions. HiveQL supports creating and altering databases, and supports the DROP DDL too; CREATE, ALTER, and DROP DDL can also be applied to other Hive database objects, e.g. tables, views, indexes, and functions.
In this post we will run some DDL statements against a Hive database.

Create Database: (Click Image to Enlarge)
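A minimal sketch of the CREATE DATABASE DDL (the database name orderdb is illustrative, borrowed from the later examples; the comment text is my own):

```sql
CREATE DATABASE IF NOT EXISTS orderdb
COMMENT 'Database for customer and order data';

SHOW DATABASES;   -- verify it exists
USE orderdb;      -- make it the current database
```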



Managed Tables Vs External Tables

Hive controls the life cycle of a managed table's data; by default Hive stores a managed table's data under the directory /user/hive/warehouse. As soon as we drop a managed table, Hive deletes the data inside it; theoretically, Hive has ownership of the data in the case of a managed table.
Hive also gives the user the flexibility to define an external table that points to the data but does not take ownership of it; this is a handy way to share data among various tools for analytics. An external table is defined using the EXTERNAL keyword, with the LOCATION clause locating the table's data.

Create Table : Managed Table (Click Image to Enlarge)
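A minimal managed-table sketch (the table and column names are illustrative); note where the data lives and what DROP does:

```sql
-- managed table: Hive owns the data under /user/hive/warehouse/<db>.db/<table>
CREATE TABLE managed_orders (
  orderid INT,
  item    STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- dropping a managed table deletes its data files as well as the metadata
DROP TABLE managed_orders;
```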




Create Table : External Table (Click Image to Enlarge)
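A minimal external-table sketch (the table name and HDFS location are illustrative):

```sql
-- EXTERNAL + LOCATION: Hive reads the files but does not own them
CREATE EXTERNAL TABLE ext_orders (
  orderid    INT,
  fk_cust_id INT,
  item       STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/data/orders';

-- dropping it removes only the metadata; the files under /data/orders remain
DROP TABLE ext_orders;
```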



Create Table : Partitioned Table (Click Image to Enlarge)
To speed up queries Hive supports partitioning, in which a table is divided horizontally into multiple parts. Partitions are essentially horizontal slices of data which allow larger sets of data to be separated into more manageable chunks, so that a query whose predicate selects a partition needs to look only at that partition.
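A minimal partitioned-table sketch (the table and column names are illustrative):

```sql
-- each distinct order_year value becomes its own directory under the table path
CREATE TABLE orders_part (
  orderid INT,
  item    STRING
)
PARTITIONED BY (order_year STRING);

-- a predicate on the partition column lets Hive scan only that partition
SELECT orderid, item FROM orders_part WHERE order_year = '2014';
```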

Apache Hive : Configuring Hive MetaStore using JDBC

Hive requires a metastore that stores metadata (e.g. table schemas, partitions, SerDe information) so that users can run DDL and DML commands against Hive.
Hive also has an embedded metastore in the form of a Derby database, but it is not suitable for concurrent access or heavy usage.
Hive gives you the flexibility to configure any JDBC-compliant database as the metastore, such as MySQL or PostgreSQL. As MySQL is the most popular database used as the Hive metastore, in this post I will demonstrate how to configure MySQL as the Hive metastore.

To configure MySQL as a Hive metastore, first install and start MySQL server, here is my first step to install MySQL server on Ubuntu

sudo apt-get install mysql-server

Download the MySQL connector JAR and place it in $HIVE_HOME/lib.

now edit conf/hive-site.xml as follows

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hostname/hivedb?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>

If your metastore is running on a remote machine, add the following property to conf/hive-site.xml:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://remotehostname:9083</value>
</property>

Now you are ready to use the MySQL metastore. To start an external Hive metastore service, use the command:
hive --service metastore &


Saturday, January 10, 2015

Apache Hive : Installation and configuration of Hive


To install Hive you can follow my old post on building Hive from source if you want to do some customization of the source code. If you are new to Hive, then I recommend downloading a stable release of Hive from the Apache Hive website and extracting the tarball to your preferred location.

I downloaded the Hive 0.11 tarball from the Apache Hive website and extracted it into the folder /usr/local/hive/hive-0.11.0; that is all you need to install Hive on your machine. Make sure that Hadoop is already installed on your machine, as Hive uses the environment variable HADOOP_HOME to locate the Hadoop jars and configuration files.

$cd /usr/local
$mkdir hive
$cd hive
$curl -O http://archive.apache.org/dist/hive/hive-0.11.0/hive-0.11.0-bin.tar.gz
$tar -xzf hive-0.11.0-bin.tar.gz

Now set the environment variables HADOOP_HOME and HIVE_HOME: open .bashrc in an editor, e.g. vi, and add these lines to the file.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME=/usr/local/hive/hive-0.11.0
export PATH=$PATH:$HIVE_HOME/bin
The $HIVE_HOME/bin directory contains executable scripts that launch various Hive services, including the hive command-line interface.
The Hive binaries also contain the Thrift service, which provides access to other agents or processes; on top of Thrift, Hive provides access via JDBC and ODBC.
One of the most important pieces of a Hive installation is the metastore, which Hive uses to store table schemas and other metadata. Hive has a built-in Derby database to store metadata, and you are also free to configure an RDBMS of your choice to provide the metastore service; typical Hive installations use MySQL or PostgreSQL.

now you can check your installation by starting hive CLI.
$cd $HIVE_HOME
$hive
Logging initialized using configuration in jar:file:/usr/local/hive/hive-0.11.0/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/rks/hive_job_log_rks_2867@RKS-PC_201501101811_648770952.txt
hive> 






Apache Hive : Building Hive from source

It is fine to download an Apache Hive release from the Apache website and install it in your development environment, but sometimes it is good to experiment with the Apache Hive source and then install that build instead.

In this post I will help you build Hive from source. The minimum requirements for building Hive from source are an SVN installation on your machine, plus JDK 6 or 7 and an Apache Maven installation.

Just follow the instructions below to build Hive from source.

# checkout some stable version of hive from svn branch
$svn co http://svn.apache.org/repos/asf/hive/branches/branch-0.11 hive-11

# change location to directory hive-11
$cd hive-11

# now build the source with the help of maven command, I have skipped the testcases to run and enable the debug and error mode to check the problem in build if it occurs
$mvn clean install -Phadoop-2,dist -Dmaven.test.skip=true -e -X

# thats all what you need to build hive through source code, look into the package directory to find the build
$cd packaging/target/apache-hive-0.13.1-bin/apache-hive-0.13.1-bin
$ls
bin  conf  examples  hcatalog  lib  LICENSE  NOTICE  README.txt  RELEASE_NOTES.txt  scripts


Sunday, January 4, 2015

Bloom filter using Google Guava

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. It does not actually store the elements of the set; it is a probabilistic data structure which tells you that an element either definitely is not in the set or may be in the set.

The base data structure of a Bloom filter is a Bit Vector.
How a Bloom filter works
Adding an element to the Bloom filter:
hash the element several times with different hash functions, and in the bit vector set to 1 the bits at the indexes given by the hash results.

Testing whether an element is in the set:
apply the same hash functions to the element and check whether all of the corresponding bits are set to 1.
A Bloom filter can guarantee that an element does not exist: if the bits are not all set, it is simply impossible for the element to be in the set. However, a positive answer means the element either is in the set or a hash collision occurred.
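The mechanics above can be sketched with the JDK alone. This is an illustrative toy, not Guava's implementation; the class name, bit-vector size, and hash-mixing scheme are my own choices:

```java
import java.util.BitSet;

// Toy Bloom filter: a BitSet as the bit vector, k derived hash functions.
public class TinyBloomFilter {
    private final BitSet bits;
    private final int size;   // number of bits in the vector
    private final int hashes; // number of hash functions (k)

    public TinyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // derive the i-th hash from two base hashes (double hashing)
    private int index(String element, int i) {
        int h1 = element.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9e3779b9;
        return Math.floorMod(h1 + i * h2, size);
    }

    // set the k bits addressed by the element's hashes
    public void add(String element) {
        for (int i = 0; i < hashes; i++) bits.set(index(element, i));
    }

    // false => definitely not in the set; true => possibly in the set
    public boolean mightContain(String element) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(element, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1024, 3);
        filter.add("alice");
        System.out.println(filter.mightContain("alice")); // true: added elements always match
        System.out.println(filter.mightContain("zed"));   // usually false; true only on a collision
    }
}
```

An added element can never be reported absent; only the reverse error (a false positive) is possible.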
package com.rajkrrsingh.test.guava;

import java.nio.charset.StandardCharsets;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;
import com.google.common.hash.PrimitiveSink; // called Sink in very old Guava releases

/**
 * @author rks
 * @03-Jan-2015
 */
public class GuavaBloomFilterTest {

    // the Funnel decomposes an Employee into primitives; 100 is the expected number of insertions
    private static final BloomFilter<Employee> bloomFilter = BloomFilter.create(new Funnel<Employee>() {
        @Override
        public void funnel(Employee emp, PrimitiveSink into) {
            into.putString(emp.getEmpid(), StandardCharsets.UTF_8)
                .putString(emp.getEmpName(), StandardCharsets.UTF_8)
                .putInt(emp.getAge());
        }
    }, 100);

    public static void main(String[] args) {
        for (Employee e : Employee.getEmployeeList()) {
            bloomFilter.put(e);
        }

        // expected to have been added via getEmployeeList(), so mightContain should return true
        System.out.println(bloomFilter.mightContain(new Employee("101", "RKS", 10000, 31)));

        // negative test: an employee that was never added
        System.out.println(bloomFilter.mightContain(new Employee("102", "RKS", 10000, 31)));
    }
}

Google Guava Cache Implementation

Caches are very useful in a range of use cases: for frequently accessed data, you can serve it from a cache instead of repeating a compute-intensive operation, and that helps you achieve good performance.

You can build a cache using a simple HashMap, storing your data in key/value form; the putIfAbsent() method of HashMap is very handy for updating the map when a lookup for a key fails.

In a multithreaded environment a plain HashMap can give you dirty reads; there you need ConcurrentHashMap, which provides thread-safe access.

ConcurrentMap-based caching is good, but you need to implement your own eviction policy to remove elements which are not used frequently.
The Google Guava API provides more flexible caching based on ConcurrentMap, with a cleaner implementation, with the limitation that it does not store data in files or on an external server such as Memcached. Apart from eviction policies, it also supports the putIfAbsent scenario well.

In the coming example let's see how simple it is to create and use a Guava-based cache.
package com.rajkrrsingh.test.guava;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class GuavaCache {

    private static final long MAX_SIZE = 100;

    private final LoadingCache<String, String> cache;

    public GuavaCache() {
        // evict entries once the cache grows beyond MAX_SIZE
        cache = CacheBuilder.newBuilder()
                .maximumSize(MAX_SIZE)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) throws Exception {
                        // computed on a cache miss
                        return key.toUpperCase();
                    }
                });
    }

    public String getEntry(String key) {
        return cache.getUnchecked(key);
    }

    public static void main(String[] args) {
        GuavaCache gcache = new GuavaCache();
        System.out.println(gcache.getEntry("hello"));
        System.out.println(gcache.cache.size());
        for (int i = 0; i < 150; i++) {
            System.out.println(gcache.getEntry("hello" + i));
            System.out.println(gcache.cache.size());
        }
        // checking the eviction policy: "hello" may have been evicted by now
        System.out.println(gcache.cache.getIfPresent("hello"));
    }
}

Google Guava API at a glance: Throwables

public final class Throwables
extends Object
Static utility methods pertaining to instances of Throwable.

static List<Throwable> getCausalChain(Throwable throwable)
Gets a Throwable cause chain as a list.

static Throwable getRootCause(Throwable throwable)
Returns the innermost cause of throwable.

static String getStackTraceAsString(Throwable throwable)
Returns a string containing the result of toString(), followed by the full, recursive stack trace of throwable.

static RuntimeException propagate(Throwable throwable)
Propagates throwable as-is if it is an instance of RuntimeException or Error, or else, as a last resort, wraps it in a RuntimeException and then propagates.

static <X extends Throwable> void propagateIfInstanceOf(Throwable throwable, Class<X> declaredType)
Propagates throwable exactly as-is, if and only if it is an instance of declaredType.

static void propagateIfPossible(Throwable throwable)
Propagates throwable exactly as-is, if and only if it is an instance of RuntimeException or Error.

static <X extends Throwable> void propagateIfPossible(Throwable throwable, Class<X> declaredType)
Propagates throwable exactly as-is, if and only if it is an instance of RuntimeException, Error, or declaredType.

static <X1 extends Throwable, X2 extends Throwable> void propagateIfPossible(Throwable throwable, Class<X1> declaredType1, Class<X2> declaredType2)
Propagates throwable exactly as-is, if and only if it is an instance of RuntimeException, Error, declaredType1, or declaredType2.
Let's see how Throwables works with the help of sample code:
/**
 *
 */
package com.rajkrrsingh.test.guava;

import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;

import com.google.common.base.Throwables;

/**
 * @author rks
 * @04-Jan-2015
 */
public class GuavaThrowablesDemo {

    public static void main(String[] args) throws Exception {
        throwablesDemo();
    }

    // working with Throwables
    public static void throwablesDemo() throws Exception {
        try {
            try {
                try {
                    throw new RuntimeException("Root exception");
                } catch (Exception e) {
                    throw new SQLException("Middle tier exception", e);
                }
            } catch (Exception e) {
                throw new IllegalStateException("outer exception", e);
            }
        } catch (Exception e) {
            // getting the root exception
            System.out.println(Throwables.getRootCause(e).getMessage());
            // list of exceptions in the causal chain
            List<Throwable> list = Throwables.getCausalChain(e);
            Iterator<Throwable> itr = list.iterator();
            while (itr.hasNext()) {
                System.out.println(itr.next().getMessage());
            }
            // get the stack trace as a string
            System.out.println(Throwables.getStackTraceAsString(e));
        }
    }

}

Google Guava API at a glance: BiMap, Multimap

@GwtCompatible
public interface BiMap<K,V>
extends Map<K,V>
A bimap (or "bidirectional map") is a map that preserves the uniqueness of its values as well as that of its keys. This constraint enables bimaps to support an "inverse view", which is another bimap containing the same entries as this bimap but with reversed keys and values.

@GwtCompatible
public interface Multimap<K,V>
A collection that maps keys to values, similar to Map, but in which each key may be associated with multiple values. You can visualize the contents of a multimap either as a map from keys to nonempty collections of values:
  • a → 1, 2
  • b → 3
... or as a single "flattened" collection of key-value pairs:
  • a → 1
  • a → 2
  • b → 3
Important: although the first interpretation resembles how most multimaps are implemented, the design of the Multimap API is based on the second form. So, using the multimap shown above as an example, the size() is 3, not 2, and the values() collection is [1, 2, 3], not [[1, 2], [3]]. For those times when the first style is more useful, use the multimap's asMap() view (or create a Map<K, Collection<V>> in the first place).
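The distinction above, with size() counting key-value pairs rather than keys, can be mirrored with a plain Map<String, List<String>>; this JDK sketch (class and method names are illustrative) shows exactly the boilerplate that Multimap automates:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultimapAnalogue {

    private final Map<String, List<String>> map = new HashMap<>();

    // Like Multimap.put: creating the value list on first use by hand
    // is what Guava's Multimap hides from you.
    public void put(String key, String value) {
        List<String> values = map.get(key);
        if (values == null) {
            values = new ArrayList<>();
            map.put(key, values);
        }
        values.add(value);
    }

    // Like Multimap.size: total number of key-value pairs, not number of keys.
    public int size() {
        int total = 0;
        for (List<String> values : map.values()) {
            total += values.size();
        }
        return total;
    }
}
```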



Let's see sample code to understand how BiMap and Multimap work:

/**
* 
*/
package com.rajkrrsingh.test.guava;

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;
import com.google.common.collect.Multimap;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaBiMapAndMultiMapDemo {

public static void main(String[] args) {
biMapDemo();
multiMapDemo();
}

// BiMap keeps both keys and values unique; put() throws an IllegalArgumentException if you try to add a duplicate value
public static void biMapDemo(){
BiMap<String, String> map = HashBiMap.create();
map.put("key1", "value1");
map.put("key2", "value2");
map.put("key3", "value3");

//map.put("key4", "value1"); would throw an IllegalArgumentException: value already present
//forcePut removes the conflicting entry and puts forcefully
map.forcePut("key4", "value1");

// inverse a BiMap
BiMap<String, String> inverseMap = map.inverse();
System.out.println(inverseMap.get("value1"));
}

// Multimap substitutes for structures like Map<String, List<String>>; it does not require the list to exist before adding an element for a given key
public static void multiMapDemo(){
Multimap<String, String> multimap = ArrayListMultimap.create();
multimap.put("key1","value1");
multimap.put("key1","value2");
multimap.put("key1","value3");
multimap.put("key1","value4");
multimap.put("key2","value1");
multimap.put("key2","value2");
multimap.put("key2","value3");
multimap.put("key3","value1");

System.out.println(multimap.get("key1").size());
}

}

Google Guava API at a glance: ObjectArrays, Ranges, Stopwatch

ObjectArrays: operates on arrays of any type, allowing you to concatenate arrays and to add a single element before the first or after the last position.
Ranges: the Ranges and Range classes are used to define ranges and then check whether a given object is contained within a defined range.
Stopwatch: measures the elapsed time of an operation.
Let's test these out using sample code:
/**
*
*/
package com.rajkrrsingh.test.guava;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import com.google.common.base.Stopwatch;
import com.google.common.collect.ObjectArrays;
import com.google.common.collect.Range;
import com.google.common.collect.Ranges;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaTestDemo {

public static void main(String[] args) {
objectArrayDemo();
rangesDemo();
stopWatchDemo();
}

// ObjectArrays operates on arrays of a given type: concatenate two arrays, or add an element at the first or last index
public static void objectArrayDemo(){
String[] arr1 = new String[]{"ele1","ele2","ele3"};
String[] arr2 = new String[]{"ele11","ele22"};

String[] conArray = ObjectArrays.concat(arr1, arr2, String.class);
System.out.println(Arrays.toString(conArray));

String[] arr3= ObjectArrays.concat(arr1, "ele4");
System.out.println(Arrays.toString(arr3));

String[] arr4= ObjectArrays.concat("ele00", arr2);
System.out.println(Arrays.toString(arr4));
}

// Ranges defines ranges and checks whether an element is contained within the range
public static void rangesDemo(){
Range<Integer> closedRange = Ranges.closed(1, 10);
System.out.println(closedRange.contains(10));
System.out.println(closedRange.contains(11));

Range<Integer> rightOpenRange = Ranges.closedOpen(1, 10);
System.out.println(rightOpenRange.contains(10));
}

// demonstration of Stopwatch
public static void stopWatchDemo(){
final Stopwatch stopwatch = new Stopwatch();
stopwatch.start();

//Sleep for few random milliseconds.
try {
Thread.sleep(new Random().nextInt(1000));
} catch (final InterruptedException interruptedException) {
interruptedException.printStackTrace();
}

stopwatch.stop(); //optional

System.out.println("Elapsed time ==> " + stopwatch);
System.out.println("Elapsed time in Nanoseconds ==> " + stopwatch.elapsedTime(TimeUnit.NANOSECONDS));
System.out.println("Elapsed time in Microseconds ==> " + stopwatch.elapsedTime(TimeUnit.MICROSECONDS));
System.out.println("Elapsed time in Milliseconds ==> " + stopwatch.elapsedTime(TimeUnit.MILLISECONDS));
}

}

Google Guava API at a glance: Iterables


@GwtCompatible(emulated=true)
public final class Iterables
extends Object
This class contains static utility methods that operate on or return objects of type Iterable. Except as noted, each method has a corresponding Iterator-based method in the Iterators class. Performance notes: Unless otherwise noted, all of the iterables produced in this class are lazy, which means that their iterators only advance the backing iteration when absolutely necessary.

Let's see how Iterables works using the following sample code; you can follow the blog entry for the Employee class implementation.
/**
*
*/
package com.rajkrrsingh.test.guava;

import java.util.Iterator;
import java.util.List;

import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaIterablesDemo {

public static void main(String[] args) {
iterablesDemo();
}

// working with iterables
public static void iterablesDemo(){

// all() checks whether the defined condition is satisfied by every element of the collection
boolean flag = Iterables.all(Employee.getEmployeeList(), new Predicate<Employee>() {

@Override
public boolean apply(Employee emp) {
return emp.getEmpid().length()==3;
}
});
if(flag){
System.out.println("all emp of empid of length 3");
}else{
System.out.println("not all emp of empid of length 3");
}

// iterate in infinite cycle
int iterateCount=0;
Iterator<Employee> itr = Iterables.cycle(Employee.getEmployeeList()).iterator();
while(itr.hasNext() && iterateCount<20){
System.out.println(itr.next());
iterateCount++;

}

// frequency() counts occurrences of a given element (here, nulls) in the collection
int count = Iterables.frequency(Employee.getEmployeeList(), null);
System.out.println(count);

//getFirst
System.out.println(Iterables.getFirst(Employee.getEmployeeList(), new Employee()));

// getLast
System.out.println(Iterables.getLast(Employee.getEmployeeList(), new Employee()));

// partition a list
Iterable<List<Employee>> iterables = Iterables.partition(Employee.getEmployeeList(), 2);
System.out.println(Iterables.size(iterables));

// iterables to array
Employee[] empArr = Iterables.toArray(Employee.getEmployeeList(), Employee.class);
System.out.println(empArr.length);
}
}

Google Guava API at a glance: Preconditions and Constraints

Preconditions allows you to check the correctness of parameters passed to your methods and to throw an appropriate exception when necessary.
public final class Preconditions
extends Object
Simple static methods to be called at the start of your own methods to verify correct arguments and state. This allows constructs such as
if (count <= 0) {
       throw new IllegalArgumentException("must be positive: " + count);
     }
to be replaced with the more compact
checkArgument(count > 0, "must be positive: %s", count);
@Beta
@Deprecated
@GwtCompatible
public final class Constraints
extends Object
Factories and utilities pertaining to the Constraint interface.
Deprecated. Use Preconditions for basic checks. In place of constrained collections, we encourage you to check your preconditions explicitly instead of leaving that work to the collection implementation. For the specific case of rejecting null, consider the immutable collections. This class is scheduled for removal in Guava 16.0.

Let's see how Preconditions and Constraints work:
/**
*
*/
package com.rajkrrsingh.test.guava;

import java.util.List;

import com.google.common.base.Preconditions;
import com.google.common.collect.Constraint;
import com.google.common.collect.Constraints;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaPreconditionsDemo {

public static void main(String[] args) {
constrainDemo();
preConditionDemo("rks");
}

// create a constraint on the list which checks each element you try to add, throwing an exception on a constraint violation
public static void constrainDemo(){
List<Employee> list = Constraints.constrainedList(Employee.getEmployeeList(), new Constraint<Employee>() {

@Override
public Employee checkElement(Employee emp) {
Preconditions.checkArgument(emp.getAge()>30);
return emp;
}
});
System.out.println(list.size());
// allow to add the element in the list
list.add(new Employee("109", "empName", 10210, 38));
System.out.println(list.size());
}

// validate argument passed to the method and throw required exceptions
public static void preConditionDemo(String arg){
System.out.println(Preconditions.checkNotNull(arg).toUpperCase());
}
}



Google Guava API at a glance: Splitter and Strings Class

Splitter Class

@GwtCompatible(emulated=true)
public final class Splitter
extends Object
Extracts non-overlapping substrings from an input string, typically by recognizing appearances of a separator sequence. This separator can be specified as a single character, fixed string, regular expression or CharMatcher instance. Or, instead of using a separator at all, a splitter can extract adjacent substrings of a given fixed length.

Splitter works opposite to Joiner: it splits a string into a collection of elements using a delimiter, which can be a sequence of chars, a CharMatcher, or a regex.
Let's see how Splitter works using sample code.
/**
 * 
 */
package com.rajkrrsingh.test.guava;

import java.util.Iterator;

import com.google.common.base.Splitter;

/**
 * @author rks
 * @04-Jan-2015
 */
public class GuavaSplitterDemo {

 public static void main(String[] args) {
  splitterDemo();
 }

 // Split a string into a collection of elements
 public static void splitterDemo(){
  Iterator<String> itr = Splitter.on("#").split("RKS#John#Harry#Tom").iterator();
  while(itr.hasNext()){
   System.out.println(itr.next());
  }
  // split on pattern
  Iterator<String> itr1 = Splitter.onPattern("\\d+").split("RKS1John2Harry3Tom").iterator();
  while(itr1.hasNext()){
   System.out.println(itr1.next());
  }
  // omit empty string
  Iterator<String> itr2 = Splitter.on("#").omitEmptyStrings().trimResults().split("RKS# #John#   #Harry#Tom").iterator();
  while(itr2.hasNext()){
   System.out.println(itr2.next());
  }
  // fixed length string extractor
  Iterator<String> itr3 = Splitter.fixedLength(3).split("RKS#John#Harry#Tom").iterator();
  while(itr3.hasNext()){
   System.out.println(itr3.next());
  }
 }
}


Strings Class

@GwtCompatible
public final class Strings
extends Object
Static utility methods pertaining to String or CharSequence instances.

Let's see how the Strings class works using the following sample code.
/**
* 
*/
package com.rajkrrsingh.test.guava;

import com.google.common.base.Strings;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaStringsDemo {

public static void main(String[] args) {
stringHelper();
}

// helper methods for String
public static void stringHelper(){
String str = ""; 
if(Strings.isNullOrEmpty(str)){
System.out.println("String is null or empty");
}
System.out.println(Strings.nullToEmpty(str));
System.out.println(Strings.emptyToNull(Strings.nullToEmpty(str)));
//repeat a string 
System.out.println(Strings.repeat("RKS", 3));
// string padding
System.out.println(Strings.padEnd("RKS", 10, '*'));
System.out.println(Strings.padStart("RKS", 10, '*'));
}
}


Google Guava API at a glance: Objects Class to Implement hashCode, equals, toString and compareTo Methods

The Objects class helps the developer accurately and easily implement the equals(), hashCode(), and toString() methods, while ComparisonChain does the same for compareTo(). It lets you write these utility methods in a cleaner, more compact way, without worrying about clumsy handling of fields and nulls.
Let's demonstrate by taking our Employee class and overriding these methods with the help of the Objects class:
package com.rajkrrsingh.test.guava;

import java.util.ArrayList;
import java.util.List;

import com.google.common.base.Objects;
import com.google.common.collect.ComparisonChain;

public class Employee implements Comparable<Employee>{

private String empid;
private String empName;
private int salary;
private int age;
private static List<Employee> list;

static{
list = new ArrayList<Employee>();
list.add(new Employee("101", "RKS", 10000, 31));
list.add(new Employee("102", "Derek", 10500, 35));
list.add(new Employee("103", "Jack", 9000, 29));
list.add(new Employee("104", "Nick", 9600, 35));
}

public  static List<Employee> getEmployeeList(){
return list;
}

public Employee(){}

public Employee(String empid,String empName,int salary,int age){
this.empid = empid;
this.empName = empName;
this.salary = salary;
this.age = age;
}

public String getEmpid() {
return empid;
}

public void setEmpid(String empid) {
this.empid = empid;
}

public String getEmpName() {
return empName;
}

public void setEmpName(String empName) {
this.empName = empName;
}

public int getSalary() {
return salary;
}

public void setSalary(int salary) {
this.salary = salary;
}

public int getAge() {
return age;
}

public void setAge(int age) {
this.age = age;
}


@Override
public int hashCode() {
return Objects.hashCode(empid,empName);
}


@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!(obj instanceof Employee)) {
return false;
}
Employee emp = (Employee) obj;
return Objects.equal(this.empid, emp.empid) && Objects.equal(this.empName, emp.empName); 
}


@Override
public String toString() {
return Objects.toStringHelper(this).add("empid", empid).add("empName", empName).toString();
}


@Override
public int compareTo(Employee o) {
// ComparisonChain compares each field in turn, stopping at the first nonzero result
return ComparisonChain.start().compare(empid, o.empid)
.compare(empName, o.empName)
.compare(salary, o.salary)
.compare(age, o.age)
.result();
}


}
Now test our implementation using our tester class:
/**
* 
*/
package com.rajkrrsingh.test.guava;

import java.util.HashSet;
import java.util.Set;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaObjectsDemo {

public static void main(String[] args) {
objectClass();
cleanCompareToTest();
}


// HashSet relies on the hashCode() and equals() implementations in the Employee class to detect duplicates
public static void cleanCompareToTest(){
Set<Employee> set = new HashSet<Employee>();
// print true
System.out.println(set.add(new Employee("101", "RKS", 10000, 31)));
// print false - duplicate object
System.out.println(set.add(new Employee("101", "RKS", 10000, 31)));
}

// Objects class provides helper methods to implement hashCode(), equals(), and toString()
public static void objectClass(){
Employee e = new Employee("105", "Tom", 80000, 24);
// toString test
System.out.println(e);
Employee e1 = new Employee("105", "Tomm", 80000, 24);
System.out.println(e1.equals(e));
}
}

Google Guava API at a glance: CharMatcher

CharMatcher is similar in spirit to Predicate: it applies a boolean check to characters, and provides methods that operate on character sequences, e.g.
removeFrom(), replaceFrom(), trimFrom(), collapseFrom(), retainFrom()
Let's see a simple example involving CharMatcher; follow along with the code here:
/**
*
*/
package com.rajkrrsingh.test.guava;

import com.google.common.base.CharMatcher;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaCharMatcherDemo {

public static void main(String[] args) {
charMatcher();
}


public static void charMatcher(){
// remove char occurance in the range of 1 and 4
System.out.println(CharMatcher.inRange('1', '4').removeFrom(" 981 654 239"));
// negate the result obtained from previous statement
System.out.println(CharMatcher.inRange('1', '4').negate().removeFrom(" 981 654 239"));
// count no of digit in the string
System.out.println(CharMatcher.DIGIT.countIn(" 981 654 239 "));
// collapse a matching digits with the provided chars
System.out.println(CharMatcher.DIGIT.collapseFrom("collapse from 981 654 239", 'X'));
// replace digit in the string with the provide char
System.out.println(CharMatcher.DIGIT.replaceFrom("collapse from 981 654 239", 'X'));
// trim a string on matching char
System.out.println(CharMatcher.is(' ').trimFrom(" 981 654 239 "));
System.out.println(CharMatcher.is(' ').trimLeadingFrom(" 981 654 239 "));
System.out.println(CharMatcher.is(' ').trimTrailingFrom(" 981 654 239 "));
System.out.println(CharMatcher.is(' ').trimAndCollapseFrom(" 981 654 239 ",'X'));
}
}
Output
 98 65 9
1423
9
collapse from  X X X
collapse from  XXX XXX XXX
981 654 239
981 654 239 
 981 654 239
981X654X239
Please follow the comments in the code to relate them to the output.

Google Guava API at a glance: Joiner



@GwtCompatible
public class Joiner
extends Object
An object which joins pieces of text (specified as an array, Iterable, varargs or even a Map) with a separator. It either appends the results to an Appendable or returns them as a String.

In the following example, we join a List of String into one String using “#” as the separator:
/**
*
*/
package com.rajkrrsingh.test.guava;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.google.common.base.Joiner;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaJoinerDemo {

public static void main(String[] args) {
joinerDemo();
}


// transform a collection into a string
public static void joinerDemo(){
List<String> list = Arrays.asList("RKS","John","Nick","Harry");
System.out.println(Joiner.on("#").join(list));

List<String> list1 = Arrays.asList("RKS","John",null,"Nick","Harry");
//skip nulls
System.out.println(Joiner.on("#").skipNulls().join(list1));
//default value for nulls
System.out.println(Joiner.on("#").useForNull("BLANK").join(list1));

// joiner on Map
Map<String, String> map = new HashMap<String, String>();
map.put("key1", "value1");
map.put("key2", "value2");
map.put("key3", "value3");
map.put("key4", "value4");

System.out.println(Joiner.on("#").withKeyValueSeparator(":").join(map));
}
}
Please follow my comments in the code to relate them to the output:
RKS#John#Nick#Harry
RKS#John#Nick#Harry
RKS#John#BLANK#Nick#Harry
key4:value4#key3:value3#key2:value2#key1:value1

Google Guava API at a glance: Predicate

  • Predicate<T>, which has the single method boolean apply(T input). Instances of Predicate are generally expected to be side-effect-free and consistent with equals.

@GwtCompatible
public interface Predicate<T>
Determines a true or false value for a given input. The Predicates class provides common predicates and related utilities.
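As an aside, since Java 8 the JDK ships its own java.util.function.Predicate with and()/or()/negate() combinators that mirror Guava's Predicates utilities; a minimal stdlib sketch (the class and method names here are illustrative):

```java
import java.util.function.Predicate;

public class PredicateSketch {

    // true only when the string is at least 3 chars AND all uppercase
    public static boolean isLongAndUpper(String s) {
        Predicate<String> longEnough = str -> str.length() >= 3;
        Predicate<String> upper = str -> str.equals(str.toUpperCase());
        // combine the two checks, as Predicates.and(...) does in Guava
        return longEnough.and(upper).test(s);
    }
}
```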
Let's see how to use a Predicate to filter elements out of a given collection. Create a tester class as follows, and follow along with my blog post to get the Employee class:
/**
*
*/
package com.rajkrrsingh.test.guava;

import java.util.Iterator;
import java.util.List;

import com.google.common.base.Predicate;
import com.google.common.base.Predicates;
import com.google.common.collect.Iterables;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaPredicateDemo {

public static void main(String[] args) {
predicateDemo();
}

public static void predicateDemo(){
List<Employee> empList = Employee.getEmployeeList();

Predicate<Employee> ageOver30 = new Predicate<Employee>() {

@Override
public boolean apply(Employee emp) {
return emp.getAge() > 30;
}
};

Predicate<Employee> slryLt10000 = new Predicate<Employee>() {

@Override
public boolean apply(Employee emp) {
return emp.getSalary() < 10000;
}
};

System.out.println("**** print emp name whose age is greater than 30 ****");
Iterator<Employee> filterAgeIterator= Iterables.filter(empList, ageOver30).iterator();
while(filterAgeIterator.hasNext()){
System.out.println(filterAgeIterator.next().getEmpName());
}

System.out.println("**** print emp name whose age is greater than 30 and salary is less than 10000 ****");
Iterator<Employee> filterOnAgeAndSal= Iterables.filter(empList, Predicates.and(ageOver30, slryLt10000)).iterator();
while(filterOnAgeAndSal.hasNext()){
System.out.println(filterOnAgeAndSal.next().getEmpName());
}

System.out.println("**** print emp name whose age is greater than 30 OR salary is less than 10000 ****");
Iterator<Employee> filterOnAgeORSal= Iterables.filter(empList, Predicates.or(ageOver30, slryLt10000)).iterator();
while(filterOnAgeORSal.hasNext()){
System.out.println(filterOnAgeORSal.next().getEmpName());
}
}
}
Please follow my comments in the code to relate them to the output:
**** print emp name whose age is greater than 30 ****
RKS
Derek
Nick
**** print emp name whose age is greater than 30 and salary is less than 10000 ****
Nick
**** print emp name whose age is greater than 30 OR salary is less than 10000 ****
RKS
Derek
Jack
Nick


Google Guava API at a glance: Composite Functions


The Guava API provides a way to apply a series of functions to a given collection using Functions.compose:

public static <A,B,C> Function<A,C> compose(Function<B,C> g,
                                            Function<A,? extends B> f)
Returns the composition of two functions. For f: A->B and g: B->C, composition is defined as the function h such that h(a) == g(f(a)) for each a.
Parameters:
g - the second function to apply
f - the first function to apply
Returns:
the composition of f and g
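The same h(a) == g(f(a)) contract exists in the JDK's java.util.function.Function via compose()/andThen(); a stdlib sketch of an uppercase-then-reverse pipeline (the class and method names are illustrative):

```java
import java.util.function.Function;

public class ComposeSketch {

    public static String upperThenReverse(String name) {
        Function<String, String> upper = s -> s.toUpperCase();
        Function<String, String> reverse = s -> new StringBuilder(s).reverse().toString();
        // andThen applies upper first, then reverse: h(a) == reverse(upper(a))
        return upper.andThen(reverse).apply(name);
    }
}
```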

To test it further, follow my blog entry, add the following method to the GuavaFunctionDemo class, and run it:
/**
*
*/
package com.rajkrrsingh.test.guava;

import java.util.Collection;
import java.util.Iterator;

import com.google.common.base.Function;
import com.google.common.base.Functions;
import com.google.common.collect.Collections2;

/**
* @author rks
* @04-Jan-2015
*/
public class GuavaFunctionDemo {

public static void main(String[] args) {
//printEmployeeInUpperCase();
printResultofCompositeFunction();
}

// Transform one collection into another using a Guava Function
public static void printEmployeeInUpperCase(){
Collection<String> upperCaseEmpName = Collections2.transform(Employee.getEmployeeList(),new Function<Employee, String>() {

@Override
public String apply(Employee emp) {
if(emp != null)
return emp.getEmpName().toUpperCase();
return "";

}
});

Iterator<String> itr = upperCaseEmpName.iterator();
while(itr.hasNext()){
System.out.println(itr.next());
}
}

// Apply a series of transformations by composition of functions
public static void printResultofCompositeFunction(){
Function<Employee, String> upperCaseEmpName = new Function<Employee, String>() {

@Override
public String apply(Employee emp) {
if(emp!=null)
return emp.getEmpName().toUpperCase();
return "";
}
};

Function<String, String> reverseEmpName = new Function<String, String>() {

@Override
public String apply(String empName) {
if (empName!=null) {
return new StringBuilder(empName).reverse().toString();
}
return "";
}

};
Collection<String> upperReverseCollection = Collections2.transform(Employee.getEmployeeList(), Functions.compose(reverseEmpName, upperCaseEmpName));

Iterator<String> itr = upperReverseCollection.iterator();
while(itr.hasNext()){
System.out.println(itr.next());
}
}

}

You can see we have applied two functions here: the first converts each element of the collection to uppercase, and the second reverses it.

Result:
SKR
KERED
KCAJ
KCIN