Sunday, January 11, 2015

Apache Hive : Configuring Hive MetaStore using JDBC

Hive requires metastore that stores metadata (e.g. table schema,partition,SerDe) information so that user can run DDL and DML commands on the hive.
hive also have embedded metastore in the form of derby database but it is not suitable for the concurrent access or heavy usages.
Hive gives you flexibility to configure any of the JDBC complaint database to use as MetaStore such as MySQL,PostGress etc.As MySQL database is popular among the most used Hive MetaStore so in this post I will demonstrate you to configure MySQL as Hive MetaStore.

To configure MySQL as a Hive metastore, first install and start MySQL server, here is my first step to install MySQL server on Ubuntu

sudo apt-get install mysql-server

download the MySQL connector jar and place it into the $HIVE_HOME/lib

now edit conf/hive-site.xml as follows

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
 <name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://hostname/hivedb?createDatabaseIfNotExist=true</value>
 </property>
<property>
 <name>javax.jdo.option.ConnectionDriverName</name>
 <value>com.mysql.jdbc.Driver</value>
 </property>
 <property>
 <name>javax.jdo.option.ConnectionUserName</name>
 <value>username</value>
 </property>
 <property>
 <name>javax.jdo.option.ConnectionPassword</name>
 <value>password</value>
 </property>
</configuration>

if your metastore is running on remote machine do add the following property to conf/hive-site.xml

<property>
        <name>hive.metastore.uris</name>
        <value>thrift://remotehostname:9083</value>
    </property>

now you are good to go to use MySQL MetaStore.to start an external Hive metastore service use the command
hive --service metastore &


No comments: