Saturday, May 27, 2017

Spark LLAP Setup for Spark Thrift Server

ENV HDP-2.6.0.3-8

Download the spark-llap assembly jar from http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/ and place it in /usr/hdp/current/spark-client/lib/ (the path used in the configuration below).
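
For example, assuming the standard Maven repository layout and the HDP 2.6.0.3-8 build referenced in this post (verify the exact path against the repository listing first), the download and copy look roughly like this:

# the version directory below is an assumption derived from the jar name used later in this post
wget http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/1.0.0.2.6.0.3-8/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
cp spark-llap-1.0.0.2.6.0.3-8-assembly.jar /usr/hdp/current/spark-client/lib/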

Add the following to Custom spark-thrift-sparkconf:

spark_thrift_cmd_opts --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.executor.extraClassPath /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.sql.hive.hiveserver2.url jdbc:hive2://hostname1.hwxblr.com:10500/;principal=hive/_HOST@EXAMPLE.COM;hive.server2.proxy.user=${user}
spark.hadoop.hive.zookeeper.quorum hostname1.hwxblr.com:2181

Add the following to Custom spark-defaults:

spark.sql.hive.hiveserver2.url jdbc:hive2://hostname1.hwxblr.com:10500/;principal=hive/_HOST@EXAMPLE.COM;hive.server2.proxy.user=${user}
spark.jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar
spark.hadoop.hive.zookeeper.quorum hostname1.hwxblr.com:2181
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.executor.extraClassPath /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar

Start the Spark Thrift Server from Ambari and run a query as follows:

beeline -u "jdbc:hive2://hostname3.hwxblr.com:10015/;principal=hive/_HOST@EXAMPLE.COM" -e "select * from test;"

If your query fails with the following exception, check that the spark-llap assembly jar is available on the executors' classpath (revisit spark.executor.extraClassPath):

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hostname1.hwxblr.com): java.lang.NullPointerException
	at org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy.<init>(LlapProtocolClientProxy.java:94)
	at org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient.<init>(LlapTaskUmbilicalExternalClient.java:119)
	at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:143)
	at org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:51)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:240)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.rdd.HadoopRDD$HadoopMapPartitionsWithSplitRDD.compute(HadoopRDD.scala:388)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
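
A quick way to confirm the cause is to verify that the assembly jar actually exists at the configured path on every node that runs executors; a minimal sketch (the host list is a placeholder, substitute your worker nodes):

# placeholder host list: replace with the nodes where Spark executors run
for h in hostname1.hwxblr.com hostname3.hwxblr.com; do
  ssh $h "ls -l /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.6.0.3-8-assembly.jar"
done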

Sunday, May 21, 2017

Steps to set up a KDC before enabling Kerberos through Ambari on a Hortonworks cluster

ENV

#### OS centos7
#### REALM EXAMPLE.COM (update accordingly)
#### The KDC and admin server are running on rks253secure.hdp.local (update accordingly)

Install the required packages:

yum install -y krb5-server krb5-workstation pam_krb5
cd  /var/kerberos/krb5kdc

Modify the kadmin ACLs:

cat kadm5.acl 
*/admin@EXAMPLE.COM	*

Modify kdc.conf:

cat kdc.conf 
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 EXAMPLE.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

Modify krb5.conf on the node where the Ambari server is running.

cat /etc/krb5.conf

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = EXAMPLE.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  default_ccache_name = /tmp/krb5cc_%{uid}
  #default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
  #default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5

[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log

[realms]
  EXAMPLE.COM = {
    admin_server = rks253secure.hdp.local
    kdc = rks253secure.hdp.local
  }

Create the KDC database:

kdb5_util create -s -r EXAMPLE.COM

Start and enable the KDC and kadmin services:

systemctl start krb5kdc kadmin
systemctl enable krb5kdc kadmin
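
Optionally confirm that both services came up:

# both units should report active (running)
systemctl status krb5kdc kadmin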

Create the principal root/admin@EXAMPLE.COM:

# kadmin.local
kadmin.local: addprinc root/admin
kadmin.local: quit
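
The same principal can also be created non-interactively, which is handy for scripting (the password below is only a placeholder):

# scripted alternative to the interactive session above; replace the placeholder password
kadmin.local -q "addprinc -pw ChangeMe123 root/admin"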

Test that you can obtain a TGT after supplying the password:

kinit root/admin@EXAMPLE.COM
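
If the kinit succeeds, klist should show a ticket-granting ticket for the realm:

# expect an entry like krbtgt/EXAMPLE.COM@EXAMPLE.COM in the output
klist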

Now run the Enable Kerberos wizard from the Ambari server; it will ask you to supply the KDC and admin server host names and the REALM.


Thursday, May 18, 2017

How to Configure and Run Storm AutoHDFS Plugin (Sample Application)

Add these configurations to Custom storm-site:

nimbus.autocredential.plugins.classes ["org.apache.storm.hdfs.common.security.AutoHDFS"]
nimbus.credential.renewers.classes ["org.apache.storm.hdfs.common.security.AutoHDFS"]
hdfs.keytab.file  /etc/security/keytabs/hdfs.headless.keytab
hdfs.kerberos.principal hdfs-s253_kerb@LAB.HORTONWORKS.NET
nimbus.credential.renewers.freq.secs 518400

nimbus.childopts -Xmx1024m _JAAS_PLACEHOLDER -javaagent:/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8649,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/conf/jmxetric-conf.xml:/etc/hadoop/conf/hdfs-site.xml:/etc/hadoop/conf/core-site.xml:/etc/hbase/conf/hbase-site.xml,process=Nimbus_JVM

Add the following to storm-env:

export STORM_EXT_CLASSPATH=/usr/hdp/current/hbase-client/lib/:/usr/hdp/current/hadoop-mapreduce-client/:/usr/hdp/current/hadoop-client

Remove hadoop-aws*.jar from the classpath.
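
One way to do this, assuming the hadoop-aws jar sits under the Hadoop directories referenced by STORM_EXT_CLASSPATH (the paths below are illustrative), is to locate it first and then move it aside:

# find any hadoop-aws jars that would land on the Storm classpath
find /usr/hdp/current/hadoop-client /usr/hdp/current/hadoop-mapreduce-client -name 'hadoop-aws*.jar'
# move the matches out of the way rather than deleting them
mv /usr/hdp/current/hadoop-mapreduce-client/hadoop-aws*.jar /tmp/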

Add core-site.xml, core-default.xml, and hdfs-site.xml to the storm-hdfs jar.

Add the following snippet to core-site.xml:

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>

[root@r253secure contrib]# jar uvf storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar core-site.xml 
[root@r253secure contrib]# jar uvf storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar hdfs-site.xml
[root@r253secure contrib]# jar uvf storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar core-default.xml

jar tvf storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar | egrep 'core-site|hdfs-site'
  5538 Wed May 03 14:25:04 UTC 2017 core-site.xml
  8047 Tue Jan 17 14:14:10 UTC 2017 hdfs-site.xml

Copy the updated storm-hdfs jar to the Storm lib directory:

[root@r253secure contrib]# cp storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar ../lib/

Copy core-site.xml and hdfs-site.xml to the Storm lib folder.

Copy hadoop-hdfs.jar to the Storm lib folder as well.
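
A rough sketch of these two copy steps (the Storm lib path and the hadoop-hdfs jar location are assumptions, adjust them to your installation):

# copy the cluster configuration files into the Storm lib directory
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml /usr/hdp/current/storm-nimbus/lib/
# copy the hadoop-hdfs client jar as well; the exact file name depends on the HDP build
cp /usr/hdp/current/hadoop-hdfs-client/hadoop-hdfs-*.jar /usr/hdp/current/storm-nimbus/lib/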

Sample code to test your topology

https://github.com/rajkrrsingh/sample-storm-hdfs-app

Monday, May 8, 2017

Oozie Spark Shell Action Example

Workflow directory on HDFS:

 hadoop fs -ls /tmp/sparkOozieShellAction/
Found 4 items
-rw-r--r--   3 oozie hdfs        178 2017-05-08 07:00 /tmp/sparkOozieShellAction/job.properties
drwxr-xr-x   - oozie hdfs          0 2017-05-08 07:01 /tmp/sparkOozieShellAction/lib
-rw-r--r--   3 oozie hdfs        279 2017-05-08 07:12 /tmp/sparkOozieShellAction/spark-pi-job.sh
-rw-r--r--   3 oozie hdfs        712 2017-05-08 07:34 /tmp/sparkOozieShellAction/workflow.xml

Oozie Spark sharelib:

[oozie@rk253 ~]$ hadoop fs -ls /user/oozie/share/lib/lib_20170508043956/spark
Found 8 items
-rw-r--r--   3 oozie hdfs     339666 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-api-jdo-3.2.6.jar
-rw-r--r--   3 oozie hdfs    1890075 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-core-3.2.10.jar
-rw-r--r--   3 oozie hdfs    1809447 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-rdbms-3.2.9.jar
-rw-r--r--   3 oozie hdfs        167 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/hive-site.xml
-rw-r--r--   3 oozie hdfs      22440 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/oozie-sharelib-spark-4.2.0.2.5.3.0-37.jar
-rw-r--r--   3 oozie hdfs      44846 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/py4j-0.9-src.zip
-rw-r--r--   3 oozie hdfs     357563 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/pyspark.zip
-rw-r--r--   3 oozie hdfs  188897932 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar

job.properties

[oozie@rk253 ~]$ cat job.properties 
nameNode= hdfs://rk253.openstack:8020 
jobTracker= rk253.openstack:8050 
oozie.wf.application.path=/tmp/sparkOozieShellAction/ 
oozie.use.system.libpath=true 

workflow.xml

[oozie@rk253 ~]$ cat workflow.xml 
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.4">
    <start to="shellAction"/>
    <action name="shellAction">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>spark-pi-job.sh</exec>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
		<file>/tmp/sparkOozieShellAction/spark-pi-job.sh#spark-pi-job.sh</file>
	    <capture-output/>
        </shell>
    <ok to="end"/>
    <error to="killAction"/>
    </action>
    <kill name="killAction">
        <message>"Killed job due to error"</message>
    </kill>
    <end name="end"/>
</workflow-app>

spark-pi-job.sh

[oozie@rk253 ~]$ cat spark-pi-job.sh 
/usr/hdp/2.5.3.0-37/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/hdp/2.5.3.0-37/spark/lib/spark-examples-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar 10

Run the Oozie job:

oozie job -oozie http://rk253:11000/oozie/ -config job.properties -run 
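
The -run command prints a workflow job id; progress can then be checked with the standard Oozie CLI (the id below is a placeholder):

# replace the id with the one returned by -run
oozie job -oozie http://rk253:11000/oozie/ -info 0000001-170508062540168-oozie-oozi-W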

Sunday, May 7, 2017

Oozie Spark Action Example

Directory structure on HDFS:

[oozie@rk253 ~]$ hadoop fs -lsr /tmp/sparkOozieAction
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxrwxrwx   3 oozie hdfs        167 2017-05-08 05:01 /tmp/sparkOozieAction/job.properties
drwxrwxrwx   - oozie hdfs          0 2017-05-08 05:04 /tmp/sparkOozieAction/lib
-rwxrwxrwx   3 oozie hdfs  110488188 2017-05-08 04:58 /tmp/sparkOozieAction/lib/spark-examples-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
-rw-r--r--   3 oozie hdfs       1571 2017-05-08 05:46 /tmp/sparkOozieAction/workflow.xml

Oozie sharelib:

[oozie@rk253 ~]$ hadoop fs -ls /user/oozie/share/lib/lib_20170508043956/spark
Found 8 items
-rw-r--r--   3 oozie hdfs     339666 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-api-jdo-3.2.6.jar
-rw-r--r--   3 oozie hdfs    1890075 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-core-3.2.10.jar
-rw-r--r--   3 oozie hdfs    1809447 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-rdbms-3.2.9.jar
-rw-r--r--   3 oozie hdfs        167 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/hive-site.xml
-rw-r--r--   3 oozie hdfs      22440 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/oozie-sharelib-spark-4.2.0.2.5.3.0-37.jar
-rw-r--r--   3 oozie hdfs      44846 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/py4j-0.9-src.zip
-rw-r--r--   3 oozie hdfs     357563 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/pyspark.zip
-rw-r--r--   3 oozie hdfs  188897932 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar

job.properties

[oozie@rk253 ~]$ cat job.properties 
nameNode= hdfs://rk253.openstack:8020 
jobTracker= rk253.openstack:8050 
oozie.wf.application.path=/tmp/sparkOozieAction/ 
oozie.use.system.libpath=true 
master=yarn-client

workflow.xml

[oozie@rk253 ~]$ cat workflow.xml 
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5"> 
        <start to="spark-action"/> 
        <action name="spark-action"> 
                <spark xmlns="uri:oozie:spark-action:0.1"> 
                        <job-tracker>${jobTracker}</job-tracker> 
                        <name-node>${nameNode}</name-node> 
                        <configuration> 
                        </configuration> 
                        <master>${master}</master> 
                        <name>spark pi job</name> 
                        <class>org.apache.spark.examples.SparkPi</class> 
                        <jar>${nameNode}/tmp/sparkOozieAction/lib/spark-examples-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar</jar> 
                        <spark-opts>--driver-memory 512m --executor-memory 512m --num-executors 1</spark-opts> 
                        <arg>10</arg> 
                </spark> 
                <ok to="end"/> 
                <error to="kill"/> 
        </action> 
        <kill name="kill"> 
                <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 
        </kill> 
        <end name="end"/> 
</workflow-app> 

Run the Oozie job:

oozie job -oozie http://rk253:11000/oozie/ -config job.properties -run
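
As with the shell action above, the id returned by -run can be used to check status and pull logs (the id shown is a placeholder):

# status of the workflow
oozie job -oozie http://rk253:11000/oozie/ -info 0000002-170508062540168-oozie-oozi-W
# console log of the action
oozie job -oozie http://rk253:11000/oozie/ -log 0000002-170508062540168-oozie-oozi-W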