Sunday, October 11, 2015

Apache Spark: Set Up Eclipse (Using Maven) to Build a Spark-Scala Project

1. Create a Maven Scala quick-start project:
mvn archetype:generate -B -DarchetypeArtifactId=scala-quickstart-archetype -DarchetypeVersion=0.8.2  -DgroupId=com.example -DartifactId=spark-project -Dversion=1.0 -Dpackage=com.example.project  -DsourceFolders=scala-only

2. Import spark-project into Eclipse using the Maven project import wizard.
3. Update pom.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
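<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">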

<!-- this is so we don't end with a compile error in maven-compiler-plugin -->
</project>

4. Add the sample Scala program to src/main/scala (I took the sample from the Apache Spark website):
package com.example.project

/** @author rsingh */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "/root/test.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Read the file with at least 2 partitions and cache it, because it is scanned twice
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
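
Before submitting to a cluster, the two filter predicates can be checked on an in-memory collection. This is only a minimal sketch in plain Scala with no Spark dependency; CountCheck, countContaining, and the sample lines are hypothetical names and data chosen for illustration and are not part of the Spark sample.

```scala
object CountCheck {
  // Counts the lines that contain the given substring, mirroring
  // logData.filter(line => line.contains(...)).count() in SimpleApp.
  def countContaining(lines: Seq[String], needle: String): Long =
    lines.count(line => line.contains(needle)).toLong

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data standing in for the contents of /root/test.txt
    val lines = Seq("apache spark", "scala build", "maven")
    val numAs = countContaining(lines, "a")
    val numBs = countContaining(lines, "b")
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
```

With the sample data above, this prints "Lines with a: 3, Lines with b: 1", the same output format SimpleApp produces for a real file.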

5. Now build your project using the Maven package command:
mvn clean package

6. The above command creates a fat JAR in the target directory. Run that JAR using spark-submit:
$ bin/spark-submit --class com.example.project.SimpleApp --master spark://maprdemo:7077 /root/spark-project-1.0-jar-with-dependencies.jar

