Sunday, March 20, 2016

Apache Flink is a streaming data flow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. this is a sample application to consume output of vmstat command as a stream, so lets get hands dirty
mkdir  flink-steaming-example
cd flink-steaming-example/
mkdir -p src/main/scala
cd src/main/scala/
vim FlinkStreamingExample.scala

import org.apache.flink.streaming.api.scala._
object FlinkStreamingExample{
    def main(args:Array[String]){
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        val socketVmStatStream = env.socketTextStream("ip-10-0-0-233",9000);
        socketVmStatStream.print
        env.execute()
    }    
}

cd -
vim build.sbt
name := "flink-streaming-examples"

version := "1.0"
scalaVersion := "2.10.4"

libraryDependencies ++= Seq("org.apache.flink" %% "flink-scala" % "1.0.0",
  "org.apache.flink" %% "flink-clients" % "1.0.0",
    "org.apache.flink" %% "flink-streaming-scala" % "1.0.0")

sbt clean package

stream some test data
vmstat 1 | nc -l 9000

now submit flink job
bin/flink run /root/flink-steaming-example/target/scala-2.10/flink-streaming-examples_2.10-1.0.jar 
03/20/2016 08:04:12 Job execution switched to status RUNNING.
03/20/2016 08:04:12 Source: Socket Stream -> Sink: Unnamed(1/1) switched to SCHEDULED 
03/20/2016 08:04:12 Source: Socket Stream -> Sink: Unnamed(1/1) switched to DEPLOYING 
03/20/2016 08:04:12 Source: Socket Stream -> Sink: Unnamed(1/1) switched to RUNNING 

let see output of the job
tailf flink-1.0.0/log/flink-root-jobmanager-0-ip-10-0-0-233.out -- will see the vmstat output