Apache Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. This post walks through a small sample application that consumes the output of the vmstat command as a stream, so let's get our hands dirty.
First, set up the project skeleton:

```shell
mkdir flink-streaming-example
cd flink-streaming-example/
mkdir -p src/main/scala
cd src/main/scala/
vim FlinkStreamingExample.scala
```

FlinkStreamingExample.scala reads lines from a socket and prints them:

```scala
import org.apache.flink.streaming.api.scala._

object FlinkStreamingExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val socketVmStatStream = env.socketTextStream("ip-10-0-0-233", 9000)
    socketVmStatStream.print()
    env.execute()
  }
}
```

Back in the project root, create the build definition:

```shell
cd -
vim build.sbt
```

```scala
name := "flink-streaming-examples"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % "1.0.0",
  "org.apache.flink" %% "flink-clients" % "1.0.0",
  "org.apache.flink" %% "flink-streaming-scala" % "1.0.0"
)
```

Build the jar:

```shell
sbt clean package
```

Stream some test data by piping vmstat into a listening socket:

```shell
vmstat 1 | nc -l 9000
```

Now submit the Flink job:

```shell
bin/flink run /root/flink-streaming-example/target/scala-2.10/flink-streaming-examples_2.10-1.0.jar
```

```
03/20/2016 08:04:12 Job execution switched to status RUNNING.
03/20/2016 08:04:12 Source: Socket Stream -> Sink: Unnamed(1/1) switched to SCHEDULED
03/20/2016 08:04:12 Source: Socket Stream -> Sink: Unnamed(1/1) switched to DEPLOYING
03/20/2016 08:04:12 Source: Socket Stream -> Sink: Unnamed(1/1) switched to RUNNING
```

Let's see the output of the job:

```shell
tail -f flink-1.0.0/log/flink-root-jobmanager-0-ip-10-0-0-233.out
```

You will see the vmstat output scrolling by.
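Printing raw lines is a good smoke test, but the same source can feed real transformations. The sketch below extends the job to pull a single metric out of each vmstat row; the object name `VmStatFreeMemory` and the job name are made up for illustration, and it assumes vmstat's default column layout, where the fourth column is free memory in KB (header lines are filtered out because they contain non-numeric tokens).

```scala
import org.apache.flink.streaming.api.scala._

// Hypothetical follow-on job: extract the "free" memory column from the
// vmstat stream instead of printing whole lines. Host/port match the
// example above and are placeholders for your own environment.
object VmStatFreeMemory {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val lines = env.socketTextStream("ip-10-0-0-233", 9000)

    val freeMem = lines
      .map(_.trim.split("\\s+"))                                  // tokenize each row
      .filter(cols => cols.length > 3 && cols.forall(_.forall(_.isDigit))) // drop vmstat's two header lines
      .map(cols => cols(3).toLong)                                // 4th column = free memory (KB), by assumption

    freeMem.print()
    env.execute("vmstat free-memory monitor")
  }
}
```

Build and submit it exactly like the printing job; the extracted values land in the same `.out` log file.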