Saturday, August 8, 2015

Apache Drill : Creating Simple UDF

In this example, I will demonstrate how to create a simple custom UDF in Apache Drill.
Env: Apache Drill 1.1.0
Java: JDK 1.7.0_75

The objective of the function is to take a string value from a row as input and return the length of that string. In this example, we pass the employee's full name as input and return the length of the name.
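As a rough illustration of what the function is meant to compute, here is a plain-Java sketch that runs outside Drill. The class and method names (EmpNameLengthSketch, empnlen, utf8ByteLength) are hypothetical and are not part of Drill. The sketch also shows that String.length() counts characters (UTF-16 code units), not UTF-8 bytes, so a name with accented letters is counted by characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical plain-Java sketch of the empnlen logic, outside Drill.
class EmpNameLengthSketch {

    // Returns the length as a decimal string, because the UDF outputs a VARCHAR.
    // A null input yields null, similar to Drill's NULL_IF_NULL handling.
    static String empnlen(String fullName) {
        if (fullName == null) {
            return null;
        }
        // String.length() counts UTF-16 code units, not bytes.
        return String.valueOf(fullName.length());
    }

    // For comparison: how many bytes the same value occupies in UTF-8.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(empnlen("Jane Doe"));                       // 8
        System.out.println(empnlen("Jos\u00e9 Garc\u00eda"));          // 11
        System.out.println(utf8ByteLength("Jos\u00e9 Garc\u00eda"));   // 13
    }
}
```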

Step 1: Create a simple Java project using Maven and add the Drill 1.1.0 dependency to pom.xml. The maven-source-plugin also builds a sources JAR, which Drill needs alongside the compiled classes.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mapr.simplefn.test</groupId>
  <artifactId>my-simple-drill-fn</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>my-simple-drill-fn</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.drill.exec</groupId>
      <artifactId>drill-java-exec</artifactId>
      <version>1.1.0</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <version>2.4</version>
        <executions>
          <execution>
            <id>attach-sources</id>
            <phase>package</phase>
            <goals>
              <goal>jar-no-fork</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Step 2: Create a Java class named EmpNameLength that implements the DrillSimpleFunc interface.

package com.mapr.simplefn.test;

import javax.inject.Inject;

import io.netty.buffer.DrillBuf;

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

// Registers the function under the SQL name "empnlen".
// With NULL_IF_NULL, the result is NULL whenever the input is NULL.
@FunctionTemplate(name = "empnlen",
        scope = FunctionTemplate.FunctionScope.SIMPLE,
        nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class EmpNameLength implements DrillSimpleFunc {

    // Input value (e.g. the full_name column).
    @Param
    NullableVarCharHolder input;

    // Output value: the length, returned as a VARCHAR.
    @Output
    VarCharHolder out;

    // Buffer injected by Drill, used to hold the output bytes.
    @Inject
    DrillBuf buffer;

    @Override
    public void setup() {
        // No per-query initialization is needed.
    }

    @Override
    public void eval() {
        // Drill copies this body into generated code, so classes outside
        // java.lang must be referenced by their fully qualified names.
        String stringValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
                .toStringFromUTF8(input.start, input.end, input.buffer);
        String outputValue = String.valueOf(stringValue.length());
        byte[] outBytes = outputValue.getBytes(java.nio.charset.StandardCharsets.UTF_8);

        // Point the output holder at bytes [0, outBytes.length) of the buffer.
        out.buffer = buffer;
        out.start = 0;
        out.end = outBytes.length;
        buffer.setBytes(0, outBytes);
    }
}
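To make the start/end bookkeeping in eval() easier to follow, here is a plain-Java sketch that mimics it with an ordinary byte array. FakeVarCharHolder and HolderContractSketch are hypothetical stand-ins, not Drill classes; the assumption is only that a VarCharHolder describes a value as a byte range [start, end) inside a buffer, which is how the UDF above reads its input and writes its output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Drill's VarCharHolder: a buffer plus [start, end) offsets.
class FakeVarCharHolder {
    byte[] buffer;
    int start;
    int end;
}

class HolderContractSketch {

    // Mirrors EmpNameLength.eval(): decode the input byte range, compute the
    // length, and write its decimal form into the scratch buffer.
    static void eval(FakeVarCharHolder input, FakeVarCharHolder out, byte[] scratch) {
        String value = new String(input.buffer, input.start, input.end - input.start,
                StandardCharsets.UTF_8);
        byte[] bytes = String.valueOf(value.length()).getBytes(StandardCharsets.UTF_8);
        System.arraycopy(bytes, 0, scratch, 0, bytes.length);
        out.buffer = scratch;
        out.start = 0;
        out.end = bytes.length; // end is exclusive
    }

    // Reads a holder's value back as a String.
    static String read(FakeVarCharHolder h) {
        return new String(h.buffer, h.start, h.end - h.start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two values packed into one buffer; the input holder selects the second.
        byte[] data = "Jane DoeJohn Smith".getBytes(StandardCharsets.UTF_8);
        FakeVarCharHolder in = new FakeVarCharHolder();
        in.buffer = data;
        in.start = 8;  // "John Smith" begins at byte 8
        in.end = 18;
        FakeVarCharHolder out = new FakeVarCharHolder();
        eval(in, out, new byte[256]);
        System.out.println(read(out)); // 10
    }
}
```

Only the bytes between start and end belong to the value, which is why the UDF sets out.end to the length of the bytes it writes rather than to the size of the buffer.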

Step 3: Add an empty drill-override.conf file to the resources folder (src/main/resources) of the project.
Step 4: Run mvn package, which builds the JARs in the target folder. Copy both JARs (the class JAR and the -sources JAR) to DRILL_HOME/jars/3rdparty/ on each node of the Drill cluster.
Step 5: Restart the Drillbits and run the query as follows:

Expected Result:
0: jdbc:drill:zk=local> select empnlen(full_name) from cp.`employee.json` limit 5;
+---------+
| EXPR$0  |
+---------+
| 12      |
| 15      |
| 14      |
| 14      |
| 15      |
+---------+
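If the function is not found after restarting, a common cause is a missing JAR. The following is a hypothetical helper (UdfJarCheck is not part of Drill) that checks whether both the class JAR and the -sources JAR are present in DRILL_HOME/jars/3rdparty/. It assumes the default Maven file names for the artifactId and version in the pom.xml above, and that DRILL_HOME is set in the environment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-restart check; not part of Drill.
class UdfJarCheck {

    // True if both <base>.jar and <base>-sources.jar exist in dir.
    static boolean bothJarsPresent(Path dir, String base) {
        return Files.isRegularFile(dir.resolve(base + ".jar"))
                && Files.isRegularFile(dir.resolve(base + "-sources.jar"));
    }

    public static void main(String[] args) {
        // Assumes DRILL_HOME points at the Drill installation on this node.
        String drillHome = System.getenv("DRILL_HOME");
        if (drillHome == null) {
            System.out.println("DRILL_HOME is not set");
            return;
        }
        Path dir = Paths.get(drillHome, "jars", "3rdparty");
        System.out.println(bothJarsPresent(dir, "my-simple-drill-fn-1.0-SNAPSHOT"));
    }
}
```

Run it on each node, since every Drillbit loads UDFs from its own local jars folder.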
