Added Kafka Data Store quickstart

Alex Tatusko · Alex Tatusko · commit 8c3435e7c9d7 · 2015-06-18T12:40:11.000-04:00
diff --git a/_config.yml b/_config.yml
@@ -4,8 +4,8 @@ description: Website for GeoMesa
 #url: "http://geomesa.org"
 url: "http://localhost:4000"
 repoUrl: "http://github.com/locationtech/geomesa"
-stableVersion: "1.1.0-rc.1"
-developmentVersion: "1.1.0-rc.2-SNAPSHOT"
+stableVersion: "1.1.0-rc.2"
+developmentVersion: "1.1.0-rc.3-SNAPSHOT"
 
 paginate: 10
 
@@ -34,6 +34,8 @@ authors:
         display_name: Michael Ronquest
     jake:
         display_name: Jake Kenneally
+    atatusko:
+        display_name: Alex Tatusko
     andrew-and-jake:
         display_name: Andrew Annex and Jake Kenneally
     chris-and-aannex:
diff --git a/_posts/2014-04-10-geomesa-quickstart.md b/_posts/2014-04-10-geomesa-quickstart.md
@@ -13,7 +13,7 @@ redirect_from:
 
 1. write custom Java code using GeoMesa to do the following:
     1.  create a custom ```FeatureType```
-    2.  prepare a GeoMesa-managed table to accept your new type
+    2.  prepare a GeoMesa-managed table backed by Accumulo to accept your new type
     3.  create a collection of new records
     4.  write these new records to the GeoMesa-managed table
     5.  query your data
@@ -27,33 +27,26 @@ redirect_from:
 
 #### Other prerequisites
 
-Before you begin, it is assumed that at this point you have successfully completed the [GeoMesa-Deployment](/geomesa-deployment/) tutorial.
-The deployment tutorial provides instructions for building and deploying GeoMesa to Accumulo and GeoServer, and is a prerequisite to the quickstart.  
+You must go through the [GeoMesa Deployment tutorial](http://geomesa.org/geomesa-deployment/) first, completing the tasks relevant to Accumulo.
 
-### DOWNLOAD AND BUILD THE TUTORIAL CODE
+Afterwards, it may be necessary to change the versions of Accumulo and Hadoop that the quickstart tutorial uses.
 
-Pick a reasonable directory on your machine, and run:
+The ```pom.xml``` file in the root geomesa directory contains an explicit list of dependent libraries that will be bundled together into the final tutorial. You should confirm that the versions of Accumulo and Hadoop match what you are running; if they do not match, change the values in the POM. (NB: The only reason these libraries are bundled into the final JAR is that this is easier for most people than setting the classpath when running the tutorial. If you would rather not bundle these dependencies, mark them as provided in the POM, and update your classpath as appropriate.)
 
-```
-git clone https://github.com/geomesa/geomesa-quickstart.git
-```
-
-The ```pom.xml``` file contains an explicit list of dependent libraries that will be bundled together into the final tutorial. You should confirm that the versions of Accumulo and Hadoop match what you are running; if it does not match, change the value in the POM. (NB: The only reason these libraries are bundled into the final JAR is that this is easier for most people to do this than it is to set the classpath when running the tutorial. If you would rather not bundle these dependencies, mark them as provided in the POM, and update your classpath as appropriate.)
-
-From within the root of the cloned tutorial, run:
+Navigate to the directory where GeoMesa was installed and run:
 
 ```
-mvn clean install
+mvn clean install -f geomesa-examples/geomesa-accumulo-quickstart/pom.xml
 ```
 
-When this is complete, it should have built a JAR file that contains all of the code you need to run the tutorial.
+When this is complete, it should have built a JAR file that contains all of the code you need to run the tutorial with the correct dependencies.
 
 ### RUN THE TUTORIAL
 
 On the command-line, run:
 
 {% highlight bash %}
-java -cp ./target/geomesa-quickstart-1.0-SNAPSHOT.jar org.geomesa.QuickStart -instanceId somecloud -zookeepers "zoo1:2181,zoo2:2181,zoo3:2181" -user someuser -password somepwd -tableName sometable
+java -cp ./geomesa-examples/geomesa-accumulo-quickstart/target/geomesa-accumulo-quickstart-{{ site.stableVersion }}.jar org.locationtech.geomesa.examples.AccumuloQuickStart -instanceId somecloud -zookeepers "zoo1:2181,zoo2:2181,zoo3:2181" -user someuser -password somepwd -tableName sometable
 {% endhighlight %}
 
 where you provide your own values for the following place-holder arguments:
@@ -67,7 +60,7 @@ where you provide your own values for the following place-holder arguments:
 You should see output similar to the following (not including some of Maven's output and log4j's warnings):
 
 {% highlight bash %}
-Creating feature-type (schema):  QuickStart
+Creating feature-type (schema):  AccumuloQuickStart
 Creating new features
 Inserting new features
 Submitting query
diff --git a/_posts/2015-03-30-geomesa-feature-level-visibility.md b/_posts/2015-03-30-geomesa-feature-level-visibility.md
@@ -24,7 +24,7 @@ From the Accumulo user guide:
 
 In this tutorial, you'll be guided through ingesting data with varying levels of visibility and querying that data as different users through GeoServer.
 
-## Prerequistes
+## Prerequisites
 
 If you haven't already read through both the [GeoMesa Deployment Tutorial](/geomesa-deployment/) and the [Quickstart tutorial](/geomesa-quickstart/) and
 make sure you have gone through the initial setup of GeoMesa. We'll be using a customized version of the data generated by the GeoMesa Quickstart project.
diff --git a/_posts/2015-05-05-geomesa-deployment.md b/_posts/2015-05-05-geomesa-deployment.md
@@ -11,32 +11,33 @@ redirect_from:
 ### This tutorial will introduce how to:
 
 1. Install GeoMesa Command Line Tools
-2. Deploy the Distributed Runtime Jar to your Accumulo Cluster
-3. Deploy the GeoServer Plugin
+2. Deploy the Distributed Runtime Jar to your Accumulo Cluster.
+3. Deploy the GeoServer Plugin.
+4. Deploy necessary dependencies for GeoMesa's GeoServer plugin for Accumulo and/or Kafka.
 <!--more-->
 
 <div class="callout callout-warning">
     <span class="glyphicon glyphicon-exclamation-sign"></span>
-    You will need access to a Hadoop 2.2 installation as well as an Accumulo 1.5.x database.
+    For Accumulo deployment, you will need access to a Hadoop 2.2 installation as well as an Accumulo 1.5.x database.
 </div>
 
 #### Other prerequisites
 
 Before you begin, you should also have these:
 
-* basic knowledge of GeoTools, GeoServer, and Accumulo
+* basic knowledge of [GeoTools](http://www.geotools.org), [GeoServer](http://geoserver.org), [Accumulo](http://accumulo.apache.org), and/or [Kafka](http://kafka.apache.org)
 * an Accumulo user that has both create-table and write permissions
 * a Java 1.7 or higher runtime 
 
 ### DOWNLOAD GEOMESA
 
-GeoMesa artifacts are available for download or can be build from source. The easiest way to get started is to [download the most recent stable version ({{ site.stableVersion }})](http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-assemble/{{ site.stableVersion }}/geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz) and untar it somewhere convenient:
+GeoMesa artifacts are available for download or can be built from source. The easiest way to get started is to [download the most recent stable version ({{ site.stableVersion }})](http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-assemble/{{ site.stableVersion }}/geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz) and untar it somewhere convenient:
 
 {% highlight bash %}
-# cd to a directory convenient for installing geomesa 
+# cd to a convenient directory for installing geomesa 
 $ cd ~/tools
 
-# download and unpackge the most recent distribution
+# download and unpackage the most recent distribution
 $ wget http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-assemble/{{ site.stableVersion }}/geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz
 $ tar xvf geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz
 $ cd geomesa-{{ site.stableVersion }}
@@ -119,7 +120,7 @@ You should have an instance of GeoServer, version 2.5.2, running somewhere that
 
 In addition to our GeoServer plugin, you will also need to install the WPS plugin to your GeoServer instance. The [WPS Plugin](http://docs.geoserver.org/stable/en/user/extensions/wps/install.html) must also match the version of GeoServer instance.
 
-Copy the the `geomesa-plugin-{{ site.stableVersion }}-geoserver-plugin.jar` jar file from the GeoMesa dist directory into your GeoServer's library directory.
+Copy the `geomesa-plugin-accumulo1.5-{{ site.stableVersion }}-geoserver-plugin.jar` jar file from the GeoMesa dist directory into your GeoServer's library directory.
 
 If you are using tomcat:
 
@@ -165,11 +166,46 @@ There are also GeoServer JARs that need to be updated for Accumulo (also in the
 * commons-configuration: Accumulo requires commons-configuration 1.6 and previous versions should be replaced [[download]](https://search.maven.org/remotecontent?filepath=commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar)
 * commons-lang: GeoServer ships with commons-lang 2.1, but Accumulo requires replacing that with version 2.4 [[download]](https://search.maven.org/remotecontent?filepath=commons-lang/commons-lang/2.4/commons-lang-2.4.jar)
 
-Once all of the dependencies for the GeoServer plugin are in place you will need to restart GeoServer for the changes to take effect.
+Once all of the dependencies for the GeoServer plugin are in place you will need to restart GeoServer for the changes to take effect. 
 
 #### Verify Deployment
 
-To verify that the deployment worked you can follow the [Quickstart tutorial](/geomesa-quickstart/) to ingest test data and view the data in GeoServer.  
+To verify that the deployment worked you can follow the [GeoMesa Quick Start tutorial](/geomesa-quickstart/) to ingest test data and view the data in GeoServer.  
+
+### KAFKA DEPLOYMENT
+
+Getting GeoMesa set up with Kafka is a bit easier than with Accumulo (see the [Kafka Quickstart tutorial](/geomesa-kafka-quickstart/) to see what GeoMesa can do with Kafka). First build GeoMesa. GeoMesa's Kafka support was added recently, so be sure to build the latest branch.
+
+{% highlight bash %}
+git clone https://github.com/locationtech/geomesa/ && cd geomesa && mvn clean install -DskipTests
+{% endhighlight %}
+
+Copy the GeoMesa Kafka plugin jar files from the GeoMesa directory you built into your GeoServer's library directory.
+
+Tomcat:
+{% highlight bash %}
+cp geomesa/geomesa-kafka/geomesa-kafka-geoserver-plugin/target/geomesa-kafka-geoserver-plugin-{{ site.developmentVersion }}-geoserver-plugin.jar /path/to/tomcat/webapps/geoserver/WEB-INF/lib/
+{% endhighlight %}
+
+Jetty:
+
+{% highlight bash %}
+cp geomesa/geomesa-kafka/geomesa-kafka-geoserver-plugin/target/geomesa-kafka* ~/dev/geoserver-2.5.2/webapps/geoserver/WEB-INF/lib/
+{% endhighlight %}
+
+Then copy these dependencies to your `WEB-INF/lib` directory.
+
+* Kafka
+    * kafka-clients-0.8.2.1.jar
+    * kafka_2.10-0.8.2.1.jar
+    * metrics-core-2.2.0.jar
+    * zkclient-0.3.jar
+* Zookeeper
+    * zookeeper-3.4.5.jar
+
+Note: when using the Kafka Data Store with GeoServer in Tomcat, you will most likely need to increase Tomcat's memory settings, for example `export CATALINA_OPTS="-Xms512M -Xmx1024M -XX:PermSize=256m -XX:MaxPermSize=256m"`.
+
+After placing the dependencies in the correct folder, be sure to restart GeoServer for the changes to take effect.
 
 ### Configuring Geoserver
 Depending on your hardware, it may be important to set the limits for your WMS plugin to be higher or disable them completely by clicking "WMS" under "Services" on the left side of the admin page of Geoserver. Check with your server administrator to determine the correct settings. For massive queries, the standard 60 second timeout may be too short.
diff --git a/_posts/2015-06-09-geomesa-kafka-quickstart.md b/_posts/2015-06-09-geomesa-kafka-quickstart.md
@@ -0,0 +1,128 @@
+---
+title: GeoMesa Kafka Quick Start
+author: atatusko
+layout: tutorial
+redirect_from:
+    - /2015/06/09/geomesa-kafka-quickstart/
+---
+
+{% include tutorial-header.html %}
+
+## Background
+
+[Apache Kafka](http://kafka.apache.org/) is "publish-subscribe messaging rethought as a distributed commit log." In the context of GeoMesa, Kafka is a useful tool for working with streams of geospatial data. GeoMesa interacts with Kafka through the KafkaDataStore, which implements the [GeoTools DataStore interface](http://docs.geotools.org/latest/userguide/library/data/datastore.html).
+
+This quickstart tutorial is bundled as a Java program which will introduce how to produce and consume messages in Kafka using GeoMesa. The tutorial will also show how to query the data and replay the messages in a Kafka topic to achieve an earlier state. The tutorial uses GeoServer as a quick way to visualize the changes being made in Kafka.
+
+## Prerequisites
+
+* basic knowledge of [GeoTools](http://www.geotools.org), [GeoServer](http://geoserver.org), and Kafka
+* access to a Kafka 0.8.2.x server with one or more appropriate Zookeeper instances
+* access to GeoServer version 2.5.2
+* a local copy of the Java Development Kit 1.7.x
+* Apache Maven installed
+* a Git client installed
+
+## Setup
+
+You must go through the [GeoMesa Deployment tutorial](http://geomesa.org/geomesa-deployment/) first, completing the tasks relevant to Kafka. The deployment tutorial covers Accumulo first, so scroll down to the Kafka deployment section.
+
+## Run the code
+
+Ensure your Kafka and Zookeeper instances are running. You can use [Kafka's quickstart](http://kafka.apache.org/documentation.html#quickstart) to get Kafka/Zookeeper instances up and running quickly.
+
+Navigate to your geomesa directory. On the command-line run the quickstart program:
+
+{% highlight bash %}
+java -cp ./geomesa-examples/geomesa-kafka-quickstart/target/geomesa-kafka-quickstart-{{ site.stableVersion }}.jar org.locationtech.geomesa.examples.KafkaQuickStart -brokers "localhost:9092" -zookeepers "localhost:2181"
+{% endhighlight %}
+
+where you provide your own values for the following arguments:
+
+* ```brokers```: your Kafka broker instances, separated by commas.
+* ```zookeepers```: your Zookeeper nodes, separated by commas.
+* ```zkPath```: Zookeeper's path where metadata is stored. Defaults to /geomesa/ds/kafka.
+
+The program will create some metadata in Zookeeper and an associated topic in your Kafka instance, then pause execution to let you add the newly created KafkaDataStore to GeoServer.
+
+## Visualize with GeoServer
+
+#### Register the GeoMesa store with GeoServer
+
+Log into GeoServer using your user and password credentials. Click “Stores” in the left-hand gutter and “Add new Store”. If you do not see the Kafka Data Store listed under Vector Data Sources, ensure the plugin and dependencies are in the right directory and restart GeoServer.
+
+Select the `Kafka Data Store` vector data source and enter the following parameters:
+
+* basic store info
+    * `workspace`:  this is dependent upon your GeoServer installation
+    * `data source name`:  pick a sensible name, such as `geomesa_kafka_quickstart`
+    * `description`:  this is strictly decorative, e.g. `GeoMesa Kafka quick start`
+* connection parameters:  these are the same parameter values that you supplied on the command-line when you ran the tutorial; they describe how to connect to the Kafka instance where your data resides
+
+Note: If you left out the zkPath command line argument when running the quickstart program, you can leave the zkPath connection parameter in GeoServer empty. 
+
+!["Inputting parameters into geoserver"](/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore1.png)
+
+Click "Save" and GeoServer will search your Kafka instance for any GeoMesa-managed feature types.
+
+#### Publish the layer
+
+GeoServer should recognize the `KafkaQuickStart` feature type and present it as a layer that can be published. Click on the "Publish" link. You will be taken to the Edit Layer screen. 
+
+In the Data pane, you'll need to enter values for the bounding boxes. In this case, you can click on the links to compute these values from the data. Click "Save".
+
+#### View the layer
+
+Click on the "Layer Preview" link in the left-hand gutter.  If you don't see the quick-start layer on the first page of results, enter the name of the layer you just created into the search box, and press &lt;Enter&gt;.
+ 
+Once you see your layer, click on the "OpenLayers" link, which will open a new tab. At this point, there are no messages in Kafka so nothing will be shown.
+
+## Produce some SimpleFeatures
+
+Now that the KafkaDataStore is registered in GeoServer, resume the program's execution by pressing &lt;Enter&gt; in the terminal. The program will create two SimpleFeatures and then write a stream of updates to them over the course of about a minute. 
+
+You should refresh the GeoServer page repeatedly to visualize the updates being written to Kafka.
+
+#### What's happening in GeoServer
+
+The layer preview of GeoServer uses the LiveKafkaConsumerFeatureSource to show a real time view of the current state of the data stream. Two SimpleFeatures are being updated over time in Kafka, which is reflected in the GeoServer display. 
+
+As you refresh the page, you should see two SimpleFeatures start on the left side and gradually move to the right side, crossing each other in the middle. As the two SimpleFeatures get updated, the older SimpleFeatures disappear from the display.
+
+!["GeoServer view"](/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore2.png)
+
+#### Consumers explained
+
+GeoMesa wraps Kafka consumers in two different ways: as a LiveKafkaConsumerFeatureSource or as a ReplayKafkaConsumerFeatureSource (both of which implement GeoTools' [FeatureSource](http://docs.geotools.org/latest/javadocs/org/geotools/data/FeatureSource.html) API).
+
+The LiveKafkaConsumerFeatureSource will consume messages as they are being produced and maintain the real time state of SimpleFeatures pertaining to a Kafka topic. 
+
+The ReplayKafkaConsumerFeatureSource allows users to specify any range of time in order to obtain the state of SimpleFeatures from any previous moment.
+
+## View the consumer output
+
+Once all the messages have been sent to Kafka, and therefore all the updates have been made, the program will construct the live and replay consumers and log SimpleFeatures to the console.
+
+The live consumer will log the state of the two SimpleFeatures after all updates are finished. The replay consumer will log the state of the two SimpleFeatures five seconds earlier than the last update. The replay consumer will create a new SimpleFeatureType with an additional attribute `KafkaLogTime`. By preserving the `KafkaLogTime` as an attribute, we can recreate the state of SimpleFeatures at time *x* by querying for when `KafkaLogTime` equals *x*.
+
+{% highlight bash %}
+Consuming with the live consumer...
+2 features were written to Kafka
+Here are the two SimpleFeatures that were obtained with the live consumer:
+fid:1 | name:James | age:20 | dtg:Mon Dec 14 19:08:23 EST 2015 | geom:POINT (180 90)
+fid:2 | name:John | age:62 | dtg:Fri Oct 02 09:56:49 EDT 2015 | geom:POINT (180 -90)
+
+Consuming with the replay consumer...
+2 features were written to Kafka
+Here are the two SimpleFeatures that were obtained with the replay consumer:
+fid:2 | name:John | age:52 | dtg:Thu May 21 21:27:19 EDT 2015 | geom:POINT (132 -66) | KafkaLogTime:Tue Jun 09 13:33:47 EDT 2015
+fid:1 | name:James | age:59 | dtg:Sat Jan 24 06:26:44 EST 2015 | geom:POINT (132 66) | KafkaLogTime:Tue Jun 09 13:33:47 EDT 2015
+{% endhighlight %}
+
+## Conclusion
+
+The source code for this quickstart is available, so we recommend following along in the code to get a deeper understanding of what is going on.
+
+Given a stream of geospatial data, GeoMesa's integration with Kafka enables users to maintain a real time state of SimpleFeatures or retrieve any arbitrary state preserved in history. One can additionally process and analyze streams of data by integrating a data processing system like [Storm](https://storm.apache.org/) or [Samza](http://samza.apache.org).
+
+For additional information about the KafkaDataStore, see the [readme](https://github.com/locationtech/geomesa/blob/master/geomesa-kafka/geomesa-kafka-datastore/README.md) on GitHub.
diff --git a/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore1.png b/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore1.png
diff --git a/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore2.png b/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore2.png