Commit 8c3435e

author
Alex Tatusko
committed
Added Kafka Data Store quickstart
1 parent c59de99 commit 8c3435e

7 files changed

+188
-29
lines changed

_config.yml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,8 +4,8 @@ description: Website for GeoMesa
44
#url: "http://geomesa.org"
55
url: "http://localhost:4000"
66
repoUrl: "http://github.com/locationtech/geomesa"
7-
stableVersion: "1.1.0-rc.1"
8-
developmentVersion: "1.1.0-rc.2-SNAPSHOT"
7+
stableVersion: "1.1.0-rc.2"
8+
developmentVersion: "1.1.0-rc.3-SNAPSHOT"
99

1010
paginate: 10
1111

@@ -34,6 +34,8 @@ authors:
3434
display_name: Michael Ronquest
3535
jake:
3636
display_name: Jake Kenneally
37+
atatusko:
38+
display_name: Alex Tatusko
3739
andrew-and-jake:
3840
display_name: Andrew Annex and Jake Kenneally
3941
chris-and-aannex:

_posts/2014-04-10-geomesa-quickstart.md

Lines changed: 9 additions & 16 deletions
@@ -13,7 +13,7 @@ redirect_from:
1313

1414
1. write custom Java code using GeoMesa to do the following:
1515
1. create a custom ```FeatureType```
16-
2. prepare a GeoMesa-managed table to accept your new type
16+
2. prepare a GeoMesa-managed table backed by Accumulo to accept your new type
1717
3. create a collection of new records
1818
4. write these new records to the GeoMesa-managed table
1919
5. query your data
@@ -27,33 +27,26 @@ redirect_from:
2727

2828
#### Other prerequisites
2929

30-
Before you begin, it is assumed that at this point you have successfully completed the [GeoMesa-Deployment](/geomesa-deployment/) tutorial.
31-
The deployment tutorial provides instructions for building and deploying GeoMesa to Accumulo and GeoServer, and is a prerequisite to the quickstart.
30+
You must go through the [GeoMesa Deployment tutorial](http://geomesa.org/geomesa-deployment/) first, completing the tasks relevant to Accumulo.
3231

33-
### DOWNLOAD AND BUILD THE TUTORIAL CODE
32+
Afterwards, it may be necessary to change the versions of Accumulo and Hadoop that the quickstart tutorial uses.
3433

35-
Pick a reasonable directory on your machine, and run:
34+
The ```pom.xml``` file in the root geomesa directory contains an explicit list of dependent libraries that will be bundled together into the final tutorial. You should confirm that the versions of Accumulo and Hadoop match what you are running; if they do not match, change the values in the POM. (NB: The only reason these libraries are bundled into the final JAR is that this is easier for most people than setting the classpath when running the tutorial. If you would rather not bundle these dependencies, mark them as provided in the POM, and update your classpath as appropriate.)
3635

37-
```
38-
git clone https://github.com/geomesa/geomesa-quickstart.git
39-
```
40-
41-
The ```pom.xml``` file contains an explicit list of dependent libraries that will be bundled together into the final tutorial. You should confirm that the versions of Accumulo and Hadoop match what you are running; if it does not match, change the value in the POM. (NB: The only reason these libraries are bundled into the final JAR is that this is easier for most people to do this than it is to set the classpath when running the tutorial. If you would rather not bundle these dependencies, mark them as provided in the POM, and update your classpath as appropriate.)
42-
43-
From within the root of the cloned tutorial, run:
36+
Navigate to the directory where GeoMesa was installed and run:
4437

4538
```
46-
mvn clean install
39+
mvn clean install -f geomesa-examples/geomesa-accumulo-quickstart/pom.xml
4740
```
4841

49-
When this is complete, it should have built a JAR file that contains all of the code you need to run the tutorial.
42+
When this is complete, it should have built a JAR file that contains all of the code you need to run the tutorial with the correct dependencies.
5043

5144
### RUN THE TUTORIAL
5245

5346
On the command-line, run:
5447

5548
{% highlight bash %}
56-
java -cp ./target/geomesa-quickstart-1.0-SNAPSHOT.jar org.geomesa.QuickStart -instanceId somecloud -zookeepers "zoo1:2181,zoo2:2181,zoo3:2181" -user someuser -password somepwd -tableName sometable
49+
java -cp ./geomesa-examples/geomesa-accumulo-quickstart/target/geomesa-accumulo-quickstart-{{ site.stableVersion }}.jar org.locationtech.geomesa.examples.AccumuloQuickStart -instanceId somecloud -zookeepers "zoo1:2181,zoo2:2181,zoo3:2181" -user someuser -password somepwd -tableName sometable
5750
{% endhighlight %}
5851

5952
where you provide your own values for the following place-holder arguments:
@@ -67,7 +60,7 @@ where you provide your own values for the following place-holder arguments:
6760
You should see output similar to the following (not including some of Maven's output and log4j's warnings):
6861

6962
{% highlight bash %}
70-
Creating feature-type (schema): QuickStart
63+
Creating feature-type (schema): AccumuloQuickStart
7164
Creating new features
7265
Inserting new features
7366
Submitting query

_posts/2015-03-30-geomesa-feature-level-visibility.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ From the Accumulo user guide:
2424
2525
In this tutorial, you'll be guided through ingesting data with varying levels of visibility and querying that data as different users through GeoServer.
2626

27-
## Prerequistes
27+
## Prerequisites
2828

2929
If you haven't already, read through both the [GeoMesa Deployment Tutorial](/geomesa-deployment/) and the [Quickstart tutorial](/geomesa-quickstart/), and
3030
make sure you have gone through the initial setup of GeoMesa. We'll be using a customized version of the data generated by the GeoMesa Quickstart project.

_posts/2015-05-05-geomesa-deployment.md

Lines changed: 46 additions & 10 deletions
@@ -11,32 +11,33 @@ redirect_from:
1111
### This tutorial will introduce how to:
1212

1313
1. Install GeoMesa Command Line Tools
14-
2. Deploy the Distributed Runtime Jar to your Accumulo Cluster
15-
3. Deploy the GeoServer Plugin
14+
2. Deploy the Distributed Runtime Jar to your Accumulo Cluster.
15+
3. Deploy the GeoServer Plugin.
16+
4. Deploy necessary dependencies for GeoMesa's GeoServer plugin for Accumulo and/or Kafka.
1617
<!--more-->
1718

1819
<div class="callout callout-warning">
1920
<span class="glyphicon glyphicon-exclamation-sign"></span>
20-
You will need access to a Hadoop 2.2 installation as well as an Accumulo 1.5.x database.
21+
For Accumulo deployment, you will need access to a Hadoop 2.2 installation as well as an Accumulo 1.5.x database.
2122
</div>
2223

2324
#### Other prerequisites
2425

2526
Before you begin, you should also have these:
2627

27-
* basic knowledge of GeoTools, GeoServer, and Accumulo
28+
* basic knowledge of [GeoTools](http://www.geotools.org), [GeoServer](http://geoserver.org), [Accumulo](http://accumulo.apache.org), and/or [Kafka](http://kafka.apache.org)
2829
* an Accumulo user that has both create-table and write permissions
2930
* a Java 1.7 or higher runtime
3031

3132
### DOWNLOAD GEOMESA
3233

33-
GeoMesa artifacts are available for download or can be build from source. The easiest way to get started is to [download the most recent stable version ({{ site.stableVersion }})](http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-assemble/{{ site.stableVersion }}/geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz) and untar it somewhere convenient:
34+
GeoMesa artifacts are available for download or can be built from source. The easiest way to get started is to [download the most recent stable version ({{ site.stableVersion }})](http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-assemble/{{ site.stableVersion }}/geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz) and untar it somewhere convenient:
3435

3536
{% highlight bash %}
36-
# cd to a directory convenient for installing geomesa
37+
# cd to a convenient directory for installing geomesa
3738
$ cd ~/tools
3839

39-
# download and unpackge the most recent distribution
40+
# download and unpackage the most recent distribution
4041
$ wget http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-assemble/{{ site.stableVersion }}/geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz
4142
$ tar xvf geomesa-assemble-{{ site.stableVersion }}-bin.tar.gz
4243
$ cd geomesa-{{ site.stableVersion }}
@@ -119,7 +120,7 @@ You should have an instance of GeoServer, version 2.5.2, running somewhere that
119120

120121
In addition to our GeoServer plugin, you will also need to install the WPS plugin to your GeoServer instance. The [WPS Plugin](http://docs.geoserver.org/stable/en/user/extensions/wps/install.html) must also match the version of GeoServer instance.
121122

122-
Copy the the `geomesa-plugin-{{ site.stableVersion }}-geoserver-plugin.jar` jar file from the GeoMesa dist directory into your GeoServer's library directory.
123+
Copy the `geomesa-plugin-accumulo1.5-{{ site.stableVersion }}-geoserver-plugin.jar` jar file from the GeoMesa dist directory into your GeoServer's library directory.
123124

124125
If you are using tomcat:
125126

@@ -165,11 +166,46 @@ There are also GeoServer JARs that need to be updated for Accumulo (also in the
165166
* commons-configuration: Accumulo requires commons-configuration 1.6 and previous versions should be replaced [[download]](https://search.maven.org/remotecontent?filepath=commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar)
166167
* commons-lang: GeoServer ships with commons-lang 2.1, but Accumulo requires replacing that with version 2.4 [[download]](https://search.maven.org/remotecontent?filepath=commons-lang/commons-lang/2.4/commons-lang-2.4.jar)
167168

168-
Once all of the dependencies for the GeoServer plugin are in place you will need to restart GeoServer for the changes to take effect.
169+
Once all of the dependencies for the GeoServer plugin are in place you will need to restart GeoServer for the changes to take effect.
169170

170171
#### Verify Deployment
171172

172-
To verify that the deployment worked you can follow the [Quickstart tutorial](/geomesa-quickstart/) to ingest test data and view the data in GeoServer.
173+
To verify that the deployment worked you can follow the [GeoMesa Quick Start tutorial](/geomesa-quickstart/) to ingest test data and view the data in GeoServer.
174+
175+
### KAFKA DEPLOYMENT
176+
177+
Setting up GeoMesa with Kafka is a bit easier than with Accumulo (see the [Kafka Quickstart tutorial](/geomesa-kafka-quickstart/) for what GeoMesa can do with Kafka). First, build GeoMesa. Kafka support was added recently, so be sure to build the latest branch.
178+
179+
{% highlight bash %}
180+
git clone https://github.com/locationtech/geomesa/ && cd geomesa && mvn clean install -DskipTests
181+
{% endhighlight %}
182+
183+
Copy the GeoMesa Kafka plugin jar files from the GeoMesa directory you built into your GeoServer's library directory.
184+
185+
Tomcat:
186+
{% highlight bash %}
187+
cp geomesa/geomesa-kafka/geomesa-kafka-geoserver-plugin/target/geomesa-kafka-geoserver-plugin-{{ site.developmentVersion }}-geoserver-plugin.jar /path/to/tomcat/webapps/geoserver/WEB-INF/lib/
188+
{% endhighlight %}
189+
190+
Jetty:
191+
192+
{% highlight bash %}
193+
cp geomesa/geomesa-kafka/geomesa-kafka-geoserver-plugin/target/geomesa-kafka* ~/dev/geoserver-2.5.2/webapps/geoserver/WEB-INF/lib/
194+
{% endhighlight %}
195+
196+
Then copy these dependencies to your `WEB-INF/lib` directory.
197+
198+
* Kafka
199+
* kafka-clients-0.8.2.1.jar
200+
* kafka_2.10-0.8.2.1.jar
201+
* metrics-core-2.2.0.jar
202+
* zkclient-0.3.jar
203+
* Zookeeper
204+
* zookeeper-3.4.5.jar
205+
206+
Note: when using the Kafka Data Store with GeoServer in Tomcat, you will most likely need to increase Tomcat's memory settings, for example `export CATALINA_OPTS="-Xms512M -Xmx1024M -XX:PermSize=256m -XX:MaxPermSize=256m"`.
207+
208+
After placing the dependencies in the correct folder, be sure to restart GeoServer for the changes to take effect.
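
The dependency list above can be checked mechanically before restarting GeoServer. The following is a minimal sketch and is not part of GeoMesa: `check_kafka_deps` is a hypothetical helper name, and its argument is whatever `WEB-INF/lib` path your installation uses. It reports any of the five jars named above that are missing from that directory.

```bash
# Hypothetical helper (not shipped with GeoMesa): report which of the Kafka
# and Zookeeper jars listed in this tutorial are missing from a lib directory.
check_kafka_deps() {
  local lib_dir="$1"
  local missing=0
  local jar
  for jar in \
    kafka-clients-0.8.2.1.jar \
    kafka_2.10-0.8.2.1.jar \
    metrics-core-2.2.0.jar \
    zkclient-0.3.jar \
    zookeeper-3.4.5.jar
  do
    if [ ! -f "$lib_dir/$jar" ]; then
      echo "missing: $jar"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every jar was found.
  [ "$missing" -eq 0 ]
}
```

For a Tomcat install this might be called as `check_kafka_deps /path/to/tomcat/webapps/geoserver/WEB-INF/lib`; a non-zero exit status means at least one jar still needs to be copied.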
173209

174210
### Configuring Geoserver
175211
Depending on your hardware, it may be important to set the limits for your WMS plugin to be higher or disable them completely by clicking "WMS" under "Services" on the left side of the admin page of Geoserver. Check with your server administrator to determine the correct settings. For massive queries, the standard 60 second timeout may be too short.
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
---
2+
title: GeoMesa Kafka Quick Start
3+
author: atatusko
4+
layout: tutorial
5+
redirect_from:
6+
- /2015/06/09/geomesa-kafka-quickstart/
7+
---
8+
9+
{% include tutorial-header.html %}
10+
11+
## Background
12+
13+
[Apache Kafka](http://kafka.apache.org/) is "publish-subscribe messaging rethought as a distributed commit log." In the context of GeoMesa, Kafka is a useful tool for working with streams of geospatial data. Interaction with Kafka in GeoMesa occurs with the KafkaDataStore which implements the [GeoTools DataStore interface](http://docs.geotools.org/latest/userguide/library/data/datastore.html).
14+
15+
This quickstart tutorial is bundled as a Java program which will introduce how to produce and consume messages in Kafka using GeoMesa. The tutorial will also show how to query the data and replay the messages in a Kafka topic to achieve an earlier state. The tutorial uses GeoServer as a quick way to visualize the changes being made in Kafka.
16+
17+
## Prerequisites
18+
19+
* basic knowledge of [GeoTools](http://www.geotools.org), [GeoServer](http://geoserver.org), and Kafka
20+
* access to a Kafka 0.8.2.x server with an appropriate Zookeeper instance(s)
21+
* access to GeoServer version 2.5.2
22+
* a local copy of the Java Development Kit 1.7.x
23+
* Apache Maven installed
24+
* a GitHub client installed
25+
26+
## Setup
27+
28+
You must go through the [GeoMesa Deployment tutorial](http://geomesa.org/geomesa-deployment/) first, completing the tasks relevant to Kafka. The deployment tutorial prioritizes Accumulo deployment so scroll down for the Kafka deployment section.
29+
30+
## Run the code
31+
32+
Ensure your Kafka and Zookeeper instances are running. You can use [Kafka's quickstart](http://kafka.apache.org/documentation.html#quickstart) to get Kafka/Zookeeper instances up and running quickly.
33+
34+
Navigate to your geomesa directory. On the command-line run the quickstart program:
35+
36+
{% highlight bash %}
37+
java -cp ./geomesa-examples/geomesa-kafka-quickstart/target/geomesa-kafka-quickstart-{{ site.stableVersion }}.jar org.locationtech.geomesa.examples.KafkaQuickStart -brokers "localhost:9092" -zookeepers "localhost:2181"
38+
{% endhighlight %}
39+
40+
where you provide your own values for the following arguments:
41+
42+
* ```brokers```: your Kafka broker instances, separated by commas.
43+
* ```zookeepers```: your Zookeeper nodes, separated by commas.
44+
* ```zkPath```: Zookeeper's path where metadata is stored. Defaults to /geomesa/ds/kafka.
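
How the optional argument composes with the required ones can be sketched as a small shell function. This is an illustration under stated assumptions, not part of the quickstart: `kafka_quickstart_cmd` is a hypothetical helper, the version is passed in explicitly, and the `-zkPath` flag spelling is assumed by analogy with `-brokers` and `-zookeepers`. The function only prints the command; it does not run it.

```bash
# Hypothetical helper: print (not run) the Kafka quickstart command line.
# Arguments: <version> <brokers> <zookeepers> [zkPath]
kafka_quickstart_cmd() {
  local version="$1"
  local brokers="$2"
  local zookeepers="$3"
  local zk_path="${4:-}"
  # Jar location as given in the tutorial, relative to the geomesa directory.
  local jar="./geomesa-examples/geomesa-kafka-quickstart/target/geomesa-kafka-quickstart-${version}.jar"

  printf 'java -cp %s org.locationtech.geomesa.examples.KafkaQuickStart -brokers "%s" -zookeepers "%s"' \
    "$jar" "$brokers" "$zookeepers"
  # Append zkPath only when given; otherwise the program's default applies.
  # The flag name -zkPath is an assumption, see the note above.
  if [ -n "$zk_path" ]; then
    printf ' -zkPath "%s"' "$zk_path"
  fi
  printf '\n'
}
```

Leaving out the fourth argument produces the same command shown in the tutorial, which relies on the default path of /geomesa/ds/kafka.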
45+
46+
The program will create some metadata in Zookeeper and an associated topic in your Kafka instance, then pause execution to let you add the newly created KafkaDataStore to GeoServer.
47+
48+
## Visualize with GeoServer
49+
50+
#### Register the GeoMesa store with GeoServer
51+
52+
Log into GeoServer using your user and password credentials. Click “Stores” in the left-hand gutter and “Add new Store”. If you do not see the Kafka Data Store listed under Vector Data Sources, ensure the plugin and dependencies are in the right directory and restart GeoServer.
53+
54+
Select the `Kafka Data Store` vector data source and enter the following parameters:
55+
56+
* basic store info
57+
* `workspace`: this is dependent upon your GeoServer installation
58+
* `data source name`: pick a sensible name, such as `geomesa_kafka_quickstart`
59+
* `description`: this is strictly decorative; `GeoMesa Kafka quick start`
60+
* connection parameters: these are the same parameter values that you supplied on the command-line when you ran the tutorial; they describe how to connect to the Kafka instance where your data resides
61+
62+
Note: If you left out the zkPath command line argument when running the quickstart program, you can leave the zkPath connection parameter in GeoServer empty.
63+
64+
!["Inputting parameters into geoserver"](/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore1.png)
65+
66+
Click "Save" and GeoServer will search your Kafka instance for any GeoMesa-managed feature types.
67+
68+
#### Publish the layer
69+
70+
GeoServer should recognize the `KafkaQuickStart` feature type and present it as a layer that can be published. Click on the "Publish" link. You will be taken to the Edit Layer screen.
71+
72+
In the Data pane, you'll need to enter values for the bounding boxes. In this case, you can click on the links to compute these values from the data. Click "Save".
73+
74+
#### View the layer
75+
76+
Click on the "Layer Preview" link in the left-hand gutter. If you don't see the quick-start layer on the first page of results, enter the name of the layer you just created into the search box, and press &lt;Enter&gt;.
77+
78+
Once you see your layer, click on the "OpenLayers" link, which will open a new tab. At this point, there are no messages in Kafka so nothing will be shown.
79+
80+
## Produce some SimpleFeatures
81+
82+
Resume the program's execution by inputting &lt;Enter&gt; in the terminal now that the KafkaDataStore is registered in GeoServer. The program will create two SimpleFeatures and additionally write a stream of updates to the two SimpleFeatures over the course of about a minute.
83+
84+
You should refresh the GeoServer page repeatedly to visualize the updates being written to Kafka.
85+
86+
#### What's happening in GeoServer
87+
88+
The layer preview of GeoServer uses the LiveKafkaConsumerFeatureSource to show a real time view of the current state of the data stream. Two SimpleFeatures are being updated over time in Kafka which is reflected in the GeoServer display.
89+
90+
As you refresh the page, you should see two SimpleFeatures that start on the left side and gradually move to the right side, crossing each other in the middle. As the two SimpleFeatures get updated, the older SimpleFeatures disappear from the display.
91+
92+
!["GeoServer view"](/img/tutorials/2015-06-09-geomesa-kafka-quickstart/kafkadatastore2.png)
93+
94+
#### Consumers explained
95+
96+
GeoMesa wraps Kafka consumers in two different ways: as a LiveKafkaConsumerFeatureSource or as a ReplayKafkaConsumerFeatureSource (both of which implement GeoTools' [FeatureSource](http://docs.geotools.org/latest/javadocs/org/geotools/data/FeatureSource.html) API).
97+
98+
The LiveKafkaConsumerFeatureSource will consume messages as they are being produced and maintain the real time state of SimpleFeatures pertaining to a Kafka topic.
99+
100+
The ReplayKafkaConsumerFeatureSource allows users to specify any range of time in order to obtain the state of SimpleFeatures from any previous moment.
101+
102+
## View the consumer output
103+
104+
The program will construct the live and replay consumers and log SimpleFeatures to the console after all the messages are sent to Kafka and therefore after all the updates are made.
105+
106+
The live consumer will log the state of the two SimpleFeatures after all updates are finished. The replay consumer will log the state of the two SimpleFeatures five seconds earlier than the last update. The replay consumer will create a new SimpleFeatureType with an additional attribute `KafkaLogTime`. By preserving the `KafkaLogTime` as an attribute, we can create the state of SimpleFeatures at time *x* by querying for when `KafkaLogTime` equals *x*.
107+
108+
{% highlight bash %}
109+
Consuming with the live consumer...
110+
2 features were written to Kafka
111+
Here are the two SimpleFeatures that were obtained with the live consumer:
112+
fid:1 | name:James | age:20 | dtg:Mon Dec 14 19:08:23 EST 2015 | geom:POINT (180 90)
113+
fid:2 | name:John | age:62 | dtg:Fri Oct 02 09:56:49 EDT 2015 | geom:POINT (180 -90)
114+
115+
Consuming with the replay consumer...
116+
2 features were written to Kafka
117+
Here are the two SimpleFeatures that were obtained with the replay consumer:
118+
fid:2 | name:John | age:52 | dtg:Thu May 21 21:27:19 EDT 2015 | geom:POINT (132 -66) | KafkaLogTime:Tue Jun 09 13:33:47 EDT 2015
119+
fid:1 | name:James | age:59 | dtg:Sat Jan 24 06:26:44 EST 2015 | geom:POINT (132 66) | KafkaLogTime:Tue Jun 09 13:33:47 EDT 2015
120+
{% endhighlight %}
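
The replay idea described above (the state at time *x* is whatever each SimpleFeature looked like when the log reached *x*) can be illustrated without GeoMesa at all. The sketch below is conceptual only: `replay_state` is a hypothetical helper, the three-column input format is invented for illustration, and it takes the last update logged at or before the requested time. That is an assumption about the semantics, not a description of how ReplayKafkaConsumerFeatureSource is implemented.

```bash
# Conceptual sketch only: replay_state is hypothetical and does not use GeoMesa.
# Input (stdin): one update per line, "<log_time> <fid> <state>", in log order,
# where <log_time> is a number and <state> contains no spaces.
# Output: for each fid, the last state logged at or before the requested time.
replay_state() {
  awk -v t="$1" '
    # Later lines overwrite earlier ones, so the last qualifying update wins.
    $1 <= t { state[$2] = $3 }
    END { for (fid in state) print fid, state[fid] }
  ' | sort
}
```

Given the five updates `10 1 x=0`, `10 2 x=0`, `20 1 x=5`, `30 2 x=7` and `40 1 x=9` on standard input, `replay_state 30` prints `1 x=5` and `2 x=7`, while `replay_state 40` prints `1 x=9` and `2 x=7`, which mirrors the difference between the replay and live consumer output above.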
121+
122+
## Conclusion
123+
124+
Since the source code for this quickstart is accessible, it is advised to follow along in the code to get a deeper understanding of what's really going on.
125+
126+
Given a stream of geospatial data, GeoMesa's integration with Kafka enables users to maintain a real time state of SimpleFeatures or retrieve any arbitrary state preserved in history. One can additionally process and analyze streams of data by integrating a data processing system like [Storm](https://storm.apache.org/) or [Samza](http://samza.apache.org).
127+
128+
For additional information about the KafkaDataStore, see the [readme](https://github.com/locationtech/geomesa/blob/master/geomesa-kafka/geomesa-kafka-datastore/README.md) on GitHub.