bytefish/FlinkExperiments

FlinkExperiments

Project

This is a sample project for Apache Flink. The application parses the Quality Controlled Local Climatological Data (QCLCD) for March 2015, uses Apache Flink to calculate the maximum daily temperature from the stream of measurements, and writes the results to Elasticsearch and to a PostgreSQL database.
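To make the aggregation concrete, here is a minimal sketch of the "maximum daily temperature" calculation in plain Java, without any Flink dependency. It is not the project's Flink pipeline: the `DailyMaxSketch` class, the `Measurement` type, and its fields are hypothetical names chosen for illustration, and grouping by station and calendar day is an assumption about how the maximum is keyed.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DailyMaxSketch {

    // Hypothetical measurement type; the field names are assumptions,
    // not the project's actual data model.
    static final class Measurement {
        final String station;
        final LocalDate day;
        final double temperature;

        Measurement(String station, LocalDate day, double temperature) {
            this.station = station;
            this.day = day;
            this.temperature = temperature;
        }
    }

    // Groups measurements by (station, day) and keeps the highest
    // temperature seen for each group. Keys look like "A@2015-03-01".
    static Map<String, Double> maxPerStationAndDay(List<Measurement> measurements) {
        Map<String, Double> result = new TreeMap<>();
        for (Measurement m : measurements) {
            String key = m.station + "@" + m.day;
            // merge() stores the value if the key is new, otherwise
            // replaces it with the larger of the old and new value.
            result.merge(key, m.temperature, Double::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Measurement> data = Arrays.asList(
                new Measurement("A", LocalDate.of(2015, 3, 1), 10.0),
                new Measurement("A", LocalDate.of(2015, 3, 1), 12.5),
                new Measurement("B", LocalDate.of(2015, 3, 1), 7.0));
        maxPerStationAndDay(data)
                .forEach((key, max) -> System.out.println(key + " -> " + max));
    }
}
```

In a streaming job the same idea would typically be expressed as a keyed, windowed aggregation rather than a loop over an in-memory list; the sketch only shows the grouping and the max reduction.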

Dataset

The data is the Quality Controlled Local Climatological Data (QCLCD):

Quality Controlled Local Climatological Data (QCLCD) consist of hourly, daily, and monthly summaries for approximately 1,600 U.S. locations. Daily Summary forms are not available for all stations. Data are available beginning January 1, 2005 and continue to the present. Please note, there may be a 48-hour lag in the availability of the most recent data.

The data is available at:

Result

Kibana Visualization of the Average Temperature

Running the Examples

The records in the Quality Controlled Local Climatological Data (QCLCD) dataset are not sorted by timestamp. The dataset has to be prepared first, so that all records are sorted in ascending order by the time of measurement.

I have written a small application that sorts the original CSV data by the measurement time:

The result is a sorted CSV file, which can be used to run the examples.
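The preparation step described above can be sketched as follows. This is not the sorting application mentioned in this section, only a minimal in-memory illustration. It assumes that the first line is a header, that the second column holds the date as `yyyyMMdd`, and that the third column holds the time as `HHmm`; the class name `CsvSortSketch` and the sample column names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CsvSortSketch {

    // Assumed time format of the third column, e.g. "0015" for 00:15.
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmm");

    // Builds the measurement time from the assumed date and time columns.
    static LocalDateTime measurementTime(String line) {
        String[] columns = line.split(",", -1);
        LocalDate date = LocalDate.parse(columns[1], DateTimeFormatter.BASIC_ISO_DATE);
        LocalTime time = LocalTime.parse(columns[2], TIME);
        return LocalDateTime.of(date, time);
    }

    // Keeps the header line first and sorts all remaining lines ascending
    // by measurement time. List.sort is stable, so records with the same
    // timestamp keep their original relative order.
    static List<String> sortByMeasurementTime(List<String> lines) {
        if (lines.isEmpty()) {
            return new ArrayList<>();
        }
        List<String> records = new ArrayList<>(lines.subList(1, lines.size()));
        records.sort(Comparator.comparing(CsvSortSketch::measurementTime));

        List<String> sorted = new ArrayList<>();
        sorted.add(lines.get(0));
        sorted.addAll(records);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "WBAN,Date,Time,DryBulbCelsius",
                "00001,20150302,0100,5.0",
                "00002,20150301,2350,3.0",
                "00001,20150301,0015,4.0");
        sortByMeasurementTime(lines).forEach(System.out::println);
    }
}
```

The sketch holds the whole file in memory and parses timestamps inside the comparator, which is fine for a small sample but may need a different approach for a full month of hourly data.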

Further Reading

I have written several blog posts on Apache Flink:
