ChicoQ/spark-camus
Brief Overview

Spark-Camus is a Kafka -> HDFS pipeline on the Spark platform. It provides the following features:

  • uses the low-level Kafka consumer API
  • records the offsets of all partitions
  • converts Kafka messages to a com.tresata.spark.kafka.KafkaRDD
  • calls saveAsSequenceFile to save the messages to HDFS
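The offset-recording feature above can be sketched in plain Scala, with the Spark and Kafka calls omitted. The "partition:offset" line encoding and the helper names here are illustrative assumptions, not the project's actual offset file format:

```scala
// Hypothetical sketch: persist the latest offset of every partition
// as "partition:offset" lines, and parse them back on the next run.
// The encoding is an assumption; spark-camus's real format may differ.

// Serialize a partition -> offset map, one line per partition.
def encodeOffsets(offsets: Map[Int, Long]): String =
  offsets.toSeq.sortBy(_._1).map { case (p, o) => s"$p:$o" }.mkString("\n")

// Parse the recorded lines back into a map.
def decodeOffsets(text: String): Map[Int, Long] =
  text.linesIterator.filter(_.nonEmpty).map { line =>
    val Array(p, o) = line.split(":", 2)
    (p.trim.toInt, o.trim.toLong)
  }.toMap
```

Keeping the offsets in a small text file makes a failed run restartable: the next run simply resumes from the last recorded offset per partition.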

Configuration file

# kafka topic information
topic=
partNum=
brokerPort=
metadata.broker.list=
auto.offset.reset=largest
fetch.message.max.bytes=31457280

# HDFS path
outputBaseDir=
lastPath=
currOffPath=

# offset record path
offsetDir=

# Kafka message format
reader=bytebuffer
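A file in the format above is standard Java properties syntax, so it can be loaded with java.util.Properties. This is a minimal sketch; the key names follow the sample configuration, but how CamusJob actually reads them is an assumption:

```scala
import java.util.Properties
import java.io.StringReader

// Load config.properties-style text into a Properties object.
def loadConfig(text: String): Properties = {
  val props = new Properties()
  props.load(new StringReader(text))
  props
}

// Example using the same keys as the sample configuration above
// (the values here are made up for illustration).
val sample =
  """topic=clicks
    |partNum=4
    |auto.offset.reset=largest
    |fetch.message.max.bytes=31457280
    |""".stripMargin
val config = loadConfig(sample)
```

In a real job the text would come from the config.properties file on disk rather than an inline string.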

Running Spark-Camus

  • first, run com.td.kafka.offset.OffsetMonitor to record the real-time offsets
  • create a new config.properties
  • call CamusJob, or override it with your own implementation
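The steps above imply that each job run consumes, per partition, the range between the offset recorded by the previous run (lastPath) and the current offset observed by OffsetMonitor (currOffPath). A hedged sketch of that pairing, with the function name and the default-to-zero behavior as assumptions:

```scala
// Hypothetical sketch of the per-partition fetch range a job run would
// consume: from the offset recorded on the last run up to the offset
// observed by OffsetMonitor. Partitions never seen before start at 0.
def fetchRanges(
    last: Map[Int, Long],   // offsets recorded by the previous run
    curr: Map[Int, Long]    // offsets recorded by OffsetMonitor
): Map[Int, (Long, Long)] =
  curr.map { case (partition, currOffset) =>
    partition -> (last.getOrElse(partition, 0L), currOffset)
  }
```

Driving the ranges from the current snapshot means newly added partitions are picked up automatically, since they appear in curr even when absent from last.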

About

A rewrite of Camus on Spark, using Scala.
