A streaming tutorial for Kafka
Kafka Processing
- Start Zookeeper (zookeeper-server-start.sh config/zookeeper.properties)
- Start Kafka Server (kafka-server-start.sh config/server.properties)
- Create a Kafka Topic (kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ) # one-time step; put your topic name after --topic
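The three setup steps above can also be driven from a script. The sketch below only assembles the argument lists for those commands, using the same flags shown in the steps. The function names and the `kafka_home` parameter are illustrative assumptions, not part of this repository, and it assumes the scripts live under `bin/` of the Kafka directory.

```python
import os


def zookeeper_cmd(kafka_home):
    """Argument list for step 1: start Zookeeper."""
    return [
        os.path.join(kafka_home, "bin", "zookeeper-server-start.sh"),
        os.path.join(kafka_home, "config", "zookeeper.properties"),
    ]


def kafka_server_cmd(kafka_home):
    """Argument list for step 2: start the Kafka server."""
    return [
        os.path.join(kafka_home, "bin", "kafka-server-start.sh"),
        os.path.join(kafka_home, "config", "server.properties"),
    ]


def create_topic_cmd(kafka_home, topic, zookeeper="localhost:2181"):
    """Argument list for step 3: create a topic (one partition, one replica)."""
    if not topic:
        raise ValueError("a topic name is required")
    return [
        os.path.join(kafka_home, "bin", "kafka-topics.sh"),
        "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", "1",
        "--partitions", "1",
        "--topic", topic,
    ]
```

One possible use: start the two servers with `subprocess.Popen(...)` since they keep running, then create the topic once with `subprocess.run(create_topic_cmd(kafka_home, topic), check=True)`.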
Optional (to check that messages flow from the producer to the consumer; put the name of the topic created above after --topic in both commands)
- Start the Producer (kafka_2.11-0.11.0.2$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic )
- Start the consumer (kafka_2.11-0.11.0.2$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic --from-beginning)
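The same producer-to-consumer check can be done from Python instead of the console tools. This is a minimal sketch that assumes the `kafka-python` package (`pip install kafka-python`) and a broker on localhost:9092; the helper names are hypothetical. The send and read helpers accept any producer or consumer object, so they can be tried without a broker.

```python
from itertools import islice


def send_test_messages(producer, topic, messages):
    """Send each string as UTF-8 bytes, flush, and return how many were sent."""
    count = 0
    for message in messages:
        producer.send(topic, message.encode("utf-8"))
        count += 1
    producer.flush()  # block until buffered messages are delivered
    return count


def read_messages(consumer, limit):
    """Read at most `limit` records from the consumer and decode their values."""
    return [record.value.decode("utf-8") for record in islice(consumer, limit)]


def connect(topic, bootstrap="localhost:9092"):
    """Create a real producer and consumer. Needs kafka-python and a running broker."""
    from kafka import KafkaConsumer, KafkaProducer  # imported lazily on purpose

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap,
        auto_offset_reset="earliest",  # start from the oldest message, like --from-beginning
        consumer_timeout_ms=5000,      # stop iterating after 5 s without new messages
    )
    return producer, consumer
```

With the broker running, `producer, consumer = connect(topic)` followed by `send_test_messages(producer, topic, ["hello"])` and `read_messages(consumer, 1)` should return the message that was just sent, provided the topic held no earlier messages.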
For Getting Real-Time Twitter Data:
- Run the first 3 steps from Kafka Processing above and note the name of the Kafka topic you created, because the Python scripts below run against that topic.
- Run KafkaPython_TwitterStreaming.py
- Run Kafka_SparkStreaming.py to load the Twitter data into HDFS
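To make the data flow concrete, here is a rough sketch of how tweets might be serialized on their way into the Kafka topic and decoded again on the consuming side. It is not the content of KafkaPython_TwitterStreaming.py or Kafka_SparkStreaming.py, which are not reproduced here. The function names and the choice of kept fields (`id_str`, `created_at`, `text`) are assumptions for illustration, and `producer` is assumed to be any object with `send` and `flush` methods, such as a kafka-python `KafkaProducer`.

```python
import json

# Fields kept from each tweet; an illustrative choice, not the scripts' actual schema.
KEPT_FIELDS = ("id_str", "created_at", "text")


def tweet_to_bytes(tweet):
    """Serialize a tweet dict to UTF-8 JSON bytes, keeping only KEPT_FIELDS."""
    slim = {key: tweet.get(key) for key in KEPT_FIELDS}
    return json.dumps(slim, ensure_ascii=False).encode("utf-8")


def bytes_to_tweet(payload):
    """Decode a Kafka message value back into a dict (the consuming side)."""
    return json.loads(payload.decode("utf-8"))


def forward_tweets(tweets, producer, topic):
    """Send each tweet to the topic as JSON bytes and return how many were sent."""
    forwarded = 0
    for tweet in tweets:
        producer.send(topic, tweet_to_bytes(tweet))
        forwarded += 1
    producer.flush()  # make sure nothing is left in the client buffer
    return forwarded
```

Sending one JSON document per message keeps each Kafka record self-describing, so the streaming job can parse records independently before writing them to HDFS.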