Skip to content

Commit dd77e27

Browse files
ontarionicktdas
authored andcommitted
[SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD
tdas koeninger This updates the Spark Streaming + Kafka Integration Guide doc with a working method to access the offsets of a `KafkaRDD` through Python. Author: Nick Evans <[email protected]> Closes apache#9289 from manygrams/update_kafka_direct_python_docs.
1 parent a9a6b80 commit dd77e27

File tree

1 file changed

+14
-1
lines changed

1 file changed

+14
-1
lines changed

docs/streaming-kafka-integration.md

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -181,7 +181,20 @@ Next, we discuss how to use this approach in your streaming application.
181181
);
182182
</div>
183183
<div data-lang="python" markdown="1">
184-
Not supported yet
184+
offsetRanges = []
185+
186+
def storeOffsetRanges(rdd):
187+
global offsetRanges
188+
offsetRanges = rdd.offsetRanges()
189+
return rdd
190+
191+
def printOffsetRanges(rdd):
192+
for o in offsetRanges:
193+
print "%s %s %s %s" % (o.topic, o.partition, o.fromOffset, o.untilOffset)
194+
195+
directKafkaStream\
196+
.transform(storeOffsetRanges)\
197+
.foreachRDD(printOffsetRanges)
185198
</div>
186199
</div>
187200

0 commit comments

Comments
 (0)