# Kafka Connect FileSystem Connector [Build Status](https://travis-ci.org/mmolimar/kafka-connect-fs) [Coverage Status](https://coveralls.io/github/mmolimar/kafka-connect-fs?branch=master)

-Kafka Connect FileSystem is a Source Connector for reading data from any file system that implements
-the ``org.apache.hadoop.fs.FileSystem`` class from [Hadoop-Common](https://github.com/apache/hadoop-common), writing the data to Kafka.
+**kafka-connect-fs** is a [Kafka Connector](http://kafka.apache.org/documentation.html#connect)
+for reading records from files in the specified file systems and loading them into Kafka.

-## Prerequisites
+Documentation for this connector can be found [here](http://kafka-connect-fs.readthedocs.io/).

-- Confluent 3.1.1
-- Java 8
+## Development

-## Getting started
+To build a development version you'll need a recent version of Kafka. You can build
+kafka-connect-fs with Maven using the standard lifecycle phases.

-### Building source ###
-    mvn clean package
+## FAQ

-### Config the connector ###
-    name=FsSourceConnector
-    connector.class=com.github.mmolimar.kafka.connect.fs.FsSourceConnector
-    tasks.max=1
-    fs.uris=file:///data,hdfs://localhost:9001/data
-    topic=mytopic
-    policy.class=com.github.mmolimar.kafka.connect.fs.policy.SimplePolicy
-    policy.recursive=true
-    policy.regexp=^[0-9]*\.txt$
-    file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.TextFileReader
-The ``kafka-connect-fs.properties`` file defines:
+Some frequently asked questions on Kafka Connect FileSystem Connector can be found here -
+http://kafka-connect-fs.readthedocs.io/en/latest/faq.html

-1. The connector name.
-2. The class containing the connector.
-3. The maximum number of tasks the connector is allowed to start.
-4. Comma-separated URIs of the file system(s). They can also point directly to a file in the FS.
-5. Topic to copy the data into.
-6. Policy class to apply.
-7. Flag to enable recursive traversal of subdirectories when listing files.
-8. Regular expression to filter files in the FS.
-9. File reader class used to read files from the FS.
+## Contribute

-#### Policies ####
-
-##### SimplePolicy #####
-
-Simply lists the files included in the corresponding URIs.
-
-##### SleepyPolicy #####
-
-Simple policy with a custom sleep on each execution.
-
-```
-policy.sleepy.sleep=200000
-policy.sleepy.fraction=100
-policy.sleepy.max_execs=-1
-```
-1. Maximum sleep time (in ms) to wait between lookups for files in the FS.
-2. Fraction to divide the sleep time into, so the policy can be interrupted sooner.
-3. Maximum number of policy executions allowed (negative to disable).
-
-##### HdfsFileWatcherPolicy #####
-
-Uses Hadoop notification events (available since Hadoop 2.6.0); every create/append/close event is reported as a new file to be ingested.
-Use it only when your URIs start with ``hdfs://``.
-
-#### File readers ####
-
-##### AvroFileReader #####
-
-Reads files in [Avro](http://avro.apache.org/) format.
-
-##### ParquetFileReader #####
-
-Reads files in [Parquet](https://parquet.apache.org/) format.
-
-##### SequenceFileReader #####
-
-Reads [Sequence files](https://wiki.apache.org/hadoop/SequenceFile).
-
-##### DelimitedTextFileReader #####
-
-Text file reader that uses a custom token to split each line into columns.
-
-```
-file_reader.delimited.token=,
-file_reader.delimited.header=true
-```
-1. The token delimiter for columns.
-2. Whether the file contains a header (default false).
-
-##### TextFileReader #####
-
-Reads plain text files. Each line represents one record.
-
-### Running in development ###
-```
-mvn clean package
-export CLASSPATH="$(find target/ -type f -name '*.jar' | grep '\-package' | tr '\n' ':')"
-$CONFLUENT_HOME/bin/connect-standalone $CONFLUENT_HOME/etc/schema-registry/connect-avro-standalone.properties config/kafka-connect-fs.properties
-```
-
-## TODOs
-
-- [ ] Add more file readers.
-- [ ] Add more policies.
-- [ ] Manage FS blocks.
-- [ ] Improve documentation.
-- [ ] Include an FS Sink Connector.
-
-## Contributing
-
-If you would like to add or fix something in this connector, you are welcome to do so!
+- Source Code: https://github.com/mmolimar/kafka-connect-fs
+- Issue Tracker: https://github.com/mmolimar/kafka-connect-fs/issues

## License

Released under the Apache License, version 2.0.
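For reference, the configuration walked through in the removed section can be collected into a single ``kafka-connect-fs.properties`` file. The values below are taken verbatim from the diff; the comments are added annotations, and the dotted key ``policy.regexp`` is an assumption chosen to match the other ``policy.*`` keys (the old README wrote it with an underscore):

```properties
# Connector name and implementation class
name=FsSourceConnector
connector.class=com.github.mmolimar.kafka.connect.fs.FsSourceConnector
# Maximum number of tasks the connector is allowed to start
tasks.max=1
# Comma-separated FS URIs; they may also point directly to a file
fs.uris=file:///data,hdfs://localhost:9001/data
# Topic to copy the data into
topic=mytopic
# Policy that decides which files to ingest
policy.class=com.github.mmolimar.kafka.connect.fs.policy.SimplePolicy
policy.recursive=true
policy.regexp=^[0-9]*\.txt$
# Reader used to parse the matched files
file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.TextFileReader
```

This file is what the removed "Running in development" section passes to ``connect-standalone`` as ``config/kafka-connect-fs.properties``.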