Commit 8dc8453

Updating README
1 parent 8b37b67 commit 8dc8453

1 file changed, +12 -100 lines changed

README.md

Lines changed: 12 additions & 100 deletions
@@ -1,113 +1,25 @@
 # Kafka Connect FileSystem Connector [![Build Status](https://travis-ci.org/mmolimar/kafka-connect-fs.svg?branch=master)](https://travis-ci.org/mmolimar/kafka-connect-fs)[![Coverage Status](https://coveralls.io/repos/github/mmolimar/kafka-connect-fs/badge.svg?branch=master)](https://coveralls.io/github/mmolimar/kafka-connect-fs?branch=master)
 
-Kafka Connect FileSystem is a Source Connector for reading data from any file system which implements
-``org.apache.hadoop.fs.FileSystem`` class from [Hadoop-Common](https://github.com/apache/hadoop-common) and writing to Kafka.
+**kafka-connect-fs** is a [Kafka Connector](http://kafka.apache.org/documentation.html#connect)
+for reading records from files in the file systems specified and load them into Kafka.
 
-## Prerequisites
+Documentation for this connector can be found [here](http://kafka-connect-fs.readthedocs.io/).
 
-- Confluent 3.1.1
-- Java 8
+## Development
 
-## Getting started
+To build a development version you'll need a recent version of Kafka. You can build
+kafka-connect-fs with Maven using the standard lifecycle phases.
 
-### Building source ###
-mvn clean package
+## FAQ
 
-### Config the connector ###
-name=FsSourceConnector
-connector.class=com.github.mmolimar.kafka.connect.fs.FsSourceConnector
-tasks.max=1
-fs.uris=file:///data,hdfs://localhost:9001/data
-topic=mytopic
-policy.class=com.github.mmolimar.kafka.connect.fs.policy.SimplePolicy
-policy.recursive=true
-policy_regexp=^[0-9]*\.txt$
-file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.TextFileReader
-The ``kafka-connect-fs.properties`` file defines:
+Some frequently asked questions on Kafka Connect FileSystem Connector can be found here -
+http://kafka-connect-fs.readthedocs.io/en/latest/faq.html
 
-1. The connector name.
-2. The class containing the connector.
-3. The number of tasks the connector is allowed to start.
-4. Comma-separated URIs of the FS(s). They can be URIs pointing directly to a file in the FS.
-5. Topic in which copy data to.
-6. Policy class to apply.
-7. Flag to activate traversed recursion in subdirectories when listing files.
-8. File reader class to read files from the FS.
-9. Regular expression to filter files from the FS.
+## Contribute
 
-#### Policies ####
-
-##### SimplePolicy #####
-
-Just list files included in the corresponding URI.
-
-##### SleepyPolicy #####
-
-Simple policy with an custom sleep on each execution.
-
-```
-policy.sleepy.sleep=200000
-policy.sleepy.fraction=100
-policy.sleepy.max_execs=-1
-```
-1. Max sleep time (in ms) to wait to look for files in the FS.
-2. Sleep fraction to divide the sleep time to allow interrupt the policy.
-3. Max sleep times allowed (negative to disable).
-
-##### HdfsFileWatcherPolicy #####
-
-It uses Hadoop notifications events (since Hadoop 2.6.0) and all create/append/close events will be reported as new files to be ingested.
-Just use it when your URIs start with ``hdfs://``
-
-#### File readers ####
-
-##### AvroFileReader #####
-
-Read files with [Avro](http://avro.apache.org/) format.
-
-##### ParquetFileReader #####
-
-Read files with [Parquet](https://parquet.apache.org/) format.
-
-##### SequenceFileReader #####
-
-Read [Sequence files](https://wiki.apache.org/hadoop/SequenceFile).
-
-##### DelimitedTextFileReader #####
-
-Text file reader using custom tokens to distinguish different columns on each line.
-
-```
-file_reader.delimited.token=,
-file_reader.delimited.header=true
-```
-1. If the file contains header or not (default false).
-2. The token delimiter for columns.
-
-##### TextFileReader #####
-
-Read plain text files. Each line represents one record.
-
-### Running in development ###
-```
-mvn clean package
-export CLASSPATH="$(find target/ -type f -name '*.jar'| grep '\-package' | tr '\n' ':')"
-$CONFLUENT_HOME/bin/connect-standalone $CONFLUENT_HOME/etc/schema-registry/connect-avro-standalone.properties config/kafka-connect-fs.properties
-```
-
-## TODO's
-
-- [ ] Add more file readers.
-- [ ] Add more policies.
-- [ ] Manages FS blocks.
-- [ ] Improve documentation.
-- [ ] Include a FS Sink Connector.
-
-## Contributing
-
-If you would like to add/fix something to this connector, you are welcome to do so!
+- Source Code: https://github.com/mmolimar/kafka-connect-fs
+- Issue Tracker: https://github.com/mmolimar/kafka-connect-fs/issues
 
 ## License
 
 Released under the Apache License, version 2.0.
-
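Note on the removed configuration example: the `policy_regexp` value `^[0-9]*\.txt$` is described there only as a "regular expression to filter files from the FS". The sketch below shows which names that pattern would accept. It assumes the pattern is tested against the bare file name rather than the full path, which this diff does not confirm. The class `PolicyRegexpSketch` and the helper `accepts` are hypothetical names used for illustration and are not part of the connector.

```java
import java.util.regex.Pattern;

public class PolicyRegexpSketch {
    // Pattern copied verbatim from the removed README example.
    static final Pattern FILE_NAME = Pattern.compile("^[0-9]*\\.txt$");

    // Hypothetical helper: assumes the filter sees only the file name, not the path.
    static boolean accepts(String fileName) {
        return FILE_NAME.matcher(fileName).find();
    }

    public static void main(String[] args) {
        String[] names = {"123.txt", ".txt", "a1.txt", "123.txt.bak", "12.TXT"};
        for (String name : names) {
            System.out.println(name + " -> " + accepts(name));
        }
    }
}
```

Because `[0-9]*` also matches zero digits, a file named just `.txt` passes the filter. A pattern such as `^[0-9]+\.txt$` would require at least one digit, if that is the intent.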