Commit 999dfe5

Agnostic file reader documentation
1 parent 0d73223 commit 999dfe5

2 files changed: +58 -3 lines changed

docs/source/config_options.rst

Lines changed: 33 additions & 0 deletions
@@ -276,3 +276,36 @@ In order to configure custom properties for this reader, the name you must use i
 * Type: string
 * Default: null
 * Importance: low
+
+Agnostic
+--------------------------------------------
+
+In order to configure custom properties for this reader, the name you must use is ``agnostic``.
+
+``file_reader.agnostic.extensions.parquet``
+A comma-separated string list with the accepted extensions for Parquet files.
+
+* Type: string
+* Default: parquet
+* Importance: medium
+
+``file_reader.agnostic.extensions.avro``
+A comma-separated string list with the accepted extensions for Avro files.
+
+* Type: string
+* Default: avro
+* Importance: medium
+
+``file_reader.agnostic.extensions.sequence``
+A comma-separated string list with the accepted extensions for Sequence files.
+
+* Type: string
+* Default: seq
+* Importance: medium
+
+``file_reader.agnostic.extensions.delimited``
+A comma-separated string list with the accepted extensions for Delimited text files.
+
+* Type: string
+* Default: tsv,csv
+* Importance: medium
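
To illustrate how the four ``file_reader.agnostic.extensions.*`` properties could be interpreted, here is a minimal Python sketch. It is not the connector's implementation: the helper name ``build_extension_lookup`` is hypothetical, and trimming whitespace and lowercasing each extension are assumptions. Only the property names and default values come from the documentation above.

```python
# Defaults as documented above; everything else here is illustrative.
DEFAULT_EXTENSIONS = {
    "parquet": "parquet",
    "avro": "avro",
    "sequence": "seq",
    "delimited": "tsv,csv",
}

PREFIX = "file_reader.agnostic.extensions."


def build_extension_lookup(props):
    """Map each accepted extension to its format name (hypothetical helper)."""
    lookup = {}
    for fmt, default in DEFAULT_EXTENSIONS.items():
        # Fall back to the documented default when the property is not set.
        raw = props.get(PREFIX + fmt, default)
        for ext in raw.split(","):
            # Assumption: surrounding whitespace and letter case are ignored.
            ext = ext.strip().lower()
            if ext:
                lookup[ext] = fmt
    return lookup


# Accept ".pqt" as an additional Parquet extension; other formats keep defaults.
lookup = build_extension_lookup(
    {"file_reader.agnostic.extensions.parquet": "parquet,pqt"}
)
print(lookup)
```

Overriding one property leaves the other three at their defaults, which is the behaviour the per-property defaults above suggest.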

docs/source/filereaders.rst

Lines changed: 25 additions & 3 deletions
@@ -8,10 +8,12 @@ to Kafka is created by transforming the record by means of
 `Confluent avro-converter <https://github.com/confluentinc/schema-registry/tree/master/avro-converter>`__
 API.
 
+More information about properties of this file reader :ref:`here<config_options-filereaders-avro>`.
+
 Parquet
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Read files with `Parquet <https://parquet.apache.org/>`__ format.
+Reads files with `Parquet <https://parquet.apache.org/>`__ format.
 
 The reader takes advantage of the Parquet-Avro API and uses the Parquet file
 as if it were an Avro file, so the message sent to Kafka is built in the same
@@ -22,6 +24,8 @@ way as the Avro file reader does.
 over and over again and has to seek the file, the performance
 can be affected.
 
+More information about properties of this file reader :ref:`here<config_options-filereaders-parquet>`.
+
 SequenceFile
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -32,8 +36,7 @@ This reader can process this file format and build a Kafka message with the
 key/value pair. These two values are named ``key`` and ``value`` in the message
 by default but you can customize these field names.
 
-More information about properties of this file reader
-:ref:`here<config_options-filereaders-sequencefile>`.
+More information about properties of this file reader :ref:`here<config_options-filereaders-sequencefile>`.
 
 Text
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -44,6 +47,8 @@ Each line represents one record which will be in a field
 named ``value`` in the message sent to Kafka by default but you can
 customize these field names.
 
+More information about properties of this file reader :ref:`here<config_options-filereaders-text>`.
+
 Delimited text
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -56,3 +61,20 @@ Also, the token delimiter for columns is configurable.
 
 More information about properties of this file reader :ref:`here<config_options-filereaders-delimited>`.
 
+Agnostic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This reader is a wrapper around the readers listed above.
+
+It tries to read any kind of file format using an internal reader based on the file extension,
+applying the proper one (Parquet, Avro, SequenceFile, Text or Delimited text). If no
+extension matches, the Text file reader is applied.
+
+Default extensions for each format:
+* Parquet: .parquet
+* Avro: .avro
+* SequenceFile: .seq
+* Delimited text: .tsv, .csv
+* Text: any other file extension.
+
+More information about properties of this file reader :ref:`here<config_options-filereaders-agnostic>`.
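
The extension-based selection described in the Agnostic section can be sketched roughly as follows. This is an illustrative Python sketch, not the connector's code: the function name ``select_reader`` is hypothetical, and case-insensitive matching is an assumption. The default extensions and the fallback to the Text reader are taken from the documentation in this diff.

```python
import os

# Default extension -> reader mapping, taken from the documented defaults.
READER_BY_EXTENSION = {
    "parquet": "Parquet",
    "avro": "Avro",
    "seq": "SequenceFile",
    "tsv": "Delimited text",
    "csv": "Delimited text",
}


def select_reader(file_name):
    """Return the name of the reader the wrapper would delegate to (hypothetical)."""
    # splitext returns the extension with its leading dot, or "" if there is none.
    _, ext = os.path.splitext(file_name)
    # Assumption: matching is case-insensitive. Unknown or missing extensions
    # fall back to the Text reader, as the documentation states.
    return READER_BY_EXTENSION.get(ext.lstrip(".").lower(), "Text")


for name in ("events.parquet", "users.csv", "notes.log", "README"):
    print(name, "->", select_reader(name))
```

A plain dictionary lookup with a default keeps the fallback rule in one place: anything not explicitly registered is treated as a Text file.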
