[ML] Add recursive chunking strategy with Markdown splitters

TODO: Update this once there's been some time to research this.

The goal is to chunk Markdown documents. The most promising approach appears to be recursive chunking: split on a list of separator characters until each chunk reaches the desired size. We should build a generic recursive chunker first, then extend it to handle Markdown. LangChain's recursive splitter is a useful reference: https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/. LangChain's Markdown chunking also offers some interesting extras, including metadata attached to the chunked results. We might be able to use this metadata later, so let's look into how we could support it in a follow-up.
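To make the idea concrete, here is a minimal Python sketch of recursive chunking as described above. It is illustrative only and is neither the proposed implementation nor LangChain's code. The function name `recursive_chunk`, the `max_chars` parameter, and the `MARKDOWN_SEPARATORS` list are all hypothetical, and chunk size is measured in characters purely for simplicity.

```python
from typing import List, Sequence

# Hypothetical Markdown separator list, ordered from coarse to fine.
MARKDOWN_SEPARATORS = ["\n## ", "\n### ", "\n\n", "\n", " "]


def recursive_chunk(text: str, max_chars: int, separators: Sequence[str]) -> List[str]:
    """Split on the first separator; recurse with finer ones for oversized pieces."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut at max_chars.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    # Re-attach the separator to every piece after the first so no text is lost.
    pieces = [parts[0]] + [sep + part for part in parts[1:]]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(current) + len(piece) <= max_chars:
            # Greedily merge small neighbouring pieces into one chunk.
            current += piece
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too large: retry with the finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


doc = "# Title\n\nIntro text.\n## A\n" + "word " * 20 + "\n## B\nshort"
for chunk in recursive_chunk(doc, 40, MARKDOWN_SEPARATORS):
    print(repr(chunk))
```

A generic chunker in this shape takes the separator list as a parameter, so a Markdown variant would only need to supply its own list. Attaching header metadata to each chunk is out of scope for this sketch.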

[ML] Add recursive chunking strategy with Markdown splitters #125243
