[ML] Add recursive chunking strategy with Markdown splitters #125243

Closed
@dan-rubinstein

Description

TODO: Update this once there's been some time to research this.

The goal is to chunk Markdown documents. The best approach appears to be recursive chunking: splitting on an ordered list of separator characters until each chunk is within the desired size. We should build a generic recursive chunker first and then extend it to handle Markdown. LangChain's recursive splitter is a useful reference: https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/. LangChain's Markdown chunking also attaches metadata to the chunked results. We might be able to use that metadata later, so let's look into how the chunker could be extended to support it.
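
To make the idea concrete, here is a minimal Python sketch of recursive chunking. It is an illustration only, not the proposed implementation: the function name `recursive_chunk`, the `MARKDOWN_SEPARATORS` list, and measuring size in characters are all assumptions made for this example. Unlike LangChain's splitter, it does not merge small adjacent pieces back together.

```python
from typing import List

# Hypothetical separator list for Markdown, ordered from coarse to fine.
MARKDOWN_SEPARATORS = ["\n## ", "\n### ", "\n\n", "\n", " "]


def recursive_chunk(text: str, max_size: int, separators: List[str]) -> List[str]:
    """Split text on successively finer separators until every chunk fits max_size."""
    if len(text) <= max_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut by character count.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur in the text; try the next, finer one.
        return recursive_chunk(text, max_size, finer)

    chunks: List[str] = []
    for i, part in enumerate(parts):
        # Re-attach the separator to every piece after the first so no text is lost.
        piece = part if i == 0 else sep + part
        chunks.extend(recursive_chunk(piece, max_size, finer))
    return chunks


doc = "# Title\nintro text here\n## A\nalpha alpha alpha\n## B\nbeta"
for chunk in recursive_chunk(doc, 20, MARKDOWN_SEPARATORS):
    print(repr(chunk))
```

Keeping each separator at the start of the piece that follows it means a Markdown heading stays with its section, and joining the chunks reproduces the original text. A real chunker would likely also merge undersized neighbours and measure size in words or tokens rather than characters.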
