Loaders
What are Loaders?
CAMEL’s Loaders provide flexible ways to ingest and process many kinds of data: structured files, unstructured text, web content, and even OCR from images. They power your agent’s ability to interact with the outside world. Additionally, several data readers are available, including Apify Reader, Chunkr Reader, Firecrawl Reader, Jina_url Reader, and Mistral Reader, which retrieve external data for improved data integration and analysis.
Types
Get Started
Using Base IO
This module reads files of various formats, extracts their contents, and represents them as File objects, each tailored to handle a specific file type.
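The idea of one File-like object per file type can be sketched with the standard library alone. This is a minimal illustration, not CAMEL's implementation: `SketchFile`, `READERS`, and `file_from_raw_bytes` are hypothetical names, and only two formats are handled.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchFile:
    """Hypothetical stand-in for a File object: a name, a type, and extracted text."""
    name: str
    kind: str
    text: str


def read_txt(raw: bytes) -> str:
    # Plain text: decode the bytes as UTF-8.
    return raw.decode("utf-8")


def read_json(raw: bytes) -> str:
    # JSON: parse, then re-serialize with indentation for readability.
    return json.dumps(json.loads(raw), indent=2)


# Map each supported extension to the reader that handles it.
READERS = {".txt": read_txt, ".json": read_json}


def file_from_raw_bytes(raw: bytes, filename: str) -> SketchFile:
    """Pick a reader from the file extension and wrap the result."""
    ext = Path(filename).suffix.lower()
    if ext not in READERS:
        raise ValueError(f"Unsupported file type: {ext!r}")
    return SketchFile(name=filename, kind=ext.lstrip("."), text=READERS[ext](raw))
```

Dispatching on the extension keeps each reader small, and a new format only needs one more entry in the mapping.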
Using Unstructured IO
To get started with the Unstructured IO module, just import and initialize it. You can parse, clean, extract, chunk, and stage data from files or URLs. Here’s how you use it step by step:
This guide gets you started with Unstructured IO. For more, see the Unstructured IO Documentation.
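To make the clean and chunk stages concrete, here is a rough standard-library sketch of what they do to text. It does not use the Unstructured IO module; `collapse_whitespace` and `chunk_by_words` are hypothetical helpers written only for illustration.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Clean step: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_words(text: str, max_chars: int) -> list[str]:
    """Chunk step: pack whole words into chunks of at most max_chars characters.

    A single word longer than max_chars is kept as its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            # The current chunk is full: emit it and start a new one.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


cleaned = collapse_whitespace("  Loaders   ingest \n\n many kinds of data.  ")
chunks = chunk_by_words(cleaned, max_chars=20)
```

Cleaning before chunking keeps stray whitespace from counting toward chunk length.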
Using Apify Reader
Initialize the Apify client, set up the required actors and parameters, and run the actor.
Using Firecrawl Reader
Firecrawl Reader provides a simple way to turn any website into LLM-ready markdown format. Here’s how you can use it step by step:
Initialize the Firecrawl client and start a crawl
First, create a Firecrawl client and crawl a specific URL.
When the status is “completed”, the content extraction is done and you can retrieve the results.
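Waiting for the “completed” status is a polling loop. The sketch below assumes a caller-supplied `check_status` function that returns a dict with a `status` key; that function, the interval, and the attempt limit are all hypothetical and stand in for whatever status call the Firecrawl client exposes.

```python
import time
from typing import Callable


def wait_until_completed(
    check_status: Callable[[], dict],
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> dict:
    """Call check_status repeatedly until it reports status == "completed"."""
    for _ in range(max_attempts):
        response = check_status()
        if response.get("status") == "completed":
            return response
        # Not done yet: wait before asking again.
        time.sleep(interval_s)
    raise TimeoutError("crawl did not reach 'completed' in time")


# Usage with a fake status source, so no network access is needed.
fake_responses = iter([{"status": "scraping"}, {"status": "completed", "data": []}])
result = wait_until_completed(lambda: next(fake_responses), interval_s=0)
```

Bounding the number of attempts avoids waiting forever if a crawl never finishes.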
Retrieve the extracted markdown content
Once finished, access the LLM-ready markdown directly from the response:
That’s it. With just a couple of lines, you can turn any website into clean markdown, ready for LLM pipelines or further processing.
Using Chunkr Reader
Chunkr Reader allows you to process PDFs (and other docs) in chunks, with built-in OCR and format control.
Below is a basic usage pattern:
Initialize the ChunkrReader and ChunkrReaderConfig, set the file path and chunking options, then submit your task and fetch results:
A successful task returns a chunked structure like this:
Using Jina Reader
Jina Reader provides a convenient interface to extract clean, LLM-friendly content from any URL in a chosen format (like markdown):
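As a rough picture of what such a request involves, the sketch below builds the target URL and headers without sending anything. The `https://r.jina.ai/` prefix and the `X-Return-Format` header are assumptions based on Jina's public Reader service, not something this guide confirms, and `build_reader_request` is a hypothetical helper rather than part of CAMEL.

```python
# Assumed endpoint of Jina's public Reader service (not confirmed by this guide).
ASSUMED_ENDPOINT = "https://r.jina.ai/"


def build_reader_request(url: str, return_format: str = "markdown") -> tuple[str, dict]:
    """Return the request URL and headers for fetching `url` in the chosen format.

    Nothing is sent here; the caller decides how to perform the HTTP GET.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    headers = {"X-Return-Format": return_format}  # assumed header name
    return ASSUMED_ENDPOINT + url, headers


request_url, request_headers = build_reader_request("https://example.com/docs")
```

Keeping request construction separate from the network call makes the format choice easy to inspect and test.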
Using MarkitDown Reader
MarkitDown Reader lets you convert files (like HTML or docs) into LLM-ready markdown with a single line.
Example output:
Using Mistral Reader
Mistral Reader offers OCR and text extraction from both PDFs and images, whether local or remote. Just specify the file path or URL:
You can also extract from images or local files:
Response includes structured page data, markdown content, and usage details.
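Once a response is in hand, the per-page markdown can be stitched into a single document. The sketch below assumes the response is available as a dict with a `pages` list whose entries carry `index` and `markdown` fields, plus a `usage_info` entry. Those field names are assumptions for illustration, so check them against the actual response object.

```python
def pages_to_markdown(response: dict) -> str:
    """Join the markdown of every page, in page order, separated by blank lines."""
    pages = sorted(response.get("pages", []), key=lambda page: page.get("index", 0))
    return "\n\n".join(page.get("markdown", "") for page in pages)


# Hypothetical response shape, with pages deliberately out of order.
sample_response = {
    "pages": [
        {"index": 1, "markdown": "Second page text."},
        {"index": 0, "markdown": "# Title"},
    ],
    "usage_info": {"pages_processed": 2},
}
document = pages_to_markdown(sample_response)
```

Sorting by page index first means the output order does not depend on the order in which pages appear in the response.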