LLM plugin for pulling web page content as Markdown.
This plugin fetches the primary content of a URL, formatted as Markdown, and makes it available as an LLM fragment.
For background on LLM fragments:
Install this plugin in the same environment as LLM.
llm install llm-url-markdown
You can feed the Markdown content of a web page into LLM using the md: fragment prefix followed by the URL.
For example:
# Fetch content from a full URL
llm -f md:https://llm.datasette.io/en/stable/plugins/index.html 'Summarize the types of plugins available'
# Fetch content, defaulting to https://
llm -f md:github.com/simonw/llm 'What is the main goal of this LLM tool?'
Note: If your URL contains special characters (like ? or &), you must enclose the entire fragment identifier (e.g., md:https://...) in single or double quotes on the command line. This prevents your shell from misinterpreting the URL.
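When the command is assembled by a script rather than typed by hand, the same quoting can be applied programmatically. The following is a minimal sketch using Python's standard `shlex.quote`; the URL and prompt are illustrative placeholders, not values from this plugin.

```python
import shlex

# Illustrative URL containing shell-special characters (? and &).
url = "https://example.com/search?q=llm&page=2"
fragment = "md:" + url

# shlex.quote wraps an argument in single quotes when it contains
# characters the shell would otherwise interpret.
command = " ".join(
    ["llm", "-f", shlex.quote(fragment), shlex.quote("Summarize this page")]
)
print(command)
```

The printed command can be pasted into a POSIX shell as-is, because both the fragment identifier and the prompt are quoted.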
The plugin prepends https:// if no protocol (like http:// or https://) is specified in the URL. The fetched content is the Markdown representation of the webpage.
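The protocol-defaulting rule can be sketched in a few lines of Python. This is an illustration of the behavior described here, not the plugin's actual source; the function name `normalize_url` is hypothetical.

```python
def normalize_url(argument: str) -> str:
    """Return the URL unchanged if it has a protocol, else prepend https://.

    Hypothetical helper illustrating the rule described in this README.
    """
    if argument.startswith(("http://", "https://")):
        return argument
    return "https://" + argument


print(normalize_url("github.com/simonw/llm"))
print(normalize_url("http://example.com"))
```

Under this sketch, an explicit http:// URL is left untouched, so the https:// default only applies when no protocol is given.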
This plugin uses the Jina Reader API, which accepts requests without an API key, though such requests may be subject to lower rate limits. Keyless access is often sufficient for typical LLM fragment usage.
If you have a Jina Reader API token and want to use it (for higher rate limits), you can set it as an environment variable:
# Set the token (only needed once per shell session)
export JINA_READER_TOKEN=your_api_token_here
# Then use the plugin as normal
llm -f md:example.com 'Summarize this page'
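To make the optional token concrete, here is a hedged sketch of how a request to the Reader service might be prepared. It assumes the public Reader endpoint `https://r.jina.ai/` and a `Bearer` authorization header; the helper name `build_reader_request` is hypothetical, and the plugin's real implementation may differ.

```python
import os

# Assumption: the public Jina Reader endpoint, which takes the target URL
# appended to its path.
READER_ENDPOINT = "https://r.jina.ai/"


def build_reader_request(url, environ=None):
    """Return (request_url, headers) for a Reader request.

    Hypothetical helper: the Authorization header is added only when
    JINA_READER_TOKEN is set to a non-empty value.
    """
    environ = os.environ if environ is None else environ
    headers = {}
    token = environ.get("JINA_READER_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return READER_ENDPOINT + url, headers


request_url, headers = build_reader_request("https://example.com", {})
print(request_url, headers)
```

Passing the environment as a parameter keeps the sketch testable without touching the real shell environment; when the token is absent, the request is simply sent without credentials.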
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-url-markdown
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
# Installs llm, httpx, and testing tools like pytest
pip install -e '.[test]'
To run the tests:
python -m pytest