Extracting blog post content as markdown using the markdown endpoint

This guide shows you how to capture the complete JSON output from Cloudflare's /markdown API endpoint.

This example extracts the content of a post from the Cloudflare Blog: Introducing AutoRAG on Cloudflare.

Prerequisites

Cloudflare Account and API Token.
- Create a token with Browser Rendering: Edit permissions.
- You can do this under My Profile → API Tokens → Create Token on your Cloudflare dashboard.
- Note your Account ID (from the dashboard homepage) and API Token.
Command-line tools installed.
- cURL: a command-line tool for sending HTTP requests.
  - macOS/Linux: usually preinstalled.
  - Windows: available via WSL, Git Bash, or native Windows builds.

1: Configure your environment variables

Save your sensitive information into environment variables to avoid hardcoding credentials.

export CF_ACCOUNT_ID="your-cloudflare-account-id"
export CF_API_TOKEN="your-api-token-with-edit-permissions"
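
If you later call the API from a script instead of the shell, it can help to fail early when either variable is missing. The following is a minimal Python sketch, assuming the two variable names exported above. `load_credentials` is a hypothetical helper written for this guide, not part of any Cloudflare SDK.

```python
import os


def load_credentials(env=None):
    """Return (account_id, api_token), or raise if either variable is unset.

    `env` defaults to the process environment; passing a plain dict
    makes the helper easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    names = ("CF_ACCOUNT_ID", "CF_API_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CF_ACCOUNT_ID"], env["CF_API_TOKEN"]
```

Calling `load_credentials()` with no arguments reads the variables you exported in this step.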

2: Make the API request and save the raw JSON

Run this command to fetch the Markdown representation of the AutoRAG blog post and store it in a local JSON file:

curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"
  }' \
> autorag-full-response.json

The > operator is shell output redirection, not a cURL option: it writes the response body to autorag-full-response.json instead of printing it to the terminal.
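
The same request can be sketched in Python using only the standard library. This is an illustrative equivalent of the cURL command, not an official client: `build_markdown_request` and `save_response` are hypothetical helper names, and the URL, headers, and body simply mirror the command above.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def build_markdown_request(account_id, api_token, page_url):
    """Build the same POST request that the cURL command sends."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{account_id}/browser-rendering/markdown",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def save_response(request, output_path):
    """Send the request and write the raw response bytes to output_path."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    with open(output_path, "wb") as f:
        f.write(raw)
```

To reproduce this step, build the request with your account ID, API token, and the blog post URL, then pass it to `save_response` with "autorag-full-response.json" as the output path. One difference from cURL: `urllib.request.urlopen` raises `urllib.error.HTTPError` on a 4xx or 5xx status instead of saving the error body.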

3: Inspect the saved JSON

You can check the start of the saved JSON file to ensure it looks right:

head -n 20 autorag-full-response.json

{
  "success": true,
  "errors": [],
  "messages": [],
  "result": "# [Get Started Free](https://dash.cloudflare.com/sign-up)|[Contact Sales](https://www.cloudflare.com/plans/enterprise/contact/)\n\n[![The Cloudflare Blog](https://cf-assets.www.cloudflare ..."
}
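
If the response is stored as one long line, head -n 20 prints the entire file rather than a short preview. As an alternative, here is a small Python sketch that reports the status fields and only the first characters of the Markdown. The helper names `summarize_response` and `load_and_summarize` are hypothetical; the field names are the ones shown in the sample output above.

```python
import json


def summarize_response(data, preview_chars=200):
    """Return the success flag, any errors, and the start of the Markdown."""
    result = data.get("result")
    # Only slice when the result is actually a string.
    preview = result[:preview_chars] if isinstance(result, str) else None
    return {
        "success": data.get("success", False),
        "errors": data.get("errors", []),
        "preview": preview,
    }


def load_and_summarize(path, preview_chars=200):
    """Read a saved response file and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_response(json.load(f), preview_chars)
```

For example, `print(load_and_summarize("autorag-full-response.json"))` shows whether the call succeeded without flooding the terminal.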

4: (Optional) Skip unwanted resources

To skip unnecessary assets such as CSS, JavaScript, or images when fetching the page, add the rejectRequestPattern parameter. Each entry is a regular expression; requests whose URLs match it are not loaded:

curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
    "rejectRequestPattern": [
      "/^.*\\.(css|js|png|svg)$/"
    ]
  }' \
> autorag-no-assets.json
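
The pattern string is easy to mistype because the backslash has to be escaped inside JSON. The sketch below builds the same pattern from a list of file extensions and lets `json.dumps` handle the escaping. `build_reject_pattern` and `build_payload` are hypothetical helpers for illustration; the "/.../" delimiters and the parameter name are copied from the request above.

```python
import json
import re


def build_reject_pattern(extensions):
    """Return a '/.../'-delimited pattern for URLs ending in any given extension."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    return f"/^.*\\.({alternatives})$/"


def build_payload(page_url, extensions):
    """Return the request body as a dict, ready for json.dumps."""
    return {
        "url": page_url,
        "rejectRequestPattern": [build_reject_pattern(extensions)],
    }


payload = build_payload(
    "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
    ["css", "js", "png", "svg"],
)
print(json.dumps(payload, indent=2))
```

Because the pattern is anchored with $, it matches only URLs that end in one of the listed extensions. An asset URL with a query string, such as one ending in app.js?v=1, would not match as written.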

5: Extract and save the Markdown from the JSON file

After saving the full response, use the following Python script to extract just the Markdown.

The script does the following:

- Reads the full JSON response from autorag-full-response.json
- Extracts the Markdown string from the "result" field
- Writes that Markdown to autorag-blog.md

#!/usr/bin/env python3
"""
extract_markdown.py

Reads the full JSON response from Cloudflare's Markdown endpoint
and writes the 'result' field (the converted Markdown) to a .md file.
"""

import json
import sys
from pathlib import Path

# Input and output file paths
INPUT_JSON = Path("autorag-full-response.json")
OUTPUT_MD   = Path("autorag-blog.md")

def main():
    # Check that the input file exists
    if not INPUT_JSON.is_file():
        print(f"Error: Input file '{INPUT_JSON}' not found.", file=sys.stderr)
        sys.exit(1)

    # Load the JSON response
    try:
        with INPUT_JSON.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"Error: Failed to parse JSON in '{INPUT_JSON}': {e}", file=sys.stderr)
        sys.exit(1)

    # Validate structure
    if not data.get("success", False):
        print("Error: API reported failure.", file=sys.stderr)
        errors = data.get("errors") or data.get("messages")
        if errors:
            print("Details:", errors, file=sys.stderr)
        sys.exit(1)

    if "result" not in data:
        print("Error: 'result' field not found in JSON.", file=sys.stderr)
        sys.exit(1)

    # Extract and write the Markdown
    markdown_content = data["result"]
    try:
        with OUTPUT_MD.open("w", encoding="utf-8") as md_file:
            md_file.write(markdown_content)
    except IOError as e:
        print(f"Error: Could not write to '{OUTPUT_MD}': {e}", file=sys.stderr)
        sys.exit(1)

    print(f"Success: Markdown content written to '{OUTPUT_MD}'.")

if __name__ == "__main__":
    main()

Usage

- Ensure you have run the curl command to produce autorag-full-response.json.
- Place extract_markdown.py in the same directory.
- Run:

python3 extract_markdown.py

After execution, autorag-blog.md will contain the extracted Markdown.

Final folder structure

After following these steps, your working folder will look like:

.
├── autorag-full-response.json    # Full API response
├── autorag-no-assets.json        # Full API response without extra assets (optional)
├── autorag-blog.md               # Extracted Markdown content
└── extract_markdown.py           # Python extraction script (optional)
