[BRAPI]Rest API guide #22056
Draft: daisyfaithauma wants to merge 1 commit into `production` from `rest-api-guide`
178 changes: 178 additions & 0 deletions
src/content/docs/browser-rendering/how-to/markdown-extraction.mdx
@@ -0,0 +1,178 @@
---
title: Extracting blog post content as markdown using the markdown endpoint
sidebar:
  order: 4
---

This guide shows you how to capture the complete JSON response from Cloudflare's [`/markdown` API endpoint](/browser-rendering/rest-api/markdown-endpoint/) and extract the Markdown from it.

The example extracts the content of a post from the Cloudflare Blog: [Introducing AutoRAG on Cloudflare](https://blog.cloudflare.com/introducing-autorag-on-cloudflare/).

## Prerequisites

1. A Cloudflare account and API token.

   - [Create a token](/fundamentals/api/get-started/create-token/) with **Browser Rendering: Edit** permissions.
   - You can do this under **My Profile → API Tokens → Create Token** on your [Cloudflare dashboard](https://dash.cloudflare.com/).
   - Note your **Account ID** (from the dashboard homepage) and **API Token**.

2. Command-line tools installed.

   - cURL: a command-line tool for sending HTTP requests.
     - macOS/Linux: usually preinstalled.
     - Windows: available via WSL, Git Bash, or native Windows builds.
   - Python 3: used in step 5 to run the extraction script.

## 1: Configure your environment variables

Store your credentials in environment variables so that they are not hardcoded in commands or scripts.

```bash
export CF_ACCOUNT_ID="your-cloudflare-account-id"
export CF_API_TOKEN="your-api-token-with-edit-permissions"
```

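If you later call the API from a script rather than from the shell, the same two variables can be read and checked up front. The following is a minimal sketch, not part of the guide's own script; the helper name `load_credentials` is hypothetical, and only the variable names `CF_ACCOUNT_ID` and `CF_API_TOKEN` come from the step above.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (account_id, api_token); raise if either variable is unset or empty."""
    # Hypothetical helper: collects every missing name so the error lists them all.
    missing = [name for name in ("CF_ACCOUNT_ID", "CF_API_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CF_ACCOUNT_ID"], env["CF_API_TOKEN"]
```

Failing early with a named variable tends to be easier to debug than an authentication error returned by the API.
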
## 2: Make the API request and save the raw JSON

Run this command to fetch the Markdown representation of the AutoRAG blog post and store the response in a local JSON file:

```bash
curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"
  }' \
  > autorag-full-response.json
```

The `>` operator is shell redirection, not a cURL option: it writes the command's output to a file (`autorag-full-response.json`).

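For readers who prefer to stay in one language, the same request can be sketched with Python's standard library. This is an illustrative equivalent of the cURL command, not the guide's method: the function names and the 60-second timeout are assumptions, while the endpoint URL, headers, and body mirror the command above.

```python
import json
import urllib.request

# Base URL taken from the cURL command in this step.
API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def build_markdown_request(account_id: str, api_token: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL command."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{account_id}/browser-rendering/markdown",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def fetch_markdown_response(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body (timeout value is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a caller would need to catch that exception to read an error body.
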
## 3: Inspect the saved JSON

Check the start of the saved JSON file to confirm that the request succeeded:

```bash
head -n 20 autorag-full-response.json
```

```json output
{
"success": true,
"errors": [],
"messages": [],
"result": "# "[Get Started Free](https://dash.cloudflare.com/sign-up)|[Contact Sales](https://www.cloudflare.com/plans/enterprise/contact/)\n\n[ ...
```

## 4: Skip unwanted resources

To ignore unnecessary assets such as CSS, JavaScript, or images when fetching the page, add the `rejectRequestPattern` parameter:

```bash
curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
    "rejectRequestPattern": [
      "/^.*\\.(css|js|png|svg)$/"
    ]
  }' \
  > autorag-no-assets.json
```

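To get a feel for which requests this pattern would catch, you can try it against a few URLs locally. The sketch below uses Python's `re` module purely for illustration: the service applies the pattern with its own regular expression engine, and the sample URLs are hypothetical, so treat the results as indicative only.

```python
import re

# The pattern from the request body, without the surrounding "/" delimiters.
REJECT_PATTERN = re.compile(r"^.*\.(css|js|png|svg)$")

# Hypothetical request URLs, for illustration only.
SAMPLE_URLS = [
    "https://example.com/static/site.css",
    "https://example.com/static/app.js",
    "https://example.com/static/app.js?v=2",
    "https://example.com/api/data.json",
    "https://example.com/images/hero.jpg",
]


def is_rejected(url: str) -> bool:
    """Return True if the URL ends in one of the listed extensions."""
    return REJECT_PATTERN.search(url) is not None


for url in SAMPLE_URLS:
    label = "reject" if is_rejected(url) else "allow"
    print(f"{label:6}  {url}")
```

Because the pattern is anchored with `$`, a URL with a query string such as `app.js?v=2` does not match, and extensions that are not listed (for example `.jpg`) are not blocked. Extend the alternation if you need to cover them.
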
## 5: Extract and save the Markdown from the JSON file

After saving the full response, use the following script to extract just the Markdown.

The script does the following:

1. Reads the full JSON response from `autorag-full-response.json`
2. Extracts the Markdown string from the `"result"` field
3. Writes that Markdown to `autorag-blog.md`

```py
#!/usr/bin/env python3
"""
extract_markdown.py

Reads the full JSON response from Cloudflare's Markdown endpoint
and writes the 'result' field (the converted Markdown) to a .md file.
"""

import json
import sys
from pathlib import Path

# Input and output file paths
INPUT_JSON = Path("autorag-full-response.json")
OUTPUT_MD = Path("autorag-blog.md")


def main():
    # Check that the input file exists
    if not INPUT_JSON.is_file():
        print(f"Error: Input file '{INPUT_JSON}' not found.", file=sys.stderr)
        sys.exit(1)

    # Load the JSON response
    try:
        with INPUT_JSON.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"Error: Failed to parse JSON in '{INPUT_JSON}': {e}", file=sys.stderr)
        sys.exit(1)

    # Validate structure
    if not data.get("success", False):
        print("Error: API reported failure.", file=sys.stderr)
        errors = data.get("errors") or data.get("messages")
        if errors:
            print("Details:", errors, file=sys.stderr)
        sys.exit(1)

    if "result" not in data:
        print("Error: 'result' field not found in JSON.", file=sys.stderr)
        sys.exit(1)

    # Extract and write the Markdown
    markdown_content = data["result"]
    try:
        with OUTPUT_MD.open("w", encoding="utf-8") as md_file:
            md_file.write(markdown_content)
    except OSError as e:
        print(f"Error: Could not write to '{OUTPUT_MD}': {e}", file=sys.stderr)
        sys.exit(1)

    print(f"Success: Markdown content written to '{OUTPUT_MD}'.")


if __name__ == "__main__":
    main()
```

### Usage

1. Ensure you have run the `curl` command to produce `autorag-full-response.json`.

2. Place `extract_markdown.py` in the same directory.

3. Run:

```
python3 extract_markdown.py
```

After execution, `autorag-blog.md` will contain the extracted Markdown.

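If you want to reuse the extraction logic without going through files, for example to chain it directly after an HTTP call, the validation can be written as a pure function. This is a sketch under the assumption that the response has the `success` and `result` fields shown in step 3; the function name `extract_markdown` is hypothetical and is not part of the script above.

```python
import json


def extract_markdown(raw_response: str) -> str:
    """Return the Markdown held in the 'result' field, or raise ValueError."""
    # json.JSONDecodeError subclasses ValueError, so malformed JSON also raises ValueError.
    data = json.loads(raw_response)
    if not isinstance(data, dict) or not data.get("success", False):
        raise ValueError(f"API reported failure: {data!r}")
    result = data.get("result")
    if not isinstance(result, str):
        raise ValueError("'result' field is missing or is not a string")
    return result
```

Raising an exception instead of calling `sys.exit` leaves the decision about how to report the error to the caller, which also makes the function straightforward to test.
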
## Final folder structure

After following these steps, your working folder will look like:

```
.
├── autorag-full-response.json   # Full API response
├── autorag-no-assets.json       # Full API response without extra assets (optional)
├── autorag-blog.md              # Extracted Markdown content
└── extract_markdown.py          # Python extraction script (optional)
```
Question about this tutorial..... if we're already running a Python script, couldn't we just have steps 2-4 included in the Python script as well?
Seems like it would simplify things a lot.
I am not sure how to do that. Maybe you could guide me in the right direction?
Python has a `requests` library, which could be used to make the cURL request.
I suppose the broader question is also.... why Python here?
If we're using the REST API, all of the other examples are using the TypeScript SDK... so why would we change languages from what we provide in the rest of the docs?
I am a little confused too..
if our end goal is to generate an md file for a blog post, we could perhaps do it in a worker?
And if that is the case, it would make more sense to use BR bindings (instead of the REST API). We could do all the BR work in the worker and finally return the md file as the worker response.
If it's not possible to do it in a worker, instead of python, we should do it as a node script with the typescript SDK or directly calling the REST API through fetch. Lemme know if you need any help figuring this out
@omarmosid I specifically did not do it with Workers because we do not have any guide on using our REST API endpoints. The goal is to show how to use our REST API endpoints.
Please assist in doing it through the fetch API.
Something like this? (using node)
If it's something that fundamentally doesn't make sense to do via the REST API, shouldn't we be looking for another use case?
We don't want to promote an inefficient approach to a problem, even if it's illustrative.