OneFileLLM is a command-line tool designed to streamline the creation of information-dense prompts for large language models (LLMs). It aggregates and preprocesses data from a variety of sources, compiling them into a single text file that is automatically copied to your clipboard for quick use.
- Automatic source type detection based on provided path, URL, or identifier
- Support for local files and/or directories, GitHub repositories, GitHub pull requests, GitHub issues, academic papers from ArXiv, YouTube transcripts, web page documentation, and Sci-Hub hosted papers via DOI or PMID
- Handling of multiple file formats, including Jupyter Notebooks (.ipynb), PDFs, and Excel files (.xls/.xlsx)
- Advanced asynchronous web crawling with extensive configuration options for depth, concurrency, content filtering, and robots.txt compliance
- Integration with Sci-Hub for automatic downloading of research papers using DOIs or PMIDs
- Text preprocessing, including compressed and uncompressed outputs, stopword removal, and lowercase conversion
- Automatic copying of uncompressed text to the clipboard for easy pasting into LLMs
- Token count reporting for both compressed and uncompressed outputs
- XML encapsulation of output for improved LLM performance
- Text stream input processing directly from stdin or clipboard
- Format detection and processing for plain text, Markdown, JSON, HTML, and YAML
- Format override option to control input processing
- Excel spreadsheet (.xls/.xlsx) processing to Markdown tables
- Alias system for frequently used sources
- Proper PDF text extraction from local files
- Cross-platform launcher scripts for easy execution
Install the required dependencies:
pip install -U -r requirements.txt
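Optionally, install into an isolated virtual environment first (standard Python tooling, not specific to OneFileLLM):
# optional: create and activate a virtual environment before installing
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -U -r requirements.txt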
To access private GitHub repositories, generate a personal access token as described in the 'Obtaining a GitHub Personal Access Token' section below.
Clone the repository or download the source code.
Run the script using the following command:
python onefilellm.py
You can pass a single URL or path as a command line argument for faster processing:
python onefilellm.py https://github.com/jimmc414/1filellm
OneFileLLM now supports processing text directly from standard input (stdin) or the system clipboard:
Use the `-` argument to process text from standard input:
# Process text from a file via pipe
cat README.md | python onefilellm.py -
# Process output from another command
git diff | python onefilellm.py -
Use the `--clipboard` or `-c` argument to process text from the system clipboard:
# Copy text to clipboard first, then run:
python onefilellm.py --clipboard
# Or using the short form:
python onefilellm.py -c
OneFileLLM automatically detects the format of input text (plain text, Markdown, JSON, HTML, YAML) and processes it accordingly. You can override this detection with the `--format` or `-f` option:
# Force processing as JSON
cat data.txt | python onefilellm.py - --format json
# Force processing clipboard content as Markdown
python onefilellm.py --clipboard -f markdown
Supported format types: `text`, `markdown`, `json`, `html`, `yaml`, `doculing`, `markitdown`
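For example, to force clipboard content to be treated as YAML rather than relying on auto-detection (illustrative; it uses only the flags documented above):
# Force processing clipboard content as YAML
python onefilellm.py --clipboard --format yaml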
OneFileLLM supports processing multiple inputs at once. Simply provide multiple paths or URLs as command line arguments:
python onefilellm.py https://github.com/jimmc414/1filellm test_file1.txt test_file2.txt
When multiple inputs are provided, OneFileLLM will:
- Process each input separately according to its type
- Combine all outputs into a single XML document with the `<onefilellm_output>` root tag
- Save the combined output to `output.xml`
- Copy the content to your clipboard for immediate use with LLMs
OneFileLLM includes a powerful alias system that allows you to create shortcuts for frequently used commands with support for placeholders and advanced management:
Create, list, and remove aliases using the new alias management commands:
# Add or update an alias
python onefilellm.py --alias-add myrepo "https://github.com/user/repo"
# Add alias with flags and options
python onefilellm.py --alias-add deepcrawl "https://docs.example.com --crawl-max-depth 4 --crawl-include-pattern '/docs/'"
# List all aliases (core and user-defined)
python onefilellm.py --alias-list
# List only pre-shipped core aliases
python onefilellm.py --alias-list-core
# Remove a user-defined alias
python onefilellm.py --alias-remove myrepo
Aliases support a `{}` placeholder that gets replaced with your input:
# Create alias with placeholder
python onefilellm.py --alias-add github_search "https://github.com/search?q={}"
# Use with replacement value
python onefilellm.py github_search "onefilellm"
# Expands to: https://github.com/search?q=onefilellm
# Create complex alias with multiple flags
python onefilellm.py --alias-add crawl_site "{} --crawl-max-depth 3 --crawl-respect-robots"
# Use it
python onefilellm.py crawl_site "https://docs.python.org"
# Expands to: https://docs.python.org --crawl-max-depth 3 --crawl-respect-robots
OneFileLLM comes with useful pre-shipped aliases:
- `ofl_repo` - OneFileLLM GitHub repository
- `ofl_readme` - OneFileLLM README file
- `gh_search` - GitHub search with placeholder: `https://github.com/search?q={}`
- `arxiv_search` - ArXiv search with placeholder

Additional alias system details:

- User aliases override core aliases with the same name
- JSON storage: User aliases are stored in `~/.onefilellm_aliases/aliases.json` (see the example below)
- Robust error handling: Graceful handling of missing or corrupted alias files
- Validation: Alias names must be alphanumeric with optional hyphens/underscores
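For reference, a user alias file might look like this (the exact schema is an assumption for illustration; inspect your own `~/.onefilellm_aliases/aliases.json` for the authoritative format):
{
  "myrepo": "https://github.com/user/repo",
  "deepcrawl": "https://docs.example.com --crawl-max-depth 4 --crawl-include-pattern '/docs/'",
  "github_search": "https://github.com/search?q={}"
}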
Use aliases just like any other input:
# Use a simple alias
python onefilellm.py ofl_repo
# Use alias with placeholder
python onefilellm.py gh_search "python"
# Mix aliases with direct inputs
python onefilellm.py ofl_repo local_file.txt
# Combine multiple aliases and arguments
python onefilellm.py ofl_repo github_search "machine learning" --format markdown
OneFileLLM features a powerful asynchronous web crawler with extensive configuration options for precise control over content extraction:
# Basic web crawl (default: 3 levels deep, up to 1000 pages)
python onefilellm.py https://docs.example.com
# Custom depth and page limits
python onefilellm.py https://example.com --crawl-max-depth 5 --crawl-max-pages 200
# Include only specific URL patterns
python onefilellm.py https://docs.example.com --crawl-include-pattern "/docs/"
# Exclude specific patterns (CSS, JS, images)
python onefilellm.py https://example.com --crawl-exclude-pattern "\.(css|js|png|jpg|gif)$"
# Restrict crawling to paths under the start URL
python onefilellm.py https://example.com/docs --crawl-restrict-path
# Include images and code blocks
python onefilellm.py https://example.com --crawl-include-images
# Exclude code blocks from output
python onefilellm.py https://example.com --crawl-no-include-code
# Disable heading extraction
python onefilellm.py https://example.com --crawl-no-extract-headings
# Include PDF files in crawl
python onefilellm.py https://example.com --crawl-include-pdfs
# Follow external links (default: stay on same domain)
python onefilellm.py https://example.com --crawl-follow-links
# Disable readability cleaning (keep raw HTML structure)
python onefilellm.py https://example.com --crawl-no-clean-html
# Keep JavaScript and CSS code
python onefilellm.py https://example.com --crawl-no-strip-js --crawl-no-strip-css
# Keep HTML comments
python onefilellm.py https://example.com --crawl-no-strip-comments
# Custom user agent
python onefilellm.py https://example.com --crawl-user-agent "MyBot/1.0"
# Delay between requests (seconds)
python onefilellm.py https://example.com --crawl-delay 1.0
# Request timeout (seconds)
python onefilellm.py https://example.com --crawl-timeout 30
# Concurrent requests (default: 3)
python onefilellm.py https://example.com --crawl-concurrency 5
# Respect robots.txt (default: ignore for backward compatibility)
python onefilellm.py https://example.com --crawl-respect-robots
# Comprehensive crawl with custom settings
python onefilellm.py https://docs.example.com \
--crawl-max-depth 4 \
--crawl-max-pages 500 \
--crawl-include-pattern "/docs/|/api/" \
--crawl-exclude-pattern "\.(pdf|zip|exe)$" \
--crawl-include-images \
--crawl-delay 0.5 \
--crawl-concurrency 2 \
--crawl-respect-robots \
--crawl-restrict-path
| Option | Type | Default | Description |
|---|---|---|---|
| `--crawl-max-depth` | int | 3 | Maximum crawl depth from start URL |
| `--crawl-max-pages` | int | 1000 | Maximum number of pages to crawl |
| `--crawl-user-agent` | str | OneFileLLMCrawler/1.1 | User agent string for requests |
| `--crawl-delay` | float | 0.25 | Delay between requests in seconds |
| `--crawl-include-pattern` | str | None | Regex pattern for URLs to include |
| `--crawl-exclude-pattern` | str | None | Regex pattern for URLs to exclude |
| `--crawl-timeout` | int | 20 | Request timeout in seconds |
| `--crawl-include-images` | flag | False | Include image URLs in output |
| `--crawl-no-include-code` | flag | False | Exclude code blocks from output |
| `--crawl-no-extract-headings` | flag | False | Exclude heading extraction |
| `--crawl-follow-links` | flag | False | Follow links to external domains |
| `--crawl-no-clean-html` | flag | False | Disable readability cleaning |
| `--crawl-no-strip-js` | flag | False | Keep JavaScript code |
| `--crawl-no-strip-css` | flag | False | Keep CSS styles |
| `--crawl-no-strip-comments` | flag | False | Keep HTML comments |
| `--crawl-respect-robots` | flag | False | Respect robots.txt files |
| `--crawl-concurrency` | int | 3 | Number of concurrent requests |
| `--crawl-restrict-path` | flag | False | Restrict crawl to paths under start URL |
| `--crawl-no-include-pdfs` | flag | False | Skip PDF files during crawl |
| `--crawl-no-ignore-epubs` | flag | False | Include EPUB files in crawl |
The tool supports the following input options:
- Local file path (e.g., C:\documents\report.pdf)
- Local directory path (e.g., C:\projects\research) -> (files of selected filetypes segmented into one flat text file)
- GitHub repository URL (e.g., https://github.com/jimmc414/onefilellm) -> (Repo files of selected filetypes segmented into one flat text file)
- GitHub pull request URL (e.g., https://github.com/dear-github/dear-github/pull/102) -> (Pull request diff detail and comments and entire repository content concatenated into one flat text file)
- GitHub issue URL (e.g., https://github.com/isaacs/github/issues/1191) -> (Issue details, comments, and entire repository content concatenated into one flat text file)
- ArXiv paper URL (e.g., https://arxiv.org/abs/2401.14295) -> (Full paper PDF to text file)
- YouTube video URL (e.g., https://www.youtube.com/watch?v=KZ_NlnmPQYk) -> (Video transcript to text file)
- Webpage URL (e.g., https://llm.datasette.io/en/stable/) -> (To scrape pages to x depth in segmented text file)
- Sci-Hub Paper DOI (Digital Object Identifier of Sci-Hub hosted paper) (e.g., 10.1053/j.ajkd.2017.08.002) -> (Full Sci-Hub paper PDF to text file)
- Sci-Hub Paper PMID (PubMed Identifier of Sci-Hub hosted paper) (e.g., 29203127) -> (Full Sci-Hub paper PDF to text file)
- Standard input via pipe (e.g., `cat file.txt | python onefilellm.py -`)
- Clipboard content (e.g., `python onefilellm.py --clipboard`)
The tool supports the following input options, with their corresponding output actions. Note that the file extensions included from repositories are determined by the following section of code (applicable to repositories only):
allowed_extensions = ['.xyz', '.pdq', '.example']
The output for all options is encapsulated in LLM prompt-appropriate XML and automatically copied to the clipboard.
- Local file path
  - Example Input: `C:\documents\report.pdf`
  - Output: The contents of the PDF file are extracted and saved into a single text file.
- Local directory path
  - Example Input: `C:\projects\research`
  - Output: Files of selected file types within the directory are segmented and saved into a single flat text file.
- GitHub repository URL
  - Example Input: `https://github.com/jimmc414/onefilellm`
  - Output: Repository files of selected file types are segmented and saved into a single flat text file.
- GitHub pull request URL
  - Example Input: `https://github.com/dear-github/dear-github/pull/102`
  - Output: Pull request diff details, comments, and the entire repository content are concatenated into a single flat text file.
- GitHub issue URL
  - Example Input: `https://github.com/isaacs/github/issues/1191`
  - Output: Issue details, comments, and the entire repository content are concatenated into a single flat text file.
- ArXiv paper URL
  - Example Input: `https://arxiv.org/abs/2401.14295`
  - Output: The full paper PDF is converted into a text file.
- YouTube video URL
  - Example Input: `https://www.youtube.com/watch?v=KZ_NlnmPQYk`
  - Output: The video transcript is extracted and saved into a text file.
- Webpage URL
  - Example Input: `https://llm.datasette.io/en/stable/`
  - Output: The webpage content and linked pages up to a specified depth are scraped and segmented into a text file.
- Sci-Hub Paper DOI
  - Example Input: `10.1053/j.ajkd.2017.08.002`
  - Output: The full Sci-Hub paper PDF is converted into a text file.
- Sci-Hub Paper PMID
  - Example Input: `29203127`
  - Output: The full Sci-Hub paper PDF is converted into a text file.
- Standard Input
  - Example Input: `cat file.txt | python onefilellm.py -`
  - Output: The piped text content is processed according to its detected format (or format override).
- Clipboard
  - Example Input: `python onefilellm.py --clipboard`
  - Output: The clipboard text content is processed according to its detected format (or format override).
The script generates the following output files:
- `output.xml`: The full XML-structured output, automatically copied to the clipboard.
- `compressed_output.txt`: Cleaned and compressed text (when NLTK processing is enabled).
- `processed_urls.txt`: A list of all processed URLs during web crawling.
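If you want to check how much of a model's context window the aggregated output will consume, you can recount the tokens yourself with the tiktoken library (the same library the tool lists among its dependencies). A minimal sketch, assuming `output.xml` is in the current directory and that the `cl100k_base` encoding is an acceptable proxy for your target model:
# count_tokens.py - illustrative only; pick the encoding that matches your target model
import tiktoken

with open("output.xml", "r", encoding="utf-8") as f:
    text = f.read()

encoding = tiktoken.get_encoding("cl100k_base")
print(f"output.xml contains {len(encoding.encode(text))} tokens")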
To modify the allowed file types for repository processing, update the `allowed_extensions` list in the code:
allowed_extensions = ['.py', '.txt', '.js', '.rst', '.sh', '.md', '.pyx', '.html', '.yaml','.json', '.jsonl', '.ipynb', '.h', '.c', '.sql', '.csv']
Web crawling behavior is now controlled through command-line arguments rather than hardcoded values. You can configure:
- Crawl depth: Use `--crawl-max-depth N` (default: 3)
- Page limits: Use `--crawl-max-pages N` (default: 1000)
- URL filtering: Use `--crawl-include-pattern` and `--crawl-exclude-pattern`
- Content processing: Use flags like `--crawl-include-images`, `--crawl-no-include-code`
- Request behavior: Use `--crawl-delay`, `--crawl-timeout`, `--crawl-concurrency`
- Domain restrictions: Use `--crawl-follow-links`, `--crawl-restrict-path`
- Robots.txt compliance: Use `--crawl-respect-robots`
The tool supports environment variables for configuration:
- `GITHUB_TOKEN`: Set your GitHub personal access token for private repository access
- `RUN_INTEGRATION_TESTS`: Set to `true` to enable integration tests
- `RUN_SLOW_TESTS`: Set to `true` to enable slow tests
You can also use a `.env` file in the project root directory to set these variables:
# .env file
GITHUB_TOKEN=your_github_token_here
RUN_INTEGRATION_TESTS=false
RUN_SLOW_TESTS=false
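The changelog notes python-dotenv support for environment variable management. As a rough sketch of how such values are typically read (illustrative, not the tool's actual startup code):
# illustrative python-dotenv usage; not taken from onefilellm.py
import os
from dotenv import load_dotenv

load_dotenv()  # loads variables from a .env file in the current directory, if present
github_token = os.getenv("GITHUB_TOKEN")
run_integration_tests = os.getenv("RUN_INTEGRATION_TESTS", "false").lower() == "true"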
To access private GitHub repositories, you need a personal access token. Follow these steps:
- Log in to your GitHub account and go to Settings.
- Navigate to Developer settings > Personal access tokens.
- Click on "Generate new token" and provide a name.
- Select the necessary scopes (at least `repo` for private repositories).
- Click "Generate token" and copy the token value.
In the `onefilellm.py` script, replace `GITHUB_TOKEN` with your actual token or set it as an environment variable:
- For Windows:
  setx GITHUB_TOKEN "YourGitHubToken"
- For Linux:
  echo 'export GITHUB_TOKEN="YourGitHubToken"' >> ~/.bashrc
  source ~/.bashrc
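If you only need the token for the current shell session (rather than persisting it in ~/.bashrc), a plain export also works; this is standard shell usage, not specific to OneFileLLM:
export GITHUB_TOKEN="YourGitHubToken"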
All output is encapsulated in XML tags. This structure was implemented based on evaluations showing that LLMs perform better with prompts structured in XML. The general structure of the output is as follows:
<onefilellm_output>
<source type="[source_type]" [additional_attributes]>
<[content_type]>
[Extracted content]
</[content_type]>
</source>
</onefilellm_output>
For multiple inputs, each source is wrapped in its own `<source>` element:
<onefilellm_output>
<source type="[source_type_1]" [additional_attributes]>
<[content_type]>
[Extracted content 1]
</[content_type]>
</source>
<source type="[source_type_2]" [additional_attributes]>
<[content_type]>
[Extracted content 2]
</[content_type]>
</source>
<!-- Additional sources as needed -->
</onefilellm_output>
Where `[source_type]` could be one of: "github_repository", "github_pull_request", "github_issue", "arxiv_paper", "youtube_transcript", "web_documentation", "sci_hub_paper", "local_directory", "local_file", "stdin", or "clipboard".
This XML structure provides clear delineation of different content types and sources, improving the LLM's understanding and processing of the input.
The overall data flow is illustrated below:

+--------------------------------+
| External Services |
|--------------------------------|
| GitHub API | YouTube API |
| Sci-Hub | ArXiv |
+--------------------------------+
|
|
v
+----------------------+ +---------------------+ +----------------------+
| | | | | |
| User | | Command Line Tool | | External Libraries |
|----------------------| |---------------------| |----------------------|
| - Provides input URL |--------->| - Handles user input| | - Requests |
| - Provides text via | | - Detects source |<--------| - BeautifulSoup |
| pipe or clipboard | | type | | - PyPDF2 |
| - Receives text | | - Calls appropriate | | - Tiktoken |
| in clipboard |<---------| - processing modules| | - NLTK |
| | | - Preprocesses text | | - Nbformat |
+----------------------+ | - Generates output | | - Nbconvert |
| files | | - YouTube Transcript |
| - Copies text to | | API |
| clipboard | | - Pyperclip |
| - Reports token | | - Wget |
| count | | - Tqdm |
+---------------------+ | - Rich |
| | - PyYAML |
| +----------------------+
v
+---------------------+
| Source Type |
| Detection |
|---------------------|
| - Determines type |
| of source |
+---------------------+
|
v
+---------------------+
| Processing Modules |
|---------------------|
| - GitHub Repo Proc |
| - Local Dir Proc |
| - YouTube Transcript|
| Proc |
| - ArXiv PDF Proc |
| - Sci-Hub Paper Proc|
| - Webpage Crawling |
| Proc |
| - Text Stream Proc |
+---------------------+
|
v
+---------------------+
| Text Preprocessing |
|---------------------|
| - Stopword removal |
| - Lowercase |
| conversion |
| - Text cleaning |
+---------------------+
|
v
+---------------------+
| Output Generation |
|---------------------|
| - Compressed text |
| file output |
| - Uncompressed text |
| file output |
+---------------------+
|
v
+---------------------+
| Token Count |
| Reporting |
|---------------------|
| - Report token count|
| |
| - Copies text to |
| clipboard |
+---------------------+
- 2025-06-01:
  - Major Enhancement: Alias Management 2.0: Complete overhaul of the alias system
  - Added powerful new alias management with JSON-based storage (`~/.onefilellm_aliases/aliases.json`)
  - Implemented placeholder support using the `{}` token for dynamic command substitution
  - Added new CLI commands: `--alias-add`, `--alias-remove`, `--alias-list`, `--alias-list-core`
  - Created pre-shipped core aliases (ofl_repo, ofl_readme, gh_search, arxiv_search)
  - Added user alias precedence over core aliases for customization
  - Implemented early alias expansion before argument parsing for proper command-line integration
  - Enhanced alias validation and robust error handling for missing/corrupted files
  - Replaced old file-per-alias system with centralized JSON management
- 2025-05-30:
  - Major Enhancement: Implemented advanced asynchronous web crawler with extensive configuration options
  - Added 19 new command-line switches for web crawling control (`--crawl-*` options)
  - Integrated structured content extraction with support for headings, code blocks, tables, and images
  - Added robots.txt compliance, concurrent request control, and URL pattern filtering
  - Implemented python-dotenv support for environment variable management
  - Enhanced test suite with 57 comprehensive tests covering GitHub Issues/PRs, advanced web crawling, and multiple input processing
  - Converted core functionality to async/await architecture for improved performance
- 2025-05-14:
  - Added text stream input processing directly from stdin or clipboard
  - Implemented format detection for plain text, Markdown, JSON, HTML, and YAML
  - Added format override option with `--format TYPE` or `-f TYPE` flags
  - Updated help messages and error handling for stream processing
  - Added comprehensive test suite for stream processing features
- 2025-05-10:
  - Added Excel spreadsheet (.xls/.xlsx) processing with conversion to Markdown tables
  - Support for both local Excel files and Excel files via URL
  - Each sheet in an Excel workbook is converted to a separate Markdown table
  - Added intelligent header detection for tables with varying formats
- 2025-05-07:
  - Added alias management system for frequently used sources (legacy; replaced by Alias Management 2.0)
  - Fixed PDF text extraction for local PDF files
  - Changed root XML tag from `<combined_sources>` to `<onefilellm_output>`
  - Added cross-platform launcher scripts for Windows and Linux/macOS
  - Improved user feedback during alias operations
- 2025-05-03:
  - Added support for processing multiple inputs as command line arguments
  - Implemented functionality to combine multiple outputs into a cohesive XML document
  - Refactored code to improve modularity and reusability
  - Added test files to demonstrate multi-input capabilities
- 2025-01-20:
  - Added file and directory exclusion functionality to reduce context window usage
  - Added ability to exclude auto-generated files (*.pb.go, etc.)
  - Added ability to exclude mock files and test directories
  - Updated documentation with exclusion configuration instructions
- 2025-01-17:
  - Added ability to exclude specific directories from processing
  - Updated directory traversal logic to respect exclusion rules
- 2024-07-29:
  - Updated output format to encapsulate content in XML tags. This change was implemented due to evaluations showing that LLMs perform better with prompts structured in XML.
  - Added tests for GitHub issues and GitHub pull requests to improve robustness and reliability.
  - Updated various processing functions to return formatted content instead of writing directly to files, improving consistency and testability.
- 2024-05-17: Added ability to pass path or URL as command line argument.
- 2024-05-16: Updated text colors.
- 2024-05-11:
  - Updated requirements.txt.
  - Added Rich library to `onefilellm.py`.
- 2024-04-04:
  - Added support for GitHub PRs and issues.
  - Added tests for GitHub PRs and issues.
  - Added ability to concatenate a specific GitHub issue and its repository when a GitHub issue URL is passed.
  - Updated tests to include pull request changes.
  - Added ability to concatenate a pull request and its repository when a GitHub pull request URL is passed.
- 2024-04-03:
  - Included the ability to pull a complete GitHub pull request given the GitHub pull request URL.
  - Updated `onefilellm.py` to return an error when Sci-Hub is inaccessible or no document is found.
- File types: Modify the `allowed_extensions` list in the code to add or remove file types: `allowed_extensions = ['.py', '.txt', '.js', '.rst', '.sh', '.md', '.pyx', '.html', '.yaml', '.json', '.jsonl', '.ipynb', '.h', '.c', '.sql', '.csv']`
- File exclusion: Modify the `excluded_patterns` list to customize which files are filtered out
- Directory exclusion: Modify the `EXCLUDED_DIRS` list to customize which directories are skipped
- Crawl depth: Use the `--crawl-max-depth N` command-line option (default: 3)
- Page limits: Use the `--crawl-max-pages N` command-line option (default: 1000)
- URL filtering: Use `--crawl-include-pattern` and `--crawl-exclude-pattern` for precise control
- Content control: Use the various `--crawl-*` flags for images, code blocks, headings, etc.
- Performance: Use `--crawl-concurrency N` and `--crawl-delay X` for request management
- Compliance: Use `--crawl-respect-robots` to honor robots.txt files
- Token counts are displayed in the console for both output files
- Environment variables can be set via a `.env` file for easier configuration
- All web crawling configuration is now done via command-line arguments rather than code modification