---
title: "Automating Document Workflows: Batch Processing and Summarizing Files with GenAIScript"
date: 2025-05-26
authors: genaiscript
tags:
  - automation
  - batch-processing
  - document-summarization
  - genaiscript
  - workflows
group: automation
draft: true
description: Automatically discover, batch process, and summarize multiple documentation files efficiently using GenAIScript.
---

# Automating Document Workflows: Batch Processing and Summarizing Files with GenAIScript

Processing and summarizing a large set of documentation files by hand is tedious. With GenAIScript, you can automate the discovery, batch processing, and summarization of those files with a short script. 🚀 In this post, we'll walk through a GenAIScript that does exactly that, *explaining every step along the way*.

Let's dive in! 👇

---

## Introduction

Imagine you have a large repository of Markdown or MDX documentation files. You want to process them in small groups (batches), generate a concise AI summary for each file, and then combine the results into an overview. This script shows how to build that workflow with GenAIScript.

---

## Step-by-Step: How the Script Works

### 1. Script Metadata and Parameters

```typescript
script({
    title: "Automating Document Workflows: Batch Processing and Summarizing Files with GenAIScript",
    description:
        "Automatically discover, batch process, and summarize multiple documentation files efficiently using GenAIScript.",
    group: "automation",
    // Script parameters, exposed to the script as env.vars
    parameters: {
        fileGlob: {
            type: "string",
            description: "Glob pattern for files to process",
            default: "docs/src/content/docs/**/*.md*",
        },
        batchSize: {
            type: "number",
            description: "Number of files to process in each batch",
            default: 5,
        },
    },
    // Default input files, restricted to Markdown and MDX
    files: "docs/src/content/docs/**/*.md*",
    accept: ".md,.mdx",
})
```

- **Purpose & Visibility**: `title`, `description`, and `group` identify the script and help organize it when scripts appear in lists.
- **Parameters**: The script is configurable, and the values are available at runtime as `env.vars`.
  - `fileGlob` specifies which files to process using a glob pattern. The default pattern ends in `*.md*`, so it matches both `.md` and `.mdx` files.
  - `batchSize` determines how many files are processed per batch.
- **File Discovery**: `files` sets the default input files, and `accept` limits them to the `.md` and `.mdx` extensions.
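
To make the parameter handling concrete, here is a minimal sketch in plain TypeScript of how defaults like these could be applied and checked before the workflow starts. `resolveParams`, `WorkflowParams`, and `DEFAULTS` are hypothetical names used for illustration; they are not part of GenAIScript, and the sketch only assumes that overrides may arrive as strings or be missing.

```typescript
// Hypothetical helper, not a GenAIScript API: applies defaults and validates overrides.
interface WorkflowParams {
    fileGlob: string
    batchSize: number
}

const DEFAULTS: WorkflowParams = {
    fileGlob: "docs/src/content/docs/**/*.md*",
    batchSize: 5,
}

function resolveParams(vars: Record<string, unknown>): WorkflowParams {
    // Fall back to the default glob when the override is missing or blank
    const fileGlob =
        typeof vars.fileGlob === "string" && vars.fileGlob.trim() !== ""
            ? vars.fileGlob
            : DEFAULTS.fileGlob
    // Overrides may arrive as strings, so coerce before validating
    const rawSize =
        vars.batchSize === undefined ? DEFAULTS.batchSize : Number(vars.batchSize)
    if (!Number.isInteger(rawSize) || rawSize < 1) {
        throw new Error(
            `batchSize must be a positive integer, got ${String(vars.batchSize)}`
        )
    }
    return { fileGlob, batchSize: rawSize }
}
```

A guard like this mainly protects against overrides such as `batchSize: 0`, which would otherwise produce one-file batches without any warning.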

---

### 2. Batch Processing Helper

```typescript
// Yields consecutive groups of at most batchSize files
async function* batchFiles(files, batchSize) {
    let batch = []
    for (const file of files) {
        batch.push(file)
        if (batch.length >= batchSize) {
            yield batch
            batch = []
        }
    }
    // Emit the final, smaller batch if the total is not evenly divisible
    if (batch.length > 0) yield batch
}
```

- **`batchFiles` Generator**:
  - This async generator splits an array of files into chunks (batches) of at most `batchSize` files.
  - Each file is added to the current `batch`. When the batch reaches the desired size, the generator yields it and starts a new one.
  - After the loop, any remaining files (when the total is not evenly divisible by `batchSize`) are yielded as a smaller final batch.
  - 💡 Because the generator yields one batch at a time, the caller can finish a batch before the next one is produced. The file list itself is already in memory here, so the gain is in pacing the work rather than in memory use.
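
To see what the batching step produces, here is a small self-contained sketch in plain TypeScript. `inBatches` is an illustrative, synchronous and generic variant of the same idea (the name and the input validation are assumptions, not part of the script), applied to seven sample file names with a batch size of 3.

```typescript
// Illustrative synchronous variant of the batching idea (not from the script)
function* inBatches<T>(items: Iterable<T>, batchSize: number): Generator<T[]> {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError("batchSize must be a positive integer")
    }
    let batch: T[] = []
    for (const item of items) {
        batch.push(item)
        if (batch.length === batchSize) {
            yield batch
            batch = []
        }
    }
    // Emit the remainder when the input is not evenly divisible
    if (batch.length > 0) yield batch
}

const names = ["a.md", "b.md", "c.mdx", "d.md", "e.md", "f.mdx", "g.md"]
const batches = [...inBatches(names, 3)]
// batches: [["a.md", "b.md", "c.mdx"], ["d.md", "e.md", "f.mdx"], ["g.md"]]
```

The last batch holds the single leftover file, which is the "remaining files" case described above.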

---

### 3. Summarizing Files

```typescript
async function summarizeBatch(batch) {
    const summaries = []
    for (const filename of batch) {
        // Read the file from the workspace as { filename, content }
        const file = await workspace.readText(filename)
        // Run an inline prompt that summarizes this one file
        const { text: summary } = await runPrompt((ctx) => {
            ctx.def("FILE", file)
            ctx.$`Summarize the documentation file in FILE in 2-3 bullet points.`
        })
        summaries.push({ file: filename, summary })
    }
    return summaries
}
```

- **`summarizeBatch`**:
  - For each file in the batch:
    - `workspace.readText` reads the file content.
    - `runPrompt` runs an inline prompt against the model: `def` adds the file to the prompt as `FILE`, and the instruction asks for a summary in 2-3 bullet points.
    - The filename and its summary are stored as an object in the `summaries` list.
  - The result is a list of summary objects, one per file in the batch.
  - 📝 Keeping the summarization logic in its own function makes it easy to reuse and to test.
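
One reason to batch is that the files inside a batch can be summarized concurrently, with the batch size capping how many requests are in flight. The sketch below shows that variant in plain TypeScript. It is not the script's own implementation: `readFile` and `summarize` are hypothetical injected functions standing in for the file read and the model call, so the logic can be exercised with stubs.

```typescript
interface FileSummary {
    file: string
    summary: string
}

// Hypothetical dependencies, injected so the function can run without a model
interface SummarizerDeps {
    readFile: (filename: string) => Promise<string>
    summarize: (content: string) => Promise<string>
}

async function summarizeBatchConcurrently(
    batch: string[],
    deps: SummarizerDeps
): Promise<FileSummary[]> {
    // Start every file in the batch at once; allSettled keeps one failure
    // from discarding the other results
    const settled = await Promise.allSettled(
        batch.map(async (file) => {
            const content = await deps.readFile(file)
            return { file, summary: await deps.summarize(content) }
        })
    )
    // Results come back in input order; failed files get a placeholder summary
    return settled.map((result, i) =>
        result.status === "fulfilled"
            ? result.value
            : { file: batch[i], summary: `(failed: ${String(result.reason)})` }
    )
}
```

With this shape, a single unreadable file or failed model call shows up as a placeholder entry instead of aborting the whole batch. Whether concurrency is appropriate depends on the rate limits of the model provider.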

---

### 4. Main Workflow

```typescript
async function main({ fileGlob, batchSize }, trace) {
    trace.heading(1, `Batch Processing and Summarization for: ${fileGlob}`)
    // Discover matching files and cap the run at 100 files
    const found = await workspace.findFiles(fileGlob)
    const files = found.slice(0, 100).map((f) => f.filename)
    const allSummaries = []
    let batchIndex = 0
    for await (const batch of batchFiles(files, batchSize)) {
        trace.heading(2, `Batch ${++batchIndex}`)
        const summaries = await summarizeBatch(batch)
        allSummaries.push(...summaries)
        for (const { file, summary } of summaries) {
            trace.detailsFenced(file, summary, "markdown")
        }
    }
    trace.heading(1, "Combined Summary")
    for (const { file, summary } of allSummaries) {
        trace.item(`**${file}**: ${summary}`)
    }
    return allSummaries
}

// Entry point: pass the script parameters and the output trace
await main(env.vars, env.output)
```

Let's break this down:

- **`trace.heading(1, ...)`**: Adds a top-level heading to the output that shows which file glob is being processed.
- **`workspace.findFiles`**: Finds the files matching the glob pattern. The script keeps at most the first 100 matches; adjust the cap as needed.
- **Processing Batches**:
  - Uses `batchFiles` to iterate over the files in batches.
  - For each batch:
    - Adds a subheading for clarity.
    - Calls `summarizeBatch` to generate summaries.
    - Displays each file's summary in a collapsible, fenced Markdown block using `trace.detailsFenced`.
  - Summaries from each batch are accumulated in `allSummaries`.
- **Combined View**:
  - After all batches, a final heading and a list of all file summaries are printed.
- **Return Value**: `main` returns the full summary list for further processing or inspection.
- **Entry Point**: The last line runs the workflow, passing the parameters from `env.vars` and using `env.output` as the trace.
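
Since each summary is itself a short bullet list, the combined view reads better when the bullets are nested under the file name instead of being appended to it on one line. Here is a minimal sketch of that formatting step in plain TypeScript; `renderCombinedSummary` is a hypothetical helper for illustration and assumes summaries are newline-separated Markdown bullets.

```typescript
interface FileSummary {
    file: string
    summary: string
}

// Illustrative helper (not part of the script): renders the combined overview as Markdown
function renderCombinedSummary(summaries: FileSummary[]): string {
    const lines = ["# Combined Summary", ""]
    for (const { file, summary } of summaries) {
        lines.push(`- **${file}**`)
        // Indent each non-empty summary line so its bullets nest under the file name
        for (const line of summary.split("\n")) {
            if (line.trim() !== "") lines.push(`  ${line.trim()}`)
        }
    }
    return lines.join("\n")
}

const overview = renderCombinedSummary([
    { file: "docs/intro.md", summary: "- Explains setup\n- Lists prerequisites" },
])
```

The resulting string can be written to a file or appended to the output in one call, which keeps the rendering logic separate from the batch loop.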

---

## Wrapping Up 🎉

This GenAIScript demonstrates how you can:

- **Automatically discover** relevant files using glob patterns.
- **Batch-process** files to manage resource usage.
- **Generate concise summaries** of documentation files using built-in AI integration.
- **Aggregate and format** results for easy review.

If you process large documentation sets regularly, adapting this workflow can save hours of manual work and give you consistent overviews of your knowledge base.

Looking for more automation advice and script samples? Check out the [official documentation](https://microsoft.github.io/genaiscript/) or browse community samples in `packages/sample/src/**/*.genai.*js`.

Happy automating! ✨