Commit 30c850a

[genai] generated blog posts
pelikhan authored and github-actions[bot] committed
1 parent c6bad43 commit 30c850a

1 file changed: 179 additions & 0 deletions

@@ -0,0 +1,179 @@
---
title: "Automating Document Workflows: Batch Processing and Summarizing Files with GenAIScript"
date: 2025-05-26
authors: genaiscript
tags:
  - automation
  - batch-processing
  - document-summarization
  - genaiscript
  - workflows
group: automation
draft: true
description: Automatically discover, batch process, and summarize multiple documentation files efficiently using GenAIScript.

---

# Automating Document Workflows: Batch Processing and Summarizing Files with GenAIScript

Summarizing a large set of documentation files by hand is tedious. With GenAIScript, you can automate the discovery, batch processing, and summarization of those files in a short script! 🚀 In this post, we'll break down a GenAIScript that does exactly that, *explaining every step along the way*.

Let's dive in! 👇

---

## Introduction

Imagine you have a large repository of Markdown or MDX documentation files. You want to process them in small groups (batches), generate a concise AI summary for each file, and then combine the results into a single overview. This script shows how to build that workflow with GenAIScript's automation capabilities.

---

## Step-by-Step: How the Script Works

### 1. Script Metadata and Parameters

```typescript
title = "Automating Document Workflows: Batch Processing and Summarizing Files with GenAIScript"
description = "Automatically discover, batch process, and summarize multiple documentation files efficiently using GenAIScript."
group = "automation"
```

- **Purpose & Visibility**: These lines define the script's title, description, and documentation group, which help organize scripts when they appear in lists or dashboards.

---

```typescript
parameters = {
    fileGlob: {
        type: "string",
        description: "Glob pattern for files to process",
        default: "docs/src/content/docs/**/*.md*"
    },
    batchSize: {
        type: "number",
        description: "Number of files to process in each batch",
        default: 5
    }
}
```

- **Parameters**: The script is configurable!
  - `fileGlob` specifies which files to process using a glob pattern. The default, `docs/src/content/docs/**/*.md*`, matches `.md` and `.mdx` files at any depth under that directory.
  - `batchSize` determines how many files to process in each batch (default `5`).

---

```typescript
files = fileGlob
accept = ".md,.mdx"
```

- **File Discovery**: These lines tell GenAIScript to consider only files that match the `fileGlob` pattern and have a `.md` or `.mdx` extension.
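
To see why the extension list matters alongside the glob, here is a minimal sketch of an extension filter. The helper `filterByAccept` is hypothetical and is not part of GenAIScript; it only illustrates the idea that a comma-separated list such as `.md,.mdx` narrows a set of candidate files. GenAIScript's own matching rules (for example, case sensitivity) may differ.

```typescript
// Hypothetical helper, for illustration only: keep the filenames whose
// extension appears in a comma-separated accept list such as ".md,.mdx".
function filterByAccept(filenames: readonly string[], accept: string): string[] {
    const extensions = accept
        .split(",")
        .map((ext) => ext.trim().toLowerCase())
        .filter((ext) => ext.length > 0)
    // Assumption: matching is case-insensitive and based on the filename suffix.
    return filenames.filter((name) =>
        extensions.some((ext) => name.toLowerCase().endsWith(ext))
    )
}

// A glob like "**/*.md*" would also match "notes.md.bak"; the accept list drops it.
const candidates = ["guide.md", "index.mdx", "notes.md.bak", "logo.png"]
console.log(filterByAccept(candidates, ".md,.mdx")) // keeps "guide.md" and "index.mdx"
```

The glob decides where to look, while the accept list acts as a second, stricter check on the file type.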

---

### 2. Batch Processing Helper

```typescript
// Yields consecutive groups of at most `batchSize` files.
async function* batchFiles(files, batchSize) {
    let batch = []
    for (const file of files) {
        batch.push(file)
        if (batch.length >= batchSize) {
            yield batch
            batch = [] // start a fresh batch
        }
    }
    // Emit the leftover files, if any
    if (batch.length > 0) yield batch
}
```

- **`batchFiles` Generator**:
  - This async generator splits an array of files into chunks (batches) of at most `batchSize` files.
  - Each file is added to the current `batch`. When the batch reaches `batchSize`, the generator yields it and starts a new one.
  - After the loop, any remaining files (when the total is not evenly divisible by `batchSize`) are yielded as a final, smaller batch.
  - 💡 Because batches are yielded one at a time, the caller can finish each batch before the next one is built, which helps keep large jobs *memory-efficient*.
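
The batching behavior described above can be tried outside GenAIScript with a small standalone sketch. The `chunk` function below is a hypothetical, synchronous variant of `batchFiles` (nothing in the batching step needs to be awaited), with an added guard against a non-positive batch size, which the original helper does not check.

```typescript
// Hypothetical synchronous variant of the batching helper, for illustration.
// Yields consecutive slices of at most `size` items.
function* chunk<T>(items: readonly T[], size: number): Generator<T[]> {
    // Added guard (not in the original helper): reject sizes below 1.
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError(`size must be a positive integer, got ${size}`)
    }
    for (let start = 0; start < items.length; start += size) {
        yield items.slice(start, start + size)
    }
}

// Five files in batches of two: the last batch holds the single leftover file.
const sampleFiles = ["a.md", "b.md", "c.mdx", "d.md", "e.md"]
for (const batch of chunk(sampleFiles, 2)) {
    console.log(batch)
}
```

Slicing by index gives the same batches as the push-and-reset loop; which form to use is mostly a matter of taste.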

---

### 3. Summarizing Files

```typescript
async function summarizeBatch(batch) {
    const summaries = []
    for (const file of batch) {
        const content = await fs_read_file({ filename: file })
        const summary = await ai("Summarize the following documentation file in 2-3 bullet points:", content)
        summaries.push({ file, summary })
    }
    return summaries
}
```

- **`summarizeBatch`**:
  - For each file in the batch:
    - `fs_read_file` reads the file content.
    - `ai()` sends the instruction and the file content to the model, which returns a summary of 2-3 bullet points.
    - The filename and its summary are stored as an object in the `summaries` list.
  - The result: a list of summary objects, one per file in the batch, in the same order as the input.
  - 📝 Keeping the summarization logic in its own function makes it reusable and easy to follow.
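
One way to exercise this logic without a model or a real file system is to pass the two dependencies in as arguments. The sketch below is an assumption-laden illustration: `summarizeBatchWith`, `ReadFile`, `Summarize`, and the stubs are all hypothetical names introduced here, not GenAIScript APIs. Only the sequential read-then-summarize loop mirrors the function above.

```typescript
// Hypothetical stand-ins for the file reader and the model call.
type ReadFile = (filename: string) => Promise<string>
type Summarize = (instruction: string, content: string) => Promise<string>

interface FileSummary {
    file: string
    summary: string
}

// Same loop as summarizeBatch, with the dependencies injected.
async function summarizeBatchWith(
    batch: readonly string[],
    readFile: ReadFile,
    summarize: Summarize
): Promise<FileSummary[]> {
    const summaries: FileSummary[] = []
    for (const file of batch) {
        // Files are handled one at a time, so results keep the input order.
        const content = await readFile(file)
        const summary = await summarize(
            "Summarize the following documentation file in 2-3 bullet points:",
            content
        )
        summaries.push({ file, summary })
    }
    return summaries
}

// Stubs for a dry run: an in-memory "file system" and a fake summarizer
// that simply echoes the first line of each file.
const fakeFiles: Record<string, string> = {
    "intro.md": "# Intro\nWelcome to the docs.",
    "setup.mdx": "# Setup\nInstall and configure.",
}
const fakeRead: ReadFile = async (filename) => fakeFiles[filename] ?? ""
const fakeSummarize: Summarize = async (_instruction, content) =>
    `- ${content.split("\n")[0]}`

summarizeBatchWith(["intro.md", "setup.mdx"], fakeRead, fakeSummarize).then(
    (result) => console.log(result)
)
```

Injecting the reader and summarizer keeps the loop testable and makes it easy to swap in a different model call later.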

---

### 4. Main Workflow

```typescript
async function main({ fileGlob, batchSize }, trace) {
    trace.heading(1, `Batch Processing and Summarization for: ${fileGlob}`)
    const files = (await fs_find_files({ glob: fileGlob, count: 100 })).map(f => f.filename)
    let allSummaries = []
    let batchIndex = 0
    for await (const batch of batchFiles(files, batchSize)) {
        trace.heading(2, `Batch ${++batchIndex}`)
        const summaries = await summarizeBatch(batch)
        allSummaries.push(...summaries)
        for (const { file, summary } of summaries) {
            trace.detailsFenced(file, summary, "markdown")
        }
    }
    trace.heading(1, "Combined Summary")
    for (const { file, summary } of allSummaries) {
        trace.item(`- **${file}**: ${summary}`)
    }
    return allSummaries
}
```

Let's break this down:

- **trace.heading(1, ...)**: Adds a top-level heading to the output that shows which file glob is being processed.
- **fs_find_files**: Finds files matching the glob pattern. The `count: 100` argument caps the result at 100 files (adjust as needed).
- **Processing Batches**:
  - Uses `batchFiles` to iterate over the files in batches.
  - For each batch:
    - Adds a numbered subheading (`Batch 1`, `Batch 2`, and so on) for clarity.
    - Calls `summarizeBatch` to generate summaries.
    - Displays each file's summary using `trace.detailsFenced`, formatted as markdown.
  - Summaries from each batch are accumulated in `allSummaries`.
- **Combined View**:
  - After all batches, a final heading and a markdown-formatted list of all file summaries are written to the trace.
- **Return Value**: The full summary list is returned for further processing or inspection.
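
The order of the trace calls can be checked with a dry run. The following is a simplified sketch under stated assumptions: `TraceLike`, `RecordingTrace`, and `runWorkflow` are hypothetical names, the recorder only logs which call was made (it does not reproduce GenAIScript's real trace output), and the per-file summarizer is a stub passed in by the caller.

```typescript
// Hypothetical interface covering only the three trace calls used by main.
interface TraceLike {
    heading(level: number, text: string): void
    detailsFenced(title: string, body: string, contentType?: string): void
    item(text: string): void
}

interface Summary {
    file: string
    summary: string
}

// Records each call as a line of text so the call order can be inspected.
class RecordingTrace implements TraceLike {
    readonly calls: string[] = []
    heading(level: number, text: string): void {
        this.calls.push(`heading(${level}): ${text}`)
    }
    detailsFenced(title: string, _body: string, _contentType?: string): void {
        this.calls.push(`detailsFenced: ${title}`)
    }
    item(text: string): void {
        this.calls.push(`item: ${text}`)
    }
}

// Simplified version of the main loop: batch, summarize, trace, combine.
async function runWorkflow(
    files: readonly string[],
    batchSize: number,
    summarizeOne: (file: string) => Promise<string>,
    trace: TraceLike
): Promise<Summary[]> {
    // Added guard: a batch size below 1 would never advance the loop.
    if (batchSize < 1) throw new RangeError("batchSize must be at least 1")
    const all: Summary[] = []
    let batchIndex = 0
    for (let start = 0; start < files.length; start += batchSize) {
        const batch = files.slice(start, start + batchSize)
        trace.heading(2, `Batch ${++batchIndex}`)
        for (const file of batch) {
            const summary = await summarizeOne(file)
            all.push({ file, summary })
            trace.detailsFenced(file, summary, "markdown")
        }
    }
    trace.heading(1, "Combined Summary")
    for (const { file, summary } of all) {
        trace.item(`- **${file}**: ${summary}`)
    }
    return all
}

const recorder = new RecordingTrace()
runWorkflow(["a.md", "b.md", "c.md"], 2, async (f) => `summary of ${f}`, recorder)
    .then(() => console.log(recorder.calls.join("\n")))
```

With three files and a batch size of two, the log shows two batch headings, three per-file entries, and then the combined list.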

---

## Wrapping Up 🎉

This GenAIScript demonstrates how you can:

- **Automatically discover** relevant files using glob patterns.
- **Batch-process** files to manage resource usage.
- **Generate concise summaries** for documentation using built-in AI integration.
- **Aggregate and format** results for easy review.

If you regularly process large documentation sets, customizing this workflow can save hours of manual review and give you consistent overviews of your knowledge base.

Looking for more automation advice and script samples? Check out the [official documentation](https://microsoft.github.io/genaiscript/) or browse community samples in `packages/sample/src/**/*.genai.*js`.

Happy automating! ✨
