SummarizeHub

A browser-based article summarization tool that uses WebLLM to run open-source language models directly in your browser. Upload articles in bulk via CSV/XLSX and get AI-generated summaries and tags without sending your data to external servers.

Features

🚀 Browser-based LLM processing - no server required
📊 Bulk article processing via CSV/XLSX uploads
🔒 Privacy-focused - all processing happens locally
📝 Generates concise summaries and relevant tags
📋 Interactive data grid for easy content management
💾 Export results in XLSX or CSV format

Installation

Note: the app is currently deployed at https://summarize-hub.vercel.app/

Clone the repository:

git clone https://github.com/scottguthart/summarize-hub.git
cd summarize-hub

Install dependencies:

npm install

Run the development server:

npm run dev

Open http://localhost:3000 in your browser

How It Works

Summarization Approach

SummarizeHub uses @mlc-ai/web-llm to run open-source language models directly in your browser. Currently, it uses the Gemma 2B-IT model (quantized to 4-bit for efficiency) for generating summaries and tags.

Key benefits of this approach:

Privacy: All processing happens locally in your browser
No API Keys: No need for API keys or usage limits
Free: No ongoing costs for API calls
Offline Capable: Once the model is downloaded, it can work offline

Model Details

Model: gemma-2-2b-it-q4f16_1-MLC
Size: ~1.2GB (downloaded once and cached)
Capabilities: Instruction-tuned for better task understanding
Format: 4-bit quantized for browser efficiency

Usage

Click "Download Template" to get the article template
Fill in your articles in the template (maximum 250 articles per batch)
Click "Load File" to upload your filled template
Click "Process Articles" to process all articles
Export the results using "Export XLSX" or "Export CSV"

Limitations

Browser-Based Processing:
- Models are downloaded and stored in the browser
- Not ideal for users with limited bandwidth or storage
- Processing speed depends on client hardware
- Browser storage limits may affect model caching
Article Processing:
- Maximum article length is 30,000 characters (~4,000 words or 10 single-spaced pages)
- Articles are automatically chunked for efficient processing
- Long articles are split into smaller segments at sentence boundaries
- Each segment is summarized independently and then combined
- Processing time increases with article length
- Articles exceeding the length limit will be marked with an error
Batch Processing:
- Maximum 250 articles per batch for optimal performance
- 10MB file size limit
- Larger datasets should be split into multiple batches
- Processing time varies based on article length and complexity
Model Size:
- Initial model download is ~1.2GB
- Requires stable internet connection for first use
- Model is cached in browser for subsequent use

Planned Upgrades

Short Term

Support for multiple models
- Allow switching between different models
- Compare summaries from different models
- Model performance metrics
Enhanced Data Validation
- Column data type validation
- Required field checks
- Custom validation rules
Flexible File Uploads
- Support more file formats
- Less rigid column requirements
- Custom column mapping

Long Term

Backend Integration
- User authentication
- Database storage
- API endpoints for programmatic access
Advanced Features
- Concurrent summarization
- Custom summary instructions
- Model cache management
- Progress persistence
- Batch processing controls

Infrastructure

Server-Side Processing Option
- Offload processing from client
- Support for larger models
- Better handling of large datasets
Progressive Web App
- Offline support
- Background processing
- Push notifications

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
app		app
lib		lib
public		public
types		types
.gitignore		.gitignore
2_articles.json		2_articles.json
README.md		README.md
Untitled.png		Untitled.png
next.config.js		next.config.js
next.config.ts		next.config.ts
package-lock.json		package-lock.json
package.json		package.json
postcss.config.mjs		postcss.config.mjs
test_articles.csv		test_articles.csv
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SummarizeHub

Features

Installation

How It Works

Summarization Approach

Model Details

Usage

Limitations

Planned Upgrades

Short Term

Long Term

Infrastructure

About

Uh oh!

Releases

Packages

Uh oh!

Languages

ScottGuthart/summarize-hub

Folders and files

Latest commit

History

Repository files navigation

SummarizeHub

Features

Installation

How It Works

Summarization Approach

Model Details

Usage

Limitations

Planned Upgrades

Short Term

Long Term

Infrastructure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages