Add LanguageTool Grammar Checking to Linting Pipeline
Problem Statement
The current linting configuration (Vale + write-good) catches style issues but does not detect grammar errors like subject-verb agreement, verb tense errors, or other grammatical mistakes.
Example: Line 6 in DOCS-FRONTMATTER.md contains two errors that are not caught:
    - This should throws an error
- Grammar error: "should throws" → should be "should throw" (a modal verb requires the base form)
- Formatting error: Inconsistent indentation (caught by remark-lint but only auto-fixed, not flagged)
Current Linting Tools Limitations
Vale with write-good detects:
- ✅ Passive voice, wordy phrases, weak adverbs
- ✅ Consistency with style guides
- ✅ Terminology and branding
Vale does NOT detect:
- ❌ Subject-verb agreement ("This should throws")
- ❌ Verb tense consistency
- ❌ Traditional grammar errors
- ❌ Pronoun-antecedent agreement
- ❌ Articles (a/an/the) misuse
Proposed Solution
Integrate LanguageTool for grammar checking, focusing primarily on GitHub Actions workflows rather than local Git hooks.
Why LanguageTool?
- ✅ Detects 25+ types of grammar errors
- ✅ Open source and free
- ✅ Can run locally (Docker) or as a service
- ✅ Highly accurate for technical writing
- ✅ Active development and community
Implementation Options
Option 1: GitHub Actions Integration (RECOMMENDED)
Best for: CI/CD pipelines, PR reviews, minimal local overhead
Implementation:
- Add LanguageTool action to .github/workflows/pr-grammar-check.yml:
name: Grammar Check
on:
  pull_request:
    paths:
      - 'content/**/*.md'
      - '*[A-Z]*.md'
      - '.github/**/*.md'
      - '.claude/**/*.md'
      - 'api-docs/README.md'
jobs:
  grammar-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start LanguageTool
        run: |
          docker run -d -p 8010:8010 \
            --name languagetool \
            erikvl87/languagetool:latest
          # Wait until the server answers; /v2/languages is a plain GET endpoint
          timeout 30 bash -c 'until curl -sf http://localhost:8010/v2/languages > /dev/null; do sleep 1; done'
      - name: Install dependencies
        run: sudo apt-get update && sudo apt-get install -y jq
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v44
        with:
          files: |
            content/**/*.md
            *[A-Z]*.md
            .github/**/*.md
            .claude/**/*.md
            api-docs/README.md
      - name: Check grammar
        if: steps.changed-files.outputs.any_changed == 'true'
        env:
          # Pass the file list through the environment instead of interpolating it into the script
          CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          # Unquoted on purpose: the action separates file names with spaces by default
          .ci/languagetool/check.sh $CHANGED_FILES
      - name: Cleanup
        if: always()
        run: docker stop languagetool
- Create the .ci/languagetool/check.sh script (see Option 2 for script details)
- Add the job to the existing PR validation workflow or create a separate workflow
Pros:
- ✅ No local development overhead
- ✅ Catches issues before merge
- ✅ Easy to skip/disable for contributors
- ✅ Centralized reporting in PR comments
- ✅ Can cache results for unchanged files
Cons:
- ⚠️ Slower feedback loop (only on PR)
- ⚠️ Adds ~30-60 seconds to CI runtime
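Option 1 lists centralized reporting on the PR as a benefit. One way to surface issues inline is to print GitHub Actions workflow commands (`::warning file=...,line=...,col=...::message`) from the check step. The sketch below is illustrative only: the function names are hypothetical, and it assumes each match carries the character `offset` and `message` fields returned by /v2/check. File names containing commas or colons would need extra escaping that is omitted here.

```javascript
// Convert a 0-based character offset into a 1-based line and column.
function offsetToPosition(text, offset) {
  const before = text.slice(0, offset);
  const line = before.split('\n').length;
  const col = offset - (before.lastIndexOf('\n') + 1) + 1;
  return { line, col };
}

// Escape the characters that workflow commands treat specially in message data.
function escapeData(message) {
  return message.replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
}

// Build one warning annotation for a single match (hypothetical helper).
function toAnnotation(file, text, match) {
  const { line, col } = offsetToPosition(text, match.offset);
  return `::warning file=${file},line=${line},col=${col}::${escapeData(match.message)}`;
}
```

Printing the returned string to standard output inside a workflow step should attach the warning to the matching line of the changed file.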
Option 2: Local Docker Integration
Best for: Catching issues early, pre-push validation
Implementation:
- Add to compose.yaml:
services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool-server
    ports:
      - "8010:8010"
    environment:
      - Java_Xms=512m
      - Java_Xmx=1g
    profiles:
      - lint
      - grammar
- Create .ci/languagetool/check.sh:
#!/bin/bash
# Check each file passed as an argument against a local LanguageTool server.
set -e
# Fail early if the server is not reachable
if ! curl -sf http://localhost:8010/v2/languages > /dev/null 2>&1; then
  echo "❌ LanguageTool not running. Start with: docker compose up -d languagetool"
  exit 1
fi
EXIT_CODE=0
for file in "$@"; do
  # Skip arguments that are not regular files (for example, deleted paths)
  [[ ! -f "$file" ]] && continue
  # text@FILE makes curl read and URL-encode the file itself, which avoids
  # passing large documents on the command line
  response=$(curl -s -X POST "http://localhost:8010/v2/check" \
    --data-urlencode "text@$file" \
    --data "language=en-US")
  match_count=$(echo "$response" | jq -r '.matches | length')
  if [[ "$match_count" -gt 0 ]]; then
    echo ""
    echo "📝 Grammar issues in $file:"
    # Print the message, context, and first suggestion for every match
    echo "$response" | jq -r '.matches[] |
      "  \(.message)\n  Context: \(.context.text)\n  Suggestion: \(.replacements[0].value // "N/A")"'
    echo ""
    EXIT_CODE=1
  fi
done
[[ $EXIT_CODE -eq 0 ]] && echo "✅ No grammar issues found"
exit $EXIT_CODE
- Add to lefthook.yml (pre-push recommended):
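pre-push:
  commands:
    grammar-check:
      tags: grammar
      glob: "{README.md,*[A-Z]*.md,.github/**/*.md,.claude/**/*.md,api-docs/README.md}"
      run: |
        docker compose up -d languagetool 2>/dev/null || true
        # Startup takes ~10-15 seconds, so poll instead of sleeping for a fixed time
        for i in $(seq 1 30); do curl -sf http://localhost:8010/v2/languages > /dev/null && break; sleep 1; done
        # {push_files} expands to the files being pushed; {staged_files} is typically empty in a pre-push hook
        .ci/languagetool/check.sh {push_files}
- Add to package.json: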
{
  "scripts": {
    "grammar:start": "docker compose up -d languagetool",
    "grammar:stop": "docker compose stop languagetool",
    "grammar:check": ".ci/languagetool/check.sh"
  }
}
Resource Requirements:
- Docker image: ~500MB
- Memory: 512MB-1GB RAM
- Startup: ~10-15 seconds
- Per-file check: ~1-2 seconds
Pros:
- ✅ Immediate feedback during development
- ✅ Runs locally (no external API)
- ✅ Privacy (content stays local)
- ✅ Can be skipped with --no-verify
Cons:
- ⚠️ Slower commits/pushes
- ⚠️ Requires LanguageTool service running
- ⚠️ 500MB Docker image overhead
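The check.sh script in Option 2 posts each file to the local /v2/check endpoint with curl. For contributors who prefer Node, roughly the same request could be sketched as follows. This assumes Node 18 or later (for the built-in `fetch`) and the local server on port 8010; `buildCheckBody` and `checkText` are hypothetical names, not part of any existing tooling.

```javascript
// Local LanguageTool endpoint, matching the Docker port mapping used in this issue.
const LT_URL = 'http://localhost:8010/v2/check';

// Build the form-encoded request body; URLSearchParams handles the encoding.
function buildCheckBody(text, language = 'en-US') {
  const body = new URLSearchParams();
  body.set('text', text);
  body.set('language', language);
  return body;
}

// Hypothetical helper: send one document and return the list of matches.
async function checkText(text, language = 'en-US') {
  const response = await fetch(LT_URL, {
    method: 'POST',
    body: buildCheckBody(text, language),
  });
  if (!response.ok) {
    throw new Error(`LanguageTool returned HTTP ${response.status}`);
  }
  const result = await response.json();
  return result.matches;
}
```

Calling `checkText('This should throws an error')` with the container running should resolve to an array of match objects like the ones check.sh formats with jq.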
Option 3: Node.js with Public API
Best for: Minimal setup, no Docker
Implementation:
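yarn add -D languagetool-api
// .ci/languagetool/check.js
// Uses top-level await, so run it as an ES module (set "type": "module" in package.json or rename it to check.mjs).
// The promise-based check() call is as proposed here; confirm it against the languagetool-api package before relying on it.
import { check } from 'languagetool-api';
import fs from 'fs';

const files = process.argv.slice(2);
let hasErrors = false;

for (const file of files) {
  // Skip paths that no longer exist (for example, deleted files)
  if (!fs.existsSync(file)) continue;
  const text = fs.readFileSync(file, 'utf8');
  const result = await check({
    text,
    language: 'en-US',
    apiUrl: 'https://api.languagetool.org/v2'
  });
  if (result.matches.length > 0) {
    console.log(`\n📝 Grammar issues in ${file}:`);
    result.matches.forEach(match => {
      console.log(`  ${match.message}`);
      console.log(`  Context: ${match.context.text}\n`);
    });
    hasErrors = true;
  }
}

// Exit non-zero so hooks and CI can treat grammar issues as failures
process.exit(hasErrors ? 1 : 0);
Pros: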
- ✅ No Docker required
- ✅ Fast startup
- ✅ Simple setup
Cons:
- ⚠️ Rate limits (20 requests/min on free tier)
- ⚠️ Network dependency
- ⚠️ Privacy concerns (sends content externally)
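The 20 requests/min limit noted above means a script that checks one file per request has to pace itself. A minimal sketch of even spacing, assuming one request per file; `minDelayMs`, `scheduleOffsets`, and `runPaced` are hypothetical helpers:

```javascript
// Minimum spacing, in milliseconds, between requests for a per-minute limit.
function minDelayMs(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}

// Planned start times (milliseconds from now) for a batch of requests.
function scheduleOffsets(count, requestsPerMinute = 20) {
  const delay = minDelayMs(requestsPerMinute);
  return Array.from({ length: count }, (_, i) => i * delay);
}

// Run an async task for each item, waiting between calls to respect the limit.
async function runPaced(items, task, requestsPerMinute = 20) {
  const delay = minDelayMs(requestsPerMinute);
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await new Promise((resolve) => setTimeout(resolve, delay));
    results.push(await task(items[i]));
  }
  return results;
}
```

At 20 requests per minute the spacing is 3 seconds, so a PR touching 40 files would spend roughly two minutes waiting. That cost is worth weighing against the Docker options.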
Option 4: Custom Vale Rules (Limited)
Best for: Quick fix for specific patterns
Create .ci/vale/styles/InfluxDataDocs/SubjectVerbAgreement.yml:
extends: existence
message: "Possible subject-verb agreement error: '%s'"
level: warning
ignorecase: false
tokens:
  - '\bshould\s+\w+s\b'
  - '\bwill\s+\w+s\b'
  - '\bcan\s+\w+s\b'
  - '\bmust\s+\w+s\b'
Pros:
- ✅ No additional dependencies
- ✅ Very fast
Cons:
- ⚠️ Limited coverage (only catches specific patterns)
- ⚠️ False positives
- ⚠️ Won't catch: "They was", "He don't", etc.
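The false-positive and coverage concerns are easy to reproduce by running the four token patterns over a few sentences. The snippet below uses JavaScript regular expressions purely for illustration. Vale uses its own regex engine, but these particular patterns should behave the same way. The `flagged` helper is hypothetical.

```javascript
// The four token patterns from the proposed Vale rule.
const patterns = [
  /\bshould\s+\w+s\b/,
  /\bwill\s+\w+s\b/,
  /\bcan\s+\w+s\b/,
  /\bmust\s+\w+s\b/,
];

// True when any pattern matches the sentence.
function flagged(sentence) {
  return patterns.some((pattern) => pattern.test(sentence));
}

const samples = [
  'This should throws an error', // real error, flagged
  'This should always work',     // correct, but "always" ends in "s"
  'You must pass a token',       // correct, but "pass" ends in "s"
  'They was late',               // real error, not flagged
];

for (const sentence of samples) {
  console.log(`${flagged(sentence) ? 'FLAG' : 'ok  '}  ${sentence}`);
}
```

Any base-form verb or adverb that happens to end in "s" triggers the rule, while errors that do not follow a modal verb pass through unnoticed.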
Comparison Matrix
| Solution | Grammar Detection | Setup | Performance | Maintenance | Privacy | CI/CD Ready |
|---|---|---|---|---|---|---|
| GitHub Actions (Option 1) | ✅ Excellent | Medium | Slow (~30-60s) | Low | ✅ Local | ✅ Yes |
| Local Docker (Option 2) | ✅ Excellent | Medium | Slow (~2-5s/file) | Low | ✅ Local | |
| Node.js API (Option 3) | ✅ Good | Low | Medium (~1s/file) | Low | ❌ External | ✅ Yes |
| Custom Vale (Option 4) | ⚠️ Limited | Low | Fast (<0.1s) | Medium | ✅ Local | ✅ Yes |
Recommended Implementation Plan
Phase 1: GitHub Actions Only (Recommended Starting Point)
- Add .github/workflows/pr-grammar-check.yml
- Create .ci/languagetool/check.sh script
- Test on a few PRs
- Gather feedback from contributors
Timeline: 1-2 hours
Impact: Catches grammar errors in PRs with zero local overhead
Phase 2: Local Integration (Optional)
- Add LanguageTool service to compose.yaml
- Add pre-push hook to lefthook.yml
- Document usage in DOCS-CONTRIBUTING.md
Timeline: 1 hour
Impact: Faster feedback for contributors who want it
Phase 3: Optimization (Future)
- Cache LanguageTool results for unchanged files
- Add PR comment integration for inline feedback
- Whitelist technical terms to reduce false positives
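For the whitelist item, one possible approach is to filter matches after the check rather than configuring the server. The sketch below assumes each match has a `context` object with `text`, `offset`, and `length`, as in the /v2/check JSON response. The term list and function names are hypothetical placeholders.

```javascript
// Illustrative allowlist of technical terms; the real list would live in the repo.
const ALLOWED_TERMS = new Set(['influxdb', 'lefthook', 'frontmatter']);

// Extract the flagged span from a match's context.
function flaggedText(match) {
  const { text, offset, length } = match.context;
  return text.slice(offset, offset + length);
}

// Drop matches whose flagged text is an allowed term (case-insensitive).
function filterMatches(matches, allowed = ALLOWED_TERMS) {
  return matches.filter((match) => !allowed.has(flaggedText(match).toLowerCase()));
}
```

Filtering on the client keeps the allowlist in version control next to the docs, at the cost of still paying for the server to produce matches that are later discarded.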
Test Case
After implementation, this should be caught:
# DOCS-FRONTMATTER.md:6
- This should throws an error
Expected output:
📝 Grammar issues in DOCS-FRONTMATTER.md:
The modal verb 'should' requires base form.
Context: This should throws an error
Suggestion: throw
Alternative: Document Current Limitations
If we decide NOT to implement LanguageTool, we should document Vale's limitations:
Update DOCS-TESTING.md:
### Style Linting vs Grammar Checking
**Vale checks:**
- ✅ Style (passive voice, wordiness, weak adverbs)
- ✅ Terminology and branding consistency
**Vale does NOT check:**
- ❌ Grammar (subject-verb agreement, tense, etc.)
**For grammar checking, use:**
- Grammarly (browser extension)
- Microsoft Editor (Word/browser)
- LanguageTool (VS Code extension)
Related Issues
- Vale configuration and style linting
- Pre-commit hook improvements
Next Steps
- Decide on implementation approach (GitHub Actions vs Local vs Both)
- Create test PR with intentional grammar errors
- Review false positive rate on existing documentation
- Document workflow in DOCS-CONTRIBUTING.md or DOCS-TESTING.md
- Gather contributor feedback after 2-3 weeks
Questions to Answer
- Should grammar checking be blocking (fail CI) or informational (comment only)?
- Should we check all Markdown files or only instruction/documentation files?
- What's the acceptable false positive rate before we need to add exceptions?
- Should we run grammar checks on every PR or only when content files change?
- Do we want inline PR comments or just a summary report?
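The blocking-versus-informational question does not have to be settled in the script itself. As a sketch, the checker could take its exit behavior from a setting, so the decision becomes a one-line change in the workflow. `exitCodeFor` and the `GRAMMAR_CHECK_MODE` variable are assumed names for illustration, not existing configuration.

```javascript
// Decide the process exit code from the number of issues found.
// "blocking" fails on any issue; "informational" always passes.
function exitCodeFor(matchCount, mode = 'blocking') {
  if (mode !== 'blocking' && mode !== 'informational') {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return mode === 'blocking' && matchCount > 0 ? 1 : 0;
}

// Assumed environment variable; defaults to blocking when unset.
const configuredMode = process.env.GRAMMAR_CHECK_MODE || 'blocking';
```

A script would end with something like `process.exit(exitCodeFor(totalMatches, configuredMode))`, where `totalMatches` is whatever count it has accumulated.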
Priority: Medium
Effort: Small (GitHub Actions) | Medium (Local Docker)
Impact: Improves documentation quality by catching grammar errors Vale cannot detect