Proposal: add grammar checking to linting and CI/CD pipeline #6480

@jstirnaman

Add LanguageTool Grammar Checking to Linting Pipeline

Problem Statement

The current linting configuration (Vale + write-good) catches style issues but does not detect grammar errors such as subject-verb disagreement, incorrect verb forms, or inconsistent tense.

Example: Line 6 in DOCS-FRONTMATTER.md contains two errors that are not caught:

   - This should throws an error
  1. Grammar error: "should throws" → should be "should throw" (modal verb requires base form)
  2. Formatting error: Inconsistent indentation (caught by remark-lint but only auto-fixed, not flagged)

Current Linting Tools Limitations

Vale with write-good detects:

  • ✅ Passive voice, wordy phrases, weak adverbs
  • ✅ Consistency with style guides
  • ✅ Terminology and branding

Vale does NOT detect:

  • ❌ Subject-verb agreement and verb-form errors ("This should throws")
  • ❌ Verb tense consistency
  • ❌ Traditional grammar errors
  • ❌ Pronoun-antecedent agreement
  • ❌ Articles (a/an/the) misuse

Proposed Solution

Integrate LanguageTool for grammar checking, with primary focus on GitHub Actions workflows rather than local pre-commit hooks.

Why LanguageTool?

  • ✅ Detects 25+ types of grammar errors
  • ✅ Open source and free
  • ✅ Can run locally (Docker) or as a service
  • ✅ Highly accurate for technical writing
  • ✅ Active development and community

Implementation Options

Option 1: GitHub Actions Integration (RECOMMENDED)

Best for: CI/CD pipelines, PR reviews, minimal local overhead

Implementation:

  1. Add LanguageTool action to .github/workflows/pr-grammar-check.yml:
name: Grammar Check

on:
  pull_request:
    paths:
      - 'content/**/*.md'
      - '*[A-Z]*.md'
      - '.github/**/*.md'
      - '.claude/**/*.md'
      - 'api-docs/README.md'

jobs:
  grammar-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start LanguageTool
        run: |
          docker run -d -p 8010:8010 \
            --name languagetool \
            erikvl87/languagetool:latest

          # Wait for service to be ready
          timeout 30 bash -c 'until curl -s http://localhost:8010/v2/check > /dev/null; do sleep 1; done'

      - name: Install dependencies
        run: sudo apt-get install -y jq

      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v44
        with:
          files: |
            content/**/*.md
            *[A-Z]*.md
            .github/**/*.md
            .claude/**/*.md
            api-docs/README.md

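      - name: Check grammar
        if: steps.changed-files.outputs.any_changed == 'true'
        env:
          # Pass file names through the environment rather than interpolating them into the script
          CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          .ci/languagetool/check.sh $CHANGED_FILES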

      - name: Cleanup
        if: always()
        run: docker stop languagetool
  2. Create .ci/languagetool/check.sh script (see Option 2 for script details)

  3. Add job to existing PR validation workflow or create separate workflow

Pros:

  • ✅ No local development overhead
  • ✅ Catches issues before merge
  • ✅ Easy to skip/disable for contributors
  • ✅ Centralized reporting in the PR checks (PR comment integration is listed under Phase 3)
  • ✅ Can cache results for unchanged files

Cons:

  • ⚠️ Slower feedback loop (only on PR)
  • ⚠️ Adds ~30-60 seconds to CI runtime
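
One way to make the CI results easier to read in a PR is to print GitHub Actions `::warning` workflow commands, which show up as annotations on the changed files. The following is a minimal sketch, assuming each LanguageTool match carries an `offset` (a 0-based character index into the checked text) and a `message`. The function names `offsetToPosition`, `escapeData`, and `toAnnotations` are hypothetical and not part of the proposed script.

```javascript
// Translate a 0-based character offset into a 1-based line and column.
function offsetToPosition(text, offset) {
  const before = text.slice(0, offset);
  const line = before.split('\n').length;
  const col = offset - (before.lastIndexOf('\n') + 1) + 1;
  return { line, col };
}

// Escape the message part of a workflow command (%, CR, LF).
function escapeData(value) {
  return value.replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
}

// Build one "::warning" line per match. File paths that contain
// ',' or ':' would need extra escaping, which this sketch skips.
function toAnnotations(file, text, matches) {
  return matches.map((match) => {
    const { line, col } = offsetToPosition(text, match.offset);
    return `::warning file=${file},line=${line},col=${col}::${escapeData(match.message)}`;
  });
}
```

Printing each returned line to stdout from the check step should be enough for GitHub to attach the warning to the file and line, without any extra permissions for posting comments.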

Option 2: Local Docker Integration

Best for: Catching issues early, pre-push validation

Implementation:

  1. Add to compose.yaml:
services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool-server
    ports:
      - "8010:8010"
    environment:
      - Java_Xms=512m
      - Java_Xmx=1g
    profiles:
      - lint
      - grammar
  2. Create .ci/languagetool/check.sh:
#!/bin/bash
set -e

if ! curl -s http://localhost:8010/v2/check > /dev/null 2>&1; then
    echo "❌ LanguageTool not running. Start with: docker compose up -d languagetool"
    exit 1
fi

EXIT_CODE=0

for file in "$@"; do
    [[ ! -f "$file" ]] && continue

    text=$(cat "$file")
    response=$(curl -s -X POST "http://localhost:8010/v2/check" \
        --data-urlencode "text=$text" \
        --data "language=en-US")

    match_count=$(echo "$response" | jq -r '.matches | length')

    if [[ "$match_count" -gt 0 ]]; then
        echo ""
        echo "📝 Grammar issues in $file:"
        echo "$response" | jq -r '.matches[] |
            "  \(.message)\n    Context: \(.context.text)\n    Suggestion: \(.replacements[0].value // "N/A")"'
        echo ""
        EXIT_CODE=1
    fi
done

[[ $EXIT_CODE -eq 0 ]] && echo "✅ No grammar issues found"
exit $EXIT_CODE
  3. Add to lefthook.yml (pre-push recommended):
pre-push:
  commands:
    grammar-check:
      tags: grammar
      glob: "{README.md,*[A-Z]*.md,.github/**/*.md,.claude/**/*.md,api-docs/README.md}"
      run: |
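        docker compose up -d languagetool 2>/dev/null || true
        # Startup takes ~10-15 seconds, so poll for up to 30 seconds instead of a fixed sleep
        for i in $(seq 1 30); do curl -s http://localhost:8010/v2/check > /dev/null && break; sleep 1; done
        # {push_files} lists the files being pushed; {staged_files} is usually empty at pre-push time
        .ci/languagetool/check.sh {push_files}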
  4. Add to package.json:
{
  "scripts": {
    "grammar:start": "docker compose up -d languagetool",
    "grammar:stop": "docker compose stop languagetool",
    "grammar:check": ".ci/languagetool/check.sh"
  }
}

Resource Requirements:

  • Docker image: ~500MB
  • Memory: 512MB-1GB RAM
  • Startup: ~10-15 seconds
  • Per-file check: ~1-2 seconds

Pros:

  • ✅ Immediate feedback during development
  • ✅ Runs locally (no external API)
  • ✅ Privacy (content stays local)
  • ✅ Can be skipped with --no-verify

Cons:

  • ⚠️ Slower commits/pushes
  • ⚠️ Requires LanguageTool service running
  • ⚠️ 500MB Docker image overhead

Option 3: Node.js with Public API

Best for: Minimal setup, no Docker

Implementation:

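// .ci/languagetool/check.js
// Uses the global fetch built into Node.js 18+, so no extra package is required.
// Run as an ES module (.mjs or "type": "module") for import and top-level await.
import fs from 'node:fs';

const API_URL = 'https://api.languagetool.org/v2/check';
const files = process.argv.slice(2);
let hasErrors = false;

for (const file of files) {
  if (!fs.existsSync(file)) continue;

  const text = fs.readFileSync(file, 'utf8');
  // A URLSearchParams body is sent as application/x-www-form-urlencoded
  const response = await fetch(API_URL, {
    method: 'POST',
    body: new URLSearchParams({ text, language: 'en-US' })
  });
  if (!response.ok) {
    console.error(`LanguageTool request failed for ${file}: HTTP ${response.status}`);
    process.exit(2);
  }
  const result = await response.json();

  if (result.matches.length > 0) {
    console.log(`\n📝 Grammar issues in ${file}:`);
    result.matches.forEach(match => {
      console.log(`  ${match.message}`);
      console.log(`    Context: ${match.context.text}\n`);
    });
    hasErrors = true;
  }
}

process.exit(hasErrors ? 1 : 0);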

Pros:

  • ✅ No Docker required
  • ✅ Fast startup
  • ✅ Simple setup

Cons:

  • ⚠️ Rate limits (20 requests/min on free tier)
  • ⚠️ Network dependency
  • ⚠️ Privacy concerns (sends content externally)
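
To stay under the request limit mentioned above, the Node.js script could space its calls instead of sending them back to back. A minimal sketch, assuming a fixed gap between sequential requests is acceptable; `minIntervalMs` and `runThrottled` are hypothetical helpers, and `worker` stands in for whatever function sends one file to the API.

```javascript
// Smallest gap between requests that keeps the rate at or below the limit.
function minIntervalMs(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `worker` on each item in order, pausing between calls.
// `sleep` is injectable so the pacing can be tested without real delays.
async function runThrottled(items, worker, options = {}) {
  const { requestsPerMinute = 20, sleep = defaultSleep } = options;
  const gap = minIntervalMs(requestsPerMinute);
  const results = [];
  for (let i = 0; i < items.length; i += 1) {
    if (i > 0) await sleep(gap); // no wait before the first request
    results.push(await worker(items[i]));
  }
  return results;
}
```

At 20 requests per minute the gap is 3 seconds, so a PR touching 10 files would take roughly half a minute. This sketch does not account for any per-request size limits the public API may also apply.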

Option 4: Custom Vale Rules (Limited)

Best for: Quick fix for specific patterns

Create .ci/vale/styles/InfluxDataDocs/SubjectVerbAgreement.yml:

extends: existence
message: "Possible subject-verb agreement error: '%s'"
level: warning
ignorecase: false
tokens:
  - '\bshould\s+\w+s\b'
  - '\bwill\s+\w+s\b'
  - '\bcan\s+\w+s\b'
  - '\bmust\s+\w+s\b'

Pros:

  • ✅ No additional dependencies
  • ✅ Very fast

Cons:

  • ⚠️ Limited coverage (only catches specific patterns)
  • ⚠️ False positives
  • ⚠️ Won't catch: "They was", "He don't", etc.
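
The false positives and gaps listed above are easy to reproduce. The sketch below uses a JavaScript approximation of the four tokens combined into one pattern; Vale's own regex engine may differ in details, and `findModalHits` is a hypothetical helper used only for this illustration.

```javascript
// JavaScript approximation of the Vale tokens: a modal verb followed by a word ending in "s".
const modalPlusS = /\b(should|will|can|must)\s+\w+s\b/g;

// Return every substring of the sentence that the pattern flags.
function findModalHits(sentence) {
  return sentence.match(modalPlusS) || [];
}

const samples = [
  'This should throws an error', // real error, flagged
  'The test should pass',        // correct English, flagged anyway
  'Users can access the API',    // correct English, flagged anyway
  'They was late',               // real error, not flagged
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), '->', findModalHits(sample));
}
```

Any base-form verb that happens to end in "s" (pass, access, process, discuss) triggers the rule, so the rule would likely need an exception list before it could run at anything above warning level.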

Comparison Matrix

| Solution | Grammar Detection | Setup | Performance | Maintenance | Privacy | CI/CD Ready |
|---|---|---|---|---|---|---|
| GitHub Actions (Option 1) | ✅ Excellent | Medium | Slow (~30-60s) | Low | ✅ Local | ✅ Yes |
| Local Docker (Option 2) | ✅ Excellent | Medium | Slow (~2-5s/file) | Low | ✅ Local | ⚠️ Partial |
| Node.js API (Option 3) | ✅ Good | Low | Medium (~1s/file) | Low | ❌ External | ✅ Yes |
| Custom Vale (Option 4) | ⚠️ Limited | Low | Fast (<0.1s) | Medium | ✅ Local | ✅ Yes |

Recommended Implementation Plan

Phase 1: GitHub Actions Only (Recommended Starting Point)

  1. Add .github/workflows/pr-grammar-check.yml
  2. Create .ci/languagetool/check.sh script
  3. Test on a few PRs
  4. Gather feedback from contributors

Timeline: 1-2 hours
Impact: Catches grammar errors in PRs with zero local overhead

Phase 2: Local Integration (Optional)

  1. Add LanguageTool service to compose.yaml
  2. Add pre-push hook to lefthook.yml
  3. Document usage in DOCS-CONTRIBUTING.md

Timeline: 1 hour
Impact: Faster feedback for contributors who want it

Phase 3: Optimization (Future)

  1. Cache LanguageTool results for unchanged files
  2. Add PR comment integration for inline feedback
  3. Whitelist technical terms to reduce false positives
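
A likely source of false positives is code embedded in Markdown. The LanguageTool HTTP API accepts a `data` parameter in place of `text`: a JSON document whose `annotation` array separates `text` segments (checked) from `markup` segments (skipped). The sketch below marks fenced code blocks and inline code spans as markup. `toAnnotatedData` is a hypothetical helper, and the pattern is a simplification that ignores indented code blocks, front matter, and shortcodes.

```javascript
// Build the fence marker programmatically so this example can sit inside a fenced block.
const FENCE = '`'.repeat(3);

// Matches a fenced code block or an inline code span.
const CODE_PATTERN = new RegExp(FENCE + '[\\s\\S]*?' + FENCE + '|`[^`\\n]+`', 'g');

// Split Markdown into { text } and { markup } segments for the `data` parameter.
function toAnnotatedData(markdown) {
  const annotation = [];
  let last = 0;
  for (const match of markdown.matchAll(CODE_PATTERN)) {
    if (match.index > last) {
      annotation.push({ text: markdown.slice(last, match.index) });
    }
    annotation.push({ markup: match[0] });
    last = match.index + match[0].length;
  }
  if (last < markdown.length) {
    annotation.push({ text: markdown.slice(last) });
  }
  return { annotation };
}
```

A request would then send `data=JSON.stringify(toAnnotatedData(markdown))` instead of `text=...`. Whether this removes enough noise on real content files would need to be measured as part of the false positive review.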

Test Case

After implementation, this should be caught:

# DOCS-FRONTMATTER.md:6
   - This should throws an error

Expected output:
📝 Grammar issues in DOCS-FRONTMATTER.md:
  The modal verb 'should' requires base form.
    Context: This should throws an error
    Suggestion: throw

Alternative: Document Current Limitations

If we decide NOT to implement LanguageTool, we should document Vale's limitations:

Update DOCS-TESTING.md:

### Style Linting vs Grammar Checking

**Vale checks:**
- ✅ Style (passive voice, wordiness, weak adverbs)
- ✅ Terminology and branding consistency

**Vale does NOT check:**
- ❌ Grammar (subject-verb agreement, tense, etc.)

**For grammar checking, use:**
- Grammarly (browser extension)
- Microsoft Editor (Word/browser)
- LanguageTool (VS Code extension)

Related Issues

  • Vale configuration and style linting
  • Pre-commit hook improvements

Next Steps

  1. Decide on implementation approach (GitHub Actions vs Local vs Both)
  2. Create test PR with intentional grammar errors
  3. Review false positive rate on existing documentation
  4. Document workflow in DOCS-CONTRIBUTING.md or DOCS-TESTING.md
  5. Gather contributor feedback after 2-3 weeks

Questions to Answer

  • Should grammar checking be blocking (fail CI) or informational (comment only)?
  • Should we check all Markdown files or only instruction/documentation files?
  • What's the acceptable false positive rate before we need to add exceptions?
  • Should we run grammar checks on every PR or only when content files change?
  • Do we want inline PR comments or just a summary report?
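
The blocking versus informational question does not have to be all or nothing. As a sketch of one possible policy, assuming each match exposes `rule.category.id` as in the LanguageTool JSON response, the check could fail CI only for selected categories and report everything else. `summarize`, the `mode` option, and the default category list are hypothetical choices for illustration, not a recommendation.

```javascript
// Hypothetical default: only matches in the GRAMMAR category can fail the build.
const DEFAULT_BLOCKING = ['GRAMMAR'];

// Count matches and decide the exit code for the chosen mode.
function summarize(matches, options = {}) {
  const { mode = 'informational', blockingCategories = DEFAULT_BLOCKING } = options;
  const blocking = matches.filter((match) =>
    blockingCategories.includes(match.rule?.category?.id)
  );
  const exitCode = mode === 'blocking' && blocking.length > 0 ? 1 : 0;
  return { total: matches.length, blocking: blocking.length, exitCode };
}
```

Starting in informational mode and switching to blocking once the false positive rate is known would let the same script serve both phases.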

Priority: Medium
Effort: Small (GitHub Actions) | Medium (Local Docker)
Impact: Improves documentation quality by catching grammar errors Vale cannot detect
