Proposal: add grammar checking to linting and CI/CD pipeline #6480

# Add LanguageTool Grammar Checking to Linting Pipeline

## Problem Statement

The current linting configuration (Vale + write-good) catches style issues but **does not detect grammar errors** such as subject-verb disagreement, incorrect verb forms, or inconsistent tense.

**Example:** Line 6 in `DOCS-FRONTMATTER.md` contains two errors that are not caught:
```markdown
   - This should throws an error
```

1. **Grammar error**: "should throws" → should be "should throw" (modal verb requires base form)
2. **Formatting error**: Inconsistent indentation (caught by remark-lint but only auto-fixed, not flagged)

### Current Linting Tools Limitations

**Vale with write-good detects:**
- ✅ Passive voice, wordy phrases, weak adverbs
- ✅ Consistency with style guides
- ✅ Terminology and branding

**Vale does NOT detect:**
- ❌ Modal verb errors ("This should throws")
- ❌ Subject-verb agreement ("They was")
- ❌ Verb tense consistency
- ❌ Pronoun-antecedent agreement
- ❌ Article misuse (a/an/the)

## Proposed Solution

Integrate **LanguageTool** for grammar checking, with primary focus on **GitHub Actions workflows** rather than local pre-commit hooks.

### Why LanguageTool?

- ✅ Detects 25+ types of grammar errors
- ✅ Open source and free
- ✅ Can run locally (Docker) or as a service
- ✅ Generally accurate on prose, though technical terms can trigger false positives (see Phase 3)
- ✅ Active development and community

## Implementation Options

### Option 1: GitHub Actions Integration (RECOMMENDED)

**Best for:** CI/CD pipelines, PR reviews, minimal local overhead

#### Implementation:

1. **Add a LanguageTool workflow at `.github/workflows/pr-grammar-check.yml`:**

```yaml
name: Grammar Check

on:
  pull_request:
    paths:
      - 'content/**/*.md'
      - '*[A-Z]*.md'
      - '.github/**/*.md'
      - '.claude/**/*.md'
      - 'api-docs/README.md'

jobs:
  grammar-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start LanguageTool
        run: |
          docker run -d -p 8010:8010 \
            --name languagetool \
            erikvl87/languagetool:latest

          # Wait for service to be ready
          timeout 30 bash -c 'until curl -s http://localhost:8010/v2/check > /dev/null; do sleep 1; done'

      - name: Install dependencies
        run: sudo apt-get install -y jq

      - name: Get changed files
        id: changed-files
        # Consider pinning third-party actions to a full commit SHA instead of a tag
        uses: tj-actions/changed-files@v44
        with:
          files: |
            content/**/*.md
            *[A-Z]*.md
            .github/**/*.md
            .claude/**/*.md
            api-docs/README.md

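      - name: Check grammar
        if: steps.changed-files.outputs.any_changed == 'true'
        env:
          # Pass the list through the environment instead of interpolating it
          # into the script, so file names cannot inject shell commands
          CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          # Unquoted on purpose: the list is space-separated
          .ci/languagetool/check.sh $CHANGED_FILES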

      - name: Cleanup
        if: always()
        run: docker stop languagetool
```

2. **Create `.ci/languagetool/check.sh` script** (see Option 2 for script details)

3. **Add the job to the existing PR validation workflow**, or keep it as a separate workflow

**Pros:**
- ✅ No local development overhead
- ✅ Catches issues before merge
- ✅ Easy to skip/disable for contributors
- ✅ Centralized reporting in the PR's check output (inline PR comments are a Phase 3 item)
- ✅ Results can later be cached for unchanged files (Phase 3)

**Cons:**
- ⚠️ Slower feedback loop (only on PR)
- ⚠️ Adds ~30-60 seconds to CI runtime
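
One way to surface findings on the PR without a separate comment bot is GitHub Actions workflow commands: a line of the form `::warning file=...,line=...,col=...::message` printed to the job log appears as an annotation on the changed file. The sketch below assumes the `offset` and `message` fields of LanguageTool's `/v2/check` matches. `offsetToPosition` and `toAnnotation` are hypothetical helper names, not part of the scripts in this proposal.

```javascript
// Convert a 0-based character offset into a 1-based line and column.
function offsetToPosition(text, offset) {
  const before = text.slice(0, offset);
  const line = before.split('\n').length;
  const col = offset - before.lastIndexOf('\n');
  return { line, col };
}

// Escape message text the way workflow commands expect.
function escapeData(value) {
  return value.replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
}

// Property values (such as the file name) additionally escape ':' and ','.
function escapeProperty(value) {
  return escapeData(value).replace(/:/g, '%3A').replace(/,/g, '%2C');
}

// Build one `::warning` line for a single match.
function toAnnotation(file, text, match) {
  const { line, col } = offsetToPosition(text, match.offset);
  return `::warning file=${escapeProperty(file)},line=${line},col=${col}::${escapeData(match.message)}`;
}

// Example with a hand-written match object (not real LanguageTool output).
const text = '# Title\n   - This should throws an error\n';
const match = { offset: text.indexOf('throws'), message: 'Use the base form after "should".' };
console.log(toAnnotation('DOCS-FRONTMATTER.md', text, match));
```

Using `::warning` keeps the check informational. Switching to `::error` and a non-zero exit code would make it blocking, which is one of the open questions below.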

---

### Option 2: Local Docker Integration

**Best for:** Catching issues early, pre-push validation

#### Implementation:

1. **Add to `compose.yaml`:**

```yaml
services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool-server
    ports:
      - "8010:8010"
    environment:
      - Java_Xms=512m
      - Java_Xmx=1g
    profiles:
      - lint
      - grammar
```

2. **Create `.ci/languagetool/check.sh`:**

```bash
#!/bin/bash
set -e

if ! curl -s http://localhost:8010/v2/check > /dev/null 2>&1; then
    echo "❌ LanguageTool not running. Start with: docker compose up -d languagetool"
    exit 1
fi

EXIT_CODE=0

for file in "$@"; do
    [[ ! -f "$file" ]] && continue

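    # Let curl read and URL-encode the file itself; this avoids passing
    # large files through a shell variable on the command line
    response=$(curl -s -X POST "http://localhost:8010/v2/check" \
        --data-urlencode "text@$file" \
        --data "language=en-US")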

    match_count=$(echo "$response" | jq -r '.matches | length')

    if [[ "$match_count" -gt 0 ]]; then
        echo ""
        echo "📝 Grammar issues in $file:"
        echo "$response" | jq -r '.matches[] |
            "  \(.message)\n    Context: \(.context.text)\n    Suggestion: \(.replacements[0].value // "N/A")"'
        echo ""
        EXIT_CODE=1
    fi
done

[[ $EXIT_CODE -eq 0 ]] && echo "✅ No grammar issues found"
exit $EXIT_CODE
```

3. **Add to `lefthook.yml` (pre-push recommended):**

```yaml
pre-push:
  commands:
    grammar-check:
      tags: grammar
      glob: "{README.md,*[A-Z]*.md,.github/**/*.md,.claude/**/*.md,api-docs/README.md}"
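      run: |
        docker compose up -d languagetool 2>/dev/null || true
        # Startup takes ~10-15 seconds, so poll instead of a fixed sleep
        for i in $(seq 1 30); do curl -s http://localhost:8010/v2/check > /dev/null && break; sleep 1; done
        # pre-push hooks see pushed files, not staged files
        .ci/languagetool/check.sh {push_files}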
```

4. **Add to `package.json`:**

```json
{
  "scripts": {
    "grammar:start": "docker compose up -d languagetool",
    "grammar:stop": "docker compose stop languagetool",
    "grammar:check": ".ci/languagetool/check.sh"
  }
}
```

**Resource Requirements:**
- Docker image: ~500MB
- Memory: 512MB-1GB RAM
- Startup: ~10-15 seconds
- Per-file check: ~1-2 seconds

**Pros:**
- ✅ Immediate feedback during development
- ✅ Runs locally (no external API)
- ✅ Privacy (content stays local)
- ✅ Can be skipped with `--no-verify`

**Cons:**
- ⚠️ Slower commits/pushes
- ⚠️ Requires LanguageTool service running
- ⚠️ 500MB Docker image overhead
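
Because `check.sh` posts each file's raw Markdown, fenced code blocks and inline code are checked as if they were prose, which is a likely source of false positives. A minimal sketch of one mitigation, assuming it runs as a preprocessing step before the request: blank out code while keeping every character position, so reported offsets and line numbers still map to the original file. `maskCode` is a hypothetical helper, and the patterns only handle unindented triple-backtick fences and single-line inline code.

```javascript
// Built at runtime so this example nests cleanly inside a Markdown fence.
const FENCE = '`'.repeat(3);

// Replace every character except newlines with a space.
const blank = (s) => s.replace(/[^\n]/g, ' ');

// Matches an opening fence line through the next closing fence line.
const FENCED_BLOCK = new RegExp(`^${FENCE}[^\\n]*\\n[\\s\\S]*?^${FENCE}[^\\n]*$`, 'gm');
// Matches inline code that starts and ends on the same line.
const INLINE_CODE = /`[^`\n]+`/g;

// Blank out code regions without changing the text length or line count.
function maskCode(markdown) {
  return markdown
    .replace(FENCED_BLOCK, (block) => blank(block))
    .replace(INLINE_CODE, (span) => blank(span));
}

const sample = ['Use `foo()` here.', FENCE + 'js', 'let x = 1;', FENCE, 'Done.'].join('\n');
const masked = maskCode(sample);
console.log(masked.length === sample.length); // true: offsets are preserved
```

Blanking inline code can leave a sentence without its noun ("Use here."), so this may trade one kind of false positive for another. It is worth measuring on existing docs before adopting.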

---

### Option 3: Node.js with Public API

**Best for:** Minimal setup, no Docker

#### Implementation:

```bash
yarn add -D languagetool-api
```

```javascript
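// .ci/languagetool/check.js
// Needs an ES module context ("type": "module" in package.json, or a .mjs
// extension) because it uses `import` and top-level `await`.
// Assumption: `languagetool-api` exposes a promise-returning `check()` with
// these options; confirm against the package README before relying on it.
import { check } from 'languagetool-api';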
import fs from 'fs';

const files = process.argv.slice(2);
let hasErrors = false;

for (const file of files) {
  if (!fs.existsSync(file)) continue;

  const text = fs.readFileSync(file, 'utf8');
  const result = await check({
    text,
    language: 'en-US',
    apiUrl: 'https://api.languagetool.org/v2'
  });

  if (result.matches.length > 0) {
    console.log(`\n📝 Grammar issues in ${file}:`);
    result.matches.forEach(match => {
      console.log(`  ${match.message}`);
      console.log(`    Context: ${match.context.text}\n`);
    });
    hasErrors = true;
  }
}

process.exit(hasErrors ? 1 : 0);
```

**Pros:**
- ✅ No Docker required
- ✅ Fast startup
- ✅ Simple setup

**Cons:**
- ⚠️ Rate limits (20 requests/min on free tier)
- ⚠️ Network dependency
- ⚠️ Privacy concerns (sends content externally)
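
The rate limit matters as soon as a PR touches more than a handful of files. The sketch below spaces requests to stay under a per-minute limit and uses Node's built-in `fetch` (Node 18+) rather than a wrapper package. It assumes the public endpoint accepts the same form-encoded POST to `/v2/check` that `check.sh` sends to the local server. `minIntervalMs`, `buildCheckBody`, and `checkAll` are hypothetical names.

```javascript
// Minimum spacing between requests for a given per-minute limit.
function minIntervalMs(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}

// Form-encoded body with the same fields check.sh sends.
function buildCheckBody(text, language = 'en-US') {
  return new URLSearchParams({ text, language }).toString();
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Check files one at a time, pausing between requests.
// `readFile` is injected so the caller decides how file contents are loaded.
async function checkAll(files, readFile, options = {}) {
  const { apiUrl = 'https://api.languagetool.org/v2/check', requestsPerMinute = 20 } = options;
  const results = [];
  for (let i = 0; i < files.length; i++) {
    if (i > 0) await sleep(minIntervalMs(requestsPerMinute));
    const response = await fetch(apiUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: buildCheckBody(readFile(files[i])),
    });
    if (!response.ok) {
      throw new Error(`LanguageTool returned HTTP ${response.status} for ${files[i]}`);
    }
    const { matches } = await response.json();
    results.push({ file: files[i], matches });
  }
  return results;
}
```

At 20 requests per minute this adds about 3 seconds per file, so a 10-file PR would take roughly half a minute in waiting alone. Pointing `apiUrl` at the local server from Option 2 and raising `requestsPerMinute` removes that cost.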

---

### Option 4: Custom Vale Rules (Limited)

**Best for:** Quick fix for specific patterns

Create `.ci/vale/styles/InfluxDataDocs/SubjectVerbAgreement.yml`:

```yaml
extends: existence
message: "Possible subject-verb agreement error: '%s'"
level: warning
ignorecase: false
tokens:
  - '\bshould\s+\w+s\b'
  - '\bwill\s+\w+s\b'
  - '\bcan\s+\w+s\b'
  - '\bmust\s+\w+s\b'
```

**Pros:**
- ✅ No additional dependencies
- ✅ Very fast

**Cons:**
- ⚠️ Limited coverage (only catches specific patterns)
- ⚠️ False positives
- ⚠️ Won't catch: "They was", "He don't", etc.
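
The false-positive risk is easy to see by running the same four patterns over a few sentences. This is a sketch in JavaScript rather than Vale itself; for these simple patterns the regex behavior should be the same, but that is an assumption worth confirming with `vale` directly. The sample sentences are made up for illustration.

```javascript
// The four tokens from the rule above, as JavaScript regular expressions.
const patterns = [
  /\bshould\s+\w+s\b/,
  /\bwill\s+\w+s\b/,
  /\bcan\s+\w+s\b/,
  /\bmust\s+\w+s\b/,
];

// True if any pattern matches the sentence.
const flagged = (sentence) => patterns.some((p) => p.test(sentence));

const samples = [
  'This should throws an error', // real error, flagged
  'This should pass validation', // correct, but "pass" ends in "s"
  'You can access the API',      // correct, but "access" ends in "s"
  'The query will always run',   // correct, but "always" ends in "s"
  'They was late',               // real error, no modal verb, so missed
];

for (const s of samples) {
  console.log(flagged(s) ? 'FLAGGED' : 'ok     ', s);
}
```

Three of the four flagged sentences are correct English, and the one remaining real error is missed, which is why this option is marked as limited.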

---

## Comparison Matrix

| Solution | Grammar Detection | Setup | Performance | Maintenance | Privacy | CI/CD Ready |
|----------|------------------|-------|-------------|-------------|---------|-------------|
| **GitHub Actions (Option 1)** | ✅ Excellent | Medium | Slow (~30-60s) | Low | ✅ CI runner (no external API) | ✅ Yes |
| **Local Docker (Option 2)** | ✅ Excellent | Medium | Slow (~1-2s/file after ~10-15s startup) | Low | ✅ Local | ⚠️ Partial |
| **Node.js API (Option 3)** | ✅ Good | Low | Medium (~1s/file) | Low | ❌ External | ✅ Yes |
| **Custom Vale (Option 4)** | ⚠️ Limited | Low | Fast (<0.1s) | Medium | ✅ Local | ✅ Yes |

---

## Recommended Implementation Plan

### Phase 1: GitHub Actions Only (Recommended Starting Point)
1. Add `.github/workflows/pr-grammar-check.yml`
2. Create `.ci/languagetool/check.sh` script
3. Test on a few PRs
4. Gather feedback from contributors

**Timeline:** 1-2 hours
**Impact:** Catches grammar errors in PRs with zero local overhead

### Phase 2: Local Integration (Optional)
1. Add LanguageTool service to `compose.yaml`
2. Add pre-push hook to `lefthook.yml`
3. Document usage in `DOCS-CONTRIBUTING.md`

**Timeline:** 1 hour
**Impact:** Faster feedback for contributors who want it

### Phase 3: Optimization (Future)
1. Cache LanguageTool results for unchanged files
2. Add PR comment integration for inline feedback
3. Whitelist technical terms to reduce false positives
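
For item 3, one option is to filter the response after the fact rather than configure the server. A minimal sketch, assuming each match carries the `offset` and `length` fields returned by `/v2/check`; the term list and the names `flaggedText` and `filterAllowed` are hypothetical.

```javascript
// Hypothetical allowlist; real terms would come from the docs' product vocabulary.
const ALLOWED_TERMS = new Set(['lefthook', 'languagetool', 'remark-lint']);

// The exact text a match points at, using its `offset` and `length`.
function flaggedText(text, match) {
  return text.slice(match.offset, match.offset + match.length);
}

// Keep only matches whose flagged text is not an allowed term (case-insensitive).
function filterAllowed(text, matches, allowed = ALLOWED_TERMS) {
  return matches.filter((m) => !allowed.has(flaggedText(text, m).toLowerCase()));
}

// Hand-written matches for illustration (not real LanguageTool output).
const doc = 'Run lefthook first. This should throws an error.';
const found = [
  { offset: doc.indexOf('lefthook'), length: 8, message: 'Possible spelling mistake' },
  { offset: doc.indexOf('throws'), length: 6, message: 'Use the base form' },
];
console.log(filterAllowed(doc, found).map((m) => m.message));
```

Filtering on the flagged text keeps the grammar rules active everywhere else. When an entire rule is noisy, the `disabledRules` request parameter of `/v2/check` may be a simpler fit.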

---

## Test Case

After implementation, this should be caught:

```markdown
# DOCS-FRONTMATTER.md:6
   - This should throws an error

Expected output (message wording may vary by LanguageTool version):
📝 Grammar issues in DOCS-FRONTMATTER.md:
  The modal verb 'should' requires base form.
    Context: This should throws an error
    Suggestion: throw
```

---

## Alternative: Document Current Limitations

If we decide NOT to implement LanguageTool, we should document Vale's limitations:

**Update `DOCS-TESTING.md`:**

```markdown
### Style Linting vs Grammar Checking

**Vale checks:**
- ✅ Style (passive voice, wordiness, weak adverbs)
- ✅ Terminology and branding consistency

**Vale does NOT check:**
- ❌ Grammar (subject-verb agreement, tense, etc.)

**For grammar checking, use:**
- Grammarly (browser extension)
- Microsoft Editor (Word/browser)
- LanguageTool (VS Code extension)
```

---

## Related Issues

- Vale configuration and style linting
- Pre-commit hook improvements

## References

- [LanguageTool Documentation](https://languagetool.org/)
- [LanguageTool Docker Image](https://github.com/erikvl87/docker-languagetool)
- [tj-actions/changed-files](https://github.com/tj-actions/changed-files)
- [Vale scoping documentation](https://vale.sh/docs/topics/scoping/)

---

## Next Steps

1. **Decide on implementation approach** (GitHub Actions vs Local vs Both)
2. **Create test PR** with intentional grammar errors
3. **Review false positive rate** on existing documentation
4. **Document workflow** in `DOCS-CONTRIBUTING.md` or `DOCS-TESTING.md`
5. **Gather contributor feedback** after 2-3 weeks

---

## Questions to Answer

- [ ] Should grammar checking be **blocking** (fail CI) or **informational** (comment only)?
- [ ] Should we check **all** Markdown files or only **instruction/documentation** files?
- [ ] What's the acceptable **false positive rate** before we need to add exceptions?
- [ ] Should we run grammar checks on **every PR** or only when **content files** change?
- [ ] Do we want **inline PR comments** or just a **summary report**?

---

**Priority:** Medium
**Effort:** Small (GitHub Actions) | Medium (Local Docker)
**Impact:** Improves documentation quality by catching grammar errors Vale cannot detect


