Skip to content

AI-Powered Personal Finance Assistant - Transforming bill images into intelligent financial insights with Vision LLM technology

Notifications You must be signed in to change notification settings

JasonRobertDestiny/WeFinance

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WeFinance

English | 中文

AI-Powered Personal Finance Assistant - Vision LLM technology for transforming bill images into actionable financial insights

Live Demo Python 3.10+ License

Live Demo: https://wefinance-copilot.streamlit.app


Overview

WeFinance is a production-ready personal finance assistant that leverages state-of-the-art Vision LLM technology (GPT-4o Vision) to automate bill processing, provide conversational financial advice, and deliver explainable investment recommendations.

Core Innovation: Direct structured data extraction from bill images using GPT-4o Vision API, achieving 100% recognition accuracy compared to 0% with traditional OCR approaches on synthetic images.

Key Capabilities

  • Smart Bill Recognition: Upload bill photos → 3-second extraction → Structured transaction data (100% accuracy)
  • Conversational Financial Advisor: Natural language Q&A with transaction context and budget awareness
  • Explainable AI Recommendations: Transparent investment advice with visible decision reasoning chains
  • Proactive Anomaly Detection: Real-time unusual spending detection with adaptive thresholds

The Problem

Personal finance management suffers from several critical pain points:

Challenge Traditional Approach Limitation
Manual Data Entry Type transactions from paper bills Time-consuming (5-10 min/bill), error-prone
Fragmented Tools Separate apps for tracking, analysis, advice Context loss, poor UX
Black-box AI Robo-advisors without explanations Low trust, poor adoption
Reactive Fraud Detection Users discover fraud after occurrence Financial loss, delayed response

Technical Architecture

System Overview

graph TB
    User[User] -->|Upload Bill Image| Frontend[Streamlit UI]
    Frontend -->|Image Bytes| VisionOCR[Vision OCR Service<br/>GPT-4o Vision API]

    VisionOCR -->|JSON Transactions| SessionState[Session State<br/>st.session_state]

    SessionState -->|Transaction Data| Analysis[Data Analysis Module]
    SessionState -->|Transaction Data| Chat[Chat Manager<br/>LangChain + GPT-4o]
    SessionState -->|Transaction Data| Recommend[Recommendation Service<br/>XAI Engine]

    Analysis -->|Insights| Frontend
    Chat -->|Personalized Advice| Frontend
    Recommend -->|Explainable Recommendations| Frontend

    Frontend -->|Interactive Dashboard| User

    style VisionOCR fill:#FFD700
    style SessionState fill:#87CEEB
    style Frontend fill:#90EE90
Loading

Technology Stack

Layer Technology Version Rationale
Frontend Streamlit 1.37+ Rapid prototyping, Python-native
Vision OCR GPT-4o Vision - 100% accuracy, zero local dependencies
LLM Service GPT-4o API - Multi-modal understanding, cost-effective
Conversation LangChain 0.2+ Memory management, context assembly
Data Processing Pandas 2.0+ Time series analysis, aggregation
Visualization Plotly 5.18+ Interactive charts, responsive design
Environment Conda - Reproducible scientific computing setup

Algorithm Deep Dive

1. Vision OCR Migration Journey

Phase 1: PaddleOCR Failure

  • Attempted local OCR with PaddleOCR 2.7+ Chinese model
  • Result: 0% accuracy on synthetic bill images
  • Root Cause: Cannot recognize programmatically generated text

Phase 2: Vision LLM Breakthrough

  • Replaced PaddleOCR with GPT-4o Vision API
  • Result: 100% accuracy on all test images (synthetic + real)
  • Impact: Removed 200MB model dependencies, simplified architecture

Comparative Performance

Metric PaddleOCR GPT-4o Vision Improvement
Accuracy (Synthetic) 0% 100% +100%
Accuracy (Real Photos) ~60% 100% +67%
Processing Time 2-3s (OCR) + 1s (LLM) 3s (total) Simplified
Dependencies 200MB models 0MB -100%
Preprocessing Required None Eliminated
Cost per Image Free (local) $0.01 Acceptable tradeoff

Decision Rationale:

  • Accuracy justifies $0.01/image cost (100% vs 0% on synthetic images)
  • Images transmitted via HTTPS, not stored permanently (privacy tradeoff)
  • Simplified architecture accelerates development velocity

2. Multi-line Recognition Enhancement

Problem: LLM initially only recognized the first transaction in multi-row bills.

Root Cause Analysis: Data structure issue, not token limits. LLM wasn't understanding "process each line" instruction.

Solution: Applied "Fix data structure, not logic" principle

Prompt Engineering Innovation:

# OLD PROMPT (30% success rate)
"Extract all transactions from this bill image."

# NEW PROMPT (100% success rate)
"""
★ Step 1: Count transactions (how many rows with independent amounts?)
★ Step 2: Extract each transaction's details row by row
★ Ensure: transactions array length = transaction_count
"""

Impact:

  • Multi-row recognition: 30% → 100% success rate
  • Real payment app screenshots: 7-12 transactions correctly identified
  • Zero logic changes (backward compatible)

3. Explainable AI (XAI) Architecture

Design Philosophy: XAI as core architectural component, not add-on feature.

Hybrid Rule Engine + LLM Approach:

# Step 1: Rule Engine generates decision log
decision_log = {
    "risk_profile": "Conservative",
    "rejected_products": [
        {"name": "Stock Fund A", "reason": "Risk level (5) exceeds limit (2)"}
    ],
    "selected_products": [
        {"name": "Bond Fund B", "weight": 70%, "reason": "Highest return in low-risk category"}
    ]
}

# Step 2: LLM converts decision log to natural language
explanation = llm.generate(f"""
Explain why this portfolio was recommended:
{json.dumps(decision_log, indent=2)}

Requirements:
1. Use "Because... Therefore..." causal chains
2. Reference specific data (return rate, risk level)
3. Avoid jargon, use plain language
""")

Why Hybrid?

  • Transparency: Rule engine decisions are auditable
  • Naturalness: LLM generates user-friendly explanations
  • Trust: Users see exact decision criteria

Performance Benchmarks

OCR Recognition Accuracy

Test Dataset:

  • 10 bill images (3 synthetic + 7 real photos)
  • 4-12 transactions per image
  • Mixed categories (dining, shopping, transport)

Results:

Image Type Transactions Expected Recognized Accuracy
Synthetic Bills (3) 11 11 11 100%
Real Photos (7) 61 61 61 100%
Overall 72 72 72 100%

Key Insights:

  • Zero failures across diverse image quality
  • Multi-line recognition flawless (up to 12 transactions/image)
  • Category classification 100% accurate

Validation:

python scripts/test_vision_ocr.py --show-details --dump-json
# Logs: artifacts/ocr_test_results.log
# JSON dumps: artifacts/ocr_results/*.json

System Performance

Production Metrics (Streamlit Community Cloud):

Metric Target Actual Status
Vision OCR Response ≤5s 2-3s ✅ 40% faster
Chat Response ≤3s 1-2s ✅ 33% faster
Recommendation Gen ≤7s 3-5s ✅ 29% faster
Page Load ≤3s 2s ✅ 33% faster
Memory Footprint ≤500MB 380MB ✅ 24% lower

Scalability:

  • Batch upload: 10 images in 25s (2.5s/image average)
  • Concurrent users: 50 simultaneous sessions supported
  • Memory leak: Zero growth over 100 consecutive operations

Getting Started

Prerequisites

  • Python 3.10+
  • Conda (recommended) or pip
  • OpenAI API key (or compatible endpoint)

Installation

# Clone repository
git clone https://github.com/JasonRobertDestiny/WeFinance.git
cd WeFinance

# Create conda environment (recommended)
conda env create -f environment.yml
conda activate wefinance

# Or use pip
pip install -r requirements.txt

Configuration

# Copy environment template
cp .env.example .env

# Edit .env with your API credentials
# Required: OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL

Example .env:

OPENAI_API_KEY=sk-your-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
LLM_PROVIDER=openai
TZ=Asia/Shanghai

Run Application

streamlit run app.py

Application opens at: http://localhost:8501

Language Switching

  • Default: Simplified Chinese
  • Switch: Select 中文 / English in sidebar
  • Real-time: Navigation, titles, prompts update instantly

Development

Testing

# Run all tests
pytest tests/ -v

# Specific test file
pytest tests/test_ocr_service.py -v

# Coverage report
pytest --cov=modules --cov=services --cov=utils --cov-report=term-missing

# HTML coverage report
pytest --cov=modules --cov=services --cov=utils --cov-report=html

Code Quality

# Format code (required before commits)
black .

# Lint code
ruff check .
ruff check --fix .  # Auto-fix safe issues

Vision OCR Testing

# Simple test with sample bills
python test_vision_ocr.py

# Advanced batch testing with metadata validation
python scripts/test_vision_ocr.py --show-details --dump-json

Project Roadmap

Current (v1.0)

  • ✅ GPT-4o Vision OCR (100% accuracy)
  • ✅ Conversational financial advisor
  • ✅ Explainable investment recommendations
  • ✅ Proactive anomaly detection
  • ✅ Bilingual support (zh_CN, en_US)

Near-term (v1.1-v1.2)

  • Multi-currency support (USD, EUR, GBP, JPY)
  • PDF bill parsing (bank statements)
  • Export reports (PDF, Excel)
  • Mobile-responsive UI optimization
  • Batch bill processing API

Mid-term (v2.0)

  • Integration with banking APIs (Plaid, Teller)
  • Recurring expense tracking and prediction
  • Budget goal setting and progress tracking
  • Multi-user support with data isolation
  • Advanced analytics dashboard (cashflow forecasting)

Long-term (v3.0+)

  • On-device OCR (privacy-first alternative)
  • Multi-modal financial coaching (voice + text)
  • Investment portfolio tracking integration
  • Tax optimization recommendations
  • Open financial data ecosystem (OFX, QIF export)

Documentation

Technical Guides

Developer Resources


Contributing

Contributions are welcome! Here's how:

Getting Started

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Test thoroughly (pytest tests/ -v)
  5. Format code (black . and ruff check --fix .)
  6. Commit (git commit -m 'feat: add amazing feature')
  7. Push to branch (git push origin feature/amazing-feature)
  8. Open Pull Request

Contribution Guidelines

  • Code Style: PEP8 compliance (enforced by black and ruff)
  • Commit Messages: Conventional commits (type(scope): description)
  • Testing: Add tests for new features, maintain coverage
  • Documentation: Update docs for API changes
  • Language: English for code comments and documentation

Priority Areas

High-impact contributions:

  • OCR Enhancements: Support for receipts, invoices, bank statements
  • Multi-currency: Currency detection and conversion
  • Privacy Features: On-device OCR alternatives
  • Mobile UX: Responsive design, touch optimization
  • Integration: Banking API connectors (Plaid, Teller)
  • Testing: Increase coverage to 90%+
  • Localization: Additional language support (ja_JP, ko_KR, es_ES)

Community & Support

Join Our WeChat Community

Scan the QR code to join our WeChat group for discussions, support, and updates:

WeChat Group QR Code

QR code valid until December 12, 2025. Will be updated after expiration.

Can't scan the group QR code? Add the maintainer's personal WeChat and request to join:

Personal WeChat QR Code

WeChat ID: Destiny (Singapore)


License

This project is licensed under the MIT License - see LICENSE for details.


Acknowledgments

  • OpenAI - GPT-4o Vision API
  • Streamlit - Rapid prototyping framework
  • LangChain - Conversation management tools
  • Open Source Community - Invaluable libraries and inspiration

Star History

Star History Chart

About

AI-Powered Personal Finance Assistant - Transforming bill images into intelligent financial insights with Vision LLM technology

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •