InvoiceFlow
Active

AI-powered document matching and three-way reconciliation for Purchase Orders, Invoices, and Delivery Notes

Technologies

FastAPI v0.104+, Azure Form Recognizer, OpenAI GPT-4, Instructor, RapidFuzz, PostgreSQL v16, MinIO, PyMuPDF, +6 more

InvoiceFlow — AI-Powered Document Reconciliation (Portfolio Case Study)

The Problem

Manual reconciliation of purchase orders, invoices, and delivery notes is time-consuming and error-prone, and it doesn't scale. Finance teams struggle to:

  • Match documents across different formats and vendors
  • Detect discrepancies in quantities, prices, and tax calculations
  • Track currency mismatches and missing line items
  • Generate audit-ready reconciliation reports

A typical reconciliation cycle means comparing Purchase Orders, Invoices, and Delivery Notes by hand: checking each line item for quantity and price matches, verifying tax calculations, and identifying discrepancies. This takes hours per cycle, invites errors, and does not keep up as document volume grows.

The Solution

InvoiceFlow automates the entire reconciliation workflow. The platform combines AI-powered document extraction with intelligent matching, enabling finance teams to:

  • Automatically extract structured data from PDF documents using Azure Form Recognizer and LLM validation
  • Intelligently match documents by PO number or vendor name with fuzzy matching
  • Detect discrepancies automatically for quantities, prices, currency, tax, and missing items
  • Generate audit-ready reports in PDF, JSON, and CSV formats with detailed analysis

Example: Instead of spending hours comparing a Purchase Order, Invoice, and Delivery Note by hand, a finance team uploads all three documents. InvoiceFlow extracts the structured data in minutes, links the related documents, and produces a report that lists every detected discrepancy with detailed analysis.

What I Built

1) AI-Powered Document Extraction

InvoiceFlow uses a two-stage extraction process: Azure Form Recognizer extracts the fields, then an LLM pass validates and enhances them.

Stage 1: Azure Form Recognizer

  • OCR and initial field extraction from PDF documents
  • Handles various document formats and layouts
  • Extracts structured fields like vendor name, PO number, line items, totals

Stage 2: LLM-Enhanced Validation

  • Validates and enhances Azure extraction results using GPT-4 via Instructor
  • Ensures consistent currency extraction and validation
  • Validates tax calculations and line item structure
  • Handles edge cases and format variations

The extraction process produces structured data including:

  • Vendor information (name, address, contact)
  • Document metadata (PO number, invoice number, dates)
  • Line items (description, quantity, unit price, total)
  • Financial totals (subtotal, tax, total amount)
  • Currency information
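As a rough illustration of the extracted structure listed above, here is a minimal sketch using stdlib dataclasses. The project itself validates structured output with Pydantic and Instructor; the class and field names below (`LineItem`, `ExtractedDocument`, `doc_type`) are my own assumptions for illustration, and line totals are derived here rather than extracted.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        # Line total derived from quantity and unit price (assumption:
        # a real pipeline would also compare this to the extracted total).
        return self.quantity * self.unit_price


@dataclass
class ExtractedDocument:
    doc_type: str  # "purchase_order", "invoice", or "delivery_note" (hypothetical labels)
    vendor_name: str
    currency: str  # assumed to be an ISO 4217 code such as "EUR"
    po_number: Optional[str] = None
    invoice_number: Optional[str] = None
    line_items: list[LineItem] = field(default_factory=list)
    tax: Decimal = Decimal("0")

    @property
    def subtotal(self) -> Decimal:
        # Sum of all line totals, starting from a Decimal zero.
        return sum((item.total for item in self.line_items), Decimal("0"))

    @property
    def total_amount(self) -> Decimal:
        return self.subtotal + self.tax


invoice = ExtractedDocument(
    doc_type="invoice",
    vendor_name="Acme Supplies Ltd",
    currency="EUR",
    po_number="PO-1001",
    line_items=[LineItem("Widget", Decimal("10"), Decimal("2.50"))],
    tax=Decimal("5.00"),
)
```

Using `Decimal` rather than `float` avoids rounding drift when totals are later compared across documents.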

2) Intelligent Document Matching

InvoiceFlow matches documents using multiple strategies:

PO Number Matching:

  • Exact match on PO number when available
  • Fuzzy matching to handle formatting variations
  • Handles cases where PO number appears in different formats
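One way to handle PO number formatting variations is to normalize both values before comparing them. The sketch below is an assumption about how that could work, not the project's actual rules: `normalize_po_number` and `po_numbers_match` are hypothetical helpers, and the choice to drop a leading "PO" prefix and leading zeros is illustrative.

```python
import re


def normalize_po_number(raw: str) -> str:
    """Reduce a PO number to a canonical form for comparison."""
    # Uppercase and drop everything except letters and digits.
    cleaned = re.sub(r"[^A-Z0-9]", "", raw.upper())
    # Drop a leading "PO" prefix, but only when digits follow it.
    cleaned = re.sub(r"^PO(?=\d)", "", cleaned)
    # Ignore zero padding; keep a single "0" if nothing else remains.
    return cleaned.lstrip("0") or "0"


def po_numbers_match(a: str, b: str) -> bool:
    return normalize_po_number(a) == normalize_po_number(b)
```

With these rules, "PO-1001", "po 1001", and "P.O. #001001" all compare equal, while "PO-1001" and "PO-1002" do not.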

Vendor Name Matching:

  • Fuzzy string matching using RapidFuzz
  • Handles variations in company name formatting
  • Tolerates typos and abbreviations
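The vendor name matching described above can be sketched as follows. To keep the example dependency-free it uses `difflib.SequenceMatcher` from the standard library as a stand-in for RapidFuzz (where a scorer such as `rapidfuzz.fuzz.token_sort_ratio` would play a similar role). The suffix list, the 0.85 threshold, and the function names are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Legal-form suffixes ignored during comparison (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "incorporated", "llc", "gmbh", "corp", "corporation", "co"}


def normalize_vendor(name: str) -> str:
    # Lowercase, split into alphanumeric tokens, drop legal suffixes,
    # and sort tokens so word order does not affect the score.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    tokens = [t for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(sorted(tokens))


def vendor_similarity(a: str, b: str) -> float:
    # Similarity ratio between 0.0 and 1.0 on the normalized names.
    return SequenceMatcher(None, normalize_vendor(a), normalize_vendor(b)).ratio()


def vendors_match(a: str, b: str, threshold: float = 0.85) -> bool:
    return vendor_similarity(a, b) >= threshold
```

Under these assumptions "Acme Supplies Ltd." matches "ACME Supplies Limited" exactly after normalization, and a typo such as "Acme Suplies" still clears the threshold.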

Three-Way Matching:

  • Matches Purchase Order ↔ Invoice ↔ Delivery Note
  • Creates matching results linking all three document types
  • Calculates match confidence scores
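A minimal sketch of three-way linking with a confidence score, assuming a simple policy: an exact PO number agreement counts as certain, otherwise the vendor name similarity is used, and the overall confidence is the weaker of the two links. The `Doc` and `ThreeWayMatch` types, the 0.8 threshold, and the scoring policy are all hypothetical; `difflib` again stands in for RapidFuzz.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Doc:
    doc_id: str
    po_number: Optional[str]
    vendor: str


@dataclass
class ThreeWayMatch:
    po: Doc
    invoice: Optional[Doc]
    delivery_note: Optional[Doc]
    confidence: float


def pair_confidence(a: Doc, b: Doc) -> float:
    # Exact PO number agreement is treated as a certain match.
    if a.po_number and b.po_number and a.po_number == b.po_number:
        return 1.0
    # Otherwise fall back to vendor name similarity (0.0 to 1.0).
    return SequenceMatcher(None, a.vendor.lower(), b.vendor.lower()).ratio()


def best_match(target: Doc, candidates: list[Doc], threshold: float = 0.8):
    # Return (document, score) for the best candidate, or None below threshold.
    scored = [(pair_confidence(target, c), c) for c in candidates]
    if not scored:
        return None
    score, doc = max(scored, key=lambda pair: pair[0])
    return (doc, score) if score >= threshold else None


def three_way_match(po: Doc, invoices: list[Doc], delivery_notes: list[Doc]) -> ThreeWayMatch:
    inv = best_match(po, invoices)
    dn = best_match(po, delivery_notes)
    scores = [m[1] for m in (inv, dn) if m]
    # Overall confidence is the weakest link; 0.0 if either document is missing.
    confidence = min(scores) if len(scores) == 2 else 0.0
    return ThreeWayMatch(po, inv[0] if inv else None, dn[0] if dn else None, confidence)


result = three_way_match(
    Doc("po-1", "PO-1001", "Acme Supplies"),
    [Doc("inv-1", "PO-1001", "Acme Supplies Ltd"), Doc("inv-2", "PO-2002", "Globex")],
    [Doc("dn-1", None, "Acme Suplies")],
)
```

Here the invoice links through the PO number and the delivery note links through the vendor name, so the overall confidence is bounded by the fuzzy vendor score.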

3) Automatic Discrepancy Detection

InvoiceFlow automatically detects and flags discrepancies:

Quantity Mismatches:

  • Compares ordered quantities (PO) vs invoiced quantities vs delivered quantities
  • Flags over-deliveries, under-deliveries, and missing items
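The quantity comparison can be illustrated with a small function over per-item quantity maps. This is a sketch under assumptions: line items are already aligned by a shared key, an item absent from a document counts as quantity 0, and the finding labels are hypothetical.

```python
def compare_quantities(
    ordered: dict[str, int],
    invoiced: dict[str, int],
    delivered: dict[str, int],
) -> list[tuple[str, str, int, int]]:
    """Return (item, finding, expected, actual) tuples for each PO line."""
    findings = []
    for item, qty_po in ordered.items():
        qty_inv = invoiced.get(item, 0)
        qty_dn = delivered.get(item, 0)
        # Delivered quantity versus ordered quantity.
        if qty_dn == 0:
            findings.append((item, "missing_item", qty_po, qty_dn))
        elif qty_dn > qty_po:
            findings.append((item, "over_delivery", qty_po, qty_dn))
        elif qty_dn < qty_po:
            findings.append((item, "under_delivery", qty_po, qty_dn))
        # Invoiced quantity versus ordered quantity.
        if qty_inv != qty_po:
            findings.append((item, "invoice_quantity_mismatch", qty_po, qty_inv))
    return findings


findings = compare_quantities(
    ordered={"WIDGET": 10, "BOLT": 100, "NUT": 50},
    invoiced={"WIDGET": 10, "BOLT": 100, "NUT": 50},
    delivered={"WIDGET": 12, "BOLT": 90},
)
```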

Price Changes:

  • Detects price differences between PO and Invoice
  • Flags price increases or decreases with percentage calculations
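The percentage calculation for a price change is the difference relative to the PO price. A minimal sketch, assuming prices are `Decimal` values and the result is rounded to two decimal places (the function name and rounding choice are mine):

```python
from decimal import Decimal, ROUND_HALF_UP


def price_change_percent(po_price: Decimal, invoice_price: Decimal) -> Decimal:
    """Signed change of the invoice price relative to the PO price, in percent."""
    if po_price == 0:
        raise ValueError("PO price must be non-zero to compute a percentage")
    change = (invoice_price - po_price) / po_price * 100
    return change.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


increase = price_change_percent(Decimal("2.50"), Decimal("2.75"))  # 10.00
decrease = price_change_percent(Decimal("4.00"), Decimal("3.80"))  # -5.00
```

A positive result is a price increase and a negative result is a decrease.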

Currency Differences:

  • Identifies currency mismatches between documents
  • Flags when PO, Invoice, and Delivery Note use different currencies

Tax Calculation Errors:

  • Validates tax calculations across documents
  • Detects tax rate mismatches and calculation errors
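A tax check of this kind can be sketched as recomputing the tax from the subtotal and rate, then comparing it with the stated amount. The one-cent tolerance and both function names are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def tax_is_consistent(
    subtotal: Decimal,
    tax_rate: Decimal,
    stated_tax: Decimal,
    tolerance: Decimal = Decimal("0.01"),
) -> bool:
    # Recompute the expected tax and allow a small rounding tolerance.
    expected = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return abs(expected - stated_tax) <= tolerance


def implied_tax_rate(subtotal: Decimal, stated_tax: Decimal) -> Decimal:
    # Rate implied by a document's own figures, useful for comparing
    # rates between a PO and an invoice.
    if subtotal == 0:
        raise ValueError("subtotal must be non-zero")
    return (stated_tax / subtotal).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

Comparing `implied_tax_rate` for the PO and the invoice gives a simple rate-mismatch signal, while `tax_is_consistent` catches arithmetic errors within one document.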

Missing Line Items:

  • Identifies items present in PO but missing in Invoice or Delivery Note
  • Flags items invoiced but not ordered
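Once line items are reduced to comparable keys, missing-item detection is a set difference. A minimal sketch, assuming each document's items are represented as a set of item keys (the function and result key names are hypothetical):

```python
def find_missing_items(
    po_items: set[str],
    invoice_items: set[str],
    delivery_items: set[str],
) -> dict[str, list[str]]:
    return {
        # Ordered but absent from the invoice or the delivery note.
        "missing_from_invoice": sorted(po_items - invoice_items),
        "missing_from_delivery_note": sorted(po_items - delivery_items),
        # Invoiced without a corresponding PO line.
        "invoiced_not_ordered": sorted(invoice_items - po_items),
    }


report = find_missing_items(
    po_items={"WIDGET", "BOLT", "NUT"},
    invoice_items={"WIDGET", "BOLT", "WASHER"},
    delivery_items={"WIDGET", "NUT"},
)
```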

Each discrepancy includes:

  • Type of discrepancy (quantity, price, currency, tax, missing item)
  • Affected line items
  • Expected vs actual values
  • Severity level
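The discrepancy fields listed above map naturally onto a small record type. The sketch below is illustrative only: the `Discrepancy` and `Severity` names and the percentage thresholds for severity are assumptions, not the project's actual rules.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Discrepancy:
    kind: str        # "quantity", "price", "currency", "tax", or "missing_item"
    line_item: str   # affected line item
    expected: str    # value from the reference document (for example the PO)
    actual: str      # value found in the compared document
    severity: Severity


def severity_for_deviation(deviation_percent: float) -> Severity:
    # Hypothetical thresholds: 10% or more is high, 2% or more is medium.
    magnitude = abs(deviation_percent)
    if magnitude >= 10:
        return Severity.HIGH
    if magnitude >= 2:
        return Severity.MEDIUM
    return Severity.LOW


example = Discrepancy(
    kind="price",
    line_item="Widget",
    expected="2.50",
    actual="2.75",
    severity=severity_for_deviation(10.0),
)
```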

4) Multi-Format Report Generation

InvoiceFlow generates comprehensive reports in multiple formats:

PDF Reports:

  • Professional formatted reports with company branding
  • Detailed discrepancy analysis with highlighted issues
  • Summary statistics and match confidence scores
  • Visual comparison tables

JSON Reports:

  • Machine-readable structured data
  • Complete extraction results and matching data
  • Discrepancy details with full context
  • Suitable for integration with other systems

CSV Reports:

  • Spreadsheet-compatible format
  • Line-by-line discrepancy details
  • Easy filtering and analysis in Excel or Google Sheets
  • Summary statistics
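The JSON and CSV outputs can be sketched with the standard library alone. The column names and the shape of the JSON summary are assumptions for illustration, and PDF rendering is left out.

```python
import csv
import io
import json

# Column order for the CSV report (hypothetical field names).
FIELDS = ["type", "line_item", "expected", "actual", "severity"]


def to_json_report(discrepancies: list[dict]) -> str:
    # Machine-readable report with a small summary block.
    payload = {"summary": {"total": len(discrepancies)}, "discrepancies": discrepancies}
    return json.dumps(payload, indent=2)


def to_csv_report(discrepancies: list[dict]) -> str:
    # One row per discrepancy, with a header row for spreadsheet tools.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(discrepancies)
    return buffer.getvalue()


rows = [
    {"type": "price", "line_item": "Widget", "expected": "2.50", "actual": "2.75", "severity": "high"},
    {"type": "quantity", "line_item": "Bolt", "expected": "100", "actual": "90", "severity": "medium"},
]
```

Both functions return strings, so the caller decides whether to write them to object storage or return them from an API endpoint.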

5) Workspace Organization

InvoiceFlow organizes reconciliation projects into workspaces:

  • Multi-workspace support for managing multiple reconciliation cycles
  • Document grouping by workspace for organized workflows
  • Workspace-level matching to keep reconciliation projects separate
  • Temporary workspaces for quick one-off reconciliations

6) Document Management

InvoiceFlow provides comprehensive document management:

  • Upload and storage of PDF documents in S3-compatible storage (MinIO)
  • Document type classification (Purchase Order, Invoice, Delivery Note)
  • Status tracking (uploaded, processing, extracted, matched)
  • Download capabilities for original documents and reports
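Status tracking can be modeled as a small state machine. The sketch assumes the four statuses advance strictly in the order listed above, which is my assumption; a real system would likely also need failure and retry states.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    EXTRACTED = "extracted"
    MATCHED = "matched"


# Allowed transitions (assumed to be strictly linear).
ALLOWED_TRANSITIONS = {
    DocumentStatus.UPLOADED: {DocumentStatus.PROCESSING},
    DocumentStatus.PROCESSING: {DocumentStatus.EXTRACTED},
    DocumentStatus.EXTRACTED: {DocumentStatus.MATCHED},
    DocumentStatus.MATCHED: set(),
}


def advance(current: DocumentStatus, new: DocumentStatus) -> DocumentStatus:
    # Reject any transition that skips a step or moves backwards.
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```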

Technical Deep Dive

Architecture Overview

InvoiceFlow follows a microservices architecture with clear separation of concerns:

[Diagram: architecture overview]

Frontend Layer:

  • Next.js 14 with App Router for modern React development
  • TypeScript for type safety
  • Tailwind CSS and shadcn/ui for consistent UI components
  • Document upload, matching interface, and report viewing

Backend API Layer:

  • FastAPI for a high-performance async API
  • RESTful endpoints for document management, matching, and reporting
  • Request validation with Pydantic models
  • Comprehensive error handling

Service Layer:

  • DocumentProcessor: Orchestrates document processing workflow
  • FormRecognizerService: Azure Form Recognizer integration
  • MatchingService: Document matching and discrepancy detection
  • ReportGenerator: Multi-format report generation
  • StorageService: File operations with MinIO

Extraction Services:

  • CurrencyExtractor: Multi-source currency extraction and validation
  • TaxExtractor: Tax validation and extraction
  • LLMExtractor: LLM-enhanced extraction with Instructor
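To show how an orchestrating service might compose the extraction stages, here is a hedged sketch. Only the name `DocumentProcessor` comes from the list above; the `Extractor` and `Validator` protocols, the method names, and the fake implementations are hypothetical stand-ins that make no network calls.

```python
from typing import Protocol


class Extractor(Protocol):
    def extract(self, pdf_bytes: bytes) -> dict: ...


class Validator(Protocol):
    def validate(self, fields: dict) -> dict: ...


class DocumentProcessor:
    """Runs OCR extraction first, then a validation pass over the result."""

    def __init__(self, extractor: Extractor, validator: Validator) -> None:
        self.extractor = extractor
        self.validator = validator

    def process(self, pdf_bytes: bytes) -> dict:
        raw_fields = self.extractor.extract(pdf_bytes)
        return self.validator.validate(raw_fields)


class FakeFormRecognizer:
    # Stand-in for the OCR stage; returns canned, slightly messy fields.
    def extract(self, pdf_bytes: bytes) -> dict:
        return {"vendor_name": " Acme Supplies ", "currency": "eur"}


class FakeLLMValidator:
    # Stand-in for the LLM stage; only trims text and uppercases the currency.
    def validate(self, fields: dict) -> dict:
        return {
            "vendor_name": fields["vendor_name"].strip(),
            "currency": fields["currency"].upper(),
        }


processor = DocumentProcessor(FakeFormRecognizer(), FakeLLMValidator())
result = processor.process(b"%PDF-1.7 placeholder")
```

Depending on protocols rather than concrete clients keeps the orchestration testable without Azure or OpenAI credentials.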

Document Processing Flow

[Diagram: document processing flow]

Matching Flow

[Diagram: matching flow]

Technology Stack

Backend:

  • FastAPI (Python 3.11+): Async API framework with high performance
  • Azure Form Recognizer: Enterprise-grade OCR and document field extraction
  • OpenAI GPT-4: LLM-enhanced extraction validation via Instructor
  • Instructor: Structured output validation for consistent extraction
  • RapidFuzz: Fuzzy string matching for intelligent document matching
  • PyMuPDF: PDF parsing and validation
  • SQLAlchemy: ORM for database operations
  • Alembic: Database migrations

Frontend:

  • Next.js 14: Modern React framework with App Router
  • TypeScript: Type safety across frontend codebase
  • Tailwind CSS: Utility-first CSS framework
  • shadcn/ui: Component library for consistent UI

Infrastructure:

  • PostgreSQL 16: Relational database for structured data
  • MinIO: S3-compatible object storage for PDF files
  • Docker: Containerization for consistent deployment
  • Docker Compose: Orchestration for multi-service deployment

Key Technical Decisions

Azure Form Recognizer for OCR:

  • Provides high-accuracy document OCR and structured field extraction
  • Handles various document formats and layouts
  • Reduces need for custom parsing logic
  • Enterprise-grade reliability

LLM-Enhanced Extraction:

  • Validates and enhances Azure extraction results
  • Ensures structured output with proper currency, tax, and line item extraction
  • Handles edge cases and format variations
  • Uses Instructor for consistent structured output

RapidFuzz for Fuzzy Matching:

  • Enables intelligent document matching with tolerance for variations
  • Handles formatting differences, typos, and naming conventions
  • Improves match accuracy across different document formats

MinIO for Object Storage:

  • S3-compatible storage enables easy migration to AWS S3
  • Self-hosted flexibility for development
  • Scalable file storage for large document volumes

Production Impact

InvoiceFlow demonstrates production-grade architecture for document processing and reconciliation:

  • Scalable extraction using Azure Form Recognizer and LLM validation
  • Intelligent matching with fuzzy logic for real-world document variations
  • Comprehensive discrepancy detection across multiple document types
  • Multi-format reporting for different use cases and integrations
  • Workspace organization for managing multiple reconciliation projects
  • Microservices architecture with clear separation of concerns

The platform is designed for finance teams who need to automate reconciliation workflows, reduce manual errors, and scale reconciliation processes across large document volumes.
