
InvoiceFlow
AI-powered document matching and three-way reconciliation for Purchase Orders, Invoices, and Delivery Notes
InvoiceFlow — AI-Powered Document Reconciliation (Portfolio Case Study)
The Problem
Manual reconciliation of purchase orders, invoices, and delivery notes is time-consuming, error-prone, and doesn't scale. Finance teams struggle to:
- Match documents across different formats and vendors
- Detect discrepancies in quantities, prices, and tax calculations
- Track currency mismatches and missing line items
- Generate audit-ready reconciliation reports
A typical reconciliation cycle means comparing the Purchase Order, Invoice, and Delivery Note by hand. The reviewer checks each line item for quantity and price matches, verifies tax calculations, and records every discrepancy. Done manually, this takes hours per cycle, invites errors, and the effort grows with every additional document.
The Solution
InvoiceFlow automates the reconciliation workflow end to end. The platform combines AI-powered document extraction with fuzzy document matching, enabling finance teams to:
- Automatically extract structured data from PDF documents using Azure Form Recognizer and LLM validation
- Match related documents by PO number or vendor name, with fuzzy matching to tolerate formatting differences
- Detect discrepancies automatically for quantities, prices, currency, tax, and missing items
- Generate audit-ready reports in PDF, JSON, and CSV formats with detailed analysis
Example: Instead of spending hours manually comparing a Purchase Order, Invoice, and Delivery Note, a finance team can upload all three documents, extract structured data automatically within minutes, run matching to link the related documents, and receive a report that highlights each detected discrepancy with supporting detail.
What I Built
1) AI-Powered Document Extraction
InvoiceFlow uses a two-stage extraction process to improve accuracy:
Stage 1: Azure Form Recognizer
- OCR and initial field extraction from PDF documents
- Handles various document formats and layouts
- Extracts structured fields like vendor name, PO number, line items, totals
Stage 2: LLM-Enhanced Validation
- Validates and enhances Azure extraction results using GPT-4 via Instructor
- Ensures consistent currency extraction and validation
- Validates tax calculations and line item structure
- Handles edge cases and format variations
The extraction process produces structured data including:
- Vendor information (name, address, contact)
- Document metadata (PO number, invoice number, dates)
- Line items (description, quantity, unit price, total)
- Financial totals (subtotal, tax, total amount)
- Currency information
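The Stage 2 checks on tax calculations and line-item structure reduce to arithmetic invariants over this extracted data. The sketch below is a minimal, dependency-free illustration. The `LineItem`/`ExtractedDocument` field names, the `consistency_errors` helper, and the one-cent tolerance are assumptions, not InvoiceFlow's actual schema. With Instructor, rules like these are typically attached as Pydantic validators on the `response_model`, so that a failed check can be sent back to the model for a retry. Plain dataclasses keep this example self-contained.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal
    total: Decimal


@dataclass
class ExtractedDocument:
    # Hypothetical, simplified schema; vendor address, dates, etc. omitted
    doc_type: str                  # "purchase_order", "invoice" or "delivery_note"
    vendor_name: str
    po_number: Optional[str]
    currency: Optional[str]        # ISO 4217 code, e.g. "EUR"
    line_items: List[LineItem] = field(default_factory=list)
    subtotal: Decimal = Decimal("0")
    tax: Decimal = Decimal("0")
    total_amount: Decimal = Decimal("0")


def consistency_errors(doc: ExtractedDocument, tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return human-readable arithmetic problems found in an extracted document."""
    errors = []
    for i, item in enumerate(doc.line_items):
        # Each line total should equal quantity x unit price
        if abs(item.quantity * item.unit_price - item.total) > tolerance:
            errors.append(f"line {i}: quantity x unit price != line total")
    # Line totals should add up to the subtotal
    line_sum = sum((item.total for item in doc.line_items), Decimal("0"))
    if abs(line_sum - doc.subtotal) > tolerance:
        errors.append("line totals do not sum to subtotal")
    # Subtotal plus tax should equal the document total
    if abs(doc.subtotal + doc.tax - doc.total_amount) > tolerance:
        errors.append("subtotal + tax != total amount")
    return errors
```

Using `Decimal` rather than `float` avoids binary rounding noise when comparing money amounts to the cent.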
2) Intelligent Document Matching
InvoiceFlow matches documents using multiple strategies:
PO Number Matching:
- Exact match on PO number when available
- Normalization and fuzzy matching for PO numbers written in different formats
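This is a minimal sketch of PO-number matching. It assumes normalization means uppercasing, dropping punctuation and whitespace, and stripping a leading "PO" prefix. `normalize_po` and `po_match_score` are hypothetical names. The standard library's `difflib.SequenceMatcher` stands in for whatever fuzzy scorer the project actually uses.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize_po(raw: Optional[str]) -> Optional[str]:
    """Uppercase, keep only letters and digits, then strip a leading 'PO' prefix."""
    if not raw:
        return None
    cleaned = re.sub(r"[^A-Z0-9]", "", raw.upper())
    cleaned = re.sub(r"^PO", "", cleaned)
    return cleaned or None


def po_match_score(a: Optional[str], b: Optional[str]) -> float:
    """1.0 for an exact match after normalization, otherwise a fuzzy similarity in [0, 1)."""
    na, nb = normalize_po(a), normalize_po(b)
    if na is None or nb is None:
        return 0.0
    if na == nb:
        return 1.0
    return SequenceMatcher(None, na, nb).ratio()
```

For example, `"P.O. #4500-123"` and `"PO 4500123"` both normalize to `"4500123"`. Distinct orders can have nearly identical numbers: `"4500123"` and `"4500124"` score about 0.86 here. A fuzzy PO score would therefore need a high threshold, or it could be used only to rank candidates for human review.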
Vendor Name Matching:
- Fuzzy string matching using RapidFuzz
- Handles variations in company name formatting
- Tolerates typos and abbreviations
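Vendor-name matching can be sketched as normalization followed by a RapidFuzz similarity score. This sketch assumes `fuzz.token_sort_ratio` as the scorer and a hand-picked set of legal suffixes to ignore. The threshold of 85 and the helper names are hypothetical. A `difflib` fallback is included only so that the example runs where RapidFuzz is not installed.

```python
import re
from typing import Optional, Sequence, Tuple

try:
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        # Order-insensitive token comparison, scored 0-100
        return float(fuzz.token_sort_ratio(a, b))

except ImportError:  # stdlib fallback so the sketch runs without RapidFuzz
    from difflib import SequenceMatcher

    def _sorted_tokens(s: str) -> str:
        return " ".join(sorted(s.split()))

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, _sorted_tokens(a), _sorted_tokens(b)).ratio() * 100


# Hypothetical list of legal-form tokens to ignore when comparing names
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company",
                  "ltd", "limited", "llc", "gmbh", "ag", "plc"}


def normalize_vendor(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_vendor_match(query: str, candidates: Sequence[str],
                      threshold: float = 85.0) -> Optional[Tuple[str, float]]:
    """Return (candidate, score) for the closest vendor name, or None if nothing clears the threshold."""
    query_norm = normalize_vendor(query)
    best: Optional[Tuple[str, float]] = None
    for candidate in candidates:
        score = similarity(query_norm, normalize_vendor(candidate))
        if best is None or score > best[1]:
            best = (candidate, score)
    return best if best is not None and best[1] >= threshold else None
```

With this normalization, "ACME Corp." and "Acme Corporation" compare as identical. A typo such as "Globx Limited" still clears the threshold against "Globex Ltd".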
Three-Way Matching:
- Matches Purchase Order ↔ Invoice ↔ Delivery Note
- Creates matching results linking all three document types
- Calculates match confidence scores
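The sketch below shows one way to assemble three-way matches with a confidence score. It assumes that documents are linked by a normalized PO number. The `Doc`/`ThreeWayMatch` types and the scoring rule are illustrative assumptions, not InvoiceFlow's actual formula. The rule awards 0.5 per linked document, or 0.25 when that document's vendor name disagrees with the PO. For brevity, vendor names are compared exactly here, where a fuzzy comparison would normally slot in.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    doc_id: str
    doc_type: str              # "po", "invoice" or "delivery_note"
    po_number: Optional[str]
    vendor: str


@dataclass
class ThreeWayMatch:
    po: Doc
    invoice: Optional[Doc]
    delivery_note: Optional[Doc]
    confidence: float          # 0.0 (nothing linked) to 1.0 (both linked, vendors agree)


def po_key(po_number: Optional[str]) -> Optional[str]:
    """Uppercase and keep only letters and digits, so 'po-100' and 'PO 100' collide."""
    if not po_number:
        return None
    return "".join(ch for ch in po_number.upper() if ch.isalnum()) or None


def match_three_way(docs: List[Doc]) -> List[ThreeWayMatch]:
    # Index invoices and delivery notes by normalized PO number (first one wins)
    index: Dict[str, Dict[str, Doc]] = {"invoice": {}, "delivery_note": {}}
    for d in docs:
        key = po_key(d.po_number)
        if key and d.doc_type in index:
            index[d.doc_type].setdefault(key, d)

    matches = []
    for po in (d for d in docs if d.doc_type == "po"):
        key = po_key(po.po_number)
        invoice = index["invoice"].get(key) if key else None
        delivery = index["delivery_note"].get(key) if key else None
        # Hypothetical scoring: 0.5 per linked document, 0.25 if its vendor name differs
        score = sum(
            0.5 if linked.vendor.strip().lower() == po.vendor.strip().lower() else 0.25
            for linked in (invoice, delivery)
            if linked is not None
        )
        matches.append(ThreeWayMatch(po, invoice, delivery, float(score)))
    return matches
```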
3) Automatic Discrepancy Detection
InvoiceFlow automatically detects and flags discrepancies:
Quantity Mismatches:
- Compares ordered (PO), invoiced (Invoice), and delivered (Delivery Note) quantities for each line item
- Flags over-deliveries, under-deliveries, and missing items
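Here is a sketch of the quantity comparison. It assumes that line items have already been aligned across the three documents under a shared key. The keys here are hypothetical item codes, since the case study does not say how lines are aligned. Quantities are `Decimal`, so fractional units such as kilograms compare exactly. `quantity_issues` and its issue labels are illustrative names.

```python
from decimal import Decimal
from typing import Dict, List, Tuple


def quantity_issues(
    ordered: Dict[str, Decimal],
    invoiced: Dict[str, Decimal],
    delivered: Dict[str, Decimal],
) -> List[Tuple[str, str]]:
    """Return (item_key, issue) pairs; keys are assumed to be pre-aligned item codes."""
    issues = []
    for item, ordered_qty in ordered.items():
        delivered_qty = delivered.get(item)
        # Delivery vs order
        if delivered_qty is None:
            issues.append((item, "missing_from_delivery"))
        elif delivered_qty > ordered_qty:
            issues.append((item, "over_delivery"))
        elif delivered_qty < ordered_qty:
            issues.append((item, "under_delivery"))

        # Invoiced quantity should match what was actually received
        invoiced_qty = invoiced.get(item)
        if invoiced_qty is not None and delivered_qty is not None and invoiced_qty != delivered_qty:
            issues.append((item, "invoiced_vs_delivered"))
    return issues
```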
Price Changes:
- Detects price differences between PO and Invoice
- Flags price increases or decreases and reports the percentage change
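The percentage change is (invoice price - PO price) / PO price x 100. The sketch below is minimal. It assumes a configurable tolerance below which small differences are ignored, and rounds the result to two decimal places. `price_change` and `tolerance_pct` are hypothetical names.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


def price_change(po_price: Decimal, invoice_price: Decimal,
                 tolerance_pct: Decimal = Decimal("0")) -> Optional[dict]:
    """Return direction and percentage of a unit-price change, or None if within tolerance."""
    if po_price == 0:
        raise ValueError("PO unit price must be non-zero to compute a percentage change")
    pct = ((invoice_price - po_price) / po_price * 100).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    if abs(pct) <= tolerance_pct:
        return None
    return {"direction": "increase" if pct > 0 else "decrease", "percent": pct}
```

For example, a PO price of 10.00 invoiced at 11.50 is reported as a 15.00% increase.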
Currency Differences:
- Identifies currency mismatches between documents
- Flags when PO, Invoice, and Delivery Note use different currencies
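Once each document's currency has been extracted, the currency comparison becomes a set check. This sketch assumes ISO 4217 codes and skips documents whose currency could not be extracted. `currency_mismatch` is a hypothetical helper.

```python
from typing import Dict, Optional


def currency_mismatch(currencies: Dict[str, Optional[str]]) -> Optional[Dict[str, str]]:
    """Map of document type -> ISO 4217 code; return the conflicting codes, or None if consistent."""
    # Ignore documents with no extracted currency, normalize case/whitespace for the rest
    known = {doc: code.strip().upper() for doc, code in currencies.items() if code}
    return known if len(set(known.values())) > 1 else None
```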
Tax Calculation Errors:
- Validates tax calculations across documents
- Detects tax rate mismatches and calculation errors
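Both tax checks can be sketched in a few lines. A rate mismatch is a stated rate that differs from the rate expected from the PO. A calculation error is a tax amount that differs from the subtotal multiplied by the stated rate. This sketch assumes that tax is computed on the subtotal and rounded half-up to cents. Some tax regimes round per line instead, which would change the recomputation. `tax_issues` and its parameters are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import List

CENT = Decimal("0.01")


def tax_issues(subtotal: Decimal, tax_amount: Decimal,
               stated_rate_pct: Decimal, expected_rate_pct: Decimal) -> List[str]:
    """Check a document's tax line against the rate expected from the PO."""
    issues = []
    if stated_rate_pct != expected_rate_pct:
        issues.append(f"rate mismatch: expected {expected_rate_pct}%, document states {stated_rate_pct}%")
    # Recompute tax at the stated rate, rounded half-up to cents
    recomputed = (subtotal * stated_rate_pct / 100).quantize(CENT, rounding=ROUND_HALF_UP)
    if recomputed != tax_amount:
        issues.append(f"calculation error: {stated_rate_pct}% of {subtotal} is {recomputed}, "
                      f"document shows {tax_amount}")
    return issues
```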
Missing Line Items:
- Identifies items present in PO but missing in Invoice or Delivery Note
- Flags items invoiced but not ordered
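Once line items share a common key, detecting missing and unordered items reduces to set differences. This sketch assumes such a key already exists, for example a normalized item code. `missing_items` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def missing_items(po_items: Iterable[str], invoice_items: Iterable[str],
                  delivery_items: Iterable[str]) -> Dict[str, List[str]]:
    """Compare item keys across the three documents using set differences."""
    po, inv, dn = set(po_items), set(invoice_items), set(delivery_items)
    return {
        "missing_from_invoice": sorted(po - inv),        # ordered but not invoiced
        "missing_from_delivery_note": sorted(po - dn),   # ordered but not delivered
        "invoiced_not_ordered": sorted(inv - po),        # billed without a PO line
    }
```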
Each discrepancy includes:
- Type of discrepancy (quantity, price, currency, tax, missing item)
- Affected line items
- Expected vs actual values
- Severity level
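The four attributes above map naturally onto a small record type. This is a sketch, not InvoiceFlow's model. The `Discrepancy` class, the enum values, and especially the severity policy in `classify_severity` are assumptions. The policy treats currency and missing-item issues as high, and grades other issues by the amount at stake, using made-up thresholds of 100 and 1000.

```python
from dataclasses import asdict, dataclass
from decimal import Decimal
from enum import Enum
from typing import List, Optional


class DiscrepancyType(str, Enum):
    QUANTITY = "quantity"
    PRICE = "price"
    CURRENCY = "currency"
    TAX = "tax"
    MISSING_ITEM = "missing_item"


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Discrepancy:
    kind: DiscrepancyType
    line_items: List[str]        # identifiers of the affected line items
    expected: Optional[str]      # value from the reference document (usually the PO)
    actual: Optional[str]        # value found on the invoice or delivery note
    severity: Severity


def classify_severity(kind: DiscrepancyType, amount_at_stake: Decimal) -> Severity:
    """Hypothetical policy: currency and missing-item issues are always HIGH,
    other kinds are graded by the monetary amount involved."""
    if kind in (DiscrepancyType.CURRENCY, DiscrepancyType.MISSING_ITEM):
        return Severity.HIGH
    if amount_at_stake >= Decimal("1000"):
        return Severity.HIGH
    if amount_at_stake >= Decimal("100"):
        return Severity.MEDIUM
    return Severity.LOW
```

Subclassing `str` lets the enum members compare equal to their plain string values, which keeps records easy to export.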
4) Multi-Format Report Generation
InvoiceFlow generates comprehensive reports in multiple formats:
PDF Reports:
- Professionally formatted reports with company branding
- Detailed discrepancy analysis with highlighted issues
- Summary statistics and match confidence scores
- Visual comparison tables
JSON Reports:
- Machine-readable structured data
- Complete extraction results and matching data
- Discrepancy details with full context
- Suitable for integration with other systems
CSV Reports:
- Spreadsheet-compatible format
- Line-by-line discrepancy details
- Easy filtering and analysis in Excel or Google Sheets
- Summary statistics
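Using only the standard library, the JSON and CSV outputs can both be produced from the same list of discrepancy records. This is a sketch under assumptions. The column names, the `match_id` field, the per-severity summary, and the helper names are hypothetical.

```python
import csv
import io
import json
from collections import Counter
from typing import Dict, List

CSV_FIELDS = ["type", "line_item", "expected", "actual", "severity"]


def to_json_report(match_id: str, discrepancies: List[Dict[str, str]]) -> str:
    """Machine-readable report: summary statistics plus the full discrepancy list."""
    summary = {
        "discrepancy_count": len(discrepancies),
        "by_severity": dict(Counter(d["severity"] for d in discrepancies)),
    }
    return json.dumps({"match_id": match_id, "summary": summary,
                       "discrepancies": discrepancies}, indent=2)


def to_csv_report(discrepancies: List[Dict[str, str]]) -> str:
    """Spreadsheet-friendly report: header row, then one row per discrepancy."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_FIELDS)
    writer.writeheader()
    writer.writerows(discrepancies)
    return buf.getvalue()
```

When writing the CSV to a file instead of a string buffer, open it with `newline=""`, as the `csv` module documentation recommends, so that line endings are not doubled on Windows.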
5) Workspace Organization
InvoiceFlow organizes reconciliation projects into workspaces:
- Multi-workspace support for managing multiple reconciliation cycles
- Document grouping by workspace for organized workflows
- Workspace-level matching to keep reconciliation projects separate
- Temporary workspaces for quick one-off reconciliations
6) Document Management
InvoiceFlow provides comprehensive document management:
- Upload and storage of PDF documents in MinIO, an S3-compatible object store
- Document type classification (Purchase Order, Invoice, Delivery Note)
- Status tracking (uploaded, processing, extracted, matched)
- Download capabilities for original documents and reports
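The four statuses can be enforced as a small forward-only state machine. This is a minimal sketch. `DocumentStatus`, `ALLOWED_TRANSITIONS`, and `advance` are hypothetical names, and the rule that a document cannot skip or reverse a step is an assumption.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    EXTRACTED = "extracted"
    MATCHED = "matched"


# Forward-only transitions between the four statuses (hypothetical rules)
ALLOWED_TRANSITIONS = {
    DocumentStatus.UPLOADED: {DocumentStatus.PROCESSING},
    DocumentStatus.PROCESSING: {DocumentStatus.EXTRACTED},
    DocumentStatus.EXTRACTED: {DocumentStatus.MATCHED},
    DocumentStatus.MATCHED: set(),
}


def advance(current: DocumentStatus, target: DocumentStatus) -> DocumentStatus:
    """Return the new status, or raise if the transition skips or reverses a step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal status change: {current.value} -> {target.value}")
    return target
```

Handling extraction failures would require an additional status and transitions beyond these four.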
Technical Deep Dive
Architecture Overview
InvoiceFlow follows a microservices architecture with clear separation of concerns:
Frontend Layer:
- Next.js 14 with App Router for modern React development
- TypeScript for type safety
- Tailwind CSS and shadcn/ui for consistent UI components
- Document upload, matching interface, and report viewing
Backend API Layer:
- FastAPI for a high-performance async API
- RESTful endpoints for document management, matching, and reporting
- Request validation with Pydantic models
- Comprehensive error handling
Service Layer:
- DocumentProcessor: Orchestrates document processing workflow
- FormRecognizerService: Azure Form Recognizer integration
- MatchingService: Document matching and discrepancy detection
- ReportGenerator: Multi-format report generation
- StorageService: File operations with MinIO
Extraction Services:
- CurrencyExtractor: Multi-source currency extraction and validation
- TaxExtractor: Tax validation and extraction
- LLMExtractor: LLM-enhanced extraction with Instructor
Document Processing Flow
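Here is a minimal sketch of the document processing flow. The two extraction stages are passed in as plain functions, so the orchestration can be shown without calling Azure or OpenAI. `ProcessingRecord`, `process_document`, and the stage signatures are hypothetical. The status strings mirror the ones listed under Document Management, and failure handling is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProcessingRecord:
    doc_id: str
    status: str = "uploaded"
    fields: Dict[str, object] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def process_document(
    record: ProcessingRecord,
    pdf_bytes: bytes,
    ocr_extract: Callable[[bytes], Dict[str, object]],
    llm_validate: Callable[[Dict[str, object]], Dict[str, object]],
) -> ProcessingRecord:
    """Hypothetical orchestration of the two-stage extraction; stage functions are injected."""
    record.status = "processing"
    record.history.append(record.status)
    raw_fields = ocr_extract(pdf_bytes)        # Stage 1: OCR / field extraction (e.g. Form Recognizer)
    record.fields = llm_validate(raw_fields)   # Stage 2: LLM validation and enrichment
    record.status = "extracted"
    record.history.append(record.status)
    return record
```

Injecting the stages keeps the orchestrator testable with fakes, for instance an OCR stub that misses the currency and an LLM stub that fills it in.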
Matching Flow
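The following is a sketch of the first step of the matching flow, under the assumption that candidates are grouped by workspace and normalized PO key before any pairwise comparison. This grouping ensures that matching never crosses workspace boundaries. `DocRef` and `group_for_matching` are hypothetical. Documents without a PO key are skipped here; they would go to vendor-name matching instead.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class DocRef(NamedTuple):
    doc_id: str
    workspace_id: str
    doc_type: str              # "po", "invoice" or "delivery_note"
    po_key: Optional[str]      # normalized PO number, None if not extracted


def group_for_matching(docs: List[DocRef]) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Group document IDs by (workspace, PO key), then by document type."""
    groups: Dict[Tuple[str, str], Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for d in docs:
        if d.po_key is None:
            continue  # candidate for vendor-name matching instead (not shown)
        groups[(d.workspace_id, d.po_key)][d.doc_type].append(d.doc_id)
    # Convert nested defaultdicts to plain dicts for the caller
    return {key: dict(by_type) for key, by_type in groups.items()}
```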
Technology Stack
Backend:
- FastAPI (Python 3.11+): Async API framework with high performance
- Azure Form Recognizer (since renamed Azure AI Document Intelligence): Enterprise-grade OCR and document field extraction
- OpenAI GPT-4: LLM-enhanced extraction validation via Instructor
- Instructor: Structured output validation for consistent extraction
- RapidFuzz: Fuzzy string matching for intelligent document matching
- PyMuPDF: PDF parsing and validation
- SQLAlchemy: ORM for database operations
- Alembic: Database migrations
Frontend:
- Next.js 14: Modern React framework with App Router
- TypeScript: Type safety across frontend codebase
- Tailwind CSS: Utility-first CSS framework
- shadcn/ui: Component library for consistent UI
Infrastructure:
- PostgreSQL 16: Relational database for structured data
- MinIO: S3-compatible object storage for PDF files
- Docker: Containerization for consistent deployment
- Docker Compose: Orchestration for multi-service deployment
Key Technical Decisions
Azure Form Recognizer for OCR:
- Provides high-accuracy document OCR and structured field extraction
- Handles various document formats and layouts
- Reduces need for custom parsing logic
- Enterprise-grade reliability
LLM-Enhanced Extraction:
- Validates and enhances Azure extraction results
- Ensures structured output with proper currency, tax, and line item extraction
- Handles edge cases and format variations
- Uses Instructor for consistent structured output
RapidFuzz for Fuzzy Matching:
- Enables intelligent document matching with tolerance for variations
- Handles formatting differences, typos, and naming conventions
- Improves match accuracy across different document formats
MinIO for Object Storage:
- S3-compatible storage enables easy migration to AWS S3
- Self-hosted flexibility for development
- Scalable file storage for large document volumes
Production Impact
InvoiceFlow demonstrates production-grade architecture for document processing and reconciliation:
- Scalable extraction using Azure Form Recognizer and LLM validation
- Intelligent matching with fuzzy logic for real-world document variations
- Comprehensive discrepancy detection across multiple document types
- Multi-format reporting for different use cases and integrations
- Workspace organization for managing multiple reconciliation projects
- Microservices architecture with clear separation of concerns
The platform is designed for finance teams who need to automate reconciliation workflows, reduce manual errors, and scale reconciliation processes across large document volumes.
Want to Build Something Similar?
I specialize in building production-ready AI systems that scale. Let's discuss your project.

