When prospects ask us "what exactly do you extract?", we tend to answer with a number — 23 fields per invoice — but the number isn't the interesting part. The interesting part is why those specific fields, how they relate to each other, and what goes wrong when an extraction system gets some of them right and others wrong.
This post is the full field-level breakdown. It's written for AP leads, operations directors, and engineering teams who are evaluating whether our extraction output maps cleanly to their ERP schema. No marketing language — just the fields, their format, and the extraction considerations for each.
Header Fields: Document Identity and Transaction Context
These are the fields that identify the transaction itself — what document this is, who it's from, and when it was issued.
| Field | Output Format | Notes |
|---|---|---|
| invoice_number | String | Alphanumeric, preserves vendor-specific formatting |
| invoice_date | ISO 8601 (YYYY-MM-DD) | Normalized from any input format (MM/DD/YY, DD-MON-YYYY, etc.) |
| due_date | ISO 8601 | Extracted from explicit field or inferred from payment terms + invoice date |
| purchase_order_ref | String (nullable) | Cross-reference to PO; null if not present on invoice |
| currency_code | ISO 4217 (USD, EUR, GBP, etc.) | Detected from symbol or explicit label; defaults to USD when the symbol is ambiguous (e.g., "$") and the vendor is US-based |
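The date normalization noted for invoice_date can be sketched as a list of candidate layouts tried in order. This is a minimal illustration, not our production parser. The function name `normalize_date` and the short `INPUT_FORMATS` list are assumptions, and the example uses only the standard `datetime.strptime` call.

```python
from datetime import datetime

# Hypothetical, deliberately short list of accepted layouts. Order matters:
# MM/DD vs DD/MM cannot be told apart from the string alone, so a real system
# needs vendor or locale context before choosing between them.
INPUT_FORMATS = [
    "%m/%d/%y",  # 03/15/24
    "%m/%d/%Y",  # 03/15/2024
    "%d-%b-%Y",  # 15-MAR-2024 (month abbreviation matched case-insensitively)
    "%Y-%m-%d",  # already ISO 8601
]


def normalize_date(raw: str) -> str:
    """Return an ISO 8601 date string (YYYY-MM-DD) for a recognized input layout."""
    text = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this layout didn't match; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

The ambiguity comment is the important design point: "03/04/24" is March 4 in the US and April 3 in much of Europe, so layout order alone is never a complete answer.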
The due_date field deserves specific attention. About 30% of invoices we process don't have an explicit due date field — they have payment terms like "Net 30" or "2/10 Net 30". We compute the due date from the invoice date and the extracted payment terms string. If the payment terms contain an early payment discount qualifier (the "2/10" part), we flag it in a separate early_payment_discount field so your AP system can act on it.
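A minimal sketch of the computation, assuming terms arrive as simple "Net N" or "P/D Net N" strings and that days are calendar days from the invoice date. The regex, `parse_terms`, `compute_due_date`, and `discount_deadline` are illustrative names, not our API. The parsed fields mirror the net_days, discount_percent, and discount_days listed for payment_terms in the financial fields table; terms such as "EOM" or "Due on receipt" are not handled here.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional

# Matches "Net 30" and "2/10 Net 30" (2% discount if paid within 10 days, else due in 30).
TERMS_RE = re.compile(
    r"^\s*(?:(?P<pct>\d+(?:\.\d+)?)\s*/\s*(?P<disc_days>\d+)\s+)?net\s+(?P<net>\d+)\s*$",
    re.IGNORECASE,
)


@dataclass
class PaymentTerms:
    raw: str
    net_days: int
    discount_percent: Optional[Decimal] = None
    discount_days: Optional[int] = None


def parse_terms(raw: str) -> PaymentTerms:
    m = TERMS_RE.match(raw)
    if m is None:
        raise ValueError(f"Unsupported payment terms: {raw!r}")
    pct = m.group("pct")
    return PaymentTerms(
        raw=raw,
        net_days=int(m.group("net")),
        discount_percent=Decimal(pct) if pct else None,
        discount_days=int(m.group("disc_days")) if pct else None,
    )


def compute_due_date(invoice_date: date, terms: PaymentTerms) -> date:
    # Net N: payment due N calendar days after the invoice date.
    return invoice_date + timedelta(days=terms.net_days)


def discount_deadline(invoice_date: date, terms: PaymentTerms) -> Optional[date]:
    # The "2/10" qualifier: last day the early payment discount can be taken.
    if terms.discount_days is None:
        return None
    return invoice_date + timedelta(days=terms.discount_days)
```

For an invoice dated 2024-03-15 with "2/10 Net 30", this yields a due date of 2024-04-14 and a discount deadline of 2024-03-25.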
Vendor Fields: Supplier Identification
This field group is where the gap between "OCR that found some text" and "extraction that understood the document" becomes most visible.
| Field | Output Format | Notes |
|---|---|---|
| vendor_name | String | Legal entity name as printed on invoice header |
| vendor_tax_id | String (nullable) | EIN/VAT/GST number where present |
| vendor_address | Structured object (street, city, state, zip, country) | Parsed into components for ERP address fields |
| vendor_bank_account | String (nullable, masked) | Extracted where present; last 4 digits shown in output |
| remit_to_address | Structured object (nullable) | Separate from vendor address when "Remit To" block is present |
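The masking rule for vendor_bank_account (only the last 4 digits shown) might look like the sketch below. The function name, the stripping of separators, and fully masking very short values are assumptions made for illustration.

```python
def mask_account(account: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` characters of an account number."""
    # Drop spaces and dashes first so the mask length doesn't depend on formatting.
    chars = "".join(ch for ch in account if ch.isalnum())
    if len(chars) <= visible:
        # Too short to reveal anything safely: mask it entirely.
        return mask_char * len(chars)
    return mask_char * (len(chars) - visible) + chars[-visible:]
```

For example, `mask_account("123-456-7890")` returns `"******7890"`.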
Vendor address extraction is trickier than it looks. An invoice from a large vendor often has 3–4 addresses on it: the vendor's legal entity address, the remit-to address (which may be a lockbox or third-party payment processor), a "ship from" address, and a "sold by" entity that differs from the invoice issuer. Conflating these is a common extraction error with real downstream consequences — particularly the remit-to vs. vendor-address confusion, which can lead to payments sent to the wrong entity.
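Keeping the two addresses as separate fields is what lets a downstream system avoid that mistake. The sketch below shows a hypothetical consumer of the vendor block; `Address`, `VendorBlock`, and `payment_address` are illustrative names, with address components taken from the vendor_address row above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip: str
    country: str


@dataclass(frozen=True)
class VendorBlock:
    vendor_name: str
    vendor_address: Address                   # legal entity address
    remit_to_address: Optional[Address] = None  # lockbox or payment processor, if printed


def payment_address(vendor: VendorBlock) -> Address:
    # Pay to the "Remit To" block when the invoice has one; the legal address
    # is only a fallback. Merging the two is the error described above.
    if vendor.remit_to_address is not None:
        return vendor.remit_to_address
    return vendor.vendor_address
```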
Financial Fields: Amounts, Taxes, and Totals
The amount fields are where extraction errors cause the most direct financial exposure. On top of model extraction, we apply a rule-based cross-validation across these fields: subtotal plus tax plus any fees (such as shipping), less any invoice-level discount, should equal total_due. If the amounts don't reconcile, the document routes to exception handling regardless of its confidence score.
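The reconciliation rule can be sketched with `decimal.Decimal`, which avoids binary floating-point error on currency amounts. This is an illustration, not our rules engine. It assumes discount_amount is listed separately rather than already netted out of subtotal, and it compares to the cent with no tolerance; `totals_reconcile` is a hypothetical name.

```python
from decimal import Decimal
from typing import Optional

CENT = Decimal("0.01")


def totals_reconcile(
    subtotal: Decimal,
    tax_amount: Decimal,
    total_due: Decimal,
    shipping_amount: Optional[Decimal] = None,
    discount_amount: Optional[Decimal] = None,
) -> bool:
    """True when the component amounts sum to total_due, compared to the cent."""
    # Nullable fields that are absent from the invoice contribute nothing.
    shipping = shipping_amount if shipping_amount is not None else Decimal("0")
    discount = discount_amount if discount_amount is not None else Decimal("0")
    # Assumes discount_amount is not already subtracted inside subtotal.
    expected = subtotal + tax_amount + shipping - discount
    return expected.quantize(CENT) == total_due.quantize(CENT)
```

A document for which `totals_reconcile(...)` returns `False` would go to exception handling, whatever confidence the model reported for the individual fields.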
| Field | Output Format | Notes |
|---|---|---|
| subtotal | Decimal (2dp) | Pre-tax amount |
| tax_amount | Decimal (2dp) | Total tax; broken out by tax_line_items when multiple rates present |
| tax_rate | Decimal % (nullable) | Extracted, or computed as tax_amount / subtotal |
| tax_code | String (nullable) | GST, VAT, sales tax, HST — as labeled on document |
| shipping_amount | Decimal (2dp, nullable) | Freight/delivery charges if present as separate line |
| discount_amount | Decimal (2dp, nullable) | Invoice-level discount if applied |
| total_due | Decimal (2dp) | Amount owed; cross-validated against component fields |
| payment_terms | String + structured object | Raw string plus parsed net_days, discount_percent, discount_days |
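When tax_rate isn't printed, it is the tax amount divided by the subtotal, expressed as a percentage. A minimal sketch, assuming two-decimal rounding of the percentage (the function name and rounding choice are illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def derive_tax_rate(subtotal: Decimal, tax_amount: Decimal) -> Optional[Decimal]:
    """Tax rate in percent (tax_amount / subtotal * 100), or None if undefined."""
    if subtotal == 0:
        return None  # no meaningful rate on a zero subtotal
    rate = tax_amount / subtotal * 100
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Two caveats apply. Because tax_amount is itself rounded to the cent, a derived rate on a small subtotal can be slightly off the statutory rate. And on an invoice with several rates, this yields only a blended effective rate, which is why multi-rate invoices are broken out in tax_line_items instead.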
Line Item Fields: The Hardest Part of Invoice Extraction
Line items are where most extraction systems struggle. They require understanding table structure — which means correctly identifying column boundaries, handling multi-row line items where a description wraps to a second row, and aggregating correctly when a table spans multiple pages.
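Two of those steps, joining tables across pages and folding wrapped descriptions back into their line item, can be sketched over rows whose column boundaries are assumed to have been detected already. This is a simplified heuristic for illustration only. `RawRow`, `flatten_pages`, `merge_wrapped_rows`, and the header-row test are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RawRow:
    """One physical table row, already split on detected column boundaries."""
    description: str
    quantity: Optional[str] = None
    unit_price: Optional[str] = None
    line_total: Optional[str] = None


def flatten_pages(pages: List[List[RawRow]], header_label: str = "Description") -> List[RawRow]:
    """Concatenate rows page by page, dropping the column-header row each page repeats."""
    rows: List[RawRow] = []
    for page in pages:
        for row in page:
            if row.description.strip().lower() == header_label.lower():
                continue  # repeated table header, not a line item
            rows.append(row)
    return rows


def merge_wrapped_rows(rows: List[RawRow]) -> List[RawRow]:
    """Fold continuation rows (text only, no numeric cells) into the line item above."""
    merged: List[RawRow] = []
    for row in rows:
        numeric_empty = row.quantity is None and row.unit_price is None and row.line_total is None
        if merged and numeric_empty:
            merged[-1].description += " " + row.description.strip()
        else:
            # Copy so the caller's rows aren't mutated when later rows are folded in.
            merged.append(RawRow(row.description.strip(), row.quantity, row.unit_price, row.line_total))
    return merged
```

Running the page join before the merge matters: a description that wraps onto the top of the next page is then folded into the last item of the previous page rather than becoming a spurious line item.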
Each line item is returned as an object in a line_items array:
```json
{
  "line_number": 3,
  "description": "Industrial bearing assembly, 6205-2RS",
  "quantity": 24,
  "unit": "EA",
  "unit_price": 18.75,
  "line_total": 450.00,
  "product_code": "BRG-6205-2RS",
  "gl_account_hint": null,
  "tax_applicable": true
}
```
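On the consuming side, one way to load such an object without losing cents is `json.loads(..., parse_float=Decimal)`. The sketch below also adds a line-level check that quantity × unit_price equals line_total. That check is a hypothetical downstream addition, not part of the document-level cross-validation described above. `LineItem` and `parse_line_item` are illustrative names.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

CENT = Decimal("0.01")


@dataclass
class LineItem:
    line_number: int
    description: str
    quantity: Decimal
    unit: Optional[str]
    unit_price: Decimal
    line_total: Decimal
    product_code: Optional[str]
    gl_account_hint: Optional[str]
    tax_applicable: bool

    def extension_matches(self) -> bool:
        """Check that quantity x unit_price equals line_total to the cent."""
        return (self.quantity * self.unit_price).quantize(CENT) == self.line_total.quantize(CENT)


def parse_line_item(obj: dict) -> LineItem:
    # Decimal(str(...)) accepts ints and Decimals alike without float rounding.
    return LineItem(
        line_number=int(obj["line_number"]),
        description=obj["description"],
        quantity=Decimal(str(obj["quantity"])),
        unit=obj.get("unit"),
        unit_price=Decimal(str(obj["unit_price"])),
        line_total=Decimal(str(obj["line_total"])),
        product_code=obj.get("product_code"),
        gl_account_hint=obj.get("gl_account_hint"),  # null stays None; never guessed
        tax_applicable=bool(obj.get("tax_applicable", False)),
    )


payload = """{"line_number": 3, "description": "Industrial bearing assembly, 6205-2RS",
 "quantity": 24, "unit": "EA", "unit_price": 18.75, "line_total": 450.00,
 "product_code": "BRG-6205-2RS", "gl_account_hint": null, "tax_applicable": true}"""

# parse_float=Decimal keeps 18.75 and 450.00 exact instead of binary floats.
item = parse_line_item(json.loads(payload, parse_float=Decimal))
```

For the sample above, 24 × 18.75 = 450.00, so `item.extension_matches()` is `True`.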
The gl_account_hint field is worth explaining. Some invoices include GL account codes on line items — particularly invoices from internal service providers or between entities in the same corporate group. When present, we extract it. When absent, it's null. We don't guess GL codes — that's your system's job, not ours.
We also extract line-item-level tax indicators where invoices apply different tax rates to different items (common in jurisdictions with varying tax treatment for goods vs. services).
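To show how per-line rates roll up into a tax_line_items-style breakdown, here is a sketch that assumes each line carries a rate in percent. The line item object above has only the tax_applicable flag, so the per-line rate is a hypothetical input (non-taxable lines can be passed with rate 0). Output keys and the rounding policy are also assumptions.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Iterable, List, Tuple

CENT = Decimal("0.01")


def tax_by_rate(lines: Iterable[Tuple[Decimal, Decimal]]) -> List[Dict[str, Decimal]]:
    """Group (line_total, rate_percent) pairs by rate and compute tax per group."""
    base_by_rate: Dict[Decimal, Decimal] = defaultdict(Decimal)
    for line_total, rate_percent in lines:
        base_by_rate[rate_percent] += line_total
    breakdown = []
    for rate in sorted(base_by_rate):
        base = base_by_rate[rate]
        # Rounds once per rate group; some jurisdictions require per-line rounding instead.
        tax = (base * rate / 100).quantize(CENT, rounding=ROUND_HALF_UP)
        breakdown.append({"rate_percent": rate, "taxable_base": base, "tax_amount": tax})
    return breakdown
```

With goods at 20% totaling 500.00 and a service at 5% for 100.00, this produces two entries with tax of 100.00 and 5.00. Their sum is what tax_amount should equal.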
What We Don't Extract (and Why)
We want to be direct about what's out of scope. We don't extract fields that require business logic your company defines — GL coding, cost center assignment, approval routing by business rule. Those require mapping to your chart of accounts and approval structure. We provide the raw extraction; your ERP or AP automation layer applies business rules to it.
We also don't classify free-form notes or internal comments that appear on some invoices unless they're structurally labeled as a specific field type (like "Delivery Instructions" or "Special Terms"). Other unstructured text is captured verbatim in a raw notes string where present, without further classification.
We're not saying GL coding or business-rule application is unimportant — it's critically important. We're saying it belongs in your ERP or AP automation layer, not in the extraction step. Conflating document extraction with business-rule execution creates systems that are hard to maintain and harder to audit.
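To make that boundary concrete, here is a hypothetical rule as it might live in the AP or ERP layer, consuming our output rather than being part of it. The function name, the vendor-keyed default table, and the "None means manual coding" convention are all assumptions chosen for illustration.

```python
from typing import Dict, Optional


def assign_gl_account(
    gl_account_hint: Optional[str],
    vendor_name: str,
    vendor_default_gl: Dict[str, str],
) -> Optional[str]:
    """Hypothetical AP-side GL coding rule, applied after extraction."""
    if gl_account_hint:
        # A code printed on the invoice (e.g., intercompany billing) is used as-is.
        return gl_account_hint
    # Otherwise fall back to a customer-maintained rule table; None means the
    # line needs manual coding.
    return vendor_default_gl.get(vendor_name)
```

The rule table belongs to your chart of accounts. It can change without touching extraction, and auditors can review it on its own.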
ERP Field Mapping
When we connect to SAP S/4HANA, Oracle NetSuite, Microsoft Dynamics 365, or other ERPs, we map our canonical field names to the target system's field schema. The mapping is configured during integration setup and is version-controlled — so when an ERP update changes a field name or adds a required field, the mapping gets updated rather than failing silently.
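The "fail loudly rather than silently" behavior can be sketched as a mapping check that runs before any record is posted. The ERP-side names below are placeholders, not real SAP, NetSuite, or Dynamics field names. `FIELD_MAP`, `MappingError`, and `map_to_erp` are hypothetical. In practice the mapping would sit in a version-controlled config file rather than in code.

```python
from typing import Dict, Iterable

# Hypothetical canonical-to-ERP mapping; target names are placeholders.
FIELD_MAP: Dict[str, str] = {
    "invoice_number": "ERP_DOC_NUMBER",
    "invoice_date": "ERP_DOC_DATE",
    "currency_code": "ERP_CURRENCY",
    "total_due": "ERP_GROSS_AMOUNT",
}


class MappingError(Exception):
    """Raised when the mapping no longer covers what the target ERP requires."""


def map_to_erp(
    record: Dict[str, object],
    field_map: Dict[str, str],
    required_targets: Iterable[str],
) -> Dict[str, object]:
    # If an ERP update adds a required field the mapping doesn't produce, stop
    # here instead of posting a record that is silently missing it.
    missing = sorted(set(required_targets) - set(field_map.values()))
    if missing:
        raise MappingError(f"Unmapped required ERP fields: {missing}")
    return {target: record.get(source) for source, target in field_map.items()}
```

The required-field list would come from the target system's current schema, so a schema change surfaces as an explicit `MappingError` during integration rather than as missing data after posting.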
If you want to see exactly how our 23 fields map to your specific ERP schema, that's the right conversation to have early in an evaluation. Bring your ERP field spec to the demo and we'll walk through the mapping live with your actual invoice samples.