Overview

comstruct can receive emails containing invoices and automatically extract, classify, and process their attachments. This enables a hands-off workflow: forward supplier invoices to a dedicated address and comstruct takes care of the rest. The email processing endpoint powers two main scenarios:

  • Standard email forwarding: forward invoices to comstruct. Attachments are extracted, classified by AI, and turned into invoices.
  • Staple scan processing: send scanned multi-invoice PDFs. comstruct segments them into individual invoices using AI page-boundary detection.

How it works

When an email arrives at the processing endpoint, comstruct runs the following pipeline:

Step-by-step

  1. Email parsing: the raw email (MIME/RFC 822) is parsed. Inline images (signatures, logos) are filtered out, and the remaining attachments are split into supported and unsupported categories.
  2. Format conversion: TIFF attachments are automatically converted to PDF. If a conversion fails, the original file is preserved as an additional document.
  3. Routing: based on the number and type of supported attachments, comstruct chooses the most efficient processing path (see Processing modes).
  4. Invoice creation: each identified invoice enters the standard comstruct processing pipeline of AI-assisted data extraction, supplier matching, and approval workflows.
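
The parsing step can be approximated with Python's standard `email` package. The sketch below is a hypothetical illustration, not comstruct's implementation: the helper names and the rule that an image counts as inline when it has `Content-Disposition: inline` (or no disposition but a `Content-ID`) are assumptions. Forwarded messages (`message/rfc822`) are kept whole rather than unpacked.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default
from typing import Iterator


def _leaf_parts(part: EmailMessage) -> Iterator[EmailMessage]:
    """Yield non-container parts, keeping forwarded emails (message/rfc822) whole."""
    if part.get_content_type() == "message/rfc822":
        yield part
    elif part.is_multipart():
        for sub in part.iter_parts():
            yield from _leaf_parts(sub)
    else:
        yield part


def collect_attachments(raw: bytes) -> list[tuple[str, str, bytes]]:
    """Return (filename, MIME type, content) for each attachment that is not an inline image."""
    msg = message_from_bytes(raw, policy=default)
    attachments = []
    for part in _leaf_parts(msg):
        disposition = part.get_content_disposition()  # "attachment", "inline" or None
        mime_type = part.get_content_type()
        filename = part.get_filename() or ""
        # Assumption: inline or Content-ID-referenced images are signatures and logos
        is_inline_image = mime_type.startswith("image/") and (
            disposition == "inline" or (disposition is None and part["Content-ID"] is not None)
        )
        if is_inline_image:
            continue
        # Body text (no disposition, no filename) is not an attachment
        if disposition is None and not filename and mime_type != "message/rfc822":
            continue
        if mime_type == "message/rfc822":
            content = part.get_payload(0).as_bytes()  # the forwarded message itself
        else:
            content = part.get_payload(decode=True) or b""  # undo base64 / quoted-printable
        attachments.append((filename, mime_type, content))
    return attachments
```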

Supported document formats

Invoice documents (processed as invoices)

| Format | MIME types | Notes |
|---|---|---|
| PDF | application/pdf | Primary format; best AI extraction results |
| XML | application/xml, text/xml | Processed as XRechnung when valid electronic invoice XML is detected |
| TIFF | image/tiff, image/tif | Automatically converted to PDF before processing |

PDFs are also detected by content inspection, so a file with an incorrect MIME type but valid PDF content will still be processed correctly.
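
A minimal Python sketch of the content-inspection rule: the attachment's bytes are checked for a PDF header before the declared MIME type is consulted. The function names are hypothetical, and the 1024-byte search window is an assumption rather than documented comstruct behavior.

```python
from typing import Optional

# MIME types from the table above, mapped to the invoice format they represent
INVOICE_MIME_TYPES = {
    "application/pdf": "pdf",
    "application/xml": "xml",
    "text/xml": "xml",
    "image/tiff": "tiff",
    "image/tif": "tiff",
}


def looks_like_pdf(data: bytes) -> bool:
    # PDF files begin with a "%PDF-" header; many readers tolerate a few leading
    # bytes, so this sketch searches the first 1024 bytes (assumption)
    return b"%PDF-" in data[:1024]


def invoice_format(mime_type: str, data: bytes) -> Optional[str]:
    """Return "pdf", "xml" or "tiff", or None if the attachment is not an invoice document."""
    if looks_like_pdf(data):
        return "pdf"  # content wins over a wrong or generic MIME type
    base_type = mime_type.split(";")[0].strip().lower()  # drop parameters such as charset
    return INVOICE_MIME_TYPES.get(base_type)
```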

Additional documents (preserved alongside invoices)

Non-invoice attachments are uploaded and linked to the resulting invoice as reference documents. This includes:
| Format | Examples |
|---|---|
| Office documents | .doc, .docx, .xls, .xlsx |
| Images | .jpg, .png, .bmp, .webp, .gif |
| Archives | .zip |
| Forwarded emails | .eml, message/rfc822 |

Attachments that are neither invoice documents nor uploadable reference documents are logged and skipped.
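
The three outcomes (invoice document, additional document, skipped) can be sketched as a single classification function. This Python version is illustrative only: the function name is hypothetical, and the extension list mirrors the table above rather than an exhaustive list of what comstruct accepts.

```python
import os

INVOICE_MIME_TYPES = {"application/pdf", "application/xml", "text/xml", "image/tiff", "image/tif"}
# Extensions and MIME types from the "Additional documents" table (illustrative, not exhaustive)
ADDITIONAL_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx",         # Office documents
    ".jpg", ".png", ".bmp", ".webp", ".gif",  # images
    ".zip",                                   # archives
    ".eml",                                   # forwarded emails
}
ADDITIONAL_MIME_TYPES = {"message/rfc822"}


def categorize(filename: str, mime_type: str) -> str:
    """Return "invoice", "additional" or "skip" for one attachment."""
    mime = mime_type.lower()
    if mime in INVOICE_MIME_TYPES:
        return "invoice"
    extension = os.path.splitext(filename)[1].lower()
    if extension in ADDITIONAL_EXTENSIONS or mime in ADDITIONAL_MIME_TYPES:
        return "additional"
    return "skip"  # neither category: comstruct logs and skips these
```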

Processing modes

Standard email processing

Standard processing determines the best path based on the number and type of supported attachments:

Single PDF attachment: the most common case. The PDF is sent directly to the invoice processing pipeline, with no classification needed. Result: one invoice is created immediately.

Multiple documents or mixed types: when an email contains more than one supported document, or a mix of PDFs and other types, all attachments are sent to the AI classification queue. The classifier groups documents into:
  • Invoice documents: each group becomes a separate invoice
  • Supporting documents: linked to the relevant invoice(s)
XML attachments are matched to PDF invoices by comparing normalized invoice numbers.

XML only: when the email contains only XML attachments with valid electronic invoice content (XRechnung / EN 16931), comstruct:
  1. Validates the XML as a recognized electronic invoice format
  2. Generates a human-readable PDF from the structured data
  3. Creates an invoice with both the PDF and the original XML attached
If the XML is not a valid electronic invoice, a placeholder invoice is created with the XML preserved as an additional document.
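
XRechnung can be expressed in either of the two EN 16931 syntaxes, UBL and UN/CEFACT CII, and the two can be told apart by the XML root element. The Python sketch below performs only that first recognition step; the function name is hypothetical, and real validation (XML Schema plus the XRechnung Schematron rules) checks far more than the root element.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaced root elements of the EN 16931 syntaxes permitted for XRechnung
UBL_INVOICE = "{urn:oasis:names:specification:ubl:schema:xsd:Invoice-2}Invoice"
UBL_CREDIT_NOTE = "{urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2}CreditNote"
CII_INVOICE = "{urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100}CrossIndustryInvoice"


def detect_einvoice_syntax(xml_bytes: bytes) -> Optional[str]:
    """Return "ubl" or "cii" if the root looks like an e-invoice, else None.

    For untrusted input, a hardened parser such as defusedxml is preferable.
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None  # not well-formed XML: would lead to a placeholder invoice
    if root.tag in (UBL_INVOICE, UBL_CREDIT_NOTE):
        return "ubl"
    if root.tag == CII_INVOICE:
        return "cii"
    return None
```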

No processable attachments: when an email has no processable attachments (e.g., only images or Office files), comstruct still creates a placeholder invoice so the email is not lost:
  • A placeholder PDF is generated
  • All uploadable attachments are linked as additional documents
  • The email body (sender, subject, date, text) is rendered as a separate PDF and attached
This ensures every forwarded email is accounted for and can be reviewed manually.
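
The routing decision across these cases fits in a small function. This Python sketch rests on assumptions: it receives the resolved format of each supported attachment after TIFF-to-PDF conversion (so only "pdf" and "xml" remain), the `Route` names are hypothetical, and unsupported attachments are assumed to travel along as additional documents without affecting the route.

```python
from enum import Enum


class Route(Enum):
    DIRECT = "direct"            # single PDF: straight to invoice processing
    CLASSIFY = "classify"        # several documents or mixed types: AI classification queue
    XML_ONLY = "xml_only"        # only XML: validate, render a PDF, create the invoice
    PLACEHOLDER = "placeholder"  # nothing processable: placeholder invoice


def choose_route(supported_formats: list[str]) -> Route:
    """Pick a processing path from the formats ("pdf" or "xml") of the supported attachments."""
    if not supported_formats:
        return Route.PLACEHOLDER
    if all(fmt == "xml" for fmt in supported_formats):
        return Route.XML_ONLY
    if supported_formats == ["pdf"]:
        return Route.DIRECT
    return Route.CLASSIFY
```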

Staple scan processing

Staple scan mode is designed for scanning workflows where multiple paper invoices are fed through a scanner in one batch, producing a single multi-page PDF. When staple scan mode is active:
  1. Only PDF attachments are processed; other supported formats (such as XML) are ignored
  2. Each PDF is analyzed by AI to detect page boundaries between individual invoices
  3. The PDF is split into segments — one per detected invoice
  4. Each segment is sent directly to invoice processing (classification is skipped, since all pages are known to be invoices)
Activate staple scan mode by setting the x-staple-scan: true header on the request.

Fallback: if segmentation fails or the PDF has only one page, the entire PDF is treated as a single invoice.
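
The header check and the splitting step can be sketched in Python as follows. The AI boundary detection itself is not reproduced: `invoice_starts` is a hypothetical stand-in for its output (the 0-based first page of each detected invoice, or None when detection fails), and the function names are illustrative.

```python
from collections.abc import Mapping
from typing import Optional


def is_staple_scan(headers: Mapping[str, str]) -> bool:
    """True when the request carries x-staple-scan: true (header names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "x-staple-scan":
            return value.strip().lower() == "true"
    return False


def segment_pages(page_count: int, invoice_starts: Optional[list[int]]) -> list[range]:
    """Split 0-based pages into one range per detected invoice."""
    if page_count <= 1 or not invoice_starts:
        return [range(page_count)]  # fallback: the whole PDF is a single invoice
    # The first invoice always starts on page 0; out-of-range boundaries are ignored
    starts = sorted({s for s in invoice_starts if 0 <= s < page_count} | {0})
    ends = starts[1:] + [page_count]
    return [range(start, end) for start, end in zip(starts, ends)]
```

Representing segments as contiguous `range` objects keeps the scanned page order intact inside each invoice.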

Email metadata preservation

comstruct extracts metadata from the original email and uses it throughout processing:
| Metadata field | Usage |
|---|---|
| From | Sender identification; helps with supplier matching |
| Subject | Stored for reference and searchability |
| Date | Original email timestamp |
| Body text | When no invoice attachments are found, the email body is rendered as a PDF and attached to the placeholder invoice |
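
Python's standard `email` package exposes all four fields directly. This is a sketch of how they could be read; `EmailMetadata` and `extract_metadata` are hypothetical names, and rendering the body text to PDF is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime
from email import message_from_bytes
from email.policy import default
from typing import Optional


@dataclass
class EmailMetadata:
    sender: str
    subject: str
    date: Optional[datetime]
    body_text: str


def extract_metadata(raw: bytes) -> EmailMetadata:
    msg = message_from_bytes(raw, policy=default)
    date_header = msg["Date"]  # with policy=default this is a parsed DateHeader
    body = msg.get_body(preferencelist=("plain",))  # best text/plain part, or None
    return EmailMetadata(
        sender=str(msg["From"] or ""),
        subject=str(msg["Subject"] or ""),
        date=date_header.datetime if date_header is not None else None,
        body_text=body.get_content() if body is not None else "",
    )
```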

AI-powered classification

When an email contains multiple attachments, comstruct uses AI (Gemini) to group and classify them:
  • Invoice vs. supporting document — The classifier determines which documents are actual invoices and which are supplementary (e.g., delivery notes, cover letters, specs)
  • Grouping — Multiple pages or files that belong to the same invoice are grouped together
  • XML–PDF matching — When both XML (XRechnung) and PDF versions of an invoice are present, they are matched by normalized invoice number and associated as a single invoice
  • Duplicate detection — If a document has already been processed (matched by document ID), it is skipped to prevent duplicate invoices
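
The text does not specify how invoice numbers are normalized. The Python sketch below assumes a simple rule (case-folding and dropping every character that is not an ASCII letter or digit) to show how XML and PDF documents could be paired once their invoice numbers have been extracted, and how already processed document IDs could be skipped. All names are hypothetical.

```python
import re
from typing import Optional


def normalize_invoice_number(number: str) -> str:
    # Assumed rule: case-fold, then keep only ASCII letters and digits
    return re.sub(r"[^0-9a-z]", "", number.casefold())


def match_xml_to_pdf(
    pdf_numbers: dict[str, str], xml_numbers: dict[str, str]
) -> dict[str, Optional[str]]:
    """Map each PDF document ID to the XML document ID with the same normalized number."""
    xml_by_number = {normalize_invoice_number(num): xml_id for xml_id, num in xml_numbers.items()}
    return {
        pdf_id: xml_by_number.get(normalize_invoice_number(num))
        for pdf_id, num in pdf_numbers.items()
    }


def skip_duplicates(document_ids: list[str], already_processed: set[str]) -> list[str]:
    """Drop IDs processed before or repeated within this batch, keeping the original order."""
    seen = set(already_processed)
    fresh = []
    for doc_id in document_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            fresh.append(doc_id)
    return fresh
```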

Fallback behavior

If AI classification fails for any reason, comstruct falls back to a safe default: each PDF attachment is processed as a separate invoice, with XML and unsupported documents attached to all resulting invoices. This ensures no invoice is lost.
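
The fallback rule is simple enough to state as code. In this hypothetical Python sketch, documents are represented by file names and `PlannedInvoice` is an illustrative stand-in for whatever comstruct creates internally; an email with no PDFs at all is not covered by the rule above, so the sketch simply returns an empty plan for it.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedInvoice:
    """One invoice to create: its main PDF plus linked additional documents."""
    main_pdf: str
    additional_documents: list[str] = field(default_factory=list)


def fallback_plan(pdf_names: list[str], other_names: list[str]) -> list[PlannedInvoice]:
    """Classification failed: one invoice per PDF, every other document attached to each."""
    # Copy the list per invoice so changes to one invoice's documents do not leak into another
    return [PlannedInvoice(pdf, list(other_names)) for pdf in pdf_names]


plan = fallback_plan(["inv-1.pdf", "inv-2.pdf"], ["inv-1.xml", "delivery-note.docx"])
for invoice in plan:
    print(invoice.main_pdf, invoice.additional_documents)
```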

Queue and retry behavior

Email processing uses a job queue to handle classification asynchronously when needed:
| Setting | Value |
|---|---|
| Max attempts | 5 |
| Backoff strategy | Exponential, starting at 10 seconds |
| Concurrency | Configurable (default: 1 worker) |

The downstream invoice processing pipeline (data extraction, OCR, matching) runs on a separate queue with its own retry logic (10 attempts, 15-second initial backoff).
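
To see what these settings imply in wall-clock time, the retry delays can be computed. This Python sketch assumes the delay doubles after each failure and that the first attempt runs immediately (so five attempts involve four waits); the document states only "exponential, starting at 10 seconds", so the factor of 2 is an assumption.

```python
def backoff_delays(max_attempts: int, initial_seconds: float, factor: float = 2.0) -> list[float]:
    """Delays before each retry under exponential backoff.

    The first attempt runs immediately, so max_attempts attempts need
    max_attempts - 1 delays. The doubling factor is an assumption.
    """
    return [initial_seconds * factor**retry for retry in range(max_attempts - 1)]


email_queue = backoff_delays(5, 10)     # email processing queue
invoice_queue = backoff_delays(10, 15)  # downstream invoice processing queue
print(email_queue, sum(email_queue))    # [10.0, 20.0, 40.0, 80.0] 150.0
```

Under the same doubling assumption, the downstream queue's nine retries would spread over 7,665 seconds, or a little over two hours.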

Integration with SendGrid

The email processing endpoint is designed to receive SendGrid Inbound Parse webhook payloads. SendGrid forwards incoming emails as multipart form-data, and comstruct extracts the raw email content from the email field.

Setup

  1. Configure a SendGrid Inbound Parse webhook pointing to your comstruct instance
  2. Point the MX records for your forwarding domain to SendGrid (mx.sendgrid.net)
  3. Include authentication headers (x-api-key) in the webhook configuration
  4. Optionally set x-staple-scan: true for scan-dedicated addresses

Request format

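The endpoint accepts a raw request body of up to 32 MB containing multipart form-data. The email field must contain the complete raw email in RFC 822 / MIME format; in SendGrid, this requires enabling the "POST the raw, full MIME message" option on the Inbound Parse webhook.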
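
To make the request format concrete, here is a hypothetical Python round trip using only the standard library: `build_request` wraps a raw message in multipart/form-data under the email field, roughly as SendGrid's raw mode posts it, and `extract_raw_email` pulls it back out the way a receiving endpoint might. The URL, both function names, and reading "32 MB" as 32 MiB are assumptions; SendGrid also posts further form fields that this sketch omits.

```python
import urllib.request
import uuid
from email import message_from_bytes
from email.policy import default

MAX_BODY_BYTES = 32 * 1024 * 1024  # "32 MB", interpreted as MiB (assumption)


def build_request(url: str, raw_email: bytes, api_key: str, staple_scan: bool = False) -> urllib.request.Request:
    """Wrap a raw RFC 822 message in multipart/form-data under the "email" field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="email"\r\n\r\n'
    ).encode() + raw_email + f"\r\n--{boundary}--\r\n".encode()
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "x-api-key": api_key,
    }
    if staple_scan:
        headers["x-staple-scan"] = "true"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_raw_email(body: bytes, content_type: str) -> bytes:
    """Server-side counterpart: return the "email" field from a form-data body."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds 32 MB")
    # Prepend the Content-Type header so the email parser treats the body as multipart
    form = message_from_bytes(f"Content-Type: {content_type}\r\n\r\n".encode() + body, policy=default)
    for part in form.iter_parts():
        if part.get_param("name", header="content-disposition") == "email":
            return part.get_payload(decode=True)
    raise ValueError('no "email" field in form data')
```

Nothing is sent over the network here; passing the returned request to `urllib.request.urlopen` would perform the actual POST.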

Best practices

Email setup
  • Use a dedicated email address per forwarding purpose (e.g., one for regular invoices, another for staple scans)
  • Configure email rules to forward invoices automatically; avoid manual forwarding where possible
  • Make sure forwarding preserves the original attachments (avoid inline-only forwarding)

Document quality
  • PDF yields the best AI extraction results; prefer it over scanned images
  • Scan documents at 300 DPI or higher
  • Make sure documents are not password-protected
  • Avoid extremely large attachments: the endpoint accepts up to 32 MB in total, and attachments grow by about a third when base64-encoded inside the email

Staple scans
  • Feed invoices in order: comstruct detects invoice boundaries but keeps pages in the sequence they were scanned
  • Use clear page separations between invoices
  • Single-page invoices work best; multi-page invoices within a staple scan are also supported

Electronic invoices (XML)
  • Send XML files as standard attachments (not inline)
  • When sending both PDF and XML versions, use matching invoice numbers so comstruct can link them automatically
  • Supported format: XRechnung (EN 16931 compliant)
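
Because email attachments travel base64-encoded, the 32 MB limit is reached sooner than the raw file sizes suggest. This Python sketch estimates the encoded size using the RFC 2045 rules (4 output characters per 3 input bytes, lines of at most 76 characters, each ending in CRLF). The function names and the 32 MiB interpretation are assumptions, and headers and body text are ignored, so the result is a lower bound on the real request size.

```python
MIB = 1024 * 1024
REQUEST_LIMIT = 32 * MIB  # "32 MB", interpreted as MiB (assumption)


def base64_mime_size(raw_len: int, line_ending: int = 2) -> int:
    """Bytes an attachment occupies once base64-encoded and wrapped for MIME."""
    encoded = 4 * -(-raw_len // 3)        # every 3 input bytes become 4 characters (rounded up)
    lines = -(-encoded // 76)             # RFC 2045 caps encoded lines at 76 characters
    return encoded + lines * line_ending  # line break after each line (CRLF by default)


def fits_request_limit(attachment_sizes: list[int], limit: int = REQUEST_LIMIT) -> bool:
    """Lower-bound check: only the encoded attachments are counted."""
    return sum(base64_mime_size(n) for n in attachment_sizes) <= limit
```

For example, 24 MiB of attachments already encodes to more than 32 MiB, while 10 MiB encodes to roughly 13.7 MiB.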

Related pages

  • Single invoice upload: upload a single PDF invoice directly via API.
  • Email invoice: upload a single raw PDF with email-style headers (project, tenant).
  • Invoice list: query and filter processed invoices.
  • Invoice callback: receive status updates from ERP systems.