API Live · IBM Docling Engine · 37,000+ GitHub Stars

Documents in, structured data out.

One API call. PDF, DOCX, PPTX, XLSX → clean Markdown or JSON. Enterprise engine. Startup price. No cloud account needed.

Try Live Demo · API Documentation →
PDF to Markdown · DOCX to JSON · PPTX to Text · Table Extraction · OCR Built-in · $0.005 Per Page · Free Tier Available
Capabilities
What Doktral does.
Everything you need to parse documents at scale. Nothing you don't.
001 — FORMATS

10+ document types. One endpoint.

PDF, DOCX, PPTX, XLSX, HTML, Markdown, TXT, PNG, JPG, TIFF, BMP, CSV. Send anything — we parse it. No format-specific endpoints. No configuration. Just POST.

002 — TABLES

Automatic table extraction

Tables detected and extracted as structured arrays with headers and rows. Ready for spreadsheets, databases, or data pipelines.
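
As a rough illustration of how extracted tables could feed a spreadsheet or pipeline, the sketch below serializes one table to CSV. The table shape used here (a dict with `headers` and `rows` keys) is an assumption based on the description above, not a documented response schema, and `table_to_csv` is a hypothetical helper.

```python
import csv
import io


def table_to_csv(table: dict) -> str:
    """Serialize one extracted table to CSV text.

    Assumes (hypothetically) that a table looks like
    {"headers": [...], "rows": [[...], ...]}.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(table["headers"])  # header row first
    writer.writerows(table["rows"])    # then one CSV line per data row
    return buffer.getvalue()


# Hypothetical table, shaped like the assumption above.
sample_table = {
    "headers": ["item", "qty", "price"],
    "rows": [["Widget", 2, "9.99"], ["Gadget", 1, "24.50"]],
}
csv_text = table_to_csv(sample_table)
print(csv_text)
```

If the real response nests tables differently, only the two key lookups inside `table_to_csv` would need to change.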

003 — OUTPUT

Three output modes

Markdown for LLMs and RAG. JSON for structured processing. Plain text for search. Choose per request.
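
A minimal sketch of choosing the output mode per request follows. The form field name `output_format` and the wire values `markdown`, `json`, and `text` are assumptions made for illustration; check the API documentation for the actual parameter. The helper only builds the request arguments and sends nothing.

```python
# Assumed wire values for the three modes described above.
VALID_MODES = {"markdown", "json", "text"}


def build_parse_request(api_key: str, output_format: str = "markdown") -> dict:
    """Build keyword arguments for an HTTP POST to the parse endpoint.

    `output_format` is a hypothetical form field name, not a confirmed one.
    """
    if output_format not in VALID_MODES:
        raise ValueError(f"unknown output mode: {output_format!r}")
    return {
        "url": "https://api.doktral.com/v1/parse",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"output_format": output_format},  # hypothetical field
    }


kwargs = build_parse_request("YOUR_API_KEY", "json")
print(kwargs["data"])
# To send it with the requests library, one could write:
#   with open("invoice.pdf", "rb") as f:
#       requests.post(**kwargs, files={"file": f})
```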

004 — OCR

Built-in OCR

Scanned PDFs and images auto-detected and processed. Multi-language support. No extra config.

005 — METADATA

Rich document metadata

Page count, word count, file type, size, title, author. Returned with every parse. Zero extra calls.

006 — ENGINE

IBM Research Docling

MIT licensed. Linux Foundation backed. 37k+ stars. Used by enterprises worldwide. Not a toy parser.

007 — PRIVACY

Your documents are never stored.

Processed in memory. Deleted immediately after parsing. No logging. No training on your data. GDPR-conscious by design.

008 — AUTH

Simple API keys

Bearer token auth. One POST to create a key. No OAuth. No signup forms. No SDKs required.

Live Demo

Drop a document. Watch it parse. Real API. Real-time. No mockups.

PDF · DOCX · PPTX · XLSX · HTML · PNG · JPG
The demo calls our live production API at api.doktral.com. No tricks. Real document parsing.
Quick Start
Six lines. Any language.
If it speaks HTTP, it works with Doktral.
parse_invoice.py
import requests

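# Get your free API key
resp = requests.post("https://api.doktral.com/v1/keys",
    json={"name": "my-app"})
resp.raise_for_status()  # Fail fast on HTTP errors
key = resp.json()["api_key"]

# Parse any document (the with-block closes the file handle)
with open("invoice.pdf", "rb") as f:
    resp = requests.post("https://api.doktral.com/v1/parse",
        headers={"Authorization": f"Bearer {key}"},
        files={"file": f})
resp.raise_for_status()
r = resp.json()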

print(r["content"])   # → Clean markdown
print(r["tables"])    # → Extracted tables
print(r["metadata"])  # → Page count, word count, etc.
Pricing
Transparent. Predictable.
No surprises. No hidden fees. Start free.
Free
$0 forever
100 pages/mo
  • All formats
  • Markdown + JSON
  • Table extraction
  • 10 req/min
Pro
$199 /month
25,000 pages/mo
  • Everything in Starter
  • 120 req/min
  • 50MB files
  • Priority support
Business
$499 /month
100,000 pages/mo
  • Everything in Pro
  • 300 req/min
  • White-label
  • Dedicated support
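
To stay under a plan's request limit (10, 120, or 300 req/min above), a client can throttle itself before calling the API. The sketch below is one possible sliding-window limiter. `RateLimiter` is a hypothetical helper, not part of any Doktral SDK, and how the server actually enforces its limits is not specified here.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


limiter = RateLimiter(max_calls=10, period=60.0)  # Free tier: 10 req/min
for name in ["a.pdf", "b.pdf", "c.pdf"]:
    limiter.wait()  # returns immediately while under the limit
    # ... send the parse request for `name` here ...
```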

Google Document AI

$0.01 – $0.065/page

Complex GCP setup. Enterprise pricing. Overkill for startups and indie projects.

Doktral

$0.005 – $0.01/page

Same IBM-grade engine. One API call. Free tier. Up to 10x cheaper. Built for developers.

Azure Doc Intelligence

$0.01 – $0.05/page

Azure account required. Complex auth. Enterprise-first. Heavy for simple use cases.
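
The per-page figures quoted for Doktral can be sanity-checked against the plan prices listed earlier. The short calculation below assumes a plan's full monthly page quota is used, so real per-page cost is higher at lower usage.

```python
# Monthly price (USD) and included pages, taken from the plans above.
plans = {
    "Pro": (199, 25_000),
    "Business": (499, 100_000),
}


def price_per_page(monthly_usd: float, pages: int) -> float:
    """Effective per-page cost when the whole quota is consumed."""
    return monthly_usd / pages


for name, (monthly_usd, pages) in plans.items():
    print(f"{name}: ${price_per_page(monthly_usd, pages):.5f} per page")
# Pro: $0.00796 per page
# Business: $0.00499 per page
```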

FAQ
Questions.
What file formats do you support?
PDF, DOCX, PPTX, XLSX, HTML, Markdown, plain text, PNG, JPG, TIFF, BMP, and CSV. We add formats based on demand — just ask.
How is this different from using Docling directly?
Doktral handles all the infrastructure: Docker deployment, scaling, GPU management, API auth, usage tracking, rate limiting, and error handling. You send a file, get structured data back. No DevOps required.
Do you store my documents?
No. Documents are processed in memory and deleted immediately after parsing. We store only usage counts for billing. No logging of content. No training on your data.
What happens when I hit my page limit?
Free tier: requests are blocked until next month. Paid plans: overage billing kicks in at $0.008/page (Starter) so you never lose service.
Do you support OCR?
Yes. Scanned PDFs and images are automatically detected and processed with OCR. Multi-language support included. No extra configuration needed.
Can I use this for RAG pipelines?
Absolutely. Doktral's Markdown output is specifically optimized for RAG — clean structure, proper heading hierarchy, and extracted tables that LLMs can reason over effectively.
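
One common way to use heading-structured Markdown in a RAG pipeline is to split it into one chunk per section before embedding. The sketch below shows that idea on a made-up sample. `split_by_headings` is a hypothetical helper, and it deliberately ignores edge cases such as `#` lines inside code fences.

```python
import re

# ATX headings: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading-delimited section."""
    sections = []
    heading, lines = None, []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((heading, lines))  # close the previous section
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    sections.append((heading, lines))  # close the final section

    chunks = []
    for title, body in sections:
        text = "\n".join(body).strip()
        if title is not None or text:  # skip empty preamble
            chunks.append({"heading": title, "text": text})
    return chunks


# Made-up sample resembling parsed invoice output.
sample = (
    "# Invoice\n\nAcme Corp\n\n"
    "## Line items\n\n| item | qty |\n|---|---|\n| Widget | 2 |\n"
)
chunks = split_by_headings(sample)
for chunk in chunks:
    print(chunk["heading"], "->", len(chunk["text"]), "chars")
```

Keeping the heading alongside each chunk lets a retriever show which section a passage came from.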
What's your uptime?
We target 99.9% uptime. Our API runs on Railway's managed infrastructure with health monitoring and automatic restarts.
About

Built by developers, for developers.

Doktral was born from a simple frustration: document parsing shouldn't cost a fortune or require a PhD in cloud infrastructure. We take IBM Research's world-class Docling engine and make it accessible through a dead-simple API.

Based in India. Building globally. Shipping fast.

Founded: 2025
Formats: 10+
Engine Stars: 37k+
To Start: $0

Parse your first document now.

Free tier. No credit card. No signup form. 30 seconds to your first API call.

Get API Key →