Doktral

Capabilities

What Doktral does.

Everything you need to parse documents at scale. Nothing you don't.

001 — FORMATS

10+ document types. One endpoint.

PDF, DOCX, PPTX, XLSX, HTML, Markdown, TXT, PNG, JPG, TIFF, BMP, CSV. Send any of them and we parse it. No format-specific endpoints. No configuration. Just POST.
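
You can reject unsupported files on the client before spending a request. Below is a minimal sketch. It assumes the format list above is complete and that files can be matched by extension. The extension spellings (`.htm`, `.md`, `.jpeg`, `.tif` and so on) and the helper name `is_supported` are our own assumptions, not part of the Doktral API.

```python
from pathlib import Path

# Extensions assumed from the format list above (hypothetical mapping;
# the API itself may detect formats by content rather than extension).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".html", ".htm",
    ".md", ".markdown", ".txt", ".png", ".jpg", ".jpeg",
    ".tif", ".tiff", ".bmp", ".csv",
}


def is_supported(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the assumed set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["invoice.pdf", "scan.TIFF", "notes.md", "archive.zip"]:
        print(name, is_supported(name))
```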

002 — TABLES

Automatic table extraction

Tables detected and extracted as structured arrays with headers and rows. Ready for spreadsheets, databases, or data pipelines.
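
Here is a minimal sketch of turning extracted tables into CSV. It assumes each entry in the response's `tables` array is an object with a `headers` list and a `rows` list of lists. That shape is our reading of "headers and rows" above, not a documented schema, so check a real response first. `tables_to_csv` is a hypothetical helper.

```python
import csv
import io


def tables_to_csv(tables: list[dict]) -> list[str]:
    """Render each extracted table as a CSV string.

    Assumes (hypothetically) each table looks like
    {"headers": ["Item", "Qty"], "rows": [["Widget", "2"], ...]}.
    """
    outputs = []
    for table in tables:
        buf = io.StringIO()
        writer = csv.writer(buf)  # default line terminator is "\r\n"
        if table.get("headers"):  # skip the header row if none was extracted
            writer.writerow(table["headers"])
        writer.writerows(table.get("rows", []))
        outputs.append(buf.getvalue())
    return outputs


if __name__ == "__main__":
    sample = [{"headers": ["Item", "Qty"], "rows": [["Widget", "2"], ["Gadget", "5"]]}]
    print(tables_to_csv(sample)[0])
```

To write a string to disk, open the target file with `open(path, "w", newline="")` so the CSV line endings are not translated a second time.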

003 — OUTPUT

Three output modes

Markdown for LLMs and RAG. JSON for structured processing. Plain text for search. Choose per request.
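
This page does not name the request parameter that selects the output mode. The sketch below therefore uses a hypothetical form field, `output_format`, with assumed values `markdown`, `json` and `text`. It only shows where a per-request choice would go, and it validates the mode before anything is sent. Check the API reference for the real field name and values.

```python
from typing import NamedTuple

OUTPUT_MODES = ("markdown", "json", "text")  # wire values are assumptions


class ParseRequest(NamedTuple):
    url: str
    headers: dict
    data: dict


def build_parse_request(api_key: str, mode: str = "markdown") -> ParseRequest:
    """Assemble the URL, auth header, and form fields for one parse call.

    "output_format" is a hypothetical field name, not a documented one.
    """
    if mode not in OUTPUT_MODES:
        raise ValueError(f"unsupported output mode: {mode!r}")
    return ParseRequest(
        url="https://api.doktral.com/v1/parse",
        headers={"Authorization": f"Bearer {api_key}"},
        data={"output_format": mode},
    )
```

To send the request, open the file in binary mode and call `requests.post(req.url, headers=req.headers, data=req.data, files={"file": f})`. `requests` encodes the form fields and the file together as one multipart body.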

004 — OCR

Built-in OCR

Scanned PDFs and images auto-detected and processed. Multi-language support. No extra config.

005 — METADATA

Rich document metadata

Page count, word count, file type, size, title, author. Returned with every parse. Zero extra calls.
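
Because metadata comes back with every parse, you can keep a rough client-side tally of your monthly page quota. This sketch assumes a hypothetical `metadata.page_count` integer in each response. Server-side usage counts remain the authoritative figure for billing.

```python
def pages_used(responses: list[dict]) -> int:
    """Sum page counts across parse responses.

    Assumes a hypothetical response["metadata"]["page_count"] integer;
    responses without it count as zero.
    """
    return sum(r.get("metadata", {}).get("page_count", 0) for r in responses)


def remaining_pages(responses: list[dict], monthly_limit: int = 100) -> int:
    """Pages left this month, never below zero (100 matches the Free plan)."""
    return max(0, monthly_limit - pages_used(responses))
```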

006 — ENGINE

IBM Research Docling

MIT licensed. Linux Foundation backed. 37k+ GitHub stars. Used by enterprises worldwide. Not a toy parser.

007 — PRIVACY

Your documents are never stored.

Processed in memory. Deleted immediately after parsing. No content logging. No training on your data. GDPR-conscious by design.

008 — AUTH

Simple API keys

Bearer token auth. One POST to create a key. No OAuth. No signup forms. No SDKs required.
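
Once you have created a key, keep it out of source code and avoid creating a new one on every run. Below is a minimal sketch that reads the key from an environment variable and builds the Bearer header. The variable name `DOKTRAL_API_KEY` is a convention we chose for illustration, not something the API requires.

```python
import os


def auth_headers(env_var: str = "DOKTRAL_API_KEY") -> dict:
    """Build the Bearer auth header from an environment variable.

    DOKTRAL_API_KEY is a hypothetical naming convention, not an API requirement.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to your Doktral API key")
    return {"Authorization": f"Bearer {key}"}
```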

Live Demo

Drop a document. Watch it parse. Real API. Real-time. No mockups.


Quick Start

Six lines. Any language.

If it speaks HTTP, it works with Doktral.

parse_invoice.py

import requests

# Get your free API key
key = requests.post("https://api.doktral.com/v1/keys",
    json={"name": "my-app"}).json()["api_key"]

# Parse any document (the with-block closes the file after upload)
with open("invoice.pdf", "rb") as f:
    r = requests.post("https://api.doktral.com/v1/parse",
        headers={"Authorization": f"Bearer {key}"},
        files={"file": f}).json()

print(r["content"])   # → Clean markdown
print(r["tables"])    # → Extracted tables
print(r["metadata"])  # → Page count, word count, etc.

Pricing

Transparent. Predictable.

No surprises. No hidden fees. Start free.

Free

$0 forever

100 pages/mo

All formats
Markdown + JSON
Table extraction
10 req/min

Starter

$49/month

5,000 pages/mo

Everything in Free
60 req/min
25 MB max file size
Email support
$0.008/page overage

Pro

$199/month

25,000 pages/mo

Everything in Starter
120 req/min
50 MB max file size
Priority support

Business

$499/month

100,000 pages/mo

Everything in Pro
300 req/min
White-label
Dedicated support

Google Document AI

$0.01 – $0.065/page

Complex GCP setup. Enterprise pricing. Overkill for startups and indie projects.

Doktral

$0.005 – $0.01/page

IBM Research's Docling engine. One API call. Free tier. Up to 10x cheaper. Built for developers.

Azure Doc Intelligence

$0.01 – $0.05/page

Azure account required. Complex auth. Enterprise-first. Heavy for simple use cases.
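
You can check Doktral's $0.005 – $0.01/page range by dividing each paid plan's monthly price by its included pages. This assumes the full quota is used, since a partly used plan costs more per page. The arithmetic:

```python
from decimal import Decimal

# Monthly price and included pages for each paid plan, from the pricing above.
PLANS = {
    "Starter": (Decimal("49"), 5_000),
    "Pro": (Decimal("199"), 25_000),
    "Business": (Decimal("499"), 100_000),
}


def effective_rate(plan: str) -> Decimal:
    """Dollars per page when every included page is used."""
    price, pages = PLANS[plan]
    return price / pages


if __name__ == "__main__":
    for name in PLANS:
        print(name, effective_rate(name))
```

Starter works out to $0.0098/page, Pro to $0.00796, and Business to $0.00499, which is roughly the quoted range.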

FAQ

Questions.

What file formats do you support?

PDF, DOCX, PPTX, XLSX, HTML, Markdown, plain text, PNG, JPG, TIFF, BMP, and CSV. We add formats based on demand, so just ask.

How is this different from using Docling directly?

Doktral handles all the infrastructure: Docker deployment, scaling, GPU management, API auth, usage tracking, rate limiting, and error handling. You send a file and get structured data back. No DevOps required.

Do you store my documents?

No. Documents are processed in memory and deleted immediately after parsing. We store only usage counts for billing. No logging of content. No training on your data.

What happens when I hit my page limit?

Free tier: requests are blocked until your quota resets next month. Paid plans: overage billing kicks in instead (for example, $0.008/page on Starter), so you never lose service.
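
Here is a sketch of a Starter bill once overage applies. It assumes overage is charged linearly per page beyond the 5,000 included, with no rounding or proration. Only the Starter overage rate is stated on this page.

```python
from decimal import Decimal

STARTER_BASE = Decimal("49")        # $49/month
STARTER_INCLUDED = 5_000            # pages included per month
STARTER_OVERAGE = Decimal("0.008")  # dollars per page beyond the quota


def starter_bill(pages: int) -> Decimal:
    """Estimated Starter bill for a month in which `pages` pages were parsed."""
    extra = max(0, pages - STARTER_INCLUDED)
    return STARTER_BASE + extra * STARTER_OVERAGE
```

At 10,000 pages the estimate is $89. Under these assumptions Starter plus overage reaches Pro's $199 at 23,750 pages/month, so Pro becomes the cheaper plan above that volume.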

Do you support OCR?

Yes. Scanned PDFs and images are automatically detected and processed with OCR. Multi-language support included. No extra configuration needed.

Can I use this for RAG pipelines?

Absolutely. Doktral's Markdown output is specifically optimized for RAG — clean structure, proper heading hierarchy, and extracted tables that LLMs can reason over effectively.
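
A common way to use a heading hierarchy in RAG is to chunk the Markdown by section and keep each chunk's heading path. This is a standard-library sketch of that technique, applied on your side to the returned Markdown. It is not something Doktral does for you. It does not special-case `#` lines inside fenced code blocks, so it would treat those as headings.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading section.

    Each chunk keeps its heading path (e.g. ["Invoice", "Totals"]) so a
    retriever can report where the text came from.
    """
    chunks = []
    path: list[str] = []
    lines: list[str] = []

    def flush():
        # Emit the buffered body text (if any) under the current heading path.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"path": list(path), "text": text})
        lines.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(m.group(2).strip())
        else:
            lines.append(line)
    flush()
    return chunks
```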

What's your uptime?

We target 99.9% uptime. Our API runs on Railway's managed infrastructure with health monitoring and automatic restarts.
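
An uptime target is easier to reason about as a downtime budget. The conversion below is plain arithmetic, not an SLA term from Doktral:

```python
from decimal import Decimal


def downtime_budget_minutes(uptime_pct: str, days: int = 30) -> Decimal:
    """Maximum downtime, in minutes, that an uptime target allows over `days` days."""
    allowed_fraction = (Decimal(100) - Decimal(uptime_pct)) / 100
    return allowed_fraction * days * 24 * 60
```

At 99.9%, that is 43.2 minutes in a 30-day month, or 525.6 minutes (about 8.8 hours) over a 365-day year.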

About

Built by developers, for developers.

Doktral was born from a simple frustration: document parsing shouldn't cost a fortune or require a PhD in cloud infrastructure. We take IBM Research's world-class Docling engine and make it accessible through a dead-simple API.

Based in India. Building globally. Shipping fast.

Founded 2025 · 10+ formats · 37k+ engine stars · $0 to start

Documents in, structured data out.


Parse your first document now.
