Docling is an open-source toolkit for document conversion and understanding, designed to parse diverse document formats into a unified structured representation for downstream AI workflows. It is described in the "Docling Technical Report" and in the newer paper "Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion".
Code, installation instructions, examples, and integrations are available in the official GitHub repository.
Docling is a document processing framework rather than a single model. Its purpose is to convert heterogeneous document inputs into a common, richly structured representation that can be used in retrieval, extraction, agentic AI, and other document-centric pipelines. It supports a broad range of input types, including PDF, DOCX, PPTX, XLSX, HTML, images, LaTeX, WebVTT, audio formats, and plain text.
A central strength of Docling is its advanced PDF understanding pipeline. It combines specialized AI components for document layout analysis, reading order, table structure recognition, OCR, and related parsing tasks, while exposing a unified DoclingDocument representation and multiple export formats such as Markdown, HTML, JSON, DocTags, and WebVTT. It is designed for both local execution and integration into modern GenAI ecosystems.
Key traits of Docling:
- Multi-format document parsing: Supports a wide variety of office, web, image, audio, and text formats.
- Advanced PDF understanding: Handles layout, reading order, tables, formulas, code, and document structure.
- Unified document representation: Converts parsed content from every input format into a common structured DoclingDocument representation.
- Multiple export options: Supports Markdown, HTML, JSON, DocTags, WebVTT, and more.
- GenAI integration: Connects with frameworks such as LangChain, LlamaIndex, Haystack, and agentic systems.

Figure 1 (from the technical report and project materials) illustrates the workflow of Docling:
- A source document is provided from one of many supported formats, such as PDF, DOCX, HTML, image, or audio.
- Docling applies specialized parsing and understanding steps, including layout analysis, OCR, table extraction, and structural interpretation.
- The parsed content is transformed into a unified DoclingDocument representation.
- This structured representation can then be exported into various downstream formats such as Markdown, HTML, or lossless JSON.
- The resulting outputs are intended to feed search, extraction, RAG, and agent-based AI workflows.
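The steps above can be sketched with Docling's Python API. This is a minimal sketch, assuming the `docling` package is installed and that `DocumentConverter.convert` and the `export_to_*` methods behave as shown in the project's quickstart. The `EXPORTERS` table and the `convert_document` helper are hypothetical names introduced here for illustration.

```python
import json

# Hypothetical helper table: output format -> function that serializes a
# DoclingDocument. The export_to_* method names are assumed from the
# project's documentation; check them against your installed version.
EXPORTERS = {
    "markdown": lambda doc: doc.export_to_markdown(),
    "html": lambda doc: doc.export_to_html(),
    "json": lambda doc: json.dumps(doc.export_to_dict()),
}


def convert_document(source: str, fmt: str = "markdown") -> str:
    """Convert a local path or URL and return the requested serialization."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Imported lazily so the format check works even without docling installed.
    from docling.document_converter import DocumentConverter

    # Parsing and understanding: layout analysis, OCR, table extraction, etc.
    result = DocumentConverter().convert(source)
    # result.document is the unified DoclingDocument; export it as requested.
    return EXPORTERS[fmt](result.document)
```

A call such as `convert_document("report.pdf", "json")` (with `report.pdf` standing in for any supported input) would return the JSON serialization. The first conversion may need to download model weights, so it can take noticeably longer than later runs.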
Docling is intended for:
- Document conversion for downstream AI and search pipelines.
- PDF understanding in research, enterprise, and knowledge-management workflows.
- RAG and agentic AI preprocessing, where documents need to be turned into structured machine-readable content.
- Local or privacy-sensitive document processing in environments where cloud upload is not acceptable.
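For the RAG preprocessing use case, exported Markdown usually has to be split into retrievable chunks. The sketch below is a standard-library stand-in that splits on Markdown headings. It is not Docling's own chunking functionality, and `split_markdown_by_heading` is a hypothetical helper name.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading, keeping the heading text."""
    chunks = []
    heading = ""  # text before the first heading gets an empty heading
    lines = []

    def flush():
        # Emit a chunk only if the section has non-blank body text.
        if any(line.strip() for line in lines):
            chunks.append({"heading": heading, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading = match.group(2).strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each chunk keeps its heading so it can be stored as retrieval metadata. This simple splitter does not recognize fenced code blocks, so a `#` line inside a fence would be treated as a heading.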
Limitations:
- Docling is a toolkit, not a single end-to-end universal model, so results depend on the selected parsing pipeline and underlying components.
- Complex documents with difficult layouts, poor scan quality, or unusual formatting may still require tuning or validation.
- Individual integrated models may have their own separate licenses or usage terms, even though the core Docling codebase is MIT-licensed.
- Some newer capabilities are explicitly marked as beta or coming soon, so feature maturity may vary across tasks.
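Because difficult inputs may need validation, a lightweight sanity check on the exported Markdown can flag conversions worth reviewing by hand. The heuristics and thresholds below are illustrative assumptions, not part of Docling, and `check_markdown` is a hypothetical helper name.

```python
def check_markdown(markdown: str, min_chars: int = 50) -> list[str]:
    """Return a list of human-readable warnings for suspicious output."""
    problems = []
    text = markdown.strip()

    # Very short output often means the page was an unreadable scan.
    if len(text) < min_chars:
        problems.append("output is very short")

    # U+FFFD marks characters that could not be decoded or recognized.
    if text and text.count("\ufffd") / len(text) > 0.01:
        problems.append("many replacement characters")

    # Rows of one pipe table should contain the same number of pipes.
    # Escaped pipes inside cells are not handled; this is only a heuristic.
    widths = []
    for line in text.splitlines() + [""]:  # sentinel flushes a trailing table
        if line.lstrip().startswith("|"):
            widths.append(line.count("|"))
        else:
            if len(set(widths)) > 1:
                problems.append("table with inconsistent column counts")
            widths = []
    return problems
```

An empty list means none of the heuristics fired, which does not guarantee that the conversion is correct.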
BibTeX entry and citation info
@techreport{Docling,
author = {Deep Search Team},
month = {8},
title = {Docling Technical Report},
url = {https://arxiv.org/abs/2408.09869},
eprint = {2408.09869},
doi = {10.48550/arXiv.2408.09869},
version = {1.0.0},
year = {2024}
}