Docling is a popular open-source project contributed by IBM. It supports easy and fast parsing of PDFs and several other file types including images. Docling can be run via containers and deployed to Kubernetes, OpenShift and watsonx.ai.
Here are the high-level features from the Docling repo:
- Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
- Advanced PDF document understanding including page layout, reading order & table structures
- Unified, expressive DoclingDocument representation format
- Easy integration with LlamaIndex & LangChain for powerful RAG / QA applications
- OCR support for scanned PDFs
Docling Serve allows running Docling as a container and deploying it to Kubernetes-based systems. Running Docling as a service on such systems adds deployment complexity and network traffic, since documents have to be sent to the service. In return, the service can be re-used by multiple projects and scales better.
Container
There are container images that include the full Docling functionality, including various models.
podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve
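Once the container is running, it can also be called from the command line. The following is a minimal sketch, not the project's official client: `DOCLING_BASE_URL`, `docling_payload` and `docling_submit` are hypothetical names introduced here, and it assumes the local container listens on localhost:5001, needs no auth token, and serves the same `/v1alpha/convert/source/async` path used in the deployment commands later in this post.

```shell
# Sketch: send a conversion request to the local container.
# Assumptions: the container from the podman command above listens on
# localhost:5001, needs no auth token, and serves the same
# /v1alpha/convert/source/async path that the deployment commands use.
DOCLING_BASE_URL="${DOCLING_BASE_URL:-http://localhost:5001}"

# Build the JSON request body for one document URL (hypothetical helper).
# The URL is inserted as-is, so it must not contain double quotes.
docling_payload() {
  printf '{"http_sources": [{"url": "%s"}]}' "$1"
}

# POST the request and print the raw JSON response.
docling_submit() {
  curl -sS -X POST "${DOCLING_BASE_URL}/v1alpha/convert/source/async" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "$(docling_payload "$1")"
}
```

Calling `docling_submit https://arxiv.org/pdf/2501.17887` should print the service's JSON response. If the path differs in your image version, the Swagger UI lists the available endpoints.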
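The container exposes a Swagger UI for exploring the API.
Additionally, there is a more user-friendly interface under the path ‘/ui’, which is enabled by DOCLING_SERVE_ENABLE_UI=true in the command above. The screenshot at the top of this post shows the input parameters; a second screenshot shows the output.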
Deployment
Deployments to watsonx.ai can be done via the following commands:
oc login ...
oc new-project docling-serve
git clone https://github.com/docling-project/docling-serve.git
cd docling-serve
NAMESPACE=docling-serve
kubectl apply -f docs/deploy-examples/docling-serve-oauth.yaml
DOCLING_NAME=docling-serve
DOCLING_ROUTE="https://$(oc get routes ${DOCLING_NAME} --template={{.spec.host}})"
OCP_AUTH_TOKEN=$(oc whoami --show-token)
curl -X 'POST' \
"${DOCLING_ROUTE}/v1alpha/convert/source/async" \
-H "Authorization: Bearer ${OCP_AUTH_TOKEN}" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
}'
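The async endpoint returns before the conversion has finished, so the caller has to wait for the task and then fetch the result. The sketch below shows one way to do that under stated assumptions. The `task_id` and `task_status` field names, the `success` and `failure` status values, and the `/v1alpha/status/poll/<task_id>` and `/v1alpha/result/<task_id>` paths are not taken from this post and should be checked against the Swagger UI of the deployed version. The helper names are hypothetical.

```shell
# Sketch: wait for an async conversion task and fetch its result.
# Assumptions (verify in the Swagger UI):
#   - the submit response contains a "task_id" string field
#   - GET /v1alpha/status/poll/<task_id> returns a "task_status" field
#     that eventually becomes "success" or "failure"
#   - GET /v1alpha/result/<task_id> returns the converted document

# Authenticated GET against the route; DOCLING_ROUTE and OCP_AUTH_TOKEN
# are the variables set in the commands above.
docling_get() {
  curl -sS -H "Authorization: Bearer ${OCP_AUTH_TOKEN}" \
    -H "accept: application/json" "${DOCLING_ROUTE}$1"
}

# Print the value of a string field from a JSON document.
# A naive sed match keeps the sketch dependency-free; jq is more robust.
json_field() {
  printf '%s' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Poll until the task finishes; give up after max_tries polls.
# Returns 0 on success, 1 on failure, 2 if the task is still unfinished.
docling_wait() {
  local task_id="$1"
  local max_tries="${2:-60}"
  local tries=0
  local status=""
  while [ "$tries" -lt "$max_tries" ]; do
    status="$(json_field "$(docling_get "/v1alpha/status/poll/${task_id}")" task_status)"
    case "$status" in
      success) return 0 ;;
      failure) return 1 ;;
    esac
    tries=$((tries + 1))
    sleep 2
  done
  return 2
}

# Fetch the conversion result once docling_wait has returned 0.
docling_result() {
  docling_get "/v1alpha/result/$1"
}
```

A typical flow would capture the curl response in a variable, extract the id with `json_field "$response" task_id`, and then run `docling_wait "$task_id" && docling_result "$task_id"`.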
After the deployment, the created resources are displayed in the OpenShift console.
Configuration
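For production scenarios, docling-serve can be configured, for example through environment variables such as DOCLING_SERVE_ENABLE_UI used in the podman command above. The API allows programmatic access to all features.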
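As an illustration, settings could be changed on a running deployment as sketched below. This assumes the workload is a Deployment with the same name as `DOCLING_NAME` in the commands above, which this post does not confirm. `docling_configure` is a hypothetical helper, and DOCLING_SERVE_ENABLE_UI is the only setting taken from this post.

```shell
# Sketch: change docling-serve settings on an existing deployment.
# Assumptions: the workload is a Deployment named like DOCLING_NAME above,
# and settings are read from environment variables such as
# DOCLING_SERVE_ENABLE_UI (the only one shown in this post).
DOCLING_NAME="${DOCLING_NAME:-docling-serve}"

# Apply KEY=VALUE settings with "oc set env"; rejects malformed arguments.
docling_configure() {
  local kv
  for kv in "$@"; do
    case "$kv" in
      [A-Za-z_]*=*) ;;  # looks like KEY=VALUE
      *) echo "not KEY=VALUE: $kv" >&2; return 1 ;;
    esac
  done
  oc set env "deployment/${DOCLING_NAME}" "$@"
}
```

For example, `docling_configure DOCLING_SERVE_ENABLE_UI=true` would switch on the ‘/ui’ interface. Changing the environment of a Deployment normally triggers a new rollout of its pods.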
Next Steps
Here are some resources:
- SmolDocling: Vision-Language Model for Document Conversion
- Open Source Document Parser including OCR
- Docling Repo
- Paper
- SmolDocling : Streamlined OCR Document Conversion and Lightweight Understanding
- IBM and Hugging Face Researchers Release SmolDocling: A 256M Open-Source Vision Language Model for Complete Document OCR
To learn more, check out the watsonx.ai documentation and the watsonx.ai landing page.