heidloff.net - Building is my Passion

Deploying Docling on watsonx.ai

Docling is a popular open-source project contributed by IBM. It supports easy and fast parsing of PDFs and several other file types including images. Docling can be run via containers and deployed to Kubernetes, OpenShift and watsonx.ai.

Here are the high-level features from the Docling repo:

  • Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
  • Advanced PDF document understanding including page layout, reading order & table structures
  • Unified, expressive DoclingDocument representation format
  • Easy integration with LlamaIndex & LangChain for powerful RAG / QA applications
  • OCR support for scanned PDFs

Docling Serve allows running Docling as a container and deploying it to Kubernetes-based systems. Running Docling as a service on such a system adds deployment complexity and network traffic. In return, several projects can share one deployment, and the service scales better.

Container

There are container images that include the full Docling functionality, including various models.

podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve
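
Once the container is running, the API can be tried from a second terminal. The following is a minimal sketch, assuming the service is reachable at http://localhost:5001 without authentication and accepts the same v1alpha request that is used in the deployment section of this post. The helper names `build_payload` and `submit_local` are hypothetical, not part of Docling Serve.

```shell
# Base URL of the local container (assumes port 5001 is published as in the command above).
DOCLING_URL="${DOCLING_URL:-http://localhost:5001}"

# Hypothetical helper: build the JSON request body for a single document URL.
build_payload() {
  printf '{"http_sources": [{"url": "%s"}]}' "$1"
}

# Hypothetical helper: submit a conversion request to the local service.
# Endpoint and body mirror the OpenShift example in the deployment section.
submit_local() {
  curl -s -X POST "${DOCLING_URL}/v1alpha/convert/source/async" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")"
}

# Usage:
#   submit_local "https://arxiv.org/pdf/2501.17887"
```

Keeping the payload construction in its own function makes it easy to inspect the request body before anything is sent.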

Swagger UI:

image

Additionally, a more user-friendly interface is available under the path ‘/ui’; the DOCLING_SERVE_ENABLE_UI=true setting in the command above enables it. The screenshot at the top of this post shows the input parameters and the next screenshot shows the output:

image

Deployment

Docling Serve can be deployed to the OpenShift cluster that runs watsonx.ai via the following commands:

# Log in and create a project for Docling Serve
oc login ...
oc new-project docling-serve

# Fetch the example manifests and deploy the OAuth-protected variant
git clone https://github.com/docling-project/docling-serve.git
cd docling-serve
NAMESPACE=docling-serve
kubectl apply -f docs/deploy-examples/docling-serve-oauth.yaml

# Look up the route host and a token for the current user
DOCLING_NAME=docling-serve
DOCLING_ROUTE="https://$(oc get routes ${DOCLING_NAME} --template='{{.spec.host}}')"
OCP_AUTH_TOKEN=$(oc whoami --show-token)

# Submit an asynchronous conversion request
curl -X 'POST' \
  "${DOCLING_ROUTE}/v1alpha/convert/source/async" \
  -H "Authorization: Bearer ${OCP_AUTH_TOKEN}" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
  }'
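
The request above goes to an asynchronous endpoint, so the response is not the converted document itself. The following sketch shows one way to wait for the result. It assumes that the response contains `task_id` and `task_status` fields, that the status is available under `/v1alpha/status/poll/<task_id>` and that the terminal states are `success` and `failure`. These are assumptions about this API version, not taken from this post, so check the Swagger UI of your deployment. The helpers `json_field` and `wait_for_task` are hypothetical.

```shell
# Hypothetical helper: extract one string field from a flat JSON response on stdin.
# A real script would more likely use jq; grep/sed keeps this sketch dependency-free.
json_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 |
    sed -e 's/^"[^"]*"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Hypothetical helper: poll the (assumed) status endpoint until the task finishes.
# Arguments: base URL, bearer token, task id, delay in seconds, maximum attempts.
# Returns 0 on success, 1 on failure, 2 if the task did not finish in time.
wait_for_task() (
  base="$1"; token="$2"; task_id="$3"
  delay="${4:-2}"; attempts="${5:-150}"
  while [ "$attempts" -gt 0 ]; do
    status="$(curl -s -H "Authorization: Bearer ${token}" \
      "${base}/v1alpha/status/poll/${task_id}" | json_field task_status)"
    case "$status" in
      success) return 0 ;;
      failure) return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 2
)

# Usage, with TASK_ID taken from the response of the async request:
#   wait_for_task "${DOCLING_ROUTE}" "${OCP_AUTH_TOKEN}" "${TASK_ID}"
```

The attempt limit keeps the loop from running forever if the service is unreachable or returns an unexpected status.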

After the deployment, the created resources are displayed in the OpenShift console.

image

Configuration

For production scenarios, docling-serve can be configured, for example through environment variables such as the DOCLING_SERVE_ENABLE_UI setting used above. The API allows programmatic access to all features.
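
As a hedged illustration, environment variables on a running deployment can be changed with `oc set env`. The sketch below assumes that the Deployment created by the example manifest is named `docling-serve`; the wrapper `configure_docling` is hypothetical. The only setting shown is DOCLING_SERVE_ENABLE_UI, which appears earlier in this post; other settings should be taken from the docling-serve documentation.

```shell
# Hypothetical helper: set environment variables on the Docling Serve deployment.
# Assumes the Deployment is named "docling-serve" unless DOCLING_NAME says otherwise.
configure_docling() {
  oc set env "deployment/${DOCLING_NAME:-docling-serve}" "$@"
}

# Usage: enable the UI on the deployed service
#   configure_docling DOCLING_SERVE_ENABLE_UI=true
```

By default `oc set env` updates all containers of the pod template; the `-c` option limits the change to a specific container, which matters if the deployment also runs a proxy container.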

Next Steps

To learn more, check out the watsonx.ai documentation and the watsonx.ai landing page.

Disclaimer
The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.