Running IBM Watson Speech To Text in Minikube

IBM Watson NLP (Natural Language Processing) and Watson Speech containers can be run locally, on-premises, or on Kubernetes and OpenShift clusters. Through REST and WebSocket APIs, AI can easily be embedded in applications. This post describes how to run Watson Speech To Text locally in Minikube.

To set some context, check out the landing page IBM Watson Speech Libraries for Embed.

The Watson Speech To Text library is available as containers providing REST and WebSocket interfaces. While this offering is new, the underlying functionality has been used and optimized for a long time in IBM offerings like the IBM Cloud SaaS service for STT and IBM Cloud Pak for Data.

A trial is available if you want to try it. The container images are stored in an IBM container registry that is accessed via an IBM Entitlement Key.

How to run STT locally via Minikube

My post Running IBM Watson Speech to Text in Containers explained how to run Watson STT locally in Docker. The instructions below describe how to deploy Watson Speech To Text locally to Minikube via kubectl and yaml files.

First you need to install Minikube, for example via brew on macOS. Next, Minikube needs to be started with more memory and disk space than its defaults. I’ve used the settings below, which are more than required, but I wanted to leave room for other applications. Note that you also need to give your container runtime more resources. For example, if you use Docker Desktop, navigate to Preferences > Resources to do this.

$ brew install minikube 
$ minikube start --cpus 12 --memory 16000 --disk-size 50g
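
Once Minikube is up, it can be worth checking what capacity the node reports to Kubernetes. The snippet below is a minimal sketch; the function name show_node_capacity is only illustrative, and it assumes kubectl already points at the Minikube cluster.

```shell
# Print name, CPU count and memory for every node in the current cluster.
# Assumes the current kubectl context is the Minikube cluster started above.
show_node_capacity() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" cpu="}{.status.capacity.cpu}{" memory="}{.status.capacity.memory}{"\n"}{end}'
}

# Usage: show_node_capacity
```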

The namespace and secret need to be created.

$ kubectl create namespace watson-demo
$ kubectl config set-context --current --namespace=watson-demo
$ kubectl create secret docker-registry \
--docker-server=cp.icr.io \
--docker-username=cp \
--docker-password=<your IBM Entitlement Key> \
-n watson-demo \
ibm-entitlement-key
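
If you want this step to be repeatable in a script, one option is to read the key from an environment variable and fail early when it is missing. The sketch below assumes a variable named IBM_ENTITLEMENT_KEY (my naming, not something IBM or the repo prescribes) and otherwise passes the same values as the command above.

```shell
# Create the image pull secret from the IBM_ENTITLEMENT_KEY environment
# variable (hypothetical name). Returns 1 if the variable is empty or unset.
create_pull_secret() {
  if [ -z "${IBM_ENTITLEMENT_KEY:-}" ]; then
    echo "IBM_ENTITLEMENT_KEY is not set" >&2
    return 1
  fi
  kubectl create secret docker-registry ibm-entitlement-key \
    --docker-server=cp.icr.io \
    --docker-username=cp \
    --docker-password="$IBM_ENTITLEMENT_KEY" \
    -n watson-demo
}

# Usage (after exporting IBM_ENTITLEMENT_KEY):
#   create_pull_secret
#   kubectl get secret ibm-entitlement-key -n watson-demo
```

The final kubectl get secret call in the usage comment is a quick way to confirm that the secret ended up in the watson-demo namespace.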

Clone a repo with the Kubernetes yaml files to deploy Watson Speech To Text.

$ git clone https://github.com/nheidloff/watson-embed-demos.git
$ kubectl apply -f watson-embed-demos/minikube-speech-to-text/kubernetes/
$ kubectl get pods --watch
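
Instead of watching the pod list by hand, a script can block until the pods are ready. This is a sketch based on kubectl wait; the function name and the 600 second default timeout are my own assumptions, chosen because pulling the images can take a while.

```shell
# Block until every pod in the namespace reports Ready, or time out.
# $1: namespace (default: watson-demo)
# $2: timeout in seconds (default: 600, an arbitrary choice)
wait_for_pods() {
  kubectl wait --for=condition=Ready pods --all \
    -n "${1:-watson-demo}" --timeout="${2:-600}s"
}

# Usage: wait_for_pods watson-demo 900
```

Note that kubectl wait returns an error if no pods exist yet, so run this a moment after kubectl apply.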

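To use other models, modify deployment.yaml. Each model is packaged as an init container that copies its model files into the shared models volume.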

- name: watson-stt-en-us-telephony
  image: cp.icr.io/cp/ai/watson-stt-en-us-telephony:1.0.0
  args:
  - sh
  - -c
  - cp model/* /models/pool2
  env:
  - name: ACCEPT_LICENSE
    value: "true"
  resources:
    limits:
      cpu: 1
      ephemeral-storage: 1Gi
      memory: 1Gi
    requests:
      cpu: 100m
      ephemeral-storage: 1Gi
      memory: 256Mi
  volumeMounts:
  - name: models
    mountPath: /models/pool2

When you open the Kubernetes Dashboard (via ‘minikube dashboard’), you’ll see the deployed resources. The pod contains the runtime container and four init containers (two specific models, a generic model and a utility container).
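
The same information is available on the command line. Here is a minimal sketch that lists the init containers and the runtime containers of each pod in the namespace; the function name is illustrative.

```shell
# For each pod, print its name, then its init container names,
# then its regular container names.
# $1: namespace (default: watson-demo)
list_pod_containers() {
  kubectl get pods -n "${1:-watson-demo}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{"  init: "}{.spec.initContainers[*].name}{"\n"}{"  main: "}{.spec.containers[*].name}{"\n"}{end}'
}

# Usage: list_pod_containers
```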

To invoke Watson Speech To Text, port forwarding can be used.

$ kubectl port-forward svc/ibm-watson-stt-embed 1080

Invoke the REST API with a sample audio file.

$ curl "http://localhost:1080/speech-to-text/api/v1/recognize" \
   --header "Content-Type: audio/wav" \
   --data-binary @watson-embed-demos/demo.wav
{
   "result_index": 0,
   "results": [
      {
         "final": true,
         "alternatives": [
            {
               "transcript": "ibm watson speech to text can easily be embedded in applications",
               "confidence": 0.85
            }
         ]
      }
   ]
}
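
In a script you usually only want the transcript text. The sketch below pulls it out of the response with sed, so it needs nothing beyond a POSIX shell. It is deliberately simple and assumes pretty-printed JSON with one "transcript" field per line, as in the output above; for anything more robust, a real JSON parser is the better tool. The function name extract_transcripts is my own.

```shell
# Read a recognize response on stdin and print each transcript on its own line.
# Assumes pretty-printed JSON with at most one "transcript" field per line
# and no escaped quotes inside the transcript text.
extract_transcripts() {
  sed -n 's/.*"transcript": *"\([^"]*\)".*/\1/p'
}

# Usage:
#   curl -s "http://localhost:1080/speech-to-text/api/v1/recognize" \
#     --header "Content-Type: audio/wav" \
#     --data-binary @watson-embed-demos/demo.wav | extract_transcripts
```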

To find out more about Watson Speech To Text and Watson for Embed in general, check out these resources: