Building AR Applications with Unity and IBM Watson

Over the last few days I’ve enjoyed playing with Unity and the IBM Watson SDK, which makes cognitive services like speech recognition available in Unity projects. With this technology you can build not only games but also other exciting scenarios. I’ve extended an Augmented Reality application from my colleague Amara Keller which allows iOS users to have conversations with a virtual character.

The picture shows a printed piece of paper with a pattern. When using the app, the 3D character shows up on the paper. Users can have conversations with the character, for example:

  • User: How is the weather?
  • Virtual character: In which location?
  • User: Munich
  • Virtual character: The temperature in Munich is currently 24 degrees.
  • User: How is the weather in Berlin?
  • Virtual character: The temperature in Berlin is currently 28 degrees.

Check out the video for a short demo.

Get the code from GitHub.

Technically the following services and tools are used:

The main logic is in this file. Let’s take a look at how to use the Speech To Text service as an example. First you need to initialize the service with credentials you can get from the IBM Cloud. The lite account offers access to the Watson services, doesn’t cost anything, and doesn’t even require a credit card.

SpeechToText _speechToText;
Credentials credentials = new Credentials(WATSON_SPEECH_TO_TEXT_USER, WATSON_SPEECH_TO_TEXT_PASSWORD, "https://stream.watsonplatform.net/speech-to-text/api");
_speechToText = new SpeechToText(credentials);

Next you define some options and start listening by invoking StartListening:

_speechToText.DetectSilence = true;
_speechToText.EnableWordConfidence = false;
_speechToText.EnableTimestamps = false;
_speechToText.SilenceThreshold = 0.03f;
_speechToText.MaxAlternatives = 1;
...
_speechToText.StartListening(OnSpeechToTextResultReceived, OnRecognizeSpeaker);

The callback OnSpeechToTextResultReceived gets the spoken text as input:

private void OnSpeechToTextResultReceived(SpeechRecognitionEvent result, Dictionary<string, object> customData) {
  if (result != null && result.results.Length > 0) {
    foreach (var res in result.results) {
      foreach (var alt in res.alternatives) { 
        SendMessageToConversation(alt.transcript);                    
      }
    }
  }
}
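The callback above forwards every alternative it receives. If interim results are enabled, or if a recognition comes back empty, it may be worth filtering before calling SendMessageToConversation. The following is only a minimal sketch: TranscriptFilter and Clean are hypothetical names that are not part of the Watson SDK or the original project, and the helper assumes the caller can tell whether a result is final.

```csharp
// Hypothetical helper, not part of the Watson SDK or the original project.
public static class TranscriptFilter
{
    // Returns the trimmed transcript when it is worth forwarding,
    // or null for interim results and empty transcripts.
    public static string Clean(string transcript, bool isFinal)
    {
        if (!isFinal || transcript == null)
        {
            return null;
        }
        string trimmed = transcript.Trim();
        return trimmed.Length > 0 ? trimmed : null;
    }
}
```

Inside the inner loop of the callback you could then call TranscriptFilter.Clean(alt.transcript, isFinal) and only send the message when the return value is not null. Where isFinal comes from depends on the SDK version; I believe the result objects expose a flag for this, but treat that as an assumption and check the SDK you are using.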

The application also showcases how to use Watson Assistant and Watson Text To Speech in addition to Watson Speech To Text. Check out the open source project for details.

One important thing to keep in mind when using the three Watson services together is the timing. For example, you should stop recording before playing an audio clip received from Watson Text To Speech so that Watson doesn’t listen to itself. You also need to make sure to play only one clip at a time.

I’m neither a Unity nor a C# expert, so I’m sure there are better ways to do this. Below is how I’ve solved it: I start the recording again only after the duration of the audio clip has passed.

private void OnSynthesize(AudioClip clip, Dictionary<string, object> customData) {
  // Play the synthesized clip once on a temporary AudioSource.
  GameObject audioObject = new GameObject("AudioObject");
  AudioSource source = audioObject.AddComponent<AudioSource>();
  source.loop = false;
  source.clip = clip;
  source.Play();
  // Resume recording and remove the temporary object only after the clip has finished.
  Invoke("RecordAgain", source.clip.length);
  Destroy(audioObject, clip.length);
}
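The approach above handles a single clip. If a second clip arrives while the first one is still playing, both would play at once and the recording could restart too early. One possible way to guard against that is a small queue that tracks the state. This is only a sketch under assumptions: SpeechTurnQueue is a hypothetical class that is not part of the project or the Watson SDK, and it deliberately has no Unity dependency, so the caller passes in what "play", "stop listening" and "start listening" mean.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original project or the Watson SDK.
// It only tracks state; the actual playback and recording are done by the
// delegates supplied by the caller.
public class SpeechTurnQueue<TClip>
{
    private readonly Queue<TClip> _pending = new Queue<TClip>();
    private readonly Action<TClip> _play;     // e.g. assign the clip to an AudioSource and play it
    private readonly Action _stopListening;   // e.g. stop the Speech To Text recording
    private readonly Action _startListening;  // e.g. start the recording again

    public bool IsPlaying { get; private set; }

    public SpeechTurnQueue(Action<TClip> play, Action stopListening, Action startListening)
    {
        _play = play;
        _stopListening = stopListening;
        _startListening = startListening;
    }

    // Call whenever Text To Speech delivers a new clip.
    public void Enqueue(TClip clip)
    {
        _pending.Enqueue(clip);
        if (!IsPlaying)
        {
            // Stop recording first so the service does not hear its own voice.
            _stopListening();
            PlayNext();
        }
    }

    // Call when the current clip has finished, e.g. after its length has elapsed.
    public void OnClipFinished()
    {
        if (!IsPlaying)
        {
            return; // nothing was playing, so there is nothing to resume
        }
        if (_pending.Count > 0)
        {
            PlayNext();
        }
        else
        {
            IsPlaying = false;
            _startListening(); // resume recording only after the last clip
        }
    }

    private void PlayNext()
    {
        IsPlaying = true;
        _play(_pending.Dequeue());
    }
}
```

In the Unity script, OnSynthesize would call Enqueue(clip) instead of playing the clip directly. The play delegate would start the AudioSource and schedule a call to OnClipFinished after clip.length seconds, for example with Invoke as in the code above, and the start-listening delegate would do what RecordAgain does today.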

Want to run this sample yourself? Try it out on the IBM Cloud.

  • pmoskovi

    Cool demo, Niklas. As an idea: could you animate the AR figure, give her directions (e.g.: sit down, stand up), and have a conversation with her around that? That way you could connect VR and speech.

    • Niklas Heidloff

      Hi Peter, I was thinking the same thing. Changing position should be easy. I will look for a free 3D animated character.