Dialogic Blog

How to Build a WebRTC Gateway and Integrate IBM Watson Speech-to-Text Services

by Vince Puglia

Sep 21, 2016 9:38:04 AM

WebRTC and cognitive computing are two technologies that are undoubtedly transforming industries, but perhaps the velocity of change comes from how easy it is to integrate them. Communications-Platform-as-a-Service (CPaaS) companies like Telestax enable web developers to add real-time communications to any application through RestComm, their rapid service creation environment. Cognition-as-a-Service companies like IBM Watson continue to push what artificial intelligence can achieve and expose it through an easy RESTful interface. Combine these two elements and you get cognitive communications that can be used in almost any vertical. From emotional analysis in the contact center, to ensure customers are satisfied after every call, to smartbot agents capable of handling customer inquiries via SMS - it's all possible with cognitive communications.

This 'how-to' is a step-by-step guide to building a WebRTC service that interacts with IBM Watson. But this is only the tip of the iceberg of what is possible. If you are interested in learning more about WebRTC and adding real-time communications to applications, you may want to check out the upcoming Developer Workshop: http://www.dialogic.com/en/landing/webrtc-cloud-communications-developer-workshop.aspx

Tweet me your questions/suggestions and enjoy! 

//Vince

@vfpuglia

OVERVIEW: 

The basis for this 'how-to' is a WebRTC gateway that uses the IBM Watson Speech-to-Text service to transcribe an entire call and email the transcription for later use. The call flow starts with a WebRTC call from a browser to the RestComm cloud platform. RestComm then triggers an outbound SIP call to the provider to ring the PSTN phone. Media is handled and transcoded by the Dialogic PowerMedia XMS: the WebRTC side uses Opus whereas the SIP side uses G.711. The PowerMedia XMS also records the entire call. Once the call is completed, the RestComm application calls an external PHP script that interfaces with the IBM Watson service. The PHP script uploads the recorded call to Watson and waits for the reply. When the reply arrives, the PHP script returns the transcribed text to RestComm, which finishes by emailing the conversation for later use. Sound complex? It's not with the right tools. So let's get started!


PREREQUISITES: 

This 'how-to' requires the following to be in place already: 
  • IBM BlueMix account 
  • Telestax RestComm Powered By Dialogic XMS account
    • Email me for trial account: vincent.puglia@dialogic.com
  • Web Server with PHP support 

 

PART 1: CREATING THE IBM WATSON SPEECH-TO-TEXT SERVICE

 1.) First log into the IBM BlueMix console (https://console.ng.bluemix.net/) and create a new service. 

watson-1.png

 2.) A full list of IBM BlueMix supported services will appear. Narrow down the list by searching for 'speech'. Select the 'Speech to text' service. 

watson-2.png

 3.) Give your service a name and select create. Note the first 1,000 minutes of the service are free with a charge thereafter. 

watson-3.png

 4.) Once the service is created, note the URL, username and password generated by the service. These will be needed in the next part. 

watson-4.png
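
To show how the three values noted in step 4 fit together, here is a minimal sketch that assembles the request URL and the credential string used by the script in Part 2. The function names `buildRecognizeUrl` and `buildUserPwd` are hypothetical helpers written for this example, not part of any IBM SDK. The service URL is inferred from the URL in the Part 2 script, and `XXXXXXX` stands in for your own credentials.

```php
<?php
// Hypothetical helper: append the recognize endpoint and its query string
// to the service URL noted from the credentials page.
function buildRecognizeUrl($serviceUrl, $model = 'en-US_NarrowbandModel')
{
    $query = http_build_query(array('model' => $model, 'continuous' => 'true'));
    return rtrim($serviceUrl, '/') . '/v1/recognize?' . $query;
}

// Hypothetical helper: cURL's CURLOPT_USERPWD option expects "username:password".
function buildUserPwd($username, $password)
{
    return $username . ':' . $password;
}

// Assumed service URL (taken from the script in Part 2); use the one shown
// for your own service instance.
$serviceUrl = 'https://stream.watsonplatform.net/speech-to-text/api';

echo buildRecognizeUrl($serviceUrl), "\n";
echo buildUserPwd('XXXXXXX', 'XXXXXXX'), "\n";
```

Keeping the URL and credentials in one place like this makes it easier to swap in a different model or a new set of credentials later.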

 

PART 2: CREATING THE PHP SCRIPT FOR INTERFACING WITH IBM WATSON

 1.) The PHP script is what connects the IBM Watson service to the RestComm cloud platform. There are (3) sections to the PHP script. 

First, the recording produced by PowerMedia XMS needs to be copied over to the local web server.

Next the recording needs to be pushed up to the IBM Watson service for speech to text conversion. Be sure to enter the username and password generated in part 1 of this guide in this section. 

Lastly the reply from the IBM Watson service is in JSON format and divided up with confidence scoring. The confidence scoring is outside the scope of this 'how-to' but we'll need to combine the transcription chunks and return the entire content back to RestComm. 

Note - this PHP script will need to be uploaded to your web server and the URL noted for the next part. 

/******************* COPY ENTIRE CONTENTS BELOW *************************/ 

<?php

// Public URL of the recording, passed by RestComm as the 'file' parameter.
$source = isset($_REQUEST['file']) ? $_REQUEST['file'] : '';

// Only accept http(s) URLs so a caller cannot make copy() read local files.
if (!preg_match('#^https?://#i', $source)) {
    exit("'Invalid recording URL.'");
}

// Section 1: copy the recording produced by PowerMedia XMS to this server.
$newfile = 'recording_' . date('m-d-Y_hia-') . rand(1, 1000) . '.wav';

if (!copy($source, $newfile)) {
    exit("'Could not copy the recording.'");
}

// Section 2: push the recording up to IBM Watson.
// Enter the username and password generated in Part 1.
$username = "XXXXXXX";
$password = "XXXXXXX";
$url = 'https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_NarrowbandModel&continuous=true';
$fileData = file_get_contents($newfile);
$headers = array("Content-Type: audio/wav");

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_TIMEOUT, 60000); // timeout in seconds
curl_setopt($ch, CURLOPT_POSTFIELDS, $fileData); // raw WAV data as the request body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the reply instead of printing it
// Certificate verification is left at cURL's default (enabled).
$executed = curl_exec($ch);
curl_close($ch);

// Section 3: combine the transcription chunks from the JSON reply.
$response_decode = json_decode($executed, true);

echo "'";

if (is_array($response_decode) && isset($response_decode['results']) && is_array($response_decode['results'])) {
    foreach ($response_decode['results'] as $result) {
        // Each chunk lists alternatives; the first one is used here.
        if (isset($result['alternatives'][0]['transcript'])) {
            echo htmlspecialchars($result['alternatives'][0]['transcript']);
        }
    }
}

echo "'";

?>

/****************************** STOP **********************************/ 
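
To make the third section of the script easier to follow, here is a small self-contained sketch of the JSON handling on its own. The sample reply is illustrative only: its field names (`results`, `alternatives`, `transcript`, `confidence`) are the ones the script reads, but the text and scores are made up. The helper functions `topAlternatives`, `joinTranscripts` and `lowestConfidence` are hypothetical names introduced for this example.

```php
<?php
// Illustrative reply only: field names match what the script above reads,
// but the transcript text and confidence values are invented.
$sampleReply = '{"results":['
    . '{"alternatives":[{"confidence":0.89,"transcript":"hello "}],"final":true},'
    . '{"alternatives":[{"confidence":0.75,"transcript":"how are you "}],"final":true}'
    . '],"result_index":0}';

// Hypothetical helper: return the first alternative of every result chunk.
function topAlternatives($json)
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['results']) || !is_array($decoded['results'])) {
        return array();
    }
    $top = array();
    foreach ($decoded['results'] as $result) {
        if (isset($result['alternatives'][0]['transcript'])) {
            $top[] = $result['alternatives'][0];
        }
    }
    return $top;
}

// Hypothetical helper: join the chunks into one transcript string.
function joinTranscripts($json)
{
    $text = '';
    foreach (topAlternatives($json) as $alternative) {
        $text .= $alternative['transcript'];
    }
    return trim($text);
}

// Hypothetical helper: lowest confidence across the chunks, or null if none.
function lowestConfidence($json)
{
    $scores = array();
    foreach (topAlternatives($json) as $alternative) {
        if (isset($alternative['confidence'])) {
            $scores[] = $alternative['confidence'];
        }
    }
    return $scores ? min($scores) : null;
}

echo joinTranscripts($sampleReply), "\n";
echo lowestConfidence($sampleReply), "\n";
```

Looping over whatever chunks are present, rather than counting up to a fixed limit, avoids reading past the end of the reply when a call is short or the request fails.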

 

PART 3:  CREATING THE TELESTAX RESTCOMM APPLICATION

 1.) Log into the RestComm Visual Designer (https://tadhack.restcomm.com/restcomm-rvd/#/login) and create a new project called 'WGW_watson'. 

create_project.png

 

2.) Click the '+' button to create new modules within your new application project. The WebRTC gateway application will have (4) modules: 

Welcome - prompts the caller and gathers the (11) digit phone number to be dialed

dialNumber - dials the number and initiates the record

contactWatson - calls the PHP script with the recording to be transcribed by IBM Watson

emailTranscript - sends the transcribed text out via email 

create-modules.png
3.) In the 'Welcome' module, drag the 'say' verb into the module and enter the prompt asking the caller for the (11) digit phone number. Next drag a 'collect' verb into the module, change the radio button to 'collect digits' and assign those digits to 'numToDial'. Be sure to change the scope of this variable to 'application' as we'll be using it later. Change 'continue to' to 'dialNumber' and lastly change 'Finish on Key' to '#'.

welcome-module.png

 

 4.) In the 'dialNumber' module, drag the 'Dial' verb into the module, then drag the 'number' section into the verb. In the phone number to dial section, enter '$numToDial', the variable we defined in the 'Welcome' module. Set 'continue to' to 'contactWatson' and lastly change the recording to 'Yes'. 

dialNumber-module.png

 5.) In the 'contactWatson' module, drag the 'External Service' verb into the module and enter the URL of the PHP script created in Part 2 of this guide. Select 'Add service parameter' and give it a 'Name' of 'file' and a value of '$core_PublicRecordURL'. This passes the public URL of the recorded file to the PHP script so it can send the recording to IBM Watson for processing. Assign the response of the PHP script to a variable: set 'Assign to' to 'transcript', 'scope' to 'application' and 'value' to be the type. Lastly, change the 'continue to' section to be fixed to 'emailTranscript'. 

contactWatson-module.png

  

 6.) In the 'emailTranscript' module, drag the 'email' verb into the module. In the 'email content' section use the '$transcript' variable to send the call transcription. The 'core_PublicRecordingURL' can also be used to send a link to the recorded audio. Enter an email subject as well as TO/FROM fields. 

Be sure to SAVE the entire project before moving forward. 

export-module.png

 7.) Last step before testing is to link the WebRTC application you just created to a number. Log into the RestComm Dashboard (https://tadhack.restcomm.com/#/login) and select 'Numbers' at the top followed by 'Register Number'. 

register-number-1.png

 

 

 8.) Enter a number for the application and give it a friendly name for reference. For mine, I chose '878787'. Select 'Register Number' when finished. 

register-number-2.png

9.)  Confirm the number by selecting 'Register' 

register-number-3.png
 
10.) And lastly, link the number registered with the application by selecting 'WGW_watson' in the drop down. Be sure to save the changes.  
register-number-4.png
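
Before testing the whole flow, it can help to call the PHP script by hand the way the 'contactWatson' module from step 5 would. The sketch below only builds the request URL; it assumes the 'file' service parameter is sent in the query string, which the Part 2 script accepts because it reads `$_REQUEST`. The function name `buildServiceRequestUrl` and both `example.com` addresses are placeholders invented for this example.

```php
<?php
// Hypothetical helper: build the URL the External Service step would request,
// assuming the 'file' parameter travels in the query string.
function buildServiceRequestUrl($scriptUrl, $recordingUrl)
{
    return $scriptUrl . '?' . http_build_query(array('file' => $recordingUrl));
}

// Placeholder addresses: substitute your own script URL and a real recording URL.
$requestUrl = buildServiceRequestUrl(
    'https://example.com/watson.php',
    'https://example.com/recordings/call.wav'
);

echo $requestUrl, "\n";

// Uncomment to call the script and print the text RestComm would receive:
// echo file_get_contents($requestUrl), "\n";
```

Pasting the printed URL into a browser should return the quoted transcript, which is the value the module assigns to 'transcript'.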
 
PART 4:  TESTING THE WEBRTC GATEWAY WITH IBM WATSON SPEECH-TO-TEXT SERVICE
 
 1.) For the purposes of this 'how-to', I will be using the built-in WebRTC client called Olympus (https://tadhack.restcomm.com/olympus/#/) but you can create your own client, widget or mobile application using the client SDKs: https://github.com/RestComm/restcomm-web-sdk
 
 Open the Olympus WebRTC client in Chrome and use the default 'Alice' username & '1234'  password. 
olympus-1.png
 
 
 2.) Select 'Contacts' on the left hand menu and 'Add' contact '878787' to the list. 
olympus-2.png
 
3.) Click on your newly added contact and make an audio call. You should now hear the 'Welcome' prompt created in Part 3 of this guide. Click the 'keypad' on the left hand side menu to enter the number to dial followed by pound. 
olympus-3.png
 
4.) The application will then dial the 11-digit number and record the call for processing by IBM Watson. Below is a sample result from the email service with the transcript. 
 
 Screen_Shot_2016-09-01_at_11.48.15_AM.png
 

And that's it! Email or tweet me with any questions. 

Thanks!

 //Vince

@vfpuglia

