Simple peer-to-peer use cases are rare.
The best thing about WebRTC? The fact that you can open your editor, fork a GitHub project of 100-odd lines of code, and have a "Hello World" sample up and running in an hour or less. I did that on a Raspberry Pi more than once.
The worst thing about WebRTC? That the route from a demo or a proof of concept to a solid service is just as long as it always used to be.
Take SaferMobility’s use case - seemingly, a rather benign one. Simplifying it a bit, a student opens the SaferMobility app on their smartphone to make an emergency video call to the campus police. The call is received in a browser somewhere, and the smartphone's location is shared with the campus police.
It's a pure one-on-one call, with the addition of location sharing (sending two numbers) and the need to port WebRTC to mobile. Where does that MediaServer XMS come into play? Why do we even need it?
It turns out that the peer-to-peer model doesn’t work that well here. There is an additional requirement that needs to be addressed - the ability to record calls:
- The recordings need to be retrievable somehow
- They also need to be secured and accessed by authorized personnel only
There are ways to record calls on the client side - either on the mobile phone or in the browser - but these approaches are far from perfect and rely too heavily on factors we have no control over, such as the amount of local storage available and the time needed to upload the recordings for archiving.
So we need to record the calls somewhere in our data center. This being the case, we might as well route the calls through that recording server and be done with it.
If we are recording, we probably want to be able to view the end result as well. Playback doesn't happen in WebRTC (at least not if you want flexibility and freedom in how the recording is consumed). So we need to convert from WebRTC's SRTP network packetization to a consumable file format - WebM, MP4, or AVI. To make things worse, we probably need to transcode. Most WebRTC services today are VP8-based, while most video playback devices expect H.264 - to get ubiquity, we need to transcode our VP8 WebRTC video to H.264.
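As a rough sketch of what that conversion step looks like, the snippet below builds an ffmpeg command line that re-encodes a VP8/WebM recording into an H.264/MP4 file. This is an illustration, not SaferMobility's actual pipeline - the file names are hypothetical, and it assumes ffmpeg with libx264 is available on the recording server.

```python
def transcode_command(src_webm, dst_mp4):
    """Build an ffmpeg command that converts a VP8/WebM recording
    into an H.264/MP4 file that common playback devices can consume."""
    return [
        "ffmpeg",
        "-i", src_webm,             # input: WebM container, VP8 video
        "-c:v", "libx264",          # re-encode the video track as H.264
        "-c:a", "aac",              # re-encode audio as AAC, the usual MP4 pairing
        "-movflags", "+faststart",  # move the MP4 index up front for streaming playback
        dst_mp4,
    ]

print(transcode_command("call.webm", "call.mp4"))
```

In practice you would run this with `subprocess.run(transcode_command(...), check=True)` as part of the post-call archiving job.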
That playback file? Wouldn’t it be nice if we had the session's metadata stitched to it as well? Maybe prepend the location to the archived recording, or show it in a picture-in-picture format. Maybe display a recording timestamp in one of the corners of the video? Since we’re already routing the media and transcoding it, there’s not too much effort involved in adding those.
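Burning that metadata into the recording can be done with ffmpeg's drawtext filter during the same transcode pass. The helper below is a sketch under that assumption - it composes a filter string that overlays the caller's location in one corner and a wall-clock timestamp in the other; the coordinate values and positions are illustrative.

```python
def overlay_filter(latitude, longitude):
    """Build an ffmpeg drawtext filter chain that burns the caller's
    location (top-left) and a wall-clock timestamp (bottom-right)
    into the archived recording."""
    location = f"lat {latitude} lon {longitude}"
    return (
        # first drawtext: static location text in the top-left corner
        f"drawtext=text='{location}':x=10:y=10:fontcolor=white,"
        # second drawtext: ffmpeg expands %{localtime} to the current time,
        # anchored to the bottom-right using the rendered text dimensions
        "drawtext=text='%{localtime}':x=w-tw-10:y=h-th-10:fontcolor=white"
    )

print(overlay_filter(40.7128, -74.0060))
```

The resulting string would be passed to ffmpeg via `-vf`, alongside the codec options used for the VP8-to-H.264 transcode.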
In the future, we may need to be able to escalate incidents, add a third person into the call, etc.
Where does that all lead us? We started with a simple peer-to-peer scenario, added a recording requirement to it, and we are now in the land of media processing, which includes:
- Media routing and recording
- Transcoding (VP8 to H.264)
- File formats
- Text/image overlay
That’s the difference between that cute "Hello World" proof-of-concept we started with, and the production service. In many cases, that difference entails the use of a media server.