I would like to believe that I’m not hopelessly confused and outdated with regards to what is going on with RTCWEB. Last I checked my head is not stuck in the sand nor have I been buried under a rock for the last several years. I recently watched the February 7th, 2013 netcast talking about the data channel and questions about how it relates the SDP and the SDP ‘application m-line’.

For the love of all that is human, why is SDP part of RTCWEB efforts at all?

To be clear, I’m talking about a few specific aspects of SDP: the format, the exchange of SDP between browsers and media negotiations via the offer/answer model (and all that it implies regarding the negotiation of media streams). Come to think of it, all that makes SDP, well… SDP. I know what some will say: We need to exchange some kind of blob-like information between browsers so they can talk, that’s why SDP is used. And I would respond “of course”! Beyond arguing how arcane SDP is as a format, RTCWEB was specifically designed not to do signaling stuff at all. That part was purposefully (and wisely IMHO) left out of the specification so that the future was wide open for whatever it might hold in creative solutions.

So why was it so wise to take out call flow signaling but then decide to keep media signaling, specifically the SDP and the offer/answer model? To be clear, I’m not suggesting we dump SDP in favour of something else, like JSON blobs or JavaScript structured data with unspecified exchange formats. Nor am I’m proposing a stateless exchange of SDP (breaking offer/answer). I’m saying that it’s operating entirely at too high a level.

What we really need in order to do future stuff in the browsers (yet remain compatible with the past) is a good API for a lower level media engine to create, destroy, control and manage media streams. That’s it. Write an engine that doesn’t take SDP, but manages much lower level streams and allow the programmer to dictate how they are plumbed together, which are active and inactive, and give events for the streams as they progress.

There are sources for audio/video such as microphones, camera or pre-recorded sources. There are destinations, like the speaker and the video monitor or perhaps even a recording destination. Then there’s the wiring and controlling of the pipelines between mixers, codecs engines and finally RTP streams that can be opened given the proper information (basically a set of ICE candidates that can optionally be updated e.g. trickle ICE). Media engines are well-understood things with many reference implementations to draw upon and abstract for the sake of JavaScript.

That’s the API I want. There’s no SDP offer/answer needed. There’s no shortage of really smart people out there who would know how to produce a great API proposal.

Some will argue that this is way too complex for JavaScript programmers. Nonsense! The stuff these JavaScript guys can do is mind blowing. Plus, it’s vital and necessary to allow for incredible flexibility and power for future protocols, future services (including in browser peer-to-peer conferencing or multiparty call control), while maintaining existing compatibility to legacy protocols. Yes, there are some JavaScript programmers that would be too intimidated to touch such an API. That’s completely fine. The ambitious smart guys will write the wrappers for those who want a simple “give me a blob to give to the remote side”.

There’s no security threat introduced by managing streams with a solid API. As long as ICE is used to verify there are two endpoints who agree to talk and user listening acquisition is granted, why shouldn’t the rest be under the control of JavaScript?

Likewise, this will not make any more silos between the browsers than those that already exist because basically both sides need to have the same signaling regardless. Is it really a big stretch that both sides would have the same JavaScript to run the media engines too?

Such an API would lower the bar of browsers being able to interoperate at the media level. This removes the concerns about SDP compatibility issues (including the untold extensions that will happen to handle more powerful features and all that it implies and complex behaviours associated with SDP offer/answer, including rollback and ‘m=’ stability). If the browsers support RTP, ICE and codecs, and can stream then they are pretty much compatible even if individually their API sets aren’t up to par to their counterparts.

This also solves an issue regarding the data channel. There is no need for the data channel to be tied to an offer/answer exchange in the media at all. They are separate things entirely (as well they should be). For example, in Open Peer’s case the data channel gets formed in advanced to maintain our document subscription model between peers and media channels are open and closed as required.

Those who still want to do full on SDP can do SDP. Those who want stateless SDP-like exchanges can do exactly that. Those who want to negotiate media once and leave the streams alone can do so.

Perhaps there are those that would argue that the JavaScript APIs build the SDP hidden behind the scenes and the SDP can be treated as an opaque type and thus the appropriate low level API already exists. But they are missing the point. The moment the SDP is required the future is tied to offer/answer.

As an example, let’s examine Open Peer’s use case. Open Peer does not have, nor does it need or want a stateful offer/answer model. It also doesn’t support or require media renegotiation. Open Peer offers the ports and codecs (including offering to multiple parties the same port sets) and establishes the desired media. Call and media state is completely separated out. From then if alternative media is needed, a new media dialog is created to replace the existing one and then a ‘quick swap’ happens and the media streams are rewired appropriately to the correct inputs and outputs without renegotiation, at least this is not a renegotiation in the offer/answer sense of the meaning. Further, Open Peer allows either side to change its media without waiting for the other party’s answer.

Media is complicated for good reason as there are many use cases. The entire IETF/W3C discussion around video constraints illustrates some of the complexities and competing desires for just one single media type. If we tie ourselves to SDP we are limiting ourselves big time, and some of the cool future stuff will be horribly hampered by it.

Let’s face it, browsers are moving toward becoming sandboxed operating systems. So why do we not give an appropriate API low level as it deserves that allows for flexible futuristic application writing? Complicated and powerful HTML 5 APIs are being well received, so why can’t the same be true for lower level RTCWEB APIs?

I know Microsoft has argued the API is too high level and they’ve even gone to the trouble of submitting their own specification with CU-RTC Web and splintering and fragmenting efforts. I don’t presume to represent this stance regarding SDP, nor will I go into the merits of their offering, but I think they are right in principle. And for saying so, I’ve got my rotten tomato and egg shield in position.

Privacy Settings
We use cookies to enhance your experience while using our website. If you are using our Services via a browser you can restrict, block or remove cookies through your web browser settings. We also use content and scripts from third parties that may use tracking technologies. You can selectively provide your consent below to allow such third party embeds. For complete information about the cookies we use, data we collect and how we process them, please check our Privacy Policy
Consent to display content from Youtube
Consent to display content from Vimeo
Google Maps
Consent to display content from Google
Consent to display content from Spotify
Sound Cloud
Consent to display content from Sound