News

In the Trenches with RTCWEB and Real-time Video

The concept of video streaming seems extraordinarily simple. One side has a camera and the other side has a screen. All one has to do is move the video images from the camera to the screen and that’s it. But alas, it’s nowhere near that simple.

Cameras are input sources, but they have a variety of modes in which they can operate. Each camera has its own dimensions (width/height), aspect ratio (the ratio of width to height) and frame rate. Cameras can often record in selectable input formats, for example SD or HD, which dictate their pixel dimensions and aspect ratios (e.g. 4:3 or 16:9). If a camera opens in one format and switches to another, there can be a time penalty before video starts streaming again, so mode switching needs to be minimized or avoided entirely. On portable devices, the camera can be oriented in a variety of ways and can change its pixel dimensions and aspect ratio on the fly as the device is physically rotated.

Some devices have multiple camera inputs (e.g. a front camera and a rear camera). The inputs need not be identical in dimensions or capabilities, and the user can switch between them on the fly. Further, there are even cameras that record multiple angles (e.g. 3D) simultaneously, but I’m not sure if that should be covered right now even though 3D TVs are all the rage (at least from Hollywood’s perspective).
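To make that grab-bag concrete: in today's browsers, the mediaDevices API (which postdates much of the debate described here) can enumerate the available inputs and request one by facing direction. A minimal TypeScript sketch, where pickCamera is a hypothetical helper and the constraints are treated by browsers as hints rather than guarantees:

```typescript
// List the available camera inputs, then open one by facing direction.
// Note: "facingMode" is a hint; the browser picks the closest match it can.
async function pickCamera(facing: "user" | "environment"): Promise<MediaStream> {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const cameras = devices.filter((d) => d.kind === "videoinput");
  console.log(`Found ${cameras.length} camera(s):`, cameras.map((c) => c.label));

  return navigator.mediaDevices.getUserMedia({
    video: { facingMode: facing, width: { ideal: 1280 }, height: { ideal: 720 } },
  });
}
```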

If I could equate cameras to a famous movie quote: they are like a box of chocolates, you never know what you are going to get.

Cameras aren’t the only sources, though. Pre-recorded video can serve as a source just as well; it has a fixed width, height and aspect ratio, but it must still be considered a potential video source.

The side receiving video typically renders it onto a display. An output display is one type of video sink, but there are others, such as recording sinks and videoconferencing sinks. Each has its own attributes, and the output width and height of these sinks vary greatly.

Some video recording sinks work best when they receive the maximum possible resolution, while others want a fixed width/height (because the recording is intended for later viewing on a particular fixed-size output device). When video is displayed in a web page, it might be rendered at a fixed width/height, or there might be flexibility in the output size. For example, the page can have a fixed width while the video height is adjustable (up to a maximum viewable area), or vice versa with the width being the adjustable axis. In some cases both dimensions can automatically grow or shrink.
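To illustrate the adjustable-axis case, here is a minimal sketch (fitHeight is a hypothetical helper): the page slot has a fixed width and a maximum height, and the rendered height follows the source's aspect ratio up to that cap.

```typescript
// Fixed output width, adjustable height (up to a maximum):
// derive the rendered height from the source's aspect ratio.
function fitHeight(
  fixedWidth: number,
  maxHeight: number,
  srcWidth: number,
  srcHeight: number
): { width: number; height: number } {
  const ideal = Math.round(fixedWidth * (srcHeight / srcWidth));
  return { width: fixedWidth, height: Math.min(ideal, maxHeight) };
}

// A 1280x720 (16:9) source in a 640px-wide slot capped at 480px tall:
// fitHeight(640, 480, 1280, 720) => { width: 640, height: 360 }
```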

Some output areas are adjustable in size when manually manipulated by a user. In such cases the user can dynamically resize the output areas larger or smaller as desired (up to a maximum width and height). Other output screens are fixed in size entirely and can never be adjusted. Still other devices adjust their output dimensions based upon the physical rotation of the device.

The problem is how do you fit the source size into the video sink’s area? A camera can be completely different in dimensions and aspect ratio than the area for the video sink. The old adage “how do you fit a square peg in a round hole” seems to apply.

In the simplest case, the video source (camera) would be exactly the same size as the output area or the output area would be adjustable in the range to match the camera source. But what happens when they don’t match?

The good news is that video can be scaled up or down to fit. The bad news is that scaling has several problems. Blowing a small image up makes it look pixelated and ugly. Shrinking a bigger image is better, though there are consequences for processing and bandwidth.

Aspect ratio is also a big problem. Anyone who’s watched a widescreen movie on a standard-screen TV (or vice versa) will understand this problem. There are basically three solutions. One is to shrink the widescreen image to fit the narrower screen and put “black bars” above and below it, known as letterboxing (pillarboxing when the bars are on the sides). Another is to enlarge the image while maintaining its aspect ratio until there are no bars, with the side effect that some of the image is cropped because it no longer fits in the viewing area. The last is to stretch the image, making things look taller or fatter; fortunately that technique is largely discredited, although still selectively used at times.
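The first two strategies come down to one choice: scale by the smaller or the larger of the two per-axis scale factors. A minimal sketch (fitSource and Rect are hypothetical names):

```typescript
interface Rect { width: number; height: number; }

// "letterbox" scales the source to fit entirely inside the sink (bars fill the rest);
// "crop" scales the source to cover the sink entirely (the overflow is cut off).
function fitSource(src: Rect, sink: Rect, mode: "letterbox" | "crop"): Rect {
  const scaleX = sink.width / src.width;
  const scaleY = sink.height / src.height;
  const scale = mode === "letterbox" ? Math.min(scaleX, scaleY) : Math.max(scaleX, scaleY);
  return { width: Math.round(src.width * scale), height: Math.round(src.height * scale) };
}

// A 16:9 source into a 4:3 sink:
// fitSource({ width: 1920, height: 1080 }, { width: 640, height: 480 }, "letterbox")
//   => { width: 640, height: 360 }  (black bars above and below)
// fitSource({ width: 1920, height: 1080 }, { width: 640, height: 480 }, "crop")
//   => { width: 853, height: 480 }  (left and right edges cropped)
```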

Some people might argue that displaying video with letterboxing/pillarboxing is too undesirable to ever be used; they would prefer the video be stretched to fill the display area and any superfluous edges automatically cropped off. Videophiles might gasp at the suggestion, since discarding part of an image verges on sacrilege. In practical terms, both user preference and context determine which technique is best.

As an example of why context is important, consider video rendered to the entire view screen (i.e. full screen mode). In this context, letterboxing/pillarboxing might be perfectly acceptable, as those black bars become part of the background of the video terminal. In a different context, black bars in the middle of a beautifully formatted web page might be horrifically ugly and unacceptable under any circumstance.

The complexities of video are far from over. When users place video calls, the source and the video sink are usually not physically co-located. That means the video has to travel from the source to a video sink on a different machine or device, across a network.

When video is transmitted across a network pipe, a few important considerations must be factored in. A network pipe has a maximum bandwidth that fluctuates with usage and saturation. Attempt to send too large a video and it will become choppy and glitch badly. Even where the network pipe is sufficiently large, bandwidth has an associated cost, so it’s wasteful to send a super-high-quality image to a device that is incapable of rendering it at the original quality. To waste less bandwidth, a codec is used to compress images and preserve network bandwidth as much as possible (the cost being that the bigger the image, the more CPU is required to compress it).
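To put rough numbers on that, here is a sketch using a bits-per-pixel heuristic; the 0.1 bpp figure is a rule-of-thumb assumption on my part, not anything from a spec:

```typescript
// Rough bitrate budget: pixels per second times an assumed bits-per-pixel factor.
function targetBitrateKbps(width: number, height: number, fps: number, bpp = 0.1): number {
  return Math.round((width * height * fps * bpp) / 1000);
}

// targetBitrateKbps(640, 480, 30)  => ~922 kbps (SD)
// targetBitrateKbps(1280, 720, 30) => ~2765 kbps (HD): three times the cost,
// all of it wasted if the far end can only render SD.
```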

As a general rule…

  • a source should never send video that ends up discarded, or that is of higher quality than the receiver is capable of rendering, as doing so wastes bandwidth as well as CPU processing power. For example, do not send HD video to a device only capable of displaying SD quality.

Too bad this general rule has an exception. There are rare cases where the video cannot be scaled down before sending, and they cannot be ignored. Some servers offer pre-recorded video and do not scale it at the source because doing so would require expensive processing power to transcode the recording. Likewise, a simple device might be too underpowered, or too hard-wired to its output format, to scale the video appropriately for the remote video sink.

The question becomes: which end (source or sink) manipulates the video? And then there is the question of how, and of what each side needs to know, to get the video into the correct format for the other side.

I can offer a few suggestions that will help. Again, as to the general rules…

  • A source should always attempt to send what a video sink expects and nothing more.
  • A source should never attempt to stretch the image beyond the original source image’s dimensions.
  • If the source cannot fully adjust the dimensions for the video sink, it does as much as it can, and the video sink must finish the job of adjusting the image before final rendering.
  • The source must understand that the video sink can change dimensions and aspect ratio at a moment’s notice. As such, there is a set of active “current” video properties of the sink that the source must be aware of at all times.
  • The “current” properties include the active width and height of the video sink (or the maximum width and height, should the area be automatically adjustable). The area also needs to be flagged as safe for letterboxing/pillarboxing or not. If the area cannot accept a letterbox or pillarbox, then the image must ultimately be adjusted to fill the rendered output area; in that situation the source could and should pre-crop the image before sending, knowing the final dimensions used (see the sketch after this list).
  • The source needs to know the maximum possible resolution the video sink is capable of rendering, so as not to waste its own CPU opening a camera at a higher resolution than could ever be rendered (e.g. an iPad sending to an iPhone). Unfortunately, this needs to be a list of maximum rendered output dimensions, as a device might have multiple combinations (such as an iPhone suddenly turned on its side).
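To tie these rules together, here is a minimal sketch of the source-side decision. SinkProperties and chooseSendSize are hypothetical names, and in a real system the sink's properties would arrive over signaling:

```typescript
interface Size { width: number; height: number; }

// The "current" sink properties described above, in hypothetical form.
interface SinkProperties {
  maxWidth: number;
  maxHeight: number;
  letterboxSafe: boolean; // may the sink pad with bars, or must the image fill the area?
}

function chooseSendSize(src: Size, sink: SinkProperties): Size {
  let toSend = src;

  if (!sink.letterboxSafe) {
    // Letterbox-unsafe sink: pre-crop to the sink's aspect ratio first,
    // so every pixel sent is a pixel rendered.
    const sinkAspect = sink.maxWidth / sink.maxHeight;
    toSend =
      src.width / src.height > sinkAspect
        ? { width: Math.round(src.height * sinkAspect), height: src.height }
        : { width: src.width, height: Math.round(src.width / sinkAspect) };
  }

  // Scale down to fit within the sink's bounds, but never up:
  // a source should never stretch beyond its original dimensions.
  const scale = Math.min(1, sink.maxWidth / toSend.width, sink.maxHeight / toSend.height);
  return {
    width: Math.round(toSend.width * scale),
    height: Math.round(toSend.height * scale),
  };
}
```

If the source cannot perform one of these steps (say, it has no cropping capability), it does what it can and the sink finishes the job, per the rules above.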

I’m skeptical that a reciprocal minimum resolution is ever needed (or even possible). For example, an area may be deemed letterbox/pillarbox-unsafe while the image is just too small to fit the minimum dimensions (and thus would have to be stretched upon rendering). In the TV world, an image is simply stretched to fit on output (typically while maintaining aspect ratio). Yes, a stretched image can become pixelated, and that sucks, but there are smoothing algorithms that do a reasonable job within reasonable limits. People playing DVDs on Blu-ray players with HD TVs are familiar with such processing, which magically upscales the DVD image to the HD output size. Perhaps a “one pixel by one pixel” source connected to an HD (1920×1080) output would be the extreme case of unacceptable, but what would anyone expect in such a circumstance? That’s like hooking up an Atari 2600 to an HD TV. There’s only so much that can be done to smooth out the image, as the source image quality just isn’t there. But that doesn’t mean the image shouldn’t be displayed at all!

Another special case happens when a source cannot be scaled down before transmission (for whatever reason) and the receiving video sink is incapable of scaling it down further for display (due to bandwidth or CPU limitations on the device). The CPU limitation might be known in advance, but the bandwidth might not. In theory the sink could report failures to the source and cause a scale-back in frame rate (i.e. cause the sender to send fewer images rather than smaller images). If CPU and bandwidth conditions are known in advance, then a maximum acceptable dimension and bandwidth could be declared by the video sink, in which case a source that cannot adjust its dimensions would simply be unable to connect.
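A sketch of what that frame-rate back-off might look like; the class and its callbacks are hypothetical names (a real implementation would key off RTCP receiver reports or similar feedback):

```typescript
// Halve the frame rate when the sink reports it cannot keep up;
// creep back toward the maximum while the sink stays healthy.
class FrameRateController {
  private fps: number;

  constructor(private readonly maxFps: number, private readonly minFps = 5) {
    this.fps = maxFps;
  }

  // Sink reported dropped frames: send fewer images, not smaller ones
  // (this source, by assumption, cannot rescale).
  onSinkOverloaded(): void {
    this.fps = Math.max(this.minFps, Math.floor(this.fps / 2));
  }

  // Sink is keeping up: probe back upward, one frame per second at a time.
  onSinkHealthy(): void {
    this.fps = Math.min(this.maxFps, this.fps + 1);
  }

  currentFps(): number {
    return this.fps;
  }
}
```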

Aside from the difficulties of building good RTC video technology, those involved in RTCWEB / WebRTC have yet to agree on which codecs are Mandatory to Implement (MTI), which isn’t helping things at all. Since MTI video is on the agenda for IETF 86 in Orlando, maybe we will see it happen soon. If there is a decision (and that’s a big IF), what is likely to happen is that there will be two or more MTI video codecs, which means we will need to support codec swapping and all the heavy lifting related thereto.

I have not even touched on the IPR issues around real-time video, but if patents around video were the only problem, perhaps RTCWEB would be ready by now. The truth is that video patents are not likely to be the biggest concern that needs to be addressed when it comes to real time video. It’s just that “doing it right” in a browser, using JavaScript, on various devices… is rather complex.

WebRTC

Mobile Video for the Enterprise: Potential and Practical Considerations

Recently, video has grabbed an impressive mindshare among consumers. A plethora of video applications, including video streaming, video search, video on demand, and video telephony (mobile video among them), is experiencing rapid adoption. YouTube is now the number-two search engine in the world; the tablet and smartphone markets are exploding; and video has just surpassed all other applications in terms of network traffic. The next generation of tech-savvy prosumers using some form of video in their personal lives is going to demand the same experience and capabilities in the business environment.

As mobile video gains popularity among consumers, is it likely to also become the next frontier in enterprise communications and collaboration? My colleagues Roopam Jain and Shyam Krishnan took a look at this market opportunity and presented their findings in a study titled: Assessing the Potential for Mobile Videoconferencing in the Enterprise. Here follows a summary of their key observations.

Technologies that support collaboration among users at different locations are growing in demand. There has been a surge of interest in videoconferencing, ranging from desktop to telepresence to mobile. As mobility continues to become the norm in everyday life and business alike, end users are looking to extend their enterprise communication experiences to mobile devices.

Faster, smarter, and more capable smartphones and the emergence of collaboration-ready enterprise tablets are fueling the interest in mobile videoconferencing. While we believe that mainstream adoption is still a few years away, the demand drivers are all aligned for the market to pick up pace.

Worldwide shipments of tablets used (partially or entirely) for business purposes totaled 600,000 units in 2010 and are expected to reach 49.1 million in 2015. We project that 90% of the enterprise tablets shipped in 2015 will have forward-facing cameras and will therefore be video-enabled.

Smartphone growth will be explosive: shipments neared 263 million units in 2010 and are expected to grow to about 500 million in 2015. By 2015, 90% of smartphones are forecast to have forward-facing cameras and therefore be video-enabled, up from 35% in 2010.

The move toward 4G will help carriers deliver higher-quality video. Carriers are jockeying for a more competitive position as the mobile industry moves toward 4G networks. As high-bandwidth networks become widely available and camera and phone technologies continue to improve, we expect to see more mobile videoconferencing on the horizon. However, there are challenges in store. As the usage of both streaming video and two-way video catches on with users, it threatens to strangle the networks. Recent moves by network carriers to constrain demand with monthly data caps will be a hindrance to videoconferencing usage.

Despite all the exciting developments on the device and carrier side and the growing need for videoconferencing solutions, enterprise-level adoption is still nascent and needs to overcome several challenges, including deployment costs, proving the business case, and increasing levels of security for wireless communications. Security issues with mobile technology are going to be a key focus as the market develops. IT will increasingly standardize on a single smartphone/tablet for its employees, and IT’s policy of locking down the enterprise mobile device of choice will continue to prompt users to carry multiple devices.

Mobile videoconferencing can potentially support a wide variety of business solutions, from retail point-of-sale to hospitality, banking, healthcare, manufacturing or any custom business application. It will increasingly support team collaboration across the entire value chain to shorten decision-making time and enable immediate knowledge sharing.

In today’s context, the main use case for mobile videoconferencing in the enterprise remains remote employee interaction, whether for the mobile workforce or for employees who need a visual collaboration feature to ensure the “personal touch” during the call. Additionally, mobile videoconferencing extends traditional room-based and desktop-based videoconferencing, leveraging existing investments by bringing their reach to the mobile user.

In planning a mobility strategy, enterprises should increasingly look at the full spectrum of devices, including smartphones and tablets along with laptops. Providing secure communications on a broad array of devices will be essential. Additionally, users will increasingly look to extend the Unified Communications experience to their mobile devices.

At the very outset, small-scale pilots would provide good insight into typical usage patterns. Mobile videoconferencing needs to be cost-justified prior to deployment. All the key stakeholders must treat the network as a critical component in the process; developments in LTE, and 4G in general, will be key to the success of mobile videoconferencing.

What do you think?

Also check out James Brehm’s blog on mobile video here.

Next-Gen Telepresence: Offering Sky-High Benefits at a Down-to-Earth Price

Please join us for a free webinar discussing trends and issues in the nascent, yet rapidly growing Telepresence market: http://www.bulldogsolutions.net/FAS/FAS09242009/frmRegistration.aspx

Here is a brief event description:

With videoconferencing finally delivering on those many years’ worth of promises, business executives around the world are now asking themselves whether videoconferencing’s big brother, telepresence, is a better communications solution.  

Much-hyped, telepresence has in some quarters gotten an unfair reputation for being an expensive alternative that few companies actually need, and fewer can afford.

In fact, telepresence has always been a highly flexible and practical business solution, capable of meeting the needs of companies of many shapes and sizes, while delivering a healthy return on investment and presenting a platform for new competitive advantage.  

Join Dominic Dodd of Frost & Sullivan and Marc Trachtenberg of Teliris, a leading telepresence vendor, for this fresh and highly relevant presentation. The two will discuss how visual collaboration is providing the “silver bullet” for many of the challenges faced by corporations today and why the new generation of telepresence solutions should demand your serious attention.  

Attend this September 24 Webinar and learn the following:

  • Why services make the big difference between telepresence and videoconferencing
  • How the benefits of telepresence can go beyond cutting business travel costs
  • Possible future developments in the market for visual collaboration
