Cascading allows you to use Selective Forwarding Units (SFUs) to create very large, geographically distributed, and fully interactive video conferences. I recently spoke at the Kranky Geek WebRTC Show about the benefits of cascading for WebRTC, and those same benefits apply to video conferencing.
Selective Forwarding Unit Benefits
An interactive video conference typically relies on a network element called a Selective Forwarding Unit (SFU). Every participant in the conference sends video to the SFU, which then forwards that video to the other participants in the call. Because the SFU only forwards video packets, rather than decoding and re-encoding the video, it adds minimal delay to the video streams. It does not always forward every video packet to each participant: it can select a different subset of the incoming video stream to forward to each endpoint, sending only the minimum amount of media needed to optimize the experience for every receiver. Last but not least, the SFU can correct transmission errors that occur on the downlink to one endpoint without impacting the other participants in the call. Error correction is localized between the SFU and the endpoint that is experiencing the errors.
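To make the per-receiver selection concrete, here is a minimal sketch that picks, for each receiver, the highest-quality encoding of one sender's video that fits that receiver's available bandwidth. It assumes the sender provides several encodings (as with simulcast or scalable video coding) and that the SFU already has a bandwidth estimate per receiver; the `Layer` type, the `select_layer` function, and the bitrates are hypothetical illustrations, not Vidyo's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """One encoding of a sender's video (hypothetical simulcast/SVC layer)."""
    name: str
    bitrate_kbps: int


def select_layer(layers: List[Layer], receiver_kbps: int) -> Optional[Layer]:
    """Return the highest-bitrate layer that fits the receiver's bandwidth.

    None means even the lowest layer does not fit, so the SFU would forward
    no video from this sender to that receiver.
    """
    fitting = [layer for layer in layers if layer.bitrate_kbps <= receiver_kbps]
    return max(fitting, key=lambda layer: layer.bitrate_kbps, default=None)


layers = [Layer("180p", 300), Layer("360p", 800), Layer("720p", 2000)]
receivers = {"alice": 5000, "bob": 1000, "carol": 200}  # estimated kbps per receiver
for who, kbps in receivers.items():
    chosen = select_layer(layers, kbps)
    print(who, chosen.name if chosen else "none")
```

Each receiver gets its own choice from the same incoming stream, which is what lets the SFU serve a well-connected participant and a constrained one in the same call without forcing everyone down to the lowest quality.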
Selective Forwarding Unit Limitations
SFUs are useful, but they have their limitations. Sooner or later a conference might reach the capacity limits of the SFU machine. For example, assume that a 720p video stream at 30 frames per second uses about 2 megabits per second of network bandwidth. It could be less in reality, but I will use that round number as an example. In a multipoint conference, participants typically receive more bandwidth than they send; in this example, 3-4 megabits per second. That means a typical Gigabit Ethernet link can support at most about 100-150 participants. In other words, larger conferences cannot be supported by a single SFU.
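The capacity estimate depends on details the paragraph leaves implicit, such as whether incoming and outgoing traffic count against the same budget and how much of the link is kept free. The sketch below turns those into explicit parameters; the `max_participants` name, the 25% headroom default, and the shared-link treatment are illustrative assumptions, not figures from the original.

```python
import math


def max_participants(link_mbps: float, send_mbps: float, receive_mbps: float,
                     headroom: float = 0.25, shared_link: bool = True) -> int:
    """Rough number of participants one SFU network link can carry.

    send_mbps:    what each participant sends to the SFU (SFU ingress).
    receive_mbps: what each participant receives from the SFU (SFU egress).
    headroom:     fraction of the link kept free (assumption).
    shared_link:  if True, ingress and egress share one budget; if False,
                  treat the link as full duplex so only egress limits capacity.
    """
    usable = link_mbps * (1.0 - headroom)
    per_participant = send_mbps + receive_mbps if shared_link else receive_mbps
    return math.floor(usable / per_participant)


for receive_mbps in (3, 4):
    print(receive_mbps, max_participants(1000, 2, receive_mbps))
# prints "3 150" then "4 125"
```

With 2 Mbps sent, 3-4 Mbps received, a shared 1000 Mbps budget, and 25% headroom, the estimate is 125-150 participants. Treating the link as full duplex with no headroom raises the ceiling to 250-333, but either way a single SFU tops out in the low hundreds.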
Another scenario to consider is that of geographically distributed conferences. With an SFU in one location, all the remote participants have to connect to that SFU over expensive inter-regional links. The SFU replicates the video and forwards a copy to every one of the remote participants, so multiple copies of the same video traverse the inter-regional links. Because bandwidth on these links is expensive, the cost of hosting geographically distributed conferences can become prohibitive. To make things worse, long links typically have high latency. When the round-trip time is long, error correction is difficult, because a retransmitted packet needs at least one more round trip to arrive, and video quality suffers.
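A rough way to see why long round trips hurt retransmission-based error correction: a lost packet can only be repaired if the receiver notices the gap, asks for the packet again, and gets it back before the packet is due for playout. This sketch assumes simple NACK-style retransmission; the function name, the detection time, and the playout budget values are hypothetical.

```python
def retransmission_can_arrive(rtt_ms: float, detect_ms: float,
                              playout_budget_ms: float) -> bool:
    """Rough check for repairing one lost packet by retransmission.

    detect_ms: time for the receiver to notice the gap and send a request.
    The request travels to the sender and the resent packet travels back,
    which costs about one round trip, so repair succeeds only if detection
    plus one RTT fits within the time the receiver can wait before playout.
    """
    return detect_ms + rtt_ms <= playout_budget_ms


print(retransmission_can_arrive(rtt_ms=40, detect_ms=10, playout_budget_ms=150))   # True
print(retransmission_can_arrive(rtt_ms=250, detect_ms=10, playout_budget_ms=150))  # False
```

The same budget that comfortably covers a short regional hop is exceeded by a long inter-regional round trip, which is why repair over the long link tends to arrive too late to help.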
Cascading To The Rescue
To overcome these SFU limitations, Vidyo introduced the concept of cascading. Cascaded SFUs are two or more SFUs that are interconnected so that one conference can span multiple SFUs. Participants can join any one of the SFUs in the cascade and seamlessly interact with all other participants in the conference. Cascading enables the creation of conferences that dynamically grow to virtually any size as participants join. Initially, as conferences are created, they can be uniformly distributed among the available SFUs. Uniform distribution is the best approach, since there is usually no a priori knowledge of how large each conference will grow. As a conference grows and the SFU hosting it nears its capacity, another SFU with spare capacity can be cascaded to it on the fly. The conference then spills over to that SFU, and new participants can join the same conference there. This can continue as long as new participants keep joining, until all of them have been accommodated.
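The placement and spill-over behavior described above can be sketched as a small allocator. This is a minimal illustration under stated assumptions, not Vidyo's algorithm: capacity is counted in participants, "uniform distribution" is approximated by starting each new conference on the least-loaded SFU, and the cost of the cascade link itself is ignored. The `Sfu` and `Cluster` classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # compare SFUs by identity, not by field values
class Sfu:
    name: str
    capacity: int  # max participants this SFU can host (hypothetical unit)
    load: int = 0

    def free(self) -> int:
        return self.capacity - self.load


@dataclass
class Cluster:
    sfus: List[Sfu]
    # conference id -> SFUs hosting part of it, in the order they were cascaded
    placement: Dict[str, List[Sfu]] = field(default_factory=dict)

    def join(self, conference: str) -> Sfu:
        hosts = self.placement.get(conference)
        if hosts is None:
            # New conference: start it on the least-loaded SFU, which spreads
            # conferences roughly evenly when their final size is unknown.
            hosts = [min(self.sfus, key=lambda s: s.load)]
            self.placement[conference] = hosts
        # Prefer an SFU that already hosts this conference and still has room.
        for host in hosts:
            if host.free() > 0:
                host.load += 1
                return host
        # Every host is full: cascade to the SFU with the most spare capacity.
        candidates = [s for s in self.sfus if s not in hosts and s.free() > 0]
        if not candidates:
            raise RuntimeError("no SFU has capacity left for " + conference)
        spill = max(candidates, key=lambda s: s.free())
        hosts.append(spill)
        spill.load += 1
        return spill


cluster = Cluster([Sfu("sfu-a", 2), Sfu("sfu-b", 2), Sfu("sfu-c", 2)])
print([cluster.join("all-hands").name for _ in range(5)])
# ['sfu-a', 'sfu-a', 'sfu-b', 'sfu-b', 'sfu-c']
```

The conference starts on one SFU and only pulls in another when the current hosts are full, so small conferences never pay for cascading while large ones keep growing.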
Cascading also helps support geographically distributed conferences. SFU clusters can be deployed in the regions themselves, and participants can use location-based routing to connect to the SFU cluster closest to them. As participants join a conference on their local cluster, the conference can dynamically grow as described above. When requests are made to join the same conference on two different SFU clusters, the clusters can be cascaded to create one conference that spans both geographies. With cascading, an SFU cluster only has to forward one copy of the video to each remote SFU cluster. Each remote cluster in turn replicates the video and forwards it to its local participants over much cheaper, shorter links. Only one copy of the video has to traverse the expensive inter-regional links, dramatically decreasing the cost of hosting a distributed conference. In addition, error correction is now localized between an SFU cluster and its local participants. This results in little, if any, noticeable deterioration in quality.
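The savings on inter-regional links can be illustrated by counting how many copies of one sender's video cross them. The sketch assumes the sender sits in the same region as the single central SFU (non-cascaded case) or as its own local cluster (cascaded case), and that the sender's cluster forwards directly to every other cluster; the function name and the region figures are hypothetical.

```python
from typing import Dict


def inter_region_copies(participants_by_region: Dict[str, int],
                        sender_region: str, cascaded: bool) -> int:
    """Copies of ONE sender's video that cross inter-regional links."""
    remote = {region: n for region, n in participants_by_region.items()
              if region != sender_region and n > 0}
    if cascaded:
        # One copy per remote cluster; each cluster fans out locally.
        return len(remote)
    # One central SFU: a separate copy for every remote participant.
    return sum(remote.values())


regions = {"us-east": 10, "eu-west": 20, "ap-south": 15}
print(inter_region_copies(regions, "us-east", cascaded=False))  # 35
print(inter_region_copies(regions, "us-east", cascaded=True))   # 2
```

Without cascading, inter-regional traffic grows with the number of remote participants; with cascading, it grows only with the number of remote clusters.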
There are a few important considerations when implementing cascaded SFUs.
Keep Latency Low
In the cascaded scenario, video might traverse several SFUs. To keep a conference interactive, the end-to-end delay must be kept as low as possible, so an SFU shouldn't add significant delay on top of the fixed network delay. The delay through an SFU should stay comfortably below one tenth of the network delay: if the network delay of one hop is on the order of 100 milliseconds, the delay through an SFU must be kept below a few milliseconds for the best results.
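The latency budget can be checked with simple arithmetic over a cascade path. This sketch interprets the "order of magnitude" guideline as "all SFUs on the path together add at most 10% of the path's network delay"; that interpretation, the function names, and the hop delays are assumptions for illustration.

```python
from typing import List


def end_to_end_delay_ms(hop_network_ms: List[float], sfu_delay_ms: float) -> float:
    """Total delay along a cascade path.

    A path through k SFUs has k + 1 network hops
    (sender -> SFU 1 -> ... -> SFU k -> receiver).
    """
    sfu_count = len(hop_network_ms) - 1
    return sum(hop_network_ms) + sfu_count * sfu_delay_ms


def sfu_share_ok(hop_network_ms: List[float], sfu_delay_ms: float,
                 max_share: float = 0.1) -> bool:
    """True if the SFUs together add at most max_share of the network delay."""
    sfu_total = (len(hop_network_ms) - 1) * sfu_delay_ms
    return sfu_total <= max_share * sum(hop_network_ms)


path = [20, 100, 20]  # local hop, inter-regional hop, local hop (illustrative)
print(end_to_end_delay_ms(path, 3), sfu_share_ok(path, 3))    # 146 True
print(end_to_end_delay_ms(path, 30), sfu_share_ok(path, 30))  # 200 False
```

Because every SFU on the path adds its delay, a per-SFU delay that looks harmless on its own can noticeably stretch a path that crosses several SFUs.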
Select Global Active Speaker
Everybody on a call should have the same notion of who the loudest speaker is at any point in time, regardless of which SFU participants are connected to. This requires a distributed implementation of the active speaker selection algorithm that quickly converges on one global active speaker across the entire cascade.
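One simple way to reach a single answer across a cascade is for each SFU to nominate its loudest local participant, share that nomination with the other SFUs, and then have every SFU apply the same deterministic rule to the full set of nominations. The sketch below shows only that agreement step and is not Vidyo's algorithm. The `Candidate` type, the audio-level metric, and the tie-break on participant id are assumptions; a practical implementation would also smooth levels and add hysteresis so the speaker does not flap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    participant: str
    level: float  # smoothed speech level; higher means louder (hypothetical metric)


def local_candidate(levels: Dict[str, float]) -> Optional[Candidate]:
    """Each SFU nominates the loudest of its own participants."""
    if not levels:
        return None
    # Break ties on participant id so the choice is deterministic.
    who = max(levels, key=lambda p: (levels[p], p))
    return Candidate(who, levels[who])


def global_speaker(candidates: List[Optional[Candidate]]) -> Optional[str]:
    """Rule every SFU applies to the nominations it has received.

    The result does not depend on the order of the list, so SFUs that hold
    the same set of nominations pick the same global active speaker.
    """
    present = [c for c in candidates if c is not None]
    if not present:
        return None
    return max(present, key=lambda c: (c.level, c.participant)).participant


sfu_a = {"alice": 0.4, "bob": 0.7}
sfu_b = {"carol": 0.9}
sfu_c: Dict[str, float] = {}
nominations = [local_candidate(levels) for levels in (sfu_a, sfu_b, sfu_c)]
print(global_speaker(nominations))  # carol
```

Only one nomination per SFU needs to cross the cascade links, which keeps the exchange small regardless of how many participants each SFU hosts.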
Manage Cascade Resources
Finally, as conferences are distributed across the SFUs, capacity has to be reserved in case cascading becomes necessary later. Cascading a conference requires roughly the capacity of one additional participant, so the reserve is proportional to the number of conferences hosted on the SFU. If a typical SFU supports tens of conferences, resources should be reserved for tens more participants.
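The reservation rule can be expressed as an admission check. This is a minimal sketch assuming capacity is measured in participant-equivalents and that each hosted conference reserves exactly one participant's worth of capacity for a possible cascade link; the `can_admit` function and the numbers are hypothetical.

```python
def can_admit(capacity: int, participants: int, conferences: int,
              new_conference: bool = False) -> bool:
    """Admit one more participant only if the cascade reserve stays intact.

    Reserve one participant-equivalent per hosted conference, since cascading
    a conference to another SFU costs about as much as one extra participant.
    """
    if new_conference:
        conferences += 1  # the newcomer starts a conference that needs its own reserve
    reserve = conferences
    return participants + 1 + reserve <= capacity


print(can_admit(capacity=100, participants=59, conferences=40))  # True
print(can_admit(capacity=100, participants=60, conferences=40))  # False
print(can_admit(capacity=100, participants=59, conferences=40, new_conference=True))  # False
```

An SFU that admitted participants up to its raw capacity would have nothing left for cascade links, so a conference hosted there could not spill over when it grows.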
Vidyo’s SFUs Perform Cascading
Cascaded SFUs have been supported by Vidyo’s products for several years, and many Vidyo customers have deployed cascaded SFU networks. Vidyo itself deploys networks of cascaded SFUs in its service offerings, including its vidyo.io developer Communications PaaS and its UCaaS. These networks can be used for calls in which hundreds of participants seamlessly interact with each other.
To summarize, cascading increases the power of SFUs by enabling the creation of conferences that dynamically grow as needed to virtually any number of participants. It greatly reduces the cost associated with hosting geographically distributed conferences and improves the quality of the conferences by localizing error correction.