Multi-device calls with ICE forking

peter-signal on 20 Oct 2020

An illustration of desktop calling.

Now that voice and video calls are available for everyone on Signal Desktop and iPad, in addition to our mobile clients, we want to share a closer look at the technology that makes it possible for incoming calls to ring across multiple devices. Every call placed on Signal has always been private and end-to-end encrypted, but we previously supported only calls to or from one primary device per person. Now, you can make and receive secure audio and video calls from the devices in your pocket, on your lap, or at your desk.

The antique technique

A low-latency connection is essential to a good conversation: it lets you and your contacts speak and see each other without excessive delays. As with everything else in Signal, this connection also needs to be private and secure. The industry standard for establishing encrypted, low-latency, peer-to-peer connections between devices involves a combination of two protocols: Interactive Connectivity Establishment (ICE) and Secure Real-time Transport Protocol (SRTP).

ICE establishes a connection between two devices on the Internet by simultaneously attempting to connect using a variety of possible routes. Some of these routes will work, and some of them won’t, but ICE always tries to select the best possible option. ICE can also route connections through a relay server (which adds a small amount of latency) when a direct connection isn’t possible, when you change your calling preferences, or when the person calling you isn’t in your Signal contacts.

SRTP encrypts and authenticates the audio and video of the call. The encryption keys that are used during this process are never shared with the service, and only the two devices that are calling each other have access to them. Even when ICE needs to route the connection through a relay server, the relay server does not know the encryption keys and cannot decrypt any audio or video content.

In order for ICE and SRTP to function, they need a way to communicate certain parameters. This exchange of parameters is often called “signaling” (no pun intended):

  1. The caller’s device sends a message (which is called an “offer”) to the recipient’s device that contains some of the parameters that are needed by ICE and SRTP.
  2. The recipient’s device replies with a message (which is called an “answer”) that includes the rest of the necessary parameters for ICE and SRTP.
  3. The two devices use the ICE parameters that they exchanged to establish a connection with each other (possibly going through a relay server).
  4. If and when the recipient answers the call, the recipient’s device sends a message to the caller’s device indicating that the call has been accepted. At this point, both devices can begin sending and receiving end-to-end encrypted audio and video using the SRTP parameters that they exchanged.
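
The four steps above can be sketched as a small state machine. This is a minimal illustration in Python, not Signal's implementation: the class names, the message fields, and the placeholder parameter strings are all hypothetical, and ICE connection setup (step 3) is modeled as completing the moment the answer arrives.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    OFFER_SENT = auto()
    CONNECTED = auto()  # ICE connection is up, recipient has not picked up yet
    ACCEPTED = auto()   # recipient picked up, encrypted media may flow


@dataclass(frozen=True)
class SignalingMessage:
    kind: str              # "offer", "answer", or "accepted"
    ice_params: str = ""   # opaque placeholder for ICE parameters
    srtp_params: str = ""  # opaque placeholder for SRTP parameters


class Caller:
    def __init__(self):
        self.state = CallState.IDLE

    def create_offer(self):
        # Step 1: send the caller's share of the ICE and SRTP parameters.
        self.state = CallState.OFFER_SENT
        return SignalingMessage("offer", "caller-ice", "caller-srtp")

    def handle(self, message):
        if message.kind == "answer" and self.state is CallState.OFFER_SENT:
            # Steps 2 and 3: with both halves of the parameters in hand,
            # ICE can connect (modeled here as instantaneous).
            self.state = CallState.CONNECTED
        elif message.kind == "accepted" and self.state is CallState.CONNECTED:
            # Step 4: the recipient picked up.
            self.state = CallState.ACCEPTED
        else:
            raise ValueError(
                f"unexpected {message.kind!r} in state {self.state.name}"
            )


class Recipient:
    def handle_offer(self, offer):
        # Step 2: reply with the rest of the parameters.
        return SignalingMessage("answer", "recipient-ice", "recipient-srtp")

    def accept(self):
        # Step 4: tell the caller that the call was answered.
        return SignalingMessage("accepted")


def place_call():
    """Run the four steps and return the caller's state after each one."""
    caller, recipient = Caller(), Recipient()
    states = [caller.state]
    offer = caller.create_offer()
    states.append(caller.state)
    caller.handle(recipient.handle_offer(offer))
    states.append(caller.state)
    caller.handle(recipient.accept())
    states.append(caller.state)
    return [state.name for state in states]
```

Calling `place_call()` walks the caller from `IDLE` through `OFFER_SENT` and `CONNECTED` to `ACCEPTED`. An "accepted" message that arrives before an answer raises a `ValueError`, which mirrors the ordering of the steps above.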

All of this is designed to work with two devices, but a different approach is necessary when multiple linked devices are involved.

Answer your phone (or iPad, or laptop)

If you’re sitting on the back porch enjoying a socially distant sunrise (depending on your sunscreen, 149.6 million kilometers is probably far enough away from the sun’s corona), the iPad or laptop in front of you should still ring even when your phone is inside. In order to do this, Signal developed a technique called ICE forking.

ICE forking looks a bit like, well, a fork. On one side of the call we have the handle of the fork (the caller’s device) and on the other side there are multiple pointy tines (each one representing a device that needs to ring). The caller’s device sends an offer message to all of the recipient’s devices, and those devices independently send back an answer message.

                +----------+
                |          |
              <-> Device 1 |
              | |          |
              | +----------+
              |
  +--------+  | +----------+
  |        |  | |          |
  | Caller <--+-> Device 2 |
  |        |  | |          |
  +--------+  | +----------+
              |
              | +----------+
              | |          |
              <-> Device 3 |
                |          |
                +----------+

If speed didn’t matter, we could do this the easy way and simply send one offer from the caller’s device to all the recipient’s devices, wait until the recipient accepts the call on one device, send back one answer, and perform the old-school call setup process between those two devices. However, this approach would introduce long delays during the ICE negotiation, which would mean adding extra time between when the call is accepted and when you could start talking. Nobody likes to wait when they’ve got something to say.

If we want to make it fast, we need to handle the ICE negotiation between all of the devices before the recipient accepts the call. That way no matter which device ultimately picks up, Signal will already be prepared to send and receive end-to-end encrypted audio and video. One call will turn into multiple rings, and you can just start talking when your friend says hello.

Stick a fork in it

ICE forking is not straightforward. It requires state to be shared across all possible ICE connections, including parameters and what are called ICE candidates. Sharing ICE parameters is easy, but exchanging ICE candidates quickly gets complicated because it requires sharing many UDP ports. In order to make everything work properly, Signal created an abstraction layer called an “ICE gatherer” that represents all of the ICE candidates (and UDP ports) that have been “gathered” for a particular set of ICE parameters.
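
One way to picture the gatherer is as a single object that owns the ICE parameters and the gathered candidates, with every connection holding a reference to it rather than a copy. The Python sketch below is only an illustration of that sharing idea: `Candidate`, `IceGatherer`, and `IceConnection` are hypothetical names rather than the classes in the WebRTC patch, and the addresses come from the ranges reserved for documentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    ip: str
    udp_port: int


@dataclass
class IceGatherer:
    """Hypothetical holder of one set of ICE parameters and its candidates."""
    ufrag: str
    pwd: str
    candidates: list = field(default_factory=list)

    def gather(self, ip, udp_port):
        # Each gathered candidate corresponds to a UDP port opened once.
        candidate = Candidate(ip, udp_port)
        self.candidates.append(candidate)
        return candidate


@dataclass
class IceConnection:
    """Hypothetical connection to one remote device."""
    remote_device: str
    gatherer: IceGatherer  # shared by reference, never copied

    def ice_parameters(self):
        return (self.gatherer.ufrag, self.gatherer.pwd)

    def local_candidates(self):
        # Read through to the shared gatherer so that late candidates
        # are visible to every connection.
        return list(self.gatherer.candidates)


gatherer = IceGatherer(ufrag="example-ufrag", pwd="example-pwd")
gatherer.gather("192.0.2.10", 50000)

phone = IceConnection("phone", gatherer)
laptop = IceConnection("laptop", gatherer)

# A candidate gathered after the connections exist is still seen by both.
gatherer.gather("198.51.100.7", 50001)
```

Because `phone` and `laptop` read through the same `gatherer`, they report identical parameters and candidates without opening a second set of UDP ports.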

We spent a lot of time working through all of the permutations and then submitted an upstream patch to the open source WebRTC project that adds complete support for ICE forking so that other apps can use this technique as well. With this patch in place, it’s possible to take advantage of ICE forking just like Signal does using WebRTC’s “PeerConnection API”:

  1. The caller’s device creates a “parent” PeerConnection. Its purpose is to act as the “ICE gatherer” and create the offer message.
  2. The caller’s device creates an offer message from the parent PeerConnection and sends it to all of the recipient’s devices. Because this offer is created using a shared IceGatherer, multiple “child” PeerConnections will be able to use it later.
  3. The caller’s device waits for answers from the recipient’s devices. For each answer, it creates a new “child” PeerConnection using the offer that was sent and the answer that was received. Because all of the “child” PeerConnections share the same IceGatherer, they will use the same ICE parameters and candidates – which is exactly what we want!
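
The parent and child relationship in these steps can be modeled in a few lines. The sketch below is a plain Python simulation and does not call the real WebRTC PeerConnection API: `ParentConnection`, `ChildConnection`, `SharedGatherer`, `simulate_device`, and `fork_call` are hypothetical names, and the recipient devices are simulated by a function that answers immediately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedGatherer:
    """Hypothetical stand-in for the shared ICE gatherer."""
    ice_ufrag: str
    candidates: tuple  # (ip, udp_port) pairs gathered once by the caller


@dataclass(frozen=True)
class Offer:
    ice_ufrag: str


@dataclass(frozen=True)
class Answer:
    device_id: str
    ice_ufrag: str


@dataclass
class ChildConnection:
    device_id: str
    offer: Offer
    answer: Answer
    gatherer: SharedGatherer


class ParentConnection:
    def __init__(self, gatherer):
        self.gatherer = gatherer
        self.offer = None
        self.children = {}

    def create_offer(self):
        # Steps 1 and 2: one offer, built from the shared gatherer,
        # goes out to every recipient device.
        self.offer = Offer(self.gatherer.ice_ufrag)
        return self.offer

    def on_answer(self, answer):
        # Step 3: each answer yields a child that reuses the same offer
        # and the same gatherer.
        if self.offer is None:
            raise RuntimeError("create_offer() must run before answers arrive")
        child = ChildConnection(
            answer.device_id, self.offer, answer, self.gatherer
        )
        self.children[answer.device_id] = child
        return child


def simulate_device(device_id, offer):
    # Stand-in for a recipient device replying with its own parameters.
    return Answer(device_id, ice_ufrag=f"{device_id}-ufrag")


def fork_call(device_ids):
    gatherer = SharedGatherer("caller-ufrag", (("192.0.2.10", 50000),))
    parent = ParentConnection(gatherer)
    offer = parent.create_offer()
    for device_id in device_ids:
        parent.on_answer(simulate_device(device_id, offer))
    return parent
```

For example, `fork_call(["phone", "ipad", "laptop"])` returns a parent with three children, each holding the same offer and the same gatherer object but a different answer. In this model, that is why a child is already set up for whichever device ends up accepting the call.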

Doing other cool things with ICE

Putting it all together, we end up with a convenient, cross-platform, fast, private, and secure way to answer a call from any device that rings – but we also realized that ICE forking provides us with a powerful framework for making further advancements to Signal’s voice and video calling while maintaining backwards compatibility with legacy clients.

We’ll explore that concept in another post. In the meantime, you can download Signal Desktop, take your phone off vibrate, wait for an incoming call, and experience the thrill of multiple ringtones today.