Relay Support in Nebula 1.6.0

By Brad Higgins on July 18, 2022


Mission - Connectivity

Defined Networking’s mission is to make fast, reliable, and secure connectivity within reach for every organization. That connectivity, with the speed, security, and reliability it demands, is provided by Nebula, the open-source project built by Defined Networking’s co-founders.

To keep connectivity fast, Nebula establishes direct peer-to-peer connections instead of using the hub-and-spoke model of classic VPN solutions. Direct connections between peers remove a connection aggregation point (a potential bottleneck) and minimize the network distance between peers. Nothing’s worse than a laptop in California connecting through a VPN in New York to access a website hosted in California!

Unfortunately, establishing direct connections between all peers in a Nebula network can be very tricky in some situations, and downright impossible in others. When debugging Nebula connectivity problems, people often discover CGNAT address space at their ISP and assume it is blocking the traffic. In my experience, though, CGNAT doesn’t actually cause Nebula connectivity problems; the root cause of most of them is symmetric NAT. But we’ll leave that as a story for another day.

If even a small percentage of hosts on a Nebula network can’t connect to each other, then Nebula isn’t meeting its primary goal of connectivity. A slew of Nebula GitHub issues showed that users were running headlong into these scenarios.

Enter Relays

To provide 100% connectivity between all peers in every Nebula network, Nebula 1.6.0 introduces support for relaying traffic between two peers through a third Nebula peer. As long as every peer can reach the relay, every pair of peers can reach each other through it.

To ensure 100% connectivity to the relay, the relay should be deployed like a Lighthouse, with a public internet IP and with appropriate firewall rules in place.

Once both peers have connected to this third peer, they can send Nebula traffic to each other through it.

Relay Configuration

First, each Nebula peer is configured either to act as a relay or to be reachable through one. Hosts that are relays can’t be accessed through another relay. Instead, they should be deployed like a Lighthouse, with a public internet IP and appropriate firewall rules to ensure 100% connectivity. This design rules out relay loops and all the complications that come along with them (like a Nebula packet bouncing back and forth between relays, never reaching its destination).
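The no-loops rule can be expressed as a simple configuration check. The Go sketch below is hypothetical. `RelayConfig` and `validate` are illustrative names that only mirror the `relay.am_relay` and `relay.relays` settings described in this section. Nebula itself may enforce the rule differently, for example by refusing relayed traffic rather than rejecting a config.

```go
package main

import (
	"errors"
	"fmt"
)

// RelayConfig is a hypothetical, simplified mirror of the relay: stanza.
// Field names echo the YAML keys; this is not Nebula's actual type.
type RelayConfig struct {
	AmRelay bool     // relay.am_relay: this host relays for others
	Relays  []string // relay.relays: Nebula IPs that may relay traffic to this host
}

// validate rejects a config in which a relay host is also reachable through
// another relay, which is what would allow relay chains and loops to form.
func validate(c RelayConfig) error {
	if c.AmRelay && len(c.Relays) > 0 {
		return errors.New("relay hosts must be reached directly; remove relay.relays")
	}
	return nil
}

func main() {
	lighthouse := RelayConfig{AmRelay: true}
	laptop := RelayConfig{Relays: []string{"192.168.100.1"}}
	chained := RelayConfig{AmRelay: true, Relays: []string{"192.168.100.1"}}

	fmt.Println(validate(lighthouse) == nil) // true
	fmt.Println(validate(laptop) == nil)     // true
	fmt.Println(validate(chained) == nil)    // false
}
```

Rejecting the combination up front means the set of relays and the set of relayed hosts never overlap, so no packet can be handed from one relay to another.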

To configure a host to act as a relay, use this stanza in its Nebula configuration file (personal Nebula users should include this in their Lighthouse config, making the Lighthouse into their primary relay):

relay:
  am_relay: true

Next, Nebula peers identify which relay should be used to access them, with this configuration stanza (personal Nebula users should include this stanza in every other peer on their Nebula network):

relay:
  relays:
    - 192.168.100.1

Assuming the Lighthouse’s Nebula IP is 192.168.100.1, remote Nebula peers that want to connect to this peer will attempt a direct connection using all the existing NAT traversal tricks, and will also ask 192.168.100.1 to relay packets to this peer. If a remote peer can’t establish a direct connection, it’ll use the relay. (Nebula still prefers direct connections, and will even attempt to establish a direct connection again after seeing enough traffic through the relay.)
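The connection behavior described above can be modeled as a small state machine. It prefers the direct path, falls back to the relay, and periodically retries a direct connection while traffic flows through the relay. This is a simplified Go sketch, not Nebula's handshake code. The `tunnel` type, `choosePath`, and especially the `retryDirectAfter` threshold of 1000 packets are hypothetical, because the post only says Nebula retries "after seeing enough traffic through the relay."

```go
package main

import "fmt"

// Path is how packets currently reach a remote peer.
type Path int

const (
	Direct Path = iota
	Relayed
	Unreachable
)

func (p Path) String() string {
	return [...]string{"direct", "relayed", "unreachable"}[p]
}

// retryDirectAfter is a hypothetical threshold, not a Nebula setting.
const retryDirectAfter = 1000

// tunnel tracks the path state for one remote peer (hypothetical type).
type tunnel struct {
	path           Path
	relayedPackets int
}

// choosePath takes the outcome of the direct and relayed attempts, which run
// side by side, and prefers the direct path whenever it succeeded.
func choosePath(directOK, relayOK bool) Path {
	switch {
	case directOK:
		return Direct
	case relayOK:
		return Relayed
	default:
		return Unreachable
	}
}

// onRelayedPacket counts relayed traffic and reports whether it is time to
// attempt a direct connection again; the counter resets after each retry.
func (t *tunnel) onRelayedPacket() bool {
	if t.path != Relayed {
		return false
	}
	t.relayedPackets++
	if t.relayedPackets >= retryDirectAfter {
		t.relayedPackets = 0
		return true
	}
	return false
}

func main() {
	t := &tunnel{path: choosePath(false, true)}
	fmt.Println(t.path) // relayed

	retries := 0
	for i := 0; i < 2500; i++ {
		if t.onRelayedPacket() {
			retries++
		}
	}
	fmt.Println(retries) // 2
}
```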

Fast, Secure, Reliable

So, with relays, we seem to have recreated the classic VPN architecture we were trying to avoid: a hub-and-spoke design with many peers communicating through a single aggregation point. How do we meet Nebula’s other goals on top of connectivity, keeping it fast, secure, and reliable?

fast

The way to keep relays fast is to deploy each relay close (in network terms) to the hosts it relays for. You can run as many relays in your network as you want, and place them as close to the target peers as you want. For example, if you have a Nebula deployment in a private VPC in AWS us-west-2, you can set up a relay in us-west-2 with a public IP and configure all Nebula nodes in that region to list this nearby relay. A separate deployment on the same Nebula network, in an Azure region in Japan, can likewise run its own Nebula relay there, with each Japan-based host configured to be reached through that Japan-based relay.

secure

For security, several design decisions were made. First, Nebula tunnels are still established (encrypted and authenticated) end-to-end. An intermediary relay peer passes packets along, but because Nebula packets are encrypted and authenticated between the two connecting peers, the relay can’t decrypt the traffic or modify it without detection. Second, relaying only occurs over direct Nebula connections: on the hop between a peer and its relay, each packet is authenticated from peer to relay. Because the packet is already encrypted end-to-end, it is not double-encrypted for the relay hop. Finally, only hosts in the same Nebula network may act as relays for that network.
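The end-to-end property can be demonstrated with any AEAD cipher. If a relay forwards ciphertext untouched, the receiver can decrypt it, but flipping even one bit makes decryption fail. Nebula's tunnels are built on the Noise protocol framework with AES-GCM or ChaCha20-Poly1305. This Go sketch instead uses AES-256-GCM from the standard library directly, with a random key standing in for the key the two peers would derive during their handshake. `seal`, `open`, and `relayForward` are hypothetical helpers, not Nebula functions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newAEAD builds an AES-GCM cipher from a 16-, 24-, or 32-byte key.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts and authenticates msg end-to-end (peer A's side).
func seal(key, nonce, msg []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	return aead.Seal(nil, nonce, msg, nil), nil
}

// open decrypts ct (peer B's side) and fails if any byte was altered.
func open(key, nonce, ct []byte) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	return aead.Open(nil, nonce, ct, nil)
}

// relayForward models the relay: it never holds the end-to-end key and can
// only pass the ciphertext along, optionally flipping one bit to tamper.
func relayForward(ct []byte, tamper bool) []byte {
	out := append([]byte(nil), ct...)
	if tamper && len(out) > 0 {
		out[0] ^= 0x01
	}
	return out
}

func main() {
	key := make([]byte, 32)   // stand-in for the handshake-derived key
	nonce := make([]byte, 12) // standard GCM nonce size
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}

	ct, err := seal(key, nonce, []byte("hello from peer A"))
	if err != nil {
		panic(err)
	}

	pt, err := open(key, nonce, relayForward(ct, false))
	fmt.Println(string(pt), err) // hello from peer A <nil>

	_, err = open(key, nonce, relayForward(ct, true))
	fmt.Println(err != nil) // true
}
```

A real tunnel must never reuse a nonce under the same key. The Noise framework handles this by deriving each packet's nonce from a counter, which this single-message sketch does not need to model.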

reliable

For reliability, you can specify multiple relays in the relay.relays config. Should any single relay fail, others are ready to carry the load.
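Failover across the `relay.relays` list can be sketched as picking the first relay that is currently reachable. This is a hypothetical illustration. `pickRelay` and the `reachable` health check are not Nebula APIs, the second relay IP `192.168.100.2` is made up, and Nebula may contact several relays at once rather than trying them in order.

```go
package main

import (
	"errors"
	"fmt"
)

// pickRelay returns the first configured relay that reachable reports as up.
// reachable is a hypothetical health check supplied by the caller.
func pickRelay(relays []string, reachable func(string) bool) (string, error) {
	for _, r := range relays {
		if reachable(r) {
			return r, nil
		}
	}
	return "", errors.New("no configured relay is reachable")
}

func main() {
	relays := []string{"192.168.100.1", "192.168.100.2"}
	down := map[string]bool{"192.168.100.1": true} // simulate a failed relay

	r, err := pickRelay(relays, func(ip string) bool { return !down[ip] })
	fmt.Println(r, err) // 192.168.100.2 <nil>
}
```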

If you haven’t yet given Nebula a spin, head over to the Nebula page for more info on getting started.
