Securing the keys to your kingdom

By John Maguire on March 14, 2024

At Defined Networking, we spend a lot of time thinking about the security of our customers’ networks. While some of these measures are visible to users (such as requiring 2FA on admin accounts or supporting OIDC federated logins for customers of any tier), a lot also goes on behind the scenes.

In this post I will outline the importance of securing the Certificate Authority (CA) private keys associated with your network as well as cover how Defined Networking solves this problem for you.

Understanding Nebula Authentication

Before diving into how we secure your network, we need to understand how authentication works in Nebula. If you’re already familiar with public / private keys and certificate-based authentication, feel free to skip this section.

Public / Private Key Authentication

We often see Nebula compared to WireGuard, because both tools provide secure network tunnels by mutually authenticating and encrypting traffic between hosts, and both are built on the Noise protocol. However, the two differ in their goals and scope: WireGuard focuses on creating point-to-point tunnels between nodes, while Nebula aims to create a mesh overlay network composed of every host in your organization. Because of this difference, each approaches authentication differently.

When using WireGuard, each device creates a public / private keypair that identifies the host. The private key is a secret kept on the device, used by the host during tunnel authentication to establish its identity. The public key is distributed and installed into the config file of every other host it wishes to communicate with. For example:

[Peer]
PublicKey = [Peer#1PublicKey]
AllowedIPs = 10.0.0.3/32

[Peer]
PublicKey = [Peer#2PublicKey]
AllowedIPs = 10.0.0.10/32

... and so on ...

Since every host on the network needs the public key of every other host it communicates with, you must update the config of all your existing hosts before they can communicate with a newly added host. This can quickly become a lot to manage!
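To put rough numbers on this, here is a small Go sketch (Go being the language Nebula itself is written in) that counts [Peer] entries, assuming a full mesh in which every host needs to reach every other host. The function names are hypothetical and only illustrate the arithmetic.

```go
package main

// wireguardPeerEntries returns the total number of [Peer] entries across a
// full mesh of n hosts, where each host lists every other host's public key.
func wireguardPeerEntries(n int) int {
	if n < 2 {
		return 0
	}
	return n * (n - 1)
}

// configsTouchedOnJoin returns how many existing hosts must have their config
// edited when one new host joins a full mesh of n hosts.
func configsTouchedOnJoin(n int) int {
	return n
}

// newEntriesOnJoin returns how many [Peer] entries are added in total when a
// host joins: one on each of the n existing hosts, plus n on the new host.
func newEntriesOnJoin(n int) int {
	return wireguardPeerEntries(n+1) - wireguardPeerEntries(n)
}
```

With 100 hosts that is 9,900 [Peer] entries, and adding a 101st host means editing all 100 existing configs and writing 200 new entries (100 on the existing hosts and 100 on the new one).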

Certificate-based Authentication

In order to reduce the cost of adding a host to the network, Nebula eschews the above approach in favor of PKI (public-key infrastructure) or certificate-based authentication.

To add a new host to the network, you generate a private / public keypair just like before – but instead of copying the public key to every other host in the network, you simply have the CA “sign” the public key, producing a certificate. When this certificate is presented to other hosts, they will cryptographically verify that it’s signed by a trusted CA.

For this reason, rather than include every host’s public key, we only need to include a list of trusted CAs in our configuration file. For example, here’s a config file with a single trusted CA:

pki:
  ca: |
    -----BEGIN NEBULA CERTIFICATE-----
    CmAKLkRlZmF1bHQgQ0EgZm9yIGpvaG5AZGVmaW5lZC5uZXQncyBPcmdhbml6YXRp
    b24og5DWkgYw/4KCuAY6IC0vUjEHoPHzVN6jIDySQtNUav5S7vaFdd8iXwH3gNl6
    QAESQDK+4x8Xs0MGgFKRzRoq2/aAkrh2O68lvI53hpGrgfDdwQYu9hjwZGieeKV8
    JsRcAjl2+zYH002oZfSBEzGmWAE=
    -----END NEBULA CERTIFICATE-----
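The sign-then-verify flow can be sketched with Ed25519 from Go's standard library. This is a simplified illustration, not Nebula's implementation: the HostDetails struct and its JSON encoding are hypothetical stand-ins for Nebula's real certificate format, and the host key is treated as opaque bytes.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"errors"
)

// HostDetails is a hypothetical, simplified stand-in for the fields a Nebula
// certificate carries (name, groups, host public key). The real format differs.
type HostDetails struct {
	Name      string   `json:"name"`
	Groups    []string `json:"groups"`
	PublicKey []byte   `json:"public_key"`
}

// HostCert pairs the signed details with the CA's signature over them.
type HostCert struct {
	Details   HostDetails
	Signature []byte
}

// GenerateKeyPair creates an Ed25519 key pair (used here for both CA and host).
func GenerateKeyPair() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// Sign runs wherever the CA private key lives. encoding/json emits struct
// fields in declaration order, so the signed bytes are deterministic.
func Sign(caPriv ed25519.PrivateKey, d HostDetails) (HostCert, error) {
	msg, err := json.Marshal(d)
	if err != nil {
		return HostCert{}, err
	}
	return HostCert{Details: d, Signature: ed25519.Sign(caPriv, msg)}, nil
}

// Verify accepts the certificate only if one of the trusted CA public keys
// (the counterpart of the pki.ca list in the config) validates its signature.
func Verify(c HostCert, trustedCAs ...ed25519.PublicKey) error {
	msg, err := json.Marshal(c.Details)
	if err != nil {
		return err
	}
	for _, ca := range trustedCAs {
		// ed25519.Verify panics on malformed keys, so skip them.
		if len(ca) != ed25519.PublicKeySize {
			continue
		}
		if ed25519.Verify(ca, msg, c.Signature) {
			return nil
		}
	}
	return errors.New("certificate not signed by a trusted CA")
}
```

Signing happens only where the CA private key lives; verification needs just the CA's public key, which is why the config lists CAs rather than individual hosts. Changing any signed field, such as adding a group, invalidates the signature.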

Using a certificate has other advantages as well. Most notably, Nebula certificates encode a host’s identity via a name and security group memberships, and the CA signs these fields along with the public key. Nebula’s firewall can then use them to limit access. In other words, using Nebula is like having AWS Security Groups that work on any device, not just AWS hosts.
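As a rough illustration of how signed group memberships can drive access decisions, here is a minimal Go sketch of a default-deny inbound check. The InboundRule type and Allowed function are hypothetical. In this sketch a rule matches only if the peer's certificate lists every group the rule names, and the groups are assumed to come from a certificate already verified against a trusted CA.

```go
package main

// InboundRule is a hypothetical, simplified firewall rule: traffic to Port is
// allowed only if the peer's certificate lists every group in Groups. A rule
// with no groups therefore matches any peer on that port.
type InboundRule struct {
	Port   int
	Groups []string
}

// Allowed reports whether a peer whose verified certificate lists peerGroups
// may connect to port. Anything not explicitly allowed is denied.
func Allowed(rules []InboundRule, port int, peerGroups []string) bool {
	have := make(map[string]bool, len(peerGroups))
	for _, g := range peerGroups {
		have[g] = true
	}
	for _, r := range rules {
		if r.Port != port {
			continue
		}
		matches := true
		for _, g := range r.Groups {
			if !have[g] {
				matches = false
				break
			}
		}
		if matches {
			return true
		}
	}
	return false
}
```

Because the groups are covered by the CA's signature, a host cannot grant itself access by claiming extra groups; it would need a new certificate from the CA.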

Because the CA has the ability to establish trust relationships between hosts by signing host certificates, it’s of utmost importance that you protect the CA private key from any malicious actors.

Securing Certificate Authorities

Now that we understand why it’s so important to keep CAs secure, we can discuss the security properties we’d like to achieve.

Generally speaking, we need a system that restricts the ability to sign certificates to only the Defined Networking servers involved in creating certificate authorities and host certificates. Actors external to Defined Networking shouldn’t be able to communicate directly with the signing service, and even internal actors should find it difficult to access directly. Additionally, if an attacker did manage to gain access to a public-facing server, we want to ensure that they do not gain persistent access to the CA keys, and that any actions the signing service performs while they have access are logged for later review. We can distill this into the following goals:

1. CA private keys should always be encrypted at rest (i.e. when they are not actively being used to sign a certificate).
2. CA private keys should be encrypted using hardware-backed keys, so that the encryption keys themselves cannot be exfiltrated.
3. As a defense-in-depth measure, CA private key decryption requests should only be honored if they include contextual information identifying the account the CA belongs to. Requests lacking this information should be rejected.
4. To reduce the attack surface, as well as the value of gaining access to a public-facing server, CA private keys should only ever be decrypted on isolated signing service machines.
5. For auditing purposes, any time a CA private key is decrypted, a log should be generated that can be correlated to a specific user request.

We are able to achieve these properties by storing encrypted CA private keys which can only be accessed through the use of an isolated secure signing service, backed by AWS Key Management Service (KMS).

KMS is a service provided by Amazon that uses hardware security modules (HSMs) to create hardware-backed encryption keys. Clients can ask KMS to encrypt or decrypt data, but cannot access the encryption keys themselves, meaning the keys cannot be exfiltrated even if our AWS account were compromised. This achieves our second goal.

KMS also helpfully lets you specify an “encryption context” in encryption and decryption calls. Data encrypted with a given encryption context can only be decrypted when KMS is provided with the same context. When generating a new CA, we encrypt its private key using a context that includes the organization’s ID. When decrypting the CA key to sign host certificates, we include the organization ID of the authenticated user, achieving our third goal above.
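KMS uses the encryption context as additional authenticated data: the context isn't secret, but the ciphertext is cryptographically bound to it. The Go sketch below reproduces that property locally with AES-GCM so the behavior is easy to see. It is not the KMS API and offers no hardware protection; the KeyService type and the org_id context key are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"io"
)

// KeyService is a hypothetical local stand-in for a KMS key. Callers can only
// ask it to encrypt or decrypt; they never handle the key bytes. (Unlike KMS,
// the key lives in process memory here, with no HSM behind it.)
type KeyService struct {
	aead cipher.AEAD
}

// NewKeyService creates a random 256-bit AES-GCM key.
func NewKeyService() (*KeyService, error) {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &KeyService{aead: aead}, nil
}

// contextAAD serializes the encryption context. encoding/json sorts map keys,
// so the same context always produces the same bytes.
func contextAAD(ctx map[string]string) ([]byte, error) {
	return json.Marshal(ctx)
}

// Encrypt binds the ciphertext to ctx as additional authenticated data and
// returns nonce || ciphertext.
func (s *KeyService) Encrypt(plaintext []byte, ctx map[string]string) ([]byte, error) {
	aad, err := contextAAD(ctx)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, s.aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return s.aead.Seal(nonce, nonce, plaintext, aad), nil
}

// Decrypt succeeds only if ctx matches the context used at encryption time.
func (s *KeyService) Decrypt(ciphertext []byte, ctx map[string]string) ([]byte, error) {
	aad, err := contextAAD(ctx)
	if err != nil {
		return nil, err
	}
	n := s.aead.NonceSize()
	if len(ciphertext) < n {
		return nil, errors.New("ciphertext too short")
	}
	return s.aead.Open(nil, ciphertext[:n], ciphertext[n:], aad)
}
```

Decrypting with a different organization's ID, or with no context at all, fails authentication instead of returning the key.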

While we could communicate with KMS directly from our public-facing servers, this would violate our goal of keeping decrypted key material off of Internet-accessible machines. To solve this issue, we must introduce a signing service between the API and KMS. The signing service is built to have minimal API surface area, and can only be accessed over private networking from servers involved in creating CAs or signing host certificates. We do this by placing the signing service in its own AWS account and leveraging AWS Security Groups to restrict all inbound access, except on the Nebula port. Then, we use Nebula firewall rules to ensure that only servers involved in creating host certificates have access to these machines, achieving our fourth goal.
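The idea of a deliberately small API surface can be sketched with Go's net/http: a single route that parses and validates the request, rejecting anything without an organization ID before any key material is touched. SignRequest, its JSON field names, and the signFn hook (which would call KMS with the organization ID as encryption context) are all hypothetical. Network-level restrictions such as Security Groups and Nebula firewall rules sit in front of this handler and aren't shown.

```go
package main

import (
	"encoding/json"
	"errors"
	"io"
	"net/http"
)

// SignRequest is a hypothetical request body; field names are illustrative.
type SignRequest struct {
	OrgID     string `json:"org_id"`
	PublicKey []byte `json:"public_key"` // base64 in JSON
}

// ParseSignRequest rejects requests lacking the organization ID needed as
// encryption context, before any key material is touched.
func ParseSignRequest(body []byte) (SignRequest, error) {
	var req SignRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return SignRequest{}, err
	}
	if req.OrgID == "" {
		return SignRequest{}, errors.New("missing org_id")
	}
	if len(req.PublicKey) == 0 {
		return SignRequest{}, errors.New("missing public_key")
	}
	return req, nil
}

// NewSigningMux exposes a single route; the mux answers 404 for anything else.
// signFn is a hypothetical hook that would decrypt the CA key via KMS (with
// OrgID as encryption context) and return a signed certificate.
func NewSigningMux(signFn func(SignRequest) ([]byte, error)) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/sign", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 64<<10))
		if err != nil {
			http.Error(w, "unreadable body", http.StatusBadRequest)
			return
		}
		req, err := ParseSignRequest(body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		cert, err := signFn(req)
		if err != nil {
			http.Error(w, "signing failed", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/octet-stream")
		_, _ = w.Write(cert)
	})
	return mux
}
```

Keeping validation in a pure function like ParseSignRequest also makes the rejection rules easy to unit test independently of the network setup.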

A diagram showing the API in a public VPC, the isolated signing service in a private VPC, and AWS KMS

Lastly, we need to be able to audit any access to the CA key material. We achieve this by generating and logging a request trace ID at our ingress load balancer. The ID is forwarded to a public-facing API server, which also logs it before passing it on to the signing service along with the certificate signing request. The signing service logs both the request trace ID and an identifier corresponding to the KMS call.
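Trace ID propagation can be sketched as a small piece of Go net/http middleware. The header name and helper functions below are hypothetical. (If the ingress is an AWS Application Load Balancer, it already adds an X-Amzn-Trace-Id header that could serve the same purpose.) The middleware reuses an incoming ID when present, generates one otherwise, and logs it; ForwardTraceID copies the ID onto the outbound request to the signing service.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
)

// traceHeader is a hypothetical header name agreed on by every tier.
const traceHeader = "X-Request-Trace-Id"

// newTraceID returns a random 128-bit identifier, hex encoded.
func newTraceID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// WithTraceID reuses an incoming trace ID or assigns a new one, logs it, and
// passes the request (carrying the ID) to the next handler.
func WithTraceID(logger *log.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(traceHeader)
		if id == "" {
			newID, err := newTraceID()
			if err != nil {
				http.Error(w, "internal error", http.StatusInternalServerError)
				return
			}
			id = newID
			// Clone rather than mutate the incoming request.
			r = r.Clone(r.Context())
			r.Header.Set(traceHeader, id)
		}
		logger.Printf("trace_id=%s method=%s path=%s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

// ForwardTraceID copies the trace ID onto an outbound request, such as the
// API server's call to the signing service.
func ForwardTraceID(in, out *http.Request) {
	if id := in.Header.Get(traceHeader); id != "" {
		out.Header.Set(traceHeader, id)
	}
}
```

The same middleware can wrap both the API server and the signing service, so every tier logs the identical ID.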

In addition, AWS creates a CloudTrail log entry for each KMS event. These CloudTrail logs can be correlated against the API and signing service logs to identify the exact request associated with a given KMS call.
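The correlation step is essentially a two-hop join: KMS request ID to trace ID (from the signing service logs), then trace ID to user request (from the API logs). The Go sketch below uses hypothetical, simplified record types, and assumes the identifier the signing service records for each KMS call matches the requestID field of the corresponding CloudTrail event.

```go
package main

// APILog, SignerLog, and CloudTrailEvent are hypothetical, simplified records.
type APILog struct {
	TraceID string
	UserID  string
	OrgID   string
}

type SignerLog struct {
	TraceID      string
	KMSRequestID string
}

type CloudTrailEvent struct {
	RequestID string
	EventName string
}

// Correlated ties one KMS event back to the user request that caused it.
type Correlated struct {
	Event CloudTrailEvent
	API   APILog
}

// Correlate joins CloudTrail events to API logs via the signing service logs:
// KMS request ID -> trace ID -> API request. Events without a full match are
// returned separately so they can be reviewed.
func Correlate(api []APILog, signer []SignerLog, events []CloudTrailEvent) (matched []Correlated, unmatched []CloudTrailEvent) {
	apiByTrace := make(map[string]APILog, len(api))
	for _, a := range api {
		apiByTrace[a.TraceID] = a
	}
	traceByKMS := make(map[string]string, len(signer))
	for _, s := range signer {
		traceByKMS[s.KMSRequestID] = s.TraceID
	}
	for _, e := range events {
		trace, ok := traceByKMS[e.RequestID]
		if !ok {
			unmatched = append(unmatched, e)
			continue
		}
		a, ok := apiByTrace[trace]
		if !ok {
			unmatched = append(unmatched, e)
			continue
		}
		matched = append(matched, Correlated{Event: e, API: a})
	}
	return matched, unmatched
}
```

Returning unmatched events separately matters for auditing: a KMS decrypt that can't be traced back to a user request is exactly what a reviewer should look at.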

A flow diagram demonstrating a signing request traversing the load balancer, API, CA, KMS, and CloudTrail, with logging at each step

Availability in a Secure System

As you can see, this system allows us to achieve both confidentiality and integrity of the certificate authority private keys – but what if AWS KMS goes down, or the HSM backing our data fails? While the benefits of hardware-backed encryption are clear, we know it could be disastrous to our customers if they are unable to add hosts to their network in an emergency.

To mitigate downtime concerns, we encrypt each CA key twice, using KMS keys in two different regions. If we encounter a failure when decrypting the private key in one region, we can simply switch to the other region and try again.
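The failover logic can be sketched in Go as trying each regional copy of the encrypted key in turn. The RegionalCiphertext type, the DecryptFunc hook (standing in for a KMS Decrypt call in one region), the ErrRegionUnavailable sentinel, and any region names are hypothetical; errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// RegionalCiphertext is one copy of a CA key encrypted under a KMS key in a
// particular region.
type RegionalCiphertext struct {
	Region     string
	Ciphertext []byte
}

// DecryptFunc is a hypothetical hook that calls KMS in one region.
type DecryptFunc func(region string, ciphertext []byte) ([]byte, error)

// ErrRegionUnavailable is an example error a DecryptFunc might return when a
// region is down.
var ErrRegionUnavailable = errors.New("kms region unavailable")

// DecryptWithFailover tries each regional copy in order and returns the first
// successful plaintext. If every region fails, the errors are joined.
func DecryptWithFailover(copies []RegionalCiphertext, decrypt DecryptFunc) ([]byte, error) {
	if len(copies) == 0 {
		return nil, errors.New("no encrypted copies available")
	}
	var errs []error
	for _, c := range copies {
		pt, err := decrypt(c.Region, c.Ciphertext)
		if err == nil {
			return pt, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", c.Region, err))
	}
	return nil, errors.Join(errs...)
}
```

Trying regions sequentially keeps the common case to a single KMS call, adding latency only when the primary region is failing.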

A diagram showing a failed request to KMS Region 1 that was retried against KMS Region 2, where it succeeds

Conclusion

The security of your network is our top priority. Using AWS KMS and a secure, isolated signing service, we are able to protect the confidentiality, integrity, and availability of your network and provide auditability around security events.

If you’re considering using Nebula, why not let us handle the hard problems for you? Try Defined Networking Managed Nebula today!
