Cost-Effective Networking for GKE - Setting Up a NAT Instance Alongside Cloud NAT

Intro

Almost everything in IT follows the same pattern: input -> processing -> output.

From a networking perspective, most Kubernetes clusters require both an entry and an exit point. This article focuses on the exit part of Google Kubernetes Engine (GKE).

A common way to enable outbound traffic from a private cluster (one whose nodes have no public IP addresses) to the public Internet is Cloud NAT; it is also the easiest and most scalable approach. Another option is a NAT instance (also known as a NAT proxy).

[Diagram: Cloud NAT overview, from https://cloud.google.com/nat/docs/overview]

This article describes how to configure a NAT instance for the GKE cluster along with Cloud NAT.

Why?

There are many reasons to run a NAT instance instead of, or alongside, Cloud NAT. The most common is cost: Cloud NAT charges both an hourly fee and a per-GB data processing fee, so routing high-volume egress through a NAT instance can be noticeably cheaper.

Scenario

A GKE cluster whose workloads access the public Internet through Cloud NAT. The cluster is VPC-native, which is the default network mode for new clusters. That mode has many characteristics, but for our needs the most important is:

Workloads do not use routing tables from hosts.

There is no way to define custom routing rules on the hosts (nodes); the VPC and its routes in Google Cloud manage the entire routing configuration.

How?

The following steps can be performed manually in the Google Cloud Console or with Terraform (which I recommend). Someday, perhaps, I will publish a public Terraform module; let me know if you want this. For now, here are some helpful sources that I also used:

Virtual Machine

A NAT instance's resource requirements are low; an e2-micro or t2a-standard-1 (ARM) machine type is sufficient. I haven't encountered any significant load spikes on that server; apparently, it's very efficient. Use Ubuntu or any other Debian-like system.

Prerequisites

The instance must:

  • Be created in the same VPC network as the Kubernetes cluster.

  • Have IP forwarding enabled. When using Terraform, it's the can_ip_forward argument.

  • Be assigned a public IP address.
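
With those requirements in mind, a minimal Terraform sketch of the instance might look like the following (the name, zone, network, and subnetwork are placeholders for your own values):

    resource "google_compute_instance" "nat" {
      name         = "nat-instance"
      machine_type = "e2-micro"
      zone         = "europe-west1-b"    # any zone in the cluster's region

      # Allows the VM to forward packets that are not addressed to itself.
      can_ip_forward = true

      boot_disk {
        initialize_params {
          image = "ubuntu-os-cloud/ubuntu-2204-lts"
        }
      }

      network_interface {
        network    = "my-vpc"     # must be the same VPC network as the cluster
        subnetwork = "my-subnet"
        access_config {}          # assigns an ephemeral public IP address
      }

      tags = ["nat-instance"]
    }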

Firewall

Ensure the VM is accessible on the cluster network by opening all ports for internal connections. There is no need to open ports for external connections because the VM will not accept incoming traffic from the Internet; it will only initiate outbound connections.
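
As a sketch, such a rule in Terraform could look like this (the source range is an assumption; adjust it to your VPC's internal node, Pod, and Service CIDRs):

    resource "google_compute_firewall" "allow_internal_to_nat" {
      name    = "allow-internal-to-nat"
      network = "my-vpc"

      allow {
        protocol = "all"
      }

      # Internal ranges only; no inbound rule from the Internet is needed.
      source_ranges = ["10.0.0.0/8"]
      target_tags   = ["nat-instance"]
    }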

System configuration

Configure the system by running the commands manually or by adding them to the metadata_startup_script argument in Terraform.

  • Enable IP forwarding at the host level - sudo sysctl -w net.ipv4.ip_forward=1

  • Add a masquerade rule - sudo iptables -t nat -A POSTROUTING -o ens4 -j MASQUERADE (ensure ens4 is the correct network interface; verify with ip addr or sudo ifconfig)
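
When using Terraform, the same commands can be wired into the instance's metadata_startup_script so they run on every boot. A minimal sketch, assuming ens4 is the primary interface as above:

    metadata_startup_script = <<-EOT
      #!/bin/bash
      # Enable IP forwarding and masquerade outbound traffic.
      sysctl -w net.ipv4.ip_forward=1
      iptables -t nat -A POSTROUTING -o ens4 -j MASQUERADE
    EOT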

Network

Now you should have a running, configured VM, but no traffic goes through it yet. We need to define routing rules, called Routes, in GCP. As stated at the beginning of this article, we want both Cloud NAT and a NAT instance, so we need to distinguish which traffic goes through Cloud NAT and which goes through the NAT instance.

Routing goals:

  • Cloud NAT as the default gateway (destination IP range 0.0.0.0/0) - this is already set.

  • NAT instance for traffic with destination 34.117.59.81/32 (this is the IP address of https://ipinfo.io/) - this is just an example.

Add route

Static routes must have a higher priority than the default route (in GCP, the lower the priority number, the higher the precedence) and must use the Instance tags of the cluster's worker nodes. The higher priority ensures the routes are evaluated before traffic falls through to the default gateway. The instance tags ensure the routes apply only to the worker nodes; without them, the route would also match the NAT instance itself, sending its own outbound packets back to itself in an infinite loop at the network layer.
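
A sketch of such a route in Terraform (gke-node-tag is a placeholder for the network tag of your worker nodes, and the next-hop reference assumes the instance resource sketched earlier):

    resource "google_compute_route" "ipinfo_via_nat_instance" {
      name       = "ipinfo-via-nat-instance"
      network    = "my-vpc"
      dest_range = "34.117.59.81/32"

      # Lower number = higher precedence; the default route has priority 1000.
      priority = 500

      next_hop_instance      = google_compute_instance.nat.self_link
      next_hop_instance_zone = "europe-west1-b"

      # Apply only to the GKE worker nodes, never to the NAT instance itself.
      tags = ["gke-node-tag"]
    }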

Testing

To test the above configuration, we use the command curl https://ipinfo.io/. The output contains the IP address visible to the public.

The VM accesses 34.117.59.81/32 using its own network interface with a public IP, so curl returns the public IP of the VM. Running the same command from any Pod in the cluster should return the public IP of the VM as well. To test it further, use a different service: curl https://ifconfig.io/. The command works and returns the same output when running on the VM, but it returns the IP of Cloud NAT when running on any Pod in the cluster.
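
For example, a quick way to run these checks from inside the cluster is a throwaway Pod (the Pod name and image here are arbitrary choices):

    kubectl run nat-test --rm -it --restart=Never --image=curlimages/curl -- https://ipinfo.io/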

This confirms that the routing rules work as expected.

Summary

In this article, we explored how to configure a NAT instance alongside Cloud NAT for a GKE cluster. While Cloud NAT offers scalability and simplicity, adding a NAT instance provides flexibility and can reduce costs for specific traffic types.

We walked through the steps to set up a NAT instance, configure the virtual machine, enable necessary routing, and test the setup. By leveraging a combination of Cloud NAT and NAT instance, you can fine-tune your network configuration to suit your needs while maintaining robust connectivity for your workloads.

This approach is particularly useful when you need custom routing for specific traffic patterns without compromising the overall default behavior of Cloud NAT. If you have further questions or need a Terraform module to automate this setup, feel free to reach out!

TIL

Part 1 of 3

This is the Today I Learned series based on my daily work.
