
In House K8s Load Balancer Architecture On-Premises

Load Balancer in Kubernetes

A Load Balancer in Kubernetes, especially in an on-premises setup, is quite tricky to deal with. In a cloud environment, when you create a Service of type LoadBalancer, the cloud provider sets up the load balancer for you: it provisions the instance, configures the firewall and accessibility, and distributes traffic across your Kubernetes worker nodes. So in this article we will learn why we need to build this ourselves and how to do it in general. Let’s talk about it!

Why not use MetalLB?

For your information, MetalLB is a tool that lets you have Services of type LoadBalancer in a non-cloud environment. In general, MetalLB works in two modes: layer 2 protocols (ARP, NDP) or BGP, advertising the service addresses to the surrounding network so that your Kubernetes LoadBalancer works, either through discovery or peering. You need an IP range that will be assigned to LoadBalancer services when the manifest is applied in Kubernetes. But what if we don’t have that? Then you need to create your own architecture.

Architecture and Flow Diagram

https://miro.medium.com/v2/resize:fit:720/format:webp/1*bKercpJPVmb5X_DButEL-Q.png
Flow Diagram

From the architecture above, there are two objects that we need to create on our own: the first is the Node Watcher, and the second is the Webhook and Template Engine. The node watcher’s job is to watch whether each worker node in our Kubernetes cluster is ready or not. When a node is ready, it means the node should be able to receive traffic from the load balancer. The webhook and template engine’s job is to reload the load balancer you use (e.g. Envoy, Nginx) with the new set of worker node IPs.

https://miro.medium.com/v2/resize:fit:640/format:webp/1*FIS_iSQ2qEiu4_gCf4rflA.png
Flow Chart

The flow of the node watcher is quite simple: watch for node events (update or delete), check whether the node is ready, add it to or delete it from the database, send a hook signal to the template engine to reload the new config, and loop this process until stopped.
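
As a minimal sketch of that “send signal hook” step, assuming the reqwest crate (the /reload path and the webhook URL are illustrative, not part of the original service), the watcher could notify the template engine like this:

// Ping the webhook so the template engine re-renders the config and reloads
// the load balancer. Called after the database has been updated.
async fn notify_template_engine(webhook_url: &str) -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    client
        .post(format!("{webhook_url}/reload"))
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}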

How about the webhook and config template engine?

The answer is: it depends. There are many ways to template the config. If you use Go, you can do it with its standard template library; if you use Python or Rust, you can use a Jinja-style library to template the config. For receiving the hook, you can use a standard HTTP library, FastAPI (Python), or Axum (Rust).
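
To make that concrete, here is a rough sketch of the webhook plus templating step, assuming Axum 0.7 and Tokio. The fetch_node_ips helper, the config path, the upstream name, and the port are illustrative stand-ins, and the rendering is done with plain string formatting rather than a real template library:

use std::process::Command;

use axum::{http::StatusCode, routing::post, Router};

// Hypothetical helper: in the real service this would query the node collection.
async fn fetch_node_ips() -> Vec<String> {
    vec!["10.10.10.10".to_string(), "10.10.10.11".to_string()]
}

// Render an Nginx upstream block from the current node IPs.
fn render_upstream(ips: &[String]) -> String {
    let servers: String = ips
        .iter()
        .map(|ip| format!("    server {ip}:443;\n"))
        .collect();
    format!("upstream k8s_ingress {{\n{servers}}}\n")
}

// Webhook hit by the node watcher: rewrite the config and reload the proxy.
async fn reload_handler() -> StatusCode {
    let ips = fetch_node_ips().await;
    if std::fs::write("/etc/nginx/conf.d/k8s_upstream.conf", render_upstream(&ips)).is_err() {
        return StatusCode::INTERNAL_SERVER_ERROR;
    }
    match Command::new("nginx").args(["-s", "reload"]).status() {
        Ok(s) if s.success() => StatusCode::OK,
        _ => StatusCode::INTERNAL_SERVER_ERROR,
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let app = Router::new().route("/reload", post(reload_handler));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
    axum::serve(listener, app).await?;
    Ok(())
}

The same idea applies with Go templates or FastAPI: the webhook only needs to re-render the upstream list and ask the load balancer to reload.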

Implementation

The implementation is quite simple. Here is some example code that I built in Rust using the kube-rs crate.

// Module-level imports this snippet relies on (the method itself lives inside an
// impl block); they assume the kube / kube-runtime crates with the runtime feature,
// plus the futures, backoff, and anyhow crates.
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

use backoff::ExponentialBackoff;
use futures::{pin_mut, TryStreamExt};
use k8s_openapi::api::core::v1::Node;
use kube::{api::ListParams, Api};
use kube::runtime::{watcher, watcher::Event, WatchStreamExt};

async fn watcher_loop(
    &self,
    node_api: &Api<Node>,
    lp: &ListParams,
    is_leader: Arc<AtomicBool>,
) -> Result<(), anyhow::Error> {
    // Watch node events, retrying with exponential backoff on API errors.
    let obs = watcher(node_api.clone(), lp.clone()).backoff(ExponentialBackoff::default());
    pin_mut!(obs);
    while let Some(event) = obs.try_next().await? {
        // Only the elected leader replica reacts to events.
        if is_leader.load(Ordering::Relaxed) {
            match event {
                Event::Applied(node) => self.node_update(&node).await?,
                Event::Deleted(node) => self.delete_nodesvc(&node).await?,
                _ => {}
            }
        }
    }
    Ok(())
}

For the database backing this service we use MongoDB. All we need to do is store the IP and the hostname of each worker node; this is an example BSON document.

{
  "_id":
  {
    "$oid":"6356708e97eb796f4d0e7451"
  },
  "ip":"10.10.10.10",
  "hostname":"node-test-k8s-worker"
}
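
As a rough sketch of how the watcher could keep this collection in sync, assuming the official mongodb async driver (2.x API); the collection type and field names simply mirror the document above:

use mongodb::{
    bson::{doc, Document},
    options::UpdateOptions,
    Collection,
};

// Upsert a worker node's IP/hostname pair when it becomes ready.
async fn upsert_node(
    coll: &Collection<Document>,
    hostname: &str,
    ip: &str,
) -> mongodb::error::Result<()> {
    coll.update_one(
        doc! { "hostname": hostname },
        doc! { "$set": { "ip": ip, "hostname": hostname } },
        UpdateOptions::builder().upsert(true).build(),
    )
    .await?;
    Ok(())
}

// Remove the node when it is deleted or turns unready.
async fn delete_node(coll: &Collection<Document>, hostname: &str) -> mongodb::error::Result<()> {
    coll.delete_one(doc! { "hostname": hostname }, None).await?;
    Ok(())
}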

To check whether a node is ready or not, we can create the following function.

// Returns Some(false) and logs a warning when any node condition signals a problem:
// either a pressure condition (type != "Ready") reporting "True", or the "Ready"
// condition reporting "False" / "Unknown". Returns None if status or conditions are missing.
pub async fn node_readiness(&self, node: &Node) -> Option<bool> {
    let failed = node
        .status
        .as_ref()?
        .conditions
        .as_ref()?
        .iter()
        // Keep only the conditions that indicate the node is unhealthy.
        .filter(|c| {
            !(c.status != "True" && c.type_ != "Ready"
                || c.type_ == "Ready" && c.status != "False" && c.status != "Unknown")
        })
        // Collect their messages so the warning explains why the node is unready.
        .map(|c| c.message.as_ref().unwrap_or(&String::from("")).to_string())
        .collect::<Vec<String>>();
    if !failed.is_empty() {
        warn!(
            status = "unready",
            node = node.name_any(),
            reason = failed.join(","),
            scheduleable = false
        );
        return Some(false);
    }
    Some(true)
}

With this architecture and implementation, we can add and remove worker nodes from the Load Balancer with ease. This implementation is only focused on adding and removing worker nodes, not on provisioning the Load Balancer itself, but if you know how to build this, it should be easy to create a provisioner (e.g. using Ansible).

Which services receive traffic from the Load Balancer, and how is the Load Balancer configured?

The Load Balancer we use operates at Layer 4 and delegates Layer 7 capabilities to our ingress. In our case, we use the Nginx ingress controller running as a DaemonSet with host networking, so the network load balancer sends traffic straight to our Nginx ingress.

Pros and Cons

The pro of this approach is that we don’t need to worry about dedicated IP ranges for the load balancer; these are not always easy to get, especially if you work with multiple teams. Instead, we only need to spin up one machine or virtual machine and use it as our load balancer. It’s even better if we can automate the provisioning; we will cover that later :-).

The con of this approach is that events can be missed if the node watcher application is not running or something goes wrong with the Kubernetes control plane, since we connect to its API. Another thing we noticed is that we should use a Kubernetes finalizer and attach it to the worker nodes, so that before Kubernetes deletes a worker node from the cluster, it is first taken out of the load balancer. To handle this situation we added a sync schedule between the Kubernetes API and the database every minute, and we run more replicas with leader election for higher availability.
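
A rough sketch of that one-minute resync loop, assuming Tokio; reconcile_nodes is a hypothetical stand-in for “list nodes from the API, diff against the database, and hit the webhook if anything changed”:

use std::time::Duration;

// Periodic safety net for missed watch events: fully resync every 60 seconds.
async fn resync_loop() {
    let mut ticker = tokio::time::interval(Duration::from_secs(60));
    loop {
        ticker.tick().await;
        if let Err(err) = reconcile_nodes().await {
            // Keep the loop alive; the next tick retries the full sync.
            eprintln!("resync failed: {err}");
        }
    }
}

async fn reconcile_nodes() -> Result<(), anyhow::Error> {
    // Hypothetical: list ready nodes via the Kubernetes API, diff against MongoDB,
    // then notify the webhook if the set of worker IPs changed.
    Ok(())
}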

Conclusion

We’ve successfully built a Node Watcher and a Webhook for reloading the network load balancer configuration. Now Kubernetes worker nodes can be added to or removed from the Load Balancer without any manual human intervention, which avoids mistakes. The next step is to auto-provision the Load Balancer services, but that part depends heavily on your on-premises environment. With all of this and the stability of our Kubernetes cluster, this approach has worked very well so far and meets our expectations.