#HackEatofDay

GitHub Load Balancer: The GLB

Hi guys, Learner here with the GLB, which has become a pretty important part of our lives as developers.
Almost every developer uses GitHub, so we should know how GitHub manages that kind of traffic.
We all know about the common load-balancing algorithms like round robin, but GitHub has its own approach for balancing such a huge amount of traffic.
So, let’s start…
First of all, this post is for newbies too, so I’m gonna start with a small introduction to load balancers.

What’s a Load Balancer?

In short, a load balancer sits in front of a group of servers and spreads incoming requests across them, so no single server gets overwhelmed and the service stays up even if one machine goes down.
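To make that concrete, here is a minimal round-robin sketch in Python, the simplest of the algorithms mentioned above. The backend addresses are made up for the example; a real load balancer would also do health checks, connection tracking, and so on.

```python
from itertools import cycle

# Hypothetical backend servers (addresses invented for this example).
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
next_backend = cycle(backends)

def pick_backend() -> str:
    """Return the next backend in round-robin order."""
    return next(next_backend)

# Each incoming request is simply handed to the next server in turn.
for request_id in range(5):
    print(request_id, "->", pick_backend())
```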
Now, let’s come to….

What’s GLB?

It’s GitHub’s own load balancer, built to handle huge amounts of traffic in a better way.

Wanna understand more?

In designing its load balancer, GitHub sought to improve on the common pattern for the traffic director tier. The company settled on a variant of rendezvous hashing that supports constant-time lookups. Each proxy host is stored and assigned a state, which handles connection draining. A fixed-size forwarding table is generated and each row is filled with proxy servers using the ordering component of rendezvous hashing. The table and proxy states are sent to the director servers and kept in sync.
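To get a feel for that table-building step, here is a rough Python sketch of rendezvous hashing filling a fixed-size forwarding table. This is not GitHub’s actual code; the table size, the hash function, and the number of proxies per row are assumptions just for illustration.

```python
import hashlib

# Hypothetical proxy fleet and table parameters (assumed values, not GLB's real ones).
PROXIES = ["proxy-1", "proxy-2", "proxy-3", "proxy-4"]
TABLE_SIZE = 256        # fixed number of forwarding-table rows
PROXIES_PER_ROW = 2     # e.g. a first choice plus a fallback per row

def score(row: int, proxy: str) -> int:
    """Rendezvous score: hash the (row, proxy) pair to a comparable integer."""
    digest = hashlib.sha256(f"{row}:{proxy}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def build_table(proxies):
    """Fill every row with the proxies ranked by their rendezvous score."""
    table = []
    for row in range(TABLE_SIZE):
        ranked = sorted(proxies, key=lambda p: score(row, p), reverse=True)
        table.append(ranked[:PROXIES_PER_ROW])
    return table

forwarding_table = build_table(PROXIES)
print(forwarding_table[0])   # the proxies chosen for row 0
```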

TCP packets, upon arrival at the director, have their source IP hashed to generate a consistent index into the forwarding table. The packet is encapsulated inside another IP packet destined to the internal IP of the proxy server and sent over the network. The proxy server receives the encapsulated packet, decapsulates it, and processes the original packet locally. Outgoing packets use Direct Server Return, so packets going to the client egress directly to the client and bypass the director tier.
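And here is the lookup half of that flow, continuing the sketch above (it reuses forwarding_table and TABLE_SIZE): the director hashes the client’s source IP to a consistent row and forwards to the proxy stored there. The Foo-over-UDP encapsulation itself is omitted; only the hashing and table lookup are shown.

```python
import hashlib
import ipaddress

def row_for_source_ip(src_ip: str, table_size: int = TABLE_SIZE) -> int:
    """Map a client's source IP to the same forwarding-table row every time."""
    packed = ipaddress.ip_address(src_ip).packed
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:8], "big") % table_size

def pick_proxy(src_ip: str) -> str:
    """Pick the first-choice proxy for this client from the table built above."""
    row = forwarding_table[row_for_source_ip(src_ip)]
    return row[0]

# 203.0.113.7 is a documentation address standing in for a real client.
print(pick_proxy("203.0.113.7"))
```

Because the same source IP always hashes to the same row, a client’s packets keep landing on the same proxy, which is what keeps long-lived TCP connections intact.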

“We set out to design a new director tier that was stateless and allowed both director and proxy nodes to be gracefully removed from rotation without disruption to users wherever possible,” the engineers said. “Users live in countries with less than ideal internet connectivity, and it was important to us that long running clones of reasonably sized repositories would not fail during planned maintenance within a reasonable time limit.”

Goals of New System

  • Runs on commodity hardware
  • Scales horizontally
  • Supports high availability, avoids breaking TCP connections during normal operation and failover
  • Supports connection draining
  • Per service load balancing, with support for multiple services per load balancer host
  • Can be iterated on and deployed like normal software
  • Testable at each layer, not just integration tests
  • Built for multiple POPs and data centers
  • Resilient to typical DDoS attacks, and tools to help mitigate new attacks

Better Design

The design they settled on, and now use in production, is a variant of rendezvous hashing that supports constant-time lookups. They start by storing each proxy host and assigning it a state. These states handle the connection draining aspect of their design goals and will be discussed further in a future post. They then generate a single, fixed-size forwarding table and fill each row with a set of proxy servers using the ordering component of rendezvous hashing. This table, along with the proxy states, is sent to all director servers and kept in sync as proxies come and go.

When a TCP packet arrives on the director, they hash the source IP to generate a consistent index into the forwarding table. They then encapsulate the packet inside another IP packet (actually Foo-over-UDP) destined to the internal IP of the proxy server and send it over the network. The proxy server receives the encapsulated packet, decapsulates it, and processes the original packet locally. Any outgoing packets use Direct Server Return, meaning packets destined to the client egress directly to the client, completely bypassing the director tier.
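Since the post only hints at how the proxy states support connection draining, here is one hedged guess at the idea, continuing the earlier sketch (it reuses PROXIES, TABLE_SIZE, PROXIES_PER_ROW, and score): a proxy marked as draining is pushed to the back of each row instead of being removed, so rows that referenced it can still reach it while new first-choice traffic goes elsewhere. The state names and the demotion rule are my assumptions, not GitHub’s published design.

```python
# Proxy states (names assumed for illustration only).
ACTIVE, DRAINING = "active", "draining"

proxy_states = {
    "proxy-1": ACTIVE,
    "proxy-2": DRAINING,   # being taken out for maintenance
    "proxy-3": ACTIVE,
    "proxy-4": ACTIVE,
}

def build_table_with_states(proxies, states):
    """Like build_table above, but draining proxies never take the first slot."""
    table = []
    for row in range(TABLE_SIZE):
        ranked = sorted(proxies, key=lambda p: score(row, p), reverse=True)
        active = [p for p in ranked if states[p] == ACTIVE]
        draining = [p for p in ranked if states[p] == DRAINING]
        table.append((active + draining)[:PROXIES_PER_ROW])
    return table

drained_table = build_table_with_states(PROXIES, proxy_states)
```

A nice property of the rendezvous ordering is that demoting one proxy only changes the rows where it was ranked first, so most clients keep hitting the same proxy during the maintenance window.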

I’ll give you more updates about this in future posts.

{Code with Code@ter}




