Why a single BGP connection may break your NSX routing

When building an Active/Passive Tier-0 Gateway with NSX, you often end up with a single BGP connection, either to an upstream router or to another appliance that connects internal subnets which need to be routed to the highly available NSX overlay.

In the specific case I encountered, my Tier-0 Gateway was connected to a physical Active/Passive firewall cluster via a link that is (by design) not ECMP-capable. The firewall routes to other internal subnets connected to it. The T0-GW also had separate uplinks directly to the WAN for an “internet breakout”. This is done via a static route to the gateway provided by the ISP, which does not support BFD for static routes.

This is quite a common NSX design in colocation or server housing environments.

When the firewall cluster failed over, the customer noticed that all routing activity (including the NAT addresses hosted on the T0-GW) dropped shortly after the failover.

The delay, of course, came from the BGP timers on the BGP connection: the T0-GW only declares the session down once its hold timer expires.
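
To put a rough number on how long the BGP timers can keep a dead session looking alive, here is a minimal Python sketch. It assumes the peer simply goes silent at failover (no TCP reset, no BFD on the session) and that the last keepalive arrived somewhere within one keepalive interval before the failure. The function name is made up for illustration, and the 60 s / 180 s values are only what I believe to be the NSX defaults, so check the timers on your own BGP neighbor.

```python
def hold_timer_expiry_window(keepalive_s: int, hold_s: int) -> tuple[int, int]:
    """Return (min, max) seconds between the peer going silent and the
    local hold timer expiring.

    The hold timer restarts with every message received from the peer.
    If the last keepalive arrived just before the failure, the session
    survives for the full hold time. If it arrived almost one keepalive
    interval earlier, the session survives for hold minus keepalive.
    """
    if keepalive_s <= 0 or hold_s <= keepalive_s:
        raise ValueError("expected 0 < keepalive < hold time")
    return (hold_s - keepalive_s, hold_s)


if __name__ == "__main__":
    # Assumed values, not taken from the customer environment.
    earliest, latest = hold_timer_expiry_window(keepalive_s=60, hold_s=180)
    print(f"session declared down {earliest} to {latest} s after the failover")
```

Shorter timers or BFD on the BGP session narrow this window, but they do not change what happens once the last session is gone.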

The real problem I discovered was this: a Tier-0 Gateway stops all routing activity, including static routing, when none of its BGP, OSPF or BFD sessions are established!
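
The rule is simple enough to model in a few lines of Python. This is only a sketch of the behaviour as I observed it, not NSX code: the Session class, the peer labels and the function name are all hypothetical. A gateway without any configured session is outside what I describe here, so the sketch rejects that case instead of guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    protocol: str      # "BGP", "OSPF" or "BFD"
    peer: str          # hypothetical label for the neighbor
    established: bool  # True while the session is up


def routing_is_up(sessions: list[Session]) -> bool:
    """Model of the observed rule: routing (static routes included)
    stays up only while at least one session is established."""
    if not sessions:
        raise ValueError("no sessions configured; not covered by this model")
    return any(s.established for s in sessions)


if __name__ == "__main__":
    # Original design: one BGP session to the firewall cluster,
    # which drops when the firewall fails over.
    single = [Session("BGP", "firewall-cluster", established=False)]
    print("single session, firewall failover:", routing_is_up(single))

    # As soon as a second, independent session exists, losing one of
    # them no longer takes routing down.
    redundant = [
        Session("BGP", "peer-a", established=False),
        Session("BGP", "peer-b", established=True),
    ]
    print("two sessions, one lost:", routing_is_up(redundant))
```

The takeaway is that the static route to the ISP is healthy the whole time; it is the missing dynamic routing session that takes it down with everything else.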

This is, of course, reflected in the NSX UI as well as on the Edge Node CLI. In the NSX UI, you may see the error “Routing Down / All BGP/BFD sessions are down.” on the affected T0-GW.

When you connect to the Edges, get edge-cluster status shows that, while the edge cluster is up, all routing is suspended (Routing down):

A screenshot showing the Edge-Cluster status being “Up (Routing down)” in the NSX Edge CLI

When you switch to the service router context of the affected T0-GW and execute get high-availability status, you can see that the HA state of the service router is down:

A screenshot showing the high-availability state of a Tier-0-Service-Router being down in the NSX Edge CLI

And lastly, in the output of get route, you may notice that the uptime of the static route has been reset to the moment the routing connection broke.

So how do you fix this? The answer lies in the NSX docs, of course.

Towards the end of Chapter 4 of the NSX-T Reference Design Guide, written by the VMware Technical Product Management Team for NSX, you can find examples of supported routing topologies.

One of these topologies describes a separate Tier-0 Gateway on a separate cluster that is used as a “routing aggregator”. This gateway can be built in an Active/Active configuration, as its sole purpose is the redistribution of routes.

In my specific example, the ISP does not provide BGP, so I’m still using static routing to the ISP’s gateway. But as you can see, the routing between the firewall and the production Tier-0 Gateway is now ECMP-enabled, as both are connected to the highly available aggregation cluster above.

A diagram showing a tiered routing topology including an aggregation Tier-0 Gateway enabling highly available BGP sessions to its downlink routers

With this design, a firewall appliance can fail while the production T0-GW still has an active BGP connection to the aggregation cluster. It also scales better if you want to add more T0 Gateways, as each new one only needs a BGP connection to the aggregator, as opposed to connections to all other T0-GWs.
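
The scalability argument is easy to check with a bit of arithmetic. The Python sketch below counts logical peering relationships only, assuming a full mesh as the alternative and ignoring that each relationship usually consists of several BGP sessions (one per edge node and uplink). Both function names are made up for illustration.

```python
def full_mesh_peerings(gateways: int) -> int:
    """Peerings needed if every T0-GW peers directly with every other one."""
    if gateways < 0:
        raise ValueError("gateway count must not be negative")
    return gateways * (gateways - 1) // 2


def aggregator_peerings(gateways: int) -> int:
    """Peerings needed if every T0-GW peers only with the aggregator."""
    if gateways < 0:
        raise ValueError("gateway count must not be negative")
    return gateways


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} T0-GWs: full mesh {full_mesh_peerings(n)}, "
              f"via aggregator {aggregator_peerings(n)}")
```

With an aggregator, every additional T0-GW adds exactly one peering, while in a full mesh the n-th gateway adds n - 1 of them.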