-
Add Consistent ECMP for vxlan tunnel HLD #2099
base: master
Added YANG model enhancements for VNET_ROUTE_TUNNEL, including modifications for endpoints, MAC addresses, and VNIs, and a new field, consistent_hashing_buckets.
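As a rough illustration of what such a model might constrain, here is a minimal validator for one VNET_ROUTE_TUNNEL entry. It assumes the entry is a flat map of strings with comma-separated `endpoint`, `mac_address`, and `vni` lists. The alignment rule (per-endpoint lists match `endpoint` in length) and the bucket-count rule (positive integer, at least one bucket per endpoint) are assumptions for the sketch, not the published YANG model. The helper names are hypothetical.

```python
from typing import Dict, List


def split_list(value: str) -> List[str]:
    """Split a comma-separated field into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def validate_tunnel_route(fields: Dict[str, str]) -> List[str]:
    """Return human-readable errors for one route entry; empty means valid."""
    errors: List[str] = []
    endpoints = split_list(fields.get("endpoint", ""))
    if not endpoints:
        errors.append("endpoint: at least one endpoint is required")

    # Assumption: optional per-endpoint lists must line up with "endpoint".
    for key in ("mac_address", "vni"):
        if key in fields:
            items = split_list(fields[key])
            if len(items) != len(endpoints):
                errors.append(f"{key}: expected {len(endpoints)} items, got {len(items)}")

    # Assumption: the bucket count is a positive integer no smaller than the
    # number of endpoints, so every endpoint can own at least one bucket.
    if "consistent_hashing_buckets" in fields:
        raw = fields["consistent_hashing_buckets"].strip()
        if not raw.isdecimal() or int(raw) == 0:
            errors.append("consistent_hashing_buckets: must be a positive integer")
        elif int(raw) < len(endpoints):
            errors.append("consistent_hashing_buckets: fewer buckets than endpoints")
    return errors


entry = {
    "endpoint": "10.1.0.1,10.1.0.2",
    "vni": "1000,1000",
    "consistent_hashing_buckets": "128",
}
print(validate_tunnel_route(entry))  # []
```

Returning a list of errors rather than raising on the first problem lets a controller report every issue in a single pass.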
The details for enabling consistent hashing for VXLAN tunnel routes (VNET_ROUTE_TUNNEL) are discussed in this document.
###### Use-case:
VXLAN tunnel routes can contain a list of endpoints (next hops) so that overlay traffic can be routed to multiple underlay endpoints. When there are multiple endpoints, ECMP selects the next hop towards which the traffic is encapsulated and sent. This is primarily used where throughput must scale beyond what a single VXLAN endpoint can handle. When these endpoints hold flow state, any endpoint modification (next-hop addition or removal) causes most flows to be rehashed to a different endpoint than before, resulting in connection restarts. To limit connection restarts during endpoint/next-hop modifications, we will enable consistent hashing for tunnel next hops.
@anish-n Assuming a typical Leaf-Spine topology, do these tunnel endpoints all point to the same egress TOR? How is the host connected to these tunnel endpoints? A network topology diagram would be quite helpful in explaining the proposed functionality.
These are L3 VXLAN next hops; the tunnel endpoint can be located anywhere, on switches, hosts, or DPUs. No physical topology is assumed since this is L3 VXLAN (the endpoint may be located anywhere in the network as long as it is reachable over the underlay).
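To make the use-case concrete, the sketch below compares how many flows change endpoint when one of four endpoints is removed under (a) plain modulo ECMP and (b) a fixed-size bucket table in which only the removed endpoint's buckets are reassigned. Flow hashes are synthetic integers, the hardware hash function is not modeled, and the round-robin bucket layout is an illustrative assumption rather than how the ASIC or orchagent actually distributes buckets.

```python
from typing import Dict, List


def modulo_pick(flow_hash: int, endpoints: List[str]) -> str:
    # Non-consistent ECMP: the chosen index depends on the member count.
    return endpoints[flow_hash % len(endpoints)]


def build_buckets(endpoints: List[str], num_buckets: int) -> List[str]:
    # Fill buckets round-robin so each endpoint owns an (almost) equal share.
    return [endpoints[i % len(endpoints)] for i in range(num_buckets)]


def remove_endpoint(buckets: List[str], removed: str) -> List[str]:
    # Only buckets owned by the removed endpoint are rewritten; survivors
    # receive them round-robin. Every other bucket keeps its owner.
    survivors = sorted(set(buckets) - {removed})
    new_buckets = list(buckets)
    turn = 0
    for i, owner in enumerate(buckets):
        if owner == removed:
            new_buckets[i] = survivors[turn % len(survivors)]
            turn += 1
    return new_buckets


def moved_fraction(before: Dict[int, str], after: Dict[int, str]) -> float:
    return sum(before[f] != after[f] for f in before) / len(before)


endpoints = ["ep1", "ep2", "ep3", "ep4"]
flows = range(1200)  # synthetic flow hashes

mod_before = {f: modulo_pick(f, endpoints) for f in flows}
mod_after = {f: modulo_pick(f, endpoints[:3]) for f in flows}  # ep4 removed

buckets = build_buckets(endpoints, 64)
new_buckets = remove_endpoint(buckets, "ep4")
ch_before = {f: buckets[f % len(buckets)] for f in flows}
ch_after = {f: new_buckets[f % len(new_buckets)] for f in flows}

print(f"modulo ECMP moved {moved_fraction(mod_before, mod_after):.0%}")   # 75%
print(f"bucketed ECMP moved {moved_fraction(ch_before, ch_after):.0%}")   # 25%
```

With modulo hashing, 75% of flows move even though only the 25% that used the removed endpoint had to; the bucket table moves exactly those 25%.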
6. When a VNET_ROUTE_TUNNEL_TABLE entry is modified so that “consistent_hashing_buckets” is added to an existing tunnel route, a transition from non-fine-grained to fine-grained ECMP must occur; when “consistent_hashing_buckets” is removed, a transition from fine-grained to non-fine-grained ECMP occurs. Both transitions result in a SAI route update with a new next hop group/next hop, along with deletion of any leftover stale next hop groups.
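The transition in item 6 can be sketched as a small decision function. This is a hypothetical model, not vnetorch code. The action strings and the `NhgKind`/`plan_update` names are illustrative. For simplicity it treats the non-fine-grained target as a group, even though a single-endpoint route may point at a plain next hop. It also does not model changes to the bucket count itself. The one property it encodes from the text is ordering: the route is moved to the new group before the stale group is deleted.

```python
from enum import Enum
from typing import Dict, List, Optional

Fields = Optional[Dict[str, str]]  # None means the route does not exist


class NhgKind(Enum):
    REGULAR = "regular_ecmp"
    FINE_GRAINED = "fine_grained_ecmp"


def nhg_kind(fields: Fields) -> Optional[NhgKind]:
    if fields is None:
        return None
    if "consistent_hashing_buckets" in fields:
        return NhgKind.FINE_GRAINED
    return NhgKind.REGULAR


def plan_update(old: Fields, new: Fields) -> List[str]:
    """Ordered actions for moving a tunnel route from `old` to `new` fields."""
    before, after = nhg_kind(old), nhg_kind(new)
    if before is None and after is None:
        return []
    if before is None:
        return [f"create {after.value} group", "create route"]
    if after is None:
        return ["remove route", f"remove {before.value} group"]
    if before == after:
        # Same kind: reconcile members in place (bucket-count changes are
        # deliberately not modeled in this sketch).
        return [f"update {after.value} group members"]
    # Kind changed: repoint the route first, then delete the stale group, so
    # the route always references a valid next hop group.
    return [
        f"create {after.value} group",
        "set route next hop group",
        f"remove stale {before.value} group",
    ]


print(plan_update({"endpoint": "10.1.0.1,10.1.0.2"},
                  {"endpoint": "10.1.0.1,10.1.0.2", "consistent_hashing_buckets": "128"}))
```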
# 5 Test Plan for the enhancements
Minor typo: enha*cements
The overall data flow diagram for all table updates is captured in Section 3.
Refer to Section 4 for details on the redistribution performed during runtime scenarios.
### vnetorch
@anish-n Since this approach of internally creating FGNhgEntry differs from the existing method of configuring the feature through the FG_NHG, FG_NHG_PREFIX, and FG_NHG_MEMBER tables, it would be helpful to add a brief explanation of the rationale behind introducing this new approach.
We want to enable Fine Grained ECMP creation directly through VNET_ROUTE_TUNNEL rather than requiring the user to set up FG_NHG/FG_NHG_PREFIX, which would be redundant configuration from a user perspective. We program this via an SDN controller, so we need to simplify the config model where possible. Supporting this feature through FG_NHG and FG_NHG_PREFIX would require complex schema changes (adding VRF/VNET to the existing schema), and those changes would also have to remain backwards compatible.
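A minimal sketch of the idea in this reply: derive a fine-grained group for a tunnel route directly from its VNET_ROUTE_TUNNEL fields, with no FG_NHG, FG_NHG_PREFIX, or FG_NHG_MEMBER entries. `FgGroup`, `from_tunnel_route`, and `add_endpoint` are hypothetical names and do not reflect the fields of the real FGNhgEntry. The addition policy (the new endpoint takes a fair share of buckets, one at a time from the currently most-loaded member) is one reasonable choice assumed here, not necessarily the redistribution described in Section 4.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FgGroup:
    vnet: str
    prefix: str
    bucket_size: int
    buckets: List[str] = field(default_factory=list)  # index -> endpoint


def from_tunnel_route(vnet: str, prefix: str, fields: Dict[str, str]) -> FgGroup:
    """Build the group from VNET_ROUTE_TUNNEL fields alone (no FG_NHG tables)."""
    endpoints = [e.strip() for e in fields["endpoint"].split(",") if e.strip()]
    size = int(fields["consistent_hashing_buckets"])
    group = FgGroup(vnet, prefix, size)
    # Round-robin initial fill: each endpoint owns an (almost) equal share.
    group.buckets = [endpoints[i % len(endpoints)] for i in range(size)]
    return group


def add_endpoint(group: FgGroup, new_ep: str) -> int:
    """Give new_ep a fair share of buckets, taken from the most-loaded members.

    Returns the number of buckets that changed owner; all others are untouched.
    """
    counts: Dict[str, int] = {}
    for ep in group.buckets:
        counts[ep] = counts.get(ep, 0) + 1
    if new_ep in counts:
        return 0
    share = group.bucket_size // (len(counts) + 1)
    for _ in range(share):
        donor = max(counts, key=counts.get)
        group.buckets[group.buckets.index(donor)] = new_ep
        counts[donor] -= 1
    return share


group = from_tunnel_route("Vnet1", "10.0.0.0/24",
                          {"endpoint": "10.1.0.1,10.1.0.2,10.1.0.3",
                           "consistent_hashing_buckets": "12"})
print(add_endpoint(group, "10.1.0.4"))  # 3 of 12 buckets move
```

Because only the donated buckets change owner, flows hashed to the other nine buckets keep their endpoint.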
- Data plane tests via pytest + PTF
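One check such data plane tests might perform, sketched as a plain helper. It assumes the test has already recorded which tunnel endpoint each flow was observed on before and after an endpoint removal (for example, classified by outer destination IP from PTF captures). How those observations are collected is not shown, and the helper names are hypothetical. The check asserts that only flows previously on the removed endpoint were rehashed, and that none of them still reach it.

```python
from collections.abc import Hashable
from typing import Dict, List

Observations = Dict[Hashable, str]  # flow key -> tunnel endpoint seen


def unexpected_moves(before: Observations, after: Observations,
                     removed: str) -> List[Hashable]:
    """Flows that changed endpoint although their endpoint was not removed."""
    return [flow for flow, ep in before.items()
            if ep != removed and after.get(flow) != ep]


def check_removal(before: Observations, after: Observations, removed: str) -> None:
    bad = unexpected_moves(before, after, removed)
    assert not bad, f"{len(bad)} flows rehashed unexpectedly: {bad[:5]}"
    # Flows that used the removed endpoint must now be sent elsewhere.
    stuck = [f for f, ep in before.items()
             if ep == removed and after.get(f) == removed]
    assert not stuck, f"flows still sent to {removed}: {stuck[:5]}"


before = {("192.0.2.1", 1001): "10.1.0.1", ("192.0.2.2", 1002): "10.1.0.2"}
after = {("192.0.2.1", 1001): "10.1.0.1", ("192.0.2.2", 1002): "10.1.0.1"}
check_removal(before, after, removed="10.1.0.2")  # passes
```

A flow missing from the "after" observations counts as an unexpected move, which also surfaces traffic loss for endpoints that were not removed.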
## SWSS unit tests:
Are you also planning to test the base non-default VRF case without any tunnels?
I do not intend to add support in this feature for non-tunnel VNET/VRF routes.
This PR introduces the design for consistent hashing across VXLAN tunnel endpoints, so that routes destined to tunnel endpoints can use consistent hashing to limit flow rehashing upon next-hop addition or removal. When tunnel endpoints hold flow state, this significantly reduces protocol connection restarts caused by next-hop changes.