@anish-n (Contributor) commented Nov 7, 2025

This PR introduces the design for consistent hashing on VXLAN tunnel endpoints. Routes pointing to multiple tunnel endpoints can then use consistent hashing to limit flow rehashing when a nexthop is added or removed. When tunnel endpoints hold flow state, this significantly reduces the protocol connection restarts caused by nexthop changes.

@mssonicbld (Collaborator)

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

Added YANG model enhancements for VNET_ROUTE_TUNNEL, including changes to the endpoint, MAC address, and VNI fields and a new `consistent_hashing_buckets` field.
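
To illustrate the shape of an extended entry, here is a minimal Python sketch that splits a VNET_ROUTE_TUNNEL field map into per-endpoint nexthops. It assumes that `endpoint`, `mac_address` and `vni` are comma-separated lists matched by index, and that `consistent_hashing_buckets` is an optional positive integer. `TunnelNexthop`, `parse_route_tunnel` and the validation rules are hypothetical; the YANG model is the source of truth for the real constraints.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TunnelNexthop:
    endpoint: str
    mac_address: Optional[str]
    vni: Optional[int]


def parse_route_tunnel(fields: Dict[str, str]) -> Tuple[List[TunnelNexthop], Optional[int]]:
    """Split a VNET_ROUTE_TUNNEL field map into per-endpoint nexthops.

    Assumes endpoint, mac_address and vni are comma-separated lists that
    correspond index by index (illustrative rules, not the YANG model).
    """
    endpoints = [e.strip() for e in fields["endpoint"].split(",")]

    def column(name: str) -> List[Optional[str]]:
        # Optional per-endpoint column; absent means "unspecified" for all endpoints.
        if name not in fields:
            return [None] * len(endpoints)
        values = [v.strip() for v in fields[name].split(",")]
        if len(values) != len(endpoints):
            raise ValueError(f"{name} has {len(values)} entries, expected {len(endpoints)}")
        return values

    nexthops = [
        TunnelNexthop(ep, mac, int(vni) if vni is not None else None)
        for ep, mac, vni in zip(endpoints, column("mac_address"), column("vni"))
    ]

    # The new field: when present, the route should use fine-grained ECMP.
    buckets: Optional[int] = None
    if "consistent_hashing_buckets" in fields:
        buckets = int(fields["consistent_hashing_buckets"])
        if buckets <= 0:
            raise ValueError("consistent_hashing_buckets must be a positive integer")
    return nexthops, buckets


# Example entry with consistent hashing enabled (addresses and VNIs are made up).
nexthops, buckets = parse_route_tunnel({
    "endpoint": "10.1.0.1,10.1.0.2",
    "mac_address": "00:11:22:33:44:01,00:11:22:33:44:02",
    "vni": "5001,5002",
    "consistent_hashing_buckets": "128",
})
```

An entry without `consistent_hashing_buckets` comes back with `buckets` set to `None`, which is the case that stays on regular ECMP.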

This document discusses the details of enabling consistent hashing for VXLAN tunnel routes (VNET_ROUTE_TUNNEL).

###### Use-case:
A VXLAN tunnel route can contain a list of endpoints (nexthops), so that overlay traffic is routed to multiple underlay endpoints. When there are multiple endpoints, ECMP selects the nexthop toward which the traffic is encapsulated and sent. The main use is scaling throughput beyond what a single VXLAN endpoint can handle. When these endpoints hold flow state, any endpoint modification (nexthop addition or removal) rehashes most flows to a different endpoint than before. Every endpoint modification therefore restarts those connections. To limit connection restarts during endpoint/nexthop modifications, we will enable consistent hashing for tunnel nexthops.
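
The following Python sketch makes the rehashing problem concrete. It compares plain modulo-N ECMP with a fixed-size bucket table of the kind fine-grained ECMP uses. The SHA-256 flow hash, the 128-bucket table size and the round-robin reassignment are assumptions chosen for illustration. The switch's real hash and the orchagent/SAI bucket logic may differ.

```python
import hashlib
from typing import List


def flow_hash(flow: str) -> int:
    # Stable 32-bit hash of a flow key; a stand-in for the switch's 5-tuple hash.
    return int.from_bytes(hashlib.sha256(flow.encode()).digest()[:4], "big")


def ecmp_pick(flow: str, endpoints: List[str]) -> str:
    # Regular ECMP: hash modulo the current number of members.
    return endpoints[flow_hash(flow) % len(endpoints)]


def build_buckets(endpoints: List[str], num_buckets: int) -> List[str]:
    # Fixed-size bucket table, filled round-robin with the endpoints.
    return [endpoints[i % len(endpoints)] for i in range(num_buckets)]


def remove_endpoint(buckets: List[str], removed: str) -> List[str]:
    # Only the removed endpoint's buckets are reassigned; all others stay put.
    survivors = sorted(set(buckets) - {removed})
    out = list(buckets)
    j = 0
    for i, ep in enumerate(out):
        if ep == removed:
            out[i] = survivors[j % len(survivors)]
            j += 1
    return out


def bucket_pick(flow: str, buckets: List[str]) -> str:
    # The table size never changes, so a flow's bucket index is stable.
    return buckets[flow_hash(flow) % len(buckets)]


endpoints = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
flows = [f"flow-{i}" for i in range(10000)]

# Plain ECMP after removing 10.0.0.4: the modulus changes from 4 to 3.
ecmp_moved = sum(ecmp_pick(f, endpoints) != ecmp_pick(f, endpoints[:3]) for f in flows)

# Bucketed hashing with an assumed 128-bucket table.
before = build_buckets(endpoints, 128)
after = remove_endpoint(before, "10.0.0.4")
fg_moved = sum(bucket_pick(f, before) != bucket_pick(f, after) for f in flows)
```

Assume a roughly uniform hash. Under plain ECMP, a flow keeps its endpoint only when h mod 4 equals h mod 3, so about three quarters of the flows move. The bucketed table moves only the quarter of flows that hashed to the removed endpoint.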

@ashutosh-agrawal (Contributor), Dec 2, 2025

@anish-n Assuming a typical leaf-spine topology, do these tunnel endpoints all point to the same egress ToR? How is the host connected to these tunnel endpoints? A network topology diagram would be quite helpful in explaining the proposed functionality.

Contributor Author

These are L3 VXLAN nexthops, so the tunnel endpoint can be located anywhere: on switches, hosts, or DPUs. No physical topology is assumed here, because this is L3 VXLAN. The endpoint can be anywhere in the network as long as it is reachable over the underlay.

6. When `consistent_hashing_buckets` is added to an existing tunnel route in VNET_ROUTE_TUNNEL_TABLE, the route must transition from non-fine-grained to fine-grained ECMP. When `consistent_hashing_buckets` is removed, the route transitions from fine-grained back to non-fine-grained ECMP. Both transitions result in a SAI route update with the new nexthop group/nexthop, along with deletion of any leftover stale nexthop groups.
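
The two transitions in item 6 can be summarized as a small decision function. This is a hypothetical sketch: `plan_transition` and its step strings are made up for illustration. The make-before-break ordering it shows is an assumption, since the text only says that the route update happens along with the stale group cleanup.

```python
from typing import List, Optional


def plan_transition(old_buckets: Optional[int], new_buckets: Optional[int]) -> List[str]:
    """Steps for a VNET_ROUTE_TUNNEL update that adds or removes
    consistent_hashing_buckets (None means the field is absent)."""
    was_fine_grained = old_buckets is not None
    is_fine_grained = new_buckets is not None
    if was_fine_grained == is_fine_grained:
        # Not one of the two mode transitions; other updates (endpoint
        # changes, a different bucket count) are outside this sketch.
        return []
    target = "fine-grained" if is_fine_grained else "non-fine-grained"
    # Assumed make-before-break order: program the new group, repoint the
    # route, then clean up whatever the old mode left behind.
    return [
        f"create {target} nexthop group/nexthop",
        "update SAI route entry to use the new nexthop group/nexthop",
        "delete stale nexthop groups from the previous mode",
    ]


print(plan_transition(None, 128))  # route gains consistent_hashing_buckets
print(plan_transition(128, None))  # route loses consistent_hashing_buckets
```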


# 5 Test Plan for the enhacements
Contributor

Minor typo: "enhacements" should be "enhancements".

The overall data flow diagram for all TABLE updates is captured in Section 3.
Refer to Section 4 for detailed information about the redistribution performed in runtime scenarios.
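
The redistribution rules themselves are in Section 4. The sketch below illustrates only the general idea, for the case where an endpoint is added. The new endpoint receives its floor share of buckets, and each bucket is taken from whichever existing endpoint currently owns the most buckets. Only those buckets change owner. `add_endpoint` and this donor rule are assumptions for illustration, not the algorithm Section 4 specifies.

```python
from collections import Counter
from typing import List


def add_endpoint(buckets: List[str], new_ep: str) -> List[str]:
    """Give new_ep its floor share of buckets, taking each one from the
    currently most-loaded existing endpoint (assumes new_ep is not present)."""
    out = list(buckets)
    counts = Counter(out)
    share = len(out) // (len(counts) + 1)  # floor share once new_ep joins
    for _ in range(share):
        donor = max((ep for ep in counts if ep != new_ep), key=counts.__getitem__)
        out[out.index(donor)] = new_ep  # reassign one of the donor's buckets
        counts[donor] -= 1
        counts[new_ep] += 1
    return out


before = ["a", "b", "c"] * 4  # 12 buckets, 4 per endpoint
after = add_endpoint(before, "d")
moved = sum(x != y for x, y in zip(before, after))
```

Here only 3 of the 12 buckets change owner, and all of them move to `d`. Flows hashing to the other 9 buckets keep their endpoint.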

### vnetorch
Contributor

@anish-n Since this approach of internally creating FGNhgEntry differs from the existing method of configuring the feature through the FG_NHG, FG_NHG_PREFIX, and FG_NHG_MEMBER tables, it would be helpful to add a brief explanation of the rationale behind introducing this new approach.

Contributor Author

We want to enable Fine Grained ECMP creation directly through VNET_ROUTE_TUNNEL, rather than requiring a user to set up FG_NHG/FG_NHG_PREFIX, because that would be redundant config from the user's perspective. We program this via an SDN controller, so we need to simplify the config model where possible. Supporting this feature through FG_NHG and FG_NHG_PREFIX would require complex schema changes to add VRF/VNET to the existing schema, and those changes would also have to remain backward compatible.
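
Here is a minimal sketch of the idea in this reply. It derives the fine-grained entry directly from the VNET_ROUTE_TUNNEL key and fields, and keys it by (VNET, prefix). That keeps the same prefix in different VNETs distinct, which the prefix-keyed FG_NHG_PREFIX table cannot express without the schema changes mentioned above. `FgSpec` and `derive_fg_spec` are hypothetical, and so is the `VNET_ROUTE_TUNNEL_TABLE:<vnet>:<prefix>` key layout. None of this is the actual vnetorch code.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FgSpec:
    # Hypothetical internal counterpart of an FG_NHG + FG_NHG_PREFIX pair,
    # except that the VNET is part of its identity.
    vnet: str
    prefix: str
    buckets: int
    endpoints: Tuple[str, ...]


def derive_fg_spec(key: str, fields: Dict[str, str]) -> Optional[FgSpec]:
    # Assumed key layout: VNET_ROUTE_TUNNEL_TABLE:<vnet>:<prefix>.
    # maxsplit=2 keeps IPv6 prefixes (which contain ':') intact.
    _table, vnet, prefix = key.split(":", 2)
    if "consistent_hashing_buckets" not in fields:
        return None  # regular ECMP route; no fine-grained entry needed
    endpoints = tuple(e.strip() for e in fields["endpoint"].split(","))
    return FgSpec(vnet, str(ip_network(prefix, strict=False)),
                  int(fields["consistent_hashing_buckets"]), endpoints)


# Internal table keyed by (vnet, prefix): one prefix may exist in several VNETs.
fg_entries: Dict[Tuple[str, str], FgSpec] = {}
for key in ("VNET_ROUTE_TUNNEL_TABLE:Vnet1:10.0.0.0/24",
            "VNET_ROUTE_TUNNEL_TABLE:Vnet2:10.0.0.0/24",
            "VNET_ROUTE_TUNNEL_TABLE:Vnet1:fc00::/64"):
    spec = derive_fg_spec(key, {"endpoint": "1.1.1.1,1.1.1.2",
                                "consistent_hashing_buckets": "64"})
    if spec is not None:
        fg_entries[(spec.vnet, spec.prefix)] = spec
```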

- Data Plane tests via pytest + PTF


## SWSS unit tests:
Contributor

Are you also planning to test the base non-default VRF case without any tunnels?

Contributor Author

I don't intend to add support in this feature for VNET/VRF routes that are not tunnel routes.
