Add Consistent ECMP for vxlan tunnel HLD #2099
Open
anish-n wants to merge 9 commits into sonic-net:master from anish-n:tunnelNhFgEcmp
11f7842 Add Consistent ECMP for vxlan tunnel HLD (anish-n)
2c0021c Add images for programming flow (anish-n)
7cf01a1 Add scale details (anish-n)
00bb843 Add yang model (anish-n)
82940a2 Update fine_grained_next_hop_hld (anish-n)
d8ca8ea Rename doc (anish-n)
9e641c9 Update fine_grained_next_hop_hld (anish-n)
e56e7a4 Update Vxlan HLD (anish-n)
b0335b4 Incorporate comments (anish-n)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,202 @@ | ||
| # Consistent ECMP for Vxlan Tunnels | ||
| ### Rev 1.0 | ||
|
|
||
| # Table of Contents | ||
|
|
||
| - [Revision](#revision) | ||
| - [Scope](#scope) | ||
| - [Overview](#1-overview) | ||
| - [Schema Changes](#2-schema-changes) | ||
| - [Config and APP DB](#21-config-and-app-db) | ||
| - [STATE DB](#22-state-db) | ||
| - [CLI](#23-cli) | ||
| - [YANG model](#24-yang-model) | ||
| - [Programming Flow](#3-programming-flow) | ||
| - [SWSS orchagent design](#4-swss-orchagent-design) | ||
| - [Test Plan](#5-test-plan) | ||
|
|
||
|
|
||
| # Revision | ||
|
|
||
| | Rev | Date | Author | Change Description | | ||
| |:---:|:-----------:|:------------------:|-----------------------------------| | ||
| | 1.0 | 11/04/2025 | Anish Narsian | Added Consistent hashing support | | ||
|
|
||
|
|
||
| # Scope | ||
|
|
||
| This document describes an enhancement to VXLAN tunnel endpoint ECMP that adds support for consistent hashing across the group of tunnel endpoints that serve as nexthops for a given tunnel route. It extends the existing VNET Vxlan support defined in the [Vxlan HLD](https://github.com/sonic-net/SONiC/blob/master/doc/vxlan/Vxlan_hld.md). | ||
|
|
||
| - In scope: Enabling consistent hashing for VNET_ROUTE_TUNNEL routes, i.e. towards L3 VXLAN nexthops | ||
| - Out of scope: Enabling consistent hashing for other route types within a VNET/VRF, and modifying the consistent hashing orchagent to generically support VRF/VNET | ||
|
|
||
|
|
||
| # Abbreviations | ||
|
|
||
| | Abbreviation | Meaning | | ||
| |--------------------------|-----------------| | ||
| | NH | Next hop | | ||
| | NHG | Next hop Group | | ||
| | NHGM | Next hop Group Member | | ||
| | FG | Fine Grained | | ||
| | ECMP | Equal Cost MultiPath | | ||
|
|
||
| # 1 Overview | ||
| This document discusses the details of enabling consistent hashing for Vxlan tunnel routes (VNET_ROUTE_TUNNEL). | ||
|
|
||
| ###### Use-case: | ||
| Vnet Vxlan tunnel routes can contain a list of endpoints (next-hops) so that overlay traffic is routed to multiple underlay endpoints. When there are multiple endpoints, ECMP selects the nexthop towards which the traffic is encapsulated and sent out. This is primarily used in scenarios where throughput needs to scale beyond what a single vxlan endpoint is capable of. When these endpoints hold flow state, an endpoint modification (next-hop addition or removal) causes most flows to be rehashed and sent to a different endpoint than the one they were originally going to, which restarts connections whenever an endpoint modification is performed. To limit connection restarts during endpoint/next-hop modifications, we will enable consistent hashing for tunnel nexthops. | ||
|
|
||
| ###### Scale: | ||
| | Component | Expected value | | ||
| |--------------------------|-----------------------------| | ||
| | NHG size | 512 - 2048 next hop group members (NHGMs) | | ||
|
|
||
| # 2 Schema Changes | ||
|
|
||
| ## 2.1 Config and APP DB | ||
|
|
||
| We modify Config DB's **VNET_ROUTE_TUNNEL** and, correspondingly, APP_DB's **VNET_ROUTE_TUNNEL_TABLE** to support consistent hashing. The schema is shown below: | ||
|
|
||
| The following new field has been added to **VNET_ROUTE_TUNNEL_TABLE**: | ||
| - consistent_hashing_buckets | ||
|
|
||
| ``` | ||
|
|
||
| VNET_ROUTE_TUNNEL_TABLE:{{vnet_name}}:{{prefix}} | ||
|     "endpoint": {{ip_address1},{ip_address2},...} | ||
|     "endpoint_monitor": {{ip_address1},{ip_address2},...} (OPTIONAL) | ||
|     "mac_address": {{mac_address1},{mac_address2},...} (OPTIONAL) | ||
|     "monitoring": {{"custom"}} (OPTIONAL) | ||
|     "vni": {{vni1},{vni2},...} (OPTIONAL) | ||
|     "weight": {{w1},{w2},...} (OPTIONAL) | ||
|     "profile": {{profile_name}} (OPTIONAL) | ||
|     "primary": {{ip_address1}, {ip_address2}} (OPTIONAL) | ||
|     "adv_prefix": {{prefix}} (OPTIONAL) | ||
|     "rx_monitor_timer": {time in milliseconds} (OPTIONAL) | ||
|     "tx_monitor_timer": {time in milliseconds} (OPTIONAL) | ||
|     "check_directly_connected": {{true|false}} (OPTIONAL) | ||
|     "consistent_hashing_buckets": {{bucket_size}} (OPTIONAL) -> newly introduced | ||
| ``` | ||
|
|
||
|
|
||
| ``` | ||
| consistent_hashing_buckets = DIGITS ; Optional. If specified, consistent hashing is used for the nexthops of the vnet route tunnel. The caller chooses the bucket size based on the number of nexthops and a redundancy factor, which determines how many bucket entries each nexthop receives. | ||
| ``` | ||
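One way a caller might derive the bucket count is sketched below. The HLD only says the value depends on the number of nexthops and a redundancy factor, so the product used here is an assumption, and `bucket_count` and `tunnel_route_fields` are hypothetical helper names that do not exist in SONiC.

```python
from typing import Dict, List


def bucket_count(num_nexthops: int, redundancy_factor: int) -> int:
    """Assumed sizing rule: every nexthop gets `redundancy_factor` buckets."""
    if num_nexthops <= 0 or redundancy_factor <= 0:
        raise ValueError("num_nexthops and redundancy_factor must be positive")
    return num_nexthops * redundancy_factor


def tunnel_route_fields(endpoints: List[str], redundancy_factor: int) -> Dict[str, str]:
    """Build only the fields relevant to consistent hashing for one route.

    Endpoints are joined as a comma-separated string, matching the schema above.
    """
    return {
        "endpoint": ",".join(endpoints),
        "consistent_hashing_buckets": str(
            bucket_count(len(endpoints), redundancy_factor)
        ),
    }


if __name__ == "__main__":
    print(tunnel_route_fields(["10.0.0.1", "10.0.0.2"], 64))
```

A larger redundancy factor gives a finer-grained distribution at the cost of more next hop group members, so the result should stay within the NHG scale the platform supports.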
|
|
||
| ## 2.2 STATE DB | ||
|
|
||
| The existing fine-grained ECMP STATE DB table will be modified to include the VRF/VNET name in its key, so that the same prefix can exist in multiple VRFs/VNETs without colliding. | ||
|
|
||
| ``` | ||
| FG_ROUTE_TABLE|{{VRF/VNET-name}}|{{IPv4 OR IPv6 prefix}}: | ||
| "0": {{next-hop-key}} | ||
| "1": {{next-hop-key}} | ||
| ... | ||
| "{{hash_bucket_size -1}}": {{next-hop-key}} | ||
| ``` | ||
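The key layout above can be illustrated with a small sketch. `fg_route_state` is a hypothetical helper, and the next-hop keys are shown as plain endpoint IPs purely for illustration, since the HLD does not spell out the `next-hop-key` format.

```python
from typing import Dict, List, Tuple


def fg_route_state(vnet_or_vrf: str, prefix: str,
                   buckets: List[str]) -> Tuple[str, Dict[str, str]]:
    """Return the STATE DB key and the bucket-index -> next-hop-key fields."""
    key = f"FG_ROUTE_TABLE|{vnet_or_vrf}|{prefix}"
    fields = {str(index): nh for index, nh in enumerate(buckets)}
    return key, fields


if __name__ == "__main__":
    # The same prefix in two VNETs yields two distinct keys.
    for vnet in ("Vnet1", "Vnet2"):
        print(fg_route_state(vnet, "10.0.0.0/24", ["10.1.0.1", "10.1.0.2"]))
```

Because the VRF/VNET name is part of the key, overlapping prefixes in different VNETs map to separate entries.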
|
|
||
| ## 2.3 CLI | ||
| *CLI command enhancement to view the consistent hashing buckets for a particular VRF/VNET and prefix:* | ||
|
|
||
| ``` | ||
| show fgnhg hash-view <vnet/vrf name> <prefix name> | ||
| show fgnhg active-hops <vnet/vrf name> <prefix name> | ||
| ``` | ||
|
|
||
| *CLI output format: show fgnhg hash-view <vnet/vrf name> <prefix name>* | ||
| ``` | ||
| +-----------+-----------------+--------------------+----------------+ | ||
| | VNET/VRF  | FG_NHG_PREFIX   | Next Hop           | Hash buckets   | | ||
| +===========+=================+====================+================+ | ||
| ``` | ||
|
|
||
| *CLI output format: show fgnhg active-hops <vnet/vrf name> <prefix name>* | ||
| ``` | ||
| +-----------+-----------------+--------------------+ | ||
| | VNET/VRF  | FG_NHG_PREFIX   | Active Next Hops   | | ||
| +===========+=================+====================+ | ||
| ``` | ||
|
|
||
| ## 2.4 YANG Model | ||
| The following enhancements will be made to the VNET_ROUTE_TUNNEL YANG model: endpoint, mac_address and vni become comma-separated lists of string type, and consistent_hashing_buckets is added: | ||
|
|
||
| ``` | ||
| container VNET_ROUTE_TUNNEL { | ||
| description "ConfigDB VNET_ROUTE_TUNNEL table"; | ||
|
|
||
| list VNET_ROUTE_TUNNEL_LIST { | ||
| key "vnet_name prefix"; | ||
| leaf vnet_name { | ||
| description "VNET name"; | ||
| type leafref { | ||
| path "/svnet:sonic-vnet/svnet:VNET/svnet:VNET_LIST/svnet:name"; | ||
| } | ||
| } | ||
|
|
||
| leaf prefix { | ||
| description "IPv4 prefix in CIDR format"; | ||
| type stypes:sonic-ip4-prefix; | ||
| } | ||
|
|
||
| leaf endpoint { | ||
| description "Comma separated list of endpoint/next hop tunnel IPs if multiple nexthops, or a single IP address"; | ||
| type string; | ||
| mandatory true; | ||
| } | ||
| leaf mac_address { | ||
| description "Comma separated list of inner dest mac in encapsulated packet if there are multiple nexthops/endpoints, or a single mac address"; | ||
| type string; | ||
| } | ||
| leaf vni { | ||
| description "Comma separated list of VNIs if there are multiple nexthops/endpoints, or a single VNI for the route/nh"; | ||
| type string; | ||
| } | ||
| leaf consistent_hashing_buckets { | ||
| description "Number of consistent hashing buckets to use, if consistent hashing is desired"; | ||
| type uint16; | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| # 3 Programming flow | ||
| *E2E creation flow for VNET_ROUTE_TUNNEL with consistent hashing* | ||
|  | ||
|
|
||
| *E2E flow for updating tunnel endpoints list with consistent hashing* | ||
|  | ||
|
|
||
| # 4 SWSS orchagent design | ||
| 1. vnetorch receives a request to create a VNET_ROUTE_TUNNEL_TABLE entry. | ||
| 2. vnetorch checks whether consistent_hashing_buckets is set and, if so, calls fgnhgorch to create an internal FgNhgEntry with the following parameters: | ||
| 2.a FGMatchMode is PREFIX_BASED | ||
| 2.b max_next_hops = configured_bucket_size = consistent_hashing_buckets | ||
| 2.c The prefix for fine-grained behavior = prefix of the VNET_ROUTE_TUNNEL_TABLE entry | ||
| 3. vnetorch then calls fgnhgorch to create the nexthop group with consistent hashing. | ||
| 4. For subsequent next-hop changes, vnetorch continues to call fgnhgorch to handle them. | ||
| 5. When the VNET_ROUTE_TUNNEL_TABLE entry is deleted, the nexthop and the internal FgNhgEntry are deleted/cleaned up. | ||
| 6. When a VNET_ROUTE_TUNNEL_TABLE entry is modified to add "consistent_hashing_buckets" to an existing tunnel route, a transition from non-fine-grained to fine-grained ECMP must occur. When "consistent_hashing_buckets" is removed, a transition from fine-grained to non-fine-grained ECMP occurs. Both transitions result in a SAI route update with the new nexthop group/nexthop, and any leftover stale nexthop groups are deleted. | ||
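The behavior expected from the fine-grained nexthop group on endpoint changes can be modeled with a toy bucket table. This is only an illustrative sketch of consistent bucket reassignment under simple assumptions (round-robin initial fill, least-loaded replacement); it is not the fgnhgorch implementation, and all function names are hypothetical.

```python
from collections import Counter
from typing import List


def assign_buckets(num_buckets: int, nexthops: List[str]) -> List[str]:
    """Initial fill: spread buckets round-robin across the nexthops."""
    return [nexthops[i % len(nexthops)] for i in range(num_buckets)]


def remove_nexthop(buckets: List[str], removed: str) -> List[str]:
    """Reassign only the buckets owned by `removed`; all others stay put."""
    counts = Counter(nh for nh in buckets if nh != removed)
    if not counts:
        raise ValueError("cannot remove the last nexthop")
    result = list(buckets)
    for i, nh in enumerate(result):
        if nh == removed:
            # Hand the bucket to the currently least-loaded remaining nexthop.
            target = min(counts, key=lambda k: (counts[k], k))
            result[i] = target
            counts[target] += 1
    return result


def add_nexthop(buckets: List[str], added: str) -> List[str]:
    """Move about len(buckets) / (nexthops + 1) buckets to the new nexthop."""
    counts = Counter(buckets)
    share = len(buckets) // (len(counts) + 1)
    result = list(buckets)
    for _ in range(share):
        # Take one bucket from the currently most-loaded existing nexthop.
        donor = max(counts, key=lambda k: (counts[k], k))
        result[result.index(donor)] = added
        counts[donor] -= 1
    return result


if __name__ == "__main__":
    table = assign_buckets(12, ["a", "b", "c"])
    print(remove_nexthop(table, "b"))
    print(add_nexthop(table, "d"))
```

In this model a removal touches only the buckets of the withdrawn endpoint, and an addition moves roughly one endpoint's share of buckets, which is the property the test plan checks.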
|
|
||
|
|
||
| # 5 Test Plan | ||
| The following testing is planned for this feature: | ||
| - SWSS unit tests via virtual switch testing | ||
| - Data Plane tests via pytest + PTF | ||
|
|
||
|
|
||
| ## SWSS unit tests: | ||
|
||
| 1. Add VNET_ROUTE_TUNNEL_TABLE with consistent_hashing_buckets, and check that the SAI next-hop group and next-hop group member objects are created as fine grained ecmp | ||
| 2. Remove an endpoint in VNET_ROUTE_TUNNEL_TABLE, and ensure that only the next-hop group members associated with the removed endpoint are reassigned to another nexthop tunnel, and that the hash buckets are balanced | ||
| 3. Add an endpoint in VNET_ROUTE_TUNNEL_TABLE, and ensure that only about (total hash buckets / total endpoints) buckets are impacted as a result of the change | ||
| 4. Remove the consistent_hashing_buckets parameter from VNET_ROUTE_TUNNEL_TABLE, and ensure that the fine grained next-hop group is cleaned up and a regular next-hop group is created, with the route pointing to the regular next-hop group | ||
| 5. Add the consistent_hashing_buckets parameter to VNET_ROUTE_TUNNEL_TABLE, and ensure that a fine grained next-hop group is created and the original regular next-hop group is cleaned up, with the route pointing to the fine grained next-hop group | ||
|
|
||
| ## Dataplane tests: | ||
| 1. Do a base setup with VXLAN_TUNNEL, VNET, and an interface bound to the vnet | ||
| 2. Add VNET_ROUTE_TUNNEL_TABLE with consistent_hashing_buckets, with 10 endpoints | ||
| 3. Send 1000 unique flows and check that the resulting packets leaving the DUT contain varying outer dst IPs; record the flow to outer dst IP mapping | ||
| 4. Modify VNET_ROUTE_TUNNEL_TABLE to remove 1 endpoint IP, and check that the only flows impacted in the 1000 unique flow to outer dst IP mapping are the ones associated with the withdrawn endpoint | ||
| 5. Modify VNET_ROUTE_TUNNEL_TABLE to add 1 endpoint IP, and check that only a small percentage of flows, i.e. <10%, is impacted by this endpoint addition | ||
| 6. Validate that in all cases the flow distribution per endpoint is roughly equal | ||
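The checks in steps 4 and 5 amount to comparing two flow to outer dst IP mappings. A minimal sketch of that comparison follows, assuming the test has already collected both mappings as dictionaries; the helper names are hypothetical and are not part of any existing PTF or sonic-mgmt library.

```python
from typing import Dict, Hashable, Set


def impacted_flows(before: Dict[Hashable, str],
                   after: Dict[Hashable, str]) -> Set[Hashable]:
    """Flows whose outer dst IP differs between the two captures."""
    return {flow for flow, ep in before.items() if after.get(flow) != ep}


def removal_is_consistent(before: Dict[Hashable, str],
                          after: Dict[Hashable, str],
                          removed: str) -> bool:
    """Step 4: exactly the flows of the withdrawn endpoint moved."""
    expected = {flow for flow, ep in before.items() if ep == removed}
    return impacted_flows(before, after) == expected


def impacted_fraction(before: Dict[Hashable, str],
                      after: Dict[Hashable, str]) -> float:
    """Step 5: fraction of flows that changed endpoint."""
    return len(impacted_flows(before, after)) / len(before)


if __name__ == "__main__":
    before = {1: "10.0.0.1", 2: "10.0.0.2", 3: "10.0.0.1"}
    after = {1: "10.0.0.3", 2: "10.0.0.2", 3: "10.0.0.2"}
    print(removal_is_consistent(before, after, "10.0.0.1"))
    print(impacted_fraction(before, after))
```

For step 5 the test would compare `impacted_fraction` against the 10% threshold stated in the plan.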