Description
Expected Behavior
Current Behavior
The current logic infers the Service IP family from its primary ClusterIP:
svcIsIPv6 := isIPv6(svc.Spec.ClusterIP)
In a dual-stack Service:
- if ipFamilies: [IPv6, IPv4], the ClusterIP will be IPv6
- if ipFamilies: [IPv4, IPv6], the ClusterIP will be IPv4
Then endpoint validation is done as:
if isIPv6(address) != svcIsIPv6 {
	continue
}
This is where the mismatch happens:
- Service has ipFamilies: [IPv6, IPv4] -> ClusterIP is IPv6 -> svcIsIPv6 = true.
- The IPv6 EndpointSlice update event arrives first, and all IPv6 endpoints match.
  2025-12-04 12:09:35.495 [DEBUG][61] Checking routes for service advertise=true svc="default/nginx-lb"
  2025-12-04 12:09:35.495 [DEBUG][61] Setting routes routes=["2001:db8:20::1/128", "203.0.113.81/32", "2001:db8:900::1/128", "203.0.113.16/32"]
- The IPv4 EndpointSlice update event arrives afterwards.
- All IPv4 endpoints are skipped because isIPv6(address) != svcIsIPv6.
  2025-12-04 12:09:35.504 [DEBUG][61] Skipping service with no local endpoints svc="default/nginx-lb"
  2025-12-04 12:09:35.504 [DEBUG][61] Checking routes for service advertise=false svc="default/nginx-lb"
- The function returns false and the service is removed from advertisement.
If the order of the update events stays consistent (e.g., the IPv6 EndpointSlice always updates first), the Service may never be advertised at all: the IPv4 EndpointSlice is always evaluated last, against the family inferred from the IPv6 ClusterIP, and all of its endpoints are skipped because it never contains an IPv6 address.
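The sequence above can be reproduced in isolation with a minimal sketch. This is not the Calico code: the `endpointSlice` type, the `hasMatchingEndpoint` function, and the endpoint addresses are hypothetical stand-ins, and `isIPv6` is reimplemented here with `net/netip` on the assumption that it behaves like the helper quoted above. The ClusterIP is assumed to be the IPv6 address seen in the route log.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isIPv6 reports whether s parses as an IPv6 address. Hypothetical
// stand-in for the helper of the same name quoted in this issue.
func isIPv6(s string) bool {
	addr, err := netip.ParseAddr(s)
	return err == nil && addr.Is6() && !addr.Is4In6()
}

// endpointSlice is a simplified, hypothetical model of an EndpointSlice.
type endpointSlice struct {
	addressType string
	addresses   []string
}

// hasMatchingEndpoint mirrors the check described above: the family is
// inferred from the primary ClusterIP only, and every endpoint address of
// the other family is skipped.
func hasMatchingEndpoint(clusterIP string, slice endpointSlice) bool {
	svcIsIPv6 := isIPv6(clusterIP)
	for _, address := range slice.addresses {
		if isIPv6(address) != svcIsIPv6 {
			continue
		}
		return true
	}
	return false
}

func main() {
	// Assumed primary ClusterIP for ipFamilies: [IPv6, IPv4].
	clusterIP := "2001:db8:900::1"

	// Made-up pod addresses, one slice per family.
	v6 := endpointSlice{addressType: "IPv6", addresses: []string{"2001:db8:1::10"}}
	v4 := endpointSlice{addressType: "IPv4", addresses: []string{"10.0.0.10"}}

	fmt.Println("IPv6 slice:", hasMatchingEndpoint(clusterIP, v6)) // true
	fmt.Println("IPv4 slice:", hasMatchingEndpoint(clusterIP, v4)) // false
}
```

Whichever slice is evaluated last decides the result, so an IPv4 update arriving after the IPv6 one flips the Service to "no local endpoints".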
Possible Solution
A possible fix using Service.spec.ipFamilies and EndpointSlice.addressType is provided in PR #11503
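To illustrate the idea (this is a rough sketch, not the code in the PR), the family check could compare each slice's addressType against the families the Service declares, rather than against the primary ClusterIP. The `service` and `endpointSlice` types and the `sliceHasUsableEndpoints` function are hypothetical simplifications, and the node-local endpoint filtering needed for externalTrafficPolicy: Local is left out.

```go
package main

import "fmt"

// service is a simplified, hypothetical stand-in for a Kubernetes Service.
type service struct {
	ipFamilies []string // mirrors Service.spec.ipFamilies, e.g. ["IPv6", "IPv4"]
}

// endpointSlice is a simplified, hypothetical stand-in for an EndpointSlice.
type endpointSlice struct {
	addressType string // mirrors EndpointSlice.addressType: "IPv4", "IPv6" or "FQDN"
	addresses   []string
}

// sliceHasUsableEndpoints accepts a slice when its addressType is one of
// the families declared by the Service and it carries at least one address.
// The primary ClusterIP plays no role in the decision.
func sliceHasUsableEndpoints(svc service, slice endpointSlice) bool {
	for _, family := range svc.ipFamilies {
		if family == slice.addressType {
			return len(slice.addresses) > 0
		}
	}
	return false
}

func main() {
	dualStack := service{ipFamilies: []string{"IPv6", "IPv4"}}
	v4Only := service{ipFamilies: []string{"IPv4"}}

	// Made-up pod addresses, one slice per family.
	v6 := endpointSlice{addressType: "IPv6", addresses: []string{"2001:db8:1::10"}}
	v4 := endpointSlice{addressType: "IPv4", addresses: []string{"10.0.0.10"}}

	fmt.Println(sliceHasUsableEndpoints(dualStack, v6)) // true
	fmt.Println(sliceHasUsableEndpoints(dualStack, v4)) // true
	fmt.Println(sliceHasUsableEndpoints(v4Only, v6))    // false
}
```

With this shape, the result for a dual-stack Service no longer depends on which EndpointSlice update arrives last.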
Steps to Reproduce (for bugs)
- Create a dual-stack Kubernetes cluster
- Create a Deployment with two replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:stable
- Deploy a Service with ipFamilies: [IPv6, IPv4]
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  namespace: default
  annotations:
    "projectcalico.org/loadBalancerIPs": '["2001:db8:20::1","203.0.113.81"]'
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ipFamilies: [IPv6, IPv4]
  selector:
    app: nginx
  ports:
  - port: 80
- Trigger rollouts to reproduce the race condition between the two EndpointSlices
kubectl rollout restart deployment/nginx
Context
We observed this behavior in a production environment, where some dual-stack LoadBalancer Services with externalTrafficPolicy: Local intermittently lost their local route advertisements. This prompted an investigation that led us to identify a race in the Service IP family matching logic.
Your Environment
- Calico version: v3.31.2
- Calico dataplane (bpf, nftables, iptables, windows etc.): bpf
- Orchestrator version (e.g. kubernetes, openshift, etc.): v1.33.6
- Operating System and version: Ubuntu 24.04.4
- Link to your project (optional):