BGP multipath allows you to install multiple internal BGP paths and multiple external BGP paths to the forwarding table. Selecting multiple paths enables BGP to load-balance traffic across multiple links. Equal-cost multipath (ECMP) is a network routing strategy that allows for traffic of the same session, or flow—that is, traffic with the same source and destination—to be transmitted across multiple paths of equal cost.
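To make the load-balancing idea concrete, here is a minimal sketch of how a router might map a flow to one of several equal-cost next hops. It assumes a hash over the usual 5-tuple. The function name `pick_next_hop` and the client addresses are hypothetical, and real forwarding hardware uses its own hash function and field selection, so treat this only as an illustration of the principle.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Map a flow (src_ip, dst_ip, protocol, src_port, dst_port) to one next hop.

    Illustrative assumption: the router hashes the 5-tuple and takes the
    result modulo the number of equal-cost next hops.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    # Build a stable byte string from the flow fields.
    key = "|".join(str(field) for field in flow).encode()
    # hashlib gives the same result on every run (the built-in hash() is salted).
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Next hops taken from the leaf2 route table shown later in this post.
next_hops = ["6.4.5.21", "6.4.5.30", "6.4.5.32"]

# Hypothetical clients talking to the MetalLB service IP on TCP port 80.
flows = [
    ("198.51.100.10", "10.254.254.241", 6, 40001, 80),
    ("198.51.100.10", "10.254.254.241", 6, 40002, 80),
    ("198.51.100.11", "10.254.254.241", 6, 40001, 80),
]

for flow in flows:
    print(flow, "->", pick_next_hop(flow, next_hops))
```

Packets of one flow always produce the same hash, so they stay on one path and are not reordered, while different flows are spread over the available next hops.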
In this post we show how we enabled ECMP for a MetalLB LoadBalancer service used by some applications, and what we configured on the SRL leaf routers to make it work. The lab setup is described in “CALICO AND METALLB WORKING TOGETHER WITH BGP”.
The following picture shows the topology of the lab: MetalLB and ECMP with SRLinux
Nokia SRLinux ECMP settings
These are the settings I am using on the SRLinux leaf router connected to all k8s nodes:
--{ + running }--[ network-instance ip-vrf1 protocols ]--
A:leaf2# info
    bgp-evpn {
        bgp-instance 1 {
            admin-state enable
            vxlan-interface vxlan1.4
            evi 4
            ecmp 4
        }
    }
    bgp {
        admin-state enable
        autonomous-system 65320
        router-id 6.5.3.2
        dynamic-neighbors {
            accept {
                match 6.4.5.0/26 {
                    peer-group metallb-bgp
                    allowed-peer-as [ 65201 ]
                }
                match 192.168.101.0/24 {
                    peer-group calico-bgp
                    allowed-peer-as [ 64512 ]
                }
            }
        }
        ebgp-default-policy {
            import-reject-all false
            export-reject-all false
        }
        group calico-bgp {
            admin-state enable
            export-policy export-calico
            import-policy import-all
            timers {
                minimum-advertisement-interval 1
            }
            transport {
                local-address 6.5.3.2
            }
        }
        group metallb-bgp {
            admin-state enable
            export-policy export-all
            import-policy import-all
            timers {
                minimum-advertisement-interval 1
            }
            transport {
                local-address 6.5.3.2
            }
        }
        ipv4-unicast {
            multipath {
                allow-multiple-as true
                max-paths-level-1 64
                max-paths-level-2 64
            }
        }
    }
    bgp-vpn {
        bgp-instance 1 {
            route-target {
                export-rt target:65123:4
                import-rt target:65123:4
            }
        }
    }
As you can see, the k8s nodes are connected via an EVPN Layer 2 domain. In order to use ECMP, you have to activate multipath in BGP as follows:
    ipv4-unicast {
        multipath {
            allow-multiple-as true
            max-paths-level-1 64
            max-paths-level-2 64
        }
    }
For MetalLB and ECMP with SRLinux you can use a value lower than 64. I am using 64, the maximum, only for testing purposes.
I also recommend enabling ecmp under bgp-evpn if you are working with EVPN.
Kubernetes MetalLB service settings
Since the leaf routers already balance the traffic with BGP and ECMP, we do not want kube-proxy to balance it again across nodes. When announcing over BGP, MetalLB respects the service’s externalTrafficPolicy option, and implements two different announcement modes depending on what policy you select. If you’re familiar with Google Cloud’s Kubernetes load balancers, then you know what we are talking about here: MetalLB’s behaviors and tradeoffs are identical.
“Local” traffic policy
With the Local traffic policy, nodes will only attract traffic if they are running one or more of the service’s pods locally. The BGP routers will load balance incoming traffic only across those nodes that are currently hosting the service. On each node, kube-proxy forwards the traffic only to local pods, so there is no “horizontal” traffic flow between nodes.
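The effect of the traffic policy on BGP announcements can be sketched in a few lines. This is only a model of the behaviour described above, not MetalLB code: the function `announcing_nodes`, the node names and the pod counts are all hypothetical. For comparison it also includes the default Cluster policy, under which every node attracts traffic for the service IP.

```python
def announcing_nodes(pods_per_node, policy):
    """Return the nodes that would announce the service IP over BGP.

    pods_per_node maps a node name to the number of service pods it runs.
    policy is the Service's externalTrafficPolicy: "Cluster" or "Local".
    """
    if policy == "Cluster":
        # Every node attracts traffic, whether or not it hosts a pod.
        return sorted(pods_per_node)
    if policy == "Local":
        # Only nodes running at least one pod of the service attract traffic.
        return sorted(node for node, count in pods_per_node.items() if count > 0)
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")


# Hypothetical placement of ingress controller pods on four worker nodes.
placement = {"worker-1": 1, "worker-2": 0, "worker-3": 2, "worker-4": 0}

print(announcing_nodes(placement, "Local"))    # ['worker-1', 'worker-3']
print(announcing_nodes(placement, "Cluster"))  # all four workers
```

With Local, the leaf routers only see next hops for nodes that can serve the request themselves, which is why the number of ECMP paths follows the number of nodes hosting the pods.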
Based on this, we define the following LoadBalancer service for the ingress controller we set up in our last post, Ingress and MetalLB:
apiVersion: v1
kind: Service
metadata:
  name: ingress-ctl-lb
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  # externalTrafficPolicy is a field of the Service spec, not an annotation.
  # Valid values are Cluster and Local (case-sensitive).
  externalTrafficPolicy: Local
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
Once this service is created, you will see a service like this:
[root@ctl-a1 ~]# kubectl get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
ingress-ctl-lb                       LoadBalancer   10.111.14.140   10.254.254.241   80:31541/TCP                 21h
ingress-nginx-controller             NodePort       10.96.20.11     <none>           80:31053/TCP,443:31764/TCP   22h
ingress-nginx-controller-admission   ClusterIP      10.100.46.67    <none>           443/TCP                      13d
Final results
On leaf router ‘leaf1’ you will see the following routes received from the k8s nodes and from the other leaf nodes:
A:leaf1# /show network-instance ip-vrf1 protocols bgp routes ipv4 prefix 10.254.254.241/32 detail
--------------------------------------------------------------------------------
Show report for the BGP routes to network "10.254.254.241/32" network-instance "ip-vrf1"
--------------------------------------------------------------------------------
Network: 10.254.254.241/32
Received Paths: 4
  Path 1: <Best,Valid,Used,>
    Route source    : neighbor 0.0.0.0
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 0.0.0.0
    Path            : ?
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: none
  Path 2: <Valid,>
    Route source    : neighbor 6.4.5.20
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.20
    Path            : ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: as-path-length
  Path 3: <Valid,>
    Route source    : neighbor 6.4.5.22
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.22
    Path            : ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id
  Path 4: <Valid,>
    Route source    : neighbor 6.4.5.31
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.31
    Path            : ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id
Path 1 was advertised to:
[ 6.4.5.20, 6.4.5.22, 6.4.5.31 ]
Route Preference: MED is -, LocalPref is 100
Path            : ? [65310]
Communities     : None
RR Attributes   : No Originator-ID, Cluster-List is [ - ]
Aggregation     : Not an aggregate route
Unknown Attr    : None
--------------------------------------------------------------------------------
--{ + running }--[ ]--
On leaf router ‘leaf2’ you will see the following routes received from the k8s nodes and from the other leaf nodes:
A:leaf2# /show network-instance ip-vrf1 protocols bgp routes ipv4 prefix 10.254.254.241/32 detail
--------------------------------------------------------------------------------
Show report for the BGP routes to network "10.254.254.241/32" network-instance "ip-vrf1"
--------------------------------------------------------------------------------
Network: 10.254.254.241/32
Received Paths: 3
  Path 1: <Best,Valid,Used,>
    Route source    : neighbor 6.4.5.21
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.21
    Path            : ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: none
  Path 2: <Best,Valid,Used,>
    Route source    : neighbor 6.4.5.30
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.30
    Path            : ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id
  Path 3: <Best,Valid,Used,>
    Route source    : neighbor 6.4.5.32
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.32
    Path            : ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id
Path 3 was advertised to:
[ 6.4.5.30, 6.4.5.32 ]
Route Preference: MED is -, LocalPref is 100
Path            : ? [65320, 65201]
Communities     : None
RR Attributes   : No Originator-ID, Cluster-List is [ - ]
Aggregation     : Not an aggregate route
Unknown Attr    : None
--------------------------------------------------------------------------------
--{ + running }--[ ]--
Then, the route in leaf1 for this prefix will be:
A:leaf1# /show network-instance ip-vrf1 route-table ipv4-unicast prefix 10.254.254.241/32 detail
--------------------------------------------------------------------------------
IPv4 Unicast route table of network instance ip-vrf1
--------------------------------------------------------------------------------
Destination   : 10.254.254.241/32
ID            : 0
Route Type    : bgp-evpn
Route Owner   : bgp_evpn_mgr
Metric        : 0
Preference    : 170
Best          : true
Last change   : 2021-09-28T18:05:23.902Z
Resilient hash: false
--------------------------------------------------------------------------------
Next hops: 1 entries
1.1.1.2 (indirect) resolved by 1.1.1.2/32 (vxlan)
--------------------------------------------------------------------------------
And the routes in leaf2 for the same prefix:
A:leaf2# /show network-instance ip-vrf1 route-table ipv4-unicast prefix 10.254.254.241/32 detail
--------------------------------------------------------------------------------
IPv4 Unicast route table of network instance ip-vrf1
--------------------------------------------------------------------------------
Destination   : 10.254.254.241/32
ID            : 0
Route Type    : bgp
Route Owner   : bgp_mgr
Metric        : 0
Preference    : 170
Best          : true
Last change   : 2021-09-28T18:05:21.922Z
Resilient hash: false
--------------------------------------------------------------------------------
Next hops: 3 entries
6.4.5.21 (indirect) resolved by 6.4.5.21/32 (static)
  via 192.168.101.21 (indirect) resolved by 192.168.101.0/24 (local)
    via 192.168.101.1 (direct) via [irb0.0]
6.4.5.30 (indirect) resolved by 6.4.5.30/32 (static)
  via 192.168.101.30 (indirect) resolved by 192.168.101.0/24 (local)
    via 192.168.101.1 (direct) via [irb0.0]
6.4.5.32 (indirect) resolved by 6.4.5.32/32 (static)
  via 192.168.101.32 (indirect) resolved by 192.168.101.0/24 (local)
    via 192.168.101.1 (direct) via [irb0.0]
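As you can see, we have multiple routes to different nodes in the Kubernetes cluster. ‘leaf1’ installs only the EVPN route towards ‘leaf2’ (next hop 1.1.1.2 over VXLAN): the paths it learns directly from its own k8s nodes are valid but lose the tie-break on AS path length. ‘leaf2’, in turn, installs three next hops and balances the traffic across its k8s nodes.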
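The difference between the two leaves can be reproduced with a deliberately simplified model of BGP multipath selection. The sketch below compares only LocalPref and AS path length, which are the two attributes visible in the outputs above. A real BGP implementation evaluates more steps, and the names `Path` and `select_multipath` are hypothetical. The path data is copied from the `show` outputs of leaf1 and leaf2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    neighbor: str
    local_pref: int
    as_path: tuple


def select_multipath(paths, max_paths):
    """Return the paths that tie with the best one, capped at max_paths.

    Simplified assumption: higher LocalPref wins, then the shorter AS path.
    """
    if not paths:
        return []

    def rank(path):
        # Lower tuple is better: negate LocalPref so that higher values win.
        return (-path.local_pref, len(path.as_path))

    best = min(rank(p) for p in paths)
    equal_cost = [p for p in paths if rank(p) == best]
    return equal_cost[:max_paths]


# leaf1: one path with an empty AS path (neighbor 0.0.0.0) plus three from k8s nodes.
leaf1_paths = [
    Path("0.0.0.0", 100, ()),
    Path("6.4.5.20", 100, (65201,)),
    Path("6.4.5.22", 100, (65201,)),
    Path("6.4.5.31", 100, (65201,)),
]

# leaf2: three paths from k8s nodes, all with the same AS path length.
leaf2_paths = [
    Path("6.4.5.21", 100, (65201,)),
    Path("6.4.5.30", 100, (65201,)),
    Path("6.4.5.32", 100, (65201,)),
]

leaf1_used = [p.neighbor for p in select_multipath(leaf1_paths, 64)]
leaf2_used = [p.neighbor for p in select_multipath(leaf2_paths, 64)]

print(leaf1_used)  # ['0.0.0.0']
print(leaf2_used)  # ['6.4.5.21', '6.4.5.30', '6.4.5.32']
```

In this model a lower `max_paths` value simply truncates the list of equal paths, which is why a value under 64 is enough as long as it covers the number of k8s nodes behind a leaf.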
See ya!