MetalLB and ECMP with SRLinux


BGP multipath allows you to install multiple internal BGP paths and multiple external BGP paths in the forwarding table. Selecting multiple paths enables BGP to load-balance traffic across multiple links. Equal-cost multipath (ECMP) is a network routing strategy that spreads traffic toward the same destination across multiple paths of equal cost. Routers usually pick the path per flow (by hashing fields such as source and destination addresses and ports), so the packets of a single flow stay on one path.
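To make the per-flow idea concrete, here is a minimal sketch in Python of how a hash over the flow 5-tuple can select one of several equal-cost next hops. It is only an illustration: real routers use their own hardware hash functions, the client address 10.0.0.7 and the port numbers are made up, and the next hops are simply the k8s node addresses that appear later in this post.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Map a flow 5-tuple to one of the equal-cost next hops."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer and fold it
    # onto the number of available paths.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Next hops taken from the leaf2 output in this post.
next_hops = ["6.4.5.21", "6.4.5.30", "6.4.5.32"]

# Hypothetical client flow: (src IP, src port, dst IP, dst port, protocol).
flow = ("10.0.0.7", 40112, "10.254.254.241", 80, "tcp")
first = pick_next_hop(flow, next_hops)

# The same flow always maps to the same next hop.
assert first == pick_next_hop(flow, next_hops)

# Many flows (different source ports) spread over the available paths.
used = {
    pick_next_hop(("10.0.0.7", port, "10.254.254.241", 80, "tcp"), next_hops)
    for port in range(40000, 40200)
}
print(sorted(used))
```

Because the choice depends only on the flow fields, packets of one TCP connection are not reordered across paths, while different connections are distributed over all the next hops.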

We’ll show how we enabled ECMP for a MetalLB LoadBalancer service used by some applications, and what we configured on the SRL leaf routers to make it work. The lab setup is described in “CALICO AND METALLB WORKING TOGETHER WITH BGP”.

The following picture shows the topology of the lab: MetalLB and ECMP with SRLinux

 

Nokia SRLinux ECMP settings

These are the settings I am using on the SR Linux leaf router connected to all k8s nodes:

--{ + running }--[ network-instance ip-vrf1 protocols ]--
A:leaf2# info
    bgp-evpn {
        bgp-instance 1 {
            admin-state enable
            vxlan-interface vxlan1.4
            evi 4
            ecmp 4
        }
    }
    bgp {
        admin-state enable
        autonomous-system 65320
        router-id 6.5.3.2
        dynamic-neighbors {
            accept {
                match 6.4.5.0/26 {
                    peer-group metallb-bgp
                    allowed-peer-as [
                        65201
                    ]
                }
                match 192.168.101.0/24 {
                    peer-group calico-bgp
                    allowed-peer-as [
                        64512
                    ]
                }
            }
        }
        ebgp-default-policy {
            import-reject-all false
            export-reject-all false
        }
        group calico-bgp {
            admin-state enable
            export-policy export-calico
            import-policy import-all
            timers {
                minimum-advertisement-interval 1
            }
            transport {
                local-address 6.5.3.2
            }
        }
        group metallb-bgp {
            admin-state enable
            export-policy export-all
            import-policy import-all
            timers {
                minimum-advertisement-interval 1
            }
            transport {
                local-address 6.5.3.2
            }
        }
        ipv4-unicast {
            multipath {
                allow-multiple-as true
                max-paths-level-1 64
                max-paths-level-2 64
            }
        }
    }
    bgp-vpn {
        bgp-instance 1 {
            route-target {
                export-rt target:65123:4
                import-rt target:65123:4
            }
        }
    }

As you can see, the k8s nodes are connected via an EVPN Layer 2 domain. In order to use ECMP, you have to activate multipath in BGP as follows:

ipv4-unicast {
    multipath {
        allow-multiple-as true
        max-paths-level-1 64
        max-paths-level-2 64
    }
}

For MetalLB and ECMP with SRLinux, you can use a value lower than 64. Here I am using the maximum value for testing purposes.

Also, I recommend enabling ecmp in bgp-evpn if you are working with EVPN.
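As a rough mental model of what these knobs do, the sketch below builds a multipath set from a best path and its candidates. It is a simplification and not SR Linux's actual algorithm: it assumes the paths already tie on the usual BGP attributes, and it only models the max-paths cap and the neighbor-AS check that I understand allow-multiple-as to relax. The Path class, the function name, and the 192.0.2.1 / AS 65999 peer are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: tuple  # AS numbers, nearest neighbor AS first

def multipath_set(best, candidates, max_paths, allow_multiple_as):
    """Return the next hops to install: the best path plus eligible ties."""
    chosen = [best]
    for path in candidates:
        if len(chosen) >= max_paths:
            break  # the max-paths cap has been reached
        if path == best or len(path.as_path) != len(best.as_path):
            continue  # skip the best path itself and longer/shorter AS paths
        same_neighbor_as = path.as_path[:1] == best.as_path[:1]
        if same_neighbor_as or allow_multiple_as:
            chosen.append(path)
    return [p.next_hop for p in chosen]

# The three MetalLB speakers seen by leaf2, all in AS 65201.
paths = [
    Path("6.4.5.21", (65201,)),
    Path("6.4.5.30", (65201,)),
    Path("6.4.5.32", (65201,)),
]

print(multipath_set(paths[0], paths, max_paths=64, allow_multiple_as=True))
print(multipath_set(paths[0], paths, max_paths=2, allow_multiple_as=True))
```

With a cap of 64 all three next hops are installed, while a cap of 2 keeps only the first two. In this lab every MetalLB peer uses the same AS, so any cap of at least the number of announcing nodes should behave the same.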

Kubernetes MetalLB service settings

Since we are using BGP with ECMP, we want to avoid the extra hop that kube-proxy adds when it forwards traffic to pods on other nodes. When announcing over BGP, MetalLB respects the service’s externalTrafficPolicy option, and implements two different announcement modes depending on what policy you select. If you’re familiar with Google Cloud’s Kubernetes load balancers, then you know what we are talking about here: MetalLB’s behaviors and tradeoffs are identical.

“Local” traffic policy

With the Local traffic policy, nodes will only attract traffic if they are running one or more of the service’s pods locally. The BGP routers will load balance incoming traffic only across those nodes that are currently hosting the service. On each node, kube-proxy forwards the traffic only to local pods, so there is no “horizontal” traffic flow between nodes.
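The difference between the two announcement modes can be sketched in a few lines of Python. This is only a model of the behavior described above, not MetalLB code; the function name, node names, and pod counts are made up for illustration.

```python
def announcing_nodes(policy, all_nodes, pods_by_node):
    """Return the nodes that advertise the service IP over BGP."""
    if policy == "Cluster":
        # Every node attracts traffic for the service IP.
        return sorted(all_nodes)
    if policy == "Local":
        # Only nodes running at least one pod of the service announce it.
        return sorted(n for n in all_nodes if pods_by_node.get(n, 0) > 0)
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")

# Hypothetical cluster: four nodes, service pods on two of them.
nodes = ["node-a", "node-b", "node-c", "node-d"]
pods = {"node-a": 1, "node-c": 2}

print(announcing_nodes("Cluster", nodes, pods))
print(announcing_nodes("Local", nodes, pods))
```

With Local, the leaf routers only see as many ECMP next hops as there are nodes hosting the service, so the number of paths changes as pods are scheduled or removed.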

Based on this, we’ll define the following LoadBalancer service for the ingress controller we set up in our last post, “Ingress and MetalLB”:

apiVersion: v1
kind: Service
metadata:
  name: ingress-ctl-lb
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  # externalTrafficPolicy is a spec field (Cluster or Local), not an annotation
  externalTrafficPolicy: Local
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller

Once this service is created, it will look like this:

[root@ctl-a1 ~]# kubectl get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
ingress-ctl-lb                       LoadBalancer   10.111.14.140   10.254.254.241   80:31541/TCP                 21h
ingress-nginx-controller             NodePort       10.96.20.11     <none>           80:31053/TCP,443:31764/TCP   22h
ingress-nginx-controller-admission   ClusterIP      10.100.46.67    <none>           443/TCP                      13d

Final results

On leaf router ‘leaf1’ you will see the following routes received from the k8s nodes and the other leaf nodes:

A:leaf1# /show network-instance ip-vrf1 protocols bgp routes ipv4 prefix 10.254.254.241/32 detail
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Show report for the BGP routes to network "10.254.254.241/32" network-instance  "ip-vrf1"
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Network: 10.254.254.241/32
Received Paths: 4
  Path 1: <Best,Valid,Used,>
    Route source    : neighbor 0.0.0.0
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 0.0.0.0
    Path            :  ?
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: none
  Path 2: <Valid,>
    Route source    : neighbor 6.4.5.20
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.20
    Path            :  ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: as-path-length
  Path 3: <Valid,>
    Route source    : neighbor 6.4.5.22
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.22
    Path            :  ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id
  Path 4: <Valid,>
    Route source    : neighbor 6.4.5.31
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.31
    Path            :  ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id

Path 1 was advertised to:
[ 6.4.5.20, 6.4.5.22, 6.4.5.31 ]
Route Preference: MED is -, LocalPref is 100
Path            :  ? [65310]
Communities     : None
RR Attributes   : No Originator-ID, Cluster-List is [ - ]
Aggregation     : Not an aggregate route
Unknown Attr    : None
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--{ + running }--[  ]--

On leaf router ‘leaf2’ you will see the following routes received from the k8s nodes and the other leaf nodes:

A:leaf2# /show network-instance ip-vrf1 protocols bgp routes ipv4 prefix 10.254.254.241/32 detail
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Show report for the BGP routes to network "10.254.254.241/32" network-instance  "ip-vrf1"
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Network: 10.254.254.241/32
Received Paths: 3
  Path 1: <Best,Valid,Used,>
    Route source    : neighbor 6.4.5.21
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.21
    Path            :  ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: none
  Path 2: <Best,Valid,Used,>
    Route source    : neighbor 6.4.5.30
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.30
    Path            :  ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id
  Path 3: <Best,Valid,Used,>
    Route source    : neighbor 6.4.5.32
    Route Preference: MED is -, LocalPref is 100
    BGP next-hop    : 6.4.5.32
    Path            :  ? [65201]
    Communities     : None
    RR Attributes   : No Originator-ID, Cluster-List is [ - ]
    Aggregation     : Not an aggregate route
    Unknown Attr    : None
    Invalid Reason  : None
    Tie Break Reason: peer-router-id

Path 3 was advertised to:
[ 6.4.5.30, 6.4.5.32 ]
Route Preference: MED is -, LocalPref is 100
Path            :  ? [65320, 65201]
Communities     : None
RR Attributes   : No Originator-ID, Cluster-List is [ - ]
Aggregation     : Not an aggregate route
Unknown Attr    : None
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--{ + running }--[  ]--

Then, the route on leaf1 for this prefix will be:

A:leaf1# /show network-instance ip-vrf1 route-table ipv4-unicast prefix 10.254.254.241/32 detail
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IPv4 Unicast route table of network instance ip-vrf1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Destination   : 10.254.254.241/32
ID            : 0
Route Type    : bgp-evpn
Route Owner   : bgp_evpn_mgr
Metric        : 0
Preference    : 170
Best          : true
Last change   : 2021-09-28T18:05:23.902Z
Resilient hash: false
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Next hops: 1 entries
1.1.1.2 (indirect) resolved by 1.1.1.2/32 (vxlan)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

And the route on leaf2 for the same prefix:

A:leaf2# /show network-instance ip-vrf1 route-table ipv4-unicast prefix 10.254.254.241/32 detail
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
IPv4 Unicast route table of network instance ip-vrf1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Destination   : 10.254.254.241/32
ID            : 0
Route Type    : bgp
Route Owner   : bgp_mgr
Metric        : 0
Preference    : 170
Best          : true
Last change   : 2021-09-28T18:05:21.922Z
Resilient hash: false
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Next hops: 3 entries
6.4.5.21 (indirect) resolved by 6.4.5.21/32 (static)
  via 192.168.101.21 (indirect) resolved by 192.168.101.0/24 (local)
  via 192.168.101.1 (direct) via [irb0.0]
6.4.5.30 (indirect) resolved by 6.4.5.30/32 (static)
  via 192.168.101.30 (indirect) resolved by 192.168.101.0/24 (local)
  via 192.168.101.1 (direct) via [irb0.0]
6.4.5.32 (indirect) resolved by 6.4.5.32/32 (static)
  via 192.168.101.32 (indirect) resolved by 192.168.101.0/24 (local)
  via 192.168.101.1 (direct) via [irb0.0]

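As you can see, leaf2 has multiple next hops to different nodes in the Kubernetes cluster. leaf1, in contrast, installs a single route toward ‘leaf2’ over VXLAN and leaves ‘leaf2’ to manage the routes to the different k8s nodes. The paths leaf1 receives directly from its own MetalLB peers are valid but not used; the BGP output shows the first of them losing the tie-break on as-path-length.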
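If you want to check the number of ECMP next hops from a script rather than by eye, one option is to parse the text of the route-table output. The sketch below is a hypothetical helper of my own (not an SR Linux tool), and it assumes the output layout shown above, where each top-level next hop starts at the beginning of a line and the resolution steps are indented.

```python
import re

# A top-level next hop line starts with an IPv4 address followed by
# "(indirect)" or "(direct)"; indented "via ..." lines are skipped.
NEXT_HOP = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}) \((?:in)?direct\)")

def top_level_next_hops(show_output):
    """Extract the top-level next hops listed after the 'Next hops:' line."""
    hops = []
    in_section = False
    for line in show_output.splitlines():
        if line.startswith("Next hops:"):
            in_section = True
            continue
        if in_section:
            match = NEXT_HOP.match(line)
            if match:
                hops.append(match.group(1))
    return hops

# Excerpt copied from the leaf2 output in this post.
sample = """\
Next hops: 3 entries
6.4.5.21 (indirect) resolved by 6.4.5.21/32 (static)
  via 192.168.101.21 (indirect) resolved by 192.168.101.0/24 (local)
  via 192.168.101.1 (direct) via [irb0.0]
6.4.5.30 (indirect) resolved by 6.4.5.30/32 (static)
  via 192.168.101.30 (indirect) resolved by 192.168.101.0/24 (local)
  via 192.168.101.1 (direct) via [irb0.0]
6.4.5.32 (indirect) resolved by 6.4.5.32/32 (static)
  via 192.168.101.32 (indirect) resolved by 192.168.101.0/24 (local)
  via 192.168.101.1 (direct) via [irb0.0]
"""

hops = top_level_next_hops(sample)
print(len(hops), hops)
```

Running the same helper on the leaf1 output would return only the single VXLAN next hop, which matches what we saw above.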

See ya!

 

 

 

 
