How to Enable, Disable, and Verify LACP on AHV hosts
Description
This article explains how to configure, enable, disable, and verify Link Aggregation Control Protocol (LACP) on AHV hosts, and covers the following:
- Benefits/Advantages of LACP
- Recommended Switch Configuration
- Solution
- Configuring, verifying, and disabling LACP
For more information, see AHV Networking Best Practices Guide in Nutanix Portal for complete documentation on configuring networking for AHV hosts.
Benefits/Advantages of LACP
- A single user VM with multiple TCP streams can use up to 20 Gbps of bandwidth on an AHV node with two 10 Gb adapters.
- A traffic-hashing algorithm such as balance-tcp can split traffic between multiple links in an active-active fashion. Because the uplinks appear as a single L2 link, the algorithm can balance traffic among bond members without regard for switch MAC address tables.
- With LACP, multiple links to separate physical switches appear as a single layer-2 link.
Note: To use multiple upstream switches, you must configure MLAG or vPC on the physical switch.
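As an illustration of how a hashing policy such as balance-tcp spreads flows, the following sketch (an illustration only, not the actual OVS algorithm) hashes each flow's address/port tuple to pick one bond member, so separate TCP streams from the same VM can land on different uplinks:

```python
import hashlib

def pick_uplink(flow, uplinks):
    """Pick a bond member for a flow by hashing its L2-L4 tuple.

    'flow' is a tuple such as (src_mac, dst_mac, src_ip, dst_ip,
    src_port, dst_port). OVS uses its own internal hash; this is
    only a sketch of the idea.
    """
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return uplinks[digest[0] % len(uplinks)]

uplinks = ["eth2", "eth3"]
# Two TCP streams from the same VM differ only in source port, yet
# they may hash to different uplinks.
flow_a = ("0c:c4:7a:1e:3e:6e", "00:2b:21:45:2d:12",
          "10.0.0.5", "10.0.0.9", 51000, 443)
flow_b = ("0c:c4:7a:1e:3e:6e", "00:2b:21:45:2d:12",
          "10.0.0.5", "10.0.0.9", 51001, 443)
```

The same flow always hashes to the same uplink, which keeps packets of one TCP stream in order while different streams are balanced across the bond.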
Recommended Switch Configuration
It is recommended to enable LACP fallback on the switch used to connect the AHV nodes. Sample commands can be found below. For other switch vendors, refer to their product manuals.
Arista:
port-channel lacp fallback individual
Cisco Nexus:
no lacp suspend-individual
Note: Fallback mode is disabled by default.
Cisco Catalyst:
no port-channel standalone-disable
Note: Fallback mode is disabled by default.
Juniper QFX:
force-up
Note: See Nutanix KB-15541 for specific considerations with Juniper switches.
Solution
AOS 5.19 or newer
It is possible to enable LACP using Prism Element (PE) or Prism Central (PC) UI. Refer to the About Virtual Switch chapter of the AHV Admin Guide for more information about Virtual switches. Refer to Creating or Updating a Virtual Switch chapter for information on how to manage Virtual switches.
A virtual switch may fail to deploy properly due to inconsistent bond configuration across nodes; in that case, the bond configuration has to be updated manually so that the bond/bridge configuration matches on all nodes. DO NOT use ovs-vsctl commands to make bridge-level changes (such as disabling or enabling LACP). Instead, use manage_ovs commands.
Before using manage_ovs to make OVS level changes, the virtual switch needs to be temporarily disabled.
Follow the sequence of steps below:
- List virtual switch
nutanix@cvm$ acli net.list_virtual_switch
- Disable virtual switch on cluster by running below command on a Controller VM (CVM):
nutanix@cvm$ acli net.disable_virtual_switch
- Then use manage_ovs commands mentioned in "How to disable LACP on the host" section to change bond config from LACP to active-backup/balance-slb.
NOTE: For AOS >= 5.19 and < 5.20.2, there is a known issue that causes the virtual switch to be re-created automatically even after it has been disabled. This issue is fixed in AOS 5.20.2 onwards.
- The virtual switch is re-created automatically when it is not present on the cluster and the first AHV node in the cluster disconnects from the network and then restores connectivity. You may hit this scenario while changing the bond configuration on AHV and/or changing the port configuration on the physical switch side.
- If the virtual switch gets re-enabled due to this issue, manage_ovs commands will fail with the following error:
2021-08-24 17:46:20,254Z INFO manage_ovs:400 UUID for local host is ecb39f18-fdfe-465e-a944-2506f189ee72
2021-08-24 17:46:20,261Z CRITICAL manage_ovs:450 Bridge name: br0 is used by virtual switch: vs0. OVS bridge: br0 used by virtual switch can't be modified with manage_ovs.
- In such a situation, repeat the first two steps above to check whether the virtual switch is present on the cluster, and disable it again before moving to the next node to make bond configuration changes.
- Once the bond config is made consistent across all nodes, migrate the corresponding bridge(s) to virtual switch(es) as shown in the example below.
nutanix@cvm$ acli net.migrate_br_to_virtual_switch brX vs_name=vsX
Example:
nutanix@cvm$ acli net.migrate_br_to_virtual_switch br0 vs_name=vs0
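The whole sequence can be sketched as a dry-run script. The acli commands are taken from the steps above; the run() wrapper (an illustration, not a Nutanix tool) only prints what would be executed:

```shell
#!/bin/sh
# Dry-run sketch of the virtual switch disable/migrate workflow above.
# run() only prints each command; on a real CVM you would execute these
# commands directly, changing bond config one node at a time.
run() { echo "+ $*"; }

run acli net.list_virtual_switch                           # list virtual switches
run acli net.disable_virtual_switch                        # disable before manage_ovs changes
# ...change the bond config on every node with manage_ovs
# (see "How to disable LACP on the host"), then:
run acli net.migrate_br_to_virtual_switch br0 vs_name=vs0  # migrate bridge back
```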
Manual Method
- Refer to AHV Networking Best Practices Guide
- Bonded ports aggregate the physical interfaces on the AHV host. By default, the system creates a bond named br0-up in bridge br0 containing all physical interfaces. Keep in mind that the bond on your system may be named differently from the examples below, for example if it was created by following older examples.
- Previous versions of this guide used the bond name bond0 instead of br0-up (bond0 used to be the default). Nutanix recommends using the name br0-up to quickly identify this interface as the bridge br0 uplink; this naming scheme also makes it easy to distinguish the uplinks of additional bridges from each other.
In the examples below, br0-up is used as the bond name instead of bond0 or any other custom naming scheme.
WARNING: Updating uplinks using "manage_ovs" will delete and recreate the bond with the default configuration.
Consider the following before updating uplinks:
- If active-backup load balancing mode is used, an uplink update can cause a brief host network disconnect.
- If balance-slb or balance-tcp (LACP) load balancing mode is used, an uplink update will reset the configuration to active-passive. Network links relying on LACP will go down at this point as the host stops responding to keepalives. This condition can be mitigated by running the following command on the local CVM in AOS versions 5.5.3+ and 5.6.1+:
nutanix@cvm$ manage_ovs --bridge <bridge name> --interfaces <interface names> --bond_name <bond name> --bond_mode balance-tcp --lacp_mode fast --lacp_fallback true update_uplinks
It is strongly recommended to perform changes on one node at a time, after making sure that the cluster can tolerate a node failure. Using "allssh manage_ovs update_uplinks" may lead to a cluster outage. Only use it if the cluster is not in production and has no user VMs running.
Workflow overview
- Connect to CVM (Controller VM) via SSH. Make sure you are connected to the correct CVM by checking its name and IP.
- Follow the Verifying the Cluster Health chapter in the AHV Administration Guide to make sure that the cluster can tolerate a node being down. Do not proceed if the cluster cannot tolerate the failure of at least 1 node.
- Put the node and CVM in the maintenance mode:
a. Check the availability of changing maintenance mode of target hosts:
nutanix@cvm$ acli host.enter_maintenance_mode_check <host ip>
b. Put the host into maintenance mode. This will migrate running VMs to other hosts:
nutanix@cvm$ acli host.enter_maintenance_mode <host ip>
c. Enable maintenance mode for the CVM on the target host. This step prevents CVM services from being impacted by possible connectivity issues. You may skip this step if the CVM services are not running or the cluster is stopped.
nutanix@cvm$ ncli host edit id=<host ID> enable-maintenance-mode=true
Note: You can find <host ID> in the output of the "ncli host list" command:
ncli host list
Id : 00058977-c18c-af17-0000-000000006f89::2872 <--- "2872" is host ID
Uuid : ddc9d93b-68e0-4220-85f9-63b73d08f0ff
...
- Connect to the host via IPMI, as the LACP configuration process might cause a network disconnect.
- Perform required configuration steps (How to configure LACP in AHV, How to disable LACP on the host).
- Once the configuration is completed, make sure both the host and the CVM are accessible over the network. Also make sure that all NICs in the bond are operational by shutting down links one by one and verifying connectivity.
- If all tests are successfully completed, remove CVM and node from maintenance mode:
a. From one of the other CVMs, run the following command to exit the affected CVM from maintenance mode:
nutanix@cvm$ ncli host edit id=<host ID> enable-maintenance-mode=false
b. Take the host out of maintenance mode. This will restore VM locality:
nutanix@cvm$ acli host.exit_maintenance_mode <host ip>
How to check the bond name and bridge/switch name?
Run the command from CVM to get the bond name and switch/bridge name:
nutanix@cvm$ manage_ovs show_uplinks
Example
nutanix@cvm$ manage_ovs show_uplinks
Bridge br1: ---> Bridge/Switch name
Uplink ports: br1-up ---> bond name
Uplink ifaces: eth1 eth0
Bridge br0: ---> Bridge/Switch name
Uplink ports: br0-up ---> bond name
Uplink ifaces: eth3 eth2
In this example, the bond name is br0-up for bridge/switch br0 and br1-up for bridge/switch br1.
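When scripting checks across many nodes, the bridge-to-bond mapping can be pulled out of this output. A minimal sketch, assuming the output layout shown above:

```python
def parse_uplinks(text):
    """Map bridge name -> {'bond': ..., 'ifaces': [...]} from
    'manage_ovs show_uplinks' output in the layout shown above."""
    bridges, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Bridge ") and line.endswith(":"):
            current = line[len("Bridge "):-1]          # e.g. "br0"
            bridges[current] = {"bond": None, "ifaces": []}
        elif line.startswith("Uplink ports:") and current:
            bridges[current]["bond"] = line.split(":", 1)[1].strip()
        elif line.startswith("Uplink ifaces:") and current:
            bridges[current]["ifaces"] = line.split(":", 1)[1].split()
    return bridges

sample = """\
Bridge br0:
  Uplink ports: br0-up
  Uplink ifaces: eth3 eth2
"""
```

For the sample above, parse_uplinks returns {"br0": {"bond": "br0-up", "ifaces": ["eth3", "eth2"]}}, which a wrapper script could compare across all CVMs to find inconsistent bond configs.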
How to configure LACP in AHV
To reconfigure specific uplinks remotely from another CVM in the cluster (one that is not in maintenance mode) while the target host has network access, use the manage_ovs command. Substitute variables as necessary:
nutanix@cvm$ manage_ovs --bridge <bridge name> --interfaces <interface names> --bond_name <bond name> --host <target AHV host IP address> --bond_mode balance-tcp --lacp_mode fast --lacp_fallback true update_uplinks
See below for an example:
nutanix@cvm$ manage_ovs --bridge br0 --interfaces eth2,eth3 --bond_name br0-up --host hh.hh.hh.hh --bond_mode balance-tcp --lacp_mode fast --lacp_fallback true update_uplinks
Note: Consider the LACP timer options (slow and fast). If the switches are configured for fast timers, you must also set the LACP mode of the Nutanix cluster to fast. Otherwise, you could get an outage due to a mismatch in LACP timers.
How to verify if LACP is configured in AHV
Run the following commands.
root@ahv# ovs-appctl bond/show br0-up
root@ahv# ovs-appctl lacp/show br0-up
Working examples:
For the ovs-appctl bond/show output, note "lacp_status: negotiated":
root@ahv# ovs-appctl bond/show br0-up
---- br0-up ----
bond_mode: balance-tcp
bond may use recirculation: yes, Recirc-ID : 301
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 6757 ms
lacp_status: negotiated
active slave mac: 0c:c4:7a:1e:3e:6e(eth2)
slave eth2: enabled
active slave
may_enable: true
hash 78: 127 kB load
hash 108: 13 kB load
hash 244: 94 kB load
slave eth3: enabled
may_enable: true
hash 9: 6 kB load
hash 11: 11 kB load
hash 23: 12 kB load
...
For ovs-appctl lacp/show, note "status: active negotiated" and "current attached" for each interface:
root@ahv# ovs-appctl lacp/show br0-up
---- br0-up ----
status: active negotiated
sys_id: 0c:c4:7a:2f:4f:9d
sys_priority: 65534
aggregation key: 1
lacp_time: slow
slave: eth2: current attached
port_id: 1
port_priority: 65535
may_enable: true
actor sys_id: 0c:c4:7a:2f:4f:9d
actor sys_priority: 65534
actor port_id: 1
actor port_priority: 65535
actor key: 1
actor state: activity aggregation synchronized collecting distributing
partner sys_id: 00:2b:21:45:2d:12
partner sys_priority: 32768
partner port_id: 39
partner port_priority: 32768
partner key: 115
partner state: activity aggregation synchronized collecting distributing
slave: eth3: current attached
port_id: 2
port_priority: 65535
may_enable: true
actor sys_id: 0c:c4:7a:2f:4f:9d
actor sys_priority: 65534
actor port_id: 2
actor port_priority: 65535
actor key: 1
actor state: activity aggregation synchronized collecting distributing
partner sys_id: 00:2b:21:45:2d:12
partner sys_priority: 32768
partner port_id: 40
partner port_priority: 32768
partner key: 115
partner state: activity aggregation synchronized collecting distributing
Non-working examples
For ovs-appctl bond/show, note "lacp_status: configured":
root@ahv# ovs-appctl bond/show br0-up
---- br0-up ----
bond_mode: balance-tcp
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: configured
lacp_fallback_ab: true
active-backup primary: <none>
active slave mac: 00:e0:ed:8b:72:a1(eth2)
< truncated >
"configured" indicates that LACP has been configured on the AHV host, but the host has not been able to negotiate LACP with the switch ports.
For ovs-appctl lacp/show br0-up, note "status: active" and "defaulted attached" for each interface:
root@ahv# ovs-appctl lacp/show br0-up
---- br0-up ----
status: active
sys_id: ac:2f:7b:b6:fe:2e
sys_priority: 65534
aggregation key: 1
lacp_time: fast
slave: eth0: defaulted attached
port_id: 2
port_priority: 65535
may_enable: true
actor sys_id: ac:2f:7b:b6:fe:2e
actor sys_priority: 65534
actor port_id: 2
actor port_priority: 65535
actor key: 1
actor state: activity timeout aggregation synchronized collecting distributing defaulted
partner sys_id: 00:00:00:00:00:00
partner sys_priority: 0
partner port_id: 0
partner port_priority: 0
partner key: 0
partner state:
slave: eth1: defaulted attached
port_id: 3
port_priority: 65535
may_enable: true
actor sys_id: ac:2f:7b:b6:fe:2e
actor sys_priority: 65534
actor port_id: 3
actor port_priority: 65535
actor key: 1
actor state: activity timeout aggregation synchronized collecting distributing defaulted
The output indicates that LACP has not been negotiated. "Defaulted" indicates that the AHV host has not received LACP PDUs from the switch. If you see this configuration on your host, check the LACP configuration on the switches the host is connected to.
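The difference between the working and non-working states can also be checked programmatically, for example when auditing many hosts. A hypothetical sketch that inspects "ovs-appctl bond/show" output for the lacp_status field:

```python
def lacp_negotiated(bond_show_output):
    """Return True if 'ovs-appctl bond/show <bond>' output reports
    'lacp_status: negotiated'; False for 'configured', 'off', etc."""
    for line in bond_show_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "lacp_status":
            return value.strip() == "negotiated"
    return False  # no lacp_status line found

# Condensed versions of the working and non-working examples above.
working = "---- br0-up ----\nbond_mode: balance-tcp\nlacp_status: negotiated\n"
broken = "---- br0-up ----\nbond_mode: balance-tcp\nlacp_status: configured\n"
```

Feeding the working example returns True and the non-working ("configured") example returns False, matching the manual checks described above.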
How to disable LACP on the host
Perform the following steps to safely disable LACP on the host. Check the "Workflow overview" section above for steps to perform pre- and post-configuration tasks.
- Configure hosts to use a bonding mode that does not require LACP using one of the commands below.
a. The following command sets the load balancing policy to active-backup, which means only one active uplink is used and standby adapters are used only when the active adapter fails.
nutanix@cvm$ manage_ovs --bridge <bridge name> --interfaces <interface names> --bond_name <bond name> --host <target AHV host IP address> --bond_mode active-backup update_uplinks
All members of the bond must be physically connected, or the manage_ovs command produces a warning and exits without configuring the bond. To avoid this error and provision interfaces of the bond even if they are not connected, use the require_link=false flag.
nutanix@cvm$ manage_ovs --bridge <bridge name> --interfaces <interface names> --bond_name <bond name> --host <target AHV host IP address> --bond_mode active-backup --require_link=false update_uplinks
b. The following command sets the load balancing policy to balance-slb, which rebalances VM traffic from highly used to less used interfaces and uses all available uplinks. See the AHV Networking Best Practices Guide to learn more about the load balancing policies.
nutanix@cvm$ manage_ovs --bridge <bridge name> --interfaces <interface names> --bond_name <bond name> --host <target AHV host IP address> --bond_mode balance-slb update_uplinks
- Turn off LACP on the switch ports.
How to find MAC address of an AHV host NIC
To find the MAC address of an AHV host NIC, use either of the following commands:
- Execute the following command on the AHV host:
[root@ahv ~]# ethtool -P <interface>
Sample output:
[root@ahv ~]# ethtool -P eth3
Permanent address: 00:25:90:cb:39:27
- Execute the following command on the AHV host:
[root@ahv ~]# ifconfig <interface>
Sample output:
[root@ahv ~]# ifconfig eth3
eth3 Link encap:Ethernet HWaddr 00:25:90:CB:39:27
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:46857327754 errors:0 dropped:228250 overruns:0 frame:0
TX packets:49134503170 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:49893674683483 (45.3 TiB) TX bytes:54855610562476 (49.8 TiB)
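For scripting, similar information can be read from sysfs. A minimal sketch assuming a Linux sysfs layout; note that sysfs reports the current address, while "ethtool -P" above reports the permanent one:

```python
import os

def nic_mac(iface, sysfs="/sys/class/net"):
    """Return the current MAC address of 'iface' from sysfs.

    Caveat: this is the running address; inside a bond it may differ
    from the permanent (burned-in) address shown by 'ethtool -P'.
    """
    with open(os.path.join(sysfs, iface, "address")) as f:
        return f.read().strip()
```

For example, nic_mac("eth3") would return a string such as "00:25:90:cb:39:27" on the host shown above.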
Additional Information
- Nutanix KB 3263 - Original document in Nutanix Portal
- Nutanix landing page
- Lenovo ISG Support Plan - ThinkAgile HX Appliance and Lenovo Converged HX Series