Bug Description
When the sonic_vxlans module is used with state: overridden and a config dict containing both vlan_map and vrf_map, it flaps (deletes and re-creates) all VLAN to L2VNI mappings, even though these were correctly configured to begin with. This causes a severe service disruption.
The flapping behaviour goes away if state: replaced or state: merged is used instead; however, in these cases the module still falsely reports that changes are required. I suspect that the two issues have identical (or at least related) root causes, so I describe both in the same bug report.
This does not happen if the config dict does not contain vrf_map.
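To make the expectation concrete, here is a minimal sketch of the comparison I would expect state: overridden to perform. It is not the collection's actual code: the function name plan_overridden and the simplified have/want dictionaries are hypothetical, and the vlan_map/vrf_map entry keys (vni, vlan, vrf) are assumed for illustration.

```python
# Hypothetical sketch of an idempotent "overridden" comparison.
# This is NOT the dellemc.enterprise_sonic implementation; the data
# layout below is a simplified assumption for illustration only.

def plan_overridden(have, want):
    """Return (to_delete, to_add) for a single VTEP.

    have: configuration currently on the device
    want: configuration passed to the task
    """
    to_delete, to_add = {}, {}
    for key in ("vlan_map", "vrf_map"):
        have_items = have.get(key) or []
        want_items = want.get(key) or []
        # Entries on the device that the task does not list are removed.
        to_delete[key] = [item for item in have_items if item not in want_items]
        # Entries the task lists that the device lacks are created.
        to_add[key] = [item for item in want_items if item not in have_items]
    return to_delete, to_add


have = {
    "name": "vtep1",
    "vlan_map": [{"vni": 10, "vlan": 10}],
    "vrf_map": [{"vni": 2020, "vrf": "Vrf_twenty"}],
}
# The task passes exactly what is already configured.
want = {
    "name": "vtep1",
    "vlan_map": [{"vni": 10, "vlan": 10}],
    "vrf_map": [{"vni": 2020, "vrf": "Vrf_twenty"}],
}

to_delete, to_add = plan_overridden(have, want)
changed = any(to_delete.values()) or any(to_add.values())
```

With identical have and want, nothing is scheduled for deletion or creation and changed is False, which is the behaviour I expected from the second and third iterations of the task.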
Product Name
SONiC-OS-4.2.0-Enterprise_Base
Component or Module Name
sonic_vxlans
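DellEMC Enterprise SONiC Ansible Collection Version
dellemc.enterprise_sonic 2.4.0
SONiC Software Version
4.2.0-Enterprise_Base
Configuration
Steps to Reproduce
Expected Behavior
Only the first iteration of the Map VLAN 10 to L2VNI 10 task should return changed:; the subsequent ones should be idempotent and return ok:.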
Only the first iteration of the Additionally map Vrf_twenty to L3VNI 2020 task should return changed:; the subsequent ones should be idempotent and return ok:.
No iteration of the Additionally map Vrf_twenty to L3VNI 2020 task should cause any change in state to the VLAN 10/L2VNI 10 mapping, as this part of the config: dict is unchanged from the preceding Map VLAN 10 to L2VNI 10 task.
Actual Behavior
Only the first iteration of the Map VLAN 10 to L2VNI 10 task returns changed:; the subsequent ones are idempotent and return ok:. This is as expected, and shows that the bug depends on the presence of vrf_map.
All three iterations of the Additionally map Vrf_twenty to L3VNI 2020 task report changed:. This is unexpected, as the config: dict used does not change between the iterations. This also happens if the task is changed to use state: replaced or state: merged.
All three iterations of the Additionally map Vrf_twenty to L3VNI 2020 task result in the deletion and re-addition of the VLAN 10 to L2VNI 10 mapping. This is unexpected, as this part of the config dict does not change from the Map VLAN 10 to L2VNI 10 task (or between individual iterations of the Additionally map Vrf_twenty to L3VNI 2020 task, for that matter). This caused a critical outage in our production network.
For what it is worth, the resulting configuration at the end of the playbook run appears to be correct:
Logs
This is the console log from running the playbook:
A single run of the Additionally map Vrf_twenty to L3VNI 2020 task yields the following relevant output, logged to /var/log/ramfs/in-memory-syslog-info.log. Of particular interest are the DELETE calls:
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-72] User "admin@10.10.10.1:45032" request "GET /restconf/data/sonic-vxlan:sonic-vxlan" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-73] User "admin@10.10.10.1:45046" request "GET /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-74] User "admin@10.10.10.1:45062" request "GET /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-75] User "admin@10.10.10.1:45068" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP/VXLAN_TUNNEL_MAP_LIST=vtep1,map_10_Vlan10" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-76] User "admin@10.10.10.1:45070" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL/VXLAN_TUNNEL_LIST=vtep1/src_ip" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-77] User "admin@10.10.10.1:45076" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST=nvo1" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-78] User "admin@10.10.10.1:38586" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL/VXLAN_TUNNEL_LIST=vtep1" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-79] User "admin@10.10.10.1:38590" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-80] User "admin@10.10.10.1:38600" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-81] User "admin@10.10.10.1:38606" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-82] User "admin@10.10.10.1:38618" request "PATCH /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST=Vrf_twenty/vni" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-83] User "admin@10.10.10.1:38630" request "GET /restconf/data/sonic-vxlan:sonic-vxlan" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-84] User "admin@10.10.10.1:38646" request "GET /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-85] User "admin@10.10.10.1:38654" request "GET /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST" status - 200
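For anyone who wants to check their own switch for the same pattern, the following is a small sketch that pulls the REST method, path and status out of log lines in the format shown above and lists the tunnel map entries that were deleted. The helper names (parse_rest_requests, deleted_tunnel_maps) are my own, and the regular expression only assumes the line layout seen in this log excerpt.

```python
import re

# Matches the 'request "<METHOD> <PATH>" status - <CODE>' part of a
# rest_server log line, as it appears in the excerpt above.
REQUEST_RE = re.compile(
    r'request "(?P<method>[A-Z]+) (?P<path>\S+)" status - (?P<status>\d+)'
)


def parse_rest_requests(lines):
    """Return a list of (method, path, status) tuples, in log order."""
    requests = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            requests.append(
                (match.group("method"), match.group("path"), int(match.group("status")))
            )
    return requests


def deleted_tunnel_maps(requests):
    """Return the keys of VXLAN_TUNNEL_MAP_LIST entries hit by DELETE."""
    marker = "/VXLAN_TUNNEL_MAP/VXLAN_TUNNEL_MAP_LIST="
    return [
        path.split(marker, 1)[1]
        for method, path, _status in requests
        if method == "DELETE" and marker in path
    ]


sample = [
    'INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-74] User "admin@10.10.10.1:45062" request "GET /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST" status - 200',
    'INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-75] User "admin@10.10.10.1:45068" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP/VXLAN_TUNNEL_MAP_LIST=vtep1,map_10_Vlan10" status - 204',
    'INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-81] User "admin@10.10.10.1:38606" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP" status - 204',
]

requests = parse_rest_requests(sample)
deleted = deleted_tunnel_maps(requests)
```

On an idempotent run I would expect the list of deleted tunnel maps to be empty; in the log above it contains the VLAN 10 mapping.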
During the above run, the following was logged by a running ip monitor link session:
149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 qdisc noqueue master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
Deleted 149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
76: Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP group default event FEATURE CHANGE
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
Deleted 149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 qdisc noop state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
76: Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP group default event FEATURE CHANGE
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 qdisc noop master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc noqueue master Bridge state UNKNOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 master Bridge state UNKNOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 master Bridge state UNKNOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 master Bridge state UNKNOWN
link/ether 0c:eb:33:95:00:49
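The flap is visible in this output as vtep1-10 being deleted with ifindex 149 and coming back with ifindex 150. A rough sketch for spotting this automatically is below; the function name find_recreated_links is hypothetical, and the parsing only assumes the line format seen in the output above.

```python
import re

# Matches lines such as
#   "149: vtep1-10: <BROADCAST,MULTICAST> ..."
#   "Deleted 149: vtep1-10: <BROADCAST,MULTICAST> ..."
# Continuation lines ("link/ether ...") do not match and are skipped.
LINK_RE = re.compile(r"^(?P<deleted>Deleted )?(?P<index>\d+): (?P<name>[^:@]+)[:@]")


def find_recreated_links(lines):
    """Map interface name -> (old_ifindex, new_ifindex) for every
    interface that was reported deleted and later reappeared with a
    different ifindex."""
    deleted_index = {}  # name -> ifindex last reported as deleted
    recreated = {}
    for line in lines:
        match = LINK_RE.match(line.strip())
        if not match:
            continue
        name = match.group("name")
        index = int(match.group("index"))
        if match.group("deleted"):
            deleted_index[name] = index
        elif name in deleted_index and index != deleted_index[name]:
            recreated[name] = (deleted_index[name], index)
    return recreated


sample = [
    "149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN",
    "link/ether 0c:eb:33:95:00:49",
    "Deleted 149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN",
    "link/ether 0c:eb:33:95:00:49",
    "76: Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP group default event FEATURE CHANGE",
    "150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default",
]

flapped = find_recreated_links(sample)
```

An empty result would indicate that no interface was torn down and re-created while the task ran.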
Screenshots
No response
Additional Information
Identical behaviour is observed with dellemc.enterprise_sonic 2.2.0
The expected behaviour here is the removal of the map vni 2020 vrf Vrf_twenty (as this mapping does not appear in the config: dict passed to this task), but this does not happen at all: the mapping is left intact. Instead, the task behaves the way I would have expected it to behave had state: merged been specified.