NSX-T Edge Cluster VM_DEPLOYMENT_FAILED

When expanding an NSX-T edge cluster via SDDC Manager, the deployment may fail with the errors below:

We see the edge node creation task begin here:

"creationTime":1691054326558, "taskId":"c4e9bfed-d285-48c0-8194-337a07b42a24", "taskModel":"FSM", "taskRetry":{"errorCodes":[404,500,501], "method":"PATCH", "successCode":202, "url":"http://localhost/domainmanager/workflows/c4e9bfed-d285-48c0-8194-337a07b42a24"}, "taskType":"NSXT_EDGECLUSTER_CREATION", "taskURL":"http://localhost/domainmanager/workflows/c4e9bfed-d285-48c0-8194-337a07b42a24"}

Subsequently, the ping test to one of the edge nodes (or possibly the VIF IP) fails:

2023-08-03T09:19:14.638+0000 DEBUG [vcf_dm,f9b59c3a3529488b,df34] [c.v.v.n.h.NsxtEdgeClusterValidationUtil,dm-exec-7] Network pool overlaps: None
2023-08-03T09:19:21.778+0000 DEBUG [vcf_dm,f9b59c3a3529488b,df34] [c.v.e.s.c.util.HostValidationUtil,dm-exec-7] Trying to ping to 1.1.1.1
2023-08-03T09:19:21.776+0000 DEBUG [vcf_dm,f9b59c3a3529488b,df34] [c.v.e.s.c.util.HostValidationUtil,dm-exec-7] Verify ping connectivity to 1.1.1.1 with command ping 1.1.1.1 -c 5
2023-08-03T09:19:21.779+0000 DEBUG [vcf_dm,f9b59c3a3529488b,df34] [c.v.e.s.c.util.LocalProcessService,dm-exec-7] Executing the Local command: ping 1.1.1.1 -c 5
2023-08-03T09:19:33.351+0000 DEBUG [vcf_dm,0000000000000000,0000] [c.v.v.s.c.s.SecurityConfigurationServiceImpl,pool-1-thread-1] Security config retrieved {"certificateValidationEnabled":false,"fipsMode":false}
2023-08-03T09:19:33.351+0000 DEBUG [vcf_dm,0000000000000000,0000] [c.v.v.secure.config.LazyTrustManager,pool-1-thread-1] Check if cert validation is enabled false
2023-08-03T09:19:35.779+0000 DEBUG [vcf_dm,f9b59c3a3529488b,b0bb] [c.v.v.n.h.NsxtEdgeClusterValidationUtil,pool-3-thread-41] PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
2023-08-03T09:19:35.775+0000 DEBUG [vcf_dm,f9b59c3a3529488b,b0bb] [c.v.v.n.h.NsxtEdgeClusterValidationUtil,pool-3-thread-41]
2023-08-03T09:19:35.779+0000 DEBUG [vcf_dm,f9b59c3a3529488b,b0bb] [c.v.v.n.h.NsxtEdgeClusterValidationUtil,pool-3-thread-41] --- 1.1.1.1 ping statistics ---
2023-08-03T09:19:35.779+0000 DEBUG [vcf_dm,f9b59c3a3529488b,b0bb] [c.v.v.n.h.NsxtEdgeClusterValidationUtil,pool-3-thread-41] 5 packets transmitted, 0 received, 100% packet loss, time 1000ms
2023-08-03T09:19:35.779+0000 DEBUG [vcf_dm,f9b59c3a3529488b,b0bb] [c.v.v.n.h.NsxtEdgeClusterValidationUtil,pool-3-thread-41]
2023-08-03T09:19:35.779+0000 ERROR [vcf_dm,f9b59c3a3529488b,df34] [c.v.e.s.c.util.LocalProcessService,dm-exec-7] Local Command Failed with exit value 1.
Output Logs
Local Process Output: 2023-08-03 09:19:35 PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
Local Process Output: 2023-08-03 09:19:35
Local Process Output: 2023-08-03 09:19:35 --- 1.1.1.1 ping statistics ---
Local Process Output: 2023-08-03 09:19:35 5 packets transmitted, 0 received, 100% packet loss, time 1000ms

The workflow is then deemed a failure because the nodes cannot be contacted, as the Edge node OVF did not deploy:

2023-08-03T09:24:59.249+0000 DEBUG [vcf_dm,f9b59c3a3529488b,113c] [c.v.v.c.n.s.c.c.ApiConnection,dm-exec-18] Closed ApiClient connection.
2023-08-03T09:24:59.249+0000 ERROR [vcf_dm,f9b59c3a3529488b,113c] [c.v.v.c.f.p.n.a.CreateNsxtEdgeNodeVmAction,dm-exec-18] Edge node nsxedgenode101 creation failed, node state is pending, VM deployment state is VM_DEPLOYMENT_FAILED
2023-08-03T09:24:59.249+0000 DEBUG [vcf_dm,f9b59c3a3529488b,113c] [c.v.v.c.n.s.c.c.ApiConnection,dm-exec-18] Closed ApiClient connection.
2023-08-03T09:24:59.251+0000 ERROR [vcf_dm,f9b59c3a3529488b,113c] [c.v.e.s.o.model.error.ErrorFactory,dm-exec-18] [44F6DD] DEPLOY_NSXT_EDGE_FAILED Failed to deploy NSX-T Edge nsxedgenode101 on nsxt-manager.com
com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Failed to deploy NSX-T Edge node nsxedgenode101 on nsxt-manager.com
    at com.vmware.vcf.common.fsm.plugins.nsxt.action.CreateNsxtEdgeNodeVmAction.execute(CreateNsxtEdgeNodeVmAction.java:438)
    at com.vmware.vcf.common.fsm.plugins.nsxt.action.CreateNsxtEdgeNodeVmAction.execute(CreateNsxtEdgeNodeVmAction.java:59)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
    at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
Caused by: java.lang.IllegalArgumentException: Edge node nsxedgenode101 creation failed, node state is pending, VM deployment state is VM_DEPLOYMENT_FAILED

SDDC Manager then tries to delete the failed Edge Cluster object, unsuccessfully:

2023-08-03T09:35:09.481+0000 DEBUG [vcf_dm,f9b59c3a3529488b,2e8c] [c.v.v.c.n.s.c.c.NsxtManagerTransportNodeOperations,dm-exec-16] Error occurred while trying to get transport node nsxedgenode101 : Unable to find transport node with name nsxedgenode101
2023-08-03T09:35:09.481+0000 DEBUG [vcf_dm,f9b59c3a3529488b,2e8c] [c.v.v.c.f.p.n.a.CreateNsxtEdgeNodeVmAction,dm-exec-16] Getting state for edge node nsxedgenode101
2023-08-03T09:35:09.401+0000 DEBUG [vcf_dm,f9b59c3a3529488b,2e8c] [c.v.v.c.n.s.c.c.ApiConnection,dm-exec-16] Closed ApiClient connection.
2023-08-03T09:35:09.481+0000 DEBUG [vcf_dm,f9b59c3a3529488b,2e8c] [c.v.v.c.f.p.n.h.NsxtCommonOperations,dm-exec-16] Timeout waiting for Edge node nsxedgenode101 to be deleted
2023-08-03T09:35:09.481+0000 DEBUG [vcf_dm,f9b59c3a3529488b,2e8c] [c.v.v.c.f.p.n.a.CreateNsxtEdgeNodeVmAction,dm-exec-16] Edge node nsxedgenode101 still exists
2023-08-03T09:35:09.481+0000 ERROR [vcf_dm,f9b59c3a3529488b,2e8c] [c.v.e.s.o.model.error.ErrorFactory,dm-exec-16] [G919JV] DEPLOY_NSXT_EDGE_UNDO_FAILED Failed to undo NSX-T Edge nsxedgenode101 deployment on nsxt-manager.com
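
To check quickly whether a domain manager log shows the same failure pattern, a small script can search for the key strings from the excerpts in this post. The following Python sketch is only an illustration: the function name scan_log and the signature labels are made up for this example, and the log path in the usage note is an assumption about where SDDC Manager keeps this log, so adjust it for your environment.

```python
import re

# Failure signatures taken from the log excerpts in this post.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "ping_failed": re.compile(r"100% packet loss"),
    "vm_deployment_failed": re.compile(r"VM_DEPLOYMENT_FAILED"),
    "deploy_failed": re.compile(r"DEPLOY_NSXT_EDGE_FAILED"),
    "undo_failed": re.compile(r"DEPLOY_NSXT_EDGE_UNDO_FAILED"),
}


def scan_log(lines):
    """Return {label: [matching lines]} for an iterable of log lines."""
    hits = {label: [] for label in SIGNATURES}
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
    return hits


def matches_this_issue(hits):
    """True when both the deploy failure and the failed undo were logged."""
    return bool(hits["vm_deployment_failed"]) and bool(hits["undo_failed"])
```

A possible way to use it, assuming the log lives at /var/log/vmware/vcf/domainmanager/domainmanager.log on the SDDC Manager appliance: open the file, pass the file object to scan_log, and print the lines collected under each label. A match only tells you the symptoms are the same; the cause still needs to be confirmed against the versions involved.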

Cause:-

This appears to be a niche case where the target NSX-T upgrade version is 3.2.1.2.0 or higher and the source NSX-T version is 3.1.3.7.0. The 3.2.1.2 Edge OVF defines more networks than the Manager (at version "node_version": "3.1.3.7.0.19380402") is aware of. While NSX-T is in this "upgrading" state, it expects the OVF used to deploy the edge node(s) to include 5 NICs. As NSX-T is still essentially at version 3.1.3.7.0, the OVF used to deploy the edge node only has a network configuration for 4 NICs.
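
The version condition described above can be written down as a small check. This is a minimal Python sketch, not an official compatibility test: the function names are hypothetical, and the rule it encodes (source exactly 3.1.3.7.0, target 3.2.1.2.0 or higher, build number ignored) is just a literal reading of the cause given here.

```python
# Versions named in the cause description above.
AFFECTED_SOURCE = (3, 1, 3, 7, 0)
MIN_AFFECTED_TARGET = (3, 2, 1, 2, 0)


def parse_version(text, width=5):
    """Turn '3.1.3.7.0.19380402' into (3, 1, 3, 7, 0).

    Only the first `width` components are kept, so a trailing build
    number is dropped; shorter strings such as '3.2.1.2' are padded
    with zeros so that tuple comparison behaves as expected.
    """
    parts = [int(p) for p in text.strip().split(".")][:width]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def matches_niche_case(source_version, target_version):
    """True if the version pair matches the case described in this post."""
    source = parse_version(source_version)
    target = parse_version(target_version)
    return source == AFFECTED_SOURCE and target >= MIN_AFFECTED_TARGET
```

For example, matches_niche_case("3.1.3.7.0.19380402", "3.2.1.2.0") evaluates to True, while a pair whose source is already on 3.2.x evaluates to False.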

Workaround:-

There is currently no fix for this issue, but the following workaround is available:
Create a standard switch on ALL hosts in the target cluster and create a portgroup named 'VM Network' on it (no uplinks are needed).
Retry the Edge deployment; it should now complete.
Once the Edge(s) are deployed, the switches can be removed from the hosts. To do this, first edit the settings of each Edge node VM and disconnect the NIC attached to the 'VM Network' portgroup.
Finally, delete the standard switches.
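
With many hosts in the cluster, the switch creation and cleanup can be scripted instead of clicked through. The sketch below is one possible approach and an assumption on my part, since the steps above do not say how the switch should be created: it only builds the esxcli command strings to run in each host's shell (for example over SSH) and does not connect to anything. The switch name vSwitch-edge-workaround and both function names are hypothetical; only the portgroup name 'VM Network' comes from the workaround.

```python
import shlex

# Hypothetical switch name for this sketch; the portgroup name is the
# one required by the workaround.
VSWITCH_NAME = "vSwitch-edge-workaround"
PORTGROUP_NAME = "VM Network"


def create_commands(vswitch=VSWITCH_NAME, portgroup=PORTGROUP_NAME):
    """esxcli commands that add a standard switch and the portgroup.

    No uplink is attached, matching the workaround.
    """
    vs = shlex.quote(vswitch)
    pg = shlex.quote(portgroup)
    return [
        f"esxcli network vswitch standard add --vswitch-name={vs}",
        "esxcli network vswitch standard portgroup add "
        f"--portgroup-name={pg} --vswitch-name={vs}",
    ]


def cleanup_commands(vswitch=VSWITCH_NAME):
    """esxcli command that removes the temporary standard switch.

    Run this only after the Edge VM NICs have been disconnected
    from the portgroup.
    """
    vs = shlex.quote(vswitch)
    return [f"esxcli network vswitch standard remove --vswitch-name={vs}"]
```

Printing the result of create_commands() gives the two lines to run on every host in the target cluster before retrying the deployment; cleanup_commands() gives the line for the final step. Test on a single host first, since the commands are generated here without any validation against the host's existing network configuration.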
