Operational Defect Database

BugZero updated this defect 34 days ago.

VMware | 91932

NSX Application Platform (NAPP) disk usage increases with generated alarm 'Analytics and Data Storage disk usage is growing faster than expected'

Last update date:

4/16/2024

Affected products:

NSX-T

Affected releases:

4.x

Fixed releases:

No fixed releases provided.

Description:

Symptoms

You are running NAPP.

The NSX-T UI presents an alarm: Analytics and Data Storage disk usage is growing faster than expected

If NSX Application Platform estimates that the disks won't be able to store flows for 30 days, an alarm is raised in the NSX-T UI: Analytics and Data Storage is expected to be full in {predicted_full_period} days, which is lower than the current data retention period of {current_retention_period} days.

Cause

NAPP assumes that the traffic flows being analyzed will follow some patterns and therefore allow a certain degree of aggregation. When there is too much uniqueness in the flows, for example too many unique IPs or ports, NAPP cannot compact the data efficiently. This results in the disk usage growing faster than expected.

When NAPP calculates that the disk is unable to store flows for 30 days, the alarm is raised.
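The comparison behind the alarm can be pictured as simple arithmetic. The sketch below is only an illustration under a linear-growth assumption; the function names, inputs, and sample numbers are hypothetical and do not describe NAPP's actual estimator.

```shell
#!/bin/sh
# Hypothetical illustration: NAPP's real prediction logic is not documented here.

# days_until_full USED_GB CAPACITY_GB GROWTH_GB_PER_DAY
# Prints the whole number of days until the disk fills, assuming linear growth,
# or "never" if usage is not growing.
days_until_full() {
  awk -v used="$1" -v cap="$2" -v rate="$3" \
    'BEGIN { if (rate <= 0) { print "never"; exit } print int((cap - used) / rate) }'
}

# would_alarm PREDICTED_DAYS RETENTION_DAYS
# Succeeds when the predicted period is shorter than the retention period.
would_alarm() {
  [ "$1" != "never" ] && [ "$1" -lt "$2" ]
}

# Made-up numbers: 400 GB used of 1000 GB, growing by 25 GB per day.
predicted=$(days_until_full 400 1000 25)
if would_alarm "$predicted" 30; then
  echo "alarm: full in ${predicted} days, below the 30-day retention period"
fi
```

With these sample values the disk would fill in 24 days, which is shorter than the 30-day retention period, so the condition described above is met.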

Resolution

This is a known issue impacting NAPP.

Workaround

There are several options which can be used to help alleviate the issue.

Option 1: Configure Data Collection in NSX Intelligence

If you can identify the ESXi hosts and vSphere clusters that carry mostly East-West (EW) traffic, for example more than 90% EW and less than 10% North-South (NS), you can enable data collection for those hosts and clusters first and gradually enable it for the ones carrying more NS traffic. North-South traffic tends to have more unique IPs, which is more likely to adversely affect the data compaction. This helps alleviate the high storage growth while the other tuning options below are explored.

Procedure:

By default, NSX Intelligence collects network traffic data on all standalone hosts and clusters of hosts. If necessary, you can stop data collection from a standalone host or cluster of hosts.

1. From your browser, log in with Enterprise Administrator privileges to an NSX Manager at https://<nsx-manager-ip-address>.
2. In the NSX Manager UI, select System and, in the Settings section, select NSX Intelligence.
3. To manage traffic data collection for one or more hosts, perform one of the following steps. The system updates the Collection Status value for each affected host to Deactivated or Activated, depending on the data collection mode you set.
   - To stop traffic data collection, select the host or hosts in the Standalone Host section, click Deactivate, and click Confirm when prompted.
   - To start traffic data collection, select the host or hosts, click Activate, and click Confirm when prompted.
4. To manage traffic data collection for one or more clusters of hosts, perform one of the following steps.
   - To stop data collection for one or more clusters, select the cluster or clusters in the Cluster section, click Deactivate, and click Confirm when prompted.
   - To start traffic data collection, select the cluster or clusters, click Activate, and click Confirm when prompted.
For reference, please review the following guide: https://docs.vmware.com/en/VMware-NSX-Intelligence/4.0/install-upgrade/GUID-095E8F9A-C385-4F19-BD57-018DF1690BE2.html

Option 2: Filter out broadcast and/or multicast flows

Note: This option can be used where broadcast and/or multicast flows are not required for security policy or similar guidance. If broadcast and/or multicast flows are important to you, do not enable this option.

You can stop broadcast and/or multicast flows from being stored in NSX Intelligence to reduce disk usage. This only affects new flows that NSX Intelligence has not yet processed. Existing broadcast/multicast flows remain visible until the retention period (30 days) is reached. The following steps can be used together or by themselves.

1. Disable broadcast and multicast flows at hosts

Log in as "root" on the NSX Manager and run the following command. Enter the password for NSX Manager when prompted.

    curl -X PATCH 'https://<nsx-manager-ip-address>/policy/api/v1/infra/sites/default/intelligence/transport-node-profile' -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{"flow_exclusion_filter": [{"type": "BCAST"},{"type": "MCAST"}]}' -k -uadmin

2. Disable broadcast and multicast flows on NSX Intelligence

Log in as "root" on the NSX Manager and perform the following steps.
Obtain the configuration for raw flow processing from the secret rawflow-override-properties and save it to a file called props:

    napp-k get secret rawflow-override-properties -o jsonpath='{.data.appliance\-override\.properties}' | base64 -d > props

Note: The above command reads the property, decodes it from base64, and saves the result in the file for later use.

Edit the props file and change the following values from false to true:

Before:

    flowFilter.excludeMulticast=false
    flowFilter.excludeBroadcast=false

After:

    flowFilter.excludeMulticast=true
    flowFilter.excludeBroadcast=true

Then convert the file contents back to a base64 value:

    cat props | base64 -w 0

Use the resulting base64 string to replace the original appliance-override.properties value in the secret rawflow-override-properties. The following command opens a vim editor, which you can use to edit the content and save:

    napp-k edit secret rawflow-override-properties

Finally, restart the rawflow-driver:

    napp-k delete pod spark-app-rawflow-driver

Option 3: Scale out Analytics and Data Storage services

Traffic flows are stored in both the Analytics and Data Storage services. Analytics requires a minimum of four nodes to scale out; Data Storage requires a minimum of eight nodes. Because Analytics requires a lower node count, you may start by scaling out Analytics. If the alarm is not resolved, scale out both Analytics and Data Storage once you have 8 nodes.

Prerequisites:

- All existing nodes in your Tanzu Kubernetes Cluster (TKC) or upstream Kubernetes cluster must be in a healthy and ready state before you can scale out the NSX Application Platform.
- Before proceeding with the scale-out procedures, ensure that your infrastructure administrator has already allocated the minimum number of nodes required for scaling out the NSX Application Platform services.
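The node prerequisites can also be checked from the command line. The following is a minimal sketch: check_nodes is a hypothetical helper, and it assumes that napp-k passes its arguments through to kubectl, so that the standard `get nodes --no-headers` listing (NAME, STATUS, ROLES, AGE, VERSION columns) is available.

```shell
#!/bin/sh
# check_nodes REQUIRED
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin,
# reports how many nodes have STATUS exactly "Ready", and succeeds only if
# every listed node is Ready and there are at least REQUIRED of them.
check_nodes() {
  awk -v required="$1" '
    { total++ }
    $2 == "Ready" { ready++ }
    END {
      printf "%d of %d nodes Ready (need %d)\n", ready, total, required
      exit !(ready == total && ready >= required)
    }
  '
}

# Usage on the NSX Manager (assumes napp-k wraps kubectl):
#   napp-k get nodes --no-headers | check_nodes 8
```

A node shown as NotReady or Ready,SchedulingDisabled is treated as not ready. The listing may also include control-plane nodes, so adjust the required count or filter the input if only worker nodes should be counted.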
Procedure

1. From your browser, log in with Enterprise Admin privileges to an NSX Manager at https://<nsx-manager-ip-address>.
2. Navigate to System - NSX Application Platform.
3. In the bottom-left corner of the NSX Application Platform section of the UI page, click Actions and select Scale Out from the drop-down menu.

   Note: The Scale Out action is only supported if you deployed the NSX Application Platform using the Advanced form factor. The action is not supported for a Standard form factor deployment. If all of the services are already scaled out, the Scale Out button is disabled in the drop-down menu, which indicates that your cluster has reached the maximum number of nodes allocated. Initially, the Advanced form factor is deployed with three nodes. You must first ask your infrastructure administrator to add five more nodes to your current cluster before you can continue with the next steps. To scale out all of the services, you must have a total of eight worker nodes in your cluster.
4. Select the All checkbox.
5. In the Advanced Options section, ensure that all of the services available for the scale-out action are selected. Unless specifically advised by the VMware support team, ensure that all of the core services are selected so that the system can decide which of the core services must be scaled out. Scaling out one core service arbitrarily can lead to more resources being used without any improvement to system performance. Before proceeding with a single-category service scale-out, consult the VMware support team or confirm that you clearly understand what can happen if you scale out a single-category service.
6. Click Scale Out. The UI displays the progress of the scale-out operation.
For reference, please review the following guide: https://docs.vmware.com/en/VMware-NSX/4.1/nsx-application-platform/GUID-8CC7E83F-C59F-4B61-9F4D-F0151ACACD96.html

Option 4: Enable External IP aggregation

If you have a large volume of north-south traffic but do not need the details of individual external (public) IPs, you can reduce the amount of data sent to NSX Intelligence by performing External IP aggregation at the host. This aggregates all external IP addresses to one value: 255.255.255.255.

Note: The external (public) IP addresses that are affected are those outside the private IP ranges. Please refer to the section below, Optimize configuration of Private IP Ranges.

ATTENTION: This will affect how new external flows are stored and used in NSX Intelligence.

- In Discover & Take Action, in compute view, when you right-click on Public and select IP Addresses, you will not see the individual IP addresses of the new external flows.
- In Discover & Take Action, in group view or compute view, when you right-click on Public or an entity connected to Public and select Flow Details, or when you click on a connection connected to Public, you will not see the individual IP addresses of the new external flows.
- Recommendation will not use the individual IP addresses of the new external flows.

Procedure

Log in as root via SSH on the NSX Manager, run the following command, and enter the admin password for NSX Manager when prompted.

    curl --location --request PATCH 'https://<nsx-manager-ip-address>/api/v1/intelligence/host-config' -H 'Content-Type: application/json' --data-raw '{"enable_external_ip_aggregation": true}' -k -u admin

Optimize configuration of Private IP Ranges

If you know the private IP ranges used by east-west traffic in your network, it is recommended to set them as granularly as possible. It is not recommended to create unnecessarily large IP ranges.

To maximize the benefit, use this in conjunction with Option 4: Enable External IP aggregation.
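To see why the two settings work together, the sketch below mimics the effect of aggregation on a list of "source destination" IPv4 pairs. It is only an illustration: aggregate_external is a hypothetical helper, the sample addresses are made up, and the real aggregation is performed by NSX on the host rather than by a script like this.

```shell
#!/bin/sh
# aggregate_external PRIVATE_CIDRS
# Hypothetical helper: reads "src dst" IPv4 pairs on stdin and prints each
# distinct pair once, after replacing every address that falls outside the
# comma-separated PRIVATE_CIDRS with 255.255.255.255.
aggregate_external() {
  awk -v ranges="$1" '
    # Convert a dotted-quad address to a number.
    function ip2int(ip,   p) {
      split(ip, p, ".")
      return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
    }
    # An address is private if it shares the network prefix of any range.
    function is_private(ip,   i, parts, size) {
      for (i = 1; i <= n; i++) {
        split(cidr[i], parts, "/")
        size = 2 ^ (32 - parts[2])
        if (int(ip2int(ip) / size) == int(ip2int(parts[1]) / size)) return 1
      }
      return 0
    }
    BEGIN { n = split(ranges, cidr, ",") }
    {
      src = is_private($1) ? $1 : "255.255.255.255"
      dst = is_private($2) ? $2 : "255.255.255.255"
      key = src " " dst
      if (!(key in seen)) { seen[key] = 1; print key }
    }
  '
}

# Sample run with made-up flows: two external destinations collapse into one entry.
printf '10.0.0.5 8.8.8.8\n10.0.0.5 1.1.1.1\n10.0.0.5 10.0.0.9\n' |
  aggregate_external "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
```

In the sample run, three input pairs become two distinct entries because both external destinations are replaced by 255.255.255.255. Any address covered by a private range keeps its individual value, so an unnecessarily large private range leaves more unique addresses to store.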
You can manage the private IP ranges using the Private IP Ranges tab in the Security - General Security Settings user interface. These private IP ranges apply to the NSX Intelligence and NSX Network Detection and Response features when you activate either feature.

- To enter an IPv4 range, click inside the IPv4 IP Range text box and enter the values using the IPv4 CIDR notation format shown below the box. Press Enter for each entry, and click Save when finished.
- To enter an IPv6 range, click inside the IPv6 IP Range text box and enter the values using the IPv6 CIDR notation format shown below the box. Press Enter for each entry, and click Save when finished.

The NSX Intelligence feature categorizes an IP address belonging to one of the CIDR notations listed in the dialog box as a private IP address. Any IP address that does not belong to any of these CIDR notations is classified as a public IP address. If the IP address of your VM or physical server does not fall into one of these CIDR notations, consider adding your CIDR notation using this Private IP Ranges UI.

For reference, please review the following guide: https://docs.vmware.com/en/VMware-NSX-Intelligence/4.0/user-guide/GUID-7E59E413-42D9-4738-98ED-A02D3F4CD993.html

If you are still experiencing issues after trying the above options, please open a support request with VMware NSX-T GSS and reference this KB.
