AppDynamics Infrastructure Visibility: Solving Hardware Problems Quickly
Infrastructure Visibility is the module within AppDynamics that uses a machine agent to provide insight into machine and infrastructure-level metrics such as disk I/O, throughput, CPU utilization and memory usage. Machine agents are installed on servers or virtual machines (VMs) and run alongside the app server agent so that AppDynamics can monitor hardware and operating system (OS) performance. This article guides you through identifying and isolating an on-premises hardware-related issue within AppDynamics using Infrastructure Visibility.
AppDynamics is an Application Performance Management (APM) platform that provides real-time monitoring of applications to detect anomalies, monitor application environment performance and collect and analyze metrics. AppDynamics helps determine the root cause of application issues by looking at application, network, server and machine metrics that measure infrastructure utilization.
How can we navigate Infrastructure Visibility and discover a hardware-related issue?
When conducting a root cause analysis for an application hosted on-premises, one thing you'll need to consider is the hardware running your application. Without the preventive monitoring that AppDynamics Infrastructure Visibility provides, finding the root cause can be time-consuming. AppDynamics Infrastructure Visibility gives you the tools to identify, isolate and solve problems quickly.
Scenario
AD-Air-Travel is our airline's primary application. Our customers use it to search for and reserve flights, choose seats and get flight and airport details. We recently installed the AppDynamics platform to manage the application environment and have used it effectively to address many previously unknown issues and streamline the development process. Despite this success, customers are complaining that the application is acting erratically or taking longer than normal to load. On initial inspection, the application appears to be working within normal operating parameters, yet the problems persist. Thankfully, AppDynamics also has an Infrastructure Visibility module that can help us with this problem.
Start with dashboards
We'll start at our home screen after logging into our AppDynamics controller.
Notice the Applications section. For this scenario, you will only see the AD-Air-Travel application. Here, AppDynamics tells us that this application is in a normal state. This is misleading: the application itself is working correctly, but users are still experiencing erratic errors and long load times, so we have not yet identified the root cause.
Let's take a look at the same home dashboard from our AppDynamics controller and find the Servers section. This shows us a partial list of servers in our application environment. We can see that a server with the name AD-Air-Travel-MA, which belongs to our AD-Air-Travel application, displays a warning sign from AppDynamics.
Here, we can complete either of the following tasks:
- Click the server in the list to view its dashboard for further details.
- Look into the application first to double-check that there aren't any other application issues. Let's do this one.
When we select the AD-Air-Travel application to display its dashboard, we immediately see that everything appears normal because all our nodes are looking green.
We're not going to call this good just because the application dashboard looks normal. Because we observed an issue with an application server on the home screen, we should drill down into that server to investigate further. We can find the server's health status in the Transaction Scorecard section on the right side of the application dashboard.
Notice how we're getting a warning on a server for this application. This is similar to the warning we received in the server list on the AppDynamics home screen. Let's investigate further by clicking on the Server Health link.
Our next stop: Servers
We're now at the full server list used by this application. In this case there is one server, but there are often many.
This view gives us at-a-glance details about the server(s) that help us quickly determine whether there is an issue we need to investigate further.
- The operating system is represented by a symbol. In this case, a flavor of Linux is displayed.
- Name of the server that has the machine agent installed on it. Often, this is the same server with the App Server agent installed on it.
- Server Health. In this instance, we have a warning symbol within AppDynamics that lets us know there is something we need to investigate. We've seen this warning before on the home servers list along with our Transaction Scorecard section.
- With AppDynamics Infrastructure Visibility, a default health rule is created that triggers a warning when utilization of memory, CPU or storage is above 80 percent.
- Disk usage percentage of the hard disk drive or virtual hard disk drive that the server is using. This is currently over 80 percent, which is why Server Health is showing a warning.
- CPU usage percentage of the server.
- Memory usage percentage of the server.
- Disk input/output (I/O) usage percentage of the server. Depending on when the details in this window are captured, this view may not always show the information we expect.
- Network bandwidth usage percentage of the server.
- Server Visibility Enabled. This shows "Yes" here. Enabling this feature requires a separate license and a configuration change in the controller-info.xml file.
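The article does not show the configuration change itself. As a minimal sketch, the following assumes the machine agent's controller-info.xml uses a `<sim-enabled>` element for Server Visibility; the sample file contents, host name and port are placeholders, and `server_visibility_enabled` is a hypothetical helper, not part of AppDynamics. Check your agent's documentation before relying on the element name.

```python
import xml.etree.ElementTree as ET

# Assumed sample of a machine agent controller-info.xml.
# Host and port are placeholders; <sim-enabled> is the assumed
# Server Visibility switch.
SAMPLE_CONTROLLER_INFO = """\
<controller-info>
    <controller-host>controller.example.com</controller-host>
    <controller-port>8090</controller-port>
    <sim-enabled>true</sim-enabled>
</controller-info>
"""


def server_visibility_enabled(xml_text: str) -> bool:
    """Return True if <sim-enabled> is present and set to 'true'."""
    root = ET.fromstring(xml_text)
    # findtext returns the default when the element is missing.
    value = root.findtext("sim-enabled", default="false")
    return value.strip().lower() == "true"


if __name__ == "__main__":
    print(server_visibility_enabled(SAMPLE_CONTROLLER_INFO))
```

A check like this could run as part of a deployment script to confirm that each machine agent is configured the way the Servers list reports.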
To see more details, we can either double-click the server or select it and then click the View Details (eye) icon in the menu above the list.
We now see the server dashboard with details on the server we chose from the prior list. Many metrics here can help us make decisions about this server, including load average, CPU usage, incoming and outgoing network traffic and the top 10 processes consuming CPU and memory.
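The same kind of metric can also be pulled outside the UI. The sketch below only builds a request URL for the controller's REST metric-data endpoint; the controller address and the tier name `AD-Air-Travel-Tier` are hypothetical, and the metric path shown is an assumption about how hardware metrics are organized in your metric browser. Authentication and the actual HTTP call are left out.

```python
from urllib.parse import quote, urlencode


def metric_data_url(controller: str, application: str,
                    metric_path: str, minutes: int = 60) -> str:
    """Build a metric-data query URL for the last `minutes` minutes."""
    query = urlencode({
        "metric-path": metric_path,
        "time-range-type": "BEFORE_NOW",
        "duration-in-mins": minutes,
        "output": "JSON",
    })
    app = quote(application)
    return f"{controller}/controller/rest/applications/{app}/metric-data?{query}"


if __name__ == "__main__":
    # Hypothetical controller address and tier name, for illustration only.
    print(metric_data_url(
        "https://controller.example.com",
        "AD-Air-Travel",
        "Application Infrastructure Performance|AD-Air-Travel-Tier"
        "|Hardware Resources|CPU|%Busy",
    ))
```

Copying the exact metric path from the metric browser in your own controller is safer than typing it by hand, since tier and node names vary by environment.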
Turn up the volume
The server dashboard gives us a host of metrics that can be important to us. Since we already know that the warning is related to disk usage (percentage), we can click the Volumes menu link to get more information.
Right away we see that there is an issue with two partitions on this hard drive.
- The root volume is 83 percent full, but that is not what is causing the problem. AppDynamics gives us this information so we can preemptively address potential issues before they become problems.
- This metric shows the full size of the server's only drive. If there were multiple drives, we would see those as well.
- The /snap/core/9665 partition and the /snap/core/9804 partition are both 100 percent full. This is definitely our problem: we need more storage!
- The size of both partitions. We've set this low on purpose to force this particular problem. In a real-world scenario, the drive would be much larger, but a full drive would still cause the same issues for the customer base.
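The reasoning in the list above (compare each volume's usage with the 80 percent warning level) can be sketched as follows. This is an illustration only, not how AppDynamics evaluates health rules; `percent_used` and `volumes_over_threshold` are hypothetical helper names, and the threshold simply mirrors the 80 percent default mentioned earlier.

```python
import shutil

# Mirrors the 80 percent default warning level described in this article.
WARNING_THRESHOLD_PERCENT = 80.0


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def volumes_over_threshold(usage_by_mount: dict,
                           threshold: float = WARNING_THRESHOLD_PERCENT) -> list:
    """Return mount points whose usage exceeds the threshold.

    usage_by_mount maps a mount point to a (total_bytes, used_bytes) pair.
    """
    return sorted(
        mount
        for mount, (total, used) in usage_by_mount.items()
        if percent_used(total, used) > threshold
    )


if __name__ == "__main__":
    # Check the volume that holds the root path on the local machine.
    usage = shutil.disk_usage("/")
    pct = percent_used(usage.total, usage.used)
    status = "warning" if pct > WARNING_THRESHOLD_PERCENT else "normal"
    print(f"/ is {pct:.1f} percent full ({status})")
```

Applied to the figures in this scenario, both the 83 percent root volume and the 100 percent partitions would be flagged, which matches what the Volumes view shows.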
We have identified the root cause. We need to expand the storage of both of those partitions in order to clear up the erratic issues and longer than normal load times our customers were experiencing.
See how quickly we discovered that there was an issue? It's that simple with AppDynamics and Infrastructure Visibility.
What's next?
Since you now know how easy it is to solve a hardware problem quickly using AppDynamics and Infrastructure Visibility, let's dive deeper into Infrastructure Visibility with these other topics:
- Building health rules and policies to notify us of hardware-related issues.
- Building custom extensions with the machine agent.
- Network visibility.