

Towards effective performance diagnosis for distributed applications

Abstract

Cloud computing provides elastic, on-demand resources that allow distributed applications to deliver high-quality services. However, the dynamism of the underlying cloud infrastructure and the complex dependencies between services give rise to abnormal performance phenomena, e.g., performance degradation, which severely affect service quality and user experience. Keeping the services of an application continuously operational therefore requires performance diagnosis systems that detect performance anomalies, such as slow response times, and localize their root causes. Such systems have been studied in recent years. A typical performance diagnosis system comprises components for collecting and pre-processing monitoring data, detecting performance anomalies, and localizing root causes. The data collection and pre-processing components reduce noise in the monitoring data and make it available to the anomaly detection component, which diagnoses the system using, e.g., statistical or machine learning methods. To be effective, anomaly detection must be accurate, robust to the different data distributions found in real scenarios, and predictive, so that potential application violations can be prevented. The root cause localization component aims to accurately identify the underlying causes of performance anomalies, such as resource-related metrics of faulty services. However, the large number of anomalous metrics and the complex paths along which anomalies propagate make the root cause difficult to determine. To tackle these challenges, we first review, from a technical perspective, the state-of-the-art research and methods for building a reliable performance diagnosis system. Furthermore, we propose a comprehensive performance diagnosis system that can effectively detect performance anomalies and localize their root causes, providing actionable insights to operators.
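The three-component structure described above (pre-processing, anomaly detection, root cause localization) can be illustrated with a toy sketch. This is not the system proposed in the work. It assumes per-service latency series, a known service call graph, a trailing moving average as the noise-reduction step, a z-score threshold against a baseline window as the statistical detector, and a simple localization heuristic in which anomalies are taken to propagate from a called service to its caller, so an anomalous service with no anomalous dependencies is treated as a root-cause candidate. All function names and example data are hypothetical.

```python
from statistics import mean, stdev


def moving_average(series, window=3):
    """Pre-processing: smooth noise with a trailing moving average."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def is_anomalous(series, train_len, threshold=3.0):
    """Detection: True if any point after the baseline window deviates from
    the baseline mean by more than `threshold` standard deviations."""
    baseline = series[:train_len]  # needs at least two points for stdev
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline
    return any(abs(x - mu) / sigma > threshold for x in series[train_len:])


def localize_root_causes(anomalous, calls):
    """Localization heuristic (assumption): an anomalous service none of whose
    callees is anomalous is reported as a root-cause candidate."""
    return sorted(
        s for s in anomalous
        if not any(callee in anomalous for callee in calls.get(s, []))
    )


def diagnose(latency, calls, train_len=6, threshold=3.0, window=3):
    """Run pre-processing, detection, and localization on each service."""
    anomalous = {
        service for service, series in latency.items()
        if is_anomalous(moving_average(series, window), train_len, threshold)
    }
    return anomalous, localize_root_causes(anomalous, calls)


# Hypothetical example: a slow database makes "cart" and "frontend" slow too.
EXAMPLE_LATENCY = {
    "frontend": [100, 102, 98, 101, 99, 100, 250, 260, 255],
    "cart": [50, 51, 49, 50, 52, 50, 200, 210, 205],
    "db": [10, 11, 9, 10, 10, 11, 150, 160, 155],
    "catalog": [30, 31, 29, 30, 30, 31, 30, 29, 31],
}
# Each service maps to the services it calls.
EXAMPLE_CALLS = {"frontend": ["cart", "catalog"], "cart": ["db"],
                 "db": [], "catalog": []}

if __name__ == "__main__":
    anomalous, roots = diagnose(EXAMPLE_LATENCY, EXAMPLE_CALLS)
    print(sorted(anomalous))  # ['cart', 'db', 'frontend']
    print(roots)              # ['db']
```

In this example the latency spike in `db` propagates to `cart` and `frontend`, so all three are flagged, but only `db` has no anomalous dependency and is reported as the candidate. Real systems face exactly the difficulty the abstract names: many anomalous metrics and complex propagation paths, which a single-hop heuristic like this one does not handle.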

International Migration, Integration and Social Cohesion online publications
Last time updated on 26/10/2023