Using GenAI to Advance Azure AIOps
In today’s rapidly evolving digital landscape, maintaining high service quality is crucial. To achieve this, Microsoft is leveraging cutting-edge technologies such as Generative AI and Large Language Models (LLMs) to enhance service reliability and developer productivity. LLMs excel at understanding and reasoning over large volumes of data, and they generalize across diverse tasks and domains, making them invaluable for generating insights and automating intricate tasks. By integrating LLMs and Generative AI into the complex domains of incident response and root cause analysis, teams can reduce human toil and improve efficiency.

This presentation covers two AI systems developed to improve incident response:

1. Auto-triaging incoming incidents and routing them to the appropriate team.
2. Auto-comparing and summarizing unhealthy logs against healthy logs.

In this session, Sarvani will share her team's learnings from designing and implementing these solutions at scale.
Sarvani Sathish Kumar is a Principal TPM Manager at Microsoft with over 20 years of industry experience. She leads the AI Operations (AIOps) platform of Azure Core, which detects, diagnoses, mitigates, repairs, predicts, and prevents VM availability issues at Azure fleet scale. She also leads GenAI/LLM investments to improve developer productivity and service reliability. She is an experienced speaker, having presented at multiple conferences in the US and Europe.