Using GenAI to Advance Azure AIOps
In today’s rapidly evolving digital landscape, maintaining high service quality and reliability is essential. To meet that bar, Microsoft is leveraging cutting-edge technologies such as Generative AI and Large Language Models (LLMs) to improve service reliability and developer productivity. LLMs excel at understanding and reasoning over large volumes of data, identifying patterns and anomalies that might otherwise go unnoticed, and they generalize across a diverse set of tasks and domains, making them invaluable for generating insights and automating complex tasks. By integrating LLMs and GenAI, teams can automate complex workflows such as incident response and root cause analysis, freeing engineers to focus on higher-value work. This presentation covers two AI systems that improve incident response by 1) auto-triaging incoming incidents and routing them to the proper team, and 2) enabling automatic comparison and summarization of unhealthy logs against healthy logs. Sarvani will share her team's learnings from designing and implementing these solutions at scale.
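To make the first system concrete, auto-triage can be framed as a classification task for an LLM: the incident text and the candidate owning teams go into a prompt, and the model's answer is validated before routing. The sketch below is a minimal Python illustration of that pattern only; the function names, team names, and the `llm_complete` hook are hypothetical assumptions for this example, not the actual Azure implementation.

```python
# Minimal sketch of LLM-based incident auto-triage.
# All names and teams here are illustrative assumptions.

def build_triage_prompt(incident_title, incident_body, teams):
    """Build a classification prompt asking an LLM to pick the owning team."""
    team_list = "\n".join(f"- {t}" for t in teams)
    return (
        "You are an incident triage assistant. Given the incident below, "
        "reply with exactly one team name from this list:\n"
        f"{team_list}\n\n"
        f"Title: {incident_title}\n"
        f"Details: {incident_body}\n"
        "Team:"
    )

def route_incident(incident_title, incident_body, teams, llm_complete):
    """Route an incident using any completion callable `llm_complete(prompt) -> str`."""
    prompt = build_triage_prompt(incident_title, incident_body, teams)
    answer = llm_complete(prompt).strip()
    # Validate the model's answer; fall back to a human triage queue otherwise.
    return answer if answer in teams else "manual-triage"

# Usage with a stubbed model (a real system would call an LLM API here):
teams = ["Networking", "Storage", "Compute"]
stub = lambda prompt: "Storage"
print(route_incident("Disk latency spike", "P99 write latency doubled", teams, stub))
```

The key design point is the validation step: because LLM output is free-form text, any answer outside the known team list is routed to a manual queue rather than trusted blindly.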
Sarvani Sathish Kumar is a Principal TPM Manager at Microsoft with over 20 years of experience in the industry. She leads the AI Operations (AIOps) platform of Azure Core, which detects, diagnoses, mitigates, repairs, prevents, and predicts VM availability issues at Azure fleet scale. She also leads the GenAI/LLM investments to improve developer productivity and service reliability. She is an experienced speaker who has presented at multiple conferences in the US and Europe.