Engineering AI Infrastructure for Efficient Inference at Scale
As AI models grow in complexity and scale, inference efficiency has emerged as a critical engineering challenge for enterprise deployment. Infrastructure built for training workloads often fails to meet the latency, throughput, and cost demands of large-scale inference. In this session, Sandeep shares practical insights from engineering AI infrastructure at Broadcom, focusing on end-to-end optimization of the compute, networking, and storage subsystems. The talk explores techniques such as dynamic workload placement, adaptive batching, model quantization, and software-level optimization for maximizing inference performance. Attendees will learn how to architect resilient, scalable inference platforms that integrate with enterprise systems while reducing operational costs. The session concludes with key lessons on balancing innovation with efficiency, equipping teams to design infrastructure ready for the next generation of AI workloads.
Sandeep Kaipu is an Engineering Manager at Broadcom with 20 years of experience in software engineering leadership across AI infrastructure, cloud platforms, and enterprise systems. At Broadcom, he leads teams building scalable AI and cloud infrastructure for enterprise deployments. Previously at VMware, he directed multiple R&D initiatives for the VMware Cloud Foundation and Workspace ONE platforms. His career also spans key engineering roles at Nokia, Samsung, and IBM, where he contributed to large-scale distributed systems and enterprise-grade software. Sandeep holds a Master’s in Software Systems from BITS Pilani. He is the author of the book AI Engineering Leadership and is a recognized speaker on AI infrastructure and engineering leadership.