AI Con USA 2025 - Data Engineer
Customize your AI Con USA 2025 experience with sessions covering data engineering.
Monday, June 9
Beginning Data Analysis and Machine Learning with Jupyter Notebooks
In this beginner-friendly workshop, you'll see how you can get started with data analytics and data science using Jupyter Notebooks. Matt will start with the basics of notebooks and then move on to using Python, Pandas, and NumPy to perform basic exploratory data analysis. See how you can use Plotly Express to create interactive charts and visuals with only a minimal amount of code. Once you've grasped the basics of understanding and visualizing the data, Matt will move on to machine learning with scikit-learn as you train and evaluate predictive regression and classification models. The...
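For a preview of that workflow, here is a minimal sketch along the same lines; the "housing.csv" file and its columns (sqft, bedrooms, price) are hypothetical stand-ins for whatever dataset the workshop uses.

```python
# Minimal sketch of the workshop workflow: explore with pandas, chart with
# Plotly Express, then train and evaluate a scikit-learn regression model.
# "housing.csv" and its columns (sqft, bedrooms, price) are hypothetical.
import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("housing.csv")
print(df.describe())                        # quick exploratory summary

px.scatter(df, x="sqft", y="price").show()  # interactive chart with minimal code

X, y = df[["sqft", "bedrooms"]], df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("Held-out R^2:", r2_score(y_test, model.predict(X_test)))
```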
Get Your Data Ready for AI/ML
Understanding the readiness of your source data before you launch an expensive AI/ML project lets you take corrective data engineering measures that will streamline the project and give you the best probability of a successful outcome. Artificial Intelligence (AI) and Machine Learning (ML) projects can provide significant returns on investment when they are applied to narrow but difficult business problems and are supported by adequate amounts of relevant, quality data. Many such projects start with high hopes but get derailed due to fundamental problems with source data, which were...
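As a rough illustration of that kind of readiness check (not the speaker's specific method), a short pandas profile of a hypothetical source extract can surface missing values, duplicates, and cardinality issues before the project starts.

```python
# Illustrative readiness profile (not the session's specific method):
# quantify missing values, duplicates, and cardinality before building models.
import pandas as pd

df = pd.read_csv("source_extract.csv")  # hypothetical source extract

profile = pd.DataFrame({
    "null_fraction": df.isna().mean(),      # share of missing values per column
    "unique_values": df.nunique(),          # cardinality per column
    "dtype": df.dtypes.astype(str),
})
print("Duplicate rows:", df.duplicated().sum())
print(profile.sort_values("null_fraction", ascending=False))
```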
Introduction to RAG Applications: Building Conversational AI for Domain-specific Search
This beginner-friendly workshop introduces participants to the fundamentals of Retrieval-Augmented Generation (RAG) applications. Using a pre-configured Docker environment featuring Python, Elasticsearch for vector storage, and OpenAI as the LLM, attendees will learn how to build a RAG-powered conversational portal. Throughout the session, participants will create a RAG application to consume and query a sample dataset of Washington State regulation documents. Replace these sample documents with your own PDF files, and you’ll interact with your data in no time! By the end, attendees will...
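A minimal sketch of the retrieve-then-generate loop such an application performs is shown below; the index name "regulations", its "embedding" and "text" fields, and the model names are illustrative assumptions, and an OpenAI API key is required.

```python
# Minimal RAG sketch: embed a question, retrieve nearest chunks from
# Elasticsearch, and let the LLM answer only from that context.
# Index and field names are illustrative assumptions.
from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")
client = OpenAI()

question = "What are the signage requirements for food trucks?"
query_vec = client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

hits = es.search(
    index="regulations",
    knn={"field": "embedding", "query_vector": query_vec, "k": 3, "num_candidates": 50},
)
context = "\n\n".join(h["_source"]["text"] for h in hits["hits"]["hits"])

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```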
Tuesday, June 10
Clean Your Filthy RAGS! Optimizing, Accelerating, and Evaluating RAG Applications
Retrieval-Augmented Generation (RAG) applications are becoming essential for companies, combining AI with real-time data retrieval to enhance customer experiences. While Large Language Models (LLMs) handle general conversation well, they struggle with domain-specific, up-to-date information, often producing inaccurate or unhelpful responses. This workshop will empower participants with the necessary skills to optimize RAG applications using existing best practices. Justin will walk through integrating RAGAS, a framework designed to evaluate, monitor, and fine-tune the performance of RAG...
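As a hedged illustration of what a RAGAS integration can look like (exact metrics and dataset schema vary by version), the sketch below scores a single captured question/answer trace and assumes an OpenAI key for the default judge model.

```python
# Hedged RAGAS sketch: score one captured RAG trace on faithfulness and
# answer relevancy. Assumes the classic question/answer/contexts schema and
# an OPENAI_API_KEY for the default judge LLM.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

traces = Dataset.from_dict({
    "question": ["What permits does a food truck need?"],
    "answer": ["A mobile food unit permit from the county health department."],
    "contexts": [["Mobile food units must obtain a permit from the local health department."]],
})

scores = evaluate(traces, metrics=[faithfulness, answer_relevancy])
print(scores)  # aggregate metric scores to track across pipeline changes
```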
Common Software, Data, and AI Architectures and the Ways They Fail
This tutorial will examine the complexity that lives in software systems, data ingestion workflows, MLOps pipelines, and artificial intelligence systems. This session blends together cloud architecture, quality assurance, risk management, and security mindsets as Matt Eland explores how modern systems are structured, the problems their complexity helps us solve, and the ways these systems can break - or be broken. The session will alternate between interactive lectures with practical illustrations and group exercises around case studies as you explore how existing systems can fail and what...
AI Deep Dive: Exploring AWS Using Real-World Scenarios
Deepen your AI and machine learning expertise using AWS in an immersive, hands-on workshop. You’ll tackle real-world AI challenges while leveraging AWS services like Amazon SageMaker, Bedrock, and Lambda to build and optimize AI-driven solutions. As the session unfolds, new constraints and data anomalies will emerge, mirroring the complexities of real-world AI/ML implementation. Gain insight into how AI solutions perform under evolving conditions, learning to adapt, optimize, and troubleshoot unexpected challenges. Learn the importance of collaboration, strategic thinking, problem-solving,...
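By way of illustration, and not the workshop's exact lab code, a single Amazon Bedrock call from Python looks roughly like this; the model ID and region are assumptions and must be enabled in your AWS account.

```python
# Illustrative Amazon Bedrock call via boto3's bedrock-runtime Converse API.
# The model ID and region are assumptions; use ones enabled in your account.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the anomalies in last week's sensor data."}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```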
Wednesday, June 11
Best Practices for Using AI for Structured Data Extraction
Structured data extraction, or data tagging, is one of the easiest and most impactful applications of modern AI. Before the wide availability of pre-trained AI models, the process of “understanding” unstructured data either required constructing complex heuristic logic or investing in a machine learning team that could train models in-house. Now, cheap and powerful tagging machines are an API call away, redefining what is possible for how we can understand our data. In this talk, I'll share how we’ve used AI at Yahoo News to improve our content understanding pipelines. Yahoo was a pioneer in...
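The pattern behind such tagging machines, shown generically rather than as the Yahoo News pipeline, is a single structured-output API call; the tag schema below is purely illustrative.

```python
# Generic structured-extraction ("tagging") sketch with a hosted LLM API.
# The tag schema and model are illustrative, not the pipeline from the talk.
import json

from openai import OpenAI

client = OpenAI()
article = "Seattle's city council approved new zoning rules for waterfront development on Tuesday."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # force a machine-readable tag payload
    messages=[
        {"role": "system",
         "content": "Extract JSON with keys: topic, locations (list), entities (list), sentiment."},
        {"role": "user", "content": article},
    ],
)
tags = json.loads(response.choices[0].message.content)
print(tags)
```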
The Impact of AI on Developer Productivity
Generative AI tools hold promise to increase human productivity. In the world of software development, GitHub Copilot was one of the first practical applications of generative AI to support developer productivity. However, measuring software productivity is non-trivial. For example, developer productivity gains are about more than just producing code faster. If the resulting artifacts don’t meet quality standards or bring their own cost-efficiency challenges, then there may not be much overall improvement. To truly understand the benefits and challenges of AI-powered copilots requires real-...
Building Reliable AI Agent Flows
AI agents are revolutionizing how we interact with software, but their reliability remains a critical challenge. In this talk, Jason Arbon explores the key principles and techniques for designing AI agent flows that are predictable, testable, and robust. Attendees will learn how to structure AI-driven interactions to minimize failure points, leverage validation mechanisms, and integrate automated testing strategies to ensure smooth execution. Drawing from real-world applications and lessons learned in AI-driven software testing, this session will provide practical insights for engineers,...
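One concrete validation pattern in this spirit (an illustration, not necessarily the speaker's example) is to schema-check every agent step and retry on failure; call_agent_step below is a hypothetical stand-in for the actual LLM call.

```python
# Validation-and-retry sketch for an agent step, using pydantic to reject
# malformed output instead of letting a bad step cascade downstream.
# call_agent_step is a hypothetical stand-in for your LLM call.
from pydantic import BaseModel, ValidationError


class ToolCall(BaseModel):
    tool: str
    arguments: dict


def call_agent_step(prompt: str) -> str:
    """Hypothetical LLM call that is supposed to return a JSON ToolCall."""
    raise NotImplementedError


def safe_agent_step(prompt: str, max_retries: int = 3) -> ToolCall:
    for _ in range(max_retries):
        raw = call_agent_step(prompt)
        try:
            return ToolCall.model_validate_json(raw)  # explicit, testable failure point
        except ValidationError as err:
            prompt = (f"{prompt}\n\nYour last reply was invalid "
                      f"({err.error_count()} errors). Return valid JSON.")
    raise RuntimeError("Agent failed to produce a valid tool call")
```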
Testing Powered by AI/ML Synthetic Test Data: A Game Changer
The session covers a testing approach that utilizes a new-age Test Data Management (TDM) technique that learns from production data. Machine learning analyzes the data to create a model capable of generating synthetic data with the same attributes as the source data. Multiple learning iterations refine the model, enhancing its accuracy with each cycle. The data model incorporates security measures like differential privacy, enabling safe movement to lower environments. Later, generative AI leverages this model to produce desired volumes of test data for various testing types, including...
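The session does not name a specific toolkit, but as one open-source illustration, the SDV library can learn a model from a production-shaped table and sample synthetic rows; the differential-privacy hardening described above would be an additional step.

```python
# One open-source illustration (SDV), not the session's named tooling:
# learn a model of a production-shaped table and sample synthetic test rows.
# Note: this sketch omits the differential-privacy step described above.
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

real = pd.read_csv("customers_sample.csv")      # hypothetical production-shaped sample

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)            # infer column types from the data

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(real)                           # learn distributions and correlations

synthetic = synthesizer.sample(num_rows=1_000)  # generate test data at the desired volume
synthetic.to_csv("synthetic_test_data.csv", index=False)
```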
Navigating AI Governance: Building Trust in a Regulated Future
As artificial intelligence systems increasingly influence critical decisions across industries, ensuring compliance with evolving governance and regulatory standards is both a challenge and a necessity. This presentation will explore the complexities of AI governance, focusing on balancing innovation with compliance in a rapidly changing regulatory environment. Vijay Panwar will examine real-world challenges such as bias mitigation, data privacy adherence, and ethical transparency, providing actionable strategies to design AI systems that comply with global standards like GDPR and emerging...
Five Ways to Operationalize AI at Scale
Enterprises often struggle with how to incorporate AI and machine learning in a repeatable, sustainable manner. In today’s competitive landscape, AI is no longer just a trend but a necessity. Generative AI has quickly become an essential tool for businesses—and with companies expected to explain its usage and justify any lack thereof—organizations are looking for ways to leverage AI at scale. This keynote provides a strategic roadmap to unlock the full potential of AI within organizations, focusing on cultural readiness, enablement, ethical considerations, and addressing biases. Attendees...
AI for Real People—Panel
We have heard a LOT of hype about AI and AGI and ASI and all of the nonsense. But in year 3 of Generative AI, people are actually moving beyond talking and finding real value in AI. Come and hear from REAL AI implementers about how AI is having an impact on businesses across the board including small businesses, non-profits, big companies, and more.
Thursday, June 12
AWS Public Health Modernization: Leveraging GenAI for Government Innovation
Join Venkata Kampana, Senior Solutions Architect from the AWS Health and Human Services team, and Tim Collinson, the CTO of 11:59, an AWS consulting partner, for an insightful discussion on transforming public health systems across federal, state, and local governments. This session will showcase real-world implementations of GenAI and AWS technologies that are revolutionizing public health operations. They will demonstrate innovative solutions, including their IDP implementation utilizing Bedrock's Data Automation (BDA) feature with confidence scoring and bounding box capabilities,...
AI at Scale: Balancing Innovation, Governance, and Risk in Large Organizations
As AI reshapes industries, large organizations must scale innovation while upholding governance, security, and ethical responsibility. Deploying AI at scale isn’t just a technical challenge—it’s a strategic balancing act between agility and compliance, risk and reward. Andreas Bohman, CIO at the University of Washington, will discuss strategies to drive AI-powered innovation without compromising regulatory obligations, operational effectiveness, or public trust. He will share governance strategies that enable innovation rather than restrict it. He’ll also talk about addressing critical...
RAG Has Evolved - Enhance Your RAG Pipeline with These Concepts
The majority of businesses today can set up a fundamental RAG pipeline that effectively handles most use cases. However, this basic setup eventually reaches its limitations in terms of functionality and accuracy, hindering further advancements. Matt Payne aims to detail the necessary pipeline components for building advanced RAG pipelines. For each component, he will explain the what, when, why, and how and provide real-world examples. Key areas of focus include leveraging tools and function calling, which enables you to create a systematic approach to using knowledge from multiple sources...
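As a hedged example of that tool/function-calling building block, the sketch below asks a model to route a question to a hypothetical search_product_docs tool rather than answer directly.

```python
# Tool/function-calling sketch: let the model route a question to one of
# several knowledge sources. Tool names and schema are illustrative assumptions.
import json

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_product_docs",
        "description": "Search internal product documentation.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "How do I rotate the API keys for the billing service?"}],
    tools=tools,
)

# The model may also answer directly; a real pipeline would check for that case.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))  # dispatch to your retriever here
```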