
Wednesday, June 11, 2025 - 8:30am to 9:30am

Best Practices for Using AI for Structured Data Extraction

Structured data extraction, or data tagging, is one of the easiest and most impactful applications of modern AI. Before pre-trained AI models were widely available, "understanding" unstructured data required either constructing complex heuristic logic or investing in a machine learning team that could train models in-house. Now, cheap and powerful tagging machines are an API call away, redefining what is possible in how we understand our data. In this talk, I'll share how we've used AI at Yahoo News to improve our content understanding pipelines. Yahoo was a pioneer in applying machine learning to large-scale content understanding problems, which gives us a fascinating case study for comparing systems built with traditional ML approaches against solutions built on modern AI techniques. (Preview: LLM-only solutions are not always the best tool for the job.) I'll share our successes and failures, how the hell to evaluate these things, and what we've learned along the way.