
Wednesday, June 11, 2025 - 8:30am to 9:30am

Best Practices for Using AI for Structured Data Extraction

Structured data extraction, or data tagging, is one of the easiest and most impactful applications of modern AI. Before pre-trained AI models became widely available, “understanding” unstructured data required either constructing complex heuristic logic or investing in a machine learning team that could train models in-house. Now, cheap and powerful tagging machines are an API call away, redefining how we can understand our data. In this talk, I’ll share how we’ve used AI at Yahoo News to improve our content understanding pipelines. Yahoo was a pioneer in applying machine learning to large-scale content understanding problems, giving us a fascinating case study for comparing systems built with traditional ML approaches to solutions built on modern AI techniques. (Preview: LLM-only solutions are not always the best tool for the job.) I’ll share our successes, our failures, how the hell to evaluate these things, and what we’ve learned along the way.

Erica Greene
Yahoo News

Erica Greene is a Director of Engineering at Yahoo, where she works on the personalized recommendation systems that power Yahoo News. Erica has worked in the tech industry for 15 years at companies including Etsy, The New York Times, and Google Jigsaw. She writes a weekly-ish newsletter about AI and the media called Machines on Paper.