PyData & PyCon Yerevan 2026

Talk

From Voice to Data: Designing Real-Time Pipelines on Top of Scraped Sources

Track: Data Science
Duration: 25 minutes

LLMs, Python, Data Science, Data Pipelines

This talk presents a practical approach to building real-time data systems on top of scraped sources, using a voice-driven interface as the motivating example.
Voice interfaces powered by modern AI systems are becoming more common, but they typically rely on structured or pre-indexed data. This creates a gap between how users want to interact (natural language, real-time) and how data is actually available (unstructured, fragile, slow).

This session explores how to bridge that gap.

We will walk through an end-to-end system built in Python that:

Converts voice input into structured queries
Dynamically retrieves data via scraping pipelines (MCP-style abstraction)
Processes and validates incomplete or inconsistent data
Returns responses via text-to-speech under real-time constraints
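
The four stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system presented in the talk: every name here (Query, parse_transcript, scrape, validate, answer) is hypothetical, the keyword parser stands in for the speech-to-text and LLM step, the scrape coroutine stands in for a real scraping tool behind an MCP-style interface, and the returned string is what would be handed to a text-to-speech engine. Only the standard library is used.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """Structured form of a spoken request (hypothetical schema)."""
    topic: str
    city: Optional[str] = None


def parse_transcript(transcript: str) -> Query:
    """Stand-in for the speech-to-text + LLM step: a trivial keyword parser."""
    words = transcript.lower().rstrip("?.!").split()
    city = None
    if "in" in words and words.index("in") + 1 < len(words):
        city = words[words.index("in") + 1]
    topic = "weather" if "weather" in words else "unknown"
    return Query(topic=topic, city=city)


async def scrape(query: Query) -> list[dict]:
    """Stand-in for a scraping tool; returns messy records on purpose."""
    await asyncio.sleep(0.05)  # simulated network latency
    return [
        {"city": query.city, "temp_c": None},   # incomplete record
        {"city": query.city, "temp_c": "n/a"},  # inconsistent record
        {"city": query.city, "temp_c": "21"},   # usable record
    ]


def validate(records: list[dict]) -> list[dict]:
    """Keep only records whose temperature parses as a number."""
    clean = []
    for record in records:
        try:
            clean.append({**record, "temp_c": float(record["temp_c"])})
        except (TypeError, ValueError):
            continue  # drop records that are missing or malformed
    return clean


async def answer(transcript: str, budget_s: float = 1.0) -> str:
    """Run the whole pipeline; the returned text would go to text-to-speech."""
    query = parse_transcript(transcript)
    try:
        # Enforce the real-time constraint: give up if scraping is too slow.
        raw = await asyncio.wait_for(scrape(query), timeout=budget_s)
    except asyncio.TimeoutError:
        return "Sorry, that source is taking too long."
    records = validate(raw)
    if not records:
        return "Sorry, I could not find reliable data."
    city = (query.city or "unknown").title()
    return f"It is {records[0]['temp_c']:.0f} degrees in {city}."


if __name__ == "__main__":
    # prints "It is 21 degrees in Yerevan."
    print(asyncio.run(answer("What is the weather in Yerevan?")))
```

The point of the sketch is the shape, not the parts: the latency budget wraps the slowest and least predictable stage (scraping), and validation runs before any answer is spoken, so a slow or broken source degrades into a short apology rather than a long silence or a wrong value.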

About the Speaker

I’m a Python developer working on data extraction systems and real-time AI applications. My recent work focuses on building scraping pipelines that turn unstructured web data into usable, structured information.

I use LLMs to improve data extraction and interpretation, designing agent-like systems that can handle ambiguous inputs and adapt to changing data sources. This includes building workflows where language models assist in parsing, validating, and structuring scraped data.

In parallel, I’ve been developing voice-driven interfaces using tools like LiveKit, connecting speech-to-text, LLM-based processing, and text-to-speech into interactive assistants.

I’m increasingly interested in combining AI with traditional backend systems, using it not as a replacement but as a layer that makes existing systems more flexible, adaptive, and capable of handling real-world complexity.

Recording

Video will be available after the conference.

Arina Kostanyan
