Talk
Modern ML systems rarely use one data format end to end. The same sample may start as an object containing various forms of data, including text, images, video, audio, and other bespoke modalities. That object must then be serialized for LLM input, and the resulting tensors must be organized for efficient training and inference.
This talk walks through that evolution step by step. Rather than treating preprocessing as a black box, it shows how different representations serve different purposes: validation and authoring at the boundary, normalization and rendering in the middle, and efficient batching and sequence construction as we approach the model. Along the way, the talk highlights practical design questions: where to enforce which invariants, how tradeoffs change across the sample's lifecycle, how to align temporal and visual information, and how to avoid losing meaning as data becomes more model-friendly.
The broader goal is to share a pattern for building multimodal training data pipelines that are ergonomic for humans and efficient for machines.
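As a rough illustration of the three stages described above (validation at the boundary, rendering in the middle, batching near the model), here is a minimal Python sketch. It is not taken from the talk: the names `Sample`, `Text`, `Image`, `render`, and `batch`, the `<image>` and `<pad>` placeholder tokens, and the whitespace "tokenizer" are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

IMAGE_TOKEN = "<image>"  # hypothetical placeholder for one image patch
PAD_TOKEN = "<pad>"      # hypothetical padding token


@dataclass(frozen=True)
class Text:
    content: str


@dataclass(frozen=True)
class Image:
    path: str
    num_patches: int  # assumed: number of sequence slots the image occupies


Segment = Union[Text, Image]


@dataclass(frozen=True)
class Sample:
    segments: List[Segment]

    def __post_init__(self) -> None:
        # Boundary stage: enforce authoring invariants as early as possible.
        if not self.segments:
            raise ValueError("sample must contain at least one segment")
        for seg in self.segments:
            if isinstance(seg, Image) and seg.num_patches <= 0:
                raise ValueError("image must occupy at least one patch")


def render(sample: Sample) -> List[str]:
    # Middle stage: normalize text and flatten all modalities into one sequence.
    tokens: List[str] = []
    for seg in sample.segments:
        if isinstance(seg, Text):
            tokens.extend(seg.content.lower().split())  # toy tokenizer
        else:
            tokens.extend([IMAGE_TOKEN] * seg.num_patches)
    return tokens


def batch(samples: List[Sample]) -> Tuple[List[List[str]], List[List[int]]]:
    # Model-facing stage: pad to a common length and build an attention mask.
    rendered = [render(s) for s in samples]
    width = max(len(r) for r in rendered)
    padded = [r + [PAD_TOKEN] * (width - len(r)) for r in rendered]
    mask = [[1] * len(r) + [0] * (width - len(r)) for r in rendered]
    return padded, mask


if __name__ == "__main__":
    a = Sample([Text("Describe this"), Image("cat.png", num_patches=2)])
    b = Sample([Text("Hi")])
    padded, mask = batch([a, b])
    print(padded)
    print(mask)
```

In this sketch the invariants live on the authoring-side object, so malformed samples fail before any rendering or batching work is done; a real pipeline would likely replace the toy tokenizer and string tokens with actual token IDs and tensors.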
About the Speaker
Philipp Guevorguian is a machine learning researcher based in Yerevan, Armenia. He is a Member of Technical Staff at Perceptron AI, where he develops state-of-the-art machine learning systems for understanding the physical world.
Previously, he worked at YerevaNN, focusing on deep learning applications in scientific domains, particularly drug design and biomolecular discovery. His work included training specialized language models to accelerate complex chemical analysis.
Earlier in his career, he contributed to statistical modeling for cancer mortality prognostication, conducted meta-analyses of gene expression data, and developed on-device algorithms that convert wearable accelerometer signals into actionable health insights.