PyData & PyCon Yerevan 2026

Talk

Building Multimodal Similarity Search with Gemini

Track: Software Engineering
Duration: 25 minutes

Vector Search, Multimodal AI, Python, Embeddings, Similarity Search

Traditional search engines rely heavily on exact keyword matching, often failing to capture the true semantic intent behind user queries. To build next-generation applications, developers must bridge the gap between text, images, and other unstructured media.

In this practical, code-driven session, we will demystify multimodal embeddings: the core technology that maps different data types into a shared vector space where semantic similarity can be computed. We will explore how Google's Gemini Multimodal API can be used to represent both visual and textual information in a form that algorithms can compare directly.
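To make the idea of a shared vector space concrete, here is a minimal sketch of how two embeddings can be compared with cosine similarity. It is not taken from the talk: the three-dimensional vectors are hand-written placeholders standing in for real embeddings, which an embedding model would produce and which typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumed values, not model output).
text_dog = [0.9, 0.1, 0.0]   # stand-in for the text "a dog"
image_dog = [0.8, 0.2, 0.1]  # stand-in for a photo of a dog
image_car = [0.0, 0.2, 0.9]  # stand-in for a photo of a car

print(cosine_similarity(text_dog, image_dog))  # close to 1: similar meaning
print(cosine_similarity(text_dog, image_car))  # close to 0: unrelated
```

Because text and images share one space in this setup, the same function can score text against images, images against images, or text against text.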

Step by step, we will build a fully functional similarity search system in Python. We will cover:
The Foundation: How multimodal embeddings represent diverse data formats in a unified vector space.
The Pipeline: Connecting to the Gemini API, generating embeddings, and storing them for rapid retrieval.
The Search: Executing complex, multi-faceted queries (such as matching an image with a descriptive text modifier).
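The store-and-search steps above can be sketched as a small in-memory index. This is an illustration under stated assumptions, not the speaker's implementation: the embeddings are hand-written placeholders rather than Gemini API output, `VectorIndex` and `combine` are hypothetical names, and blending an image vector with a text-modifier vector by weighted sum is only one possible way to express an "image plus text" query.

```python
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class VectorIndex:
    """Hypothetical brute-force index: stores unit vectors, scans all on search."""

    def __init__(self):
        self._items = []  # list of (item_id, unit_vector)

    def add(self, item_id, vector):
        self._items.append((item_id, normalize(vector)))

    def search(self, query, top_k=3):
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id)
            for item_id, vec in self._items
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(item_id, score) for score, item_id in scored[:top_k]]


def combine(image_vec, text_vec, text_weight=0.5):
    """Blend an image embedding with a text modifier (assumed weighted sum)."""
    img, txt = normalize(image_vec), normalize(text_vec)
    return [(1 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]


# Placeholder embeddings; the toy dimensions loosely mean (shoe, red, blue).
index = VectorIndex()
index.add("red_sneaker.jpg", [0.9, 0.8, 0.0])
index.add("blue_sneaker.jpg", [0.9, 0.0, 0.8])
index.add("red_car.jpg", [0.0, 0.9, 0.1])

query_image = [0.9, 0.0, 0.8]    # stand-in for a photo of a blue sneaker
text_modifier = [0.0, 1.0, 0.0]  # stand-in for the text "in red"

results = index.search(combine(query_image, text_modifier), top_k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

With these toy values, the image alone retrieves the blue sneaker first, while the blended query ranks the red sneaker first. A linear scan like this is fine for a demo; larger collections would normally use a vector database or an approximate nearest-neighbor index instead.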

Whether you are an AI enthusiast, a backend developer, or a data engineer, you will walk away with a solid understanding of vector embeddings, practical Python code templates, and the architectural knowledge to deploy multimodal search in your own projects.

About the Speaker

Biography coming soon.

Recording

Video will be available after the conference.

Marton Kodok
