Iaroslav

Senior Software Engineer

Quantori

Yerevan, Armenia

Talk

GEO Survival Analysis Web Tool
Track: Software Engineering
Duration: 50 minutes
LLMs, Vector Search, Pandas, Python, Statistics

GEO Survival Analysis is a full-stack bioinformatics web application that automates cross-cohort biomarker discovery from the NCBI Gene Expression Omnibus (GEO). It enables researchers to identify genes significantly associated with patient survival outcomes - without writing code or manually curating datasets.

Users submit a natural-language query such as "triple-negative breast cancer overall survival." An LLM-powered ranking engine searches GEO for relevant datasets, scores them by suitability (sample size, clinical annotation quality, platform type), and selects the top candidates. The system then downloads expression matrices, detects data formats with AI-assisted heuristics, maps probe IDs to gene symbols using cached platform annotation files, extracts survival metadata from clinical tables, and runs Cox proportional hazards regression on every expressed gene across every qualifying dataset - fully automatically.
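The probe-to-gene mapping step can be illustrated with a minimal sketch. It assumes that several probes mapping to the same gene symbol are collapsed by averaging, which is only one common convention and is not confirmed as the tool's actual method. The function name `collapse_probes_to_genes`, the probe IDs, and the expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(expression, probe_to_gene):
    """Average probe-level rows into one row per gene symbol.

    expression:    dict of probe_id -> list of per-sample values
    probe_to_gene: dict of probe_id -> gene symbol
    Probes without a mapping are skipped.
    """
    grouped = defaultdict(list)
    for probe_id, values in expression.items():
        gene = probe_to_gene.get(probe_id)
        if gene:
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across a gene's probes.
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical toy data: three samples, four probes, one probe unmapped.
expression = {
    "probe_a": [2.0, 4.0, 6.0],
    "probe_b": [4.0, 6.0, 8.0],
    "probe_c": [1.0, 1.0, 1.0],
    "probe_x": [9.0, 9.0, 9.0],
}
probe_to_gene = {"probe_a": "TP53", "probe_b": "TP53", "probe_c": "BRCA1"}

gene_matrix = collapse_probes_to_genes(expression, probe_to_gene)
```

Other collapse rules, such as keeping the probe with the highest variance, would fit the same interface; the choice affects the downstream regression and should be stated in any Methods text.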

The core value proposition is cross-dataset meta-analysis. Existing tools like KMplot, GEPIA2, and OncoLnc are restricted to curated TCGA/GTEx cohorts and analyze one study at a time. GEO Survival Analysis searches the entire GEO archive (thousands of studies) and ranks genes by how consistently they reach significance across independent cohorts. A gene flagged in eight independent datasets carries far more evidential weight than a gene flagged in only one study, however large.
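The consistency ranking described above can be sketched as a simple count of the distinct datasets in which a gene passes a significance threshold. This is an illustration under stated assumptions, not the tool's confirmed scoring: the threshold `alpha=0.05`, the function name `rank_by_consistency`, and the cohort and gene labels are all hypothetical, and no multiple-testing correction is applied here.

```python
def rank_by_consistency(results, alpha=0.05):
    """Rank genes by the number of distinct datasets where p < alpha.

    results: iterable of (dataset_id, gene, p_value) tuples.
    Returns a list of (gene, n_significant_datasets), most consistent first;
    ties are broken alphabetically by gene name.
    """
    hits = {}
    for dataset_id, gene, p_value in results:
        if p_value < alpha:
            hits.setdefault(gene, set()).add(dataset_id)
    counts = [(gene, len(datasets)) for gene, datasets in hits.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Hypothetical per-dataset p-values for three placeholder genes.
results = [
    ("cohort_1", "GENE_A", 0.001),
    ("cohort_2", "GENE_A", 0.020),
    ("cohort_3", "GENE_A", 0.040),
    ("cohort_1", "GENE_B", 0.030),
    ("cohort_2", "GENE_B", 0.400),
    ("cohort_1", "GENE_C", 0.700),
]

ranking = rank_by_consistency(results)
```

A raw count treats a borderline result the same as a very strong one, so in practice it would normally be read alongside pooled effect sizes rather than on its own.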

Results are presented through interactive visualizations: Kaplan-Meier survival curves with confidence intervals, forest plots with pooled hazard ratios and I² heterogeneity statistics, and a volcano plot mapping effect size against significance. Users can filter to COSMIC cancer driver genes, supply custom gene lists (up to 500 symbols), export CSV tables and PNG plots, generate publication-ready ZIP packages with a pre-written Methods section, and share results via permanent links. An integrated AI chat assistant provides biological interpretation and follow-up analysis.
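For readers unfamiliar with the numbers behind a forest plot, the following sketch shows one standard way to compute a pooled hazard ratio and the I² statistic. It assumes fixed-effect inverse-variance pooling on the log hazard ratio scale and I² derived from Cochran's Q; whether the tool uses a fixed-effect or random-effects model is not stated in this abstract. The function name `pool_hazard_ratios` and the input values are hypothetical.

```python
import math


def pool_hazard_ratios(log_hrs, std_errors):
    """Fixed-effect inverse-variance pooling of per-dataset log hazard ratios.

    log_hrs:    per-dataset log hazard ratios
    std_errors: matching standard errors of the log hazard ratios
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_log_hr = sum(w * y for w, y in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled_log_hr) ** 2 for w, y in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    # I^2 is the share of variation beyond chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled_hr": math.exp(pooled_log_hr),
        "ci_low": math.exp(pooled_log_hr - 1.96 * pooled_se),
        "ci_high": math.exp(pooled_log_hr + 1.96 * pooled_se),
        "q": q,
        "i_squared": i_squared,
    }


# Hypothetical inputs: two datasets with log HRs 0.0 and 1.0, each with SE 0.5.
example = pool_hazard_ratios([0.0, 1.0], [0.5, 0.5])
```

With these made-up inputs both datasets get equal weight, so the pooled log hazard ratio is 0.5, Q is 2.0 on one degree of freedom, and I² comes out at 50%.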

The technology stack comprises FastAPI (Python 3.13+) with lifelines, pandas, and scipy on the backend; React 18 with TypeScript, Redux Toolkit, and Recharts on the frontend; PostgreSQL with pgvector for persistence and RAG-based chat; and LLM integration via LangChain with Mistral and Anthropic Claude models. The application is deployed as a single Docker container on Railway with persistent volumes for dataset caches and platform mappings.

About the Speaker

Short CV
Results-driven Senior Software Engineer with 10+ years of experience, specializing in Python, data engineering, and scientific visualization. Proven track record at globally recognized companies, including Siemens and contract work for several top-10 US pharmaceutical corporations, delivering full-stack analytical platforms and ML-powered pipelines. Published researcher and named inventor on a US patent. Confident in English (C1).
Motivation
I would like to present my bioinformatics pet project: its purpose, the difficulties I have run into, the AI tools I use, and how I plan to promote the project with the help of AI tools.

Recording

Video will be available after the conference.