PyData & PyCon Yerevan 2026

Talk

From SFT to GRPO: Fine-Tuning Open LLMs for Reasoning on Real GPU Budgets

Track: Data Science
Duration: 25 minutes

LLMs, Reinforcement Learning, GPU Computing, Open Source, Python

Fine-tuning talks often stop at “use LoRA” or “use RL,” but practitioners still need an answer to a more useful question: what post-training recipe should I choose for my data, my task, and my GPU budget? This session gives a practical answer by comparing supervised fine-tuning (SFT), preference optimization (DPO, with a brief ORPO note), and reasoning-oriented reinforcement learning (GRPO) for open-weight LLMs.

A reasoning-focused example is used to make the trade-offs explicit: SFT from demonstrations, DPO from preference pairs, and GRPO from verifiable rewards. The talk introduces the core ideas and a few lightweight formulas, then moves into hands-on Python implementations.
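To give a feel for what "lightweight formulas" can look like in code, here is a minimal sketch of two of the training signals named above: the DPO loss for one preference pair and the group-relative advantage that GRPO computes from verifiable rewards. It is an illustration under stated assumptions, not the talk's implementation: the function names, the `beta` default, the population standard deviation, and the exact-match reward rule are all hypothetical choices made here.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed sequence log-probabilities under the trainable
    policy and a frozen reference model. beta=0.1 is an assumed default.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def verifiable_reward(completion, expected_answer):
    """Hypothetical verifiable reward: 1.0 if the last whitespace-separated
    token of the completion equals the expected answer, else 0.0."""
    tokens = completion.split()
    return 1.0 if tokens and tokens[-1] == expected_answer else 0.0


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward against the
    other completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt whose correct answer is "42".
group = ["so the answer is 42", "I think 41", "maybe 40", "therefore 42"]
rewards = [verifiable_reward(c, "42") for c in group]
advantages = grpo_advantages(rewards)
```

In this toy group the two correct completions receive positive advantages and the two incorrect ones negative, which is the signal GRPO uses in place of a learned value model. SFT needs no counterpart here: its loss is the usual next-token cross-entropy on the demonstrations.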

A major focus is computational cost. Rather than speaking abstractly about “small” and “large” models, the session maps model sizes to practical hardware tiers, explaining what is realistic for SFT, DPO, and GRPO in each setting. A reproducible GitHub repository with scripts, configurations, evaluation notebooks, and a cost sheet will be provided.
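As a rough illustration of how model size translates into hardware requirements, the sketch below estimates the static GPU memory (weights, gradients, optimizer state) for a few training setups. The bytes-per-parameter figures are common rules of thumb, not numbers from the talk's cost sheet, and the function and dictionary names are hypothetical. Activations, KV cache during generation, LoRA adapter parameters, and framework overhead are deliberately ignored, so real usage will be higher.

```python
# Rule-of-thumb bytes per parameter (assumptions of this sketch).
BYTES_PER_PARAM = {
    # bf16 weights (2) + bf16 grads (2) + fp32 master weights (4)
    # + two fp32 Adam moments (8)
    "full_adam_mixed": 16.0,
    # Frozen base model in bf16, e.g. under LoRA (adapters ignored).
    "frozen_bf16": 2.0,
    # Frozen base model quantized to roughly 4 bits per weight.
    "frozen_4bit": 0.5,
}


def static_memory_gb(params_billion, mode, frozen_reference=False):
    """Approximate static memory in decimal GB.

    frozen_reference=True adds one extra frozen bf16 copy of the model,
    as needed when a separate reference model is kept in memory
    (for example for DPO, or for a KL term in GRPO).
    """
    gb = params_billion * BYTES_PER_PARAM[mode]
    if frozen_reference:
        gb += params_billion * BYTES_PER_PARAM["frozen_bf16"]
    return gb


# Example: a 7B-parameter model under three setups.
full_sft = static_memory_gb(7, "full_adam_mixed")
full_dpo = static_memory_gb(7, "full_adam_mixed", frozen_reference=True)
qlora_base = static_memory_gb(7, "frozen_4bit")
```

Comparing these estimates with the memory of a given GPU gives a first feasibility check before any experiment is run. GRPO additionally has to generate several completions per prompt, so its activation and KV-cache costs, which this sketch leaves out, tend to matter more than for SFT or DPO.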

Attendees should be comfortable with Python and have basic familiarity with transformers and PyTorch, but prior RLHF experience is not required. They will leave with a clear framework for selecting methods, understanding feasibility, and starting post-training in a practical and resource-aware way.

About the Speaker

Aram Butavyan is a lecturer at the Akian College of Science and Engineering (ACSE) at the American University of Armenia and a researcher at the Engineering Research Center, where he focuses on the intersection of academic theory and practical industrial application. Specializing in machine learning and deep learning, his research addresses complex challenges within natural language processing, computer vision, and data-driven decision-making.
With a strong technical foundation in informatics and quantitative analysis alongside over fifteen years of research experience, he contributes to innovative projects that transform raw data into actionable insights. This work is supported by various prestigious research grants and involves collaboration with global partners to develop robust, scalable AI solutions.

Recording

Video will be available after the conference.