Talk
Fine-tuning talks often stop at “use LoRA” or “use RL,” but practitioners still need an answer to a more useful question: what post-training recipe should I choose for my data, my task, and my GPU budget? This session gives a practical answer by comparing supervised fine-tuning (SFT), preference optimization (DPO, with a brief ORPO note), and reasoning-oriented reinforcement learning (GRPO) for open-weight LLMs.
A reasoning-focused example makes the trade-offs explicit: SFT learns from demonstrations, DPO from preference pairs, and GRPO from verifiable rewards. The talk introduces the core ideas and a few lightweight formulas, then moves into hands-on Python implementations.
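To give a flavor of the "lightweight formulas" mentioned above, here is a minimal sketch of the three training signals in plain Python. It is not taken from the talk materials. The function names, the default `beta`, and the `eps` constant are illustrative assumptions, and the GRPO part shows only the commonly described group-relative advantage (reward minus group mean, divided by group standard deviation), not a full training loop.

```python
import math
from statistics import mean, pstdev


def sft_loss(token_logps):
    """SFT: average negative log-likelihood of the demonstration tokens."""
    return -mean(token_logps)


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    beta=0.1 is an illustrative default, not a recommendation.
    """
    # How much more the policy prefers "chosen" over "rejected"
    # than the frozen reference model does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # -log(sigmoid(beta * margin)) == softplus(x), computed stably.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for several completions of one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumed constant) avoids division by zero when all rewards match.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With a verifiable reward such as "1 if the final answer is correct, else 0", `grpo_advantages` pushes the policy toward the completions that beat their own group's average, which is why no separate value model is needed in this sketch.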
A major focus is computational cost. Rather than speaking abstractly about “small” and “large” models, the session maps model sizes to practical hardware tiers, explaining what is realistic for SFT, DPO, and GRPO in each setting. A reproducible GitHub repository with scripts, configurations, evaluation notebooks, and a cost sheet will be provided.
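As a rough illustration of the kind of feasibility arithmetic this involves, the sketch below estimates the memory needed for model states only. It is not the talk's cost sheet. It assumes the widely cited figure of about 16 bytes per trainable parameter for full fine-tuning with Adam in mixed precision (half-precision weights and gradients plus 32-bit master weights and two optimizer moments), and 2 bytes per parameter for each extra frozen half-precision copy, such as a DPO reference model. Activations, KV caches for generation, and framework overhead are deliberately ignored, and the function name and parameters are hypothetical.

```python
GIB = 1024 ** 3


def model_state_gib(n_params,
                    bytes_per_trainable_param=16,
                    frozen_copies=0,
                    bytes_per_frozen_param=2):
    """Estimate memory for model states only, in GiB.

    Assumptions (not from the talk): 16 bytes/param for full fine-tuning
    with Adam in mixed precision; 2 bytes/param per frozen half-precision
    copy. Activations and generation caches are not included.
    """
    trainable_bytes = n_params * bytes_per_trainable_param
    frozen_bytes = n_params * frozen_copies * bytes_per_frozen_param
    return (trainable_bytes + frozen_bytes) / GIB


# Example: full SFT versus DPO with one frozen reference model, 7e9 params.
sft_gib = model_state_gib(7e9)
dpo_gib = model_state_gib(7e9, frozen_copies=1)
print(f"SFT states: {sft_gib:.0f} GiB, DPO states: {dpo_gib:.0f} GiB")
```

Parameter-efficient methods such as LoRA change the picture by making only a small fraction of parameters trainable, which a fuller estimate would model separately.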
Attendees should be comfortable with Python and have basic familiarity with transformers and PyTorch, but prior RLHF experience is not required. They will leave with a clear framework for selecting methods, understanding feasibility, and starting post-training in a practical and resource-aware way.
About the Speaker
Aram Butavyan is a lecturer at the Akian College of Science and Engineering (ACSE) at the American University of Armenia and a researcher at the Engineering Research Center, where he focuses on the intersection of academic theory and practical industrial application. Specializing in machine learning and deep learning, his research addresses complex challenges within natural language processing, computer vision, and data-driven decision-making.
With a strong technical foundation in informatics and quantitative analysis alongside over fifteen years of research experience, he contributes to innovative projects that transform raw data into actionable insights. This work is supported by various prestigious research grants and involves collaboration with global partners to develop robust, scalable AI solutions.