Building a Training Arena for Kubernetes Machine Learning Models

Applied Computing Research Labs Computer Science/Mathematics, 2025–26

Liaison(s): Dr. David Morrison ’08
Advisor(s): Erin Talvitie
Student(s): Cate Lewison (TL-S), Diya Gangwar (TL-F), Bob Gao, Omar Jimenez, Rui Zhang

Applied Computing Research Labs (ACRL) is a leader in building solutions to optimize and manage distributed systems applications. ACRL’s work centers on Kubernetes, the ubiquitous platform that drives modern cloud infrastructure. Despite its ubiquity, Kubernetes remains a complicated ecosystem to debug. Using SimKube, ACRL’s Kubernetes simulator, the Clinic team designed a training arena where reinforcement learning (RL) agents learn to handle resource constraints and system errors. This training arena would make SimKube the only complete platform for training RL agents on Kubernetes simulations.