Date of Award

5-2026

Degree Type

Report

Degree Name

Master of Computer Science (MCS)

Department

Computer Science

Committee Chair(s)

Shuhan Yuan

Committee

Shuhan Yuan
Shah Muhammad Hamdi
Tian Xie

Abstract

This project investigates how a reinforcement learning (RL) agent can develop territorial strategies in a highly stochastic, grid-based approximation of the television game show The Floor. I design a custom Gymnasium-compatible environment that models the show’s core mechanics on a 10×10 board, including probabilistic duels governed by player skill, adjacency-constrained attacks, chain-attack rules, and a Randomizer mechanism for selecting new initiating players. A Maskable Proximal Policy Optimization (Maskable PPO) agent is trained under several reward configurations and evaluated against stochastic non-learning opponents as well as random and “always pass” baselines.
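The adjacency-constrained attack rule pairs naturally with Maskable PPO, which requires a boolean mask over the discrete action space at every step. A minimal sketch of such a mask, assuming one attack action per board cell plus a final "pass" action (the names, ownership encoding, and layout here are illustrative, not the report's actual implementation):

```python
SIZE = 10  # the 10x10 board described in the abstract

def action_mask(owner, me):
    """Return a boolean mask over SIZE*SIZE attack actions (one per
    cell) plus a trailing 'pass' action. A cell is attackable only if
    it is held by an opponent and 4-adjacent to a cell we own."""
    mask = [False] * (SIZE * SIZE + 1)
    for r in range(SIZE):
        for c in range(SIZE):
            if owner[r][c] != me:
                continue  # only our own cells project attacks
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                # neighbor must be on the board and held by an opponent
                if 0 <= nr < SIZE and 0 <= nc < SIZE \
                        and owner[nr][nc] not in (me, None):
                    mask[nr * SIZE + nc] = True
    mask[-1] = True  # passing is always legal
    return mask
```

In a Gymnasium environment this mask would typically be exposed through an `action_masks()` method so the Maskable PPO policy never samples an illegal attack.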

Across experiments, the best-performing configuration achieves a win rate of approximately 5.2%, outperforming the random-agent baseline despite acting under the same duel probability model. Behavioral analysis of this agent reveals a consistent, interpretable strategy: it passes whenever possible, never initiates chain attacks, and almost exclusively targets opponents on edge or corner regions of the board. Rather than favoring weaker opponents by skill, the agent prioritizes attacks that move its territory toward low-exposure regions, thereby reducing the probability of being challenged by multiple neighbors and improving its long-term survival.
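The "low-exposure" preference has a simple geometric reading on a grid: a cell's exposure can be measured as its count of on-board 4-neighbors, i.e. the number of directions an attack could come from. A brief illustrative sketch (this metric is an assumption for exposition, not a quantity defined in the report):

```python
def exposure(r, c, size=10):
    """Number of on-board 4-neighbors of cell (r, c): a proxy for how
    many adjacent cells could host a challenger."""
    return sum(
        1
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if 0 <= r + dr < size and 0 <= c + dc < size
    )

# Corners have exposure 2, edges 3, and interior cells 4, which is why
# drifting toward corners and edges reduces the number of neighbors
# that can initiate a duel.
```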

These findings demonstrate that, even with simplified duel mechanics, a single RL agent can discover robust spatial heuristics in a large, stochastic multi-player environment. The environment and results provide a foundation for future work on more realistic opponent models, richer state representations, and multi-agent training in The Floor and related territorial games.
