AI Research | 6/21/2025
AI Models Develop Internal Representations Through Othello Game Study
A study from the University of Copenhagen demonstrates that seven different language models can independently construct internal representations of the board game Othello, supporting the 'world model' hypothesis and challenging the notion that these models merely mimic data patterns.
A recent study by researchers at the University of Copenhagen provides compelling evidence for the 'world model' hypothesis, which posits that large language models (LLMs) can build internal representations of their training environments without explicit instruction. The researchers trained several language models on sequences of moves from the board game Othello and found that the models not only learned to predict legal moves but also independently constructed a model of the 8x8 game board.
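To make the setup concrete, the sketch below shows one plausible way to frame Othello transcripts as a next-token prediction task. It is illustrative only: the study's actual tokenizer, architectures, and hyperparameters are not given in the article, so every name and size here is an assumption.

```python
# Illustrative sketch of the training setup (hypothetical names and sizes;
# not the study's actual code). Each Othello move is one token, a square
# such as "d3"; 60 squares are playable because the four centre squares
# are occupied before the first move.
import torch
import torch.nn as nn

SQUARES = [f"{col}{row}" for row in range(1, 9) for col in "abcdefgh"]
CENTRE = {"d4", "e4", "d5", "e5"}                 # pre-filled at game start
VOCAB = ["<pad>"] + [s for s in SQUARES if s not in CENTRE]
STOI = {tok: i for i, tok in enumerate(VOCAB)}

def encode(moves):
    """Map a move list like ['d3', 'c5', ...] to a batch of token ids."""
    return torch.tensor([[STOI[m] for m in moves]], dtype=torch.long)

class NextMoveModel(nn.Module):
    """Small decoder-only transformer trained to predict the next move token."""
    def __init__(self, vocab_size=len(VOCAB), d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(60, d_model)       # a game has at most 60 moves
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        seq_len = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq_len, device=ids.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.blocks(x, mask=mask.to(ids.device))  # causal attention
        return self.head(h)                           # logits over the next move
```

Trained with cross-entropy against the shifted move sequence, this objective matches the task described in the study: given a prefix of a game, predict the next legal move.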
Background on the World Model Hypothesis
The world model hypothesis asks whether language models go beyond simple prediction of the next word or token and build an underlying model of the reality that generates the data they encounter. Previous experiments, particularly the initial "Othello-GPT" study, hinted at this capability. In that study, a GPT variant was trained solely on Othello move sequences and, despite never being explicitly taught the rules, learned to make valid moves with high accuracy. This indicated that the model had developed an internal representation of the board state, from which the positions of the game pieces could be predicted.
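The probing methodology behind that claim can be sketched as follows: freeze the trained move model, extract its hidden state after each move, and fit a small classifier to recover each square's occupancy. High probe accuracy means the board state is decodable from the activations. The shapes and names below are illustrative assumptions, not the original study's code; a linear probe is shown as the simplest variant.

```python
# A minimal linear-probe sketch in the spirit of the Othello-GPT analysis
# (illustrative shapes; the original work is not reproduced here).
import torch
import torch.nn as nn

class BoardProbe(nn.Module):
    """Predict every square's state (0=empty, 1=black, 2=white) from one hidden vector."""
    def __init__(self, d_model=128, n_squares=64, n_states=3):
        super().__init__()
        self.linear = nn.Linear(d_model, n_squares * n_states)

    def forward(self, hidden):                     # hidden: (batch, d_model)
        return self.linear(hidden).view(-1, 64, 3)

# The language model stays frozen; only the probe is trained.
probe = BoardProbe()
hidden = torch.randn(32, 128)                      # placeholder activations
boards = torch.randint(0, 3, (32, 64))             # placeholder true board states
loss = nn.CrossEntropyLoss()(probe(hidden).view(-1, 3), boards.view(-1))
loss.backward()                                    # fits the probe, not the model
```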
Expanded Research Findings
The new research, led by Yifei Yuan and Anders Søgaard, expands on these findings by evaluating seven different language models, spanning encoder-decoder architectures such as T5 and BART and decoder-only models such as GPT-2, Mistral, and LLaMA-2. The models were trained on two datasets: one consisting of thousands of real championship Othello games and another with millions of synthetically generated games. The primary task was to predict the next legal move in a given sequence.
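The article does not say how the synthetic games were produced. A common approach, sketched below as an assumption, is to play out uniformly random legal moves: this yields unlimited training data while guaranteeing every sequence is rule-consistent.

```python
# Hypothetical synthetic-data generator: random legal Othello games.
import random

DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def new_board():
    """8x8 grid: 0 = empty, 1 = black, 2 = white, with the standard opening."""
    b = [[0] * 8 for _ in range(8)]
    b[3][3], b[4][4] = 2, 2        # white on d4 and e5
    b[3][4], b[4][3] = 1, 1        # black on e4 and d5
    return b

def flips(board, r, c, player):
    """Discs flipped if `player` plays at (r, c); an empty list means illegal."""
    if board[r][c] != 0:
        return []
    other, flipped = 3 - player, []
    for dr, dc in DIRS:
        run, rr, cc = [], r + dr, c + dc
        while 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == other:
            run.append((rr, cc))
            rr, cc = rr + dr, cc + dc
        if run and 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == player:
            flipped += run
    return flipped

def random_game():
    """Play random legal moves until both sides must pass; return the move list."""
    board, player, moves, passes = new_board(), 1, [], 0
    while passes < 2:
        legal = [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, player)]
        if not legal:
            passes += 1
        else:
            passes = 0
            r, c = random.choice(legal)
            for fr, fc in flips(board, r, c, player):
                board[fr][fc] = player
            board[r][c] = player
            moves.append("abcdefgh"[c] + str(r + 1))
        player = 3 - player
    return moves
```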
The results were striking: when trained on the larger dataset, all seven models learned to predict legal next moves, with accuracy reaching 99%. The researchers then used representation alignment tools to analyze the board features learned by each model and found significant similarities across the different architectures. This convergence suggests that the models independently arrived at similar internal representations of the game's structure.
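The article does not name the alignment tool used. Linear centered kernel alignment (CKA) is one standard way to compare activations across architectures with different hidden sizes, and the sketch below shows the idea with placeholder data; treating it as the study's method would be an assumption.

```python
# Linear CKA: a score near 1 means two sets of activations encode the same
# structure up to rotation and scale; near 0 means unrelated representations.
import torch

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (n_samples, dim)."""
    X = X - X.mean(dim=0)                      # centre each feature column
    Y = Y - Y.mean(dim=0)
    hsic = (X.T @ Y).norm() ** 2               # ||X^T Y||_F^2
    return (hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())).item()

# Compare hidden states from two models on the same set of board positions;
# the hidden sizes may differ, which is why CKA is convenient here.
acts_gpt2 = torch.randn(1000, 768)             # placeholder activations
acts_t5 = torch.randn(1000, 512)
print(linear_cka(acts_gpt2, acts_t5))          # near 0 for random data
```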
Implications for AI Understanding
The findings from this study have profound implications for the field of artificial intelligence. The Othello experiments serve as a controlled environment to explore whether LLMs understand the world or merely act as "stochastic parrots" that mimic data patterns. The evidence from the Copenhagen study strongly supports the idea that these models can induce underlying principles from raw sequential data, which could extend to more complex real-world concepts and relationships.
However, the research also highlights limitations, such as the models' struggles to generate complete valid game sequences from scratch and their reliance on extensive datasets to achieve high accuracy. This points to the computational challenges involved in developing these world models.
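One way to quantify that failure mode is to replay a model-generated transcript and count how many leading moves are legal. The sketch below reuses new_board() and flips() from the generation sketch above; the study's evaluation code is not described in the article, and the pass-handling heuristic is a simplifying assumption, since transcripts usually omit explicit passes.

```python
def valid_prefix_length(moves):
    """Number of leading moves in a generated transcript that are legal."""
    board, player = new_board(), 1
    for i, move in enumerate(moves):
        c, r = "abcdefgh".index(move[0]), int(move[1]) - 1
        flipped = flips(board, r, c, player)
        if not flipped:
            # Simplifying assumption: treat an illegal move as an implicit
            # pass and retry for the opponent; reject if still illegal.
            player = 3 - player
            flipped = flips(board, r, c, player)
            if not flipped:
                return i
        for fr, fc in flipped:
            board[fr][fc] = player
        board[r][c] = player
        player = 3 - player
    return len(moves)
```

Dividing the returned length by the full transcript length gives a per-game validity score, which makes the limitation measurable rather than anecdotal.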
Conclusion
The expanded Othello experiment at the University of Copenhagen provides robust evidence that large language models can form internal world models. By demonstrating this capability across a diverse range of models and showing convergence in their learned representations, the research suggests a fundamental emergent property of these systems. While Othello is a simplified environment, it serves as a valuable laboratory for understanding AI mechanics, challenging the perception of LLMs as mere pattern-matchers and prompting a reevaluation of their potential to build and utilize internal models of the world.