OpenAI Model Scores 135 on Mensa IQ Test, Raising AI Intelligence Debate

OpenAI Model Achieves High IQ Score

A recent report indicates that an OpenAI model, identified as "o3" in some analyses, scored 135 on a version of the Mensa IQ test. Some coverage describes that score as "genius"-level, and it has sparked discussion about the evolving reasoning capabilities of artificial intelligence.

Details of the Achievement

According to Tracking AI, whose results were covered by outlets such as Visual Capitalist and Analytics India Magazine, "o3" is described as a text-only model. It outperformed other AI systems on the same test, including multimodal models that process both text and images. For instance, the GPT-4o (Vision) model reportedly scored 63 on the Mensa Norway IQ test.

Implications and Expert Opinions

A score of 135 suggests significant advances in AI's ability to handle abstract problem-solving tasks, which have traditionally been challenging for AI systems. However, experts caution that an impressive score does not equate to human-like intelligence or consciousness. IQ tests measure a specific subset of cognitive abilities, so an AI's performance on them does not necessarily reflect general reasoning ability.

Methodology and Limitations

The score was reportedly calculated as a seven-run rolling average on the Mensa Norway test. However, the methodology has not been documented in detail, particularly the exact prompting strategies used and how scoring scales are converted for AI. This limits both the reproducibility of the results and how confidently they can be interpreted.
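
To make the "seven-run rolling average" concrete, here is a minimal Python sketch of how such a figure could be computed. The per-run scores are hypothetical values invented for illustration, and the function name and windowing behavior are assumptions; Tracking AI's actual procedure is not documented in the sources cited here.

```python
from collections import deque


def rolling_average(scores, window=7):
    """Return the mean of the most recent `window` scores after each run."""
    recent = deque(maxlen=window)  # oldest score is dropped automatically
    averages = []
    for score in scores:
        recent.append(score)
        # Early on, fewer than `window` scores are available, so the
        # average covers only the runs seen so far.
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical per-run scores, NOT reported data.
run_scores = [132, 138, 135, 131, 137, 136, 136, 134]
averages = rolling_average(run_scores)
print(round(averages[-1], 1))
```

Averaging over several runs smooths out run-to-run variation in a model's answers, which is one reason a single headline number can hide how much individual attempts differ.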

Future Prospects

The prospect of AI models achieving high scores on human IQ tests points towards a future where AI could tackle increasingly complex problems. However, the AI community remains cautious, emphasizing the need for reliable and safe AI systems. Researchers are exploring other benchmarks to better understand AI's generalization capabilities.

Conclusion

The reported high IQ score of the "o3" model underscores how quickly AI performance on reasoning tests is advancing. While these developments open new possibilities, they also highlight the ongoing debate about how to measure AI intelligence and compare it with human intellect. The gap between text-only and multimodal models on such tests further emphasizes the complexity of developing truly general and robust AI systems.

AI Research | 6/10/2025