KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

KORGym: a dynamic platform offering over fifty games for comprehensive LLM reasoning evaluation

Abstract

Existing benchmarks are often domain-specific and thus cannot fully capture an LLM’s general reasoning potential. To address this gap, KORGym offers over fifty games in textual or visual formats and supports interactive, multi-turn assessments with reinforcement-learning scenarios. Experiments on 19 LLMs and 8 VLMs reveal consistent reasoning patterns within model families and show that closed-source models perform best. The platform also examines factors that affect model performance, including modality, reasoning strategy, reinforcement-learning technique, and response length.
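To make the interactive, multi-turn assessment concrete, the sketch below shows one way such a game loop could be wired up. Everything here (the GuessNumberGame toy environment, its reset/step interface, and the bisect_model stand-in policy) is an illustrative assumption rather than KORGym's actual API; a real harness would replace model_fn with an LLM call and average reward across many games.

```python
# Illustrative sketch only; all class and method names are assumptions,
# not KORGym's actual API.
import random


class GuessNumberGame:
    """Toy text game: the model must find a hidden integer in [1, 100]."""

    def __init__(self, max_turns=8):
        self.max_turns = max_turns
        self.rng = random.Random()

    def reset(self, seed=0):
        self.rng.seed(seed)
        self.secret = self.rng.randint(1, 100)
        self.turn = 0
        return "Guess an integer between 1 and 100."

    def step(self, action):
        """One interaction turn: returns (observation, reward, done)."""
        self.turn += 1
        try:
            guess = int(action.strip())
        except ValueError:
            return "Answer with a single integer.", 0.0, self.turn >= self.max_turns
        if guess == self.secret:
            return "Correct!", 1.0, True
        hint = "higher" if guess < self.secret else "lower"
        return f"Wrong, go {hint}.", 0.0, self.turn >= self.max_turns


def evaluate(model_fn, game, episodes=10):
    """Mean episode reward of a policy that maps dialogue history to a reply."""
    total = 0.0
    for ep in range(episodes):
        history = [("system", game.reset(seed=ep))]
        done = False
        while not done:
            reply = model_fn(history)  # an LLM call in a real harness
            obs, reward, done = game.step(reply)
            history += [("assistant", reply), ("system", obs)]
            total += reward
    return total / episodes


def bisect_model(history):
    """Binary-search stand-in for an LLM policy."""
    lo, hi = 1, 100
    guesses = [int(t) for r, t in history if r == "assistant"]
    hints = [t for r, t in history if r == "system" and t.startswith("Wrong")]
    for g, h in zip(guesses, hints):
        if "higher" in h:
            lo = max(lo, g + 1)
        else:
            hi = min(hi, g - 1)
    return str((lo + hi) // 2)


print(f"mean reward: {evaluate(bisect_model, GuessNumberGame()):.2f}")
```

The gym-style reset/step interface is what makes the same environments usable both for multi-turn evaluation and for reinforcement-learning scenarios: the per-turn reward signal can drive policy updates instead of only being averaged into a score.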

Type
Technical Report
Jiangjie Chen
Researcher

His research interests mainly include large language models and their reasoning abilities.