The paper introduces MemAgent, an approach to handling extremely long documents with language models. The system reads the text segment by segment and updates its memory with an overwrite strategy. For training, it extends the DAPO algorithm with multi-conversation generation. Trained on 32K-token text with an 8K context, the model extrapolates to a 3.5M-token QA task with a performance loss below 5% and scores above 95% on the 512K RULER test. These results show that reinforcement-learning-based memory management lets long-context capability scale far beyond the training length.
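The segment-by-segment reading loop with an overwritten memory can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `memory_len` cap, and the keyword-matching `keyword_update` function are all hypothetical, and `keyword_update` merely stands in for the language model that would rewrite the memory in the real system. The DAPO-based training is not shown.

```python
from typing import Callable, List

# Hypothetical signature: (question, old_memory, segment) -> new_memory
UpdateFn = Callable[[List[str], List[str], List[str]], List[str]]


def split_into_segments(tokens: List[str], segment_len: int) -> List[List[str]]:
    """Split a token list into consecutive segments of at most segment_len tokens."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


def read_with_overwrite_memory(
    tokens: List[str],
    question: List[str],
    update_memory: UpdateFn,
    segment_len: int,
    memory_len: int,
) -> List[str]:
    """Read the document one segment at a time, overwriting the memory each step.

    Assumption: the memory is capped at memory_len tokens, so the input seen
    at each step stays bounded no matter how long the document is.
    """
    memory: List[str] = []
    for segment in split_into_segments(tokens, segment_len):
        # The returned memory replaces the previous one entirely (overwrite).
        memory = update_memory(question, memory, segment)[:memory_len]
    return memory


def keyword_update(
    question: List[str], memory: List[str], segment: List[str]
) -> List[str]:
    """Toy stand-in for the model: carry over the old memory and add any
    segment tokens that also appear in the question."""
    wanted = set(question)
    return memory + [tok for tok in segment if tok in wanted]


if __name__ == "__main__":
    doc = "the cat sat on the mat while the dog ran to the park".split()
    final_memory = read_with_overwrite_memory(
        doc, ["cat", "dog"], keyword_update, segment_len=4, memory_len=8
    )
    print(final_memory)
```

Because each step sees only the question, the current memory, and one segment, the per-step input size in this sketch depends on `segment_len` and `memory_len` rather than on the document length, which is the property that makes extrapolation to much longer inputs plausible.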