Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
Published in Findings of EMNLP, 2025
We propose Curriculum Reinforcement Learning for improving reasoning and out-of-distribution generalization of small-scale Vision-Language Models.
To address the sparse-reward problem in RL post-training, we introduce a curriculum strategy based on reward acquisition difficulty. Combined with self-refinement fine-tuning, our approach substantially enhances multimodal perception and reasoning.
Experiments show that the resulting Qwen2.5VL-3B model consistently outperforms InternVL2.5-26B across multiple general benchmarks, demonstrating the effectiveness of RL-based curriculum learning for compact multimodal models.
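The core curriculum idea, ordering RL training samples by how hard reward is to acquire, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the difficulty estimator, the rollout stand-in, and all function names are assumptions for exposition.

```python
import random

def policy_rollout(sample, rng):
    # Toy stand-in for a VLM rollout under a sparse 0/1 reward:
    # succeeds with the sample's (hypothetical) true pass rate.
    return rng.random() < sample["pass_rate"]

def estimate_difficulty(sample, rng, n_rollouts=200):
    # Estimate reward acquisition difficulty as the fraction of
    # rollouts that earn no reward (illustrative assumption).
    failures = sum(not policy_rollout(sample, rng) for _ in range(n_rollouts))
    return failures / n_rollouts

def curriculum_order(samples, rng):
    # Easy-first ordering: samples whose reward is easiest to
    # acquire are scheduled before harder ones.
    scored = [(estimate_difficulty(s, rng), s) for s in samples]
    scored.sort(key=lambda pair: pair[0])
    return [s for _, s in scored]

rng = random.Random(0)
samples = [
    {"id": "hard", "pass_rate": 0.1},
    {"id": "easy", "pass_rate": 0.9},
    {"id": "medium", "pass_rate": 0.5},
]
ordered = curriculum_order(samples, rng)
print([s["id"] for s in ordered])  # easy-to-hard schedule
```

In a real RL post-training loop, difficulty estimates would come from the current policy's rollouts and be refreshed as the policy improves, so the schedule adapts over training.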
