Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
Published in Findings of EMNLP, 2025
We propose Curriculum Reinforcement Learning for improving reasoning and out-of-distribution generalization of small-scale Vision-Language Models.
To address the sparse-reward problem in RL post-training, we introduce a curriculum strategy based on reward acquisition difficulty. Combined with self-refinement fine-tuning, our approach substantially enhances multimodal perception and reasoning.
Experiments show that the resulting Qwen2.5VL-3B model consistently outperforms InternVL2.5-26B across multiple general benchmarks, demonstrating the effectiveness of RL-based curriculum learning for compact multimodal models.
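The core curriculum idea, ordering RL training samples by how hard reward is to acquire, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the difficulty estimator, the rollout stand-in, and all function names are assumptions for exposition.

```python
import random

def policy_rollout(sample, rng):
    # Toy stand-in for a VLM rollout under a sparse 0/1 reward:
    # succeeds with the sample's (hypothetical) true pass rate.
    return rng.random() < sample["pass_rate"]

def estimate_difficulty(sample, rng, n_rollouts=200):
    # Estimate reward acquisition difficulty as the fraction of
    # rollouts that earn no reward (illustrative assumption).
    failures = sum(not policy_rollout(sample, rng) for _ in range(n_rollouts))
    return failures / n_rollouts

def curriculum_order(samples, rng):
    # Easy-first ordering: samples whose reward is easiest to
    # acquire are scheduled before harder ones.
    scored = [(estimate_difficulty(s, rng), s) for s in samples]
    scored.sort(key=lambda pair: pair[0])
    return [s for _, s in scored]

rng = random.Random(0)
samples = [
    {"id": "hard", "pass_rate": 0.1},
    {"id": "easy", "pass_rate": 0.9},
    {"id": "medium", "pass_rate": 0.5},
]
ordered = curriculum_order(samples, rng)
print([s["id"] for s in ordered])  # easy-to-hard schedule
```

In a real RL post-training loop, difficulty estimates would come from the current policy's rollouts and be refreshed as the policy improves, so the schedule adapts over training.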
