Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View
Published in AAAI 2026
This paper revisits data sampling strategies for reinforcement-learning-based multimodal post-training, focusing on how to distinguish training samples by difficulty.
We propose two complementary difficulty metrics:
- PISM: Progressive Image Semantic Masking, effective for perception-intensive tasks
- CMAB: Cross-Modal Attention Balance, effective for reasoning-intensive tasks
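To make the idea of a masking-based difficulty score concrete, here is a minimal sketch. It is not the paper's PISM procedure: the masking order, the ratio schedule, the scoring rule, and the names `mask_patches`, `masking_difficulty`, and `predict` are all assumptions made for illustration. The sketch masks a growing fraction of an image and treats a sample as harder when the model's answer breaks under less masking.

```python
from typing import Callable, List, Sequence

# Stand-in for an image patch; a real pipeline would use pixel arrays.
Patch = float


def mask_patches(patches: Sequence[Patch], ratio: float) -> List[Patch]:
    """Zero out the first int(ratio * n) patches (hypothetical masking order)."""
    n_masked = int(ratio * len(patches))
    return [0.0] * n_masked + list(patches[n_masked:])


def masking_difficulty(
    patches: Sequence[Patch],
    answer: str,
    predict: Callable[[Sequence[Patch]], str],
    ratios: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> float:
    """Assumed scoring rule: 1 minus the smallest ratio at which the prediction fails.

    Returns 1.0 if the model already fails on the unmasked image and
    0.0 if it never fails at any ratio in the schedule.
    """
    for ratio in sorted(ratios):
        if predict(mask_patches(patches, ratio)) != answer:
            return 1.0 - ratio
    return 0.0


def toy_predict(patches: Sequence[Patch]) -> str:
    """Toy model: answers correctly only while enough visual evidence remains."""
    return "cat" if sum(patches) >= 2.0 else "unknown"


robust_sample = [1.0, 1.0, 1.0, 1.0]   # survives heavy masking
fragile_sample = [1.0, 1.0, 0.0, 0.0]  # breaks after light masking
print(masking_difficulty(robust_sample, "cat", toy_predict))
print(masking_difficulty(fragile_sample, "cat", toy_predict))
```

In this toy setup the fragile sample receives the higher score because its prediction fails at a smaller masking ratio.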
Based on these metrics, we design a hierarchical post-training framework supporting both GRPO-only and SFT+GRPO paradigms, enabling effective fusion of perception and reasoning abilities in multimodal large models.
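As a rough illustration of difficulty-aware sampling, the sketch below partitions samples into tiers by a precomputed difficulty score in [0, 1]. The summary does not say how the framework assigns samples to training stages, so the thresholds, the tier names, and the function `split_by_difficulty` are hypothetical.

```python
from typing import Dict, List, Sequence


def split_by_difficulty(
    scores: Sequence[float],
    low: float = 0.33,
    high: float = 0.66,
) -> Dict[str, List[int]]:
    """Group sample indices into easy / medium / hard tiers.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    tiers: Dict[str, List[int]] = {"easy": [], "medium": [], "hard": []}
    for idx, score in enumerate(scores):
        if score < low:
            tiers["easy"].append(idx)
        elif score < high:
            tiers["medium"].append(idx)
        else:
            tiers["hard"].append(idx)
    return tiers


tiers = split_by_difficulty([0.1, 0.5, 0.9, 0.33])
print(tiers)
```

A training script could then draw each stage's data from a chosen tier; which tier feeds SFT and which feeds GRPO is left open here because the summary does not specify it.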
