Adaptive Spectral Smoothing Curricula for Visual Reinforcement Learning in Sparse-Reward Environments

Keywords: internal feature map smoothing, static curriculum scheduling, decaying augmentation curriculum, sparse reward problem. Novelty: 2.1

In visual reinforcement learning, agents operating under sparse rewards frequently fail to explore effectively because they overfit to high-frequency visual noise and irrelevant textures early in training. By forcing the agent to initially perceive only the coarse, low-frequency structural elements of its environment, we can drastically simplify the state space and facilitate the discovery of foundational behaviors. Gradually reintroducing high-frequency details as the agent's competence improves ensures robust credit assignment and stable policy convergence without requiring dense reward engineering.

Approach

We propose an adaptive frequency curriculum that integrates internal feature map smoothing into the visual encoder of an RL agent. Inspired by [Curriculum By Smoothing](/paper/art_ade7d4b8c4684cde8eb8eff20fee22a3), we insert Gaussian low-pass filters after the convolutional layers of the policy network to suppress high-frequency spatial details during early exploration. Unlike static annealing schedules, the variance of the smoothing kernel is dynamically decayed based on the agent's temporal difference (TD) error, ensuring the curriculum progresses only when the agent has mastered the current difficulty level. This forces the agent to learn broad spatial navigation before attempting fine-grained manipulation, directly addressing the exploration bottlenecks highlighted in [PushWorld: A benchmark for manipulation planning with tools and movable obstacles](/paper/art_efa23f423c374ef5beaf1bb524b43f1f).
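The mechanism above can be sketched concretely. Below is a minimal NumPy illustration of the two pieces the approach combines: a Gaussian low-pass filter applied to a spatial feature map, and a schedule that decays the kernel's standard deviation only when a running average of the agent's TD error falls below a threshold. All class and parameter names (`AdaptiveSigmaSchedule`, `td_threshold`, the EMA coefficient, the decay factor) are illustrative assumptions, not names from the cited work, and a real implementation would insert the filter after each convolutional layer of the policy encoder rather than operate on a raw array.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel; the 2-D blur is applied separably along each axis."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_feature_map(fmap, sigma):
    """Low-pass filter a (H, W) feature map with a separable Gaussian blur."""
    if sigma <= 0:
        return fmap
    k = gaussian_kernel(sigma)
    # Convolve rows, then columns, with "same" padding so the shape is preserved.
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, fmap)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out

class AdaptiveSigmaSchedule:
    """Decay the smoothing strength only when the agent's running TD error
    drops below a competence threshold, so the curriculum advances on mastery
    rather than on a fixed step count."""
    def __init__(self, sigma0=2.0, decay=0.9, td_threshold=0.1, ema=0.99):
        self.sigma = sigma0
        self.decay = decay
        self.td_threshold = td_threshold
        self.ema = ema
        self.td_avg = None  # exponential moving average of |TD error|

    def update(self, td_error):
        err = abs(float(td_error))
        self.td_avg = err if self.td_avg is None else (
            self.ema * self.td_avg + (1.0 - self.ema) * err)
        if self.td_avg < self.td_threshold:
            self.sigma *= self.decay  # competence reached: reintroduce detail
        return self.sigma
```

A static annealing baseline would replace `update` with an unconditional `self.sigma *= self.decay` per step; the conditional decay is the only structural difference the proposal introduces.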

Experimental Plan

We will evaluate our method on the pixel-based manipulation tasks in the Meta-World benchmark and the sparse-reward navigation environments in MiniGrid. The primary hypothesis is that adaptive spectral smoothing will achieve higher success rates and greater sample efficiency than standard end-to-end RL and static curriculum baselines by preventing early-stage optimization traps. We will compare our approach against standard Soft Actor-Critic (SAC), SAC with a static [Curriculum By Texture](/paper/art_7782e22ab11a4d09b567ddb175f46f59) applied to the encoder, and an intrinsic motivation baseline such as [Adversarial Intrinsic Motivation for Reinforcement Learning](/paper/art_8034f2f3d49c4fc78e3f74ec72afbf5c). Metrics will include episodic success rate, sample efficiency (environment steps to convergence), and zero-shot robustness to novel visual distractors.

Open Questions

How can progressively annealing the high-frequency visual details of an environment accelerate policy convergence in sparse-reward reinforcement learning tasks?
Does dynamically scheduling internal feature map smoothing based on an agent's value network error prevent premature overfitting to visual noise during early exploration?
Can a curriculum that transitions from low-frequency structural representations to high-frequency fine details bridge the credit assignment gap in long-horizon manipulation tasks?