Deep neural networks possess the theoretical capacity to represent complex logical functions, yet gradient descent frequently fails to discover these solutions, highlighting a severe expressivity-learnability gap. This failure often occurs because networks remain trapped in a lazy, kernel-like training regime where internal representations fail to adapt to the target distribution. By understanding how the structural complexity of weight matrices evolves during the transition to rich feature learning, we can design optimization strategies that unlock a network's full representational power.
Approach
We propose a novel training framework that actively monitors and regularizes the spectral complexity of weight matrices to force networks out of the lazy regime and into a rich feature-learning state. Building on the concept of weight expansion from [Weight Expansion: A New Perspective on Dropout and Generalization](/paper/art_0e2821aaa9ed44dcaf878db5a49d0922), we introduce a loss term that rewards a large normalized determinant of the weight covariance matrix during the initial training epochs. This explicit volume expansion prevents the network from behaving as a static kernel, a failure mode identified in [Lecture notes: From Gaussian processes to feature learning](/paper/art_bc2cb6266150455e9833715034887ebf). By tracking the effective rank of the neural tangent kernel, as proposed in [Implicit Regularization via Neural Feature Alignment](/paper/art_fc9efec14d724374b0beb67f55028a44), we dynamically adjust the regularization strength to ensure the network navigates the non-convex landscape required to learn complex functions.
Experimental Plan
We evaluate our approach on the Majority Boolean Logic benchmark and synthetic parity tasks, where standard gradient descent provably fails, as shown in [Provable Failure of Language Models in Learning Majority Boolean Logic via Gradient Descent](/paper/art_0218d60cf3d442969ad081fe3ebead79). Our primary hypothesis is that spectral volume regularization will enable standard MLPs and Transformers to achieve high accuracy on these tasks by forcing early feature alignment, whereas unregularized models will remain at chance-level accuracy. We compare our method against standard weight decay, dropout, and the $\mu$P initialization scheme from [Non-Gaussian Tensor Programs](/paper/art_8dbfeb034b9b495eae9f4f16ca5f8e5a). Metrics include final test accuracy, generalization gap, and the layer-wise effective rank measured at epoch 10 to validate the early transition into the rich learning phase.
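As a concrete reference for the benchmark tasks, the following sketch generates parity and majority instances over $\pm 1$ inputs, along with the chance baseline an unregularized model is expected to match. The function names and the choice of the first $k$ coordinates as the parity support are illustrative assumptions, not fixed by the benchmark.

```python
import numpy as np

def make_parity_data(n_samples, n_bits, k, seed=0):
    """k-sparse parity over {-1,+1}^n_bits: the label is the product of
    the first k coordinates (+1 iff an even number of them are -1)."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_samples, n_bits))
    y = np.prod(X[:, :k], axis=1)
    return X, y

def make_majority_data(n_samples, n_bits, seed=0):
    """Majority over {-1,+1}^n_bits: the label is the sign of the bit
    sum (use odd n_bits so the label is never zero)."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_samples, n_bits))
    y = np.sign(X.sum(axis=1))
    return X, y

def chance_accuracy(y):
    """Accuracy of always predicting the more frequent label: the
    baseline that a model stuck in the lazy regime should not exceed."""
    return max(np.mean(y == 1.0), np.mean(y == -1.0))
```

For parity the two labels are balanced, so `chance_accuracy` is close to 0.5 and any statistically significant gain over it indicates that feature learning has occurred.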