Spatially Pooled Feature Stacking: Parameter-Efficient Meta-Learning for Data-Scarce Ensembles


Ensembling diverse neural architectures consistently improves predictive performance, but traditional stacking methods that aggregate raw feature maps introduce massive parameter overhead, which leads to severe overfitting in data-scarce domains. Using output logits instead reduces this dimensionality, but it strips away the rich representational context needed for dynamic, sample-aware model weighting. Bridging the gap between feature-rich meta-learning and extreme architectural constraint is essential for deploying robust ensembles in specialized fields like medical imaging and neural decoding.
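The scale of this overhead is easy to quantify. As a minimal sketch, assume one base model whose final convolutional stage emits a hypothetical 512 x 7 x 7 feature map (the dimensions and the 64-unit hidden layer are illustrative, not prescribed by the proposal):

```python
# Fan-in to a stacking meta-learner from a single base model whose final
# convolutional stage emits a hypothetical 512 x 7 x 7 feature map.
C, H, W = 512, 7, 7

dense_fan_in = C * H * W  # flatten-and-concatenate: 25,088 inputs
gap_fan_in = C            # after Global Average Pooling: 512 inputs

# First-layer weight count for an illustrative 64-unit hidden layer.
hidden = 64
dense_params = dense_fan_in * hidden  # 1,605,632 weights
gap_params = gap_fan_in * hidden      # 32,768 weights

# GAP shrinks the first layer by exactly the spatial extent H * W.
print(dense_params // gap_params)  # → 49
```

Every additional base model multiplies the dense variant's cost again, while the pooled variant grows only by each model's channel count.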

Approach

We propose Spatially Pooled Feature Stacking (SPFS), a meta-learning framework that aggregates base-learner representations using Global Average Pooling (GAP) prior to ensemble weighting. Instead of concatenating high-dimensional feature maps or relying solely on output logits, SPFS extracts the pre-classification feature maps from diverse base models and applies GAP to generate compact, spatially invariant summaries. These pooled vectors are then concatenated and fed into a lightweight multi-layer perceptron meta-learner, adapting the feature-based weighting strategy from [Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models](/paper/art_98b9d176d3dc4047b40fa89db19e8207) but with drastically reduced fan-in requirements. This approach provides the meta-learner with rich, label-agnostic visual context to dynamically weight base models while acting as a strong structural regularizer against overfitting, echoing the architectural constraints seen in [Zhejiang University at ImageCLEF 2019 Visual Question Answering in the Medical Domain](/paper/art_0010063fae5649b19c6255cfe8595ab3).
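The forward pass described above can be sketched in plain numpy. Everything here is a hypothetical stand-in: the feature-map shapes (e.g. 512x7x7 and 1280x7x7, loosely ResNet-18- and EfficientNet-B0-like), the 64-unit hidden layer, and the two-model ensemble are illustrative choices, not specifics fixed by the proposal:

```python
import numpy as np

rng = np.random.default_rng(0)

def gap(feature_map):
    # Global Average Pooling: (C, H, W) -> (C,) spatially invariant summary.
    return feature_map.mean(axis=(1, 2))

# Hypothetical pre-classification feature maps from two diverse base models.
fmap_a = rng.standard_normal((512, 7, 7))
fmap_b = rng.standard_normal((1280, 7, 7))

# Pool each map, then concatenate into the meta-learner's compact input.
z = np.concatenate([gap(fmap_a), gap(fmap_b)])  # shape (1792,)

# Lightweight MLP meta-learner: one hidden layer, softmax over base models.
W1, b1 = rng.standard_normal((64, z.size)) * 0.01, np.zeros(64)
W2, b2 = rng.standard_normal((2, 64)) * 0.01, np.zeros(2)
h = np.maximum(W1 @ z + b1, 0.0)                 # ReLU
logits = W2 @ h + b2
weights = np.exp(logits) / np.exp(logits).sum()  # per-sample model weights

# The ensemble output would then be the weights-weighted combination of
# the base models' class logits for this sample.
```

Note that the meta-learner's fan-in is the sum of the channel counts (1792 here) rather than the sum of the full spatial volumes, which is what keeps the parameter count small enough to train on scarce data.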

Experimental Plan

We will evaluate SPFS on data-scarce medical imaging benchmarks, specifically the ISIC 2017 skin lesion classification dataset and the ImageCLEF VQA-Med challenge. The primary hypothesis is that SPFS will outperform both simple logit-averaging and dense feature-concatenation stacking by preventing meta-learner overfitting while maintaining sample-aware dynamic weighting. Baselines will include standard logit-based SVM stacking as used in [RECOD Titans at ISIC Challenge 2017](/paper/art_4a60769bd2de488a894141545a1b2875), simple unweighted averaging, and dense feature-based stacking without GAP. Performance will be measured using AUC-ROC for classification and BLEU/accuracy for VQA, alongside a comparative analysis of meta-learner parameter counts and validation loss trajectories to explicitly quantify the regularization benefits of the GAP integration.
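For the classification metric, AUC-ROC has a simple rank-based (Mann-Whitney U) formulation that the evaluation could use directly; the following is a minimal sketch that assumes binary labels and no tied scores:

```python
import numpy as np

def auc_roc(y_true, scores):
    """Rank-based (Mann-Whitney U) AUC-ROC; assumes no tied scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Rank every sample by its predicted score (1 = lowest).
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    # Sum of positive ranks minus its minimum possible value,
    # normalized by the number of positive/negative pairs.
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Perfect ranking of positives above negatives yields AUC = 1.0.
print(auc_roc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0
```

In practice a library implementation (e.g. scikit-learn's `roc_auc_score`) handles ties and multi-class averaging, but the closed form above makes clear what the benchmark number measures.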

Open Questions

How can global average pooling be integrated into the feature-aggregation layer of a meta-learner to prevent parameter explosion in stacked ensembles?
Does replacing dense feature concatenation with spatially pooled representations in stacked generalization improve ensemble robustness in data-scarce domains?
Can a meta-learner dynamically weight base models more effectively by evaluating their spatially invariant feature summaries rather than their raw output logits?