Abstract: As massive MU-MIMO systems scale, managing their high computational demands and ensuring fairness among users become crucial challenges. Traditional scheduling methods often fail to adapt to dynamically changing environments and to balance multi-dimensional performance metrics. To address these challenges, we enhance the Advantage Actor-Critic (A2C) framework for massive MU-MIMO systems by integrating Convolutional Neural Networks (CNNs) and Transformers. The CNN components extract localized channel-state features, while the Transformers model inter-user dependencies through attention mechanisms. Specifically, we embed convolutional layers within the Transformer encoder and employ an auto-regressive decoder, reformulating user group selection as a sequential decision-making process based on conditional probabilities. This is the first application of a hybrid CNN-Transformer architecture to discrete scheduling decisions in MU-MIMO systems. To ensure balanced performance, we introduce a multi-metric reward function instead of relying on a single performance indicator: the reward is the product of the selected user's spectral efficiency (SE) and the Jain Fairness Index (JFI) during scheduling. Simulations demonstrate that this reward function achieves both high SE and fair resource distribution. An improved policy network further boosts user scheduling performance while accelerating reward convergence. The proposed model stabilizes its reward curve faster than existing frameworks, converging more quickly on optimal strategies and reducing training time, which makes the framework better suited to dynamic MU-MIMO scenarios.
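The multi-metric reward can be illustrated with a short sketch. The abstract does not specify exactly which SE term or which throughput vector the JFI is computed over. This sketch therefore assumes, hypothetically, that the reward is the sum SE of the users selected in the current slot multiplied by Jain's index over all users' cumulative SE. The function names `jain_fairness_index` and `scheduling_reward` are illustrative, not taken from the paper. Jain's index is J = (Σx)² / (n·Σx²), which equals 1 when all users are served equally and 1/n when only one user is served.

```python
from typing import Sequence


def jain_fairness_index(throughputs: Sequence[float]) -> float:
    """Jain's index J = (sum x)^2 / (n * sum x^2), in [1/n, 1] for non-zero input."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0.0:
        # Assumed convention: the index is undefined (0/0) when nobody is served; return 0.
        return 0.0
    return total * total / (n * squares)


def scheduling_reward(selected_se: Sequence[float], cumulative_se: Sequence[float]) -> float:
    """Hypothetical reward: SE of the selected users times JFI over all users' cumulative SE."""
    return sum(selected_se) * jain_fairness_index(cumulative_se)


# Two users scheduled with SE 3.0 and 2.0 bit/s/Hz; cumulative SE of all four users.
print(scheduling_reward([3.0, 2.0], [4.0, 4.0, 2.0, 2.0]))
```

Because the JFI multiplies the SE term, a policy that keeps serving the same strong users loses reward as the index drops toward 1/n. A purely fair but low-rate schedule is also penalized through the SE factor.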
Additionally, integrating digital and analog Maximal Ratio Combining (MRC) and Zero-Forcing (ZF) beamforming techniques offers practical, scalable solutions tailored to future massive MU-MIMO systems.
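The difference between MRC and ZF combining can be sketched for a small uplink case. This is a minimal, fully digital sketch in pure Python. It assumes an M×K channel matrix H (M antennas, K ≥ 1 users, M ≥ K) with user k's stream estimated as w_kᴴy. It uses the textbook combiners W_MRC = H and W_ZF = H(HᴴH)⁻¹. The analog (e.g., phase-only) variants mentioned above are not modeled, and all helper names are hypothetical.

```python
from typing import List

Matrix = List[List[complex]]


def hermitian(a: Matrix) -> Matrix:
    """Conjugate transpose of a (rows x cols) matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for small K x K Gram matrices)."""
    n = len(a)
    aug = [list(row) + [1.0 + 0j if i == j else 0j for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mrc_combiner(h: Matrix) -> Matrix:
    """MRC: W = H, which maximizes each user's own received SNR but ignores interference."""
    return [list(row) for row in h]


def zf_combiner(h: Matrix) -> Matrix:
    """ZF: W = H (H^H H)^{-1}, so that W^H H = I_K and inter-user interference is nulled."""
    return matmul(h, inverse(matmul(hermitian(h), h)))


# Four antennas, two users (illustrative channel values).
H = [[1 + 1j, 0.5 + 0j], [0.2j, 1 - 0.5j], [0.3 + 0j, 0.1 + 0.4j], [1 + 0j, -1j]]
print(matmul(hermitian(zf_combiner(H)), H))  # approximately the 2 x 2 identity
```

MRC needs no matrix inversion, which keeps its complexity low as M grows. ZF pays for a K×K inversion to cancel inter-user interference. This trade-off is one reason low-complexity combiners are attractive in massive MU-MIMO.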
Index Terms: Massive MU-MIMO, User scheduling, Advantage Actor-Critic, Conformer, Low Complexity Beamforming