University of Oulu

F. W. Murti, S. Ali and M. Latva-aho, "Constrained Deep Reinforcement Based Functional Split Optimization in Virtualized RANs," in IEEE Transactions on Wireless Communications, doi: 10.1109/TWC.2022.3179811.

Constrained deep reinforcement based functional split optimization in virtualized RANs

Author: Murti, Fahri Wisnu¹; Ali, Samad¹; Latva-aho, Matti¹
Organizations: ¹Centre for Wireless Communications, University of Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 2.7 MB)
Persistent link:
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2022-09-15


In a virtualized radio access network (vRAN), the base station (BS) functions are decomposed into virtualized components that can be hosted at the centralized unit or at distributed units through functional splits. Such flexibility has many benefits; however, it also requires finding the functional splits of the BSs that minimize the total network cost. The underlying vRAN system is complex, and modelling it precisely is not trivial. Formulating the functional split problem as a cost minimization results in a combinatorial problem that is provably NP-hard, and solving it is computationally expensive. In this paper, a constrained deep reinforcement learning (RL) approach is proposed to solve the problem with minimal assumptions about the underlying system. Since action selection in deep RL is the outcome of neural network inference, it can be done in real time, while the training that updates the neural networks can run in the background. However, because the problem is combinatorial, the action space of the RL problem becomes large even for a small number of functions. To deal with such a large action space, a chain rule-based stochastic policy is exploited, in which a long short-term memory (LSTM) network-based sequence-to-sequence model is applied to estimate the policy that selects the functional split actions. This policy on its own is still limited to an unconstrained problem, whereas each split decision is bounded by the vRAN's constraint requirements. Hence, a constrained policy gradient method is leveraged to train the policy and guide it toward constraint satisfaction. Further, a search strategy based on greedy decoding or temperature sampling is utilized to improve optimality at test time. Simulations are performed to evaluate the performance of the proposed solution using synthetic and real network datasets. Our numerical results show that the proposed RL solution architecture successfully learns to make optimal functional split decisions, achieving solutions with an optimality gap of up to 0.05%. Moreover, our solution can achieve considerable cost savings compared to C-RAN or D-RAN systems and a faster computational time than the optimal baseline.
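
The chain-rule policy and the two test-time decoding modes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_logits` is a hypothetical stand-in for the LSTM sequence-to-sequence decoder, the number of split options (4) and of BSs (3) are arbitrary assumptions, and no vRAN constraints or costs are modelled. The sketch only shows how a joint decision factorizes as p(a_1, ..., a_N) = prod_i p(a_i | a_1, ..., a_{i-1}) and how greedy decoding differs from temperature sampling.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature, shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(prefix, n_splits=4):
    """Hypothetical stand-in for the LSTM decoder (assumption, not from the paper).

    Returns one logit per split option, conditioned on the splits already
    chosen for the previous BSs.
    """
    target = (prefix[-1] + 1) % n_splits if prefix else 0
    return [2.0 if k == target else 0.0 for k in range(n_splits)]


def decode(logits_fn, n_bs, temperature=None, rng=None):
    """Choose one split per BS, each step conditioned on the earlier choices.

    temperature=None -> greedy decoding (argmax at every step).
    temperature=T    -> sample every step from softmax(logits / T).
    Returns (actions, log_prob), where log_prob sums the per-step log
    probabilities under the distribution that was actually used.
    """
    rng = rng or random.Random(0)
    actions, log_prob = [], 0.0
    for _ in range(n_bs):
        logits = logits_fn(actions)
        if temperature is None:
            probs = softmax(logits)
            a = max(range(len(probs)), key=probs.__getitem__)
        else:
            probs = softmax(logits, temperature)
            a = rng.choices(range(len(probs)), weights=probs)[0]
        log_prob += math.log(probs[a])
        actions.append(a)
    return actions, log_prob


# Greedy decoding yields a single deterministic candidate.
greedy_actions, greedy_logp = decode(toy_logits, n_bs=3)

# Temperature sampling yields several candidates from the same policy.
rng = random.Random(42)
samples = [decode(toy_logits, n_bs=3, temperature=2.0, rng=rng) for _ in range(5)]
```

One plausible way to use the sampled candidates at test time is to evaluate each against the cost and constraints and keep the best feasible one; that selection step is omitted here because it depends on the system model.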


Series: IEEE transactions on wireless communications
ISSN: 1536-1276
ISSN-E: 1558-2248
ISSN-L: 1536-1276
DOI: 10.1109/TWC.2022.3179811
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: This research has been supported by the Academy of Finland, 6G Flagship program under Grant 346208.
Academy of Finland Grant Number: 346208
Detailed Information: 346208 (Academy of Finland Funding decision)
Copyright information: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see