University of Oulu

R. Hashemi, S. Ali, N. H. Mahmood and M. Latva-Aho, "Deep Reinforcement Learning for Practical Phase-Shift Optimization in RIS-Aided MISO URLLC Systems," in IEEE Internet of Things Journal, vol. 10, no. 10, pp. 8931-8943, 15 May 2023, doi: 10.1109/JIOT.2022.3232962

Deep reinforcement learning for practical phase-shift optimization in RIS-aided MISO URLLC systems

Author: Hashemi, Ramin¹; Ali, Samad¹; Mahmood, Nurul Huda¹;
Organizations: ¹Centre for Wireless Communications, University of Oulu, 90014 Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 2.3 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2023-08-25


We study joint active/passive beamforming and channel blocklength (CBL) allocation in a nonideal reconfigurable intelligent surface (RIS)-aided ultrareliable and low-latency communication (URLLC) system. The system operates in the finite blocklength (FBL) regime, and the problem is solved with a deep reinforcement learning (DRL) algorithm named twin-delayed deep deterministic policy gradient (TD3). First, assuming an industrial automation system, the signal-to-interference-plus-noise ratio and the achievable rate in the FBL regime are identified for each actuator. Next, the joint active/passive beamforming and CBL optimization problem (OP) is formulated, where the objective is to maximize the total achievable FBL rate across all actuators, subject to a nonlinear amplitude response at the RIS elements, the BS transmit power budget, and the total available CBL. Since the formulated problem is highly nonconvex and nonlinear, we employ an actor–critic policy gradient DRL algorithm based on TD3. In this method, the RIS interacts with the industrial automation environment by taking actions, namely the phase shifts at the RIS elements, the CBL variables, and the BS beamforming, to maximize the expected observed reward, i.e., the total FBL rate. We assess the performance loss of the system when the RIS is nonideal, i.e., has a nonlinear amplitude response, and compare it with an ideal RIS without impairments. The numerical results show that optimizing the RIS phase shifts, BS beamforming, and CBL variables via the TD3 method with a deterministic policy outperforms conventional methods and is highly beneficial for improving the network's total FBL rate under a finite CBL size.
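The FBL rate that serves as the reward in the abstract is commonly modelled with the normal approximation, R ≈ log2(1 + γ) − sqrt(V/n) · Q⁻¹(ε) · log2(e), with channel dispersion V = 1 − (1 + γ)⁻². This record does not state the paper's exact expression, so the Python sketch below is an illustration under that assumption rather than the authors' formulation; the function name `fbl_rate` and its parameters (`snr` for a linear-scale SINR γ, `blocklength` for n, `error_prob` for ε) are hypothetical.

```python
import math
from statistics import NormalDist


def fbl_rate(snr: float, blocklength: float, error_prob: float) -> float:
    """Normal-approximation achievable rate in bits per channel use.

    Assumed model (not taken from this record):
        R = log2(1 + snr) - sqrt(V / n) * Qinv(eps) * log2(e)
        V = 1 - (1 + snr)^(-2)
    """
    # Shannon capacity term (infinite-blocklength limit).
    shannon = math.log2(1.0 + snr)
    # Channel dispersion of the AWGN channel (in nats^2, hence log2(e) below).
    dispersion = 1.0 - (1.0 + snr) ** -2
    # Inverse Gaussian Q-function: Qinv(eps) equals the standard normal
    # quantile at 1 - eps.
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    # Rate penalty paid for transmitting with a finite blocklength.
    penalty = math.sqrt(dispersion / blocklength) * q_inv * math.log2(math.e)
    # Clip at zero: the approximation can go negative at very low SINR.
    return max(shannon - penalty, 0.0)


if __name__ == "__main__":
    # Same SINR and target error probability, two blocklengths.
    for n in (200, 2000):
        print(n, fbl_rate(snr=10.0, blocklength=n, error_prob=1e-5))
```

Under this model the rate rises toward the Shannon limit as the blocklength grows, which is why the total available CBL has to be divided among the actuators as part of the optimization.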


Series: IEEE internet of things journal
ISSN: 2372-2541
ISSN-E: 2327-4662
ISSN-L: 2327-4662
Volume: 10
Issue: 10
Pages: 8931 - 8943
DOI: 10.1109/JIOT.2022.3232962
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: This work was supported by the Academy of Finland, 6G Flagship Program under Grant 346208. Ramin Hashemi would like to acknowledge the support of the Nokia Scholarship Foundation.
Academy of Finland Grant Number: 346208
Detailed Information: 346208 (Academy of Finland Funding decision)
Copyright information: © The Author(s) 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see