S. Ali, A. Ferdowsi, W. Saad, N. Rajatheva and J. Haapola, "Sleeping Multi-Armed Bandit Learning for Fast Uplink Grant Allocation in Machine Type Communications," in IEEE Transactions on Communications, vol. 68, no. 8, pp. 5072-5086, Aug. 2020, doi: 10.1109/TCOMM.2020.2989338
Sleeping multi-armed bandit learning for fast uplink grant allocation in machine type communications
|Author:||Ali, Samad¹; Ferdowsi, Aidin²; Saad, Walid²|
¹Centre for Wireless Communications (CWC), University of Oulu, Finland
²Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe2020052538960
|Publisher:||Institute of Electrical and Electronics Engineers|
|Publish Date:|| 2020-05-25
Scheduling fast uplink grant transmissions for machine type communications (MTCs) is one of the main challenges of future wireless systems. In this paper, a novel fast uplink grant scheduling method based on the theory of multi-armed bandits (MABs) is proposed. First, a single quality-of-service metric is defined as a combination of the value of data packets, the maximum tolerable access delay, and the data rate. Since the base station (BS) cannot know these metrics for all machine type devices (MTDs) in advance, and since the set of active MTDs changes over time, the problem is modeled as a sleeping MAB with stochastic availability and a stochastic reward function. In particular, given that, at each time step, the BS's knowledge of the set of active MTDs is only probabilistic, a novel probabilistic sleeping MAB algorithm is proposed to maximize the defined metric. A regret analysis is presented, and the effect of the prediction error of the source traffic prediction algorithm on the performance of the proposed sleeping MAB algorithm is investigated. Moreover, to enable fast uplink allocation for multiple MTDs at each time step, a novel method is proposed based on the concept of best arms ordering in the MAB setting. Simulation results show that the proposed framework yields a three-fold reduction in latency compared to a maximum probability scheduling policy, since it prioritizes the scheduling of MTDs that have stricter latency requirements. Moreover, by properly balancing the exploration versus exploitation tradeoff, the proposed algorithm selects the most important MTDs more often during exploitation. During exploration, sub-optimal MTDs are selected, which increases fairness in the system and also provides better estimates of the rewards of the sub-optimal MTDs.
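The general idea in the abstract (among the MTDs believed to be active, grant the ones with the best optimistic reward estimate, then refine those estimates from the observed rewards) can be illustrated with a sleeping upper-confidence-bound (UCB) scheduler. This is a minimal sketch under stated assumptions, not the authors' probabilistic sleeping MAB algorithm or their best-arms-ordering method: the class `SleepingUCB`, the 0.5 activity-probability threshold that decides which MTDs count as awake, the UCB1-style exploration bonus, the simple top-`k` ranking used for multiple grants, and all numbers in `simulate` are hypothetical.

```python
import math
import random


class SleepingUCB:
    """Illustrative sleeping-UCB grant scheduler (hypothetical, not the paper's algorithm).

    Each arm is an MTD. At every step only MTDs believed to be active
    ("awake") may receive a fast uplink grant.
    """

    def __init__(self, n_arms: int, c: float = 2.0, threshold: float = 0.5):
        self.c = c                   # UCB1-style exploration weight (assumed)
        self.threshold = threshold   # activity probability needed to count as awake (assumed)
        self.counts = [0] * n_arms   # number of grants given to each MTD
        self.means = [0.0] * n_arms  # empirical mean reward of each MTD
        self.t = 0                   # total number of grants so far

    def index(self, i: int) -> float:
        # Never-granted MTDs get +inf so each awake MTD is tried at least once.
        if self.counts[i] == 0:
            return math.inf
        bonus = math.sqrt(self.c * math.log(max(self.t, 1)) / self.counts[i])
        return self.means[i] + bonus

    def select(self, avail_prob, k: int = 1):
        """Return up to k awake MTDs ranked by UCB index (ties broken by avail_prob)."""
        awake = [i for i, p in enumerate(avail_prob) if p >= self.threshold]
        ranked = sorted(awake, key=lambda i: (self.index(i), avail_prob[i]), reverse=True)
        return ranked[:k]

    def update(self, i: int, reward: float) -> None:
        # Incremental update of MTD i's empirical mean reward.
        self.t += 1
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]


def simulate(steps: int = 2000, k: int = 2, seed: int = 0):
    """Toy run with made-up reward means and activity probabilities."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8, 0.4]  # hypothetical mean reward when the MTD is active
    p_active = [0.9, 0.6, 0.7, 0.3]    # hypothetical true activity probabilities
    sched = SleepingUCB(len(true_means))
    for _ in range(steps):
        # The BS only sees a noisy prediction of each MTD's activity probability.
        predicted = [min(1.0, max(0.0, p + rng.uniform(-0.2, 0.2))) for p in p_active]
        for i in sched.select(predicted, k=k):
            active = rng.random() < p_active[i]
            # A grant to an inactive MTD is wasted and earns zero reward.
            reward = float(rng.random() < true_means[i]) if active else 0.0
            sched.update(i, reward)
    return sched.counts


if __name__ == "__main__":
    print(simulate())  # grants per MTD after the toy run
```

In this sketch the noisy `predicted` probabilities stand in for the output of a source traffic predictor: widening the noise range makes the awake set less reliable and wastes more grants on MTDs that turn out to be inactive, which loosely mirrors the prediction-error effect the abstract says is investigated.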
|Journal:||IEEE Transactions on Communications|
|Pages:||5072 - 5086|
|Type of Publication:||A1 Journal article – refereed|
|Field of Science:||213 Electronic, automation and communications engineering, electronics|
This work was supported in part by the Academy of Finland 6Genesis Flagship under Grant 318927, in part by the 5G-FORCE project, and in part by the U.S. National Science Foundation under Grant CNS-1836802.
|Academy of Finland Grant Number:||318927 (Academy of Finland Funding decision)|
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.