Joint caching and computing service placement for edge-enabled IoT based on deep reinforcement learning |
|
Author: | Chen, Yan (1); Sun, Yanjing (1); Yang, Bin (2); |
Organizations: |
1 School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
2 School of Computer and Information Engineering, Chuzhou University, Anhui 239000, China
3 Center of Wireless Communications, University of Oulu, 90570 Oulu, Finland
4 Department of Computer and Information Security, Sejong University, Seoul 05006, South Korea
|
Format: | article |
Version: | accepted version |
Access: | open |
Online Access: | PDF Full Text (PDF, 3.5 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2022090257038 |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers, 2022 |
Publish Date: | 2022-09-02 |
Description: |
By placing edge service functions in proximity to IoT facilities, edge computing can satisfy the resource and latency requirements of various IoT applications. Sensing-data-driven IoT applications are prevalent in IoT systems, and their task processing relies on sensing data from sensors. Therefore, to ensure the quality of service (QoS) of such applications in an edge-enabled IoT system, dedicated caching functions (CFs) are required to cache the necessary sensing data. This paper considers an edge-enabled IoT system and investigates the joint caching and computing service placement (JCCSP) problem for sensing-data-driven IoT applications. Deep reinforcement learning (DRL) is exploited because it can adapt to a heterogeneous system with limited prior knowledge. In the proposed DRL-based approaches, a policy network based on the encoder-decoder model is constructed to handle the varying sizes of JCCSP states and actions caused by the different numbers of CFs related to applications. An on-policy REINFORCE-based method is first adopted to train the policy network. Then, an off-policy training method based on the twin-delayed (TD) deep deterministic policy gradient (DDPG) is proposed to enhance training efficiency and experience utilization. In the proposed DDPG-based method, a weight-averaged twin-Q-delayed (WATQD) algorithm is introduced to reduce the bias of Q-value estimation. Simulation results show that the proposed DRL-based JCCSP approaches achieve converged performance significantly superior to benchmarks. Moreover, compared with the original TD method, the proposed WATQD method significantly improves training stability.
|
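A note on the WATQD idea named in the abstract: standard TD3 forms its bootstrap target from the minimum of two critic estimates, which curbs overestimation but tends to be systematically pessimistic, and a weight-averaged variant blends the twin estimates instead. The sketch below is a minimal illustration under that reading, not the paper's implementation; the blending rule, the weight beta, and all function names are assumptions.

```python
import numpy as np

def td3_target(q1, q2, reward, gamma=0.99):
    """Standard TD3 (clipped double-Q) target.

    q1, q2: twin target-network Q-estimates for the next state-action.
    The elementwise minimum curbs overestimation, but the resulting
    target is systematically pessimistic.
    """
    return reward + gamma * np.minimum(q1, q2)

def watqd_target(q1, q2, reward, gamma=0.99, beta=0.75):
    """Hypothetical weight-averaged twin-Q target.

    Blends the pessimistic minimum with the mean of the twin estimates;
    beta interpolates between pure TD3 (beta=1) and a plain average
    (beta=0). The exact rule and weight used in the paper may differ.
    """
    blended = beta * np.minimum(q1, q2) + (1.0 - beta) * 0.5 * (q1 + q2)
    return reward + gamma * blended

if __name__ == "__main__":
    q1 = np.array([10.0, 4.0])
    q2 = np.array([8.0, 6.0])
    r = np.array([1.0, 1.0])
    print(td3_target(q1, q2, r))    # [8.92 4.96]     -- pessimistic targets
    print(watqd_target(q1, q2, r))  # [9.1675 5.2075] -- blended targets
```

In a full TD3-style trainer, q1 and q2 would come from delayed target critics evaluated at the target policy's next action; the change above only affects how the two estimates are combined into the bootstrap target.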
Series: | IEEE Internet of Things Journal |
ISSN: | 2372-2541 |
ISSN-E: | 2327-4662 |
ISSN-L: | 2327-4662 |
Volume: | 9 |
Issue: | 19 |
Pages: | 19501 - 19514 |
DOI: | 10.1109/jiot.2022.3168869 |
OADOI: | https://oadoi.org/10.1109/jiot.2022.3168869 |
Type of Publication: | A1 Journal article – refereed |
Field of Science: | 213 Electronic, automation and communications engineering, electronics |
Funding: | This work was supported in part by the National Natural Science Foundation of China under Grant 62071472; in part by the Fundamental Research Funds for the Central Universities under Grant 2020ZDPY0304; in part by the Chinese Government Scholarship under Grant 202006420096; in part by the Academy of Finland 6Genesis Project under Grant 318927; and in part by the IDEA-MILL under Grant 335936. |
Academy of Finland Grant Number: | 318927 |
Detailed Information: | 318927 (Academy of Finland Funding decision) |
Copyright information: | © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |