University of Oulu

H. Cha, J. Park, H. Kim, M. Bennis, and S.-L. Kim, "Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning," in IEEE Intelligent Systems, vol. 35, no. 4, pp. 94-101, July-Aug. 2020, doi: 10.1109/MIS.2020.2994942.

Proxy experience replay: federated distillation for distributed reinforcement learning

Author: Cha, Han (1); Park, Jihong (2); Kim, Hyesung (3);
Organizations: (1) Yonsei University
(2) University of Oulu
(3) Samsung Electronics Co., Ltd.
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.5 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2020101684236
Language: English
Published: Institute of Electrical and Electronics Engineers, 2020
Publish Date: 2020-10-16
Description:

Abstract

Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience RM (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes, vanilla FRD, federated RL (FRL), and policy distillation.
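The ProxRM and mixup ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how actual states are clustered, so the uniform grid used here is an assumption, and the names build_proxy_rm, mixup_pair, n_bins, low, high, and lam are hypothetical. In the standard mixup algorithm the weight lam is drawn from a Beta distribution; it is passed explicitly here so the result is reproducible.

```python
import numpy as np


def build_proxy_rm(states, policies, n_bins, low, high):
    """Average policies over proxy states.

    Assumption: a proxy state is a cell of a uniform grid over [low, high].
    Returns a dict mapping each visited cell index to its mean policy.
    """
    states = np.asarray(states, dtype=float)
    policies = np.asarray(policies, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)

    # Map every actual state to the index of the grid cell that contains it.
    scaled = (states - low) / (high - low)
    cells = np.clip((scaled * n_bins).astype(int), 0, n_bins - 1)

    # Accumulate the policy sum and visit count per cell.
    sums, counts = {}, {}
    for cell, policy in zip(map(tuple, cells.tolist()), policies):
        sums[cell] = sums.get(cell, 0.0) + policy
        counts[cell] = counts.get(cell, 0) + 1

    # Locally averaged policy for each proxy state.
    return {cell: sums[cell] / counts[cell] for cell in sums}


def mixup_pair(state_a, policy_a, state_b, policy_b, lam):
    """Interpolate two (proxy state, policy) entries with weight lam in [0, 1]."""
    state = lam * state_a + (1.0 - lam) * state_b
    policy = lam * policy_a + (1.0 - lam) * policy_b
    return state, policy


if __name__ == "__main__":
    # Three one-dimensional states; the first two fall into the same cell.
    rm = build_proxy_rm(
        states=[[0.1], [0.2], [0.9]],
        policies=[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
        n_bins=2,
        low=[0.0],
        high=[1.0],
    )
    for cell, policy in sorted(rm.items()):
        print(cell, policy)
```

Under these assumptions, an agent would share only the per-cell averages rather than every raw state and action, which is the source of the communication and privacy benefit the abstract describes.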


Series: IEEE intelligent systems
ISSN: 1541-1672
ISSN-E: 1941-1294
ISSN-L: 1541-1672
Volume: 35
Issue: 4
Pages: 94 - 101
DOI: 10.1109/MIS.2020.2994942
OADOI: https://oadoi.org/10.1109/MIS.2020.2994942
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Copyright information: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.