Predictive Control and Communication Co-Design via Two-Way Gaussian Process Regression and AoI-Aware Scheduling

This article studies the joint problem of uplink-downlink scheduling and power allocation for controlling a large number of actuators that upload their states to remote controllers and download control actions over wireless links. To overcome the lack of wireless resources, we propose a machine learning-based solution, where only a fraction of actuators is controlled, while the rest of the actuators are actuated by locally predicting the missing state and/or action information using the previous uplink and/or downlink receptions via a Gaussian process regression (GPR). This GPR prediction credibility is determined using the age-of-information (AoI) of the latest reception. Moreover, the successful reception is affected by the transmission power, mandating a co-design of the communication and control operations. To this end, we formulate a network-wide minimization problem of the average AoI and transmission power under communication reliability and control stability constraints. To solve the problem, we propose a dynamic control algorithm using the Lyapunov drift-plus-penalty optimization framework. Numerical results corroborate that the proposed algorithm can stably control $2$x more number of actuators compared to an event-triggered scheduling baseline with Kalman filtering and frequency division multiple access, which is $18$x larger than a round-robin scheduling baseline.

decisions both control and channel states to minimize the overall transmission time of the scheduled control systems. Therein, the scheduling decisions are based on communication reliability that is sensitive to the control state and system dynamics. However, in the absence of wireless communication, the outdated applied actions over ideal channels are based on the last received state, hindering control stability. While the previously mentioned scheduling methods in [8]- [12] are effective in small-scale control environments, as pointed out in [15], the communication and control designs in [8]- [12] are separated from each other, limiting their adoption for supporting a large number of control systems. This mandates a tight co-design of reliable communication and stable control operations for optimal resource allocation and scheduling.

A. Contributions
Spurred by the aforementioned motivation, in this work we consider a scenario consisting of a large number of control systems connected through wireless links. To overcome the lack of wireless resources, we propose a GPR based solution in which only a fraction of control systems is controlled by communicating with the controller, while the remaining control systems are controlled by locally predicting the missing state/action via GPR, using the previous received state/action. For each control system, a controller calculates the action using a linear quadratic regulator (LQR) whose input, i.e., either the estimated state using a minimum mean square error (MMSE) estimator if a sensor-controller pair of a control system is scheduled in the uplink (UL) or the predicted state using GPR at the controller when it is not scheduled. Then, to regulate the control system, an actuator applies either the estimated action using an MMSE estimator if the controller-actuator pair of a control system is scheduled in the downlink (DL), or the predicted action using GPR at the actuator if it is not scheduled. The centralized scheduler shared among all control systems schedules at most one sensor-controller pair in the UL and one controlleractuator pair in the DL, based on both control and channel states. Here, control stability depends on the GPR prediction credibility [16] that is determined by the age of information (AoI) of the latest received signal, and the reliability of the received signal that is dictated by transmission power, highlighting the importance of the communication and control co-design.
Given the proposed predictive control framework, we focus on the problem of jointly optimizing the UL-DL scheduling and power allocation so as to minimize the network-wide average AoI and transmission power while guaranteeing communication reliability and control stability. To solve the formulated non-convex stochastic optimization problem, we develop a dynamic control algorithm using Lyapunov optimization. Considering an inverted pendulum, numerical results demonstrate that the proposed scheduling method can stably support 2x more control systems compared to the event-triggered scheduling with Kalman filtering and frequency division multiple access (FDMA), which is 18x larger than a time-triggered scheduling baseline. Furthermore, the results show that the proposed predictive control algorithm is more communication efficient while achieving a faster control stability than the time-triggered and event-triggered control baselines, highlighting the effectiveness of the UL-DL decoupled scheduling and the use of two-way GPRs at both controller and actuator sides. The remainder of this paper is organized as follows. In Section II, we specify the WNCS architecture including the system models of the control, communication, and GPR based approach. In Section III, we formulate the communication, control, GPR based co-design optimization problem and propose the stability-aware scheduling algorithm by leveraging Lyapunov optimization framework to solve the co-design problem in Section IV. In Section V and Section VI, we present simulations results and conclude the paper.

A. Wireless Networked Control System Architecture
As depicted in Fig. 1, the WNCS architecture under study consists of a set M of M independent linear control systems over a shared wireless channel. Each control system comprises a plant, a sensor that measures the plant's state, and actuator that takes an action to control the plant's state. The action is computed by a remote controller based on the control and channel states. To this end, the plant's state is received by the controller in the uplink, and the controller's action is received by the actuator in the downlink. To avoid interference under limited communication bandwidth, each UL or DL channel is allocated to a single sensorcontroller pair or controller-actuator pair per unit time, respectively. The rest of the sensor-controller and controller-actuator pairs without receptions locally predict their missing control states and actions, respectively, based on their previously received information, to be detailed in Sec. II-D. To be specific, the plant's state of control system i ∈ M at discrete control time k ∈ Z + is denoted by x u i,k ∈ R D . For a received action at an actuator u a i,k ∈ R P based on the computed action at controller u d i,k ∈ R P , the state evolution of control system i at time k is described by the discrete-time linear time-invariant (LTI) system as follows: where A i ∈ R D×D is a fixed state transition matrix of the i-th control system, B i ∈ R D×P is a fixed control action matrix of the i-th control system, and w k ∈ R D is the plant noise at time k which is independent and identically distributed (IID) Gaussian noise with zero mean and covariance matrix W. Here, to avoid a non-trivial problem, A i is assumed to be unstable, i.e., is the D-th eigenvalue of A i . This implies that the plant's state infinitely grows over time unless a proper control action u a i,k is provided. To stabilize such control system, each time k, the following four phase operations are considered. 1) Sensing and Uplink Transmission (at a sensor): A centralized scheduler located at the base station (BS) shared among all control systems decides which sensor-controller pair is scheduled to transmit and close its sensing loop based on both the channel and control states. Then, the scheduled sensor transmits its state to its controller over a wireless UL fading channel using analog uncoded communication to be elaborated in Sec. II-B.
2) State Reception or Prediction (at a controller): If the sensor-controller pair is scheduled, the controller obtains the current estimated state using MMSE estimator, and predicts the next state via GPR to be discussed in Sec. II-D. Otherwise, the controller directly predicts the current and the next states based on the state history using GPR. The current predicted state by GPR is fed to the LQR to calculate the action unless the estimated state by the MMSE estimator is provided. The future predicted state is fed to the centralized scheduler to make the scheduling decisions.
3) Action Computation and Downlink Transmission (at a controller): For a given control state, the controller computes the optimal action using LQR [17]. The controller transmits the computed action to the scheduled actuator over a wireless DL fading channel using analog uncoded communication to be elaborated in Sec. II-B. (at an actuator): If the controller-actuator pair is scheduled, the actuator obtains the current estimated action using a MMSE estimator, and predicts the next action via an action GPR to be discussed in Sec. II-D. Otherwise, the actuator directly predicts the current and next actions based on the action history using GPR. For a given action, the actuator takes an action and subsequently the plant's state is updated according to the dynamics in (1).

B. State and Action Communications
The UL state and DL action communications are elaborated, in terms of the received signal, signal-to-noise ratio (SNR), scheduling, and age-of-information (AoI) as follows. Noisy State and Action Receptions: At the control time slot k, the received signal y l i,k at the l-th communication at the receiver of control system i is represented as where l ∈ {u, d} represents a communication indicator between the transmitter-receiver pair in which l = u refers to the UL state communication between the sensor-controller pair while l = d refers to the DL action communication between the controller-actuator pair. The transmitted signal, in the UL state communication (i.e., q = x and l = u), is the plant's state transmitted by the sensor of a control system i at time k such that E{|x u i,k (ι)| 2 } = 1, ∀ι ∈ {1, · · · , D}. The transmitted signal, in the DL action communication (i.e., q = u, ] is the action transmitted by the controller to an actuator of a control system i at time k such that E{|u d i,k (p)| 2 } = 1, ∀p ∈ {1, · · · , P }. The matrix H l i,k ∈ R F ×F represents the wireless channel of the l-th communication between the transmitterreceiver pair of a control system i at time k, and F ∈ {D, P } represents the dimensions of the transmitted state or action, respectively. The channel is modeled as a Rayleigh block fading which is static and flat-fading within either UL or DL transmission time. Channel statistics are perfectly known at the transmitters and receivers, and P l i,k ∈ 0, P l max is the transmission power of control system i at time k with total transmission power P l max . Lastly, n l k is the additive white Gaussian noise at the receiver with zero-mean and covariance matrix E{n l T k n l k } = N 0 I F , where N 0 is the measurement noise variance, and I F is the F × F identity matrix. The matrix C i ∈ R F ×F is the observation matrix of the control system i that equals the identity matrix in the DL action communication and equals the identity matrix in the UL state communication to characterize the full-state observations. The measurement noise n l k and the plant's noise w k in (1) are uncorrelated zero-mean Gaussian noise with unit variance. Hence, the SNR at the receiver of the l-th communication of a control system i at time k is given as DRAFT January 29, 2021 where SNR in (3) is equivalent to the signal-to-distortion ratio (SDR) as a result of the channel theoretical limit in which the rate-distortion function of the source equals to the channel capacity and the number of source samples is matched to the number of channels under analog uncoded communications [18]. In this work, we consider analog uncoded communications, in which the discrete-time continuous amplitude source samples are amplified and transmitted to a receiver over wireless channel. Compared to digital communications, analog communications are favorable for achieving low latency thanks to skipping encoding and decoding operations, at the cost of signal distortion during channel propagation [18]. Furthermore, analog uncoded communication is robust to channel conditions and performs well at different SNRs compared to digital communication that is sensitive to any degradation in channel conditions. To ensure reliable communication for control stability, the successful decoding of the transmitted signal is described by the indicator function I {SNR l i,k ≥SNR l th } for a target SNR threshold SNR l th . Scheduling and AoI: At time k, the centralized scheduler located at the BS and shared among all control systems schedules at most one sensor-controller pair of control system i in the UL state communications, and at most one controller-actuator pair of the control system i in the DL action communications. Let α l i,k ∈ {0, 1} be the scheduling variable of the l-th communication of the control system i at time k, where α l i,k = 1 when the transmitter-receiver pair of the l-th communication of control system i is scheduled at time k and α l i,k = 0 otherwise. The freshness of the received information is measured using AoI, i.e., the number of elapsed time since the generation of the latest received information [19]. AoI is composed of the interarrival time that is defined as the time elapsed between two consecutive update generations and the service time defined as the transmission time of update information. In analog uncoded communications, AoI depends only on the inter-arrival time since the service time is deterministic based on channel bandwidth. Hence, the AoI of the l-th communication of the control system i at the receiver linearly increases with time if it is not scheduled or its SNR is below a threshold. Formally, the AoI of the l-th communication of control system i at the receiver is: where β l i,k ∈ Z ++ is the AoI of the l-th communication of the control system i at time k at the receiver, and ξ l i,k = α l i,k I {SNR l i,k ≥SNR l th } is the transmission indicator variable of l-th communication of control system i at time k that depends on both scheduling variable and SNR indicator function.

C. State and Action Estimation Over Noisy Communication Links
The UL received states and DL received actions are distorted by Rayleigh fading channels. The original signals are estimated using the MMSE estimator as detailed next. When one transmitterreceiver pair of the l-th communication of control system i is scheduled, i.e., α l i,k = 1, the receiver applies the MMSE estimator to restore the original signal from the noisy received signal in (2).
The resultant estimated signalq l i,k is given as: where G l i,k ∈ R F ×F is the linear MMSE matrix at the receiver of the l-th communication of the control system i at time k that minimizes the mean-squared error (MSE) between the original and estimated signals as The term v l i,k in (5) is the MMSE estimation error following a zero-mean Gaussian random vector with the covariance matrix V l i,k ∈ R F ×F . Following [20], we assume that q l i,k follows a zero-mean Gaussian distribution with the covariance matrix S q ∈ R F ×F , then we have

D. State and Action Prediction Without Communication
When one transmitter-receiver pair of the l-th communication of the control system i at time k is not scheduled, i.e., α l i,k = 0, a receiver applies a number of parallel GPRs proportional to the missing signal dimensions to predict both the missing current signal and the next signal using the previously received signals. Each individual GPR learns the functional relationship g ∈ R between the control discrete-time k ∈ Z + and each output of the received signal. This means that each output of the MMSE estimated signalq l i,k in (5) is the state observationx u i,k ∈ R D in the UL and the actionū d i,k ∈ R P in the DL. This is accomplished by learning a latent function of the following regression modelq l i,k (j) = g j (k ) + , j ∈ {1, · · · , F}, ∀i, l, k , whereq l i,k (j) ∈ R is the j-th output of the estimated signal of the l-th communication of the control system i at time k , g j is a j-th output latent function, and ∼ N (0, σ 2 n ) is an IID Gaussian noise distribution with zero mean and variance σ 2 n that accounts for the measurements or modeling errors [21]. Specifically, to predict the missing received signalq l i,k of the l-th communication of the control system i at test time k, we exploit F individual GPRs, where F represents the MMSE estimated signal dimensions, and feeding each individual GPR with a training set D l,j i,n l of each output of the previous received signals associated with its observation time k , given as D l,j i,n l = {(k , ξ l i,k q l i,k (j))| j = 1, · · · , F, k = 1, · · · , n l , i = 1, · · · , M, l ∈ {u, d}}. Here, n l = k ξ l i,k counts the number of received signals of the l-th communication until time k in the training set of the control system i. Hence, the last time instant in which the transmitter of the l-th communication of the control system transmitted its observation to the receiver is given asñ l = k − β l i,k + 1. It is obvious that a large value of AoI decreases the number of observations at the receiver that affects the signal prediction credibility at a particular level.
In each individual GPR, according to the Gaussian process (GP) characteristics where any finite subset of random variables taken from a realization of a GP follows a joint Gaussian distribution, each j-th output latent function g j of the vector-valued latent function g(k) = [g 1 (k) · · · g F (k)] is assumed to follow a GP as g j (k) ∼ GP (m j (k) , R j (k, k )), where m j (k) is the mean function of the j-th output of the missing received signal which is usually taken as zero without loss of generality [22], and R j (k, k ) is the covariance function of the j-th output of the missing received signal between the outputs at time k and k that defines the correlation between the outputs according to the input times. It is noted that the stationary covariance function between the outputs is based on the difference between their corresponding input times |k − k | in which the two outputs are strongly correlated if their corresponding input times are sufficiently close to each other. Since we focus on time-series data, we utilize information from previously received signals to describe the current data depending on the past observations. Hence, we use a squared exponential kernel function coupled with a periodic kernel function, to model the correlation between the outputs according to their temporal behaviours, as defined in [22] where the first term represents the stationary covariance function that depends on when the signal |k − k | was received with h k and h q being hyperparameters representing the time-scaling and output-scaling of a squared exponential function, respectively, and the second term gives the periodicity with hyperparameter ν representing frequency. For a set of j-th output observations q l i (j) = {q l i,1 (j), · · · ,q l i,n l (j)} T and the associated observation times k = {1, · · · , n l } T , the joint distribution of the j-th output past observationsq l i,k (j) together with the j-th output g j (k) at test time k is given as where R j (k, k) ∈ R is the prior covariance function of j-th output observation at a test time k, and R j (k , k ) ∈ R n l ×n l is the symmetric and positive semi-definite covariance matrix of j-th output past observations with the elements R j (k (a), k (b)) for a, b = 1, · · · , n l . Following [21], we treat the prediction mean as the j-th output predicted signalq l i,k (j), the posterior distribution of g j (k) at test time k based on the training set D l,j i,n l can be analytically derived as Following [21], the j-th output prediction meanq l i,k (j), and the j-th output prediction variance σ 2 i,k (j) are respectively given aŝ where r j (k , k) ∈ R n l ×1 is j-th output observation covariance between the outputs at the n l observation times and a test time k, and the term e l i,k (j) is j-th output prediction error defined as the difference between true and predicted outputs. Moreover, Θ j in (10) is the j-th output hyperparameters of the covariance function R. Finally, the predicted signal at the receiver of the l-th communication of control system i at time k and its prediction error covariance matrix arê

E. Action Computation and Actuation
By feeding the estimated or predicted state, the controller computes the action using LQR.
Then, the actuator applies the estimated or predicted action to stabilize the state as detailed next. Action Computation After State Estimation/Prediction: For a given estimated state (i.e., MMSE output) in (5) or predicted state (i.e., GPR output) in (13), at time k, the state x c i,k available at the controller based on the UL transmission indicator variable is given as The received state x c i,k is used in the LQR located at the controller, and the optimal action of a control system i at time k is given by the following linear feedback control law as where u d i,k ∈ R P is the computed action at the controller, is a positive semi-definite weight DRAFT matrix of the state deviation cost, and Z u ∈ S P ×P ++ is a positive definite weight matrix of the action cost. The term  (16) to an actuator of control system i at time k in the DL, if ξ d i,k = 1, as discussed in Sec. II-B. Actuation After Action Estimation/Prediction: For a given estimated control action (i.e., MMSE output) in (5) or predicted action (i.e., GPR output) in (13), at time k, the action u a i,k available at the actuator based on the DL transmission indicator variable is given as  (15) and (17), the actuator takes a control action that changes the plant's state of the control system i at time k in (1) into four cases of state evolution as follows: Timing diagram: Based on the UL and DL transmission indicator variables within each unit control time duration, the timing diagram of a control system is illustrated in Fig. 2 (15) in which the predicted state, if ξ u i,k = 0, is applied to the LQR. Lastly, the action is transmitted by the controller if has a reliable DL communication and valuable information affecting the control stability (i.e, ξ d i,k = 1). This is a result of assuming that the UL and DL transmission indicator variables as being periodically transmitted by the centralized scheduler

A. Control-constrained Problem Formulation
Our objective is to minimize the total communication cost per control system subject to ensuring communication reliability and control stability. The total communication cost incorporates the AoI and transmission power since the AoI indirectly affects the control stability through the GPR prediction stability and wireless resources consumption, while transmit power affects communication reliability and energy consumption. Formally speaking, we have: where the non-decreasing concave functions G β (β) = log(1 + β) and G P (P ) = log(1 +P ) are proportionally fair cost functions of the AoI and the transmission power function for each control system, respectively [23]. The transmission power function that depends on the scheduling variable is given asP l i,k = α l i,k P l i,k , and the given positive weights ω β l and ω P l adjust the relative importance of the corresponding cost functions. Throughout this work, the following notation for the long-term time-averaged of any quantity z is defined asz lim sup In particular, β l i andP l i are the long-term time-averaged of β l i andP l i , respectively. To evaluate control stability, we consider the quadratic Lyapunov function that measures the performance of each control system as a function of the state expressed as where Z ∈ S D ++ is a unique positive definite solution to the discrete Lyapunov equation Because the centralized scheduler has only access to the predicted state, the expected current value of is calculated in the following lemma.
Lemma 1. Given the predicted statex u i,k and the state prediction error covariance matrix J u i,k at the controller, the expected current value of L(x u i,k ) is given as Proof. Please refer to Appendix.A Note that the expected current value of L(x u i,k ) of the control system i at time k naturally grows as the predicted state and the prediction error get larger as a result of increasing AoI and/or the insufficiency of received observations number in the training set. The requirement of control stability is that the expected future value of L(x u i,k+1 ) should decrease at a given rate ζ i ∈ (0, 1] of its expected current value of L(x u i,k ), which means the state of the control system is monotonically decreasing along trajectories, as where the expectation in the right hand side of (25) is with respect to the plant noise w k in (1), the signal estimation error v l i,k defined in (7), and the signal prediction error e l i,k defined in (13). According to the objective function in (22) and the control stability constraint in (25), the control-constrained optimization problem can be formulated as follows: where a l k = {α l i,k : ∀l ∈ {u, d}, i ∈ M} and P l k = {P l i,k : ∀l ∈ {u, d}, i ∈ M} are the UL-DL scheduling vector at time k, and the UL-DL transmission power vector at time k, respectively. The constraint in (26b) bounds the UL-DL transmission power allocation of a control system i at time k by the total available transmission power P l max , while the constraint in (26c) ensures the communication reliability that is based on SNR or SDR in analog uncoded communications. The constraints in (26d)-(26e) ensure at most one transmitter-receiver pair of a control system i is scheduled at time k. The constraint in (25) ensures the state is decreasing along trajectories that satisfies the control stability. It is noted that control stability constraint in (25) is independent with communication constraints in (26b)-(26e). However, the control stability constraint is affected and determined by the communication and scheduling variables, hence the original control-constrained problem P1 is rewritten after directly reflecting the communication control relationship of the constraint (25) in the following lemma.
Lemma 2. Given predicted statex u i,k , state prediction error covariance matrix J u i,k , the predicted actionû d i,k , the action prediction error covariance matrix J d i,k , the channel between the sensorcontroller pair H u i,k , the channel between the controller-actuator pair H d i,k , the UL transmission power P u i,k , and the DL transmission power P d i,k , the control stability constraint in (25) is satisfied IFF the following conditions on the transmission indicator variables hold, i.e., lim sup lim sup lim sup Proof. Please refer to Appendix. B Note that the conditions on the UL and DL transmission indicator variables in (27) and (28), respectively, ensure the control stability constraint in (25) in a decoupled scheduling between the UL and DL communications based on the current predicted control and channel states, while the condition on both the UL and DL scheduling variables in (29) ensures control stability in a coupled scheduling between the UL and DL communications. Intuitively, the growth of AoI at a controller/actuator leads to an increasing state/action prediction error increasing due to an outdated training set. Therefore, the transmitter-receiver pair of a control system should DRAFT be scheduled when it has a reliable communication link and the state/action prediction error is greater than the state/action estimation error to ensure control stability.
The actuator is physically decoupled from the centralized scheduler and the controller which is co-located at BS, and the DL indicator variable at the centralized scheduler relies on the action prediction error at the actuator. Hence, the controller leverages another GPR, where the input of this GPR is the discrete-time associated with the generated action by LQR plus the action estimation error asū dc i,k = u d i,k + v d i,k . As a result of the applied input to this GPR that yields a training set similar to the one at the actuator as D dc i,n d = {(k , ξ d i,k ū dc i,k )| k = 1, · · · , n d , i = 1, · · · , M }, we obtain the action prediction error similar to the one generated at actuator side.

B. Joint Communication and Control Problem
According to the UL and DL transmission indicator variables constraints in (27)-(29) that result from the control stability constraint in (25) in problem P1, problem P1 is rewritten as Moreover, the scheduling decision constraint in (26d) not only depends on its own decision but on all others scheduling decisions. Hence, a dynamic control algorithm is proposed in the next section to find the optimal scheduling vector and the optimal transmission power vector of problem P2 utilizing Lyapunov optimization framework.

IV. DYNAMIC CONTROL ALGORITHM USING LYAPUNOV OPTIMIZATION
In this section, we propose a dynamic control algorithm using the stochastic Lyapunov optimization framework to solve problem P2. However, the problem involves minimizing a weighted sum of non-decreasing concave functions of the time-averaged AoI and transmission power. Based on the dynamic stochastic optimization theory [24], it can be transformed into an equivalent problem that involves minimizing a time-averaged cost function of instantaneous AoI and transmission power. This transformation is achieved through the use of non-negative auxiliary variables γ β l i,k and γ P l i,k and corresponding virtual queues Q β l i,k and Q P l i,k with queue dynamics as whereP l i,k will be optimized at each time k. Then, the transformed problem is given as where r β l k = {γ β l i,k : l ∈ {u, d}, i ∈ M} and r P l k = {γ P l i,k : l ∈ {u, d}, i ∈ M} are the vectors of the introduced auxiliary variables. The constraints in (32d) and (32e) are introduced to bound the auxiliary variables. These constraints can be satisfied by ensuring the stability of their virtual queues, since the lower-bound of these constraints can be viewed as the arrival rate of their virtual queues, while the upper-bound can be viewed as the service rate of such virtual queues [24].
Following [24], the problem P2 and the transformed problem P3 are equivalent in which the optimal solution of P3 can be directly turned into an optimal solution of P2.
To handle UL and DL scheduling variables constraints in (30b)−(30d) associated with the control stability constraint in (25), the virtual queues Q C l i,k and Q C i,k are introduced for all control systems whose dynamics are where α l i,k will be optimized at each time k. The constraints in (30b)−(30c) can be satisfied, if their virtual queues are mean-rate stable, i.e., their time-averaged arrival rate is not larger than DRAFT its time-averaged service rate [24]. At this point, the dynamic stochastic optimization is applied to solve the transformed problem P3, which minimizes a weighted sum of the time-averaged cost function of instantaneous AoI and transmission power subject to the virtual queues stability constraints and the original problem constraints in (26b)−(26e). In this regard, we define Q β l k , Q P l k , Q C k , and Q C l k as a vector of all virtual queues Q β l i,k , Q P l i,k , Q C i,k , and Q C l i,k for all control systems, respectively. We denote the combined queue vector of all virtual queues at time k by X k = Q β l k ,Q P l k ,Q C k ,Q C l k , and express the conditional Lyapunov drift-plus-penalty as where L (X k ) is the quadratic Lyapunov function of X k that measures the virtual queues congestion in a scalar metric and is defined as ∀a, b, c ≥ 0, (max (a, 0)) 2 ≤ a 2 , and all virtual queue dynamics into (34), we derive The constant B details in (35) is omitted since it does not affect the system performance in the Lyapunov optimization. A solution to P3 can be obtained by minimizing the upper-bound (35) at each time as subject to: (26b) − (26e) and (32d) − (32e).
The optimality of problem P4 is asymptotically approached by increasing V [24]. Since the problem P4 is of separable structure, which motivate us to determine the AoI auxiliary vector r β l , transmission power auxiliary vector r P l , scheduling vector a l , and transmission power vector P l in an alternative optimization form. Hence, the overall minimization problem P4 can be decomposed into two separate sub-problems that can be solved concurrently with the observation of the virtual queues, control, and channel states.

1) Auxiliary Variable Sub-Problems:
The first decomposed sub-problem is the AoI auxiliary sub-problem, while the second decomposed sub-problem is the transmission power sub-problem.
Since the auxiliary variables of such problems are separated and independent among different control systems, their minimization sub-problems can be decoupled to be computed for each control system separately as the following convex problems subject to: subject to: The optimal AoI auxiliary variables are obtained by differentiating the objective functions of i,k = 0, the optimal AoI auxiliary variable of P4.1 is given as Similarly, by letting A(γ P l ) = V ω P l log(1 + γ P l ) − Q P l i,k γ P l i,k and γ P l * i,k denotes the solution of A(γ P l ) asÁ(γ P l ) = V ω P l (1+γ P l i,k ) − Q P l i,k = 0, the optimal transmission power auxiliary variable of P4.2 is given as 2) Scheduling Decision and Transmission Power Sub-Problems: The optimal UL-DL scheduling variables and the optimal UL-DL transmission power variables are obtained by minimizing the remaining terms of the objective function of problem P4 at each time subject to the scheduling and transmission power constraints in (26b)−(26e), which is expressed as which is a mixed-integer non-convex problem. Due to the complexity of exhaustive search for finding the optimal solution, we propose a low-complexity two-stage sequential optimization strategy to find a sub-optimal solution to the joint power allocation and scheduling assignment problem. This strategy firstly obtains the UL and DL transmission power variables, followed DRAFT by the UL and DL scheduling variables. The optimal UL and DL transmission power for each control system, determined by solving the following power allocation problem subject to: where are the constant terms defined in the objective function of P4.4. The above problem P4.4 is a generalized min-weight problem, which can be decoupled into a series of independent sub-problems for each control system separately. Hence, the optimal UL-DL transmission power variables are given as Given the optimal UL and DL transmission power variables in (43), the optimal UL and DL scheduling variables for each control system that has a control state/action to transmit are obtained by solving the following scheduling assignment problem subject to: (26d) − (26e).
The optimal UL and DL scheduling variables are obtained as follows: where are the terms defined in the objective function of problem P4.5. Moreover, j 1 = arg min i∈M Q 1 i,k , j 2 = arg min i∈M Q 2 i,k , and j 3 = arg min i∈M (Q 1 i,k + Q 2 i,k + Q 3 i,k ) are control system indices.

V. SIMULATION RESULTS AND DISCUSSIONS
In this section, the performance of the proposed stability-aware scheduling algorithm is investigated in an inverted-pendulum on a cart system with M = 2, and M = 20 inverted-pendulums, respectively. Each inverted-pendulum system is described by a four-dimensional state vector as , where x i,k represents the cart's position along the horizontal axis, x i,k represents the cart's velocity, θ i,k represents the pendulum angle along the vertical axis, andθ i,k represents the pendulum's angular velocity. The initial state of the control systems i is x u i,0 = [0 0 0.1 0] T . The action u a i,k is the horizontal force applied on the linear cart. By applying a zeroth-order with a state sampling of 10ms on the continuous dynamics of the inverted-pendulum system and linearizing around the pendulum up-position, i.e., θ i,k = 0, we obtain the following discrete-time linear dynamics matrices [14], Since A i 's largest eigenvalues {3.85, 0.42, 0.92, 1.00} is greater than unity, the invertedpendulum is unstable without an appropriate control action [20]. To stabilize the control system, the feedback gain matrix Φ i is calculated at the controller based on LQR in (16). The rest of the simulation parameters are P l max = 20 dBm, N 0 = −20 dBm, ζ i = 0.01, V = 1000, ω β = 1, ω P = 1, h k = 1, h q = 1, µ = 1, and σ 2 n = 0.01. The performance of the proposed stability-aware scheduling method is compared versus five scheduling baselines. In Baseline 1. (Round-Robin Scheduling), each sensor/controller periodically transmits its state/action over a wireless channel with fixed transmission power and a predefined repeating order [8], [9]. In Baseline 2. (Opportunistic Scheduling), the sensor/controller is scheduled under favorable channel conditions. Otherwise, the controller/actuator applies the last received state/action [10], [11]. In Baseline 3, (Event-triggered Scheduling without FDMA), one control system with the largest state discrepancy, i.e., the difference between the current predicted state using Kalman filtering and the previous received/predicted state is larger than a predefined threshold, is scheduled at each time to transmit the state/action with fixed transmission power [12], [13]. In Baseline 4. (Event-triggered Scheduling with FDMA), each sensor/controller transmits its state/action with fixed transmission power based on its stability condition, i.e., difference between the current and previous states is less than a predefined threshold, using FDMA [25]. In Baseline 5. (Ideal Control Scheduling), all control systems simultaneously transmit their states/actions with ultra-low latency and high reliability over perfect channels [9]. Results are collected over ten independent simulation runs, and each simulation is run for K = 90 discrete-time steps.  Average Control Error Vs. Control Systems. Fig. 3 illustrates the average pendulum angle to the vertical center of each control system, i.e., the average control error of each control system, during 90 control time steps. As shown in Fig.3(a)   However, the number of control systems that require frequent scheduling in the actuating link is larger than that of the sensing link which stems from the fast dynamics related to the invertedpendulum system that require a quick appropriate action.  controller AoI, and actuator AoI of a randomly chosen control system in low and large number control systems regimes. In Fig 5(a), the state trajectory of the proposed stability-aware with GPR scheduling and event-triggered with FDMA scheduling, in a low number of control systems, are extremely close to that of the ideal control system where their pendulums remain upright over time. Meanwhile, the state trajectory of the opportunistic without GPR scheduling is slightly better than that of the event-triggered without FDMA scheduling by keeping all pendulums upright over time due to the scheduled control system with a favorable channel state. Finally, the state trajectory of the round-robin without GPR scheduling slightly matches the desired state at the cost of a high communication rate and fixed transmission power.
As shown in Fig. 5(a), the controller and actuator AoI of the proposed stability-aware scheduling equals one, i.e., the sensing and actuating links of a control system are synchronously scheduled, until 15 control time steps. This is to guarantee that the received actions to action GPR has a low state/action estimation uncertainty that affects the GPR state/action prediction stability and control stability. Then, the actuator AoI starts increasing compared to the controller   turn deteriorates the predication accuracy hindering control stability [26].
Moreover, the direct and recursive GPR approaches for a low number of control systems require frequent scheduling at the early phase (until 45 control time steps in the sensing link), during which the amount of state observations ensures the state prediction credibility. Meanwhile, the recursive and direct GPR approaches for a low number of control systems require only 15 control time steps in the actuating link, within which the amount of control actions ensures the action prediction stability. Fig. 6 highlights that the AoI increase deteriorates the control performance when increasing prediction uncertainty until the training set has a sufficient number of observations to ensure steady-state state/action prediction. Finally, Fig. 6 illustrates that the state/action prediction of the recursive GPR approach converges faster to the steady-state compared to the direct GPR approach. This affects the scheduling decision by increasing the controller and actuator AoIs of the recursive GPR approach at the cost of a low state/action prediction accuracy and control performance compared to the direct GPR approach.
Number of Served Control Systems Vs. Time Interval. Fig. 7 presents the final capacity of the scheduled control systems over two different time intervals. We assume that a scheduling method successfully control several controls systems for all control systems within θ i,k ≤ 0.05 error region for 10 independent simulation runs. As observed in Fig. 7, the proposed approach

VI. CONCLUSION
In this work, we proposed a novel stability-aware scheduling algorithm based on the communication and control co-design exploiting analog uncoded communication and GPR based to reduce the required communication rate and ensure control stability through maintaining prediction stability. We performed extensive simulations for low-latency control systems to demonstrate the effectiveness of utilizing the GPR based approach, as well as, the effectiveness of decoupled scheduling between the UL and DL communications to support a large number of control systems compared to the scheduling baselines. in (14), it holds that the expected current Lyapunov value of the system is In (48), the first term is a constant as the expectation is taken w.r.t. the state prediction error e u i,k , while the cross-terms such as E[x u T i,k Ze u i,k ] and E[e u T i,k Zx u i,k ] can be canceled since the predicted statex u i,k and the state prediction error e u i,k are uncorrelated. Hence, the expected current value of L(x i,k ) is given as where the last term in (49)  As a result of the remote sensing-loop state evolution in (19) and the open-loop state evolution in (18), the expected future value of L(x u i,k+1 ) in (25) of the UL transmission is given as For a given predicted statex u i,k , state prediction error covariance matrix J u i,k , predicted actionû d i,k , action prediction error covariance matrix J d i,k , wireless UL channel H u i,k , and UL transmission power P u i,k , the expected future value of L(x u i,k+1 ) of the UL transmission in (50) is given as Step (1) is a result of using the quadratic Lyapunov function. The step (2) holds when applying the remote sensing-loop and open-loop state evolution in (19) and (18), respectively. The step (3) holds using the expectation in (51) with respect to the state estimation error e u i,k , the action prediction error e d i,k , and the plant noise w k . As a consequence of obtaining the expected current Lyapunov value in (49) and the expected future Lyapunov value of the UL transmission in (51), the control stability constraint in (25) for the UL transmission is After rearranging the terms in (52), the constraint on the UL transmission indicator variable is To capture the overall state evolution of each control system in the UL over time interval of length K, according to the time-averaged Lyapunov [27], (53) is summed over time k∈{0,··· ,K−1}, then the result is divided by K and taking the limit as time tends to infinity. This yields the UL transmission indicator variable constraint in (27).
Similarly, we obtain the DL transmission indicator variable constraint in (28) according to the remote actuating-loop state evolution in (20) and open-loop state evolution in (18). Finally, the expected future Lyapunov value of the UL-DL coupled transmission is obtained using the closed-loop state evolution in (21) and the open-loop state evolution in (18) For a given predicted statex u i,k , the state prediction error covariance matrix J u i,k , the predicted actionû d i,k , the action prediction error covariance matrix J d i,k , the wireless UL channel H u i,k , the wireless DL channel H d i,k , the UL transmission power P u i,k , and the DL transmission power P d i,k , the expected future value of L(x u i,k+1 ) of the UL-DL coupling transmission in (54) is given as Given the expected current value of L(x u i,k ) in (49) and future of L(x u i,k+1 ) of the UL-DL coupled transmission in (55), the control stability constraint in (25) for UL-DL coupled transmission is The UL-DL transmission indicator variable constraint after rearranging the terms in (56) is: At last, we apply the time-averaged Lyapunov [27] to obtain the overall state evolution per control system in the UL-DL coupled transmission and UL-DL coupled transmission indicator variable constraint in (29).