Multimodal spatio-temporal-spectral fusion for deep learning applications in physiological time series processing: A case study in monitoring the depth of anesthesia

Physiological signal processing poses challenges including high dimensionality (due to the number of channels), heterogeneity (due to the different ranges of values) and multimodality (due to the different sources). In this regard, the current study intended, first, to use time-frequency ridge mapping to explore the use of fused information from joint EEG-ECG recordings in tracking the transition between different states of anesthesia. Second, it investigated the effectiveness of pre-trained state-of-the-art deep learning architectures for learning discriminative features in the fused data in order to classify the states during anesthesia. Experimental data from patients with healthy brains undergoing surgery (N = 20) were used for this study. Data were recorded with the BrainStatus device, using a single ECG channel and 10 EEG channels. The obtained results support the hypothesis that ridge fusion not only captures temporal-spectral progression patterns across all modalities and channels, but that this simplified interpretation of the time-frequency representation also accelerates the training process while significantly improving the efficiency of deep models. Classification outcomes demonstrate that this fusion yields better performance, with 94.14% precision and a 0.28 s prediction time, compared to commonly used data-level fusion methods. To conclude, the proposed fusion technique makes it possible to embed time-frequency information, as well as spatial dependencies over modalities and channels, in just a 2D array. This integration technique shows significant benefit in obtaining a more unified and global view of the different aspects of the physiological data at hand, while maintaining the desired performance level in decision making.


Introduction
Time-series classification is a core part of many recognition tasks. In medicine, it can be used for analyzing physiological data. In the case of physiological signals, dealing with large amounts of heterogeneous multimodal data is a major practical and technical barrier; it would therefore be of high value to transform these high-dimensional, diverse data modalities into a unified form which not only preserves all relevant information as closely as possible, but also improves prediction in clinical investigation. Given the importance of this issue, combining physiological time series from multiple sources and gathering that information into a compact space has become a research hotspot in recent years.
Fusion techniques applied to physiological data have been used in many studies, ranging from the clinical detection of patterns in patient pathology to emotion recognition. The introduced methods include employing a weighted combination of measures [1], weighted summing of predictive model outcomes [2], a weighted combination of high-level features from different layers of a deep model [3], and transforming the original signals into another domain and then combining the transformed data [4].
Assessment of anesthesia depth, the example addressed in this study, can be cast as a time-series classification problem, in which a model is required to categorize unknown patterns into known classes based on quantifiable characteristics captured from the time series. The term anesthesia generally refers to a reversible drug-induced state comprising the conditions of baseline (BL), infusion, loss of verbal contact (LVC), slow-wave activity (SWA) and burst suppression patterns (BSP). One challenge of anesthesia from the clinician's point of view is measuring its depth and monitoring the process of loss of consciousness. One way of measuring the depth of anesthesia involves the analysis of brain activity, commonly monitored by electroencephalography (EEG). Anesthetic-induced unconsciousness was found to be accompanied by changes in different neural activities in cortical brain areas. These findings have motivated a wide range of studies investigating how EEG changes contribute to the depth of anesthesia.
The bispectral index (BIS), a weighted sum of several sub-parameters in the time and frequency domains, is the most studied EEG-derived parameter for monitoring general anesthesia and can be considered a feature-level fusion of EEG data. However, this index was reported to be a measure of certain drugs' effects and not a true reflection of brain activity during the transition between different levels of anesthesia [5,6]. Furthermore, this index was developed based on data from adult patients, without considering infants and younger groups [7]. This EEG-based index maps the nonlinear phases of the EEG dynamics, which reflect the anesthetic drug effect, into an easy-to-use dimensionless number [8].
More recently, the combination of feature extraction from EEG time series and learning-based methods has been widely used in the literature for assessing the level of anesthesia. This includes features in the time, frequency, and time-frequency domains, ranging from different types of entropy to the alpha ratio, which are then fed into a model for training [9][10][11].
Image processing-based techniques have also been used in recent years for monitoring the depth of anesthesia, in which the EEG spectrum information is extracted as a 2D representation and fed as input to a convolutional neural network (CNN) for classifying the anesthesia states [12].
Despite the wide application of EEG-derived indices for measuring the level of consciousness, they were reported to be inaccurate under certain conditions. Therefore, there is a trend towards considering additional modalities, such as the electrocardiogram (ECG), in quantifying loss of consciousness. The QT interval (the time from the start of the Q wave to the end of the T wave in the QRS complex) and heartbeat dynamics in an ECG signal were reported to be highly correlated with anesthetic induction [13,14]. More importantly, ECG-derived parameters were shown to be stronger indicators of an individual's anesthetic state than EEG-based ones [15]. A recent study introduced an ECG-based parameter based on the similarity between the statistical distributions of R-R intervals (the time elapsed between two successive R waves of the QRS complex) in consecutive ECG epochs, which was then fed to an artificial neural network (ANN) model for measuring the depth of anesthesia [16].
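The R-R intervals referred to above can be extracted from a raw ECG trace with a simple peak detector. The sketch below uses SciPy's `find_peaks` on a synthetic spike train; the fixed height and refractory-distance thresholds are illustrative assumptions standing in for a dedicated QRS detector such as Pan-Tompkins.

```python
import numpy as np
from scipy.signal import find_peaks

def rr_intervals(ecg, fs):
    """Return R-R intervals (in seconds) from an ECG trace.

    The height and minimum-distance thresholds below are illustrative
    assumptions, not the detector used in the study."""
    peaks, _ = find_peaks(ecg, height=0.5 * np.max(ecg), distance=int(0.4 * fs))
    return np.diff(peaks) / fs

# Synthetic spike train: one "R peak" per second, sampled at 250 Hz
fs = 250
ecg = np.zeros(10 * fs)
ecg[::fs] = 1.0
print(rr_intervals(ecg, fs))  # intervals of ~1.0 s
```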
What remains to be addressed by the anesthesia technology community is investigating the variation in joint EEG-ECG dynamics during the transition between different states of anesthesia, which requires, first, synchronous measurement of these modalities and, second, modality fusion.
Regarding synchronous EEG-ECG measurement, a recent investigation suggested a role for forehead BrainStatus electrodes, with an additional ECG channel, in demanding clinical environments such as the intensive care unit (ICU) [17]. What remains unclear is how the parameters obtained by this device could be fused to capture variations in the spatio-temporal-spectral distribution during different stages of anesthesia.
The main hypothesis in this research is that classifying anesthesia states using bio-signals, as an example of a time-series classification problem, can be converted into a visual pattern recognition problem in which information from time series of different sources, in both the time and frequency domains, is encoded as a single 2D representation.
The major contributions of this research are as follows: 1) This study pioneers the use of fused information from EEG and ECG recordings in tracking the transition between different states of anesthesia. 2) The proposed fusion technique makes it possible to embed time-frequency information, as well as spatial dependencies over modalities and channels, in just a 2D array. 3) This study is the first attempt to investigate the effectiveness of pre-trained state-of-the-art deep learning architectures for learning discriminative features in the fused EEG-ECG data in order to classify the states during anesthesia.
Besides the above-mentioned contributions, the technique proposed in this study is a data-level fusion, simultaneously integrating significant information in all domains while compressing the physiological data directly at the source. This is important because reducing the communication load to other devices or to the cloud requires local extraction of information from the raw data stream at the sensor level. Fused raw data in a compressed form is therefore highly valuable for minimizing the amount of data that needs to be stored or transmitted, as well as for saving battery power and reducing transmission time.

Data collection and preprocessing
The study was approved by the Northern Ostrobothnia Hospital District local ethics committee (82/2018). Twenty adult patients (Table 1) scheduled for an elective surgical operation gave informed written consent to participate. Patients with cardiovascular or neurological diseases or a body mass index over 30 were excluded. No premedication was used. During the study, the patients were monitored according to the standard procedure of the operating room. In addition, EEG and ECG were recorded using the BrainStatus self-adhesive electrode, a wireless device and a tablet computer on which the signals were observed online. The EEG channels included were Fp1, Fp2, F7, F8, Af7, Af8, Sp1, Sp2, T9, and T10. The signals used in the analysis were recorded during the induction of anesthesia with propofol. The procedure included the following steps: 1) Baseline recording of at least 2 min. 2) Beginning of propofol infusion at a fixed rate of 30 mg/kg/h. 3) Observation of the moment of loss of obeying verbal command (LVC), i.e. the time at which the patient stops squeezing the anesthesiologist's hand after the command ("squeeze my hand"). 4) Observation of the moment of occurrence of the burst suppression pattern (BSP), i.e. the time at which clear suppression periods occur in the EEG. 5) Ending of the recording after at least 2 min of BSP.
The analysis was preceded by high-pass filtering at 0.1 Hz and low-pass filtering at 32 Hz. The sequences were visually inspected, and those containing artifacts were excluded from further analysis.
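The preprocessing above can be sketched as a zero-phase band-pass filter. The filter family and order below are assumptions, since the text specifies only the 0.1 Hz and 32 Hz cut-offs.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250  # BrainStatus sampling rate (Hz)

# Band-pass 0.1-32 Hz as a zero-phase Butterworth filter.
# (A 4th-order Butterworth is an assumption; only the cut-offs are given.)
sos = butter(4, [0.1, 32], btype="bandpass", fs=fs, output="sos")

rng = np.random.default_rng(0)
raw = rng.standard_normal(10 * fs)   # 10 s of dummy wide-band "EEG"
filtered = sosfiltfilt(sos, raw)     # forward-backward filtering, no phase shift
print(filtered.shape)                # (2500,)
```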

Information fusion based on time-frequency ridge
The EEG frequency variations over time are an important indicator during the process of anesthesia. As mentioned in the Introduction, previous work on using the ECG signal for evaluating different states of anesthesia has mostly been limited to analysis of the R-R intervals rather than the raw ECG. However, an invention for determining the probability of a response to pain claimed that the FFT of the ECG, although noisier, provides more comprehensive information than the R-R intervals. This international patent compared the ECG spectrum of a subject under anesthesia with that of an awake one [18]. The ECG spectrogram is able to map the coupled oscillations of heart rate variability and respiration [19]. Respiration causes peaks in the low frequencies (<0.3 Hz) of the ECG spectrogram [20].
Another study showed that the ECG spectrogram more closely illustrates the energy distribution of the ECG signal [21]. According to other research, quantifying heart rate variability (HRV) is also possible by taking the time-frequency spectrum of the ECG signal [22]. Heart rate can be estimated directly from the ECG spectrogram by analyzing the bins around the frequencies related to heart rates, instead of using the R-R interval data [23]. The ECG spectrogram was also introduced as one of the biomarkers of a desirable cardiovascular regulatory state [24]. Power in the 0.20-2.00 Hz band of the ECG spectrogram has also been mentioned in the literature as one of the heart rate variability metrics in the time-frequency domain [25]. The time-frequency spectrogram has been applied in ECG signal classification studies as well [26,27]. Based on the above, this section recommends fusing EEG time-frequency information with that derived from the ECG signal.
The spectrogram is a natural way of representing how energy levels vary over time at different frequencies. The power spectrum, a classic tool of frequency analysis, represents the signal power computed in a sliding-window approach to extract the activity in different frequency bands. Extracting time-frequency ridges is an efficient way to simplify the spectrogram representation.
Time-frequency ridges can be defined as curves in the time-frequency plane that represent the instantaneous frequency of a signal component. These curves are found by maximizing the absolute value of the time-frequency matrix at each time point. The ridges of a time-frequency representation carry important information, as they mark the areas of the time-frequency plane where most of the energy of the EEG signal is concentrated.
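The column-wise maximization described above can be sketched directly on a spectrogram. The test signal and window settings below are illustrative assumptions; on a chirp whose frequency rises over time, the extracted ridge follows that rise.

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 250
t = np.arange(0, 8, 1 / fs)
x = chirp(t, f0=5, t1=8, f1=21)          # instantaneous frequency sweeps 5 -> 21 Hz

f, tt, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
# Ridge: the frequency bin with maximum energy at each time step
ridge = f[np.argmax(Sxx, axis=0)]
print(ridge[0], ridge[-1])               # rises from ~6 Hz towards ~20 Hz
```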

Applying pre-trained deep learning architecture on fused EEG-ECG data
To train a network on the fused EEG-ECG data, the global normalized time-frequency ridge matrix is taken and displayed as an image with a full range of colors. The resulting image has j × k pixels, where j is the number of rows and k is the number of columns in the global normalized time-frequency ridge matrix. The activations of the last convolutional layer, extracted from the training images of the global time-frequency ridge, were then used as predictor variables to fit a multiclass support vector machine.
As training from scratch on relatively small-scale datasets is susceptible to overfitting, most studies tend to use pretrained models for extracting deep features [28][29][30][31]. These pretrained classification networks have already been trained on more than one million images. As these networks are trained on extremely large datasets, they can serve as generic models.
Building a new deep structure for the target task from scratch involves a time-consuming process of hyperparameter tuning and usually does not end with promising results. In transfer learning, a deep network can take knowledge from an image recognition task and adapt it to another classification task. Here, the global normalized time-frequency ridge matrix, as the output of the proposed fusion algorithm, can be displayed as an image. The fused EEG-ECG data is thus converted into a single 2D image and then classified by taking advantage of transfer learning. The reason this can be efficient is that low-level features such as dots, lines, edges and curves, learned from a very large image dataset, help algorithms learn better and faster on small datasets like the anesthesia EEG-ECG dataset, for which little data is available. It is simply a matter of transferring knowledge about basic structures and low-level features extracted from millions of images to a problem with relatively little data (hundreds or thousands of examples). Therefore, this study used layer activations of different pretrained state-of-the-art deep learning architectures as features to train a support vector machine (SVM) for classifying different stages of anesthesia. The applied pretrained models have different characteristics in terms of accuracy, speed, and size. A Tesla P100-PCIE-16GB parallel computing platform was used for implementing these deep structures. Table 2 lists the depth, size and number of parameters of each pretrained network used to extract features from the fused data.
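Before feature extraction, the ridge matrix must be turned into a network-sized RGB image. The sketch below shows only that preparation step in NumPy (normalization, nearest-neighbour resizing to SqueezeNet's 227 × 227 input, and a simple colour mapping). The resizing method and colour map are assumptions, as the text only says the matrix is displayed "with full range of colors"; the pretrained network and the SVM stages are omitted.

```python
import numpy as np

def ridge_to_rgb(ridge, out_size=(227, 227)):
    """Scale a (channels x time) ridge matrix to a uint8 RGB image.

    Nearest-neighbour resizing and the crude blue->green->red mapping
    below are assumptions, not the study's exact rendering."""
    g = (ridge - ridge.min()) / (ridge.max() - ridge.min() + 1e-12)
    rows = np.linspace(0, ridge.shape[0] - 1, out_size[0]).round().astype(int)
    cols = np.linspace(0, ridge.shape[1] - 1, out_size[1]).round().astype(int)
    g = g[np.ix_(rows, cols)]                      # nearest-neighbour resize
    r = np.clip(2 * g - 1, 0, 1)                   # red grows in upper half
    b = np.clip(1 - 2 * g, 0, 1)                   # blue grows in lower half
    gch = 1 - r - b                                # green peaks in the middle
    return (np.stack([r, gch, b], axis=-1) * 255).astype(np.uint8)

# An 11 x 271 ridge matrix, as in the fused 10-EEG + 1-ECG example
img = ridge_to_rgb(np.random.default_rng(0).random((11, 271)))
print(img.shape, img.dtype)  # (227, 227, 3) uint8
```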

Comparison study
In order to demonstrate the comparative advantages of the proposed method, classification of anesthesia states based on other approaches was also performed. One common way of data-level fusing of physiological time series, including EEG, ECG and EMG recordings, is to concatenate the raw signals from multiple sources in the form of a 2D array [39]. A similar approach was applied for fusing accelerometer and gyroscope data by vertically stacking time-domain sequences of different modalities to form a 2D array [40]. This fusion technique makes it possible to simultaneously capture the dynamics of data over time and the spatial dependencies over modalities. Another technique presented in the literature for combining several sources of raw data is to vertically concatenate spectrogram images of the modalities into a larger image [41,42]. This fusion method is capable of simultaneously capturing dynamics in the time, frequency and spatial domains.
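The two baseline fusion schemes can be sketched as follows; the channel counts match the recordings used here, while the segment length and window settings are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 250
rng = np.random.default_rng(0)
eeg = rng.standard_normal((10, 4 * fs))   # 10 EEG channels, 4 s of dummy data
ecg = rng.standard_normal((1, 4 * fs))    # 1 ECG channel

# Baseline 1: data-level fusion by vertically stacking raw sequences [39,40]
raw_fused = np.vstack([eeg, ecg])         # shape (11, 1000)

# Baseline 2: vertically concatenating per-channel spectrograms [41,42]
specs = [spectrogram(ch, fs=fs, nperseg=128)[2] for ch in np.vstack([eeg, ecg])]
spec_fused = np.vstack(specs)             # shape (11 * freq_bins, time_steps)

print(raw_fused.shape, spec_fused.shape)
```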

Summary of the analysis conducted
The analyses conducted in this research provided the following results:

Calculation
In mathematical terms, time-frequency ridges can be defined as the points satisfying the following equation:

f_ridge(t) = arg max_f |X(t, f)|

where X is the time-frequency representation of the EEG signal x(t).
The above maximization can be considered an optimization problem, which can be solved by algorithms such as the penalized forward-backward greedy algorithm. The goal of this maximization is to find the widest region of positive amplitude around the peaks in the time-frequency plane at each time point.
The hypothesis in this study is that each anesthesia stage can be represented by a unique distribution of time-frequency ridges.
In what follows, the proposed steps for fusing EEG and ECG recordings using the global ridge index are summarized:

3) Calculate the time-frequency representation of S_k, which has q rows corresponding to frequency bins and p columns corresponding to time steps.
5) Calculate the value for the S^(r)_{k,1,2} element based on its distance from the other rows, considering a penalty value of ρ. The penalty term is two times the squared distance between frequency bins. This penalty is needed to ensure the smoothness of the ridge curves and to reduce the effect of noise; it filters sharp and abrupt pulses in the time-frequency ridges.
6) Determine the minimum of the penalized values and the corresponding bin.
7) Replace the S^(r)_{k,1,2} element with the determined minimum value.
8) Update the values for the remaining elements in column 2 using the same process.
9) Assign a subscript to each updated element, indicating the index of the bin in the previous column from which its value came.
10) Repeat the process for columns c = 1…p and obtain the final updated matrix.
11) Move back in time through the updated matrix by moving from the current frequency bin to the origin of that frequency bin at the previous time step.
12) Form the path composing the first time-frequency ridge by keeping track of the specified indices.
13) Remove the first time-frequency ridge from the time-frequency matrix and repeat the process to extract the three time-frequency ridges with the highest energy.
14) Sum the power spectral density of the three extracted ridges to form the integrated ridge vector.
15) Repeat the process for r = 1…(m + 1) to form the global time-frequency ridge matrix, then normalize it.
16) Concatenate the global normalized time-frequency ridge matrix into a single vector for each time sample.
17) Calculate the global ridge index as the median value of the obtained concatenated vector.

N. Bahador et al.

Fig. 1 shows the flowchart, clarifying the overall procedure of the proposed fusion method.

EEG temporal patterns during anesthesia
Fig. 2 shows the temporal evolution of the EEG signal over an epoch of 763.316 s for different anesthetic states, including BL, infusion, LVC, SWA and BSP. Following the temporal variations in this figure, repetitive single high-amplitude sharp waves with a random pattern and a wide amplitude distribution were observed during BL and the beginning of the infusion period. As anesthesia deepened prior to the LVC state, this pattern switched to a narrow amplitude distribution with a nonrhythmic pattern, indicating a drop in brain wave amplitude, and continued until the post-SWA period. Thereafter, slow-wave activity with slightly higher amplitude appeared as more drug was infused, and this trend switched to bursts or suppressed epochs with highly variable amplitude and duration at the end of the sequence.

EEG spectral patterns during anesthesia
The power spectrum of a recorded 763.316-second epoch is plotted in Fig. 3. Considering Fig. 3, at the beginning of the infusion period there was a relatively abrupt change in color from tones between green and light blue to dark blue tones in the high-frequency band (>15 Hz), representing an increase in the beta activity of the EEG signal. More green tones in the high-frequency band (>20 Hz) were observed post LVC compared to pre LVC, showing a growing reduction in beta frequencies. Post LVC, the dark blue tones became predominant, indicating a drop in brain wave amplitude. Continuous narrow pink tones in the low-frequency band (<4 Hz) showed that the dynamics post SWA tend to follow a delta-dominant pattern. Wide vertical green-colored stripes post BSP were indicators of suppression epochs in very deep anesthesia. Post BSP, there was also an increase in the violet tones around 10 Hz, representing an increase in alpha power during the burst-suppression pattern.

Ridge index pattern during anesthesia
Some examples of extracted ridge curves, with their corresponding histograms, for both SWA and BSP periods are plotted in Fig. 4.

Results of applying pre-trained deep learning architectures on the fused EEG-ECG data to classify anesthesia states
A sample resulting image with 11 × 271 pixels for the fused data, based on 10-channel EEG and one-channel ECG recordings, is plotted in Fig. 8.I.
Fig. 8.II compares the classification precision and relative prediction time of the different deep architectures trained on global time-frequency ridge images. According to Fig. 8.II, the SqueezeNet network led to very encouraging classification results and had the best performance among the models, with a precision of 94.14% and a prediction time of 0.28 s. The percentages of correct and incorrect predictions of the different states of anesthesia for SqueezeNet are plotted in Fig. 8.III.
Various statistics calculated from the confusion matrix for a comprehensive evaluation are listed in Table 3. The classification results show how well the anesthesia stages were classified by taking advantage of the proposed fusion.

Figures 9.I and 9.II show representations of the inputs derived from the techniques mentioned in Section 2.4, which were then fed to the SqueezeNet network. The classification precision of applying SqueezeNet to fused data based on vertical stacking of time-domain sequences and of spectrograms reached 68.0% and 83.2%, respectively. Comparing the classification results of these two methods (Figures 9.III and 9.IV) with those obtained from fusing EEG and ECG data based on the global time-frequency ridge (Fig. 9.III) shows how well the anesthesia stages were classified using the proposed information fusion, and how far it outperforms the other techniques applied to the same data sets in terms of both precision and relative prediction time.
For further evaluation, a recurrent neural network (RNN)-based method was also considered for comparison. An RNN architecture includes weighted recurrent (time-delayed) connections, so the output of the network depends not only on the current inputs, but also on both previous outputs and previous internal states. The studied RNN included different topologies of recurrent connections with delays in the hidden layers, inputs and outputs. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm was used for optimization. The termination error was set to 1e-3. Fig. 10 shows the recurrent neural network with three hidden layers.
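The time-delayed dependence described above can be sketched as a minimal recurrent layer in NumPy; the layer sizes are arbitrary, and the BFGS training stage is omitted.

```python
import numpy as np

def rnn_forward(x_seq, Wx, Wh, Wo, b, bo):
    """Unroll a single-hidden-layer recurrent network over time.

    h_t = tanh(Wx @ x_t + Wh @ h_{t-1} + b);  y_t = Wo @ h_t + bo
    Only the forward pass is shown; training (e.g. with BFGS, as in
    the study) is omitted."""
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in x_seq:
        h = np.tanh(Wx @ x_t + Wh @ h + b)   # current input + previous state
        outputs.append(Wo @ h + bo)
    return np.array(outputs)

rng = np.random.default_rng(0)
x_seq = rng.standard_normal((30, 4))           # 30 time steps, 4 inputs
Wx = rng.standard_normal((8, 4))
Wh = rng.standard_normal((8, 8)) * 0.1         # small recurrent weights
Wo, b, bo = rng.standard_normal((1, 8)), np.zeros(8), np.zeros(1)
y = rnn_forward(x_seq, Wx, Wh, Wo, b, bo)
print(y.shape)  # (30, 1)
```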
Two different RNN-based approaches were studied. First, the performance of the Squeezenet_RNN algorithm was evaluated, in which the high-level features extracted by SqueezeNet were fed to an RNN model. Second, an RNN was applied directly to the raw signals. For each approach, different RNN topologies were investigated. Fig. 11 compares the output of the Squeezenet_RNN algorithm for both training and test data. Tables 4 and 5 summarize the precision values for different hyperparameter settings. According to the results, the model performs well during training and very poorly during testing. The results of implementing different topologies demonstrate that a small change in the hyperparameter values can strongly affect the performance of the model.

Dealing with missing values
Physiological time series often contain large missing segments due to poor electrode contact or other artifact contamination. This data imperfection greatly affects the final representation of the proposed data fusion technique, making recovery necessary. This reconstruction can be done at both the signal level and the image level. Signal-level analysis is based on the assumption that the corrupted channel and its neighboring channels share the same morphological and statistical structures. Image-level analysis assumes that a missing region in a 2D image is correlated with local neighboring pixels as well as with the corresponding region in neighboring bands [43].
Inspired by the abstract topological representation of signal space by unsupervised learning, this section aims to reconstruct complex time series while preserving their temporal morphology. To do so, it invokes the idea of using nonlinear principal component analysis (NLPCA) to reconstruct a missing signal from neighboring channels. NLPCA is an extension of linear PCA to a nonlinear form, implemented using a multi-layer perceptron with an auto-associative topology. The idea behind using nonlinear PCA is that the inherent complex structure of the data can be modeled by curved subspaces.
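A minimal sketch of the auto-associative idea follows, assuming a single tanh bottleneck layer trained by plain gradient descent; the actual NLPCA uses a deeper multi-layer perceptron, so the architecture and training here are assumptions.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.05, epochs=2000, seed=0):
    """Auto-associative network (inputs == targets) with one tanh
    bottleneck -- a minimal stand-in for the multi-layer NLPCA."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.standard_normal((d, n_hidden)) * 0.1
    W2 = rng.standard_normal((n_hidden, d)) * 0.1
    for _ in range(epochs):
        H = np.tanh(X @ W1)        # encode onto the curved subspace
        Xhat = H @ W2              # decode back to channel space
        err = Xhat - X
        gW2 = H.T @ err / n        # gradient of mean squared error
        gW1 = X.T @ ((err @ W2.T) * (1 - H**2)) / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

# Toy multichannel data: three correlated "channels" driven by one source
rng = np.random.default_rng(1)
s = rng.standard_normal(200)
X = np.stack([s, 0.8 * s + 0.1 * rng.standard_normal(200), -s], axis=1)
W1, W2 = train_autoencoder(X)

# Reconstruct channel 2 from its neighbors after zeroing it out
Xmiss = X.copy()
Xmiss[:, 1] = 0.0
recon = np.tanh(Xmiss @ W1) @ W2
```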

Dealing with artifacts
One major concern in fusing biomedical time series in non-isolated, demanding environments such as the intensive care unit (ICU), or any other noisy environment, is the varying nature of the characteristics of different artifacts in the time, frequency and spatial domains. To investigate the possibility of detecting different types of artifacts in the fused data, an EEG dataset collected from 15 patients treated in the ICU of Oulu University Hospital was studied. The patients had no history of serious neurological disease and were aged 18-85 years. During the recording, the patients were not mechanically ventilated and had recently been diagnosed with hyperactive delirium. Delirium was treated with administration of dexmedetomidine following the ICU's standard protocol to keep the patients moderately sedated. The study was approved by the local ethics committee. Written informed consent for participation was obtained either from the patient or from his/her relative. The EEG data was recorded with the BrainStatus self-adhesive electrode and BrainStatus wireless amplifier at a sampling frequency of 250 Hz. The EEG recordings contained 10 channels (Sp2, T10, Af8, F8, Fp2, Fp1, T9, F7, Sp1, AF7). The recorded sequences contained EMG artifacts with high-amplitude, sharply contoured transients; EOG artifacts with high-amplitude, low-frequency activity and a morphology of a steep fall followed by a slower rise; transient artifacts with high-amplitude sharp waves; trend artifacts with a nonlinear curved pattern; and powerline interference artifacts with relatively high-amplitude single-frequency harmonic noise.
As shown in Figs. 15 and 16, each 2D representation of fused channels contaminated by an artifact has a unique color distribution associated with each type of artifact.
Figs. 17-19 show the generated 2D representations for fused EEG channels related to negative scores, positive scores, and EEG channels contaminated by artifacts, respectively. According to the figures, there is a visible difference in the color patterns of fused data with and without artifacts. Therefore, a deep network can learn the specific patterns in the generated 2D representation related to each type of artifact.
The confusion matrix in Fig. 20 shows that feeding the 2D representation of fused EEG channels to the SqueezeNet network led to encouraging classification of sequences related to both negative and positive scores, as well as sequences contaminated by different artifacts. The results demonstrate the capability of a deep network to learn the specific patterns in the generated 2D map related to each type of artifact.
Various statistics calculated from the confusion matrix after applying SqueezeNet to the fused modalities are listed in Table 6. The classification results show how well the model distinguished sequences related to negative and positive scores and sequences contaminated by different artifacts.

The performance of proposed algorithm on other applications
To investigate the potential of the proposed method in other applications, the fusion algorithm was also applied to an open dataset of activities of daily living (ADL). The dataset was collected from 10 healthy participants performing 186 ADL-related experiments. The recorded data include quaternions (with a resolution of 0.0001), accelerations along the x-, y- and z-axes (with a resolution of 0.1 mG) and angular velocities along the x-, y- and z-axes (with a resolution of 0.01° per second). Data annotation for all the experiments was performed manually based on videos recorded by an RGB camera [45].
Fig. 21 shows 2D representation of fused data generated from different modalities captured during different activities of walking, opening the door, closing the door, pouring water, drinking from glass, brushing teeth, and cleaning the table.
Various statistics calculated from the confusion matrix after applying SqueezeNet to the fused modalities are listed in Table 7. The classification results show how well the activities of daily living were classified by taking advantage of the proposed fusion.

Discussion
The exponential growth in the amount and complexity of time-series data generated in biomedical fields is increasing the need to map the raw input space into a more comprehensive space of lower dimensionality, producing a high level of abstraction and yet a rich representation of the raw space. Statistical models assume that the time series can be described by a parametric random process in which parameters such as amplitude and frequency contribute to specific characteristics and need to be estimated. The best example of such models is the RNN. The superiority of RNNs in biomedical time series analysis has been demonstrated [46]. However, in RNNs, highly sampled inputs can lead to very long sequences and consequently affect the network's optimizability and training speed [47]. Furthermore, RNNs require very large and diverse datasets for training. Collecting such a large dataset in the medical field for studying a rare disorder may not be feasible [48]. One way to handle limited-size datasets is to transfer knowledge from diverse pretrained models to the target problem. In transfer learning, retraining happens at a much lower learning rate than the original training, and for this reason it has been used for classification tasks on highly sampled, long-sequence biomedical time series [49].
To the best of our knowledge, no study has combined EEG and ECG data for tracking the depth of anesthesia. Nor has any research investigated the use of transfer learning in differentiating anesthetic states based on joint EEG-ECG dynamics.
Instead of applying raw data, many methods in the literature address data-level fusion by decoding biomedical signals into a more abstract representation and then using it to train RNN models. One method combined standard deviation vectors extracted from multi-channel EMG as input for an LSTM-based RNN [50]. Another transformed raw sEMG signals into an image representation fed into a hybrid CNN-LSTM-RNN model [51]. One technique converted multi-channel raw EEG signals into a mesh-like representation fed to an LSTM-based RNN [52]. In one study, the log-energy entropies of EEG channels were combined to feed an RNN classifier [53]. In another method, linear frequency cepstral coefficients extracted from multi-channel EEG were fused prior to feeding into a CRNN [54]. As another technique, the power spectral densities of EEG signals were first dimensionally reduced and then fed to an RNN [55]. One method converted EEG, EOG, and EMG signals into time-frequency representations fed into a BRNN [56]. In another study, raw EEG signals were converted to spectrograms and fed to a 5-layer LSTM-based RNN [57]. One method combined the spectrograms of ECG signals before feeding them into LSTM units [58].
A recently proposed technique fused ECG-driven features within two modalities through a process of developing separate generative models for each modality based on independent component analysis, mapping their results into T² statistics, and merging these by applying an exponentially weighted moving average [62]. One study introduced a modified version of the Daubechies transform for fusing spike-shape information and time-frequency patterns of biological signals [63]. Training a separate classifier for the EEG signal of each channel and combining their outputs through a modified weighted majority voting process is another method suggested for fusing information within multi-channel EEG recordings [64]. Using a hybrid deep learning model, consisting of a convolutional neural network (CNN) to extract the spectral correlation of channels and a recurrent neural network (RNN) to integrate the extracted features as a 3D frame-cube representation, is another approach proposed in the literature for deep fusion of multi-channel neurophysiological signals [65]. It should also be mentioned that most of these studies used a full cap for EEG recording, such as 128-electrode and 256-electrode brain caps [66,67]. Some time-series fusion techniques also derived their inspiration from methods used in fusing remote sensing images. These methods include (1) finding the maximum a posteriori probability (MAP) of adjacent local groups of pixels in different bands [68]; (2) using the weighted summation of neighboring pixels in different bands as a fused representation; and (3) using the correlation matrix taken from each pair of bands as a joint fusion [69,70]. Inspired by these, the weighted summation of neighboring channels as well as the correlation matrix among channels have been used for physiological time-series fusion.
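The two image-fusion-inspired representations mentioned above can be sketched as follows; the neighborhood kernel weights, channel count, and edge padding are illustrative assumptions rather than the exact settings of the cited methods.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(11, 1000))   # channels x samples (e.g. 10 EEG + 1 ECG)

# (1) Weighted summation of neighboring channels as a fused representation.
w = np.array([0.25, 0.5, 0.25])               # illustrative neighborhood weights
padded = np.pad(X, ((1, 1), (0, 0)), mode="edge")
fused_sum = sum(w[k] * padded[k:k + X.shape[0]] for k in range(3))

# (2) Channel-by-channel correlation matrix as a joint fused representation.
fused_corr = np.corrcoef(X)                   # (channels x channels)

print(fused_sum.shape, fused_corr.shape)
```

The first variant keeps the temporal axis while smoothing over the spatial one; the second collapses time entirely into pairwise channel dependencies.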
Regarding the simultaneous consideration of information in the time, frequency and spatial domains, fusion techniques for physiological time series have mostly focused on learning features from a concatenated structure of deep recurrent and 3D convolutional neural networks. For instance, in a recent study, 2D EEG topographic maps at different frequencies were concatenated into a single cube and fed to a hybrid CNN-RNN architecture. In this architecture, a vector of features is first extracted using a 3D CNN, whose output is then fed to an RNN [71]. First, such techniques cannot be considered data-level fusion, in which several sources of raw data are combined. Second, these methods increase the dimension of the input space to provide more details in different domains, which leads to computationally heavier processing. Moreover, as shown in our experiments, the results of the hybrid CNN-RNN architecture are highly affected by the setting and adjustment of hyperparameters. Even though the above-mentioned techniques are not directly comparable, as they were not tested on the same dataset and modalities, some of these approaches require the setting and adjustment of hyperparameters. Methods based on extracting hand-engineered features may not meet the needs of long-term monitoring, as the computation of those features can be very time-consuming, particularly when non-linear features must be extracted from long time series, and the results of feature-based methods are highly affected by the parameter settings, the epoch length, and the dimension of the data. The complexity of some of the mentioned techniques could also be problematic, as they require separate models for each modality. Finally, compared to a full EEG cap, the easy-to-use device used in this study is much more suitable for a demanding clinical environment. In this regard, the proposed method could be said to be more effective than the others.
Although deep architectures such as recurrent neural networks have achieved significant success in sequence analysis, they struggle with very long sequences of physiological signals with complex and nonuniform distributions. The main challenge in the supervised analysis of physiological time series is capturing long-range temporal dependencies, which is limited by information dilution and gradient vanishing in recurrent neural networks. Because of the curse of dimensionality, the training complexity of the network increases exponentially with the signal's length. It also results in very long training times: training a light RNN model for long-sequence physiological time series may take 12 to 36 h. Compared with RNN-based approaches, the proposed method does not need hyperparameter tuning and shows less sensitivity to the length of the sequence. Time-frequency ridges can reflect the global time-frequency signatures within long-sequence physiological time series in a very efficient and easy-to-implement way. Moreover, the proposed technique is robust to noise disturbance as well as to outliers and irrelevant components in the time-frequency plane, because the energy of noise is randomly distributed within the time-frequency plane while the energy of the signal's relevant components is concentrated along the time-frequency ridges.
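To illustrate the noise argument, the sketch below extracts a time-frequency ridge from a noisy linear chirp as the per-frame maximum of a magnitude spectrogram: the ridge tracks the rising instantaneous frequency even though the noise energy is spread over the whole plane. The signal parameters, window sizes, and the simple argmax ridge estimator are illustrative assumptions, not the exact estimator used in this study.

```python
import numpy as np

fs = 256
t = np.arange(0, 4, 1 / fs)
chirp = np.sin(2 * np.pi * (5 + 5 * t) * t)   # instantaneous freq ~5 -> 45 Hz
rng = np.random.default_rng(3)
x = chirp + 0.5 * rng.normal(size=t.size)     # noisy observation

win, hop = 128, 64
freqs = np.fft.rfftfreq(win, 1 / fs)
S = np.array([np.abs(np.fft.rfft(x[s:s + win] * np.hanning(win)))
              for s in range(0, x.size - win + 1, hop)])  # frames x bins

ridge = freqs[np.argmax(S, axis=1)]   # ridge = dominant frequency per frame
print(ridge[0], ridge[-1])            # low at the start, high at the end
```

Because the chirp's energy stays concentrated along a single curve while the noise floor is diffuse, the extracted ridge remains a stable, low-dimensional summary of the signal's temporal-spectral progression.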
These research findings should also be extended in future work to confirm the applicability of the BrainStatus electrode for the recovery phase from anesthesia.
Although other anesthetic drugs share similar biochemical interactions and would likely produce results similar to those presented in this study, a broader study is needed to confirm this.
Further analysis is needed to test whether the findings obtained from healthy-brain patients apply to patient groups with different brain injuries.

Conclusion
This study presented a general framework for implementing efficient fusion of time series based on time-frequency ridge mapping. The promising classification results, reaching a precision of 94.14%, showed that the global time-frequency ridge can reliably quantify the transition from the awake state to deep anesthesia, as it provides a simple abstract representation of joint EEG-ECG dynamics over both time and frequency, as well as spatial dependencies over modalities and channels.

Declaration of Competing Interest
None.

1- The first comparison study compared the classification performance of different pre-trained deep learning architectures fed by (I) the 2D map obtained from the proposed fusion technique, (II) a 2D map based on vertically stacking raw time-domain sequences, and (III) a 2D map based on vertically stacking spectrograms.
2- The second comparison study compared the performance of different topologies of RNN models trained on raw data with that of a transfer learning-based model trained on the 2D map extracted from the proposed fusion algorithm.
3- Another study examined the possibility of artifact identification and removal on the fused data.
4- The performance of the proposed algorithm in other applications, such as classifying activities of daily living, was also studied.

1. Let x^(j) = (x_1^(j) ⋯ x_n^(j)) and y = (y_1 ⋯ y_n) be respectively the EEG and ECG time series for each time sample i = 1…n and each EEG channel j = 1…m.
2. Build an (m + 1) × L-dimensional time representation for window length L and each window number k = 1…⌈n/L⌉.
3. Compute the time-frequency matrix S_k^(r)(t, ω) using power spectrum analysis for each row r = 1…(m + 1).
4. Assume the time-frequency matrix S_k^(r) […]
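A minimal numpy sketch of how these windowing and per-row spectral steps might look in code; the window length, the plain FFT power spectrum used in place of the paper's exact spectral estimator, and the simple per-window argmax stand-in for the global ridge index are all assumptions.

```python
import numpy as np

def ridge_map(eeg, ecg, L=128):
    """Fuse m EEG channels and one ECG channel into a 2D ridge map.

    eeg: (m, n) array; ecg: (n,) array. Returns an (m + 1, n // L) array of
    per-window dominant-frequency bins (a simplified stand-in for the
    global time-frequency ridge index).
    """
    rows = np.vstack([eeg, ecg[None, :]])                  # step 1: (m + 1, n)
    n_win = rows.shape[1] // L
    windows = rows[:, :n_win * L].reshape(rows.shape[0], n_win, L)  # step 2
    power = np.abs(np.fft.rfft(windows, axis=2)) ** 2      # step 3
    power[:, :, 0] = 0.0                                   # ignore the DC bin
    return np.argmax(power, axis=2)                        # ridge bin per window

rng = np.random.default_rng(4)
m, n = 10, 2048
fused = ridge_map(rng.normal(size=(m, n)), rng.normal(size=n))
print(fused.shape)  # (11, 16)
```

The output is a single compact 2D array, one row per channel/modality and one column per time window, which is the form fed to the pre-trained classifiers.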

Fig. 4. Comparison of time-frequency ridge curves for EEG between SWA states (even sequences) and BSP states (odd sequences).

The ridge curves are compared in Figs. 4 (EEG sequences) and 5 (ECG sequences). According to these figures, the SWA and BSP periods have different ridge curves with different distributions. The EEG signals, with the corresponding spectrograms and extracted global ridge indices recorded under anesthesia, are plotted in Fig. 6. Considering the trend in the extracted ridge indices, all subjects followed an almost similar pattern. According to Fig. 6, the global ridge index enhances the time-frequency representation by sharpening local maxima. It also plays a small filtering and noise-reduction role by smoothing sharp transient effects. Similar figures extracted for the ECG sequences are presented in Fig. 7. As is clear in these figures, the temporal-spectral progression patterns were captured by the time-frequency ridge index.

Fig. 5. Comparison of time-frequency ridge curves for ECG between SWA states (even sequences) and BSP states (odd sequences).

Fig. 6. Changes in the unnormalized form of the EEG ridge index in different subjects with increasing depth of anesthesia.

Fig. 7. Changes in the unnormalized form of the ECG ridge index in different subjects with increasing depth of anesthesia.

Fig. 9. Comparison of methods: (I) fused data based on vertically stacking raw time-domain sequences; (II) fused data based on vertically stacking spectrograms; (III) classification results of SqueezeNet trained on the stacked raw-signal representation (relative prediction time of 1.403 s); (IV) classification results of SqueezeNet trained on the stacked-spectrogram representation (relative prediction time of 0.507 s).

Fig. 10. Recurrent neural network including three hidden layers with input, output and internal delays.

Fig. 14. Time-frequency representation of both the real and reconstructed channel.

Fig. 15. 2D map of fused EEG sequences contaminated by one type of artifact.

Fig. 16. 2D map of fused EEG channels contaminated by another type of artifact.

Fig. 20. Confusion matrix for the classification results of the validation dataset with Class 0: artifact, Class 1: negative score, Class 2: positive score.

Table 1
Details regarding all included subjects.
N. Bahador et al.

Table 2
Properties of pretrained networks used in this study.

Table 3
Comprehensive study of pre-trained deep learning classifiers' performance.

Table 4
The results of applying Squeezenet_RNN algorithm for scenarios with different topologies of recurrent neural networks.

Table 5
The results of applying RNN algorithm for scenarios with different topologies of recurrent neural networks.

Table 6
Comprehensive study of pre-trained deep learning classifiers' performance on the dataset collected from patients with hyperactive delirium.

Table 7
Comprehensive study of pre-trained deep learning classifiers' performance on the activities of daily living dataset.