Special Issue: Computational Intelligence Methods for Brain-Machine Interfacing or Brain-Computer Interfacing
Research Article | Open Access
Yufeng Yao, Yan Ding, Shan Zhong, Zhiming Cui, "EEG-Based Epilepsy Recognition via Multiple Kernel Learning", Computational and Mathematical Methods in Medicine, vol. 2020, Article ID 7980249, 9 pages, 2020. https://doi.org/10.1155/2020/7980249
EEG-Based Epilepsy Recognition via Multiple Kernel Learning
Abstract
In the field of brain-computer interfaces, EEG signals are commonly used for disease diagnosis. In this study, a style-regularized least squares support vector machine based on multikernel learning is proposed and applied to the recognition of abnormal epileptic signals. The algorithm uses a style conversion matrix to represent the style information contained in the samples and regularizes it in the objective function. The objective function is optimized with the commonly used alternating optimization method, which updates both the style conversion matrix and the classifier parameters during the iteration process. To use the learned style information in the prediction process, two new rules are added to the traditional prediction method, and the style conversion matrix is used to standardize the sample style before classification.
1. Introduction
Since the proposal of the support vector machine (SVM) [1] and the development of related theories, the kernel method has become an effective way to deal with data that are not linearly separable. Because the performance of a classification algorithm depends largely on the representation of the data, the kernel method uses relatively simple functional operations to map samples to higher dimensions, avoiding both the explicit design of the feature space and complex inner product calculations in that space. For example, in [2], a fast kernel ridge regression was proposed by using the kernel method. In recent decades, the kernel method has been applied in many fields of machine learning [3–5].
However, when a data set contains samples with uneven distribution, heterogeneous features, or irregular data, a single-kernel method that uses only one feature space performs poorly. Moreover, because different kernel functions have different characteristics, their effects may differ considerably even within the same application, so the choice of kernel functions and their parameters has an important influence on the performance of the algorithm. Since one kernel function often cannot meet the requirements of practical application scenarios, multikernel learning, which combines multiple kernel functions, has been attracting more attention [6].
The combination generated by multikernel learning can combine the same kernel function under different parameters or many different kernel functions [7]. After years of research, multikernel learning has shown greater flexibility, higher interpretability, and better performance than single-kernel methods in data dimension reduction [8], text classification [9], domain adaptation [10], and other fields.
Although the multikernel learning algorithm fully combines the mapping ability of different kernel functions for data, essentially, it only uses the physical characteristics of samples that include similarity and distance and fails to take into account the implicit information in the stylized data set in the real situation. In practical application, in addition to the representative content information, the data set often contains a variety of style information, and samples with the same style often exist in the form of groups. For example, there are two ways of dividing the letters shown in Figure 1(a), i.e., by the content shown in Figure 1(b) and by the font shown in Figure 1(c), where each font is regarded as a style, and such data is regarded as stylized data.
Figure 1: Stylized letter data. (a) The letters. (b) Division by content. (c) Division by font.
To mine the style information of data, scholars have carried out many studies. The second-order statistical model proposed in [11] is applied to the problem of number recognition, but it works well only on data that follow a Gaussian distribution, which greatly limits its application scenarios. The bilinear discriminant model proposed in [12] has achieved good results on behavior recognition data, but its computational cost is relatively high. The domain Bayesian algorithm proposed in [13] improves the naive Bayesian algorithm to identify the style information in the sample group, but it requires the data distribution type to be specified in advance, whereas the distribution of real data is often complex and difficult to determine beforehand. The algorithms proposed in [14, 15] use a single mapping to mine the style information of samples and achieve excellent results in regression and classification problems, but they make limited use of the physical characteristics of samples. The time-series style model proposed in [16], which mines the historical information of samples, and the bilayer clustering model of users' age and gender information proposed in [17] effectively use the style information in the data in unsupervised problems, but these algorithms target specific fields only, and their use of style information is limited.
Inspired by the above work, we propose a style regularization least squares support vector machine based on multiple kernel learning (SRMKLSVM) to mine and use both the physical similarities between sample points and the implied style information in the samples. In addition to using the physical characteristics of each basic kernel function for data mapping to express the similarity between samples, the algorithm uses a style transformation matrix to represent and mine the style information contained in the data set and incorporates it into the objective function. In the training process, the alternating optimization strategy is used to update the style transformation matrix in addition to the classifier parameters, and the mined style information is used to synchronously update the kernel matrix. To use the style information obtained by training in the prediction process, two new prediction rules are added on top of the prediction method of the traditional multikernel least squares support vector machine. Because the style information contained in the samples is used effectively in both training and prediction, experiments on most of the stylized data sets show that SRMKLSVM is effective compared with recent and classical multikernel support vector machine algorithms.
2. Related Works
2.1. Multikernel Learning
Let $x$ and $z$ be two sample vectors, and let $\phi$ be a mapping function from the input space to the feature space. If there is a function $k$ that can be defined as
$$k(x, z) = \langle \phi(x), \phi(z) \rangle, \tag{1}$$
then we call $k$ the kernel function. Multikernel learning expects to achieve better mapping performance by combining different kernel functions. There are many ways to combine kernel functions [6]. In this study, we use the following way to find a final combined kernel function based on $M$ basic kernel functions $k_1, \dots, k_M$. If we use $\beta_m$ to represent the coefficient of the $m$th kernel function, then the final combined kernel function is formulated as
$$k(x, z) = \sum_{m=1}^{M} \beta_m k_m(x, z), \tag{2}$$
where
$$\beta_m \ge 0, \quad \sum_{m=1}^{M} \beta_m = 1. \tag{3}$$
According to Mercer’s theory, the combined kernel function generated by the above method still meets the Mercer condition.
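As a minimal sketch of the weighted combination described above, the following builds the Gram matrix of a convex combination of Gaussian kernels with different bandwidths. The choice of Gaussian basic kernels, the bandwidth values, the weights, and the function names `rbf_kernel` and `combined_gram` are illustrative assumptions, as is the requirement that the weights be non-negative and sum to 1.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel with bandwidth sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def combined_gram(samples, sigmas, weights):
    """Gram matrix of a weighted sum of RBF kernels, one kernel per bandwidth."""
    if len(sigmas) != len(weights):
        raise ValueError("need exactly one weight per basic kernel")
    # Assumed constraint: non-negative weights that sum to 1.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(samples)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            gram[i][j] = sum(
                w * rbf_kernel(samples[i], samples[j], s)
                for w, s in zip(weights, sigmas)
            )
    return gram


# Toy data: three 2-D samples and three basic kernels.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = combined_gram(samples, sigmas=[0.5, 1.0, 2.0], weights=[0.2, 0.3, 0.5])
```

Because every basic Gram matrix here is symmetric positive semidefinite and the weights are non-negative, the combined matrix `K` is symmetric positive semidefinite as well, which is the property the Mercer argument relies on.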
2.2. Least Squares Support Vector Machine Based on Multikernel Learning
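Let $\{(x_i, y_i)\}_{i=1}^{N}$ be the training sample set, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$ is the label corresponding to $x_i$. The objective function of the least squares support vector machine (LSSVM) proposed by Suykens [18] can be formulated as
$$\min_{w, b, e} \; \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \quad y_i\left(w^{\top}\phi(x_i) + b\right) = 1 - e_i, \; i = 1, \dots, N, \tag{4}$$
where $\phi(x_i)$ represents $x_i$ mapped into a high-dimensional space, $w$ and $b$ are the classification hyperplane parameters, $e_i$ is the error term, and $\gamma$ is the regularization parameter.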
Lagrange multipliers are introduced into Equation (4), and its dual form can be further obtained under Slater's constraint qualification: where is the kernel matrix. With (11), we can obtain the following two equations, where and .
By integrating K into (2) and (3), we can obtain multiple kernel least squares support vector machine (MKLSSVM) as
Let , and replace with ; we have
It is obvious that (8) is a semi-infinite linear program (SILP) problem, which can be solved by many existing mature optimization toolkits. For an unseen sample , MKLSSVM predicts it by using the following equation:
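To make the LSSVM building block of this section concrete, here is a minimal sketch of training and prediction for a fixed (already combined) kernel matrix. It assumes the standard LSSVM classifier formulation with labels in {-1, +1}, in which the optimality conditions reduce to one linear system in the bias and the Lagrange multipliers. The names `lssvm_fit`, `lssvm_decision`, and `gamma` are illustrative, and the SILP step that learns the kernel weights is not shown.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def lssvm_fit(gram, labels, gamma):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1].

    Omega[i][j] = y_i * y_j * K[i][j] (assumed standard LSSVM classifier form).
    """
    n = len(labels)
    system = [[0.0] + [float(y) for y in labels]]
    for i in range(n):
        row = [float(labels[i])]
        for j in range(n):
            value = labels[i] * labels[j] * gram[i][j]
            if i == j:
                value += 1.0 / gamma
            row.append(value)
        system.append(row)
    solution = solve_linear_system(system, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias, alpha


def lssvm_decision(kernel_row, labels, alpha, bias):
    """Decision value sum_i alpha_i * y_i * k(x_i, x) + b; its sign is the label."""
    return sum(a * y * k for a, y, k in zip(alpha, labels, kernel_row)) + bias


# Toy 1-D problem with a linear kernel.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [-1, -1, 1, 1]
gram = [[p * q for q in points] for p in points]
bias, alpha = lssvm_fit(gram, labels, gamma=10.0)
scores = [lssvm_decision(row, labels, alpha, bias) for row in gram]
```

Training needs only one linear solve, in contrast to the quadratic program of a standard SVM. For the multikernel case, `gram` would be the weighted sum of the basic kernel matrices.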
2.3. MKLSSVM Algorithm Process
The algorithm steps of MKLSSVM are shown in Algorithm 1.
3. SRMKLSVM
3.1. Objective Function
Let be a training set, where the set can be divided into groups according to the style. The samples in each group have the same style, and the superscript is the number of samples in group . is the th sample in group . Under the above definition, the objective function of SRMKLSVM can be formulated as where is the weight coefficient of the kernel matrix, where is the number of predefined kernel matrices, is the style conversion matrix of the sample of style , and is the identity matrix.
The first two terms in are standard MKLSSVM expressions, and the third term is a penalty term using the Frobenius norm, which controls the degree of style conversion that the style conversion matrix applies to the samples, where the parameter is used. Obviously, when is larger, the deviation of the sample from its original style after style conversion is smaller; otherwise, it is larger. In particular, when is set, there is .
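The role of the Frobenius-norm penalty can be illustrated with a small sketch. It assumes the penalty is a regularization weight times the sum, over style groups, of the squared Frobenius distance between each style conversion matrix and the identity matrix. The exact form and the symbol names did not survive in this copy of the article, so `lam` and `style_penalty` are hypothetical.

```python
def frobenius_sq(matrix):
    """Squared Frobenius norm: the sum of squared entries."""
    return sum(value * value for row in matrix for value in row)


def style_penalty(style_matrices, lam):
    """lam * sum_j ||A_j - I||_F^2 over all style conversion matrices A_j."""
    total = 0.0
    for a in style_matrices:
        d = len(a)
        diff = [
            [a[r][c] - (1.0 if r == c else 0.0) for c in range(d)]
            for r in range(d)
        ]
        total += frobenius_sq(diff)
    return lam * total


identity = [[1.0, 0.0], [0.0, 1.0]]
slightly_styled = [[1.1, 0.0], [0.2, 1.0]]

no_conversion = style_penalty([identity, identity], lam=5.0)
some_conversion = style_penalty([identity, slightly_styled], lam=5.0)
```

An identity matrix contributes nothing to the penalty, while any deviation from it is charged in proportion to `lam`. This is why a larger weight keeps the converted samples closer to their original style.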
3.2. Optimization
The goal of the algorithm is to minimize the value of . Directly optimizing the objective function is very difficult, so we use the alternating optimization method to obtain a usable local optimum. When and are given separately, the objective functions become optimization problems about and , and the two processes are repeated until convergence or until the maximum number of iterations is exceeded. To be specific:
(1) When fixing , the optimization problem of formula (10) is transformed into
The above formula is the standard MKLSSVM problem for the samples after style conversion, and can be determined by Algorithm 1 in Section 2.3. At this time, the samples mapped to the high-dimensional space cannot be computed directly, but the combined kernel matrix formed by the style-converted samples can be updated by the kernel method to obtain the style-converted kernel matrix. The specific method of using the kernel method to obtain the style-converted combined kernel matrix is introduced in Section 3.3.
(2) When is fixed, the optimization problem of Equation (10) is transformed into
The above formula is a linear constrained quadratic programming problem for , which can be transformed into independent problems for each to be solved. At this time, the parameters of the synthetic kernel matrix and the classifier have been fixed, similar to the original LSSVM, and the dual form can be obtained after introducing the Lagrange multiplier to Equation (12):
Let ; we have
Let get . It can be seen that this formula has the same KKT [18] condition as LSSVM.
The alternating optimization process shows that, when the classifier parameters are trained, the samples converted by the style conversion matrix are used as training data. In the first iteration, the style conversion matrix is initialized to the identity matrix. At this time, the samples after style conversion are the same as the original samples, and no style conversion takes place. Therefore, the classifier parameters obtained in the first round of SRMKLSVM training are the same as those of the original MKLSSVM. In subsequent iterations, as the style conversion matrix is optimized, the samples in each style group are transformed by it and gradually approach the standard style. The classifier parameters trained at this stage take into account the style information contained in the samples as a whole. At the same time, solving for the style conversion matrix from Equation (14) uses not only the physical characteristics of the samples obtained by training but also the style information in the data, so the trained style conversion matrix contains the style information of each style group. According to the above analysis, training the classifier parameters and training the style conversion matrix both make full use of the style information contained in the samples, and the two processes promote each other.
3.3. Style Transformation
Since the dimension after the sample is mapped to the highdimensional space may be infinite, the sample value after the style transformation cannot be obtained directly. At this point, each element in the synthetic kernel matrix can be updated with the help of the kernel method to obtain the synthetic kernel matrix after the style transformation.
Because the combined kernel function is still an admissible kernel under Mercer's theorem, as , let be the mapping associated with the combined kernel matrix; by formula (9), the combined kernel matrix can be written as ; let ; the elements of the kernel matrix after style conversion can then be obtained as where is the kernel matrix element after style transformation and ; formula (15) can be updated to
Because of formula (16) can be updated to:
3.4. Algorithm
The training algorithm of SRMKLSVM is listed as follows.
 
Algorithm 2: Training algorithm of SRMKLSVM.
SRMKLSVM uses the alternating optimization method to solve the problem, which can be divided into two steps. The first step optimizes the kernel matrix weight coefficients and the classifier parameters. It can be divided into two subprocesses, i.e., solving the SILP problem for the kernel weights and solving the system of linear equations for the classifier parameters under the combined kernel matrix, with time complexities and , respectively. Due to , the total time complexity of this step can be treated as . The second step optimizes the style standardization matrix, and the time complexity of this step is . Therefore, the total time complexity of the algorithm training process is , where M is the number of predefined basic kernel matrices, is the total number of samples, is the number of styles in the data set, and iter is the number of iterations of the algorithm.
Compared with a typical multikernel support vector machine (MKLSVM), SRMKLSVM additionally uses the style transformation matrix during training to regularize the style of the samples. When solving for the weight coefficients of the basic kernel functions, a multikernel SVM algorithm must invoke the original SVM algorithm as a subroutine, whereas the algorithm in this paper invokes the original LSSVM. SVM training is essentially a quadratic programming problem, while LSSVM training essentially solves a system of linear equations. Therefore, the computational cost of SRMKLSVM in this step is far lower than that of a typical MKLSVM algorithm. The algorithm presented in this paper optimizes the weight coefficients by solving SILP problems, which is cheaper than multikernel support vector machine algorithms that optimize the weight coefficients by solving SDP or QCQP problems and is comparable to those that use SILP and similar problems. Therefore, the complexity of SRMKLSVM is comparable to that of typical multikernel support vector machine algorithms.
3.5. Prediction Rules
Two new prediction rules were defined based on MKLSSVM in order to use the weight, classifier parameter and the style transformation matrix . Since the style of the sample may or may not appear in the training process in the practical application, two new prediction rules Rule 2 and Rule 3 are added into the traditional prediction method to deal with the two cases, respectively.
Let be a subset of the entire testing data set in which each element has the same style, and is a sample.
Rule 1. Traditional prediction method.
Traditional prediction methods only use weight and classifier parameters and to predict the sample in the testing data set and obtain the corresponding label :
Rule 2. Test sample style is known.
If the style of the test sample already exists in the training data set, the corresponding style transformation matrix acquired during the training process can be directly used to process the style transformation of the sample, so that the sample is close to the standard style. Then, the predicted label was obtained by using traditional prediction rules for the processed sample .
can be obtained from Section 3.3.
Rule 3. Test sample style is unknown.
If the style of the sample group does not exist in the training data set, then, to make effective use of the style information obtained by training, we follow the direct extrapolation idea and treat the style shared by the samples in the group as a new style. The detailed steps are as follows:
Step 1. Obtain the temporary label of testing data set by using Rule 1.
Step 2. Train and its temporary label with the training data set to obtain the new weight , classifier parameter , and style transformation matrix .
Step 3. Use to predict test set and get the formal prediction label.
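The three steps of Rule 3 can be sketched as a generic pseudo-labeling procedure. The sketch below is only an illustration of the control flow: a nearest-centroid classifier stands in for SRMKLSVM, no style transformation matrix is learned, and the names `fit_centroids`, `predict_centroids`, and `predict_unknown_style` are hypothetical.

```python
def fit_centroids(samples, labels):
    """Stand-in trainer: one mean vector per class (not the SRMKLSVM trainer)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroids(model, samples):
    """Assign each sample to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda y: sq_dist(model[y], x)) for x in samples]


def predict_unknown_style(train_x, train_y, test_x, fit, predict):
    """Direct extrapolation for a test group whose style was not seen in training."""
    # Step 1: temporary labels from a model trained on the training set only.
    base_model = fit(train_x, train_y)
    temp_labels = predict(base_model, test_x)
    # Step 2: retrain on the training set plus the temporarily labeled test group.
    refit_model = fit(train_x + test_x, train_y + temp_labels)
    # Step 3: predict the test group again with the retrained model.
    return predict(refit_model, test_x)


train_x = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
train_y = [-1, -1, 1, 1]
# A test group that is shifted relative to the training data (a "new style").
test_x = [[1.0, 1.0], [1.5, 0.5], [5.5, 5.0], [6.0, 6.0]]

final_labels = predict_unknown_style(
    train_x, train_y, test_x, fit_centroids, predict_centroids
)
```

In the full method, Step 2 would also estimate a style transformation matrix for the new group, so the retrained model adapts to the shift rather than only seeing more data.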
Since most of the data in real scenes contain implicit or obvious style characteristics, the new prediction methods added in SRMKLSVM cover both known and unknown styles. For samples with known styles, the style information corresponding to the predicted samples is used directly. For samples with unknown styles, the direct extrapolation method is used so that the trained style information is still exploited effectively. The algorithm therefore has good universality.
3.6. Analysis of SRMKLSVM
Unlike SVM, which searches for the optimal classification hyperplane only according to the physical distribution of the original data, SRMKLSVM not only considers the physical characteristics contained in the data but also mines the style characteristics of the data. In this paper, the whole training sample set is used to optimize the classifier parameters, and the groups with different styles are processed separately. With the advantage of multikernel learning for data mapping, the algorithm can represent and process data containing more complex styles and makes full use of the trained style information to regularize the style of the original samples in both training and testing, so that the data after style transformation are easier to separate. Comparing traditional SVM with SRMKLSVM, we find that SRMKLSVM can make full use of the information contained in the stylized data to improve classification performance.
4. Experimental Results
4.1. Data
In this section, we introduce the EEG data provided by Bonn University to evaluate our proposed method. The EEG data set consists of 5 groups of samples from 2 categories, with detailed information shown in Table 1; randomly selected samples from each group are shown in Figure 2. As can be seen from Figure 2, the fluctuation of samples from different groups is very different. For example, the signal fluctuation of healthy people in group A and patients in group E is significantly different. The signal fluctuation of patients in group C and group E also differs greatly under different conditions.

Studies [19] showed that feature extraction of original EEG data in advance could effectively improve classification performance. In this paper, kernel principal component analysis (KPCA) [5, 20] was used to extract features from original data. In this section, the data after dimension reduction is used for experiments. As can be seen above, the number of samples in the data set is 500, the number of categories is 2, and the sample dimension is 70. Samples from the same group are considered to have the same style.
To verify the validity of the algorithm, different groups of data are selected to form two types of data sets. In the first type, all styles contained in the test set also exist in the training set. In the second type, the test set contains a style not found in the training set. Details of the constructed data sets are shown in Table 2.
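The two types of data sets can be illustrated with a small sketch that splits samples by style group. The group names, group sizes, test fraction, and the function `build_split` are hypothetical and are not taken from Table 2. Styles that appear on both sides are split at random, and a style that appears only on the test side goes entirely to the test set.

```python
import random


def build_split(groups, train_styles, test_styles, test_fraction=0.3, seed=0):
    """Return (train, test) lists of (style, sample) pairs."""
    rng = random.Random(seed)
    train, test = [], []
    for style in sorted(set(train_styles) | set(test_styles)):
        items = list(groups[style])
        rng.shuffle(items)
        if style in train_styles and style in test_styles:
            # Style seen in training: hold out a random fraction for testing.
            n_test = int(round(len(items) * test_fraction))
            test += [(style, s) for s in items[:n_test]]
            train += [(style, s) for s in items[n_test:]]
        elif style in train_styles:
            train += [(style, s) for s in items]
        else:
            # Style unseen in training: every sample goes to the test set.
            test += [(style, s) for s in items]
    return train, test


# Toy groups with 10 sample identifiers each.
groups = {name: [f"{name}{i}" for i in range(10)] for name in "ABC"}

# First type: every test style is also a training style.
seen_train, seen_test = build_split(groups, {"A", "B"}, {"A", "B"})
# Second type: the test set contains style "C", which is absent from training.
new_train, new_test = build_split(groups, {"A", "B"}, {"A", "B", "C"})
```

Under this construction, the first type corresponds to the known-style case handled by Rule 2 and the second type to the unknown-style case handled by Rule 3.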

4.1.1. Epileptic EEG Data Set
Data sets DS.1 and DS.2 are the first type of data; DS.3 and DS.4 are the second type of data. All data were sampled randomly, and each experiment was repeated 10 times under the same set of parameters, with the results averaged. Rule 2 and Rule 3 are used to predict the first and second types of data, respectively. The experimental results and parameters of all algorithms [21–32] are shown in Table 3.
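The repeated-run protocol described above can be sketched as follows. The function `average_over_runs` and the toy experiment `toy_run` are hypothetical. In a real experiment, `run_once` would draw a random split, train with fixed parameters, and return the test accuracy.

```python
import random
import statistics


def average_over_runs(run_once, n_runs=10, base_seed=0):
    """Call run_once with a differently seeded RNG per run and average the scores."""
    scores = [run_once(random.Random(base_seed + run)) for run in range(n_runs)]
    return statistics.mean(scores), scores


def toy_run(rng):
    """Hypothetical stand-in for one experiment: returns an accuracy in [0, 1]."""
    labels = [rng.choice([-1, 1]) for _ in range(50)]
    # Pretend classifier that is wrong on the first five samples only.
    predictions = [-y if i < 5 else y for i, y in enumerate(labels)]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


mean_accuracy, per_run = average_over_runs(toy_run)
```

Seeding each run separately keeps the ten runs reproducible while still giving each run its own random draw.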

From the experimental results in Table 3, it can be concluded that the decision tree (DT) algorithm has the best signal recognition performance on data set DS.1 and that the NLMKL algorithm has the best classification accuracy on data set DS.2, ahead of all other algorithms, including the proposed algorithm. On these first two data sets, the results of the proposed algorithm are not as good as those of DT and NLMKL, but the difference is small.
From the above results, we can see the effectiveness and stability of the proposed algorithm in improving the accuracy of EEG signal recognition by mining and utilizing different fluctuation features contained in each group of samples.
5. Conclusion
To use the style information contained in the samples, this paper proposes a style regularization least squares support vector machine (SRMKLSVM) based on multikernel learning. In addition to the advantage of multikernel learning in expressing the physical similarity between samples, the algorithm mines and uses the style information contained in the samples to improve classification accuracy. SRMKLSVM incorporates the style information contained in the samples into the objective function, uses the style conversion matrix to standardize the samples, uses regularization to limit the degree of style conversion, and optimizes both the classifier parameters and the style conversion matrix during the training process. In addition to the traditional prediction method, new prediction rules that can use the trained style information are added. Experiments on stylized data sets show the effectiveness and practicality of the algorithm.
Data Availability
The original EEG data are available and can be downloaded from http://www.meb.uni-bonn.de/epileptologie/science/physik/eegdata.html.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under grant 61876217.
References
[1] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[2] H. Avron, K. L. Clarkson, and D. P. Woodruff, “Faster kernel ridge regression using sketching and preconditioning,” SIAM Journal on Matrix Analysis and Applications, vol. 38, no. 4, pp. 1116–1138, 2017.
[3] J. W. Tao and S. T. Wang, “Kernel support vector machine for domain adaptation,” Acta Automatica Sinica, vol. 38, no. 5, pp. 797–811, 2012.
[4] C. Feng and S. Z. Liao, “Large-scale kernel methods via random hypothesis spaces,” Journal of Frontiers of Computer Science & Technology, vol. 12, no. 5, pp. 785–793, 2018.
[5] B. Schölkopf, S. Mika, A. Smola, G. Rätsch, and K.-R. Müller, “Kernel PCA pattern reconstruction via approximation preimages,” in Proceedings of the 8th international conference on Artificial neural networks, pp. 147–152, Skövde, Sweden, Piscataway, September 24, 1998.
[6] Y. Zhang, H. Ishibuchi, and S. Wang, “Deep Takagi–Sugeno–Kang fuzzy classifier with shared linguistic fuzzy rules,” IEEE Transactions on Fuzzy Systems, vol. 26, no. 3, pp. 1535–1549, 2018.
[7] R. Alain and C. Stéphane, “More efficiency in multiple kernel learning,” in Proceedings of the 24th international conference on Machine learning, pp. 775–782, Corvalis, Oregon, New York, 2007.
[8] Y. Zhang, F. Chung, and S. Wang, “Fast exemplar-based clustering by gravity enrichment between data objects,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 50, no. 8, pp. 2996–3009, 2020.
[9] Y. Zhang, F. Chung, and S. Wang, “A multiview and multiexemplar fuzzy clustering approach: theoretical analysis and experimental studies,” IEEE Transactions on Fuzzy Systems, vol. 27, no. 8, pp. 1543–1557, 2019.
[10] S. S. Bucak, Rong Jin, and A. K. Jain, “Multiple kernel learning for visual object recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1354–1369, 2014.
[11] S. Veeramachaneni and G. Nagy, “Style context with second-order statistics,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 14–22, 2005.
[12] M. S. Cheema, A. Eweiwi, and C. Bauckhage, “Human activity recognition by separating style and content,” Pattern Recognition Letters, vol. 50, pp. 130–138, 2014.
[13] X. Y. Zhang, K. Huang, and C. L. Liu, “Pattern field classification with style normalized transformation,” in Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp. 1621–1626, Catalonia, Spain, 2011.
[14] H. C. Jiang, K. Z. Huang, and R. Zhang, “Field support vector regression,” in Neural Information Processing: 24th International Conference, pp. 699–708, Guangzhou, China, 2017.
[15] K. Z. Huang, H. C. Jiang, and X. Y. Zhang, “Field support vector machines,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 6, pp. 454–463, 2017.
[16] Y. Zhang, F. Chung, and S. Wang, “Fast reduced set-based exemplar finding and cluster assignment,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 5, pp. 917–931, 2019.
[17] G. Marzinotto, J. C. Rosales, M. A. El-Yacoubi, and S. Garcia-Salicetti, “Age and gender characterization through a two layer clustering of online handwriting,” in International Conference on Advanced Concepts for Intelligent Vision Systems, pp. 428–439, Catania, Italy, 2015.
[18] J. Suykens, T. van Gestel, J. de Brabanter, B. de Moor, and J. Vandewalle, “Least squares support vector machines,” International Journal of Circuit Theory and Applications, vol. 27, no. 6, pp. 605–615, 2002.
[19] Y. Jiang, Z. H. Deng, F. L. Chung et al., “Recognition of epileptic EEG signals using a novel multiview TSK fuzzy system,” IEEE Transactions on Fuzzy Systems, vol. 25, no. 1, pp. 3–20, 2017.
[20] M. E. Wall, A. Rechtsteiner, and L. M. Rocha, “Singular value decomposition and principal component analysis,” in A Practical Approach to Microarray Data Analysis, D. P. Berrar, W. Dubitzky, and M. Granzow, Eds., pp. 91–109, Springer, Boston, MA, 2003.
[21] M. Lu, L. H. Liu, and L. H. Wu, “Research on multikernel support vector data description method of classification,” Computer Engineering and Applications, vol. 52, no. 18, pp. 68–73, 2016.
[22] H.-Q. Wang, F.-C. Sun, Y.-N. Cai, N. Chen, and L.-G. Ding, “On multiple kernel learning methods,” Acta Automatica Sinica, vol. 36, no. 8, pp. 1037–1050, 2010.
[23] X. Chen, N. Guo, Y. Ma, and G. Chen, “More efficient sparse multikernel based least square support vector machine,” Communications and Information Processing, vol. 289, pp. 70–78, 2012.
[24] M. Kloft, “l_p-norm multiple kernel learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 348–353, 2007.
[25] V. Manic and R. Bodla, “More generality in efficient multiple kernel learning,” in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1065–1072, Quebec, Canada, 2009.
[26] C. Cortes and M. Mohri, “Learning nonlinear combinations of kernels Advances in Neural Information Processing Systems 22,” in 23rd Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, 2009.
[27] G. Mehmet and E. Alpaydin, “Localized multiple kernel learning,” in Proceedings of the 25th international conference on Machine learning, pp. 352–359, Helsinki, Finland, 2008.
[28] C. Nello and S. John, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[29] Z. L. Xu, R. Jin, C. Stephane, H. Yang, I. King, and M. R. Lyu, “Simple and efficient multiple kernel learning by group lasso,” in Proceedings of the 27th international conference on machine learning (ICML-10), pp. 1175–1182, Haifa, Israel, 2010.
[30] C. Corinna, M. Mehryrar, and R. Afshin, “Two-stage learning kernel algorithms,” in Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010), pp. 239–246, Haifa, Israel, June 21–24, 2010.
[31] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, “Simple MKL,” The Journal of Machine Learning Research, vol. 9, no. 11, pp. 2491–2521, 2008.
[32] F. Aiolli and M. Donini, “EasyMKL: a scalable multiple kernel learning algorithm,” Neurocomputing, vol. 169, pp. 215–224, 2015.
Copyright
Copyright © 2020 Yufeng Yao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.