Fuzzy Support Vector Machine Using Linear and Exponential Membership Functions with Mahalanobis Distance

ABSTRACT


A. INTRODUCTION
Support vector machine (SVM) is a machine learning method developed by Vapnik in 1995 for prediction, in both classification and regression. SVM belongs to the supervised learning techniques. There are also unsupervised learning techniques, including clustering; several clustering studies have used one particular type of clustering, namely Fuzzy Subtractive Clustering. SVM theory is based on the principle of structural risk minimization (Vapnik, 1995). In many of its applications, SVM has proven to perform better than conventional machine learning methods and has built a reputation for its distinct approach to classification, which is why the algorithm was introduced as an effective tool for solving classification problems (Lin & Wang, 2002).
Beyond its mathematical foundation in statistical learning theory, SVM has shown very competitive performance in many real-world applications, for example text mining (Manochandar & Punniyamoorthy, 2018), bioinformatics (Battineni et al., 2019; Viloria et al., 2020), face recognition (Anzid et al., 2019), and graphics processing, which has established SVM as one of the standard tools in machine learning and data mining, alongside other computational techniques such as neural networks and fuzzy systems (Ladwani, 2018). Like other algorithms, however, SVM-based methods have a drawback: they are sensitive to noise and outliers (Mohammadi & Sarmad, 2019; J. Liu, 2020; Xiaokang et al., 2016b).
Uncertainty can be defined as vagueness in class membership criteria that are not sharply defined, rather than as randomness (Ross, 2010). Fuzzy set theory and its methods can handle this uncertainty and vagueness, which is extremely useful for tackling complex real-world problems, including prediction, inference, control, and decision making. Combining fuzzy set theory with classification techniques for determining a class is a handy way to improve the generalization ability of a classifier (Lin & Wang, 2002; Mohammadi & Sarmad, 2019; Inoue & Abe, 2001; Jiang et al., 2006; W. Liu et al., 2020; Richhariya & Tanveer, 2018; Wu, 2011).
According to Lin & Wang (2002), one widely used criterion for regarding a point as an outlier is its distance to the class center. The Euclidean distance is the common general-purpose choice and has been applied in FSVM for classification by Surono, Nursofiyani, et al. (2021). However, this metric is not appropriate for data with correlated features. As an alternative, the Mahalanobis distance is often used to measure the distance of each point to the class center while also taking the covariance into account (Mohammadi & Sarmad, 2019).
To lessen the effect of noise and outliers, a fuzzy membership function combined with SVM can assign a smaller weight to outlying data points. A new fuzzy membership function has been proposed that is formed using a mixed kernel function in feature space. In the study of Lu et al. (2009), the smallest training and testing errors were obtained with mixed kernels: a training error of 0.00457893 and a testing error of 0.01598367. Xiaokang et al. (2016b) built three FSVM models, with standard SVM as a reference for comparison, showed that FSVM can effectively reduce the impact of noise and improve classification accuracy, and found that FSVM had the best classification performance compared with SVM. To handle outliers, An and Liang (2013) proposed a fuzzy support vector machine based on within-class scatter for classification problems. An interval type-2 FSVM has also been proposed to reach a more accurate classifier (Ekong et al., 2016).
To address this problem, we propose a fuzzy support vector machine (FSVM) with different fuzzy membership values, using the Mahalanobis distance to measure the distance of each point to the class center. The method specifically targets improving SVM by lessening the effect of noise and outliers in the data.

Support Vector Machine
Support Vector Machine (SVM) is a binary classification method that finds the best hyperplane separating the data between classes: it maximizes the margin around the separating hyperplane while at the same time minimizing the training error. Let $S = \{(x_i, y_i)\}_{i=1}^{n}$ be a training sample of size $n$ (Lu et al., 2009), where $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, +1\}$. For linearly separable data, the separating hyperplane $H$ is
$$w \cdot x + b = 0, \tag{1}$$
where $w$ is the normal vector of the separating hyperplane, of dimension $p$, and $b$ is a scalar. If $H_1$ and $H_2$ are the boundary hyperplanes of the first and second classes, then every point on a class boundary (a support vector) satisfies $y_i(w \cdot x_i + b) = 1$, so that every data point satisfies
$$y_i(w \cdot x_i + b) \geq 1.$$
The optimal hyperplane can be obtained from the following primal optimization problem (J. Liu, 2020):
$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w \cdot x_i + b) \geq 1, \ i = 1, \dots, n.$$
If the training data cannot be separated linearly, slack variables $\xi_i$ can be added to absorb the misclassification of difficult or noisy training points. Adding the slack variables changes the problem into (Xiaokang et al., 2016a)
$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i \tag{2}$$
subject to
$$y_i(w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \ i = 1, \dots, n.$$
The optimal hyperplane in (2) can be obtained by transforming the primal form into its dual quadratic programming (QP) form, which can be solved under the Karush-Kuhn-Tucker (KKT) conditions with Lagrange multipliers (Mohammadi & Sarmad, 2019):
$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{3}$$
subject to
$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \ i = 1, \dots, n,$$
where $\alpha = (\alpha_1, \dots, \alpha_n)$ is the vector of Lagrange multipliers. Solving this quadratic optimization problem for $\alpha^{*}$ gives
$$w^{*} = \sum_{i=1}^{n} \alpha_i^{*} y_i x_i. \tag{4}$$
According to the KKT conditions, the bias can be obtained from any support vector with $0 < \alpha_i^{*} < C$ as (Mohammadi & Sarmad, 2019)
$$b^{*} = y_i - w^{*} \cdot x_i. \tag{5}$$
A sample point $x$ is then classified by the sign of the decision function (Mohammadi & Sarmad, 2019):
$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*}). \tag{6}$$
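As an illustration of the soft-margin formulation above (a sketch, not the authors' code), scikit-learn's `SVC` solves the same dual QP; the toy data below are made up for demonstration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

# C is the soft-margin penalty from the primal problem
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # w* and b* of the separating hyperplane
print(clf.predict([[0.5, 0.5]]))   # -> [-1]
```

The sign of the decision function then assigns new points to one of the two classes, as in the classification rule above.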
For data that cannot be separated linearly in the input space, a kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is used to find the optimal hyperplane in a higher-dimensional feature space, where $\phi(\cdot)$ is a nonlinear mapping function. In the research of Ningrum (2018), of the three kernels used, the polynomial kernel gave the highest classification accuracy, with an accuracy value of 95.45%. Therefore, this research uses the polynomial kernel, which can be stated as
$$K(x_i, x_j) = (x_i \cdot x_j + 1)^d, \tag{7}$$
where $d$ is the polynomial degree.
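A minimal sketch of the polynomial kernel; the degree and constant term below are illustrative choices, not values fixed by the paper:

```python
import numpy as np

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """Polynomial kernel K(x, z) = (x . z + coef0) ** degree."""
    return (np.dot(x, z) + coef0) ** degree

a = np.array([1.0, 2.0])
b = np.array([0.5, -1.0])
print(polynomial_kernel(a, b, degree=2))  # (-1.5 + 1)^2 = 0.25
```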

Fuzzy Support Vector Machine
Even though SVM is an effective binary classification method, it is sensitive to noise and outliers. Like SVM, the purpose of FSVM is to find the optimal separating hyperplane that separates the sample points into two classes with maximum margin. In 2002, Lin & Wang (2002) proposed an extension of SVM called the Fuzzy Support Vector Machine (FSVM). In FSVM, every training point is given a fuzzy membership value, denoted $s_i$. The effect of outlying data points on the classification accuracy can be lessened through this fuzzy membership value, because an outlying point does not fully belong to its assigned class and therefore receives a small fuzzy membership value. Usually, the fuzzy membership function is designed based on the distance to the corresponding class center; this minimizes the impact of such points on the error and on the formation of the separating hyperplane. Consider a data set $S$ of training points (Mohammadi & Sarmad, 2019):
$$S = \{(x_1, y_1, s_1), (x_2, y_2, s_2), \dots, (x_n, y_n, s_n)\}.$$
The primal optimization problem is
$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} s_i \xi_i \tag{8}$$
subject to
$$y_i(w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \ i = 1, \dots, n,$$
with the normal vector of the hyperplane $w^{*} = \sum_{i=1}^{n} \alpha_i^{*} y_i x_i$ and the scalar $b^{*}$ obtained as in the SVM case.

Here every training point $x_i \in \mathbb{R}^p$ has a class label $y_i \in \{1, -1\}$ and a fuzzy membership value $s_i$ with $0 < s_i \leq 1$ for $i = 1, \dots, n$. To solve the FSVM optimization problem, the primal problem is transformed into the following dual problem by introducing Lagrange multipliers:
$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{9}$$
subject to
$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq s_i C, \ i = 1, \dots, n.$$
The only difference between fuzzy SVM and classic SVM is the upper bound $s_i C$ on the multipliers, which is obtained by multiplying the parameter $C$ by the membership $s_i$.
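The per-sample bound $0 \leq \alpha_i \leq s_i C$ can be realized with scikit-learn's `SVC`, whose `sample_weight` argument scales $C$ per sample; this is a sketch, and the data and membership values below are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic, well-separated classes (illustrative)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
               rng.normal(4.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# Fuzzy memberships s_i in (0, 1]; a suspected outlier gets a small weight
s = np.ones(len(X))
s[0] = 0.05

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=s)  # effective per-sample penalty becomes s_i * C
print(clf.score(X, y))
```

Down-weighting a point shrinks its maximum influence on the dual solution, which is exactly the mechanism FSVM uses against outliers.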

Mahalanobis Distance
Mahalanobis distance is a statistical method used to measure the distance of a data point from the data average, so that the distribution of the data around the average value is taken into account. The Mahalanobis distance between two objects, presented as vectors and a matrix, involves the covariance or correlation between the variables, because the inverse of the covariance matrix enters the calculation. It can be defined as (Mohammadi & Sarmad, 2019)
$$d_M(x) = \sqrt{(x - \bar{x})^{T} \Sigma^{-1} (x - \bar{x})}, \tag{12}$$
where $x$ is a $p$-dimensional vector in $\mathbb{R}^p$, $\bar{x}$ is the sample mean vector, and $\Sigma^{-1}$ is the inverse of the covariance matrix. Suppose the data consist of $p$ variables and $N$ observations. Then the covariance matrix has the $p \times p$ form
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix},$$
where each covariance value can be found using
$$\sigma_{jk} = \frac{1}{N-1}\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k),$$
with $j = 1, 2, \dots, p$ and $k = 1, 2, \dots, p$.
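The Mahalanobis distance can be computed directly with NumPy (a sketch; `np.cov` uses the $N-1$ denominator, and the sample points below are made up):

```python
import numpy as np

def mahalanobis(x, X):
    """Mahalanobis distance of point x to the mean of the sample X."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

X = np.array([[2.0, 2.0], [2.0, 5.0], [6.0, 5.0], [7.0, 3.0], [4.0, 7.0]])
print(mahalanobis(X.mean(axis=0), X))  # the class center itself has distance 0.0
```

Unlike the Euclidean distance, this metric stretches or shrinks each direction according to the sample covariance, which is why it suits data with correlated features.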

Membership Function
We now describe the fuzzy membership functions for the linear case. In FSVM, fuzzy membership is used to lessen the effect of outliers or noise, and a different fuzzy membership function gives a different result in the fuzzy algorithm. Basically, the rule for assigning the correct membership value to a data point can depend on the relative importance of that data point to its own class. In this research, we consider the two fuzzy membership functions given in (Lu et al., 2009). Definition 1.
The membership function referred to as FSVM 1 is the linear fuzzy membership, defined as
$$s_i = 1 - \frac{d_i}{\max_{j} d_j + \delta}, \tag{10}$$
in which $\delta$ is a small positive value used to avoid $s_i$ becoming zero. Definition 2.
The membership function referred to as FSVM 2 is the exponential fuzzy membership, defined as
$$s_i = \frac{2}{1 + \exp(\beta d_i)}, \tag{11}$$
in which the parameter $\beta \in [0, 1]$ decides the steepness of the decay.
Here $d_i$ is the Mahalanobis distance of $x_i$ to its class center, calculated as in (12), which also takes the within-class covariance into account.
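Both membership functions can be sketched as follows; the $\delta$ and $\beta$ values and the distance vector are illustrative assumptions:

```python
import numpy as np

def linear_membership(d, delta=1e-6):
    """FSVM 1: s_i = 1 - d_i / (d_max + delta); delta keeps s_i > 0."""
    d = np.asarray(d, dtype=float)
    return 1.0 - d / (d.max() + delta)

def exponential_membership(d, beta=0.5):
    """FSVM 2: s_i = 2 / (1 + exp(beta * d_i)); beta in [0, 1] sets steepness."""
    d = np.asarray(d, dtype=float)
    return 2.0 / (1.0 + np.exp(beta * d))

d = np.array([0.5, 1.0, 3.0])  # Mahalanobis distances to the class center
print(linear_membership(d))       # the farthest point gets the smallest membership
print(exponential_membership(d))
```

In both cases the memberships lie in $(0, 1]$ and decrease with the distance, so remote (likely outlying) points contribute less to the optimization.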

Dataset
The experiments use medical data on diabetes risk taken from the UCI machine learning repository. It is very likely that this real dataset contains some noise and outliers in varying amounts. The dataset comprises 520 people (328 males and 192 females, aged between 16 and 90 years). The data were collected with a questionnaire given directly to people who had just been diagnosed with diabetes or who had not been diagnosed but showed symptoms. The data variables are $x_1$ (Age), $x_2$ (Sex), $x_3$ (Polyuria), $x_4$ (Polydipsia), $x_5$ (Sudden weight loss), $x_6$ (Weakness), $x_7$ (Polyphagia), $x_8$ (Genital thrush), $x_9$ (Visual blurring),

C. RESULT AND DISCUSSION
In this section we examine the performance of the membership functions based on the Mahalanobis distance. To test the performance of our algorithm, FSVM using the Mahalanobis distance is compared with classic SVM on the diabetes data, which consist of 520 observations and 16 variables. The data are processed several times with different amounts of training data: 70%, 80%, and 90% of the data, where the numbers of positive-class and negative-class observations are shown in Table 1. The data not included in the training set are used as testing data.

Input: training samples $\{(x_i, y_i), i = 1, 2, \dots, n\}$ and testing samples $\{(x_j, y_j), j = 1, 2, \dots, m\}$
Output: predicted labels for the testing data
Process:
1) Calculate the Mahalanobis distance using (12) for the training samples
2) Calculate the fuzzy membership values using (10) or (11)
3) Determine the parameter $C$ and compute $\alpha^{*}$ (9), $w^{*}$ (4), and $b^{*}$ (5) with a QP solver on the training data transformed with the polynomial kernel (7)
4) Apply the decision function (6) to each testing sample to obtain the final class label

Calculating the membership function is the most important step in FSVM. Before the fuzzy membership values are calculated, the distances are measured first, in order to quantify how close each sample is to the center of its class distribution. The data are processed using Python. Using (12), the distances shown in Table 2 are obtained. After obtaining the distance values, the next step is to calculate the fuzzy membership values, using (10) for the linear membership function and (11) for the exponential membership function. The calculation results are shown in Table 3, Table 4 and Table 5.
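The procedure described above can be sketched end to end. This is an illustration only: synthetic data stand in for the diabetes set, scikit-learn's `sample_weight` realizes the $s_i C$ bound of the QP step, and all parameter values are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def mahalanobis_to_center(X):
    """Step 1: Mahalanobis distance of each row of X to the class center."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def exponential_membership(d, beta=0.5):
    """Step 2: exponential fuzzy membership."""
    return 2.0 / (1.0 + np.exp(beta * d))

# Synthetic two-class data as a stand-in for the diabetes dataset
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(2.5, 1.0, (60, 4))])
y = np.array([-1] * 60 + [1] * 60)
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=0.7, random_state=0)

# Memberships computed per class on the training data
s = np.empty(len(Xtr))
for c in (-1, 1):
    idx = ytr == c
    s[idx] = exponential_membership(mahalanobis_to_center(Xtr[idx]))

# Steps 3-4: membership-weighted SVC with a polynomial kernel, then predict
clf = SVC(kernel="poly", degree=3, C=1.0)
clf.fit(Xtr, ytr, sample_weight=s)
print(clf.score(Xte, yte))
```

Swapping `exponential_membership` for the linear membership function reproduces the FSVM 1 variant of the same pipeline.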
The classification accuracy shown in Table 2 indicates that classic SVM performs better, for every training-data proportion, than the FSVM modified with the Mahalanobis distance. By contrast, the research of Mohammadi & Sarmad (2019) found the best performance with FSVM, but the data used in that study differ from the data used here. Thus, it is safe to say that the method of assigning fuzzy membership values to the training data plays an important role when the FSVM method is applied to a dataset. In other words, an incorrect method of determining the membership values can make the model perform worse than classic SVM. The accuracy results in Table 2 are presented as a line graph in Figure 2 for training-data proportions of 70%, 80%, and 90%. Based on Figure 2, SVM has better accuracy results than the FSVM modified with the Mahalanobis distance.

D. CONCLUSION AND SUGGESTIONS
In this paper, we propose an FSVM method that takes the Mahalanobis distance into account. The proposed distance aims to characterize the distance of each data point to its actual class. FSVM assigns a fuzzy membership value to every input point, so that different input points can contribute differently to the decision. By using different fuzzy membership functions, namely the linear and the exponential membership function, we can observe the FSVM performance on different kinds of problems. This enriches our knowledge of SVM extended with fuzzy methods. In this study, 520 observations were used. The accuracy results of the FSVM method are 0.017170689 and 0.018668421, while the SVM accuracy is 0.018838348. This shows that adding fuzzy memberships does not always guarantee better results than classic SVM, and that choosing the right fuzzy membership is very important for the generalization performance of FSVM in classification problems.

ACKNOWLEDGEMENT
Thanks to the Head of The Study Center of Science Data Laboratory for the resources given.