Nonparametric Smoothing Spline Approach in Examining Investor Interest Factors

ABSTRACT

The nonparametric approach is appropriate when the form of the relationship between predictor and response variables is not, or not yet, known. In other words, there is no complete information about the pattern of relationships between the variables. The curve is estimated from the relationship patterns in the existing data, which gives the nonparametric approach great flexibility in estimating regression curves. This study aims to build a model of investor interest factors in improving tourism investment decisions using a nonparametric approach. The nonparametric method used is smoothing spline regression, chosen because its modeling results can follow the relationship between variables contained in the data. The method therefore helps researchers model relationships between variables that are not linear and whose nonlinear form is unknown. The analysis showed that the nonparametric smoothing spline regression model explains 94.63% of the variance in the data, while the remaining 5.37% is explained by other variables outside the study. This indicates that investment motivation is one of the most important factors for improving investment decisions.

A. INTRODUCTION
Regression analysis is a statistical method used to ascertain how predictor and response variables relate to one another (B. Xu et al., 2021). If the linearity assumption in regression analysis is satisfied and the data can be expressed in terms of a specific function, such as a linear, quadratic, cubic polynomial, or exponential function, a parametric approach can be used (Wadhvani & Shukla, 2019). If the shape of the curve is unknown, a nonparametric approach is used (Ibacache-Pulgar et al., 2023). The Ramsey Regression Equation Specification Error Test (RESET) is one tool for testing linearity. A nonparametric method can be applied if these assumptions are not met. In estimating curve shapes, nonparametric approaches to regression analysis are highly flexible (Fernandes et al., 2014a).
Numerous nonparametric techniques exist; truncated splines, wavelets, kernels, polynomials, MARS, and smoothing splines are a few examples (Matteson & James, 2014). Among these, smoothing splines have the unique quality that they can be adjusted flexibly, producing a variety of curve shapes depending on the smoothing parameter (Wood et al., 2016). The smoothness or roughness of the spline curve estimator is controlled by the smoothing parameter. Many researchers have studied splines, including Gu (2014), who discusses smoothing spline ANOVA for the exponential family of distributions in an epidemiological case study of diabetic retinopathy in Wisconsin; Lestari et al. (2018), on the estimation of nonparametric regression functions using smoothing spline and kernel estimators; and Fernandes et al. (2014b), on spline estimation for bi-response nonparametric regression of longitudinal data with multiple predictors. Nonparametric regression with splines has been applied in many fields. One example is Wylasmi (2016), who uses a smoothing spline approach in the health sector, namely on infant weight growth data in Malang City. Another application uses a truncated spline approach on poverty level data in East Java.
Before employing regression analysis, several assumptions need to be taken into account, including linearity, residual normality, and residual homoscedasticity (Lai & Wang, 2013). The linearity test addresses one of the fundamental assumptions for determining the relationship between predictor and response variables (Fernandes et al., 2019). Regression analysis assumes that the relationship between predictor variables and response variables can be explained through a known function and that this function is linear. If this assumption is violated, parametric path functions cannot be estimated (Nurcahayani et al., 2021).
The first assumption is linearity. When the linearity assumption is not met, one solution is to use a nonparametric approach instead of a parametric one. Nonparametric techniques come in a variety of forms, including wavelet, MARS, kernel, polynomial, truncated spline, and smoothing spline. One unique feature of the nonparametric smoothing spline approach is its high degree of adaptability, which allows different curve shapes to be produced from a range of smoothing parameter values (Mielke, 2015). Research on splines includes Hidayat et al. (2021), on regression curve estimation using a mixed smoothing spline and kernel (MsS-K) model, and Budiantara et al. (2019), on a comparison of smoothing and truncated spline estimators in estimating blood pressure models.
The nonparametric approach can be used when the pattern of relationships between predictor and response variables is nonlinear and has no known nonlinear form (quadratic, cubic, polynomial, and so on). Regression analysis describes the relationship between predictor variables and response variables by means of a function. Splines are used in nonparametric regression analysis because they can flexibly track the pattern of relationships between predictor and response variables; they are a nonparametric regression technique that adapts effectively to changes in data behavior (Mariati et al., 2021). Spline estimation uses either the truncated spline approach or the smoothing spline approach. Truncated splines use a basis with knots, while in smoothing splines the function is estimated from a criterion that combines model accuracy with the smoothness of the curve, regulated by smoothing parameters (Hidayati et al., 2019).
The second assumption is normality. The normality assumption is used to determine whether the residuals of the resulting model are normally distributed (Hainmueller et al., 2019). The third assumption is residual homoscedasticity, which is used to determine whether the residual variance produced by the regression analysis is relatively constant. When this assumption is violated, parameter estimation in a parametric approach using Ordinary Least Squares (OLS) produces an estimator that is unbiased and consistent but less efficient, because it does not minimize the residual variance (Du & Bentler, 2022). An alternative is needed to produce an efficient estimator, namely Weighted Least Squares (WLS) (P. Xu et al., 2023). WLS is a development of OLS that provides a more accurate solution than OLS when outliers are identified in the data. A model produced by OLS may still be affected by outliers, which inflate the residual variance, whereas WLS reduces the influence of the outliers in the data (Kang et al., 2020). In the nonparametric setting, heteroscedasticity in the weighted spline approach is handled with the Penalized Weighted Least Square (PWLS) method. Numerous studies on PWLS have been carried out: Ouyang et al. (2011) examined the effect of penalties on PWLS reconstruction images for low-dose CBCT, and Mardianto et al. (2023) compared autocorrelation on longitudinal data. The degree of heterogeneity of the residual variance varies with the data. The Mean Absolute Percentage Error (MAPE) is one statistic that assesses this. MAPE measures the accuracy of a model as the average absolute error expressed relative to the actual values; it is nonnegative, and values closer to 0 indicate higher accuracy (Abatzoglou, 2013).
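As an illustration of the MAPE statistic mentioned above, the following is a minimal sketch rather than the authors' implementation. The function name `mape` and the data values are hypothetical, and the result is returned as a proportion (0.10 means 10%).

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a proportion."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average of |actual - predicted| / |actual| over all observations.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
score = mape(actual, predicted)
print(f"MAPE = {score:.4f}")
```

Because each error is divided by the actual value, a single observation with a very small actual value can dominate the average, which is worth checking before relying on the statistic.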
The methods described above can be used to examine the tourism sector more closely, especially tourism investment in Indonesia. The tourism potential of Indonesia is very large, given that the country consists of more than ten thousand islands (Connell, 2013). The islands have abundant biodiversity, and the country's position between two continents has brought cultural influences that add to the cultural richness of the archipelago, alongside the original cultures of the population that have not been influenced by outside cultures (Wahyuni et al., 2023). The attractions most in demand among tourists to Indonesia are its natural beauty, biodiversity, and cultural diversity. The natural potential of Indonesia includes unique flora and fauna such as Komodo dragons, corpse flowers, and other very rare species. Landscapes such as Raja Ampat, Bunaken, Badr Island, and others are also sought after by tourists. With all its natural wealth and biodiversity, Indonesia is attractive to investors. Tourism is a very interesting and promising sector to develop today (Higgins-Desbiolles, 2018). The tourism sector is an activity that never ceases and is very important for a country (Higgins-Desbiolles, 2021). For local governments in particular, tourist attractions become a source of regional income. This study was conducted to model tourism investment decisions as influenced by investment interest. The influence of investment interest on tourism investment decisions in Indonesia is analyzed using nonparametric smoothing spline regression.

B. METHODS
1. Nonparametric Smoothing Spline Regression
Smoothing spline is one of the function estimation models in regression analysis that can be applied to path analysis. Splines have special characteristics and adapt well to data behavior patterns (Fernandes et al., 2014b). The nonparametric smoothing spline regression model takes the form of the following equation.

$$\tilde{f} = \mathbf{T}\tilde{d} + \mathbf{V}\tilde{c} \qquad (1)$$

where $\mathbf{T}$ is a matrix of size $n \times m$ and $\tilde{d}$ is a vector of size $m$, while $\mathbf{V}$ is a matrix of size $n \times n$ and $\tilde{c}$ is a vector of size $n$. $\mathbf{V}_k$ is the submatrix of $\mathbf{V}$ for the $k$-th endogenous variable, and its elements are obtained as inner products $\langle \cdot , \cdot \rangle$. For $m = 2$ (linear), the following results were obtained.

The nonparametric path analysis model is a development of the nonparametric regression model. The nonparametric path analysis function formed on the basis of simple parametric path analysis can be written as equation (13) (Fernandes et al., 2019). Equation (13) can be written in matrix form as equation (14). Using a spline of polynomial order $m = 2$, equation (13) can be rewritten as equation (14).

$\mathbf{T}$ is the matrix with dimension $n \times m$. According to equation (6), $\mathbf{T}$ can be written as equation (16), and matrix $\mathbf{V}$ can be written as equation (17). $\tilde{d}$ and $\tilde{c}$ can be written as equations (18) and (19). Equation (15) can be solved as follows.

Penalized Weighted Least Square (PWLS)
Nonparametric regression curves have no known shape; however, $f_k$ is assumed to be smooth in order to obtain the regression curve estimate $\hat{f}_k$. In nonparametric path models, a model with a single predictor that satisfies the assumptions $E(\tilde{\varepsilon}) = 0$ and $Var(\tilde{\varepsilon}) = \boldsymbol{\Sigma}$ can be solved by the PWLS method (Fernandes et al., 2014b). The value of $M$ in the PWLS criterion is $2n$. The optimization involves the smoothing parameters $\lambda_k$, which act as a controller between goodness of fit (first segment) and roughness penalty (second segment); $\lambda_k$ controls both simultaneously. The goodness of fit term in PWLS optimization is:
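For orientation, a PWLS criterion of the kind described above is commonly written in the form below. This is a sketch under assumptions rather than the exact expression of the source: the response vector $\tilde{y}$, the $m$-th derivative $f_k^{(m)}$, and the integration limits $a_k$, $b_k$ are notation assumed here.

```latex
\min_{\tilde{f}} \left\{
  M^{-1} (\tilde{y} - \tilde{f})^{T} \boldsymbol{\Sigma}^{-1} (\tilde{y} - \tilde{f})
  + \sum_{k} \lambda_k \int_{a_k}^{b_k} \left[ f_k^{(m)}(x) \right]^2 dx
\right\}
```

In this form the first term is the weighted goodness of fit, $M^{-1} (\tilde{y} - \tilde{f})^{T} \boldsymbol{\Sigma}^{-1} (\tilde{y} - \tilde{f})$, and the second term is the roughness penalty, so a larger $\lambda_k$ places more weight on smoothness.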

Optimum Smoothing Parameter ($\lambda$)
The smoothing parameter $\lambda$ controls the balance between the fit of the curve to the data and the smoothness of the curve. If $\lambda$ is large, the estimated function will be smoother, whereas if $\lambda$ is small, the estimated function will be rougher (Lai & Wang, 2013). The optimal smoothing parameter is determined using the mean square error (MSE) and generalized cross validation (GCV).
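The trade-off controlled by $\lambda$ can be illustrated with a small sketch. It does not implement the smoothing spline estimator of this paper; it assumes a simpler discrete analogue that penalizes squared second differences on an evenly spaced grid, and the data, seed, and function names are hypothetical.

```python
import numpy as np


def penalized_smoother(y, lam):
    """Minimize ||y - f||^2 + lam * ||D f||^2, where D takes second differences."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


def roughness(f):
    """Sum of squared second differences, a discrete roughness measure."""
    return float(np.sum(np.diff(f, n=2) ** 2))


# Hypothetical noisy data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

fit_small = penalized_smoother(y, lam=0.1)    # follows the data closely
fit_large = penalized_smoother(y, lam=100.0)  # much smoother curve

for name, fit in [("lam=0.1", fit_small), ("lam=100", fit_large)]:
    rss = float(np.sum((y - fit) ** 2))
    print(f"{name}: residual SS = {rss:.3f}, roughness = {roughness(fit):.5f}")
```

A larger value of `lam` lowers the roughness and raises the residual sum of squares, which is the same direction of effect described for $\lambda$ above.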

Mean Square Error (MSE)
Mean square error (MSE) is an estimate of the residual variance. MSE is also defined as the average squared difference between the estimator and the population parameter (Fernandes et al., 2019). The best model is the one with the minimum MSE, which means that the estimated values of the model are close to the actual values. The MSE formula can be written as follows (Nurcahayani et al., 2021).

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{f}(x_i) \right)^2$$

where $n$ is the sample size, $Y_i$ is the response variable, $x_i$ is the predictor variable, and $\hat{f}(x_i)$ is the estimated regression function at $x_i$.

Generalized Cross Validation (GCV)
GCV is a modification of cross-validation (CV). CV is a method of selecting a model based on its predictive ability. The formula of CV is as follows.
Here, the trace of $(\mathbf{I} - \mathbf{G})$ is the sum of the diagonal elements of the matrix $(\mathbf{I} - \mathbf{G})$. The GCV function is defined as: The selection of the optimal parameter is important because it determines whether an optimal spline estimator is obtained (Sifriyani et al., 2017). When the GCV is at its minimum, the spline regression model can be expected to have a small error as well.
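For reference, the CV and GCV criteria are commonly written in the unweighted form below. This is given as a sketch under assumptions, not as the exact expressions of the source: $\mathbf{G}(\lambda)$ denotes the hat matrix satisfying $\hat{f}_{\lambda} = \mathbf{G}(\lambda)\tilde{y}$, and $g_{ii}(\lambda)$ is its $i$-th diagonal element.

```latex
CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n}
  \left( \frac{y_i - \hat{f}_{\lambda}(x_i)}{1 - g_{ii}(\lambda)} \right)^2,
\qquad
GCV(\lambda) = \frac{n^{-1} \left\| (\mathbf{I} - \mathbf{G}(\lambda))\, \tilde{y} \right\|^2}
                    {\left[ n^{-1} \operatorname{tr}\!\left( \mathbf{I} - \mathbf{G}(\lambda) \right) \right]^2}
```

GCV follows from CV by replacing each $g_{ii}(\lambda)$ with the average $n^{-1} \operatorname{tr}(\mathbf{G}(\lambda))$, since $1 - n^{-1} \operatorname{tr}(\mathbf{G}(\lambda)) = n^{-1} \operatorname{tr}(\mathbf{I} - \mathbf{G}(\lambda))$.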

Goodness of Fit
Goodness of fit can be assessed with several criteria, one of which is the coefficient of determination ($R^2$). According to Gujarati (2003), the coefficient of determination ($R^2$) is an overall measure of the goodness of the function estimator model. $R^2$ is expressed as the proportion or percentage of the total variance of the endogenous variable that is explained by the exogenous variables. $R^2$ can be obtained from equation (32).
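The computation of $R^2$ can be sketched as follows. Equation (32) is not reproduced in the text, so the sketch assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name and data are hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual sum of squares
    sst = sum((y - mean_obs) ** 2 for y in observed)           # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1.0 - sse / sst


# Hypothetical observed and fitted values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(observed, fitted)
print(f"R^2 = {r2:.4f}")
```

Under this definition, a value such as 0.94627 is read as about 94.63% of the variance of the response being explained by the fitted function.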

Hypothesis Testing
According to Liu & Wang (2004), hypothesis testing in a nonparametric smoothing spline approach requires different treatment. Testing is performed using the t-SKL (Symmetrized Kullback-Leibler) test. Let $f_0$ be the regression function under the null model. Then $\hat{f}_0 = \mathbf{H} y$, where $\mathbf{H}$ is an idempotent hat matrix. In the test, $S_1$ measures the difference between $\hat{f}_0$ and $\hat{f}$, $S_2$ is the residual sum of squares under $H_1$, and $S_3$ is the residual sum of squares under $H_0$. The hypotheses are determined as follows.
For non-Gaussian data, Xiang & Wahba (1995) proposed the SKL test based on the SKL distance between $f_0$ and $f$: For Gaussian data, it reduces to The test criterion is that $H_0$ is rejected if the test statistic $t_{SKL}$ is greater than or equal to the $\chi^2$ critical value, which means that the exogenous variable has a significant effect on the endogenous variable.

Investment Decisions
The investment decision is one of the functions of financial management. It involves allocating funds, sourced from both inside and outside the company, to various forms of investment with the aim of obtaining future profits greater than the cost of funds. Short-term investments consist of investments in cash, short-term securities, receivables, and inventories. Long-term investments take the form of land, buildings, vehicles, machinery, production equipment, and more. In addition to earning future profits, another goal is to maximize the value of the company.

C. RESULT AND DISCUSSION
1. Linearity Test
The linearity test is used to determine the pattern of relationships between variables. The Ramsey RESET was used to test linearity in this study. Data on latent variables for testing the linearity assumption were obtained using the average score. The results of the linearity test are presented in Table 2. Based on Table 2, the relationship between the predictor variable (X) and the response variable (Y) is not linear because the p-value (0.002) is less than 0.05. The parametric approach cannot be used in this case because the data do not satisfy the assumption of linearity, so a nonparametric approach is used instead.

A visualization of the relationship pattern between the investment interest variable as the predictor variable (X) and the investment decision variable as the response variable (Y) is shown as follows. The figure shows that the relationship between investment interest (X) and investment decision (Y) is not linear and does not form a quadratic or cubic pattern, so a parametric approach cannot be used to estimate the regression function. The nonparametric approach is used because it can estimate the regression function when the pattern of the relationship between the predictor variable (X) and the response variable (Y) is nonlinear and of unknown form. Its flexibility allows the estimate to follow the pattern of the relationship present in the data. The predictor variable (X) in this study is the investment interest variable, and the response variable (Y) is the investment decision variable.
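The idea behind the RESET linearity check can be sketched as follows. This is an illustrative version under assumptions, not the procedure or software used in the study: it fits a straight line, adds the squared and cubed fitted values as extra regressors, and forms the F statistic for their joint contribution. The simulated data, seed, and function name are hypothetical.

```python
import numpy as np


def reset_f_statistic(x, y, max_power=3):
    """RESET-type F statistic for a simple linear regression of y on x."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), x])  # restricted model: intercept and x
    beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    fitted = X0 @ beta0
    rss0 = np.sum((y - fitted) ** 2)

    # Augmented model: add powers of the fitted values (2 up to max_power).
    powers = [fitted ** p for p in range(2, max_power + 1)]
    X1 = np.column_stack([X0] + powers)
    beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss1 = np.sum((y - X1 @ beta1) ** 2)

    q = len(powers)               # number of added terms
    df_resid = n - X1.shape[1]    # residual degrees of freedom
    f_stat = ((rss0 - rss1) / q) / (rss1 / df_resid)
    return float(f_stat), q, df_resid


# Hypothetical data: one linear and one curved relationship.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 3.0, 200)
noise = rng.normal(scale=0.5, size=x.size)
y_linear = 1.0 + 2.0 * x + noise
y_curved = 1.0 + 2.0 * x + 1.5 * x ** 2 + noise

f_linear, q, df_resid = reset_f_statistic(x, y_linear)
f_curved, _, _ = reset_f_statistic(x, y_curved)
print(f"linear data: F = {f_linear:.2f} on ({q}, {df_resid}) df")
print(f"curved data: F = {f_curved:.2f} on ({q}, {df_resid}) df")
```

The statistic is compared with an F distribution on `q` and `df_resid` degrees of freedom; a large value (small p-value) indicates that the linear specification is inadequate, which corresponds to the conclusion drawn from Table 2.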

Nonparametric Smoothing Spline Estimation
To obtain a nonparametric model with the best smoothing parameter, the GCV value must be at its minimum. The iteration to find the minimum GCV was carried out with the 'nlminb' optimization function in R and yields the lambda and GCV pairs shown in Table 3. The main principle of the Generalized Cross Validation (GCV) method is to select the best model by trying λ values and taking the λ that produces the lowest GCV value. To obtain the minimum GCV, which identifies the model with the best λ, a three-stage iteration was carried out. In the end, a minimum GCV of 9.0271 was obtained at λ = 0.02271. Estimating curves with a nonparametric smoothing spline approach requires two coefficient vectors, d and c. Table 4 presents the estimated coefficients of the nonparametric regression function obtained with the PWLS approach.
The values of d11 and d12 can be interpreted like the intercept and slope in simple linear regression analysis, while the values c1.1 to c1.500 are the penalty values for each observation. These penalty values give the nonparametric smoothing spline model its accuracy, and a penalty is applied to every observation from the first to the 500th. This is the main characteristic of the nonparametric smoothing spline approach: the resulting model can closely follow the pattern of the data. The goodness-of-fit measure (the $R^2$ statistic) obtained is 0.94627. This means that 94.63% of the variance in the data is explained by the nonparametric smoothing spline function estimator, and the rest (5.37%) is explained by other variables outside the model. One advantage of the smoothing spline estimator is that the estimated pattern of relationships between variables is more sensitive to the original data. This is due to the penalty element in the model, which produces function estimators with high flexibility. The penalty can be seen in the estimator equation, where each observation has its own term that varies with that observation, for example a penalty term for the first observation (X1), and so on.
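The GCV-based choice of λ described in this subsection can be sketched as follows. The sketch makes two simplifying assumptions that differ from the study: it uses a discrete second-difference smoother instead of the PWLS smoothing spline, and it replaces the 'nlminb' optimization with a plain grid search. The data, seed, and names are hypothetical, so the resulting values are unrelated to Table 3.

```python
import numpy as np


def hat_matrix(n, lam):
    """Hat matrix G of a discrete second-difference smoother: fitted = G @ y."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)


def gcv(y, lam):
    """GCV(lam) = (||(I - G) y||^2 / n) / (tr(I - G) / n)^2."""
    n = len(y)
    G = hat_matrix(n, lam)
    resid = y - G @ y
    return float((resid @ resid / n) / (np.trace(np.eye(n) - G) / n) ** 2)


# Hypothetical noisy data, for illustration only.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Evaluate GCV on a logarithmic grid and keep the minimizer.
lambdas = np.logspace(-3, 4, 50)
scores = np.array([gcv(y, lam) for lam in lambdas])
best_lambda = float(lambdas[np.argmin(scores)])
best_gcv = float(scores.min())
print(f"best lambda = {best_lambda:.4g}, minimum GCV = {best_gcv:.4f}")
```

A grid search is easy to inspect and tabulate; a numerical optimizer can then refine the value around the best grid point, which is closer in spirit to the staged iteration described above.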

Hypothesis Testing
Hypothesis testing in this study does not use the usual t-test but the t-SKL (Symmetrized Kullback-Leibler) test. The t-SKL test is used because nonparametric smoothing spline modelling requires special treatment: the t-SKL test statistic is obtained by comparing the value of the function when it is 0 with its value at the smoothing parameter (lambda). The test criterion is that $H_0$ is rejected when the test statistic $t_{SKL}$ is greater than or equal to the $\chi^2$ critical value. Rejecting $H_0$ means that the investment interest variable as the predictor variable (X) has a significant influence on the investment decision variable as the response variable (Y). The results of hypothesis testing using the t-SKL test are presented in the table. The test produces a significance value (p-value) of less than 0.05, so $H_0$ is rejected, and it can be concluded that the predictor variable, investment interest (X), has a significant influence on the response variable, investment decision (Y).

D. CONCLUSION AND SUGGESTIONS
Based on the analysis, the nonparametric smoothing spline approach is suitable for determining how investment motivation influences investment decisions. The high degree of flexibility of the nonparametric smoothing spline approach makes it well suited for use in various disciplines. Data challenges are not a serious problem when a nonparametric smoothing spline approach is used to model or estimate a function in a particular problem. The researchers suggest extending the approach to more complex models, such as models with more than one predictor variable. In addition, future research could apply the approach to data that contain outliers, since the flexibility of the nonparametric smoothing spline approach allows the resulting model to fit such data well.
Matrix V can be written as equation (17).

d and c can be written as equation (

i-th diagonal element of the matrix G. The GCV equation is obtained by substituting

Table 2. Linearity Test Results

Table 3. Lambda and GCV Values

Table 4. Smoothing Spline Estimation Function
Based on Table 4, the estimator equation for the nonparametric smoothing spline regression function can be written as follows.

Table 1. Results of Hypothesis Testing Using t-SKL