Nonparametric Spline Truncated Regression with Knot Point Selection Method Generalized Cross Validation and Unbiased Risk

ABSTRACT


A. INTRODUCTION
A truncated spline is a piecewise polynomial with continuous, segmented pieces, so it can effectively capture the local characteristics of the underlying data function. The truncated spline approach fits the data while still taking the smoothness of the curve into account. One advantage of truncated splines is their flexibility: the model tends to follow the data wherever the data pattern moves.
Multivariable nonparametric regression with a truncated spline approach, with knot points selected by the Generalized Cross Validation (GCV) and Unbiased Risk (UBR) methods, can be applied to health data such as stunting data (Purnaraga et al., 2020). Stunting is an indication of chronic malnutrition resulting from the poor interaction of various determinants of child nutrition. Protein intake should meet a child's nutritional needs: toddlers are at 1.59 times the risk of stunting when their protein intake is below the level of nutritional adequacy.
Child growth is the main indicator for assessing nutritional status in children under five years of age. It is also one of the six global nutrition targets set by WHO in 2012 (da Silva et al., 2018) and a leading indicator of the Sustainable Development Goals (SDGs) for 2030 (Dewi et al., 2019). Nutrition therefore plays an important role in the growth period of toddlers. Adequate nutrition, health conditions, protection, and safety factors play an important role in child development, especially at an early age. Stunting in this period can affect the structure and function of the brain, where a reduced number of cells causes growth delays. A survey by the Indonesian Ministry of Health revealed that 16% of children under five years old have impaired fine and gross motor development, hearing loss, decreased intelligence, and speech delay, with a total of 0.4 million cases (Padatuan et al., 2021).
If a toddler is already stunted during the growth period, it will harm the child's future. The risk of obesity is very high because short children have a low ideal weight, so an increase in body weight alone can push the child's Body Mass Index (BMI) beyond the normal limit. This situation can persist for a long time, raising the risk of degenerative diseases. Children are considered stunted or severely stunted if their body length or height for age is below the WHO Multicentre Growth Reference Study (MGRS) median standard (Hendraswari et al., 2021).
According to the World Health Organization (WHO), an area has a stunting problem if its stunting percentage is above 20%. By this standard, Indonesia has a fairly high stunting problem at 31.8%, so the government needs to pay special attention to it in order to anticipate a rising prevalence of stunting in the future. One way to do so is to examine the factors that cause stunting in Indonesia (Indanah et al., 2022).
One of the Sustainable Development Goals (SDGs) in the health sector targets community nutrition (Singh et al., 2017): by 2030, ending all forms of malnutrition, including achieving the 2025 international targets for reducing stunting and wasting in children under five and supporting programs that reduce stunting globally. The government pays attention to stunting in children under the age of 2-3 years through the national and international nutrition movement, namely the Scaling Up Nutrition (SUN) movement, with a concentration on border areas (Yulianti et al., 2022).
The causes of stunting are multifactorial and include genetic, socio-demographic, economic, cultural, and environmental factors, as well as other health-related variables (Geberselassie et al., 2018). This is in line with research conducted in Bangladesh in 2018, which showed that parental education is a factor in stunting. In addition, household sanitation facilities also determine the incidence of stunting in Indonesia: unkempt toilets and untreated drinking water are associated with a three times greater chance of stunting (Torlesse et al., 2016).
Previous research has shown that faltering growth before birth and in the 18 months that follow is associated with poor language and motor development (Sudfeld et al., 2015). Stunted children aged 2, 5, and 9 years had lower verbal scores and an IQ 4.6 points lower than other children (Koshy et al., 2022). Some studies have also revealed that stunted children have lower scores in all aspects of development. A study in Kalasan showed that stunted children were 3.9 times more likely to have suspected developmental delay than children with normal growth (Nahar et al., 2020).

B. METHODS
The data used in this study are secondary data. The prevalence of stunting, the percentage of households that have access to proper sanitation, the percentage of toddlers who get IMD, and the percentage of pregnant women at risk of SEZ were obtained from the Study of Nutritional Status of Indonesia (SSGI) of the Ministry of Health of the Republic of Indonesia. The percentage of babies aged less than 6 months who get exclusive breastfeeding and the percentage of poor people were obtained from the Central Statistics Agency (BPS). The variables are shown in Table 1.

Prevalence of Stunting Toddlers (Y)
The condition of growth failure in children under five years old due to chronic malnutrition, so that the child is too short for their age.

Percentage of Babies Getting Exclusive Breastfeeding for 6 Months (X1)
Babies who only get breast milk from birth to 6 months of age in one working area during a certain period of time.

Percentage of Households That Have Access to Proper Sanitation (X2)
Intentional behavior in cultivating clean living, intended to prevent humans from coming into direct contact with dirt or other harmful waste materials, in the hope that this effort will maintain and improve human health.

Percentage of Toddlers Who Get Early Breastfeeding Initiation (IMD) (X3)
The start of a mother giving breast milk to her baby when the baby is born, namely in the first hours, or within 1 hour, after giving birth.

Percentage of Pregnant Women at Risk of SEZ (X5)
SEZ is a condition where the mother experiences malnutrition.

C. RESULT AND DISCUSSION
1. Descriptive Statistical Analysis
Descriptive statistics for the response variable and the predictor variables are shown in Table 2. Based on Table 2, stunting prevalence in Indonesia in 2021 had an average of 25.21%, with a lowest value of 10.90% and a highest value of 37.80%. The percentage of babies getting exclusive breastfeeding for 6 months averaged 68.88%, with a lowest value of 52.75% and a highest value of 81.46%. The percentage of households that have access to proper sanitation averaged 80.97%, with a lowest value of 40.81% and a highest value of 97.12%. The percentage of toddlers who get IMD averaged 49.23%, with a lowest value of 32.50% and a highest value of 62.70%. The percentage of poor people averaged 10.76%; the province with the lowest percentage is Bali at 4.53% and the province with the highest is Papua at 26.86%. The percentage of pregnant women at risk of SEZ averaged 12.1%, with a lowest value of 3.10% and a highest value of 40.70%.

Relationship Patterns Between Predictor Variables and Response Variables
The relationship between the response variable and the five predictor variables that are thought to have an effect can be seen in the scatterplots in Figure 1. Figure 1 shows that the relationship between the stunting percentage and each of the five predictors does not form a particular pattern: the points are scattered, and some lie far from the rest of the data. Because the shape of the relationship is unknown, the truncated spline nonparametric regression method can be used on these data.

Selection of Optimal Knot Points Using the GCV Method
The first step before modeling with truncated spline nonparametric regression is to determine the number of knot points. In this study, up to three knot points were tried, and the optimal one-knot, two-knot, and three-knot configurations were sought. The selection of the optimal knot points using the GCV method is as follows. The truncated spline nonparametric regression model for the 2021 stunting prevalence data in Indonesia with one knot point is as follows.
$$y_i = \beta_0 + \sum_{j=1}^{5} \left[ \beta_{j1} x_{ji} + \beta_{j2} (x_{ji} - K_{j1})_+ \right] + \varepsilon_i \qquad (1)$$
where $(u)_+ = \max(u, 0)$ is the truncated function and $K_{j1}$ is the knot point of predictor $x_j$.
After 50 knot-point trials to obtain the optimal knot point, the 5 smallest GCV values with one knot are shown in Table 3. Table 3 shows that the minimum GCV value for the multivariable nonparametric regression model with a linear truncated spline and a single knot point is 8.74, with optimal knot points of 71.50 for X1, 77.58 for X2, 52.22 for X3, 19.11 for X4, and 27.65 for X5. The truncated spline nonparametric regression model at the minimum GCV value with one knot can be written as equation (2).
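(2)
Next, knot points are selected using two knots. The truncated spline nonparametric regression model for the 2021 stunting prevalence data in Indonesia with two knots is as follows:
$$y_i = \beta_0 + \sum_{j=1}^{5} \left[ \beta_{j1} x_{ji} + \sum_{k=1}^{2} \beta_{j,k+1} (x_{ji} - K_{jk})_+ \right] + \varepsilon_i \qquad (3)$$
where $(u)_+ = \max(u, 0)$ and $K_{jk}$ is the $k$-th knot point of predictor $x_j$.
. 7, No. 3, July 2023, pp. 848-863
Fifty knot-point trials were conducted to obtain the optimal knot points. The 10 smallest GCV values with two knots are shown in Table 4. The truncated spline nonparametric regression model at the minimum GCV value with two knots can be written as equation (4).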
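(4)
The next step is to select knot points using three knots. The truncated spline nonparametric regression model for the 2021 stunting prevalence data in Indonesia with three knots is given in equation (5).
$$y_i = \beta_0 + \sum_{j=1}^{5} \left[ \beta_{j1} x_{ji} + \sum_{k=1}^{3} \beta_{j,k+1} (x_{ji} - K_{jk})_+ \right] + \varepsilon_i \qquad (5)$$
where $(u)_+ = \max(u, 0)$ and $K_{jk}$ is the $k$-th knot point of predictor $x_j$.
Fifty knot-point trials were conducted to obtain the optimal knot points. The 5 smallest GCV values with three knots are shown in Table 5.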
The results of optimal knot point selection using GCV with one, two, and three knot points can be seen in Table 6. Table 6 shows a minimum GCV value of 7.29 for the model with three knots. The parameter estimates of the truncated spline nonparametric regression model using the GCV method with three knots are as follows.
The truncated spline nonparametric regression model with three knots in equation (7) yielded an $R^2$ value of 94.07%, which suggests that the model can explain 94.07% of the variation in the 2021 stunting prevalence data.
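The GCV-based knot search described in this section can be sketched in a few lines. The source text does not show the authors' code or their exact GCV formula, so this is only an illustration under stated assumptions: it uses the standard definition $GCV(K) = MSE(K) / [n^{-1}\,\mathrm{trace}(I - A(K))]^2$ with $A(K)$ the hat matrix, it handles the one-knot case only, and the function names are hypothetical. It also assumes that each trial moves the knots of all five predictors jointly to the same relative position in their ranges, which is consistent with the reported one-knot values (each lies about 65% of the way between the minimum and maximum in Table 2).

```python
import numpy as np

def one_knot_design(X, knots):
    """Columns: intercept, each x_j, and the hinge (x_j - K_j)_+ (one knot per predictor)."""
    X = np.asarray(X, dtype=float)
    hinge = np.maximum(X - np.asarray(knots, dtype=float), 0.0)
    return np.column_stack([np.ones(len(X)), X, hinge])

def gcv(X, y, knots):
    """GCV(K) = MSE(K) / (trace(I - A(K)) / n)^2, where A(K) is the hat matrix."""
    y = np.asarray(y, dtype=float)
    Z = one_knot_design(X, knots)
    A = Z @ np.linalg.pinv(Z)            # hat matrix of the least-squares fit
    resid = y - A @ y
    n = len(y)
    mse = resid @ resid / n
    return mse / ((n - np.trace(A)) / n) ** 2

def select_one_knot(X, y, n_candidates=50):
    """Score jointly moving knot candidates and return the GCV-minimising set."""
    X = np.asarray(X, dtype=float)
    # Assumption: candidate i puts every predictor's knot at the same relative
    # position between that predictor's minimum and maximum.
    grid = np.linspace(X.min(axis=0), X.max(axis=0), n_candidates)
    candidates = grid[1:-1]              # endpoints give degenerate hinge columns
    scores = [gcv(X, y, k) for k in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]
```

Calling `select_one_knot(X, y)` with a 34 x 5 predictor array returns the five knot values and the corresponding GCV score. Extending the sketch to two or three knots only requires adding further hinge columns per predictor.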

Optimal Knot Point Using UBR Method
The first step before modeling with truncated spline nonparametric regression is to determine the number of knot points. In this study, up to three knot points were tried, and the optimal one-knot, two-knot, and three-knot configurations were sought. The selection of the optimal knot points using the UBR method is as follows. The truncated spline nonparametric regression model for the 2021 stunting prevalence data in Indonesia with one knot is given in equation (8).
$$y_i = \beta_0 + \sum_{j=1}^{5} \left[ \beta_{j1} x_{ji} + \beta_{j2} (x_{ji} - K_{j1})_+ \right] + \varepsilon_i \qquad (8)$$
where $(u)_+ = \max(u, 0)$ is the truncated function and $K_{j1}$ is the knot point of predictor $x_j$.
Next, knot points are selected using two knots. The truncated spline nonparametric regression model for the 2021 stunting prevalence data in Indonesia with two knots is as follows:
$$y_i = \beta_0 + \sum_{j=1}^{5} \left[ \beta_{j1} x_{ji} + \sum_{k=1}^{2} \beta_{j,k+1} (x_{ji} - K_{jk})_+ \right] + \varepsilon_i \qquad (10)$$
where $K_{jk}$ is the $k$-th knot point of predictor $x_j$.
Fifty knot-point trials were conducted to obtain the optimal knot points. The 5 smallest UBR values with two knots are shown in Table 8. Table 8 shows the minimum UBR value for two knots, with optimal knot points of 77.94 and 80.87 for X1, 90.22 and 95.97 for X2, 59.00 and 62.08 for X3, 24.12 and 26.40 for X4, and 39.09 and 39.93 for X5. The truncated spline nonparametric regression model at the minimum UBR value with two knots can be written as equation (11).
The next step is to select knot points using three knots. The truncated spline nonparametric regression model for the 2021 stunting prevalence data in Indonesia with three knots is given in equation (12).
$$y_i = \beta_0 + \sum_{j=1}^{5} \left[ \beta_{j1} x_{ji} + \sum_{k=1}^{3} \beta_{j,k+1} (x_{ji} - K_{jk})_+ \right] + \varepsilon_i \qquad (12)$$
where $(u)_+ = \max(u, 0)$ and $K_{jk}$ is the $k$-th knot point of predictor $x_j$.
Fifty knot-point trials were conducted to obtain the optimal knot points. The 5 smallest UBR values with three knots are shown in Table 9.
The results of optimal knot point selection using UBR with one, two, and three knot points can be seen in Table 10. Table 10 shows that the minimum UBR value occurs in the model with two knots. The parameter estimates of the truncated spline nonparametric regression model using the UBR method with two knots are as follows.
The truncated spline nonparametric regression model with two knots in equation (14) yielded an $R^2$ value of 0.8467, which suggests that the model can explain 84.67% of the variation in the 2021 stunting prevalence data.
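The UBR criterion used in this section can be sketched in the same way. The paper's own UBR formula is not recoverable from the text, so the sketch assumes the commonly cited form $UBR(K) = n^{-1}\|(I - A)y\|^2 - n^{-1}\sigma^2\,\mathrm{tr}[(I - A)^2] + n^{-1}\sigma^2\,\mathrm{tr}(A^2)$, where $A$ is the hat matrix. The error variance `sigma2` must be supplied by the user, and how the authors estimated it is not stated. The function names and the `knot_sets` argument are hypothetical.

```python
import numpy as np

def hinge_design(X, knot_sets):
    """Intercept, linear terms, and one (x_j - K)_+ column per knot in knot_sets[j]."""
    X = np.asarray(X, dtype=float)
    cols = [np.ones(len(X))]
    for j, knots in enumerate(knot_sets):
        cols.append(X[:, j])
        cols.extend(np.maximum(X[:, j] - k, 0.0) for k in knots)
    return np.column_stack(cols)

def ubr(X, y, knot_sets, sigma2):
    """UBR(K) = [ ||(I-A)y||^2 - sigma2*tr((I-A)^2) + sigma2*tr(A^2) ] / n."""
    y = np.asarray(y, dtype=float)
    Z = hinge_design(X, knot_sets)
    A = Z @ np.linalg.pinv(Z)            # hat matrix of the least-squares fit
    n = len(y)
    R = np.eye(n) - A                    # residual-forming matrix I - A
    resid = R @ y
    # sigma2 is an assumed, user-supplied estimate of the error variance.
    return (resid @ resid - sigma2 * np.trace(R @ R) + sigma2 * np.trace(A @ A)) / n
```

For a least-squares fit with $p$ independent columns, this reduces to $RSS/n + \sigma^2 (2p - n)/n$, so the criterion trades residual error against the number of knots. The candidate knot sets would be scored with `ubr` and the smallest value kept, as in the GCV search.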

Comparison of GCV Method and UBR Method in selection of Optimal Knot Point
Table 11 compares the GCV and UBR methods for selecting optimal knot points in the 2021 stunting prevalence data in Indonesia. Table 11 shows that the MSE of 1.82 obtained from truncated spline nonparametric regression with three knots using the GCV method is better than the MSE of 4.87 from the UBR method. This is reinforced by the $R^2$ values: 94.07% for the GCV method versus 84.16% for the UBR method. It can therefore be concluded that the GCV method with three knots is more appropriate than the UBR method for selecting optimal knot points in Indonesia's 2021 stunting prevalence data.
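The two comparison criteria can be computed from observed and fitted values as in the minimal sketch below. It assumes MSE is the residual sum of squares divided by $n$ (not by the residual degrees of freedom); the text does not state the divisor, but this choice is consistent with the reported pairs, since $1.82 / (1 - 0.9407)$ and $4.87 / (1 - 0.8416)$ both give about 30.7 for the total variance. The function name is hypothetical.

```python
import numpy as np

def mse_and_r2(y, y_hat):
    """Return (MSE, R^2) with MSE = SSE / n and R^2 = 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)       # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return sse / len(y), 1.0 - sse / sst
```

A smaller MSE and a larger $R^2$ both indicate a closer fit, which is the basis for preferring the GCV-selected model here.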

Simultaneous testing of parameter significance
Simultaneous testing is performed to test whether the parameters in the regression model jointly have a significant effect. The hypotheses are as follows, where $\beta_{jk}$ denotes the 20 non-intercept coefficients of the three-knot model.
$H_0: \beta_{11} = \beta_{12} = \dots = \beta_{54} = 0$
$H_1$: there is at least one $\beta_{jk} \neq 0$, $j = 1, \dots, 5$, $k = 1, \dots, 4$
The ANOVA results can be seen in Table 12. Based on Table 12, the calculated F value of 7.92 exceeds the critical value $F_{(0.05;\,20,\,13)} = 2.46$, so $H_0$ is rejected. It can be concluded that at least one parameter has a significant effect on the prevalence of stunting in Indonesia in 2021.
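The F statistic for this test can be sketched as follows. The function name is hypothetical, and the critical value is simply the 2.46 quoted in the text and is not computed here. With 21 parameters, the degrees of freedom (20, 13) imply $n = 34$ observations.

```python
import numpy as np

def overall_f(y, y_hat, n_params):
    """Return (F, df1, df2) for H0: all non-intercept coefficients are zero."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    df1, df2 = n_params - 1, n - n_params
    ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
    sse = np.sum((y - y_hat) ** 2)          # residual sum of squares
    return (ssr / df1) / (sse / df2), df1, df2

F_CRITICAL = 2.46  # F(0.05; 20, 13) as quoted in the text, not computed here

def reject_h0(f_stat, critical=F_CRITICAL):
    """Reject H0 when the calculated F exceeds the critical value."""
    return f_stat > critical
```

For the reported value, `reject_h0(7.92)` is true, in line with the decision stated above.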

Partial Parameter Significance Testing
Partial testing was performed to determine which parameters have a significant effect in the regression model. The partial test results can be seen in Table 13. The truncated spline nonparametric regression model has 21 parameters; at a confidence level of 95% (a significance level of 5%), 9 parameters are significant and 12 are not. A parameter is declared significant if its p-value is less than the significance level. Partially, the percentage of toddlers who get IMD (x3), the percentage of poor people (x4), and the percentage of pregnant women at risk of SEZ (x5) affect the prevalence of stunting (y), while the percentage of infants receiving exclusive breastfeeding for 6 months (x1) and the percentage of households that have proper sanitation (x2) have no effect on the prevalence of stunting (y) in Indonesia in 2021.
Tutik Handayani, Nonparametric Spline Truncated Regression with... 861
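The partial test for each parameter can be sketched with ordinary least-squares t statistics, $t_j = \hat{\beta}_j / se(\hat{\beta}_j)$. This is an illustration only: the text reports p-values, whereas the sketch compares $|t_j|$ with a critical value, which gives the same decision at the 5% level. The function names are hypothetical, the design matrix `Z` is assumed to have full column rank, and 2.160 is the standard two-sided 5% table value for 13 degrees of freedom.

```python
import numpy as np

def partial_t(Z, y):
    """Return (beta_hat, t statistics) for a least-squares fit of y on Z."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = Z.shape
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(Z.T @ Z)     # assumes Z has full column rank
    return beta, beta / np.sqrt(np.diag(cov))

T_CRITICAL = 2.160  # t(0.025; 13) from standard tables, for n - p = 34 - 21

def significant(t_stats, critical=T_CRITICAL):
    """Flag parameters whose |t| exceeds the critical value."""
    return np.abs(np.asarray(t_stats, dtype=float)) > critical
```

Applied to the 21-column design matrix of the three-knot model, `significant` would return one flag per parameter, corresponding to the significant and insignificant parameters summarised in Table 13.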

D. CONCLUSION AND SUGGESTIONS
The best multivariable nonparametric regression model for stunting prevalence in Indonesia in 2021 was obtained from a linear truncated spline approach using the GCV method with three knots, which gave an MSE of 1.82 and an $R^2$ of 94.07%. It can therefore be concluded that the GCV method with three knots is more appropriate for optimal knot point selection than the UBR method in the case of stunting prevalence in Indonesia in 2021. Based on the partial parameter significance tests on the best model, the factors that influence stunting prevalence in Indonesia in 2021 are the percentage of babies getting exclusive breastfeeding for 6 months (x1), the percentage of households that have proper sanitation (x2), the percentage of poor people (x4), and the percentage of pregnant women at risk of SEZ (x5).