Identifying Significant Factors Affecting the Human Development Index in East Java Using Ordinal Logistic Regression Model

ABSTRACT


A. INTRODUCTION
The progress of a region can be seen from various indicators, one of which is development. Development is a planned process to improve all aspects of people's lives in order to achieve national goals. In other words, development is an economic and socio-cultural transformation directed toward an expected purpose. Currently, the development paradigm is based on economic growth, human resources, and welfare (Maulana, 2013).
Development based on human resources is a paradigm that makes humans the focus of all development activities. It aims to improve the quality of human resources in terms of intellect, spirituality, health, morality, and economic welfare. Quality human development will support sustainable economic growth (Farida et al., 2021). One indicator that measures the success of human development is the Human Development Index (HDI). The HDI was initiated by the United Nations Development Programme (UNDP) in 1990 and became a core indicator of the human development paradigm alongside GDP and GDP per capita (Biggeri & Mauro, 2018). In addition, it plays a role in improving aspects of people's lives, as seen in inequality, empowerment, productivity, and sustainability (Setiawan & Hakim, 2008). HDI data describe how people can access the results of economic development in obtaining income, education, and health (Muhajirah et al., 2019). Based on its HDI figure, a country can be classified by level of development as developed or backward (Wang et al., 2018).
UNDP compiles the HDI from a combination of health, education, and income indicators, which is accepted as an alternative way of assessing a country's progress (Ouedraogo, 2013). The health indicator is Life Expectancy (LE), the education indicators are the Average School Length (ASL) and School Expectations (SE), and the income indicator is Per Capita Expenditure (Farida et al., 2021; Setiawan & Hakim, 2008). These three indicators have a significant effect on the rise and fall of the HDI.
The East Java HDI increased significantly from 2014 to 2020. The HDI in 2020 grew by 0.30% compared to 2019, influenced by an increase in Life Expectancy (LE) of 0.17%, Old School Expectations (OSE) of 0.23%, and Average School Length (ASL) of 2.50% (Nursiyono, 2020). In addition, the increase in HDI in East Java is driven by several regions with high HDI values, one of which is the city of Surabaya with 82.23 (Yhoga, 2021). The value of the East Java HDI compared to the HDI of other provinces on Java Island is shown in Figure 1. Based on Figure 1, the East Java HDI increased every year, but it occupies the lowest position among these provinces (Dwi, Novita, Muchtolifah, 2019). East Java is the second-most populous province in Indonesia and has a high-performance intensity and good economic aspects, yet its human development index occupies the 17th position out of 33 areas in Indonesia (Giap & Amri, 2016). The abundance of natural and human resources in East Java and its relatively good economy should be able to bring the East Java HDI into the top 10. However, this has not been achieved due to several factors, including an imbalance in average school length, with easy access to education concentrated in certain cities in East Java (Nugraheni, 2021). Another factor is inadequate nutrition among pregnant women, which leads to the birth of low-weight babies who are susceptible to disease (Nur Wulan, Natria, 2020). The poverty rate in East Java increased by 0.37% in September 2020, and due to the Covid-19 pandemic, per capita spending decreased by 1.18% compared to the previous year (Nursiyono, 2020; Yhoga, 2021). Given the inequality in the achievements of the East Java HDI, the government needs to review the factors that affect HDI in the economic, education, and health sectors.
The review is carried out in all regions of East Java so that no area is left behind in improving human development. This is an anticipatory step for the government in reviewing the sectors that are predicted to have a significant impact on the increase in HDI.
In previous research related to the HDI, Yakunina & Bychkov (2015) analyzed cross-country human development index components using multiple linear regression. The research obtained an $R^2$ of 0.98166 and stated that the main features affecting the human development index are the innovation index, the information communication technology development index, the life expectancy index, and gross national income. Setiawan & Hakim (2008) analyzed Indonesia's HDI using the Error Correction Model and found that gross domestic product and income tax affect HDI in the short and long term. Handalani (2018) analyzed the human development index in ASEAN countries using panel data regression and stated that per capita income growth and population affect HDI, with a coefficient of determination of 98.64%. Melliana & Zain (2013) analyzed factors affecting the HDI of regencies/cities in East Java Province using panel regression. This research concluded that the HDI is influenced by the number of health facilities, the percentage of households with access to clean water, school participation rates, labor force participation, and GRDP per capita. The study obtained an $R^2$ value of 96.67 percent.
From these previous studies, several variables were indicated to affect HDI: Gross Regional Domestic Product (GRDP), high school participation rate, number of health facilities and population density, labor force participation, and open unemployment rate. Some independent variables that affect HDI can be presented in categorical form. Similarly, HDI can be expressed as low, medium, high, and very high. The relationship between factors expressed as categorical variables can therefore be analyzed with ordinal logistic regression. Compared with multiple regression, ordinal logistic regression has the advantage that the categories do not need to be assumed to be equally spaced (Stewart et al., 2019).
Several studies have used ordinal logistic regression. Ataman & Sarıyer (2021) identified factors affecting care and waiting times in emergency departments. The study produced two models: a model for predicting waiting times with an accuracy of 52.247%, and a treatment time model with an accuracy of 66.365%. Setyawati et al. (2020) analyzed factors that affect student GPA and found two influential factors: the major taken in high school and the student's region of origin. In addition, Imaslihkah et al. (2013) analyzed factors that affect the graduation predicate of undergraduate students at ITS Surabaya. The influential factors were the selected faculty, gender, parental occupation and income, and college entrance path, and the resulting model had a classification accuracy of 77.41%.
This study proposes an ordinal logistic regression method to identify the significant factors affecting the East Java HDI. The value of HDI and the factors that affect it are expressed in categorical form. This research contributes by implementing an econometric model in a case study of HDI values. It provides information that helps decision-makers (related parties) formulate strategies for increasing the value of HDI.

B. METHODS
1. Data Collection
This study uses secondary data for 2020 at the regency/city level, obtained from the Central Statistics Agency of East Java Province (https://jatim.bps.go.id/). The dependent variable (y) is the Human Development Index (HDI) of East Java Province, and there are seven independent variables (x).
The value of HDI (y) is classified into four categories: $y < 60$ is low, $60 \le y < 70$ is medium, $70 \le y < 80$ is high, and $y \ge 80$ is very high (Nursiyono, 2020). The infant mortality rate ($x_3$) is also classified into four categories: $x_3 < 20$ is low, $20 \le x_3 \le 39$ is medium, $40 \le x_3 \le 70$ is high, and $x_3 > 70$ is very high. The variables, together with their categories and measurement scales, are presented in detail in Table 1.
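The categorization rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the boundary value 80 is assumed to belong to the very high HDI category, and infant mortality values between 39 and 40 are assumed to fall in the medium category.

```python
def hdi_category(y: float) -> str:
    """Map an HDI value to its ordinal category (cut-offs at 60, 70, 80)."""
    if y < 60:
        return "low"
    if y < 70:
        return "medium"
    if y < 80:
        return "high"
    return "very high"  # assumption: a value of exactly 80 counts as very high


def infant_mortality_category(x3: float) -> str:
    """Map an infant mortality rate to its ordinal category."""
    if x3 < 20:
        return "low"
    if x3 < 40:  # assumption: values between 39 and 40 are treated as medium
        return "medium"
    if x3 <= 70:
        return "high"
    return "very high"


# Surabaya's HDI of 82.23, quoted in the introduction, is classified as very high.
print(hdi_category(82.23))
```

Encoding the categories as ordered labels like this is what allows the response to be treated as ordinal rather than nominal.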

Ordinal Logistic Regression
Ordinal logistic regression, commonly called the cumulative logit model, is a development of binary logistic regression. The cumulative logit respects the ordinal nature of the response categories of the dependent variable (y) by working with cumulative odds. Mathematically, the cumulative logit is defined as follows (Imaslihkah et al., 2013):
$$\operatorname{logit}[P(Y \le j \mid x)] = \ln\left(\frac{P(Y \le j \mid x)}{P(Y > j \mid x)}\right) = \beta_{0j} + \beta^{T}x$$
The result of ordinal logistic regression is interpreted through the odds ratio, which is based on the cumulative odds $P(Y \le j \mid x)/P(Y > j \mid x)$ and makes it easier to characterize the influence of factors in a particular category (Wu et al., 2019), (Scott et al., 1991). The cumulative odds can be defined as follows:
$$\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \exp\left(\beta_{0j} + \beta^{T}x\right)$$
where $j$ is the response category, ranging over $1, 2, \dots, k-1$. The probabilities of a dependent variable with $k$ categories are symbolized by $\pi_1, \pi_2, \dots, \pi_k$. Meanwhile, $\beta_{01}, \beta_{02}, \dots, \beta_{0,k-1}$ represent the intercepts of each regression model, and $\beta$ and $x$ are the coefficient parameter vector and the independent variable vector, respectively.
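To make the cumulative logit definition concrete, the following Python sketch converts a set of intercepts and coefficients into category probabilities. It assumes the convention logit P(Y <= j | x) = b0j + b'x; the function name and all numeric values are hypothetical and are not estimates from this study.

```python
import math


def category_probabilities(intercepts, coefs, x):
    """Return [P(Y=1), ..., P(Y=k)] for a cumulative logit model.

    Assumes logit P(Y <= j | x) = intercepts[j-1] + sum(coefs * x),
    with intercepts given in increasing order (k - 1 values).
    """
    eta = sum(b * v for b, v in zip(coefs, x))
    # Cumulative probabilities P(Y <= j) for j = 1..k-1, then P(Y <= k) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(b0 + eta))) for b0 in intercepts]
    cumulative.append(1.0)
    # Category probabilities are differences of successive cumulative values.
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical model with k = 4 categories and two independent variables.
probs = category_probabilities([-1.0, 0.5, 2.0], [0.3, -0.2], [1.0, 2.0])
print([round(p, 3) for p in probs])
```

The differencing step is the reason the intercepts must be ordered: otherwise some category probabilities would be negative.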

Ordinal Logistic Regression Parameter Testing
a. Overall Test
The overall test used in this study is the likelihood ratio test, whose statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of parameters tested (Komunjer & Zhu, 2020). The hypotheses are:
$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$ (the independent variables do not affect the model)
$H_1$: at least one $\beta_i \ne 0$, where $i = 1, 2, \dots, p$
The test statistic is:
$$G^2 = -2\ln\left(\frac{L_0}{L_1}\right)$$
where $L_0$ is the likelihood of the model without independent variables and $L_1$ is the likelihood of the model with the independent variables.
b. Partial Test
Partial (individual) testing uses the Wald test, with the following hypotheses:
$H_0: \beta_i = 0$ (the independent variable is not significant to the dependent variable)
$H_1: \beta_i \ne 0$ (the independent variable is significant to the dependent variable)
The test statistic is:
$$W = \left(\frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}\right)^2$$
where $\hat{\beta}_i$ is the estimate of the regression parameter and $SE(\hat{\beta}_i)$ is its standard error (Haloho et al., 2013). $H_0$ is rejected if $W > \chi^2_{(\alpha,1)}$ or, in other words, if p-value $\le \alpha$, so that $H_1$ is accepted.
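The Wald decision rule can be illustrated with a short Python sketch using only the standard library. It relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so the p-value is erfc(sqrt(W/2)). The function name and the input values are hypothetical and do not come from the tables of this study.

```python
import math


def wald_test(beta_hat: float, std_err: float, alpha: float = 0.05):
    """Return (W, p_value, reject_H0) for the Wald test of H0: beta = 0."""
    w = (beta_hat / std_err) ** 2
    # Upper-tail probability of a chi-squared(1) variable at w.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value, p_value <= alpha


# Hypothetical estimate and standard error, for illustration only.
w, p_value, reject = wald_test(beta_hat=1.0, std_err=0.5)
print(f"W = {w:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```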
c. Goodness of Fit
The hypotheses for testing the suitability of the ordinal logistic regression model are:
$H_0$: there is no difference between the observed results and the predicted results (the model fits)
$H_1$: there is a difference between the observed results and the predicted results (the model does not fit)
Goodness of fit is assessed with the deviance statistic:
$$D = -2\sum_{i=1}^{n}\sum_{j=1}^{k} y_{ij}\ln\left(\frac{\hat{\pi}_{ij}}{y_{ij}}\right)$$
where $\hat{\pi}_{ij}$ = probability of the $i$-th observation in the $j$-th category, $y_{ij}$ = observed indicator that the $i$-th observation falls in the $j$-th category, $df = J - (p + 1)$, $J$ = number of covariate patterns, and $p$ = total parameters in the model. $H_0$ is rejected when $D > \chi^2_{(\alpha; df)}$ or, more simply, when the p-value is less than $\alpha$.

d. Classification Accuracy
The ordinal logistic regression model produces an HDI classification based on the variables that influence it. The quality of the model is determined from its classification accuracy using the APER (Apparent Error Rate). The smaller the APER, the better the classification accuracy (Silvestre & Ling, 2014). The APER is determined as follows:
$$APER = \frac{n_{1M} + n_{2M}}{n_1 + n_2} \times 100\%$$
where $n_{1M}, n_{2M}$ = number of misclassified observations in the prediction and $n_1, n_2$ = number of observations. The classification accuracy is then calculated as $1 - APER$.

C. RESULT AND DISCUSSION
HDI data for 38 regencies/cities in East Java are used as observations in this study. The grouping of regencies/cities in East Java is shown in Figure 2. Based on Figure 2, the HDI values of cities/regencies in East Java fall into the medium, high, and very high categories; no city/regency is in the low category. Fourteen regencies/cities are in the medium category, twenty are in the high category, and four are in the very high category. Next, the parameters of the ordinal logistic regression model are estimated for the dependent and independent variables. The estimated parameters are shown in Table 2. Based on Table 2, the estimated parameters include two constants for the dependent variable, resulting in two cumulative logit models as follows:
$$\operatorname{logit}[P(Y \le 2 \mid x)] = 58.717 - 4.709x_1 + 0.380x_2 + 0.026x_3 - 0.004x_4 + \dots$$
The above models are initial models whose parameters have not yet been tested. To obtain an ordinal logistic regression model that contains only influential independent variables, the parameters must be tested both overall and partially. The following parameter tests determine the effect of the independent variables on the dependent variable, the fit of the model, and the quality of the resulting model.
a. Parameter Overall Test
The overall test determines whether any independent variable affects the model. The results of the overall parameter test are shown in Table 3. Based on Table 3, the $G^2$ value is 15.772. The hypothesis $H_0$ is rejected because p-value $< \alpha$, so it can be concluded that one or more independent variables affect the model.

b. Parameter Partial Test
Partial testing of parameters determines whether each independent variable is significant in the ordinal logistic regression model. The results of the partial tests are shown in Table 4 and Table 5. Based on Table 4 and Table 5, the hypothesis $H_0$ is rejected for $x_2$ and $x_4$, the high school participation rate and health facilities, because their p-values are below $\alpha$: 0.012 for $x_2$ and 0.044 for $x_4$. After these two variables were selected, the Wald test was repeated in a second stage, and the p-values were again smaller than $\alpha$. It can therefore be concluded that the high school participation rate and health facilities partially affect the value of HDI.

c. Goodness of Fit
After the parameters have been tested partially and overall, the fit of the ordinal logistic regression model is tested. The output is shown in Table 6. Based on the model fit results in Table 6, a deviance value of 15.772 with a p-value of 1 was obtained. $H_0$ is therefore accepted because p-value $> \alpha$, where $\alpha = 0.05$. It can be concluded that there is no difference between the observations and the prediction results; in other words, the model fits.

d. Ordinal Logistic Regression Final Model
After the overall and partial parameter tests, the final ordinal logistic regression model is obtained as follows:
$$\operatorname{logit}[P(Y \le 2 \mid x)] = 58.717 + 0.380x_2 + 0.004x_4$$
$$\operatorname{logit}[P(Y \le 3 \mid x)] = 83.702 + 0.380x_2 + 0.004x_4$$
The final model consists of two equations whose constants and coefficients of $x_2$ and $x_4$ are positive, so higher values of $x_2$ and $x_4$ tend to increase HDI.
The ordinal logistic regression model is interpreted through the odds ratio, which indicates whether the factors studied are related to HDI. The odds ratio of variable $x_2$ is 1.462: the high school participation rate is estimated to have a 1.462 times higher chance of increasing HDI in the medium and high categories than in the very high category. The odds ratio of variable $x_4$ is 1.004: health facilities are estimated to have a 1.004 times higher chance of increasing HDI in the medium and high categories than in the very high category. In other words, when the high school participation rate and the number of health facilities rise, the HDI of regencies/cities in the medium and high categories tends to increase. Such regencies/cities then tend to move from the medium and high categories to the very high category, which significantly increases HDI in East Java.
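The odds ratios quoted above are the exponentials of the corresponding coefficients in the final model, which can be checked with a few lines of Python. This is only a verification sketch; the dictionary keys are illustrative labels for the two significant variables.

```python
import math

# Coefficients of the two significant variables in the final model.
coefficients = {"x2": 0.380, "x4": 0.004}

# The odds ratio for a one-unit increase in a variable is exp(coefficient).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, the values agree with the reported odds ratios of 1.462 and 1.004.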

e. Classification Accuracy
Classification accuracy for the low, medium, high, and very high HDI categories can be assessed using the Apparent Error Rate (APER). The classification results by HDI category are shown in Table 7. Based on Table 7, no observation falls into category 1. Of the 14 observations in category 2, 12 are predicted correctly and 2 are predicted as category 3. Of the 20 observations in category 3, 19 are predicted correctly and 1 is predicted as category 2. All 4 observations in category 4 are predicted correctly. The APER can therefore be calculated from the results in Table 7 as follows:
$$APER = \frac{2 + 1}{38} \times 100\% = 7.89\%$$
so the classification accuracy is:
$$\text{Classification accuracy} = 1 - 7.89\% = 92.11\%$$
With a classification accuracy of more than 90%, ordinal logistic regression can be considered a very suitable method for analyzing the factors that affect HDI in East Java. This study uses seven independent variables that are thought to affect HDI, but based on the ordinal logistic regression model, two factors significantly affect the value of HDI in East Java: high school participation rates and health facilities.
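The APER and accuracy figures can be reproduced from the classification counts described for Table 7. The Python sketch below encodes those counts as a nested dictionary (rows are actual categories, columns are predicted categories); the data structure and variable names are illustrative choices, not part of the original analysis.

```python
# Classification counts as described in the text: actual -> predicted -> count.
confusion = {
    "medium":    {"medium": 12, "high": 2,  "very high": 0},
    "high":      {"medium": 1,  "high": 19, "very high": 0},
    "very high": {"medium": 0,  "high": 0,  "very high": 4},
}

total = sum(sum(row.values()) for row in confusion.values())
misclassified = sum(
    count
    for actual, row in confusion.items()
    for predicted, count in row.items()
    if predicted != actual
)

aper = misclassified / total
accuracy = 1 - aper
print(f"APER = {aper:.2%}, accuracy = {accuracy:.2%}")
```

With 3 misclassified regencies/cities out of 38, this gives an APER of 7.89% and an accuracy of 92.11%.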
Another study modeled HDI in the provinces of East Java, West Java, and Central Java using ordinal logistic regression (Pramesti & Indrasetianingsih, 2019). It used independent variables including GDP, the percentage of households with access to decent drinking water sources, the percentage of poor people, the labor force participation rate, and the open unemployment rate. In that study, the factor affecting HDI in East Java was the open unemployment rate, with a classification accuracy of 76.31%, while the factors affecting HDI across the three provinces were the poverty rate, GDP, and decent drinking water sources, with a classification accuracy of 73%.
In addition, Azizah & Pramoedyo (2019) examined the effectiveness of the Ordinary Least Square and Geographically Weighted Regression models for HDI in East Java Province, using the variables gross participation in public high schools, infant mortality rate, average expenditure per capita, and the number of public health centers. The study found that the infant mortality rate, average per capita expenditure, and public high school participation rate affected HDI, with an $R^2$ of 78.86%.

D. CONCLUSION AND SUGGESTIONS
Based on the analysis of HDI factors in East Java using ordinal logistic regression, two models were obtained, for the medium and high HDI categories, with the very high HDI category serving as the comparison. The ordinal logistic regression model shows that two factors significantly affect the value of HDI in East Java: high school participation rates and health facilities. The resulting model has an excellent classification accuracy of 92.11%, corresponding to an error rate of only 7.89%, so similar research on other case studies with categorical variables can use ordinal logistic regression. In addition, future studies can add new variables that may affect the value of HDI.