Comparison of Spatial Weight Matrices in Spatial Autoregressive Model: Case Study of Intangible Cultural Heritage in Indonesia

ABSTRACT

Intangible Cultural Heritage (ICH) can contribute effectively to the Sustainable Development Goals (SDGs) in their economic, social, and environmental dimensions, as well as to peace and security. Studying ICH in Indonesia cannot be separated from the spatial aspect, that is, how the attributes of one area relate to those of nearby areas. Spatial regression modeling requires choosing a spatial weight matrix, and a poorly chosen matrix inflates the standard errors of the parameter estimates. This study therefore aims to determine (i) the spatial weight matrix that best accommodates spatial autocorrelation in describing the spread of ICH in Indonesia and (ii) the variables thought to influence the number of ICH determinations in Indonesia. The spatial regression model used is the Spatial Autoregressive (SAR) model, and the spatial weight matrices compared are queen contiguity and inverse distance. The best model is the SAR model with the queen contiguity spatial weight matrix because it has the smallest AIC, BIC, RMSE, and MAPE values: 310.397, 319.555, 18.857, and 57.169, respectively. Simultaneously, involvement in performing arts, wearing traditional dress, knowing Indonesian folklore, and the spatial lag contribute significantly to the number of ICH determinations in Indonesia. Partially, only knowing Indonesian folklore has a significant effect on the number of ICH determinations at the significance level α = 5%. Each additional 1% of the population knowing Indonesian folklore in an area increases the number of ICH determinations in that area by 0.6719 units.

presents geographic location information or an accurate picture of an area on the earth's surface (Cressie, 1993). Unlike Meng's research, this study uses spatial analysis and focuses on cultural product factors rather than social, economic, or natural factors.
Spatial analysis examines data that refer to positions, objects, and the relationships among them in geographic space. Indications that the attributes of adjacent regions are linked form the basis for spatial analysis through the spatial regression model, a model that can describe relationships among variables with spatial linkages (Anselin, 1990).
In addition, unlike Meng's research, which used the Geodetector method, this study uses spatial regression. Spatial regression models must account for the relationship between locations, which is described by a weight matrix. A contiguity matrix describes the neighbouring relationship between regions (Suryowati et al., 2018). Another approach to constructing the weight matrix is to use a distance band. Anselin and Rey (2014) explain that a simple distance-based weight matrix treats i and j as neighbors if j falls within a predetermined critical distance (distance band) from i.
In examining spatial weights, Arlinghaus et al. (1996) state that the weight matrix must be selected specifically and arranged appropriately. The specification of the weight matrix represents information on the scope and intensity of the spatial effects of a location unit in the geographic system (Trisilia, 2014). In several studies, the choice of weight matrix in spatial regression models is still diverse, made by trial and error, or based on habit. Using the wrong weights increases the standard error of the parameter estimates, while using the correct weights minimizes it. This study therefore aims to determine (i) the best weight matrix to accommodate spatial autocorrelation in analyzing the spread of Intangible Cultural Heritage in Indonesia, and (ii) the variables thought to influence the number of Intangible Cultural Heritage determinations in Indonesia.
Research on Intangible Cultural Heritage using spatial regression analysis is still scarce, even though such studies are critical to maintaining and protecting cultural heritage. This motivates further study, especially with spatial regression models. The Spatial Autoregressive (SAR) model is a spatial regression model that is fairly effective for estimating data with a spatial dependency effect.

B. METHODS
The data used in this study are secondary data sourced from the Central Bureau of Statistics (BPS) and the Ministry of Education and Culture. The data obtained from the Ministry's cultural heritage page (kemdikbud.go.id) cover the determinations of intangible cultural heritage in Indonesia up to February 2022, a total of 1,511 intangible cultural heritages. The data obtained from BPS are socio-cultural statistics for 2018. Details are given in Table 1. The units of observation and analysis are the 34 provinces of Indonesia.

The method of analysis is the Spatial Autoregressive (SAR) model, which combines a simple regression model with a spatial lag using cross-section data (Lesage, 1999). The general form of the SAR model is

$$ y = \rho W y + X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \qquad (1) $$

where $y$ denotes the response variable, $X$ the predictor variables, $\rho$ the spatial autocorrelation coefficient on the response variable, $W$ the spatial weight matrix, $\beta$ the intercept and regression coefficients, and $\varepsilon$ the error.
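To make equation (1) concrete, the following minimal Python sketch generates data from a SAR process. It is an illustration under stated assumptions, not the study's procedure: the five-region chain, the coefficient values, and the function names `row_standardize` and `simulate_sar` are all hypothetical. The reduced form $y = (I - \rho W)^{-1}(X\beta + \varepsilon)$ is evaluated by fixed-point iteration, which converges for a row-standardized $W$ when $|\rho| < 1$.

```python
import random

def row_standardize(neighbors):
    """Build a row-standardized weight matrix from a list of neighbour lists."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

def simulate_sar(W, X, beta, rho, eps, tol=1e-10, max_iter=10_000):
    """Solve y = rho*W*y + X*beta + eps by fixed-point iteration (needs |rho| < 1)."""
    n = len(W)
    b = [sum(X[i][k] * beta[k] for k in range(len(beta))) + eps[i] for i in range(n)]
    y = b[:]
    for _ in range(max_iter):
        y_new = [rho * sum(W[i][j] * y[j] for j in range(n)) + b[i] for i in range(n)]
        if max(abs(a - c) for a, c in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical chain of five regions: 0-1-2-3-4 (not the Indonesian provinces)
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
W = row_standardize(neighbors)

rng = random.Random(42)
X = [[1.0, rng.uniform(0, 100)] for _ in range(5)]  # intercept + one predictor
eps = [rng.gauss(0, 1) for _ in range(5)]            # errors with sigma = 1
beta, rho = [2.0, 0.5], 0.4                          # assumed parameter values

y = simulate_sar(W, X, beta, rho, eps)
```

Because each $y_i$ depends on its neighbours' values through $\rho W y$, a shock in one region propagates to the others, which is the feature that distinguishes the SAR model from ordinary regression.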
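In a spatial autoregressive model, the response variable is spatially correlated. The parameters are estimated by the Maximum Likelihood method (Lesage, 2004), which gives

$$ \hat{\beta} = (X^{\top}X)^{-1}X^{\top}(I - \rho W)\,y \qquad (2) $$

where $\hat{\beta}$ is the estimator of the regression parameters given the spatial autocorrelation coefficient $\rho$ and the weight matrix $W$. This form cannot be evaluated directly because the value of $\rho$ is unknown, so the concentrated log-likelihood function is used (Anselin, 2001):

$$ \ln L(\rho) = C - \frac{n}{2}\ln\left[\frac{(e_0 - \rho e_L)^{\top}(e_0 - \rho e_L)}{n}\right] + \ln\lvert I - \rho W\rvert \qquad (3) $$

Here $e_0$ and $e_L$ are the OLS residuals from regressing $y$ and $Wy$ on $X$, respectively, and $n$ is the number of observations.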
where $C$ is a constant. Equation (3) is a nonlinear function of the single parameter $\rho$ and is maximized numerically by direct search. Data processing was carried out in R version 4.1.2, and thematic maps were created with QGIS Desktop 3.16.15. The data analysis proceeded as follows:

1. Explore the data using thematic maps of all provinces in Indonesia to identify patterns in the data distribution.

2. Test the classical assumptions of multiple linear regression, as follows.

a. Normality Test
The normality test was carried out using the Shapiro-Wilk test. This test identifies whether a random variable is normally distributed, is used here to check the normality assumption on the errors, and is suitable for small samples (Shapiro & Wilk, 1965). The null hypothesis is that the errors are normally distributed, against the alternative that they are not. The test statistic is

$$ T_3 = \frac{1}{D}\left[\sum_{i=1}^{k} a_i\,(x_{n-i+1} - x_i)\right]^2, \qquad D = \sum_{i=1}^{n}(x_i - \bar{x})^2 $$

where $a_i$ denotes the Shapiro-Wilk test coefficient, $x_{n-i+1}$ the $(n-i+1)$-th ordered observation, $x_i$ the $i$-th ordered observation, $\bar{x}$ the mean of the data, and $k = \lfloor n/2 \rfloor$. The $T_3$ value is compared with the Shapiro-Wilk table value to locate the probability value (p), and $H_0$ is rejected when p is smaller than the significance level α.

b. Homoscedasticity Test
This test was carried out using the Breusch-Pagan (BP) test, which checks whether the error variance differs between provinces. The test is affected by spatial autocorrelation in the error term, so the BP statistic was spatially adjusted for heteroscedasticity (Yamagata et al., 2011). The null hypothesis is that the error variance is constant, against the alternative that it is not. The test statistic is

$$ BP = \frac{1}{2}\, f^{\top} Z (Z^{\top}Z)^{-1} Z^{\top} f, \qquad f_i = \frac{e_i^2}{\sigma^2} - 1 $$

where $e_i$ denotes the OLS residual for observation $i$, $\sigma^2$ the variance based on the OLS residuals, and $Z$ the $n \times (k+1)$ matrix of constants and variables that cause heteroscedasticity. The decision criterion is to reject $H_0$ if $BP > \chi^2_{(k+1)}$.

c. Non-Autocorrelation Test
This test was carried out with the Durbin-Watson test. The non-autocorrelation assumption requires the errors of the Y observations to be independent, so that the error in one observation is not affected by the error in another (Gujarati, 2003). The hypotheses are:

$H_0: \hat{\rho} = 0$ (there is no autocorrelation)
$H_1: \hat{\rho} \neq 0$ (there is autocorrelation)

The test statistic is

$$ d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} $$

where $d$ denotes the Durbin-Watson test statistic, $e_t$ the error of the regression model at time $t$, $e_{t-1}$ the error at time $t-1$, and $e_t^2$ the squared error at time $t$. The decision criteria are as follows. If $d < d_L$ or $d > 4 - d_L$, then $H_0$ is rejected, meaning that there is autocorrelation in the residuals. If $d_L < d < d_U$ or $4 - d_U < d < 4 - d_L$, the Durbin-Watson test is inconclusive. If $d_U < d < 4 - d_U$, the test fails to reject $H_0$, meaning that there is no autocorrelation between the residuals.

d. Non-Multicollinearity Test
According to Johnson and Wichern (2007), multicollinearity can be detected through the Pearson correlation coefficient ($r$). If the coefficient between two predictor variables exceeds 0.95, the variables are considered correlated. The Variance Inflation Factor (VIF) can also be used:

$$ VIF_j = \frac{1}{1 - R_j^2} $$

where $R_j^2$ is the coefficient of determination from regressing $X_j$ on the other predictors. A $VIF_j$ value greater than 10 indicates collinearity among the predictor variables.
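The Durbin-Watson decision rule and the VIF threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function names are hypothetical, and the critical values `dL` and `dU` must be looked up in a Durbin-Watson table and passed in by the user.

```python
def durbin_watson(residuals):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def dw_decision(d, dL, dU):
    """Apply the three-way Durbin-Watson decision rule."""
    if d < dL or d > 4 - dL:
        return "reject H0"          # autocorrelation in the residuals
    if dU < d < 4 - dU:
        return "fail to reject H0"  # no autocorrelation
    return "inconclusive"

def vif(r_squared):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other predictors."""
    return 1.0 / (1.0 - r_squared)

# Hypothetical residuals that alternate in sign (negative autocorrelation)
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Hypothetical table values dL = 1.27, dU = 1.65, for illustration only
decision = dw_decision(1.0, dL=1.27, dU=1.65)
```

An auxiliary $R_j^2$ of 0.9 corresponds to a VIF of 10, which is the threshold used in the text.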
3. Create spatial weight matrices based on queen contiguity and inverse distance. The spatial weight matrix ($W$) describes the relationship between regions and is obtained from distance or neighbour information. Because it captures the relationship between all pairs of locations, its dimensions are $N \times N$, where $N$ is the number of locations or cross-sectional units (Dubin, 2009). The diagonal of the weight matrix is zero. The element $w_{ij}$ in the $i$-th row and $j$-th column of $W$ describes the relationship between the $i$-th and $j$-th locations, with $w_{ij} > 0$ if the two locations are related. Several approaches can be used to represent the spatial relationship between locations, including the concept of contiguity and the concept of distance. The concept of contiguity, or adjacency, is based on geographically neighbouring relationships (Dubin, 2009). Contiguity-based spatial weight matrices include rook contiguity, bishop contiguity, and queen contiguity. Under the concept of distance, the elements of the spatial weight matrix are a function of distance: the weight between a location and the surrounding locations is determined by the distance between the two areas, and the shorter the distance, the greater the weight. A commonly used form is the inverse distance:

$$ w_{ij} = \frac{1}{d_{ij}}, \qquad i \neq j $$

where $d_{ij}$ represents the Euclidean distance from location $i$ to location $j$. The general form of the spatial weight matrix is

$$ W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{N1} & w_{N2} & \cdots & w_{NN} \end{pmatrix} $$

250 | JTAM (Jurnal Teori dan Aplikasi Matematika) | Vol. 7, No. 1, January 2023, pp. 244-261

4. Test for spatial autocorrelation using Moran's I. The test statistic is

$$ Z(I) = \frac{I - E(I)}{\sqrt{Var(I)}} $$

where $I$ denotes the Moran index, $Z(I)$ the Moran index test statistic, $E(I)$ the expected value of the Moran index, and $Var(I)$ the variance of the Moran index.
The decision criterion is to reject $H_0$ if $|Z(I)| > Z_{\alpha/2}$. Moran's I measures the correlation between observations that are close together by comparing the value of observations in one area with the values in other areas. Moran's I can be computed as

$$ I = \frac{n\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

where $n$ denotes the number of observations, $\bar{x}$ the average value of $\{x_i\}$ over the $n$ locations, $x_i$ the value at location $i$, $x_j$ the value at location $j$, and $w_{ij}$ the elements of the spatial weight matrix. Like a correlation coefficient, the value of I lies between -1 and 1. A high value means that the correlation is high, while a value of 0 means that there is no autocorrelation. However, to decide whether autocorrelation is present, the value of the I statistic must be compared with its expected value, which is

$$ E(I) = -\frac{1}{n-1} $$

According to Lee and Wong (2001), the test statistic in equation (12) is referred to the standard normal distribution. An important aspect of assessing spatial autocorrelation is defining the relationship between neighbouring regions, because the area around an observed area is thought to influence it. According to Tobler's first law, everything is related, but things that are closer are more related (Fischer & Wang, 2011). According to Lembo (2006), spatial autocorrelation is the correlation of a variable with itself across space, or equivalently a measure of the similarity of objects in a space (distance, time, and area).

5. Investigate whether there is spatial dependence using the Lagrange Multiplier (LM) test. The LM test checks whether there is spatial dependence in the lag or in the error, and so determines which type of spatial model is suitable. The test statistic used to detect spatial dependence is the LM test (Anselin, 1988).
The hypotheses of the LM test are as follows:

$H_0: \rho = 0$ (no spatial dependence on lag or error)
$H_1: \rho \neq 0$ (there is a spatial dependence on lag or error)

The test statistic for spatial lag dependence is

$$ LM_{lag} = \frac{\left(e^{\top}Wy/\hat{\sigma}^2\right)^2}{D + T}, \qquad D = \frac{(WX\hat{\beta})^{\top}M(WX\hat{\beta})}{\hat{\sigma}^2}, \qquad T = \mathrm{tr}\left[(W^{\top} + W)W\right] $$

where $e$ is the OLS residual vector, $\hat{\sigma}^2 = e^{\top}e/n$, and $M = I - X(X^{\top}X)^{-1}X^{\top}$. If the test statistic is greater than the chi-square critical value ($H_0$ is rejected), the model used is the Spatial Autoregressive (SAR) model. The LM test for spatial error dependence is

$$ LM_{error} = \frac{\left(e^{\top}We/\hat{\sigma}^2\right)^2}{T} $$

If the test statistic is greater than the chi-square critical value ($H_0$ is rejected), the model used is the Spatial Error Model (SEM). If both $LM_{lag}$ and $LM_{error}$ are significant, the better model can be chosen by comparing the Akaike Information Criterion (AIC) values; the model with the smallest AIC is the best model (Zebua & Jaya, 2022).

6. Estimate the parameters of two SAR models using the Maximum Likelihood method, one with the queen contiguity weight matrix and the other with the inverse distance weight matrix.

7. Select the best model by comparing the AIC, BIC, MAPE, and RMSE values of each model under the spatial weight used (queen contiguity and inverse distance). The model selection criteria used in this study are as follows.

a. Akaike Information Criterion (AIC)
The AIC is a criterion for choosing the best regression model, attributed to Akaike and Schwarz (Grasa, 1989). It is based on Maximum Likelihood Estimation (MLE) and is defined as

$$ AIC = -2\ell + 2k \qquad (21) $$

where $\ell$ denotes the maximum log-likelihood and $k$ the number of parameters in the model. The model with the smallest AIC value is the best (Wei, 2006).

b. Bayesian Information Criterion (BIC)
According to Stoica et al. (2004), the order selection rules (AIC/BIC) share the form $-2\ln p(y, \hat{\theta}) + \eta(k, n)\,k$ but differ in the penalty coefficient $\eta(k, n)$. For BIC, $\eta(k, n) = \ln n$, while for AIC, $\eta(k, n) = 2$. The BIC formula is therefore

$$ BIC = -2\ln p(y, \hat{\theta}) + k\ln n \qquad (23) $$

where $k$ denotes the number of parameters in the model, $p(y, \hat{\theta})$ the maximized likelihood function of the model, and $n$ the number of observations. The best model is the one with the smallest BIC value.

c. Root Mean Square Error (RMSE)
According to Webster and Oliver (2007), one of the most frequently used measures for comparing two or more models in spatial analysis is the Root Mean Square Error (RMSE), calculated as

$$ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \qquad (24) $$

where $n$ denotes the number of observations, $\hat{y}_i$ the predicted value, and $y_i$ the true value of the observation. The smaller the RMSE of a model, the more accurate the model is.

d. Mean Absolute Percentage Error (MAPE)
MAPE measures the mean absolute percentage of the prediction error relative to the true value (Sharma et al., 2012). The smaller the MAPE value, the more accurate the model is. MAPE is calculated as

$$ MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (25) $$

8. Interpret the best model that has been obtained.

9. Perform diagnostic tests on the SAR model.

Based on Figure 1, the provinces on the island of Java still dominate the number of Intangible Cultural Heritage determinations in Indonesia, accounting for 31.41% of all existing ICH. The province with the highest number of ICH determinations is the Special Region of Yogyakarta (130 ICH), followed by Central Java (96 ICH) and West Java (85 ICH). Central Kalimantan, West Papua, and West Nusa Tenggara have the fewest ICH determinations. This suggests spatial autocorrelation: provinces with many ICH determinations tend to lie near other provinces with fairly many, and provinces with few tend to lie near provinces with fairly few, as shown in Table 2.

One of the efforts to advance Indonesian culture under the Law of the Republic of Indonesia Number 5 of 2017 is the protection, development, utilization, and fostering of culture. The main targets for promoting culture are the Cultural Advancement Objects (CAO), which include performing arts, verbal traditions (fairy tales, folklore), folk games, and traditional knowledge, sports, ceremonies, and products (regional/traditional dress and crafts), as shown in Figure 2. One indicator of appreciation for art performances/exhibitions is the percentage of the population (aged five years and over) involved in art performances/exhibitions during the last three months.
The artist's profession still attracts little public interest. This can be seen in Figure 2, which shows the small percentage of the population involved in art performances/exhibitions in the last three months. Bali and the Special Region of Yogyakarta occupy the top positions. Here, involvement means deliberately taking the time to perform in art performances/exhibitions or to provide direct entertainment to an audience, either as actors or as supporters (BPS, 2019).
The next Cultural Advancement Object (CAO) is traditional products, especially regional/traditional dress, of which Indonesia has a wide variety. Figure 3 shows that the percentage of households using regional/traditional dress in the last three months varies considerably across provinces. The highest percentages are in Bali (75.09%) and East Nusa Tenggara (60.71%). The high percentage in Bali is inseparable from the religious rituals carried out daily, especially by Hindus in Bali.

In addition to performing arts and traditional products, verbal tradition is one of the cultural advancement objects in Indonesia. Verbal traditions, including fairy tales/folklore, are passed down from generation to generation by the community. The percentage of Indonesia's population (aged five years and over) who know fairy tales/folklore is relatively high. Almost all provinces are above 50%, except East Nusa Tenggara (49.89%) and Papua (38.86%). The highest percentages are in the Riau Islands Province (88.63%) and West Sumatra (88.23%). Several fairy tales/folklore in Indonesia have been included in school lessons. In addition to textbooks, people can access Indonesian fairy tales/folklore more freely through other media such as television, radio, or online/streaming, as shown in Figure 4.

Spatial Analysis
Before SAR modeling, the classical assumptions of multiple linear regression must be tested and Moran's I tests carried out. Table 3 shows that almost all of the classical assumptions of the multiple linear regression model are fulfilled, and that 33.03% of the variation in the number of ICH determinations in Indonesia can be explained by the three predictor variables. Because d < dL, H0 is rejected, meaning that there is autocorrelation in the residuals of the multiple regression model, as shown in Table 3.

Two spatial weight matrices were used in this study. The first is a neighbour-based spatial weight matrix using queen contiguity, and the second is a distance-based spatial weight matrix using inverse distance. The results of Moran's I test for spatial autocorrelation under both types of spatial weights are summarized in Table 4. According to Table 4, there is spatial autocorrelation in the variables Y, X1, X2, and X3, and almost all results are significant under both types of spatial weight: the p-values are smaller than α = 5% in every case except wearing traditional dress (X2) under the inverse distance spatial weight. All Moran's I values lie between 0 and 1, which indicates positive spatial autocorrelation: the closer two areas are, the more similar their variable values tend to be.
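The Moran's I computation and the inverse distance weights used in this comparison can be sketched as follows. The example is a minimal Python illustration on a hypothetical row of four regions, not the 34 provinces; the function names and coordinates are assumptions, and the study itself used R.

```python
import math

def inverse_distance_weights(coords):
    """w_ij = 1 / d_ij for i != j (Euclidean distance), 0 on the diagonal."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return W

def morans_i(x, W):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

def expected_i(n):
    """Expected value of Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)

# Hypothetical layout: four regions in a row, each touching its immediate neighbours
contiguity = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
values = [1.0, 2.0, 3.0, 4.0]

I_contig = morans_i(values, contiguity)   # 1/3 for this toy layout
W_inv = inverse_distance_weights([(0, 0), (1, 0), (2, 0), (3, 0)])
I_inv = morans_i(values, W_inv)
```

The same variable can yield different Moran's I values under the two weight matrices, which is why significance can differ by weighting scheme, as it does for X2 in Table 4.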
The Likelihood Ratio (LR) test is used to determine which model is better, the spatial regression model or the multiple regression model. The Lagrange Multiplier (LM) test is used to identify the spatial dependence more specifically: dependence in the response variable (lag), dependence on other variables that are not studied (error), or both (lag and error).
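As an illustration of the LM tests, the sketch below computes the standard LM lag and LM error statistics (Anselin, 1988) in plain Python. To stay short it assumes a single predictor plus intercept, so that the OLS fit has a closed form; the data, the six-region chain, and the function names are hypothetical, and this is not the authors' implementation.

```python
def ols_fit(x, v):
    """Regress v on [1, x]; return (fitted values, residuals)."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    b1 = sum((a - mx) * (c - mv) for a, c in zip(x, v)) / sum((a - mx) ** 2 for a in x)
    b0 = mv - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    return fitted, [c - f for c, f in zip(v, fitted)]

def matvec(W, v):
    return [sum(w * u for w, u in zip(row, v)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def lm_tests(y, x, W):
    """Return (LM_lag, LM_error); each is chi-square with 1 df under H0."""
    n = len(y)
    fitted, e = ols_fit(x, y)                   # fitted = X*beta_hat, e = OLS residuals
    s2 = dot(e, e) / n                          # sigma^2 estimate
    # T = tr(W'W + W^2)
    T = sum(W[i][j] ** 2 + W[i][j] * W[j][i] for i in range(n) for j in range(n))
    _, m_wxb = ols_fit(x, matvec(W, fitted))    # M * (W X beta_hat)
    lm_err = (dot(e, matvec(W, e)) / s2) ** 2 / T
    lm_lag = (dot(e, matvec(W, y)) / s2) ** 2 / (T + dot(m_wxb, m_wxb) / s2)
    return lm_lag, lm_err

# Hypothetical six-region chain with row-standardized weights
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
W = [[(1.0 / len(nbrs[i]) if j in nbrs[i] else 0.0) for j in range(6)] for i in range(6)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 4.0, 7.0, 6.0, 9.0]

lm_lag, lm_err = lm_tests(y, x, W)
CHI2_1DF_5PCT = 3.841                           # chi-square critical value, 1 df, alpha = 5%
lag_significant = lm_lag > CHI2_1DF_5PCT
```

With more than one predictor, `ols_fit` would be replaced by a general least-squares solver; the LM formulas themselves are unchanged.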
According to the results in Table 5, the LR test is significant for both spatial weight matrices because the p-value is smaller than α = 5%, which means the spatial regression model differs significantly from the multiple regression model. The spatial dependence in the lag is also significant for both spatial weight matrices (p-value smaller than α = 5%), whereas the spatial dependence in the error is significant only for the queen contiguity matrix. The SAR model is therefore applied in this study, because the lag dependence is significant under both spatial weight matrices. The SAR model is a spatial regression model that includes a spatial lag of the response variable, as shown in Table 5.

The SAR model parameters were estimated using the Maximum Likelihood method. The estimation results using the two spatial weight matrices are shown in equation (26) and equation (27).

The best model was determined using several statistics: AIC, BIC, RMSE, and MAPE. A good model has the smallest value on all four. The results of the model selection criteria are given in Table 6. According to Table 6, of the two SAR models with the two spatial weights used in this study, the SAR model with the queen contiguity spatial weight is chosen as the best because it has the minimum AIC, BIC, RMSE, and MAPE values. A model with a lower AIC and BIC fits the data better than a model with a higher AIC and BIC, and a model with a lower RMSE and MAPE has smaller prediction errors relative to the true values. The parameter estimates of the SAR model with the queen contiguity weight are given in Table 7.
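The four selection criteria can be written as short Python functions, as sketched below. The function names are hypothetical. The worked example rests on an assumption that is not stated in the source: the reported AIC (310.397) and BIC (319.555) with n = 34 are mutually consistent if the model has k = 6 parameters, and the log-likelihood shown is back-calculated from the AIC under that assumption rather than taken from the paper.

```python
import math

def aic(loglik, k):
    """AIC = -2 * loglik + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2 * loglik + k * ln(n)."""
    return -2.0 * loglik + k * math.log(n)

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (undefined if any y is 0)."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))

# Assumption: k = 6 parameters; log-likelihood back-calculated from the reported AIC
n, k = 34, 6
loglik = (2 * k - 310.397) / 2.0
bic_check = bic(loglik, k, n)       # close to the reported BIC of 319.555

# Hypothetical observed and predicted values, for illustration only
y_obs, y_pred = [100.0, 200.0], [110.0, 180.0]
toy_rmse = rmse(y_obs, y_pred)
toy_mape = mape(y_obs, y_pred)
```

Because BIC penalizes each parameter by ln(34), about 3.53, rather than 2, it is always larger than AIC here; since both SAR models have the same number of parameters, the two criteria necessarily rank them the same way.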
According to the results in Table 7 (significance codes: ** 0.05, * 0.1), the spatial lag coefficient (Rho) is positive and significant in influencing the number of ICH determinations in Indonesia. This means that a province tends to have more ICH determinations when the surrounding provinces also have many. The p-value of the Wald test is 0.0046715, indicating that the relationship between the number of ICH determinations and the predictor variables is spatially dependent. Simultaneously, all predictor variables and the spatial lag contribute significantly to the number of ICH determinations in Indonesia. Partially, only knowing Indonesian folklore (X3) has a significant effect on the response variable at the significance level α = 5%, while involvement in performing arts (X1) and wearing traditional dress (X2) do not.
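The spatial lag coefficient interpreted above is the value of ρ that maximizes the concentrated log-likelihood in equation (3). The sketch below shows one way to carry out that direct search in plain Python, using a simple grid over (-0.99, 0.99). It is an illustration under assumptions: a single predictor plus intercept, a hypothetical six-region chain, hypothetical data and function names, and a grid search standing in for whatever optimizer the R software uses.

```python
import math

def matvec(W, v):
    return [sum(w * u for w, u in zip(row, v)) for row in W]

def ols_resid(x, v):
    """Residuals from regressing v on [1, x] (single predictor)."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    b1 = sum((a - mx) * (c - mv) for a, c in zip(x, v)) / sum((a - mx) ** 2 for a in x)
    return [c - (mv + b1 * (a - mx)) for a, c in zip(x, v)]

def log_det(A):
    """log|det(A)| by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        total += math.log(abs(M[c][c]))
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return total

def concentrated_loglik(rho, e0, eL, W):
    """Equation (3) without the constant C."""
    n = len(e0)
    r = [a - rho * b for a, b in zip(e0, eL)]
    sigma2 = sum(v * v for v in r) / n
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    return log_det(A) - (n / 2.0) * math.log(sigma2)

def fit_rho(y, x, W):
    """Grid search for the rho that maximizes the concentrated log-likelihood."""
    e0 = ols_resid(x, y)                 # residuals of y on X
    eL = ols_resid(x, matvec(W, y))      # residuals of Wy on X
    grid = [k / 1000.0 for k in range(-990, 991)]
    return max(grid, key=lambda r: concentrated_loglik(r, e0, eL, W))

# Hypothetical six-region chain with row-standardized weights
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
W = [[(1.0 / len(nbrs[i]) if j in nbrs[i] else 0.0) for j in range(6)] for i in range(6)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.0, 5.2, 4.8, 6.5, 7.1]

rho_hat = fit_rho(y, x, W)
```

Once ρ is fixed at its estimate, the regression coefficients follow from equation (2) by ordinary least squares on the filtered response $y - \hat{\rho}Wy$.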
In the SAR model, the impact of covariates can be categorized into three types: direct impact, indirect impact, and total impact (Zebua & Jaya, 2022). Direct impacts occur locally in an area as a result of changes in its own predictor variables. Indirect impacts occur when predictor variables change in the surrounding areas. The total impact is the change in an area resulting from changes both in the area itself and in its surroundings. The sizes of the direct and indirect impacts of the SAR model in this study are given in Table 8. According to Table 8, one predictor variable has a significant direct impact on the number of ICH determinations, but no variable has a significant indirect impact. In Indonesia, knowing Indonesian folklore has a direct effect on increasing the number of ICH determinations: each additional 1% of the population knowing Indonesian folklore in an area increases the number of ICH determinations in that area by 0.6719 units.

To assess whether the SAR model obtained is adequate, diagnostic tests of the normality, non-autocorrelation, and homogeneity assumptions are required. The diagnostic test results in Table 9 show that the SAR model fulfills the normality, non-autocorrelation, and homogeneity assumptions.
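The direct, indirect, and total impacts can be illustrated with the commonly used summary measures based on the matrix $S = (I - \rho W)^{-1}\beta_k$: the average of its diagonal (direct), the average of its row sums (total), and their difference (indirect). The Python sketch below assumes these definitions, a hypothetical five-region chain, and illustrative values of ρ and β; it does not reproduce the figures in Table 8.

```python
def sar_impacts(W, rho, beta_k, tol=1e-12, max_iter=10_000):
    """Average direct, indirect, and total impacts of one predictor in a SAR model.

    (I - rho*W)^{-1} is approximated by iterating S <- I + rho*W*S, i.e. the
    Neumann series I + rho*W + rho^2*W^2 + ..., which converges for |rho| < 1
    when W is row-standardized.
    """
    n = len(W)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    S = [row[:] for row in eye]
    for _ in range(max_iter):
        S_new = [[eye[i][j] + rho * sum(W[i][m] * S[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        delta = max(abs(S_new[i][j] - S[i][j]) for i in range(n) for j in range(n))
        S = S_new
        if delta < tol:
            break
    direct = beta_k * sum(S[i][i] for i in range(n)) / n
    total = beta_k * sum(sum(row) for row in S) / n
    return direct, total - direct, total

# Hypothetical five-region chain with row-standardized weights
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3]]
W = [[(1.0 / len(nbrs[i]) if j in nbrs[i] else 0.0) for j in range(5)] for i in range(5)]

# Assumed values for illustration: rho = 0.5, beta_k = 1.0
direct, indirect, total = sar_impacts(W, rho=0.5, beta_k=1.0)
```

For a row-standardized $W$, the total impact equals $\beta_k / (1 - \rho)$, so with the assumed values it is 2.0, and the direct impact exceeds $\beta_k$ because of feedback through neighbouring regions.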

D. CONCLUSION AND SUGGESTIONS
In general, provinces on Java Island dominate the number of determinations of Intangible Cultural Heritage (ICH) in Indonesia, accounting for 31.41 percent of all existing ICH. The number of ICH determinations, the percentage of the population involved in performing arts, and the percentage of the population who know Indonesian folklore show spatial dependence based on the Moran Index test under both the queen contiguity and the inverse distance weight matrices. The percentage of households wearing traditional dress shows spatial dependence based on the Moran Index test only under the queen contiguity weight matrix.
Based on the LM test, the best spatial econometric model is the Spatial Autoregressive (SAR) model, and based on the MAPE, AIC, BIC, and RMSE results, the model with the queen contiguity weight matrix is better than the model with the inverse distance weight matrix. The percentage of the population knowing Indonesian folklore has a direct effect on increasing the number of ICH determinations. Each additional 1% of the population knowing Indonesian folklore in an area increases the number of ICH determinations in that area by 0.6719 units.
The government, especially the Ministry of Education, Culture, Research and Technology, is advised to preserve Indonesian culture so that every policy taken supports the determination of ICH in Indonesia. Further research can add other variables that correlate strongly with the determination of ICH in Indonesia, and can also apply the Spatial Error Model (SEM) to ICH.