GRG Non-Linear and ARWM Methods for Estimating the GARCH-M, GJR, and log-GARCH Models

ABSTRACT


A. INTRODUCTION
For nearly four decades, since the seminal papers of Bollerslev in Engle et al. (2019) on the GARCH (Generalized Autoregressive Conditional Heteroscedasticity) model, theoretical and empirical studies in finance have been dominated by GARCH-type models. These models have attracted special interest from academics and practitioners because the volatility (square root of variance) of asset returns can be treated as a time-varying process (that is, returns exhibit heteroscedasticity), so that it can be used to measure asset risk. When volatility is high, risk is high, which leads to great uncertainty and may discourage investors from investing in asset markets. Therefore, it is important to have an appropriate volatility model that suits financial time series data. The final section gives some conclusions and offers some possible extensions to this study.

B. METHODS
This section describes the basic concepts of the four GARCH(1,1)-type models, formulates the log-likelihood function for the models, explains how to apply Excel's Solver and the ARWM method, and provides two criteria for selecting models from the GARCH-type family.

GARCH(1,1) model
The GARCH model generalizes the ARCH (Autoregressive Conditional Heteroscedasticity) model of Engle in Mostafa et al. (2021) by allowing the present conditional variance to be affected not only by past returns but also by past conditional variances. Let $R_t$ be the asset return at time $t$, formulated as $R_t = 100 \times (\ln S_t - \ln S_{t-1})$, where $S_t$ is the asset price at time $t$. The most famous and successful version of the GARCH-type models in practice is the GARCH(1,1) model, expressed by:

$$
\left.
\begin{aligned}
R_t &= \mu + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_t^2), \quad t = 1, 2, \ldots, T, \\
\sigma_t^2 &= \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \quad t = 2, 3, \ldots, T,
\end{aligned}
\right\} \tag{1}
$$

in which $\mu$ can represent the average risk premium and $\varepsilon_t$ is the stochastic error (also called shock) term. In particular, this study sets $\mu$ equal to zero, as in most financial studies. The above model has the following constraints: $\omega > 0$, $\alpha \ge 0$, and $\beta \ge 0$ to ensure a positive variance, and $0 \le \alpha + \beta < 1$ to ensure a stationary variance.

GARCH-M(1,1) model
Engle et al. in Rodriguez (2017) generalized the ARCH model by recognizing a direct relationship between the return and its conditional variance. This is established by expressing the conditional mean of the return process as a function of the conditional variance. In the GARCH context, the GARCH-M(1,1) model is defined by:

$$
\left.
\begin{aligned}
R_t &= k \sigma_t^2 + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_t^2), \quad t = 1, 2, \ldots, T, \\
\sigma_t^2 &= \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \quad t = 2, 3, \ldots, T,
\end{aligned}
\right\} \tag{2}
$$

in which $k$ is arbitrary and the constraints on the conditional variance equation are as in the GARCH(1,1) model. Thus the GARCH-M specification generalizes the GARCH-type models by including an additional regressor. The term $k\sigma_t^2$ in Eq. (2) is often referred to as the risk premium. As the conditional variance $\sigma_t^2$ varies over time, the risk premium also varies. A significantly positive or negative value of $k$ implies that an increase in the conditional variance, indicating an increase in risk, leads to an increase or a decrease, respectively, in the conditional mean of the return (Abonongo et al., 2016). Thus $k$ can be interpreted as the risk premium parameter, which represents the compensation required by investors for taking on additional risk. A significantly positive risk parameter suggests a positive relation between return and risk, which implies that an investor is compensated with a higher return for assuming higher risk on the asset. Meanwhile, a negative relation could imply that during times of great volatility an investor reacts to risk factors other than the variance of the asset from its historical mean.
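As a rough illustration of the GARCH-M(1,1) recursion described above, the following Python sketch computes percentage log returns from prices and then filters the conditional variances. The function names and the choice of starting the recursion at the sample variance of the returns are assumptions made for illustration; they are not taken from the study.

```python
import math

def pct_log_returns(prices):
    """Percentage log returns: 100 * (ln S_t - ln S_{t-1})."""
    return [100.0 * (math.log(b) - math.log(a))
            for a, b in zip(prices, prices[1:])]

def garch_m_filter(returns, omega, alpha, beta, k=0.0):
    """Conditional variances and residuals of a zero-mean GARCH-M(1,1) model.

    Setting k = 0 gives the plain GARCH(1,1) model.
    Assumption: the first variance equals the sample variance of the returns.
    """
    n = len(returns)
    mean = sum(returns) / n
    var0 = sum((r - mean) ** 2 for r in returns) / n
    sigma2 = [var0]
    eps = [returns[0] - k * var0]
    for t in range(1, n):
        # Variance equation: omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
        s2 = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        sigma2.append(s2)
        # Mean equation: the residual is the return minus the risk premium term
        eps.append(returns[t] - k * s2)
    return sigma2, eps
```

The residuals and variances returned by this filter are the quantities that enter a log-likelihood evaluation.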
The asymmetric parameter $\gamma$ of the GJR(1,1) model is interpreted as follows. A value of $\gamma = 0$ reduces the model to the GARCH(1,1) model, which reflects bad and good news symmetrically; that is, different news has the same effect on the conditional variance (or volatility). Meanwhile, a value of $\gamma \neq 0$ implies an asymmetric variance, that is, different news has a different effect on the future conditional variance. Bad news implies an effect of $\alpha + \gamma$ on the future conditional variance, while good news implies an effect of $\alpha$. Hence, a value of $\gamma > 0$ reflects a greater effect of bad news than of good news on the future conditional variance.
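The asymmetry can be made concrete with a small Python sketch. It assumes the standard GJR(1,1) variance update, in which a negative shock is weighted by alpha + gamma and a non-negative shock by alpha; the function name and the parameter values are hypothetical and chosen only to show the effect.

```python
def gjr_next_variance(eps_prev, sigma2_prev, omega, alpha, beta, gamma):
    """One-step GJR(1,1) variance update (assumed standard form):

    sigma2_t = omega + (alpha + gamma * I[eps_{t-1} < 0]) * eps_{t-1}^2
               + beta * sigma2_{t-1}
    """
    leverage = gamma if eps_prev < 0 else 0.0
    return omega + (alpha + leverage) * eps_prev ** 2 + beta * sigma2_prev

# Hypothetical parameter values, not estimates from the study.
params = dict(omega=0.02, alpha=0.03, beta=0.90, gamma=0.10)
bad = gjr_next_variance(-2.0, 1.0, **params)   # bad news: shock of -2
good = gjr_next_variance(2.0, 1.0, **params)   # good news: shock of +2
extra = bad - good                             # equals gamma * shock^2
```

With a positive gamma, the same-sized shock raises the next variance more when it is negative, which is the asymmetry described above.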

Log-likelihood functions for the model
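When estimating GARCH models, we need to find the parameter values that maximize the likelihood or its logarithm. Generally, it is easier to maximize the log-likelihood than the likelihood itself. Under the normal distribution, the total log-likelihood function of the returns $R = (R_1, \ldots, R_T)$ conditional on the parameter vector $\theta$ for the above models can be expressed as follows:

$$
\mathcal{L}(R \mid \theta) = -\frac{1}{2} \sum_{t=1}^{T} \left[ \log\left(2\pi\sigma_t^2\right) + \frac{\left(R_t - k\sigma_t^2\right)^2}{\sigma_t^2} \right].
$$

The above function is for the GARCH-M(1,1) model when $k \neq 0$ and for the other models when $k = 0$. Many empirical studies have found that the distribution of financial asset returns has heavy tails. Liu et al. (2017) suggested replacing the normality assumption with the Student-t distribution to capture the heavy tails. The likelihood function for a random variable $x$ that has a Student-t distribution with zero mean, variance $\sigma^2$, and $\nu$ degrees of freedom is defined as follows:

$$
f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi(\nu-2)\sigma^2}} \left(1 + \frac{x^2}{(\nu-2)\sigma^2}\right)^{-\frac{\nu+1}{2}}, \quad \nu > 2,
$$

in which $\Gamma(\cdot)$ is the gamma function. Letting $\nu$ tend to infinity recovers the normal distribution. The lower the degrees of freedom, the higher the kurtosis and the heavier the tails. The total log-likelihood for the models with the Student-t distribution is now given by

$$
\mathcal{L}(R \mid \theta) = \sum_{t=1}^{T} \left[ \log\Gamma\left(\frac{\nu+1}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) - \frac{1}{2}\log\left(\pi(\nu-2)\sigma_t^2\right) - \frac{\nu+1}{2}\log\left(1 + \frac{\left(R_t - k\sigma_t^2\right)^2}{(\nu-2)\sigma_t^2}\right) \right]. \tag{7}
$$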
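The two log-likelihoods described above can be sketched in Python as follows. This is a minimal illustration that assumes the conditional variances have already been computed by the variance recursion and that the conditional mean is k times the conditional variance; the function names are hypothetical.

```python
import math

def loglik_normal(returns, sigma2, k=0.0):
    """Gaussian log-likelihood with conditional mean k * sigma2_t."""
    total = 0.0
    for r, s2 in zip(returns, sigma2):
        e = r - k * s2
        total += -0.5 * (math.log(2.0 * math.pi * s2) + e * e / s2)
    return total

def loglik_student_t(returns, sigma2, nu, k=0.0):
    """Student-t log-likelihood scaled so that the variance is sigma2_t."""
    if nu <= 2.0:
        raise ValueError("nu must exceed 2 for the variance to be finite")
    const = math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
    total = 0.0
    for r, s2 in zip(returns, sigma2):
        e = r - k * s2
        scale = (nu - 2.0) * s2
        total += (const - 0.5 * math.log(math.pi * scale)
                  - 0.5 * (nu + 1.0) * math.log1p(e * e / scale))
    return total
```

For very large degrees of freedom the Student-t value approaches the Gaussian value, which gives a simple sanity check on an implementation.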

Estimation Method
On the basis of its log-likelihood function, each model is fitted to real data and estimated using Excel's Solver tool, which is easy to use and widely used by academics and practitioners in finance. The estimation of the GARCH(1,1) model using Excel's Solver has been studied, e.g., by Nugroho et al. (2018) and Nugroho, Kurniawati, et al. (2019). In Excel's Solver we choose the default GRG non-linear solving method. This method works by looking at the gradient of the objective function as the decision variables change. In the Excel spreadsheet, the existing values of the decision variables are taken as the initial solution, and the objective function is improved by considering small changes in those variables (Powell & Batt, 2008). When the goal is to maximize a log-likelihood, the method gradually marches "uphill" until it reaches an optimum solution.
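The "uphill" idea can be illustrated with a much simpler stand-in than GRG itself. The Python sketch below is a plain projected gradient ascent with finite-difference gradients and step halving; it is not the algorithm implemented in Excel's Solver, and all names and default values are assumptions made for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x.

    Note: f is evaluated at x +/- h, so it must be defined slightly
    beyond any bound that x sits on.
    """
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        dn = list(x)
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def projected_gradient_ascent(f, x0, lower, upper, step=0.1,
                              tol=1e-10, max_iter=10000):
    """Maximize f from the starting point x0, keeping x within the bounds.

    The step is halved whenever a move fails to improve f.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        # Move uphill, then clip each coordinate back into [lower, upper].
        trial = [min(max(xi + step * gi, lo), hi)
                 for xi, gi, lo, hi in zip(x, g, lower, upper)]
        f_trial = f(trial)
        if f_trial > fx:
            if f_trial - fx < tol:
                return trial, f_trial
            x, fx = trial, f_trial
        else:
            step *= 0.5
            if step < 1e-12:
                break
    return x, fx
```

As with Solver, the result depends on the starting point when the objective has several local optima, which is one reason to try more than one set of initial values.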
Meanwhile, to compare the performance of Excel's Solver, we employ the Adaptive Random Walk Metropolis (ARWM) method proposed by Atchade-Rosenthal in Nugroho et al. (2021) within an MCMC scheme, using our own code written in Matlab. This method was successfully applied by Nugroho (2018) for estimating GARCH(1,1) models, where the ARWM method was statistically more efficient and had a faster CPU time than the Hamiltonian Monte Carlo method employed by Nugroho & Morimoto (2015) in the context of stochastic volatility models. The basic idea of the ARWM method is to update the candidate sample by adapting the step size on the basis of the acceptance rate. For a parameter $\theta$, a candidate value at iteration $(n + 1)$ is generated from the current value with step size $\Delta^{(n)}$, and the step size is then updated by a rule involving the constant 0.6 and the number of accepted candidates. The acceptance of a candidate is determined by comparing the posterior density at the current and candidate values conditional on the observations. By Bayes' theorem, the logarithm of the posterior density is, up to an additive constant, $\log p(\theta \mid R) = \mathcal{L}(R \mid \theta) + \log p(\theta)$, where $R$ denotes the vector of daily returns and $p(\theta)$ is the prior distribution for $\theta$.
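A generic adaptive random-walk Metropolis sampler for a single parameter might look like the Python sketch below. The step-size rule used here (a Robbins-Monro style correction toward a target acceptance rate of 0.44, damped by n to the power 0.6) is an assumption chosen for illustration and may differ from the exact rule used in the study; the function name and defaults are also hypothetical.

```python
import math
import random

def arwm(log_post, theta0, n_iter=6000, burn_in=1000, step0=0.05,
         target=0.44, seed=0):
    """Adaptive random-walk Metropolis for a scalar parameter.

    Assumed adaptation rule (not taken from the study):
    step <- step + (acceptance_rate - target) / n**0.6, kept positive.
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    step, accepted, draws = step0, 0, []
    for n in range(1, n_iter + 1):
        # Random-walk proposal around the current value.
        cand = theta + step * rng.gauss(0.0, 1.0)
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio), on the log scale.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            theta, lp = cand, lp_cand
            accepted += 1
        # Widen the step when accepting too often, narrow it otherwise.
        step = max(1e-6, step + (accepted / n - target) / n ** 0.6)
        if n > burn_in:
            draws.append(theta)
    return draws, accepted / n_iter
```

Here `log_post` would return the log-likelihood plus the log-prior for the parameter being updated, and the draws kept after burn-in are summarized by their mean and credible interval.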

Model Evaluation
This study investigates the performance of GARCH-type models in terms of their log-likelihood values. Since the GARCH model is nested within both the GARCH-M and GJR models, a chi-square difference test, i.e., the log-likelihood ratio test (LRT), is performed to test the goodness of fit. Suppose the null model is $M_0$ with log-likelihood $\mathcal{L}(M_0)$ and the alternative model is $M_1$ with log-likelihood $\mathcal{L}(M_1)$. The LRT statistic is defined by Perneger (2021):

$$
\mathrm{LRT}(M_0, M_1) = 2\left(\mathcal{L}(M_1) - \mathcal{L}(M_0)\right). \tag{8}
$$

The alternative model is better than the null model if the LRT statistic is greater than the critical value. The degrees of freedom equal the difference in the number of parameters of the competing models. In particular, the critical values of the chi-square distribution with one degree of freedom at the 1%, 5%, and 10% significance levels are 6.64, 3.84, and 2.71, respectively. When the two competing models are not nested, the best model is selected on the basis of the AIC (Akaike Information Criterion), formulated as follows (see Portet (2020)):

$$
\mathrm{AIC} = 2p - 2\mathcal{L},
$$

in which $p$ is the number of estimated parameters in the model and $\mathcal{L}$ is the maximized log-likelihood. Given a set of competing models fitted to the data, the best model is the one with the minimum AIC value.
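Both criteria are simple to compute once the log-likelihood values are available. The Python sketch below uses the one-degree-of-freedom critical values quoted above; the function names are hypothetical, and any log-likelihood values passed in are up to the user.

```python
# Chi-square critical values with one degree of freedom, as quoted in the text.
CHI2_1DF_CRITICAL = {0.01: 6.64, 0.05: 3.84, 0.10: 2.71}

def lrt_statistic(loglik_null, loglik_alt):
    """LRT = 2 * (L(M1) - L(M0))."""
    return 2.0 * (loglik_alt - loglik_null)

def lrt_significance(stat):
    """Smallest listed level at which the null model is rejected, else None."""
    for level in sorted(CHI2_1DF_CRITICAL):
        if stat > CHI2_1DF_CRITICAL[level]:
            return level
    return None

def aic(loglik, n_params):
    """AIC = 2 * (number of parameters) - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik
```

For example, with made-up log-likelihoods of -5650.0 for the null model and -5645.0 for an alternative with one extra parameter, the statistic is 10.0, which exceeds the 1% critical value of 6.64.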

C. RESULT AND DISCUSSION
This section demonstrates the performance of the presented GARCH models by fitting them to real data sets. The models are based on the assumption that the errors follow a Student-t distribution.

Data and Descriptive Statistics
The empirical analysis is based on daily returns of the Dow Jones Industrial Average (DJIA), Standard and Poor's 500 (S&P 500), and S&P CNX Nifty stock indices. The sample period is from January 2000 to December 2017. The data were downloaded from the Oxford-Man Institute of Quantitative Finance (https://realized.oxford-man.ox.ac.uk) and are publicly available.
To investigate the distributional properties of the daily return series of the three data sets, Table 1 reports summary descriptive statistics. As expected, the three indices exhibit means close to zero for their return series. The skewness values are close to zero for all data (except for S&P CNX Nifty), indicating that the distributions are nearly symmetric. The kurtosis values for all data are greater than 3, meaning that the distribution of returns has heavier (fatter) tails than a normal distribution. The excess kurtosis implies that the daily returns are clearly not normally distributed. This finding is confirmed by the Jarque-Bera statistic, which is greater than the critical value, as shown in Table 1.
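The statistics discussed above can be computed with a few lines of Python. The sketch below uses population (biased) moment estimators and the usual Jarque-Bera formula, which is asymptotically chi-square with two degrees of freedom under normality; the function name is hypothetical.

```python
def jarque_bera(x):
    """Return (skewness, kurtosis, Jarque-Bera statistic) of a sample."""
    n = len(x)
    mean = sum(x) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb
```

A kurtosis well above 3 inflates the statistic through the second term, which is why heavy-tailed return series typically reject normality.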

Implementing the Estimation Methods
We ran Excel's Solver and the MCMC algorithm from fixed sets of initial solutions for the decision variables of each model, such as the GARCHt-M(1,1) model. In the Solver Options dialog box, on the GRG non-linear tab, we kept the default GRG non-linear solving method. The initial values were selected to reflect estimates typically found in empirical studies, since Excel's Solver is very sensitive to the initial values, as noted by Nugroho, Kurniawati, et al. (2019).
Meanwhile, the Bayesian MCMC algorithm was run for 6000 iterations, of which the first 1000 were discarded to reduce the bias caused by the choice of initial values. To complete the Bayesian model, following Nugroho et al. (2021), the prior distributions for the parameters were specified as follows: $k, \gamma \sim N(0, 10)$, $\omega, \alpha, \beta \sim N(0, 1)\, I[\omega, \alpha, \beta > 0]$, and $\nu \sim \exp(0.01)$, where $I[\cdot] = 1$ if the condition holds and 0 otherwise. In particular, the initial step size in the ARWM method was set to 0.05.
This result does not indicate that the IGARCHt-M(1,1) model fits better than the GARCHt-M(1,1) model, since Matlab produces a log-likelihood value of -5653.56 for the IGARCHt-M(1,1) model, which is smaller than the log-likelihood value of -5642.86 for the GARCHt-M(1,1) model, as shown in Table 2. Although several parameter estimates do not satisfy the model constraints, those values do not seem to greatly affect the other estimates. Such a violation can occur because Excel's Solver only allows non-strict (greater/less-than-or-equal-to) inequalities in a constraint. Furthermore, a comparison of the estimation results produced by Excel's Solver and Matlab shows that the GRG non-linear and ARWM methods produce similar estimates, indicating that Excel's Solver is a powerful tool for estimating the presented models.
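One common workaround for solvers that accept only non-strict inequalities, given here as a hedged sketch rather than as the procedure used in the study, is to shift each strict bound by a small tolerance, for example requiring the sum of the ARCH and GARCH coefficients to be at most 1 minus a tolerance. The function name and the tolerance value are assumptions.

```python
def satisfies_garch_constraints(omega, alpha, beta, eps=1e-6):
    """Check the GARCH(1,1) constraints using only non-strict inequalities.

    omega > 0 becomes omega >= eps, and alpha + beta < 1 becomes
    alpha + beta <= 1 - eps; alpha >= 0 and beta >= 0 are unchanged.
    """
    return (omega >= eps and alpha >= 0.0 and beta >= 0.0
            and alpha + beta <= 1.0 - eps)
```

In a spreadsheet, the same idea corresponds to entering the shifted bound as the right-hand side of the constraint, so that a boundary solution can no longer coincide with the integrated case.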
Regarding the effect of applying the Student-t distribution to the return error in the GARCH-M(1,1) model, we note that this distribution implies a greater positive relationship between return and risk than the normal distribution, i.e., 0.0178 by the GRG non-linear (GRG-NL) method and 0.0589 by the ARWM method for DJIA, 0.0075 by the GRG-NL method and 0.0470 by the ARWM method for S&P 500, and 0.0157 by the GRG-NL method and 0.0367 by the ARWM method for S&P CNX Nifty. The risk premium parameter, $k$, is positive and statistically significant (in the Matlab output) in all cases, which implies that the return increases as the variance increases. This result is consistent with portfolio theory, which states that the level of return is positively related to its past conditional variance (Mostafa et al., 2017). It means that investors in the stock indices are rewarded with higher returns for taking additional risk (high variance).
For the asymmetric parameter $\gamma$ in the GJR(1,1) model, the Student-t distribution for the return error does not greatly change the parameter estimates relative to those produced under the normal distribution. The only exception is the S&P CNX Nifty data. On the basis of the Matlab results, the estimate of $\gamma$ is positive and statistically significant in all cases. It means that a past negative return generates a higher current variance than a positive one of the same magnitude. This result is consistent with findings in the stock market. A contrasting result was given by Bouri (2014) and Bouri and Roubaud (2016) for the wine market, where the variance responds more to positive returns, from which they conclude that the asset acts as a safe haven, providing shelter in periods of falling asset prices. In accordance with their interpretation, our result of a positive asymmetric parameter suggests that investors should not actively invest in the DJIA, S&P 500, and S&P CNX Nifty assets, since investment in these assets will not flourish.
We next analyze the persistence and half-life of the variance. Variance persistence refers to the property of momentum in the conditional variance. It is denoted by $\phi$ and given by $\phi = \alpha + \beta$ for the GARCH(1,1) and GARCH-M(1,1) models, by $\phi = \alpha + \beta + 0.5\gamma$ for the GJR(1,1) model, and by $\phi = |\alpha + \beta|$ for the log-GARCH(1,1) model. As reported in Table 3, the persistence values produced by the GRG non-linear method of Excel's Solver and by the ARWM method are similar for the same model and data, except for the results that tend toward IGARCH parameter estimates. Persistence is highest in the models under the Student-t specification. The only exception is the S&P CNX Nifty data fitted to the log-GARCH models. These results indicate that specifying a Student-t distribution for the error implies a conditional variance that is highly persistent and less volatile. In general, we also note that the persistence implied by the GARCH(1,1) model is higher than that in the extended models, with the exception of the log-GARCH(1,1) model under the normal specification fitted to S&P CNX Nifty. These results show that adding the risk premium to the return process or allowing an asymmetric effect makes the conditional variance less persistent and more volatile, as shown in Table 3. The persistence in variance is then used to investigate the half-life of a variance shock. The half-life refers to the speed of mean reversion and measures the average time needed for the variance to move halfway back to its long-term average (Ahmed et al., 2018). Given the persistence $\phi$, the variance half-life is defined as $L_{\text{half}} = \log(0.5)/\log(\phi)$, so that higher persistence corresponds to longer memory (weaker mean reversion). On the basis of the persistence values, the models with Student-t distributed errors have the longer variance half-life in almost all cases. It means that shocks to the variance in those models do not die out quickly; rather, their effects endure.
Therefore, incorporating the Student-t distribution into the return error is suggested for long-term investments. In particular, under the normal distribution, the conditional variance implied by the GJR-GARCH(1,1) model moves back to its unconditional mean faster than those implied by the other extended models. It means that asset values fitted with the normal GJR-GARCH(1,1) model are more sensitive to new information, since its effect passes in a shorter period. This suggests that investors operate over a shorter period. Therefore, for short-term investments, the GJR-GARCH(1,1) model can be recommended. Under the Student-t specification, all return series fitted with the log-GARCH(1,1) model have the strongest mean reversion.
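The persistence and half-life formulas used in this discussion can be checked with a short Python sketch. The function names and model labels are hypothetical, and the parameter values in the test calls are made up rather than taken from Table 3.

```python
import math

def persistence(model, alpha, beta, gamma=0.0):
    """Variance persistence for the four model types considered here."""
    if model in ("GARCH", "GARCH-M"):
        return alpha + beta
    if model == "GJR":
        return alpha + beta + 0.5 * gamma
    if model == "log-GARCH":
        return abs(alpha + beta)
    raise ValueError("unknown model: " + model)

def half_life(phi):
    """Half-life log(0.5) / log(phi), defined for 0 < phi < 1."""
    if not 0.0 < phi < 1.0:
        raise ValueError("persistence must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(phi)
```

For instance, a persistence of 0.98 gives a half-life of roughly 34 periods, while 0.99 roughly doubles it, which shows how sensitive the half-life is near the integrated case.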

Model Selection
The competing models are selected on the basis of two criteria, LRT and AIC, whose values are reported in Table 4 (LRT statistics) and Table 5 (AIC values). Under the Student-t distribution, the LRT statistics indicate that the GARCH-M(1,1) and GJR(1,1) models fit significantly better than the GARCH(1,1) model at the 1% level for all data sets. The only exception is the GRG non-linear result for the S&P 500 index fitted to the GARCH-M(1,1) model, which is significant at the 5% level. The superiority of the GARCH-M(1,1) and GJR(1,1) models over the GARCH(1,1) model confirms the presence of the risk premium and asymmetric parameters. These results are similar to those under the normal distribution in Nugroho, Kurniawati, et al. (2019), as shown in Table 4. On the basis of the AIC values from the outputs of both methods, the models with Student-t distributed errors turn out to be more suitable than the models with normally distributed errors in Nugroho, Kurniawati, et al. (2019). In particular, the GJRt(1,1) model provides the best fit, followed by the GARCHt-M(1,1), GARCHt(1,1), and log-GARCHt(1,1) models, as shown in Table 5.

D. CONCLUSION AND SUGGESTIONS
This study examined the fitting performance of three different GARCH(1,1)-type models, namely the GARCH-M(1,1), GJR(1,1), and log-GARCH(1,1) models, using the DJIA, S&P 500, and S&P CNX Nifty stock indices. Two distributions (normal and Student-t) for the error density in the returns were considered. From the empirical study, first, the GRG non-linear method of Excel's Solver is capable of estimating the GARCH-type models. Therefore, this tool can be recommended to practitioners who do not have enough knowledge of a programming language. Second, a comparison of the two error distributions showed that the models with Student-t distributed errors outperform the models with normally distributed errors. In this case, the models with the Student-t distribution suggest longer-term investments. In particular, we found strong evidence that the daily volatility of financial data is best explained by the GJR(1,1) model and then by the GARCH-M(1,1) model. This suggests incorporating a risk premium in the return equation and an asymmetric effect in the conditional variance equation. There are several possible extensions of the above models that deserve further study. It would be interesting to investigate the application of power transformations to the return series as in Nugroho et al. (2021b) and to the lagged variance as in Nugroho et al. (2021a). A combination of the GARCH-M and GJR models as in Chen et al. (2019), but with Student-t distributed errors, could also be considered.