Optimal Raw Material Inventory Analysis Using Markov Decision Process with Policy Iteration Method

ABSTRACT


A. INTRODUCTION
Inventories are materials or goods held in storage for use in the production process. Raw material inventory is a major factor in keeping production running smoothly, both in large companies and in small ones such as home-based businesses producing "home made" goods. A raw material is a material used in the manufacture of a product and processed into a finished product for sale (Ristono, 2009).
In optimal control of raw material inventory, the company must provide a certain amount of raw material at a certain time. This is difficult because incoming demand cannot be known with certainty, which can lead to a less than optimal inventory level and to costs that need to be measured. To cope with fluctuations in raw material demand, a method is needed that can link current demand with previous demand. One such method is the Markov decision process (Mani et al., 2021).
The Markov decision process is a system that can move from one state to another possible state by considering several alternative policies (Hasan & Iqbal, 2004). The decision maker must take an action from that set of alternatives. The action affects the transition probabilities to the next state and incurs a profit or loss. The transition probability matrix and the income (cost) matrix therefore depend on the decision alternative chosen. The goal is to determine the decisions that optimize the expected revenue or cost of the process.
Applications of the Markov decision process can be found in medical science (Bennett & Hauser, 2013), which studies how to determine optimal decisions in health services, and in telecommunications (Ksentini et al., 2014; Y. J. Liu et al., 2017), which study, respectively, the transfer of telecommunications services and election for machine type communications using a reinforcement-learning-based Markov decision process. Other applications appear in transportation (Iversen et al., 2014; Ong & Kochenderfer, 2017; Shou et al., 2020), social science (Rong et al., 2016), game applications (Zheng & Siami Namin, 2018), and the Internet of Things (Yousefi et al., 2020).
One method of making decisions in the Markov decision process is the policy iteration method. In this method, the policies taken are evaluated first, and then policy improvements are made so that convergent improvement results are obtained (Feinberg & Shwartz, 2002). An example of an analysis using the policy iteration method in previous research can be seen in (Fürnkranz et al., 2012), (D. Liu & Wei, 2013), (Luo et al., 2014), (Alla et al., 2015), (Pérolat et al., 2016), (Wu & Shen, 2017), and (Yang & Wei, 2018).
The case considered in this research is a raw material inventory, where the raw material is pandan leaves. Pandan leaves are the raw material for woven pandanus mats, called "Tika Seukee" in Acehnese. They are generally obtained from self-cultivated gardens and can be cultivated quickly and easily in Aceh, especially in North Aceh Regency. If the raw materials are not sufficient, the craftsmen can buy them wholesale from farmers in other regions. Therefore, this study applies the Markov decision process to determine the optimal inventory of raw materials in the pandanus mat weaving business. Research related to inventory with the Markov decision process can be seen in (Noorida, 2003), (Sarjono et al., 2011), (Layla, 2016), (Ferreira et al., 2018), and (Oktaviyani et al., 2018). For example, (Noorida, 2003) analyzes fire tube boiler stock inventory using the policy iteration method without considering a discount factor. What distinguishes this research from previous work is that the inventory analysis involves a discount factor. This reflects the expectation that the price of raw materials will increase in the future, which is more realistic.

B. METHODS
The data used are primary data obtained directly from woven pandanus mat craftsmen in Meunasah Aron Village, Muara Batu District, North Aceh Regency. Table 1 gives the amount of raw material inventory (pandan leaves) for 2 years, from January to December of 2018 and 2019. In addition to the raw material inventory data, data on raw material price, raw material ordering cost, and raw material storage cost are given as follows:
1. The raw material price for pandan leaves is IDR 50.000/kg.
2. The average raw material ordering cost (IDR/month) over the 2 years (2018 and 2019) is presented in Table 2.
3. The average raw material storage cost (IDR/kg) over the 2 years (2018 and 2019) is presented in Table 3.
The steps for solving the problem with the Markov decision process are given as follows:
1. Determine the possible states for the initial inventory ($i$) and determine the ordering-level alternative decisions ($k$) from the results of the frequency distribution.
2. Determine the transition probability matrix of the obtained states. The probability of moving from initial inventory state $i$ to initial inventory state $j$ under ordering-level alternative ($k$) is $p_{ij}(k)$, where state $i$ transitions to state $j = i + k - d$. The probabilities are arranged in a square matrix with elements $p_{ij}(k) = P(d)$, where $P(d)$ is the probability of raw materials ordering demand $d$. The basis for determining this probability can be studied in (Taylor & Samuel, 1998), (Ching & Ng, 2006), and (Ross, 2010).
3. Find the shortage cost. For each state $i$ and each decision $k$, the shortage cost is calculated using equation (1), with: the ordering cost, which is IDR 430.000 (Table 2); the storage cost, which is IDR 3.000 (Table 3); $d$: raw materials ordering demand, provided that $d > i + k$; $i$: initial inventory state; $k$: ordering-level alternative decision.
4. Determine the total cost matrix. The total cost of inventory for each initial inventory ($i$) and each ordering-level alternative ($k$) is given by the equation:
$$C_i(k) = \text{ordering cost} + \text{storage cost} + \text{shortage cost} \qquad (2)$$
with: $C_i(k)$: total cost of each initial inventory ($i$) at each ordering-level alternative ($k$); the ordering cost, which is IDR 430.000 (Table 2); the storage cost, which is IDR 3.000 (Table 3); and the shortage cost from step 3.
Furthermore, among the total costs calculated for each initial inventory ($i$) over its ordering-level alternatives ($k$), the smallest total cost is chosen. The total cost matrix is arranged as a column matrix with entries $C_i(k)$.

5. Calculate the optimal solution using the policy iteration method with a discount factor.
The discount factor ($\alpha$) is a multiplier used to express the future value of money in present terms. The presence of a discount factor ($\alpha < 1$) can result in a change in the optimal policy, compared to the case without a discount ($\alpha = 1$) (Littman et al., 2013). The steps for the solution are as follows:
a. Determine the initial policy $R^{(t)}$, marked with $t = 0$, by taking any ordering-level alternative $k = R^{(t)}(i)$ for each initial inventory ($i$), and construct its probability matrix and cost matrix.
b. Evaluate the routine policy, that is, determine $V_i^{(t)}$ for each initial inventory ($i$) as the solution to the system of linear equations:
$$V_i^{(t)} = C_i(k) + \alpha \sum_{j} p_{ij}(k)\, V_j^{(t)} \qquad (3)$$
with: $\sum_{j} p_{ij}(k)\, V_j^{(t)}$: the probability-weighted sum over the transitions from initial inventory state ($i$) to initial inventory state ($j$) under the ordering-level alternative ($k$) chosen for state ($i$); $C_i(k)$: minimum total cost; $\alpha$: discount factor, which must satisfy $\alpha < 1$ and is 0.98 here; $t$: initial policy, that is $t = 0$. To solve the system of linear equations, the LINDO (Linear Interactive Discrete Optimizer) program is used.
c. Improve the routine policy, that is, determine a new policy $R^{(t+1)}$ by finding, for each initial inventory ($i$), the ordering-level alternative $k = R^{(t+1)}(i)$ that attains the minimum in the equation:
$$\min_{k} \left\{ C_i(k) + \alpha \sum_{j} p_{ij}(k)\, V_j^{(t)} \right\} \qquad (4)$$
with: $\sum_{j} p_{ij}(k)\, V_j^{(t)}$: the probability-weighted sum over the transitions from initial inventory state ($i$) to initial inventory state ($j$) under ordering-level alternative ($k$); $C_i(k)$: total cost for each initial inventory ($i$) with each ordering-level alternative ($k$); $\alpha$: discount factor, which must satisfy $\alpha < 1$ and is 0.98 here; $t + 1$: new policy, that is $t + 1 = 1$.
d. If the new policy $t + 1$ differs from the initial policy $t = 0$ in at least one state, increase $t$ by one and return to step b. Otherwise, the iteration stops and the current policy is optimal. The overall procedure is shown in Figure 1.
Figure 1:
Step 1: Determine the initial states, in the form of starting inventory amounts, and the possible ordering-level alternatives.
Step 2: Determine the transition probability matrix from the analyzed data.
Step 3: Determine the shortage cost and the total cost.
Step 4: Calculate the optimal solution using the policy iteration method with a discount factor.
Step 5: Improve the routine policy to obtain the optimal policy.
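Steps a to d can be sketched in a few lines of Python. This is a minimal illustration, not the authors' LINDO workflow: the two-state problem, the action names "small" and "large", the costs, and the discount factor 0.9 are all hypothetical values chosen so the result can be checked by hand (the study itself uses 6 states and a discount factor of 0.98). The helper names `solve_linear`, `lookahead`, `evaluate`, `improve`, and `policy_iteration` are likewise assumptions.

```python
from typing import Dict, List, Tuple

# P[i][k][j] = p_ij(k), C[i][k] = C_i(k). All numbers below are HYPOTHETICAL.
Probs = List[Dict[str, List[float]]]
Costs = List[Dict[str, float]]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def lookahead(i: int, k: str, V: List[float], P: Probs, C: Costs, alpha: float) -> float:
    """One-step cost C_i(k) + alpha * sum_j p_ij(k) V_j."""
    return C[i][k] + alpha * sum(p * v for p, v in zip(P[i][k], V))


def evaluate(policy: List[str], P: Probs, C: Costs, alpha: float) -> List[float]:
    """Step b: solve (I - alpha * P_policy) V = C_policy for V."""
    n = len(policy)
    A = [[(1.0 if i == j else 0.0) - alpha * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [C[i][policy[i]] for i in range(n)]
    return solve_linear(A, b)


def improve(policy: List[str], V: List[float], P: Probs, C: Costs, alpha: float) -> List[str]:
    """Step c: switch action only when another one is strictly cheaper."""
    new_policy = []
    for i, current in enumerate(policy):
        best, best_cost = current, lookahead(i, current, V, P, C, alpha)
        for k in P[i]:
            cost = lookahead(i, k, V, P, C, alpha)
            if cost < best_cost - 1e-9:
                best, best_cost = k, cost
        new_policy.append(best)
    return new_policy


def policy_iteration(policy: List[str], P: Probs, C: Costs,
                     alpha: float) -> Tuple[List[str], List[float]]:
    """Step d: repeat evaluation and improvement until the policy is unchanged."""
    while True:
        V = evaluate(policy, P, C, alpha)
        new_policy = improve(policy, V, P, C, alpha)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical two-state example (state 1 has a single admissible action).
P = [{"small": [0.5, 0.5], "large": [0.0, 1.0]}, {"small": [1.0, 0.0]}]
C = [{"small": 10.0, "large": 12.0}, {"small": 2.0}]
optimal_policy, optimal_values = policy_iteration(["small", "small"], P, C, alpha=0.9)
print(optimal_policy, [round(v, 2) for v in optimal_values])
```

In this toy problem the first improvement step changes the action in state 0, and the second leaves the policy unchanged, which is the stopping condition of step d. Keeping the current action unless another is strictly cheaper avoids flipping between actions of equal cost.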

C. RESULT AND DISCUSSION
1. Frequency Distribution Table
The data on the amount of raw material inventory for two years in Table 1 are presented as a frequency distribution in Table 4. Table 4 describes the frequency distribution of the raw material inventory used, and contains the raw materials ordering demand, its frequency, and the probability for each interval class. The frequency ($f$) is the number of data points falling in each interval class, namely the number of raw materials used, and the sum of the frequencies is the total number of data points ($n$). The raw materials ordering demand ($d$) is taken as the upper edge of each interval class, and its probability is $P(d) = f/n$. For example, the probability of raw materials ordering demand for the first interval class (16-20 kg) is $P(20) = 5/24 \approx 0.21$. The probabilities for the other intervals are obtained in the same way.
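The relative-frequency calculation can be sketched as follows. Only the first class count (5 out of 24 observations) comes from the text; the other five counts are hypothetical placeholders chosen to sum to 24, since Table 4 is not reproduced here.

```python
from fractions import Fraction

# Upper class edge in kg -> frequency.
# Only frequencies[20] = 5 is taken from the text; the rest are HYPOTHETICAL.
frequencies = {20: 5, 25: 4, 30: 6, 35: 3, 40: 4, 45: 2}

n = sum(frequencies.values())  # total number of monthly observations
probabilities = {d: Fraction(f, n) for d, f in frequencies.items()}

for d, p in probabilities.items():
    print(f"P({d}) = {p} = {float(p):.2f}")
```

Using `Fraction` keeps the probabilities exact, so they sum to exactly 1 before any rounding to two decimals.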

2. Analysis of the Markov Decision Process
The Markov decision process has two elements, namely the state ($i$) and the decision ($k$). In this case the state is the initial inventory ($i$) and the decision is the ordering-level alternative ($k$). Two assumptions are drawn from Table 4. First, there are 6 possible states for the initial inventory ($i$): 0 kg, 5 kg, 10 kg, 15 kg, 20 kg, and 25 kg, where 5 kg is the distance between the upper edges of consecutive interval classes. Second, the ordering-level alternatives ($k$) are 20 kg, 25 kg, 30 kg, 35 kg, 40 kg, and 45 kg, which are the upper edges of the interval classes, and the raw materials ordering demand ($d$) takes the same values as the ordering-level alternatives. Table 5 lists the initial inventory states ($i$) and the ordering-level alternatives ($k$) available in each state.

Table 5. Initial inventory ($i$) and ordering-level alternatives ($k$), in kg
Initial inventory | Ordering-level alternatives
 0                | 20  25  30  35  40  45
 5                | 20  25  30  35  40  -
10                | 20  25  30  35  -   -
15                | 20  25  30  -   -   -
20                | 20  25  -   -   -   -
25                | 20  -   -   -   -   -

For example, if the initial inventory ($i$) is 0 kg, then there are 6 alternative choices for the ordering level ($k$), namely orders of 20 kg, 25 kg, 30 kg, 35 kg, 40 kg, and 45 kg. The total in each row (initial inventory plus order) must be less than or equal to 45 kg, because Table 1 shows that the maximum amount of raw material inventory is 45 kg. A "-" entry in Table 5 means that the ordering-level alternative is not available in that state.
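The rule behind Table 5 (initial inventory plus order may not exceed 45 kg) can be written as a short filter. The names `CAPACITY`, `states`, `order_levels`, and `alternatives` are illustrative, not from the source.

```python
CAPACITY = 45                      # maximum inventory observed in Table 1 (kg)
states = range(0, 30, 5)           # initial inventory i: 0, 5, ..., 25 kg
order_levels = range(20, 50, 5)    # ordering-level alternatives k: 20, ..., 45 kg

# Admissible alternatives per state: keep k only when i + k <= CAPACITY.
alternatives = {i: [k for k in order_levels if i + k <= CAPACITY] for i in states}

for i, ks in alternatives.items():
    print(f"i = {i:2d} kg -> k in {ks}")
```

Each state loses one alternative per 5 kg of initial inventory, which reproduces the triangular shape of Table 5.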

3. Transition Probability, Shortage Cost, and Total Cost
The probability of moving from initial inventory state $i$ to initial inventory state $j$ under ordering-level alternative ($k$) is $p_{ij}(k)$, where state $i$ transitions to state $j = i + k - d$. The probabilities are arranged in a square matrix with elements $p_{ij}(k) = P(d)$, where $P(d)$ is the probability of raw materials ordering demand $d$. For each initial inventory ($i$) and each ordering-level alternative ($k$), the shortage cost can be calculated using equation (1). The total cost $C_i(k)$ for each initial inventory ($i$) at each ordering-level alternative ($k$) can be calculated using equation (2).
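One row of the transition matrix can be built directly from the rule $j = i + k - d$, as in the sketch below. Two things here are assumptions rather than facts from the source: the demand probabilities (only $P(20) = 5/24$ is reported in the text, the rest are hypothetical), and the treatment of demand that exceeds the available stock, which this sketch sends to state 0.

```python
from fractions import Fraction
from typing import Dict, List

STATES: List[int] = [0, 5, 10, 15, 20, 25]  # initial inventory levels in kg

# HYPOTHETICAL demand distribution; only P(20) = 5/24 is taken from the text.
demand_probs: Dict[int, Fraction] = {
    20: Fraction(5, 24), 25: Fraction(4, 24), 30: Fraction(6, 24),
    35: Fraction(3, 24), 40: Fraction(4, 24), 45: Fraction(2, 24),
}


def transition_row(i: int, k: int) -> Dict[int, Fraction]:
    """Return {j: p_ij(k)} for an admissible pair with i + k <= 45."""
    row = {j: Fraction(0) for j in STATES}
    for d, p in demand_probs.items():
        # ASSUMPTION: demand above the stock i + k empties the inventory (j = 0).
        j = max(i + k - d, 0)
        row[j] += p
    return row


row_full = transition_row(0, 45)   # stock 45 kg: every demand level is covered
row_short = transition_row(0, 20)  # stock 20 kg: any demand empties the stock
print(row_full)
print(row_short)
```

With a stock of 45 kg each demand level maps to a distinct next state, so the row simply reorders the demand probabilities; with a stock of 20 kg all probability mass collapses onto state 0 under the stated assumption.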
The following steps calculate the transition probability $p_{ij}(k)$, the shortage cost, and the total cost $C_i(k)$ using Table 4 and Table 5 for an initial inventory ($i$) of 0 kg. From Table 5, if the initial inventory is 0 kg, then there are 6 alternative choices for the ordering level ($k$), namely orders of 20 kg, 25 kg, 30 kg, 35 kg, 40 kg, and 45 kg, and the raw materials ordering demand ($d$) ranges over 20 kg, 25 kg, 30 kg, 35 kg, 40 kg, and 45 kg. The calculation over the raw materials ordering demand ($d$) stops when $j = 0$, which for $i = 0$ means $d = k$. The probability values can be seen in Table 4. The total cost required for an initial inventory ($i$) of 0 kg at the ordering-level alternative ($k$) of 20 kg is IDR 887.900. In the same way, the results of the calculation of the transition probability, shortage cost, and total cost for each initial inventory with ordering-level alternatives ($k$) of 20 kg, 25 kg, 30 kg, 35 kg, 40 kg, and 45 kg are summarized in Table 6, Table 7, and Table 8.
Table 6 shows the transition probability from one initial inventory state to another under the selected policy. A value of zero indicates that the transition is not possible. This is because the quantity of raw materials ordered is a positive variable, so the next state cannot be smaller than the initial inventory state.
Table 7 shows the shortage cost for each decision taken when the initial inventory is known. The decisions available for each initial inventory are given in Table 5. The shortage cost becomes smaller for larger ordering-level alternatives.
Table 8 shows the total cost for each policy taken when the current inventory state is known. The total cost is higher when the initial inventory state is higher. This is due to the cost of holding items that remain in storage for future use, in addition to the fixed ordering cost presented in Table 2.

4. Policy Iteration Method with Discount Factor
The optimal solution is obtained using the policy iteration method with a discount factor in the following steps.
a. Determine the Initial Policy
The meaning of $k = R^{(t)}(i)$ is that if the initial inventory is at level $i$ kg under the initial policy $t = 0$, then the ordering-level alternative chosen is $k$ kg. The probability matrix and the minimum total cost matrix are constructed for this initial policy.
b. Evaluate the Routine Policy
Evaluating the routine policy means determining $V_i^{(t)}$ for each initial inventory ($i$) as the solution to the linear equations given by equation (3). This yields 6 linear equations, one for each initial inventory state.
c. Improvements to the Routine Policy
Improving the routine policy means determining a new policy $R^{(t+1)}$. Since the initial policy is $t = 0$, the new policy is $t + 1 = 1$. An ordering-level alternative $k = R^{(t+1)}(i)$ is then found for each initial inventory ($i$) using equation (4).
For example, for $i = 5$ kg, $V_5^{(0)}$ = IDR 45.323.000, and when $k = 40$ kg we get $C_5(k) + \alpha \sum_{j} p_{5j}(k)\, V_j^{(0)}$ = IDR 45.323.000. Because the new policy ($t = 1$) is the same as the initial policy ($t = 0$), the iteration is stopped. The decisions for each initial inventory ($i$) with each ordering-level alternative ($k$) are therefore optimal, with optimal costs $C_0(45)$ = IDR 860.000, $C_5(40)$ = IDR 875.000, $C_{10}(35)$ = IDR 890.000, $C_{15}(30)$ = IDR 905.000, $C_{20}(25)$ = IDR 920.000, and $C_{25}(20)$ = IDR 935.000. This means:
1) If the initial inventory ($i$) is at the level of 0 kg, then the optimal ordering-level alternative ($k$) is 45 kg with a minimum cost of IDR 860.000.
2) If the initial inventory ($i$) is at the level of 5 kg, then the optimal ordering-level alternative ($k$) is 40 kg with a minimum cost of IDR 875.000.
3) If the initial inventory ($i$) is at the level of 10 kg, then the optimal ordering-level alternative ($k$) is 35 kg with a minimum cost of IDR 890.000.
4) If the initial inventory ($i$) is at the level of 15 kg, then the optimal ordering-level alternative ($k$) is 30 kg with a minimum cost of IDR 905.000.
5) If the initial inventory ($i$) is at the level of 20 kg, then the optimal ordering-level alternative ($k$) is 25 kg with a minimum cost of IDR 920.000.
6) If the initial inventory ($i$) is at the level of 25 kg, then the optimal ordering-level alternative ($k$) is 20 kg with a minimum cost of IDR 935.000.
The calculation results show that for each initial inventory state ($i$), the ordering-level alternative that must be selected to produce the minimum cost is $45 - i$ kg. The resulting minimum cost increases with the initial inventory, from IDR 860.000 at 0 kg to IDR 935.000 at 25 kg. This can happen because of the storage costs and shortage costs that enter the total cost. This behavior is similar to that found in the previous study by (Noorida, 2003).
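The pattern in the reported results can be checked with simple arithmetic. The six (order, cost) pairs below are the values reported above; the observation that the cost equals IDR 860.000 plus IDR 3.000 per kg of initial inventory is a regularity read off these figures, not a formula stated in the source.

```python
# Reported optimal decisions: initial inventory i (kg) -> (order k in kg, cost in IDR).
reported = {
    0: (45, 860_000), 5: (40, 875_000), 10: (35, 890_000),
    15: (30, 905_000), 20: (25, 920_000), 25: (20, 935_000),
}

STORAGE_COST_PER_KG = 3_000  # IDR/kg, from Table 3

# Does every optimal order fill the inventory up to 45 kg?
orders_fill_to_45 = all(i + k == 45 for i, (k, _) in reported.items())

# Does the cost rise by the storage cost for every kg already in stock?
cost_gaps = {i: cost - 860_000 for i, (_, cost) in reported.items()}
linear_in_storage = all(gap == STORAGE_COST_PER_KG * i for i, gap in cost_gaps.items())

print(orders_fill_to_45, linear_in_storage, cost_gaps)
```

Both checks hold for the reported figures: the optimal order always tops the stock up to 45 kg, and each additional 5 kg of initial inventory adds IDR 15.000 to the minimum cost.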

D. CONCLUSION AND SUGGESTIONS
This research has obtained the optimal solution using the Markov decision process with the policy iteration method. The policy iteration method can reach an optimal solution in a small number of iterations. In this study, the first improvement step left the initial policy unchanged, so no further iteration was needed and the optimal solution was reached immediately for the data in this case. Further research could consider other operational factors, such as the time needed to make a product and the honorarium expenses of producing it.