Regression Analysis in Excel

# Introduction

Regression analysis is a technique that researchers use to examine relationships between predictor and response variables. The technique applies to different scenarios, depending on the scope of the research being conducted. For example, Wilson, Keating, and Beal (2016) argue that simple linear regression, multivariate regression, and logistic regression could each be applied when trying to establish relationships between variables. Regression analysis is therefore important to researchers in many fields, which implies that supply chain and logistics managers could apply the technique in their profession. Specifically, the technique could be applied to identify the relationship between supply chain management, logistics, and organizational performance, a topic that Bavarsad, Azizi, and Alesadi (2013) have studied. This paper examines the application of regression analysis in supply chain and logistics management using a practical example and a publication.

# Applications to Logistics

Regression analysis could be used to examine whether two forms of supply chain complexity influence inventory levels in an organization. To be specific, the technique could be applied to determine whether the number of stocking locations and the number of stock-keeping units influence the level of inventories within an organization. The independent variables for such a research problem are the number of stocking locations and the number of stock-keeping units, while the dependent variable is the level of inventories in the organization. A hypothesis should be formulated from the problem statement in order to identify the relationship between the variables in question. Consequently, the null hypothesis for this problem statement is that the number of stocking locations and the number of stock-keeping units have no influence on the level of inventories within an organization.
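
Stated formally, and assuming the linear model used later in this paper, where B_1 is taken to be the coefficient on stock-keeping units and B_2 the coefficient on stocking locations, the hypothesis could be written as follows. The alternative hypothesis H_1 is not spelled out in the text and is added here as the usual complement of the null.

$$H_0: B_1 = B_2 = 0 \qquad \text{versus} \qquad H_1: B_j \neq 0 \ \text{for at least one } j \in \{1, 2\}$$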

Applying regression analysis to the problem statement above may not be as easy as it sounds. The indicators used to measure the level of inventory and the number of stock-keeping units must be identified before data are collected, because the measure of stock-keeping units may need to be weighted according to the size and price of the products and services in a given firm. Nevertheless, the problem statement could aid business decision-making because it would help explain the variation in the level of inventory. For instance, if a strong positive relationship between the predictor and response variables is established, management could control the level of inventory by controlling the number of stock-keeping units and stocking locations. Specifically, management could raise or lower the level of inventory by raising or lowering the number of stocking locations and stock-keeping units.

Management often needs to vary the level of inventory depending on market conditions, and the problem statement would help management make appropriate adjustments. In simple terms, management could increase the number of stock-keeping units and stocking locations when demand for the product is high, which would increase the level of inventory. Similarly, management could reduce the number of stock-keeping units and stocking locations with the aim of reducing the level of inventory. Alternatively, a model developed through regression analysis could be used to predict the level of inventory for a given number of stocking locations and stock-keeping units. How to make such a prediction is explained in the Hypothesis Test Example section below.

# Limitations of Regression Analysis

Regression analysis has four main limitations, which could make the technique challenging to apply depending on the nature of the data and the scope of the research. First, Mayes and Shank (2015) argue that regression analysis should not be used unless all the necessary assumptions are satisfied. However, a violated assumption can often be addressed by transforming the variables under study with the aim of achieving linearity and constant variance. The Box-Cox procedure could be used for this purpose because it indicates how to transform the variables depending on the data. Second, regression analysis could be hampered by highly correlated independent variables, which make parameter estimation unreliable. This problem could be reduced by removing one of the correlated variables from the model. Assumption violations and parameter estimation problems can therefore cause difficulties, but both can be addressed.

Further, regression analysis is limited by multicollinearity. Simply put, predictor variables that are highly correlated may distort the results of the t-test and the *F* test, because highly correlated variables may produce parameter estimates whose signs or sizes contradict what is expected. The problem could be addressed by running a correlation analysis on all the independent variables. After the analysis, one variable from each pair of highly correlated variables should be removed from the model to reduce multicollinearity. As mentioned before, the nature of the data could also cause problems in regression analysis. Specifically, practical data often contain missing values, which could be handled through imputation techniques such as nearest neighbor imputation or ordinary least squares imputation. Finally, the data could be categorical, in which case the categorical variables should be coded as dummy variables.
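
The correlation screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predictor names, the data values, and the 0.8 cut-off are all invented here and do not come from the paper, and the appropriate threshold in practice is a judgment call.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold.

    The threshold of 0.8 is an assumed rule of thumb, not a value from the paper.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(predictors.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical data: "sku_weighted" is exactly twice "sku_count",
# so the two columns are perfectly correlated.
predictors = {
    "sku_count": [10, 12, 15, 18, 20, 25],
    "sku_weighted": [20, 24, 30, 36, 40, 50],
    "stocking_locations": [3, 7, 2, 9, 4, 6],
}

flagged = correlated_pairs(predictors)
print(flagged)
```

One variable from each flagged pair would then be a candidate for removal before the regression is run.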

# Hypothesis Test Example

As mentioned earlier, an example hypothesis test was conducted to demonstrate how regression analysis could be used. A fictitious data set was used owing to the time that collecting real data would require. A sample of the data is shown in figure 1 of the appendix. Before the test was conducted, the researcher hypothesized that the number of stocking locations and the number of stock-keeping units have no influence on the level of inventories within an organization. The test diagram in figure 1 below illustrates the same hypothesis. The hypothesis was tested using the following regression model.

$$\text{Level of inventory} = B_{0} + B_{1}(\text{stock-keeping units}) + B_{2}(\text{stocking locations})$$

Figure 1: Test diagram

To apply the test, regression analysis was run in Microsoft Excel after enabling the Analysis ToolPak add-in. This was done by clicking File, Options, and then Add-ins, selecting the Analysis ToolPak, and clicking OK. The next step involved clicking Data, then Data Analysis, and selecting the Regression option. This opens a dialog box that asks for the Y variable, which was entered by highlighting all the observations that represent the level of inventory. The same dialog box also asks for the X variables, which were entered by selecting all the observations that represent stocking locations and stock-keeping units. All the check boxes were selected before clicking OK, which produced the regression output. The results of the analysis are shown in figure 2 below.

Figure 2: Regression Output.
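
For readers who want to see what the Excel tool computes, the following is a minimal Python sketch of an ordinary least squares fit with an intercept and two predictors, solved through the normal equations. It is an illustration only: the function names are hypothetical, and the data are invented (generated exactly from y = 5 + 2·sku + 0.5·locations) rather than taken from the paper's data set.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [B0, B1, ...] minimizing the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented observations: (stock-keeping units, stocking locations)
rows = [(1, 10), (2, 14), (3, 9), (4, 20), (5, 12), (6, 18)]
y = [5 + 2 * sku + 0.5 * loc for sku, loc in rows]

b0, b1, b2 = fit_ols(rows, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the invented response contains no noise, the fit recovers the generating coefficients; real data would leave residuals, which the sections below examine.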

From the results in figure 2 above, the significance value of the *F* statistic (Significance F) is $3.26 \times 10^{-221}$, which is less than 0.05. According to Carlberg (2016), the *F* statistic is used to check the overall significance of the model. Because the significance value of the *F* statistic is less than 0.05, the model could be used for making inferences. Similarly, the R squared and the adjusted R squared measure the share of the variation in the dependent variable that is explained by the predictor variables. From figure 2 above, the adjusted R squared is 0.994, implying that 99.4 percent of the variation in the level of inventories is explained by the variation in the number of stock-keeping units and stocking locations. The same results indicate that the intercept, stock-keeping units, and stocking locations are significant predictors of the level of inventory because they all have p values that are less than 0.05.
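
The statistics discussed above follow from standard formulas: R squared is 1 − SS_res/SS_tot, adjusted R squared is 1 − (1 − R²)(n − 1)/(n − k − 1), and the *F* statistic is (R²/k) / ((1 − R²)/(n − k − 1)), where n is the number of observations and k the number of predictors. A minimal Python sketch follows. The observed and predicted values are invented for illustration and are not the paper's data, and the formulas presume that the predictions come from a least squares fit with an intercept.

```python
def fit_statistics(observed, predicted, num_predictors):
    """Compute R squared, adjusted R squared, and the overall F statistic."""
    n = len(observed)
    mean_y = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained
    ss_tot = sum((o - mean_y) ** 2 for o in observed)                # total
    r_squared = 1 - ss_res / ss_tot
    df_res = n - num_predictors - 1  # residual degrees of freedom
    adjusted = 1 - (1 - r_squared) * (n - 1) / df_res
    f_stat = (r_squared / num_predictors) / ((1 - r_squared) / df_res)
    return {
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
        "f_statistic": f_stat,
    }


# Hypothetical inventory levels and model predictions
observed = [10.0, 12.0, 15.0, 19.0, 24.0, 26.0]
predicted = [10.5, 11.5, 15.5, 18.5, 23.5, 26.5]

stats = fit_statistics(observed, predicted, num_predictors=2)
print(stats)
```

The adjusted value is always at most the unadjusted one because it penalizes each additional predictor, which is why it is the figure usually quoted for models with more than one predictor.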

The model assumptions for regression analysis were also checked. For instance, the assumption that the error terms are normally distributed was checked using the normal probability plot in figure 3 below. From the figure, the error terms appear to be normally distributed because the points lie along the trend line, falling below it at the beginning of the plot and above it at the end. The independence of observations assumption was checked by examining the correlation coefficient between stock-keeping units and stocking locations. The correlation between the two variables was 0.56, which indicates a moderate positive relationship. Because the correlation is moderate rather than high, the independence of observations was taken to hold.

Figure 3: Normality of the error terms

In addition, regression analysis requires that the relationship between the independent and dependent variables be linear. Consequently, a line graph with fitted trend lines was used to confirm that the level of inventory has a linear relationship with stock-keeping units and stocking locations, as shown in figure 4 below. The relationship is taken to be linear because the trend line produced for each independent variable is straight. The assumption of homogeneity of variances was checked by plotting the level of inventories against its residuals. The results, shown in figure 5 below, indicate homogeneity of variances because the residuals are evenly distributed below and above the trend line.

Figure 4: Linearity

Figure 5: Homogeneity of variances

Finally, the assumption of no significant outliers was checked using figure 4 above. The results indicate the presence of a few outliers in the stocking location variable. However, the number of outliers above the trend line is almost equal to the number below it. It follows that the effects of the outliers roughly cancel out, implying that they did not have a significant effect on the model. Model fit was also checked by computing the errors, obtained by subtracting the predicted level of inventories from the observed level, and taking the standard deviation of those errors. This resulted in a standard deviation of 0.164, which implies that the model's predictions are close to the observed outcomes. All the necessary assumptions for regression analysis were therefore checked to confirm that the test met the required conditions. It follows that the data reject the null hypothesis, implying that stocking locations and the number of stock-keeping units are significant predictors of the level of inventory.
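
The model-fit check described above, in which the errors are computed and their standard deviation taken, can be sketched as follows. The observed and predicted values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the paper's data, and the sample standard deviation (n − 1 in the denominator) is an assumption about how the figure was obtained.

```python
from statistics import mean, stdev


def residual_summary(observed, predicted):
    """Return the residuals (observed minus predicted), their mean, and their
    sample standard deviation."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return {
        "residuals": residuals,
        "mean": mean(residuals),
        "std_dev": stdev(residuals),  # sample standard deviation (n - 1)
    }


# Hypothetical inventory levels and the corresponding model predictions
observed = [100.0, 110.0, 120.0, 130.0]
predicted = [100.1, 109.8, 120.2, 129.9]

summary = residual_summary(observed, predicted)
print(round(summary["std_dev"], 4))
```

A standard deviation that is small relative to the scale of the inventory levels suggests that predictions sit close to the observed values.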

From the analysis, the level of inventories could be predicted by the following equation:

$$\text{Level of inventory} = 50.23 + 28.04(\text{stock-keeping units}) + 0.31(\text{stocking locations})$$

This implies that management could use the model to predict the level of inventory that could be achieved based on the stock-keeping units and number of stocking locations available to the organization. For instance, supposing the organization has 2 stock-keeping units and 68 stocking locations, management could predict the level of inventory by substituting these values into the equation above.

$$\text{Level of inventory} = 50.23 + 28.04(2) + 0.31(68)$$

$$\text{Level of inventory} \approx 127.46$$

Therefore, the model predicts an inventory level of approximately 127.46 for 2 stock-keeping units and 68 stocking locations. (With the rounded coefficients shown, the sum is 127.39; the small gap is consistent with rounding of the coefficients.) This illustrates that regression analysis could be used to make useful predictions and to inform business decision-making.
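
The prediction step can also be expressed as a small Python function. One assumption needs to be stated loudly: the coefficient on stock-keeping units is taken here as +28.04, because only a positive sign yields a value close to the 127.46 reported for this example. With the rounded coefficients the function returns about 127.39, and the small difference from 127.46 is consistent with rounding of the coefficients. The function and constant names are hypothetical.

```python
# Rounded coefficients from the fitted equation.
INTERCEPT = 50.23
COEF_SKU = 28.04        # ASSUMPTION: positive sign, see the note above
COEF_LOCATIONS = 0.31


def predict_inventory(stock_keeping_units, stocking_locations):
    """Predicted level of inventory from the fitted linear model."""
    return (
        INTERCEPT
        + COEF_SKU * stock_keeping_units
        + COEF_LOCATIONS * stocking_locations
    )


prediction = predict_inventory(stock_keeping_units=2, stocking_locations=68)
print(round(prediction, 2))
```

Management could call the same function with other candidate values to compare the inventory levels the model implies for different configurations.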

# Example from Literature

The relevance of regression analysis to supply chain and logistics managers is illustrated by the study of Bavarsad, Azizi, and Alesadi (2013). Specifically, the authors used regression analysis to examine the relationship between supply chain management, logistics, and organizational performance. The researchers first conducted a thorough literature survey on the topic, which helped them identify the variables and establish how those variables could be measured. The literature survey also helped in formulating hypotheses for the study, because the hypotheses were based on evidence from previous studies. Consequently, Bavarsad, Azizi, and Alesadi (2013, p. 1312) formulated six hypotheses that guided the study.

In particular, Bavarsad, Azizi, and Alesadi (2013, p. 1312) made the following hypotheses: the relationship between logistics performance and supply chain management strategy is significant; the relationship between marketing performance and supply chain management strategy is significant; and the relationship between financial performance and supply chain management strategy is significant. Bavarsad, Azizi, and Alesadi (2013, p. 1312) also hypothesized that the relationship between marketing performance and logistics performance is significant; the relationship between financial performance and logistics performance is significant; and the relationship between financial performance and marketing performance is significant. The authors thus focused on the role that logistics performance and supply chain management play in marketing performance and financial performance. It follows that the study intended to describe the nature of the relationship between organizational performance, supply chain management, and logistics management.

Based on the hypotheses mentioned above, Bavarsad, Azizi, and Alesadi (2013, p. 1314) concluded that there is a weak positive relationship between supply chain management strategy and both marketing performance (r = 0.445) and organizational performance (r = 0.489). However, the authors established a moderate positive relationship between supply chain management strategy and logistics performance (r = 0.56). The researchers also established a strong, positive, and significant relationship between financial performance and marketing performance (r = 0.826). This implies that the financial performance of an organization increases as logistics performance, marketing performance, and supply chain management strategy improve. It follows that supply chain and logistics management are important to businesses because they help determine the profitability of an organization. Therefore, organizations should acknowledge the need to invest in their supply chain and logistics departments because of their effect on financial performance.

# Conclusion

In conclusion, this paper discusses the application of regression analysis and the limitations of the technique, provides an example of a hypothesis test, and reviews a publication that applies the technique. Regression analysis could be used to identify relationships between variables that influence business decision-making, as well as to predict outcomes that could inform business decisions. The technique has limitations, including multicollinearity, the nature of the data, violation of regression assumptions, and correlated variables. However, solutions to these problems are discussed, implying that there are ways to work around the limitations. A worked hypothesis test is presented using a fictitious data set. Moreover, the paper reviews a publication by Bavarsad, Azizi, and Alesadi (2013) by discussing its problem statement, hypotheses, and results.

References

Bavarsad, B., Azizi, D. H. A., and Alesadi, F. J., 2013. Study of Relationship between Supply Chain Management Strategy with Logistics Performance and Organizational Performance. *Interdisciplinary Journal of Contemporary Research in Business*, 4(9): 1308-1317.

Carlberg, C., 2016. *Regression analysis Microsoft Excel*. Place of publication not identified: Que.

Mayes, T. R., and Shank, T. M., 2015. *Financial analysis with Microsoft Excel*. Boston, MA, USA: Cengage Learning.

Wilson, J. H., Keating, B., and Beal, M., 2016. *Regression analysis: Understanding and building business and economic models using Excel*. New York, NY: Business Expert Press.

Appendix

Figure 1