Ivan William Taylor, PhD: Belief in Modelling and Simulation

Recently I read Nate Silver's book The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. Silver develops models and uses them to make predictions. He suggests that if a model's developer thinks the model is good, he or she should be willing to bet on its predictions.

I also read Daniel Kahneman's book Thinking, Fast and Slow. Kahneman discusses Philip Tetlock's book Expert Political Judgment: How Good Is It? How Can We Know?, which suggests that even the best experts in political estimation and forecasting are no more accurate than fairly simple mathematical models of their own estimative processes. This is yet another confirmation of what Robyn Dawes termed "the robust beauty of improper linear models." The inability of human experts to outperform models based on their expertise has been demonstrated in more than one hundred fields over fifty years of research; it is one of the most robust findings in social science.

In an earlier post, I mentioned PRICE Systems Inc., a company that uses linear regression to estimate the acquisition cost of military equipment. In "The Mother of All Guesses", a paper written by Francois Melesse and David Rose on behalf of the Armed Forces Comptroller, the authors suggest that linear regression can be used not only to estimate the cost of new military equipment but also to put a confidence interval around that estimate.

Here is an example of how this can be done based on a sample of 13 observations.

Cost ($ million)	X1	X2	X3	X4
52.7	55	9	20	13
73.5	49	15	27	9
61.4	44	11	15	15
32.6	43	7	8	6
28.9	38	7	11	1
47.4	38	8	14	4
40.5	37	5	10	14
21.4	28	4	6	4
15.4	26	2	4	4
37.5	24	6	6	6
57.1	21	5	6	4
21.1	19	3	3	4
20.0	10	1	2	4

Running a linear regression in Microsoft Excel on these data, they obtained the following statistics.

Regression Statistics
Multiple R	0.91694416
R Square	0.84078659
Adjusted R Square	0.76117989
Standard Error	8.89208319
Observations	13

ANOVA
	df	SS	MS	F	Significance F
Regression	4	3340.43608	835.109	10.56176	0.00280361
Residual	8	632.553147	79.06914
Total	12	3972.98923

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	21.2250048	8.29795744	2.557859	0.033759	2.08988065	40.36013
X1	-0.72742589	0.40687546	-1.78783	0.111608	-1.6656824	0.210831
X2	4.22796199	1.91636066	2.206245	0.058422	-0.1911736	8.647098
X3	0.70018905	1.13353292	0.617705	0.553942	-1.9137425	3.314121
X4	1.18724	0.70677474	1.6798	0.131508	-0.4425855	2.817065
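For readers without Excel, the same least-squares fit can be reproduced in a few lines. This is a minimal sketch, assuming Python with NumPy installed; the variable names are illustrative and not taken from the Melesse and Rose paper. It should recover the coefficients, R Square, Adjusted R Square, Standard Error and F statistic shown in the output above.

```python
import numpy as np

# The 13 observations from the table: cost ($ million), then X1..X4.
data = np.array([
    [52.7, 55,  9, 20, 13],
    [73.5, 49, 15, 27,  9],
    [61.4, 44, 11, 15, 15],
    [32.6, 43,  7,  8,  6],
    [28.9, 38,  7, 11,  1],
    [47.4, 38,  8, 14,  4],
    [40.5, 37,  5, 10, 14],
    [21.4, 28,  4,  6,  4],
    [15.4, 26,  2,  4,  4],
    [37.5, 24,  6,  6,  6],
    [57.1, 21,  5,  6,  4],
    [21.1, 19,  3,  3,  4],
    [20.0, 10,  1,  2,  4],
])
y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])  # intercept column first

# Ordinary least squares: choose coef to minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape                        # 13 observations, 5 parameters
df_res = n - p                        # 8 residual degrees of freedom
residuals = y - X @ coef
ss_res = residuals @ residuals        # Residual SS in the ANOVA table
ss_tot = ((y - y.mean()) ** 2).sum()  # Total SS in the ANOVA table
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / df_res
std_err = np.sqrt(ss_res / df_res)    # Excel's "Standard Error"
f_stat = ((ss_tot - ss_res) / (p - 1)) / (ss_res / df_res)

print("coefficients:", np.round(coef, 4))
print(f"R^2={r2:.4f}  adj R^2={adj_r2:.4f}  SE={std_err:.4f}  F={f_stat:.3f}")
```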

So the regression equation is

Cost = 21.225 - 0.727(X1) + 4.228(X2) + 0.700(X3) + 1.187(X4)

Our new equipment has the following values for X1, X2, X3 and X4.

X1 = 33, X2 = 5, X3 = 8, X4 = 1

So the predicted cost is

Cost = 21.225 - 0.727(33) + 4.228(5) + 0.700(8) + 1.187(1) = 25.15 (using the unrounded coefficients from the Excel output)

That is, $25.15 million.
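The prediction is simply the regression equation evaluated at the new equipment's X values. The small sketch below (the function name predict_cost is hypothetical) shows that the $25.15 million figure comes from the full-precision Excel coefficients; plugging in the three-decimal coefficients from the printed equation gives $25.16 million instead, a difference that is only rounding.

```python
# Coefficients (intercept, X1..X4) copied from the Excel output.
coefficients = [21.2250048, -0.72742589, 4.22796199, 0.70018905, 1.18724]
# The same coefficients rounded to three decimals, as in the printed equation.
rounded = [21.225, -0.727, 4.228, 0.700, 1.187]

def predict_cost(coefs, x):
    """Cost in $ million = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4."""
    b0, *slopes = coefs
    return b0 + sum(b * xi for b, xi in zip(slopes, x))

new_equipment = [33, 5, 8, 1]  # X1, X2, X3, X4 for the new system

print(round(predict_cost(coefficients, new_equipment), 2))  # 25.15
print(round(predict_cost(rounded, new_equipment), 2))       # 25.16
```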

The standard error of the regression is $8.89 million.

So, based on the normal distribution, the true value will lie within 1.645 standard errors of the estimate 90% of the time. Thus, the 90% confidence interval on the estimate is

[25.15 - 1.645(8.89), 25.15 + 1.645(8.89)]

= [$10.52 million, $39.78 million].
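The interval arithmetic can be checked the same way. The sketch below, again assuming NumPy, first reproduces the normal-based interval above (estimate plus or minus 1.645 times the standard error of the regression). As a point of comparison only, and not the method used in this post, it also computes a textbook OLS prediction interval: it replaces 1.645 with the t value for 8 residual degrees of freedom (about 1.860) and inflates the standard error by sqrt(1 + h), where h is the leverage of the new equipment's X values. That interval is always wider, which matters with only 13 observations.

```python
import numpy as np

# Same 13 observations: cost ($ million), X1..X4.
data = np.array([
    [52.7, 55, 9, 20, 13], [73.5, 49, 15, 27, 9], [61.4, 44, 11, 15, 15],
    [32.6, 43, 7, 8, 6],   [28.9, 38, 7, 11, 1],  [47.4, 38, 8, 14, 4],
    [40.5, 37, 5, 10, 14], [21.4, 28, 4, 6, 4],   [15.4, 26, 2, 4, 4],
    [37.5, 24, 6, 6, 6],   [57.1, 21, 5, 6, 4],   [21.1, 19, 3, 3, 4],
    [20.0, 10, 1, 2, 4],
])
y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape
s = np.sqrt(((y - X @ coef) ** 2).sum() / (n - p))  # about 8.89

x0 = np.array([1.0, 33, 5, 8, 1])  # leading 1 for the intercept, then X1..X4
y0 = x0 @ coef                      # point estimate, about 25.15

# 1) The post's approach: normal distribution, estimate +/- 1.645 * s.
z = 1.645
normal_interval = (y0 - z * s, y0 + z * s)

# 2) Comparison only: t critical value for n - p = 8 df, and the standard
#    error of a new observation, which adds the uncertainty in the coefficients.
t_crit = 1.8595                     # t(0.95, df=8), from standard t tables
leverage = x0 @ np.linalg.solve(X.T @ X, x0)
se_pred = s * np.sqrt(1 + leverage)
t_interval = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)

print(f"normal approx.: [{normal_interval[0]:.2f}, {normal_interval[1]:.2f}]")  # [10.52, 39.78]
print(f"t prediction:   [{t_interval[0]:.2f}, {t_interval[1]:.2f}]")
```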

I would suggest that this estimate, based on a linear regression model of past cost data, would be a better prediction of the final cost than the expert opinion of a project manager, who is subject to optimism bias. You can bet on it.

Ivan William Taylor, PhD

Friday, 29 March 2013
