Recently I read Nate Silver's book
The Signal and the Noise: Why So Many Predictions Fail but Some Don't. Silver develops models and uses them to make
predictions. He suggests that if the developer of a model thinks
the model is good, he or she should be willing to bet on the
predictions that it makes.
I also read Daniel Kahneman's book
Thinking, Fast and Slow. Kahneman discusses Philip Tetlock's book
Expert Political Judgment: How Good Is It? How Can We Know?,
which suggests that the best experts at
making political estimates and forecasts are no more accurate than
fairly simple mathematical models of their estimative processes. This
is yet another confirmation of what Robyn Dawes termed "the
robust beauty of improper linear models." The inability of human
experts to outperform models based on their expertise has been
demonstrated in over one hundred fields of expertise over fifty years
of research, making it one of the most robust findings in social science.
In an earlier post, I mentioned the
company PRICE Systems Inc., which uses linear regression to estimate the
acquisition cost of military equipment. In a paper written for the
Armed Forces Comptroller called "The Mother of All Guesses", Francois
Melesse and David Rose suggest that linear
regression can be used not only to estimate the cost of new
military equipment but also to estimate a confidence interval around
the cost estimate.
Here is an example of how this can
be done based on a sample of 13 observations.
Cost ($ million) | X1 | X2 | X3 | X4 |
52.7 | 55 | 9 | 20 | 13 |
73.5 | 49 | 15 | 27 | 9 |
61.4 | 44 | 11 | 15 | 15 |
32.6 | 43 | 7 | 8 | 6 |
28.9 | 38 | 7 | 11 | 1 |
47.4 | 38 | 8 | 14 | 4 |
40.5 | 37 | 5 | 10 | 14 |
21.4 | 28 | 4 | 6 | 4 |
15.4 | 26 | 2 | 4 | 4 |
37.5 | 24 | 6 | 6 | 6 |
57.1 | 21 | 5 | 6 | 4 |
21.1 | 19 | 3 | 3 | 4 |
20.0 | 10 | 1 | 2 | 4 |
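For readers who would rather not use a spreadsheet, here is a minimal sketch of fitting the same four-variable model by ordinary least squares in plain Python. It assumes the 13 rows above are transcribed correctly. The helper names (`solve`, `fit_ols`, `predict`) are my own, not from the paper. If the transcription is right, the printed coefficients and standard error should come out close to what a spreadsheet regression reports for this data.

```python
import math

# (cost in $ million, X1, X2, X3, X4), transcribed from the table above.
ROWS = [
    (52.7, 55, 9, 20, 13), (73.5, 49, 15, 27, 9), (61.4, 44, 11, 15, 15),
    (32.6, 43, 7, 8, 6), (28.9, 38, 7, 11, 1), (47.4, 38, 8, 14, 4),
    (40.5, 37, 5, 10, 14), (21.4, 28, 4, 6, 4), (15.4, 26, 2, 4, 4),
    (37.5, 24, 6, 6, 6), (57.1, 21, 5, 6, 4), (21.1, 19, 3, 3, 4),
    (20.0, 10, 1, 2, 4),
]


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Return [intercept, b1, b2, b3, b4] from the normal equations X'X b = X'y."""
    X = [[1.0] + [float(v) for v in row[1:]] for row in rows]
    y = [row[0] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, xs):
    """Evaluate the fitted linear model at one set of X values."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], xs))


coef = fit_ols(ROWS)
residuals = [row[0] - predict(coef, row[1:]) for row in ROWS]
sse = sum(e * e for e in residuals)          # residual sum of squares
dof = len(ROWS) - len(coef)                  # 13 observations - 5 parameters
std_error = math.sqrt(sse / dof)             # standard error of the regression
mean_cost = sum(row[0] for row in ROWS) / len(ROWS)
sst = sum((row[0] - mean_cost) ** 2 for row in ROWS)  # total sum of squares

print("coefficients:", [round(c, 4) for c in coef])
print("standard error:", round(std_error, 4))
print("R square:", round(1 - sse / sst, 4))
```

Solving the normal equations directly is fine for a small, well-behaved data set like this one. For larger or nearly collinear data, a dedicated least-squares routine would be a safer choice.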
Using the Microsoft Excel linear regression application, they found the following statistics.
Regression Statistics |
Multiple R | 0.91694416 |
R Square | 0.84078659 |
Adjusted R Square | 0.76117989 |
Standard Error | 8.89208319 |
Observations | 13 |
ANOVA |
Source | df | SS | MS | F | Significance F |
Regression | 4 | 3340.43608 | 835.109 | 10.56176 | 0.00280361 |
Residual | 8 | 632.553147 | 79.06914 | | |
Total | 12 | 3972.98923 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 21.2250048 | 8.29795744 | 2.557859 | 0.033759 | 2.08988065 | 40.36013 | 2.089881 | 40.36013 |
X Variable 1 | -0.72742589 | 0.40687546 | -1.78783 | 0.111608 | -1.6656824 | 0.210831 | -1.66568 | 0.210831 |
X Variable 2 | 4.22796199 | 1.91636066 | 2.206245 | 0.058422 | -0.1911736 | 8.647098 | -0.19117 | 8.647098 |
X Variable 3 | 0.70018905 | 1.13353292 | 0.617705 | 0.553942 | -1.9137425 | 3.314121 | -1.91374 | 3.314121 |
X Variable 4 | 1.18724 | 0.70677474 | 1.6798 | 0.131508 | -0.4425855 | 2.817065 | -0.44259 | 2.817065 |
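The summary figures in this output all follow from the two sums of squares in the ANOVA table. The short sketch below recomputes them from those sums using the standard textbook definitions. The variable names are mine, and the inputs are copied from the output above.

```python
import math

n = 13   # observations
k = 4    # predictor variables (X1..X4)

# Sums of squares copied from the ANOVA table.
ss_regression = 3340.43608
ss_residual = 632.553147
ss_total = ss_regression + ss_residual

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k            # mean square, 4 df
ms_residual = ss_residual / (n - k - 1)      # mean square, 8 df
std_error = math.sqrt(ms_residual)           # "Standard Error" in the output
f_stat = ms_regression / ms_residual

print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adj_r_square:.6f}")
print(f"Standard Error    {std_error:.5f}")
print(f"F                 {f_stat:.4f}")
```

The gap between R Square and Adjusted R Square reflects how few residual degrees of freedom (8) are left after fitting five parameters to 13 observations.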
So the regression equation is
Cost = 21.225 - 0.727(X1) + 4.228(X2) + 0.700(X3) + 1.187(X4)
Our new equipment has the following
values for X1, X2, X3 and X4.
X1 = 33, X2 = 5, X3 = 8, X4 = 1
So the predicted cost is
Cost = 21.225 - 0.727(33) + 4.228(5) + 0.700(8) + 1.187(1) = 25.15
That is, $25.15 million.
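The same arithmetic as a small Python sketch, using the unrounded coefficients from the regression output. The dictionary layout and the function name `predict_cost` are just illustrative choices. Note that the three-decimal coefficients give about 25.16 rather than 25.15; the 25.15 figure corresponds to the full-precision coefficients.

```python
# Coefficients copied from the regression output (full precision).
COEF = {
    "intercept": 21.2250048,
    "X1": -0.72742589,
    "X2": 4.22796199,
    "X3": 0.70018905,
    "X4": 1.18724,
}

# Coefficients rounded to three decimals, as in the equation in the text.
COEF_ROUNDED = {"intercept": 21.225, "X1": -0.727, "X2": 4.228, "X3": 0.700, "X4": 1.187}

new_equipment = {"X1": 33, "X2": 5, "X3": 8, "X4": 1}


def predict_cost(coef, item):
    """Return the predicted cost in $ million for one set of X values."""
    return coef["intercept"] + sum(coef[name] * value for name, value in item.items())


cost = predict_cost(COEF, new_equipment)
cost_rounded = predict_cost(COEF_ROUNDED, new_equipment)

print(f"full-precision coefficients: {cost:.2f}")
print(f"rounded coefficients:        {cost_rounded:.2f}")
```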
The standard error of the regression is $8.89 million.
So, based on the normal distribution, about 90% of the time the true value will lie within 1.645 standard errors of the estimate. Thus, the 90% confidence interval on the estimate is
[25.15 - 1.645(8.89), 25.15 + 1.645(8.89)]
= [$10.52 million, $39.78 million].
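Here is a minimal sketch of the interval calculation in Python. The normal-based interval follows the text. The second interval is an addition of mine, not something from the paper: with only 8 residual degrees of freedom, many textbooks would use a Student's t critical value (about 1.860 for a two-sided 90% interval with 8 df, taken from standard t tables) instead of 1.645. Both versions ignore the extra uncertainty in the fitted coefficients, so they should be read as rough approximations.

```python
from statistics import NormalDist

estimate = 25.15          # predicted cost, $ million
std_error = 8.89208319    # standard error of the regression, $ million

# Two-sided 90% interval: 5% in each tail, so use the 95th percentile.
z = NormalDist().inv_cdf(0.95)   # about 1.645
low, high = estimate - z * std_error, estimate + z * std_error

# Assumption (not from the paper): t critical value for 8 degrees of freedom,
# two-sided 90%, hard-coded from a standard t table.
T_CRIT_8DF = 1.860
low_t, high_t = estimate - T_CRIT_8DF * std_error, estimate + T_CRIT_8DF * std_error

print(f"normal-based 90% interval: [{low:.2f}, {high:.2f}]")
print(f"t-based 90% interval:      [{low_t:.2f}, {high_t:.2f}]")
```

The t-based interval is somewhat wider, which is what one would expect from a model fitted to only 13 observations.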
I would suggest that this estimate, based on a linear regression model of past cost data, would be a better prediction of the final cost than the expert opinion of a project manager, who is subject to optimism bias. You can bet on it.