Friday 29 March 2013

Belief in Modelling and Simulation

Recently I read Nate Silver's book The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. Silver develops models and uses them to make predictions. He suggests that a developer who thinks his or her model is good should be willing to bet on the predictions it makes.

I also read Daniel Kahneman's book Thinking, Fast and Slow. Kahneman discusses Philip Tetlock's book Expert Political Judgment: How Good Is It? How Can We Know?, which suggests that even the best experts at making political estimates and forecasts are no more accurate than fairly simple mathematical models of their estimative processes. This is yet another confirmation of what Robyn Dawes termed "the robust beauty of improper linear models." The inability of human experts to outperform models based on their expertise has been demonstrated in over one hundred fields of expertise over fifty years of research; it is one of the most robust findings in social science.

In an earlier post, I mentioned the company PRICE Systems Inc, which uses linear regression to estimate the acquisition cost of military equipment. In a paper for the Armed Forces Comptroller called "The Mother of All Guesses", Francois Melesse and David Rose suggest that linear regression can be used not only to estimate the cost of a new piece of military equipment but also to put a confidence interval around that cost estimate.

Here is an example of how this can be done based on a sample of 13 observations.

Cost ($ million)    X1    X2    X3    X4
            52.7    55     9    20    13
            73.5    49    15    27     9
            61.4    44    11    15    15
            32.6    43     7     8     6
            28.9    38     7    11     1
            47.4    38     8    14     4
            40.5    37     5    10    14
            21.4    28     4     6     4
            15.4    26     2     4     4
            37.5    24     6     6     6
            57.1    21     5     6     4
            21.1    19     3     3     4
            20.0    10     1     2     4

Using the Microsoft Excel linear regression application, they found the following statistics.
 
Regression Statistics

Multiple R           0.91694416
R Square             0.84078659
Adjusted R Square    0.76117989
Standard Error       8.89208319
Observations         13

ANOVA

              df    SS            MS          F           Significance F
Regression     4    3340.43608    835.109     10.56176    0.00280361
Residual       8    632.553147    79.06914
Total         12    3972.98923

                Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       21.2250048     8.29795744       2.557859   0.033759   2.08988065   40.36013    2.089881      40.36013
X Variable 1    -0.72742589    0.40687546       -1.78783   0.111608   -1.6656824   0.210831    -1.66568      0.210831
X Variable 2    4.22796199     1.91636066       2.206245   0.058422   -0.1911736   8.647098    -0.19117      8.647098
X Variable 3    0.70018905     1.13353292       0.617705   0.553942   -1.9137425   3.314121    -1.91374      3.314121
X Variable 4    1.18724        0.70677474       1.6798     0.131508   -0.4425855   2.817065    -0.44259      2.817065

So the regression equation is

Cost = 21.225 - 0.727(X1) + 4.228(X2) + 0.700(X3) + 1.187(X4) 
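
For readers without Excel, the fit can be sketched in plain Python. This is only an illustration, not the authors' method (they used Excel): it solves the ordinary least squares normal equations with a small hand-written Gaussian elimination, and the names data, solve, beta and so on are my own assumptions. If the table above has been transcribed correctly, the printed values should be comparable with the Excel output.

```python
# Data from the table above: cost in $ million, then X1, X2, X3, X4.
data = [
    (52.7, 55, 9, 20, 13),
    (73.5, 49, 15, 27, 9),
    (61.4, 44, 11, 15, 15),
    (32.6, 43, 7, 8, 6),
    (28.9, 38, 7, 11, 1),
    (47.4, 38, 8, 14, 4),
    (40.5, 37, 5, 10, 14),
    (21.4, 28, 4, 6, 4),
    (15.4, 26, 2, 4, 4),
    (37.5, 24, 6, 6, 6),
    (57.1, 21, 5, 6, 4),
    (21.1, 19, 3, 3, 4),
    (20.0, 10, 1, 2, 4),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


y = [row[0] for row in data]
X = [[1.0, *row[1:]] for row in data]  # leading 1.0 is the intercept column
n, p = len(X), len(X[0])

# Normal equations: (X'X) beta = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)  # [intercept, b1, b2, b3, b4]

fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
mean_y = sum(y) / n
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
df_resid = n - p                            # 13 observations - 5 parameters
std_error = (ss_resid / df_resid) ** 0.5    # Excel's "Standard Error"
r_square = 1 - ss_resid / ss_total

print("coefficients:", [round(b, 4) for b in beta])
print("R Square:", round(r_square, 4))
print("Standard Error:", round(std_error, 4))
```

The normal equations are fine for a small, well-behaved sample like this one; for larger or nearly collinear data, a library routine based on QR or SVD would be the safer choice.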

Our new equipment has the following values for X1, X2, X3 and X4.

X1 = 33, X2 = 5, X3 = 8, X4 = 1

So the predicted cost is

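Cost = 21.225 - 0.727(33) + 4.228(5) + 0.700(8) + 1.187(1) = 25.15

That is, $25.15 million. (This figure uses the unrounded coefficients from the Excel output; the rounded coefficients shown here give 25.16.)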
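
The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the coefficients are copied from the Excel output, while the names coefficients, new_item and predict_cost are illustrative choices of mine.

```python
# Unrounded coefficients copied from the Excel output above.
coefficients = {
    "Intercept": 21.2250048,
    "X1": -0.72742589,
    "X2": 4.22796199,
    "X3": 0.70018905,
    "X4": 1.18724,
}

# Values for the new equipment, as given in the post.
new_item = {"X1": 33, "X2": 5, "X3": 8, "X4": 1}


def predict_cost(coefs, item):
    """Return the regression estimate in $ million: intercept + sum of coefficient * value."""
    return coefs["Intercept"] + sum(coefs[name] * value for name, value in item.items())


predicted = predict_cost(coefficients, new_item)
print(f"Predicted cost: ${predicted:.2f} million")  # prints "Predicted cost: $25.15 million"
```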

The standard error is $8.89 million.

So based on the normal distribution, 90% of the time the true value will lie within 1.645 standard errors of the estimate. Thus, the 90% confidence interval on the estimate is

[25.15 - 1.645(8.89), 25.15 + 1.645(8.89)]

= [$10.52 million, $39.78 million].
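
The interval can be reproduced with a short Python sketch. It follows the normal approximation used in this post, with the unrounded estimate and standard error; the function name normal_interval is my own. With only 8 residual degrees of freedom, an interval based on the t distribution would be wider, but that is not computed here.

```python
predicted = 25.14851278   # regression estimate in $ million (unrounded)
std_error = 8.89208319    # "Standard Error" from the Excel output
z_90 = 1.645              # two-sided 90% normal quantile, as used in the post


def normal_interval(estimate, se, z):
    """Return (low, high) = estimate -/+ z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


low, high = normal_interval(predicted, std_error, z_90)
print(f"90% interval: [${low:.2f} million, ${high:.2f} million]")
# prints "90% interval: [$10.52 million, $39.78 million]"
```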

I would suggest that this estimate, based on a linear regression model of past cost data, would be a better prediction of the final cost than the expert opinion of the project manager, who is subject to optimism bias. You can bet on it.
