Sunday 31 March 2013

Modelling and Simulation as Thought Experiments

In my last post, I talked about the potential use of linear regression in cost estimation.  Linear regression is a simple type of model.  I spent much of my career building and using simulation models and mathematical models to predict system behaviour.  These models, though sometimes complicated, were simplifications of the real world that could be solved using a computer.

I recently read Jim Manzi's new book, Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. In it, he discusses the value of randomly assigning subjects to test and control groups in experiments.

Manzi begins the book with a summary of the history of experimentation. He notes that the success of the physical sciences is directly related to the fact that the problems in those fields have low causal density. 

Manzi suggests that social science has made relatively little progress because predicting human behaviour involves high causal density and holistic integration.

Manzi says that randomized field trials, which have proven useful in clinical medicine, are being applied successfully by modern businesses. He suggests that they should be applied more widely in social science and public policy. In his view, randomized field trials are the only scientific way to determine whether the findings of social science research are valid and whether proposed public policies will have the desired effect.

In Manzi's opinion, theory and experimentation form a continuous cycle of knowledge development, but they are quite separate activities.  Theories can be developed in any manner one wishes; experimentation, by contrast, involves a rigorous method that includes test and control groups and the ability to conduct replications.

By this reasoning, modelling and simulation can be considered an elaborate form of theory development.

To verify the predictions of models and simulations, one would need to conduct randomized field trials in the real world.

Friday 29 March 2013

Belief in Modelling and Simulation

Recently I read Nate Silver's book The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. Silver develops models and uses them to make predictions. He suggests that if the developers of a model think it is good, they should be willing to bet on the predictions it makes.

I also read Daniel Kahneman's book Thinking, Fast and Slow. Kahneman discusses Philip Tetlock's book Expert Political Judgment: How Good Is It? How Can We Know?, which suggests that the best experts in making political estimates and forecasts are no more accurate than fairly simple mathematical models of their estimative processes. This is yet another confirmation of what Robyn Dawes termed "the robust beauty of improper linear models." The inability of human experts to outperform models built from their own expertise has been demonstrated in over one hundred fields of expertise across fifty years of research, making it one of the most robust findings in social science.

In an earlier post, I mentioned the company PRICE Systems, which uses linear regression to estimate the acquisition cost of military equipment. In a paper in the Armed Forces Comptroller called "The Mother of All Guesses", Francois Melese and David Rose suggest that linear regression can be used not only to estimate the cost of a new piece of military equipment but also to place a confidence interval around that estimate.

Here is an example of how this can be done based on a sample of 13 observations.

  Cost ($ million)   X1   X2   X3   X4
              52.7   55    9   20   13
              73.5   49   15   27    9
              61.4   44   11   15   15
              32.6   43    7    8    6
              28.9   38    7   11    1
              47.4   38    8   14    4
              40.5   37    5   10   14
              21.4   28    4    6    4
              15.4   26    2    4    4
              37.5   24    6    6    6
              57.1   21    5    6    4
              21.1   19    3    3    4
              20.0   10    1    2    4

Using Microsoft Excel's linear regression tool, they obtained the following statistics.
 
Regression Statistics
  Multiple R          0.91694416
  R Square            0.84078659
  Adjusted R Square   0.76117989
  Standard Error      8.89208319
  Observations        13

ANOVA
              df           SS          MS          F   Significance F
  Regression   4   3340.43608     835.109   10.56176       0.00280361
  Residual     8   632.553147    79.06914
  Total       12   3972.98923

             Coefficients   Standard Error     t Stat    P-value    Lower 95%   Upper 95%
  Intercept    21.2250048       8.29795744   2.557859   0.033759   2.08988065    40.36013
  X1          -0.72742589       0.40687546   -1.78783   0.111608   -1.6656824    0.210831
  X2           4.22796199       1.91636066   2.206245   0.058422   -0.1911736    8.647098
  X3           0.70018905       1.13353292   0.617705   0.553942   -1.9137425    3.314121
  X4              1.18724       0.70677474     1.6798   0.131508   -0.4425855    2.817065

So the regression equation is

Cost = 21.225 - 0.727(X1) + 4.228(X2) + 0.700(X3) + 1.187(X4) 
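
The same fit can be reproduced outside Excel. Here is a minimal sketch in Python using the statsmodels library (my choice of tool, not the authors'); it should recover the coefficients, R Square and Standard Error shown above.

  import numpy as np
  import statsmodels.api as sm

  # Thirteen historical observations: cost ($ million) followed by X1, X2, X3, X4
  data = np.array([
      [52.7, 55,  9, 20, 13], [73.5, 49, 15, 27,  9], [61.4, 44, 11, 15, 15],
      [32.6, 43,  7,  8,  6], [28.9, 38,  7, 11,  1], [47.4, 38,  8, 14,  4],
      [40.5, 37,  5, 10, 14], [21.4, 28,  4,  6,  4], [15.4, 26,  2,  4,  4],
      [37.5, 24,  6,  6,  6], [57.1, 21,  5,  6,  4], [21.1, 19,  3,  3,  4],
      [20.0, 10,  1,  2,  4],
  ])
  y = data[:, 0]                      # cost in $ million
  X = sm.add_constant(data[:, 1:])    # prepend a column of ones for the intercept
  fit = sm.OLS(y, X).fit()            # ordinary least squares, as in Excel

  print(fit.params)                   # intercept and coefficients for X1..X4
  print(fit.rsquared)                 # R Square (about 0.84)
  print(np.sqrt(fit.mse_resid))       # standard error of the regression (about 8.89)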

Our new equipment has the following values for X1, X2, X3 and X4.

X1 = 33, X2 = 5, X3 = 8, X4 = 1

So the predicted cost is

Cost = 21.225 - 0.727(33) + 4.228(5) + 0.700(8) + 1.187(1) = 25.15

That is, $25.15 million.
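
Continuing the Python sketch above, the same figure comes straight out of the fitted model (new_x is just my name for the input vector; the leading 1 matches the intercept column added by add_constant):

  new_x = np.array([[1.0, 33, 5, 8, 1]])   # intercept term, then X1, X2, X3, X4
  predicted_cost = fit.predict(new_x)[0]
  print(round(predicted_cost, 2))          # about 25.15, i.e. $25.15 million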

The standard error is $8.89 million.

So, based on the normal distribution, 90% of the time the true value will lie within 1.645 standard errors of the estimate.  Thus, the 90% confidence interval on the estimate is

[25.15 – 1.645(8.89), 25.15 + 1.645(8.89)] 

= [$10.52 million, $39.78 million].
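
Continuing the sketch, the same interval follows directly, using the 1.645 normal multiplier from the text:

  se = np.sqrt(fit.mse_resid)              # regression standard error, about 8.89
  z = 1.645                                # two-sided 90% point of the normal distribution
  low, high = predicted_cost - z * se, predicted_cost + z * se
  print(round(low, 2), round(high, 2))     # about 10.52 and 39.78 ($ million)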

I would suggest that this estimate, based on a linear regression model of past cost data, would be a better prediction of the final cost than the expert opinion of the project manager, who is subject to optimism bias.  You can bet on it.