y-intercept is not zero and 99.9% sure that the true slope is not zero.10

T A B L E 2-3

Abbreviated Table of t-Statistics

     A           B       C       D
4    Selected t Statistics
5    d.f.\Pr.    0.050   0.025   0.010
6    3           2.353   3.182   4.541
7    8           1.860   2.306   2.896
8    12          1.782   2.179   2.681
9    120         1.658   1.980   2.358
10   Infinity    1.645   1.960   2.326

10. For spreadsheets that do not provide P-values, another way of calculating the statistical significance is to look up the t-statistic in a Student's t-distribution table and find the level of statistical significance that corresponds to the t-statistic obtained in the regression.
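For readers who prefer code to a printed table, the lookup described in the footnote can be sketched with SciPy (an illustrative aside, not part of the spreadsheet workflow; `scipy` is assumed to be installed):

```python
# Sketch: reproducing the critical values in Table 2-3 with SciPy.
# t.ppf(1 - p, df) returns the critical value for a one-tail
# probability p at df degrees of freedom; the Infinity row of the
# table corresponds to the standard normal distribution.
from scipy.stats import norm, t

for df in (3, 8, 12, 120):
    criticals = [t.ppf(1 - p, df) for p in (0.050, 0.025, 0.010)]
    print(df, [round(c, 3) for c in criticals])

# Infinity row: the t-distribution converges to the normal
print("inf", [round(norm.ppf(1 - p), 3) for p in (0.050, 0.025, 0.010)])
```

Running this reproduces each row of Table 2-3, e.g. 2.306 for 8 degrees of freedom at the 0.025 level, the value used later in this section.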

PART 1 Forecasting Cash Flows

36

The F test is another method of testing the null hypothesis. In multivariable regressions, the F-statistic measures whether the independent variables as a group explain a statistically significant portion of the variation in Y.
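As a quick numerical illustration, the F-statistic is the regression mean square divided by the residual mean square. Using the ANOVA values that appear later in this chapter in Table 2-4:

```python
# Illustrative check of the F-statistic using the ANOVA rows of
# Table 2-4: F = MS(regression) / MS(residual).
ms_regression = 3.35e10   # Regression mean square (1 df)
ms_residual = 4.68e7      # Residual mean square (3 df)

f_stat = ms_regression / ms_residual
print(round(f_stat))  # approximately 716, matching the F in Table 2-4
```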

We interpret the confidence intervals as follows: there is a 95% probability that true fixed costs (the y-intercept) fall between $22,496 (F33) and $91,045 (G33); similarly, there is a 95% probability that the true variable cost (the slope coefficient) falls between $0.77 (F34) and $0.84 (G34).

The denominator of equation (2-6) is called the standard error of b, or s_b. The standard error of the Y-estimate, which is defined as

s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}

is $16,014 (B23). The larger the amount of scatter of the points around the regression line, the greater the standard error.11
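A minimal sketch of this calculation in Python follows; the cost figures are hypothetical stand-ins, and the point is only the n − 2 divisor in the formula:

```python
import math

def standard_error_of_estimate(actual, fitted):
    """Standard error of the Y-estimate: sqrt(SSE / (n - 2))."""
    n = len(actual)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, fitted))
    return math.sqrt(sse / (n - 2))

# Hypothetical actual costs vs. regression-fitted costs
actual = [100.0, 210.0, 290.0, 405.0]
fitted = [105.0, 200.0, 300.0, 400.0]
print(standard_error_of_estimate(actual, fitted))
```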

Precise Confidence Intervals12

Earlier in the chapter, we estimated 95% confidence intervals by subtracting and adding two standard errors of the y-estimate around the regression estimate. In this section, we demonstrate how to calculate precise 95% confidence intervals around the regression estimate using the equations:

t_{0.025} \, s \, \sqrt{\frac{1}{n} + \frac{x_0^2}{\sum x_i^2}}    (2-8)

95% confidence interval for the mean forecast

t_{0.025} \, s \, \sqrt{1 + \frac{1}{n} + \frac{x_0^2}{\sum x_i^2}}    (2-9)

95% confidence interval for a specific year's forecast
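In code, equations (2-8) and (2-9) differ only by the extra 1 under the square root. A sketch follows; the function and argument names are ours, not from the text, and the rounded x₀² and Σx² values shown give only an approximate interval:

```python
import math

def ci_halfwidth(t_crit, s, n, x0_sq, sum_x_sq, specific_year=False):
    """Half-width of a 95% confidence interval per equations (2-8)
    and (2-9); the specific-year interval adds 1 under the root."""
    inside = (1.0 if specific_year else 0.0) + 1.0 / n + x0_sq / sum_x_sq
    return t_crit * s * math.sqrt(inside)

# The specific-year interval (2-9) is always wider than the mean
# interval (2-8), because of the extra 1 inside the square root.
mean_hw = ci_halfwidth(2.306, 16014, 10, 3.73e11, 1.28e12)
spec_hw = ci_halfwidth(2.306, 16014, 10, 3.73e11, 1.28e12, specific_year=True)
print(mean_hw < spec_hw)  # True
```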

In the context of forecasting adjusted costs as a function of sales, equation (2-8) is the formula for the 95% confidence interval for the mean adjusted cost, while equation (2-9) is the 95% confidence interval for the costs in a particular year. We will explain what that means at the end of this section, after we present some material that illustrates this in Table 2-1B, page 2.

Note that these confidence intervals are different from those in equation (2-7), which was around the forecast slope only, i.e., b. In this section,

11. This standard error of the Y-estimate applies to the mean of our estimate of costs, i.e., the average error if we estimate adjusted costs and expenses many times. This is appropriate in valuation, as a valuation is a forecast of net income and/or cash flows for an infinite number of years. The standard error, and hence the 95% confidence interval, for a single year's costs is higher.

12. This section is optional, as the material is somewhat advanced, and it is not necessary to understand this in order to be able to use regression analysis in business valuation. Nevertheless, it will enhance your understanding should you choose to read it.

CHAPTER 2 Using Regression Analysis 37

we are calculating confidence intervals around the entire regression forecast.

The first 15 rows of Table 2-1B, page 2, are identical to the first page and require no explanation. The $989,032 in B16 is the average of the 10 years of sales in B6–B15.

Column D is the deviation of each observation from the mean, which is the sales in Column B minus the mean sales in B16. For example, D6 (−$739,032) is equal to B6 ($250,000) minus B16 ($989,032), and D7 (−$489,032) equals B7 ($500,000) minus B16 ($989,032). The total of all deviations from the mean must always equal zero, which it does (D16).

Column E is the squared deviations, i.e., the square of Column D. In statistics, the independent variable(s) is known as X, while the deviations from the mean are known as x, which explains the column labels in B5 and D5. The sum of squared deviations,

\sum_{i=1988}^{1997} x_i^2

equals 1.28 × 10^12 (E16).
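The mechanics of Columns D and E can be sketched as follows. The sales figures here are hypothetical stand-ins (Table 2-1B lists only the first two values in this excerpt), so only the structure of the calculation carries over:

```python
# Sketch of Columns D and E: deviations from the mean and their
# squares. Sales figures below are hypothetical, not from Table 2-1B.
sales = [250.0, 500.0, 750.0, 1000.0]
mean = sum(sales) / len(sales)

deviations = [x - mean for x in sales]   # Column D: x = X - mean
squared = [d ** 2 for d in deviations]   # Column E: x squared

# Deviations from the mean always sum to zero, as in cell D16
print(round(sum(deviations), 10), sum(squared))
```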

The next step is to compute the squared deviation for our sample forecast year. We assume that forecast sales for 1998 are $1.6 million (B17). We repeat the coefficients from the regression formula from the first page of the table in B36 and B37. Applying the regression equation, we would then forecast expenses at $1,343,928 (C17).

In order to compute a 95% confidence interval around the expense forecast of $1,343,928, we apply equations (2-8) and (2-9). 1998 forecast sales are $610,968 (D17 = B17 − B16) above the mean of the historical period. That is the x_0 in (2-8) and (2-9). We square the term to get 3.73 × 10^11 (E17). Then we divide that by the sum of the squared deviations in the historical period, 1.28 × 10^12 (E16), to get 0.2905650 (F17), which we repeat below in Row 28.

In Row 25, we insert the t-statistic of 2.306, which one can find in a table for a 95% confidence level (the 0.025 column in a two-tailed distribution) and eight degrees of freedom (n = 10 observations − 1 independent variable − 1). In Row 26 we show the standard error of the y-estimate of $16,014, which came from Table 2-1B, B23. Row 27 is 1/n = 1/10 = 0.1, where n is the number of observations.

Row 28 is a repetition of F17, the ratio of the squared deviation of the forecast to the sum of the squared deviations of the independent variables from their mean.

In B29 we add zero, and in C29 we add 1, according to equations (2-8) and (2-9), respectively. We will explain the difference in the two formulas shortly.

In Row 30 we add Rows 27 to 29, which are the terms in the square root sign in the equations. Obviously, C30 = B30 + 1. In Row 31 we take the square root of Row 30.

Finally, we are able to calculate our 95% confidence intervals as Row 25 × Row 26 × Row 31. The 95% confidence interval for the mean is 2.306 × $16,014 × 0.6249520 = $23,078 (B32), approximately 1.44 times the size of the standard error of the y-estimate. The 95% confidence interval for the specific year's cost forecast is $43,547 (C32), approximately 2.72 times the size of the standard error of the y-estimate. The 95% confidence intervals are 1.7% (B33) and 3.2% (C33) of the forecast costs for the mean and the specific year's forecast, respectively.
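The arithmetic in Rows 25 through 32 can be replicated directly. This sketch uses the book's rounded inputs, so the last digit may differ slightly from the spreadsheet:

```python
import math

# Inputs from Table 2-1B, page 2 (Rows 25-28)
t_stat = 2.306       # Row 25: t-statistic for 8 degrees of freedom
s = 16014            # Row 26: standard error of the y-estimate
one_over_n = 0.1     # Row 27: 1/n for n = 10 observations
ratio = 0.2905650    # Row 28: x0^2 / sum of squared deviations (F17)

# Equation (2-8) for the mean; equation (2-9) adds 1 under the root
ci_mean = t_stat * s * math.sqrt(one_over_n + ratio)            # B32
ci_specific = t_stat * s * math.sqrt(1 + one_over_n + ratio)    # C32

print(round(ci_mean), round(ci_specific))  # roughly 23078 and 43547
```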

You can see that the 95% confidence intervals for both the mean and the specific year's forecast cost are roughly two times the standard error of the y-estimate. Statisticians often loosely approximate the 95% confidence intervals as two standard errors below and above the regression estimate. Equations (2-8) and (2-9) are more precise.

Now we will discuss the difference between equations (2-8) and (2-9). We forecast sales to be $1.6 million in 1998, which means that our forecast of adjusted costs for that year according to the regression equation is $1,343,928. Of course, the actual expenses will not equal that number, even if actual sales by some miracle equal forecast sales. The 95% confidence interval for the mean tells us that if we add and subtract $23,078 to our forecast of $1,343,928, then we are 95% sure that the true regression line at sales of $1.6 million falls between $1,320,850 and $1,367,006. If we were to experience sales of $1.6 million many times, say 1,000 times, we would be 95% sure that the average cost would fall in our confidence interval.13 Equation (2-8) is the equation describing this confidence interval.

That does not mean that we are 95% sure that costs would be between $1,320,850 and $1,367,006 in any particular year when sales are $1.6 million. We need a wider confidence interval to be 95% sure of costs in a particular year, given a particular level of sales. Equation (2-9) describes the confidence interval for a particular year.

Thus, the $23,078 confidence interval, meaning that we add and subtract that number from forecast costs, appropriately quantifies our long-run expectation of the confidence interval around forecast costs, given the level of sales. In business valuation we are not very concerned that every individual year conform to our forecasts. Rather, we are concerned with the long-run accuracy of the regression equation. Thus, equation (2-8) is the relevant equation for 95% confidence intervals for valuation analysts. Remember that the confidence interval expands the further we move away from the mean of the historical period. Therefore, if we forecast the costs to go with forecast sales of, say, $5 million in the year 2005, the confidence interval around the cost estimate is wider than the 1.7% (B33) around the 1998 forecast.

Selecting the Data Set and Regression Equation

Table 2-4 is identical to Table 2-1B, except that instead of all 10 years of data, it contains only the last 5 years. The regression equation for the 5 years of data is (Table 2-4, B27 and B28)

Adjusted costs = $71,252 + ($0.79 × Sales)
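As a cross-check, these coefficients (and the standard error of the Y-estimate discussed below) can be recomputed from the five data points listed in Table 2-4. A sketch using NumPy's least-squares fit:

```python
import numpy as np

# Sales and adjusted costs for 1993-1997, from Table 2-4
sales = np.array([1_123_600, 1_191_016, 1_262_477,
                  1_338_226, 1_415_000], dtype=float)
costs = np.array([965_043, 1_012_745, 1_072_633,
                  1_122_714, 1_199_000], dtype=float)

# Ordinary least squares: costs = intercept + slope * sales
slope, intercept = np.polyfit(sales, costs, 1)
print(round(intercept), round(slope, 4))  # approximately 71252 and 0.7924

# Standard error of the Y-estimate: sqrt(SSE / (n - 2))
fitted = intercept + slope * sales
sse = float(np.sum((costs - fitted) ** 2))
s = (sse / (len(sales) - 2)) ** 0.5
print(round(s))  # approximately 6,840, matching Table 2-4
```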

Examining the regression statistics, we find that the adjusted R² is

13. This ignores the need to recompute the regression equation with new data.


T A B L E 2-4

Regression Analysis 1993–1997

     A                   B             C               D         E        F           G
4    Year                Sales         Adjusted Costs
5    1993                $1,123,600    $965,043
6    1994                $1,191,016    $1,012,745
7    1995                $1,262,477    $1,072,633
8    1996                $1,338,226    $1,122,714
9    1997                $1,415,000    $1,199,000
11   SUMMARY OUTPUT
13   Regression Statistics
14   Multiple R          99.79%
15   R square            99.58%
16   Adjusted R square   99.44%
17   Standard error      6,840
18   Observations        5
20   ANOVA
21                       df            SS              MS        F        Significance F
22   Regression          1             3.35E+10        3.35E+10  716      1.15E-04
23   Residual            3             1.40E+08        4.68E+07
24   Total               4             3.36E+10
26                       Coefficients  Standard Error  t Stat    P-value  Lower 95%   Upper 95%
27   Intercept [1]       71,252        37,624          1.89      0.15     (48,485)    190,989
28   Sales [2]           0.79          0.03            26.75     0.00     0.70        0.89

Regression Plot

[Scatter plot of Adjusted Costs against Sales for 1993–1997, with fitted trendline y = 0.7924x + 71252, R² = 0.9958]

[1] This is the regression estimate of fixed costs
[2] This is the regression estimate of variable costs


99.44% (B16), still indicating an excellent relationship. We do see a difference in the t-statistics for the two regressions.

The t-statistic for the intercept is now 1.89 (D27), indicating it is no longer significant at the 95% level, whereas it was 3.82 in Table 2-1B. Another effect of fewer data is that the 95% confidence interval for the intercept value is −$48,485 (F27) to $190,989 (G27), a range of $239,475. In addition, the t-statistic for the slope coefficient, while still significant, has fallen from 56.94 (Table 2-1B, D34) to 26.75 (D28). The 95% confidence interval for the slope now becomes $0.70 (F28) to $0.89 (G28), a range that is 3½ times greater than that in Table 2-1B and indicates much more uncertainty in the variable cost than we obtain using 10 years of data.

The standard error of the Y-estimate, however, decreases from $16,014 (Table 2-1B, B23) to $6,840. This indicates that decreasing the number of data points improves the Y-estimate, an opposite result from all of the preceding. Why?

Earlier, we pointed out that using only a small range for the independent variable leads to a small denominator in the variance of b, i.e.,

\frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}

which leads to larger confidence intervals. However, larger data sets (using more years of data) tend to lead to a larger standard error of the y-estimate, s. As we mentioned earlier,

s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}

where \hat{Y}_i are the forecast (regression-fitted) costs, Y_i are the historical costs, and n is the number of observations.14 Thus, we often have a trade-off in deciding how many years of data to include in the regression. More years of data lead to better confidence intervals, but fewer years may lead to smaller standard errors of the y-estimate.

Table 2-4 was constructed to demonstrate that you should evaluate all of the regression statistics carefully to determine if the relationship is sufficiently strong to merit using it and which data set is best to use. Simply looking at the adjusted R² value is insufficient; all the regression statistics should be evaluated in their entirety, as an improvement in one may be counterbalanced by a deterioration in another. Therefore, it is best to test different data sets and compare all of the regression statistics to select the regression equation that represents the best overall relationship between the variables.

14. We divide by n − 2 instead of n because it takes two points to determine a line. If we only had two years of historical data, we could determine a regression line, but we would know absolutely nothing about the variance around the line. It takes a minimum of three years of data to be able to say anything at all about how well the regression line fits the data, and three years is usually insufficient. It is much better to have at least five years of data, though four years can often suffice.


PROBLEMS WITH USING REGRESSION ANALYSIS FOR FORECASTING COSTS

Although regression analysis is a powerful tool, its blind application can lead to serious errors. Various problems can be encountered, and one should be cognizant of the limitations of this technique. Aside from the obvious problems of poor fit and insufficient data, structural changes in the company can also invalidate the historical relationship of sales and costs.

Insufficient Data

Insufficient data leads to increased error in the regression, which in turn will lead to increased error in the forecast data. As mentioned previously, to optimize the regression equation it is best to examine overlapping data sets to determine which gives the best results.

Substantial Changes in Competition or Product/Service

Although regression analysis is applicable in most situations, substantial structural changes in a business may render it inappropriate. As mentioned previously, the appraiser can often compensate for changes in the competitive environment by making pro forma adjustments to historical sales, keeping costs the same. However, when a company changes its business, the past is less likely to be a good indicator of what may occur in the future, depending on the significance of the change.

USING REGRESSION ANALYSIS TO FORECAST SALES

Table 2-5 is an example of using regression techniques to forecast sales. In order to do this, it must be reasonable to assume that past performance is a reasonable indicator of future expectations. If there are fundamental changes in the industry that render the past a poor indicator of the future, then regression may be useless and even quite misleading. As cautioned by Pratt, Reilly, and Schweihs (1996), blind application of regression, where past performance is the sole indicator of future sales, can be misleading and incorrect. Instead, careful analysis is required to determine whether past income-generating forces will be duplicated in the future. Nevertheless, regression analysis is often useful as a benchmark in forecasting.

In our example in Table 2-5, the primary independent variable is gross domestic product (GDP), which we show for the years 1988–1998 in billions of dollars in cells B5:B15 (cell references separated by a colon will be our way to indicate contiguous spreadsheet ranges). In C5:C15, we show the square of GDP in billions of dollars, which is our second potential independent variable.15 Our dependent variable is sales, which appears in D5:D15.

15. Another variation of this procedure is to substitute the square root of GDP for its square.
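Setting up a regression with both GDP and GDP² as independent variables can be sketched with NumPy. The GDP and sales figures below are made-up placeholders, not the Table 2-5 data; only the construction of the design matrix is the point:

```python
import numpy as np

# Hypothetical GDP (billions) and company sales -- placeholders only
gdp = np.array([5.1, 5.5, 5.9, 6.3, 6.7, 7.1, 7.5, 7.9, 8.3, 8.7, 9.1])
sales = 2.0 + 3.0 * gdp + 0.5 * gdp ** 2   # synthetic quadratic relationship

# Design matrix: an intercept column, GDP, and GDP squared,
# mirroring the layout of B5:B15 and C5:C15 in Table 2-5
X = np.column_stack([np.ones_like(gdp), gdp, gdp ** 2])
coeffs, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(np.round(coeffs, 6))  # recovers the intercept and both slopes
```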
