Combining quantitative and qualitative data in business cycle research


Introduction
It may be too much of a generalization to state that a schism exists between the protagonists and users of quantitative and qualitative data respectively. However, the main thrust of business cycle research in the United States has been directed towards the construction of quantitative indicators, whereas the accent in Europe has been on the analysis of qualitative survey data. Efforts directed towards a synthesis were mainly due to econometricians trying to improve the forecasting accuracy of models based on quantitative data by introducing qualitative variables into their equations. Our own efforts have shown that quantitative variables tend to suppress qualitative variables when used in combination and that qualitative variables tend to suppress each other due to high levels of multicollinearity (Smit, 1983). On balance, qualitative data have the advantages of early availability, of capturing expectations data, and of providing information on a broader spectrum of variables than traditionally published in quantitative form, whereas quantitative data excel in providing 'richer' or 'fuller' information due to measurement on a higher measurement scale. The problem is how to optimally integrate and use both types of data in economic analyses and forecasting. It is believed that the Kalman filter technique opens new horizons in this regard.
In 1975 Aiginger used the Kalman filter to illustrate how quantitative investment anticipations could be used to correct constants in the investment equation (Aiginger, 1975). This paper reports on research aimed at incorporating qualitative data into a quantitative model via the Kalman filter. The problem is the establishment of a leading indicator for the South African business cycle. Two approaches are described: a quantitative technique used by Barr (1983) and a qualitative technique after Smit & Kotze (1984). The Kalman filter is then used to integrate these approaches.

Measuring the business cycle: van Coller's coinciding indicator
Since 1977 the Bureau for Economic Research at the University of Stellenbosch has been publishing composite leading and coinciding indicators of the South African business cycle on a monthly basis. These indices were developed by van Coller (1980) using the methods of the NBER. The five series constituting the coinciding indicator are as follows: registered unemployment, index of physical volume of manufacturing production, wholesale sales, retail sales, and imports.
This composite coinciding indicator has proved to be of great value in the South African context, and many firms use this variable as an input in their own forecasting models, justifying the effort exerted in forecasting the coinciding indicator. The leading indicator developed by van Coller has, on average, sufficient lead time to be a truly ex-ante indicator, but it has the disadvantage of inconsistent lead times at peaks and troughs, ranging between 3 and 21 months. This has led to efforts to establish an alternative leading indicator showing more consistent timing at peaks and troughs.

An alternative based on quantitative data: Barr's proposal
Barr (1983) applied a stepwise regression method to select, from the 23 variables considered, the set of variables that best predicts van Coller's coinciding indicator. This method, he argued, obviates the need to make ex-ante predictions about the relevance of the variables as leading indicators or to analyse the consistency of turning-point relations, because the variable with the highest correlation will necessarily provide the best average correlation with the business cycle and thus be the most consistent measure of turning points. Furthermore, the method provides weights that are selected automatically in an optimal way and hence obviates the selection of subjective weightings when averaging. The model is fitted over the period 1967 M1 to 1981 M10, and the following explanatory variables contribute towards an R² value of 0,95: new motor cars licensed, imports, commercial bank loans and discounts, and commercial bank demand deposits. Because of a lag of three months between the dependent variable and all of the independent variables, a predicted value for the coinciding indicator can be calculated three months ahead of its realization. However, owing to delays in the publication of quantitative data, data on the variables at time t may only become available after another four months have elapsed, destroying the true ex-ante character of the leading indicator.
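To make the idea of selecting lagged predictors concrete, the following is a minimal sketch in Python. It is not Barr's procedure: the greedy forward selection rule, the `min_gain` threshold, the function names and the synthetic data are all assumptions made for illustration. Only the three-month lag between predictors and the dependent variable is taken from the text.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, lag=3, min_gain=0.01):
    """Greedy forward selection of columns of X, each lagged by `lag` periods.

    Hypothetical stopping rule: stop when the best remaining column
    raises R^2 by less than `min_gain`.
    """
    X_lag, y_now = X[:-lag], y[lag:]          # pair x_{t-lag} with y_t
    chosen, best = [], 0.0
    while len(chosen) < X.shape[1]:
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        scores = {j: r_squared(X_lag[:, chosen + [j]], y_now) for j in rest}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        chosen.append(j_best)
        best = scores[j_best]
    return chosen, best

# Synthetic monthly data: y depends on columns 1 and 4, three months earlier.
rng = np.random.default_rng(0)
n, k, lag = 120, 6, 3
X = rng.normal(size=(n, k))
y = np.zeros(n)
y[lag:] = 2.0 * X[:-lag, 1] - 1.5 * X[:-lag, 4] + 0.1 * rng.normal(size=n - lag)

selected, r2 = forward_select(X, y, lag=lag)
print(sorted(selected), round(r2, 3))
```

Because every predictor enters with the same lag, a fitted equation of this kind yields a value for the dependent variable `lag` periods ahead as soon as the predictors are published, which is the property the publication delay described above erodes.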

An alternative based on qualitative data: Smit and Kotze's proposal
Smit and Kotze (1984) stressed the importance of the psychological disposition of businessmen and argued that the expectations and plans of economic agents provide the link between future and present economic activity. They showed that psychological links do not only occur in the so-called psychological business cycle theories, but that they also feature within the so-called real theories, although not in the same exalted position. The underlying idea is that if changes in moods and expectations should lead to changes in real activity, then in theory it should be possible to predict the development of the business cycle when measurements on psychological variables are available.
They tested this hypothesis using the subjective qualitative expectations and plan data published by the Bureau for Economic Research at the University of Stellenbosch as part of the Opinion Survey and Building Survey results. All ex-ante variables related to expectations (statements about future events outside the direct sphere of influence of the firm) and plans (statements about the firm's own future course of action) are included in the analysis provided that sufficient observations are available.
These qualitative series, when compared to the quantitative series usually used in indicator research, have the disadvantages of being available only on a quarterly instead of a monthly basis and of covering only a few sectors in the economy. Furthermore, about 50% of the series only date back to 1974:4. However, these series have the major advantage of being available three months ahead of the event they relate to.
The variables are measured relative to the same quarter in the previous year (higher, the same, lower or better, the same, worse) with the exception of the variables related to stocks, which are reported as too high, just sufficient or too low. The qualitative measurements are transformed to balance series on the interval [0, 100].
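The text does not state the formula used to obtain the balance series. As an illustration only, the sketch below assumes one common convention: the net balance (percentage answering "higher" minus percentage answering "lower") is rescaled linearly from [-100, 100] to [0, 100]. The function name and the rescaling are assumptions, not the survey's documented method.

```python
def balance_0_100(pct_higher, pct_same, pct_lower):
    """Rescaled net balance on [0, 100] (assumed convention, see lead-in).

    Inputs are the percentages of respondents in each answer category
    and must sum to 100.
    """
    total = pct_higher + pct_same + pct_lower
    if abs(total - 100.0) > 1e-6:
        raise ValueError("answer percentages must sum to 100")
    net = pct_higher - pct_lower          # net balance on [-100, 100]
    return (net + 100.0) / 2.0            # linear rescaling to [0, 100]

# 30% higher, 50% the same, 20% lower
print(balance_0_100(30.0, 50.0, 20.0))
```

Under this convention a value of 50 means that optimistic and pessimistic answers cancel out, and the "same" answers influence the result only through the shares left over for the other two categories.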
It has been argued that it is unnecessary to make ex-ante evaluations about the economic significance of the variables as leading indicators or to analyse the consistency of turning-point relations when a stepwise regression approach is used. Nor is it necessary to select subjective weightings when averaging because these will also be selected optimally. Similar arguments apply in the case where the weighting is done by means of factor and regression analysis. The latter should also take care of the consistency of timing at peaks and troughs. As far as economic significance is concerned, important sectors of the economy are not covered by the survey (amongst others agriculture, mining and banking), but because of the high degree of economic interdependence between the manufacturing sector, the wholesale, retail and motor trade and the sectors which are not surveyed, it may be argued that the subjective opinions expressed in the survey are representative of a much wider field of economic activity.
The high intercorrelations between the different balance series necessitate a reduction in the dimensions of the explanatory data base before regression relationships between the balance series and a quantitative business cycle indicator can be explored. The original data base is reduced to three factors by means of principal components analysis followed by a Varimax factor rotation in order to provide a representation of the factor loadings matrix which facilitates interpretation. This rotation simply transforms the axes retained from the principal components solution into a new orthogonal space of the same dimensions. The procedure is aimed at improving the ease of interpretation of the reference
Table 1 Rotated factor matrix
The total variance accounted for is 77,6%. Unfortunately the object of obtaining a relatively high loading on a single factor only is not achieved, which complicates the identification of the underlying factors. The inverse relationship between the stock variables and the rest of the pro-cyclical variables is emphasized by the above factor loadings.
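The reduction step described above can be sketched as follows. This is an illustrative implementation on synthetic data, not the authors' computation: the data, the number of variables (8) and the function names are assumptions, and the Varimax routine is the usual iterative SVD formulation. The sketch shows that the rotation is orthogonal, so it leaves each variable's communality and the total variance accounted for unchanged.

```python
import numpy as np

def pca_loadings(Z, n_factors):
    """Loadings of the first n_factors principal components of standardized data Z."""
    corr = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]     # indices of the largest ones
    loadings = eigvec[:, order] * np.sqrt(eigval[order])
    explained = eigval[order].sum() / eigval.sum()   # share of total variance
    return loadings, explained

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a loadings matrix L (variables x factors)."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                                   # nearest orthogonal matrix
        if s.sum() < crit * (1 + tol):               # criterion stopped improving
            break
        crit = s.sum()
    return L @ R, R

# Synthetic stand-in for the balance series: 41 quarters, 8 variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(41, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.5 * rng.normal(size=(41, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize each series

loadings, explained = pca_loadings(Z, 3)
rotated, R = varimax(loadings)
print(round(explained, 3))
print(np.round(rotated, 2))
```

Because `R` is orthogonal, the row sums of squared loadings are the same before and after rotation; only the distribution of each variable's loading across the three factors changes, which is what makes the rotated matrix easier or harder to interpret.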
In the factor analysis standardized values of the survey variables are used; the original balance series are therefore standardized over the period 1974:4-1984:4 before the three factors, F1, F2 and F3, are calculated as linear combinations of the survey variables. $CI_t$, van Coller's coinciding indicator averaged on a quarterly basis, is transformed to percentage yearly changes on a quarterly basis, and

$$CIC_t = (CI_t - CI_{t-4}) / CI_{t-4} \times 100$$

is used as the dependent variable in the regression analysis. The rationale for using percentage changes in the case of variables measured relative to the same quarter in the previous year is to be found in the work of Anderson (1952), whilst Smit (1983) demonstrated that the stock variables (measured as too high, just sufficient or too low) correlate significantly with yearly differences in certain components of final demand.
The model is completed by the addition of a second equation which transforms the percentage differences into levels of CI:

$$CI_t = CI_{t-4} + CIC_t \times CI_{t-4} / 100$$

The model is simulated over the sample period (1975:2-1984:4) by means of forecasts one period ahead. This reflects the practical forecasting situation for which the model is designed. It is possible to forecast a value for CI almost three months early and the lead time from forecast to the publication of the data is more than six months.
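The two transformations, from levels of CI to yearly percentage changes and back, are exact inverses of each other. The short sketch below checks this on made-up quarterly values; the numbers and function names are illustrative assumptions only.

```python
import numpy as np

def yearly_pct_change(ci):
    """CIC_t = (CI_t - CI_{t-4}) / CI_{t-4} * 100 for quarterly data."""
    return (ci[4:] - ci[:-4]) / ci[:-4] * 100.0

def level_from_change(ci_lag4, cic):
    """CI_t = CI_{t-4} + CIC_t * CI_{t-4} / 100."""
    return ci_lag4 + cic * ci_lag4 / 100.0

# Eight hypothetical quarterly values of the coinciding indicator.
ci = np.array([100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 111.0, 112.0])

cic = yearly_pct_change(ci)                  # defined from the fifth quarter on
rebuilt = level_from_change(ci[:-4], cic)    # recover the levels

print(cic)
print(rebuilt)
```

In the forecasting setting the second function is applied to a predicted percentage change together with the known level four quarters earlier, which is how a regression on changes produces a forecast of the level.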

The Kalman filter
Meinhold & Singpurwalla (1983) 'demystified' the Kalman filter by putting it in the context of Bayesian inference. Their exposition was as follows.
Given a data set $Y_t, Y_{t-1}, \ldots, Y_1$, where $Y_t$ is dependent on a non-observable quantity $\theta_t$ known as the state of nature and the object is inference about $\theta_t$, the observation equation describes the relationship between $Y_t$ and $\theta_t$ as

$$Y_t = F_t \theta_t + v_t$$

where $\theta_t$ is a scalar or vector independent of the dimension of $Y_t$, and $v_t$ is the observational error, which is assumed to be normally distributed with mean 0 and variance $V_t$.
The state of nature changes over time according to the systems equation:

$$\theta_t = G_t \theta_{t-1} + w_t$$

The systems error is $w_t$, which is also assumed to be normally distributed with mean 0 and variance $W_t$. Furthermore it is assumed that $v_t$ and $w_t$ are independent.
The Kalman filter is a recursive method of inference about $\theta_t$ based on a Bayesian procedure.
(i) At time $t-1$ the knowledge about $\theta_{t-1}$ conditional on the information $Y_{t-1}$ is embodied in the following probability statement:

$$(\theta_{t-1} \mid Y_{t-1}) \sim N(\hat{\theta}_{t-1}, \Sigma_{t-1})$$

where $\hat{\theta}_0$ and $\Sigma_0$ are best guesses for the mean and variance of $(\theta_{t-1} \mid Y_{t-1})$ at time 0.
(ii) Looking ahead at $t$ in two stages, the best choice of $\theta_t$ before $Y_t$ is observed is given by the systems equation, therefore

$$(\theta_t \mid Y_{t-1}) \sim N(G_t \hat{\theta}_{t-1}, R_t), \qquad R_t = G_t \Sigma_{t-1} G_t' + W_t$$

This is the prior distribution of $(\theta_t \mid Y_{t-1})$. Define $e_t$ as the error made in the prediction of $Y_t$ from time $t-1$, i.e.

$$e_t = Y_t - F_t G_t \hat{\theta}_{t-1}$$
(v) Continue with the recursive procedure.
(vi) $K_t = R_t F_t' (V_t + F_t R_t F_t')^{-1}$ is known as the Kalman filter.
(vii) The mean of the posterior distribution is the regression of $\theta_t$ on $e_t$.
(viii) The mean of the posterior distribution is therefore equal to the sum of the mean of the prior distribution and the regression coefficient of $\theta_t$ on $e_t$ multiplied by the error committed in predicting the next observation.
The aforementioned principles can now be applied to a dynamic linear system consisting of p observation equations and q systems equations. The usual assumptions apply. Given $Y_1, Y_2, \ldots, Y_{t-1}$, the problem is to estimate $S_k$. If $k = t-1$, the problem is referred to as Kalman filtering; if $k > t-1$ it is referred to as Kalman forecasting; and if $k < t-1$ it is a problem in Kalman smoothing.
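One pass of the recursion can be sketched in a few lines of Python. The sketch assumes the standard linear Gaussian form with an observation matrix F, a transition matrix G, observation variance V and systems variance W. The posterior variance update used here, Sigma_t = R_t - K_t F_t R_t, is the standard result and is an assumption in the sense that it is not written out in the text above. The function name and the numerical values are illustrative only.

```python
import numpy as np

def kalman_step(theta_prev, sigma_prev, y, F, G, V, W):
    """One prediction and update step of the Kalman filter.

    theta_prev, sigma_prev : posterior mean and variance at time t-1
    y                      : observation at time t
    Returns the posterior mean and variance at time t and the gain K.
    """
    prior_mean = G @ theta_prev                     # mean of the prior at time t
    R = G @ sigma_prev @ G.T + W                    # variance of the prior
    e = y - F @ prior_mean                          # one-step prediction error
    K = R @ F.T @ np.linalg.inv(V + F @ R @ F.T)    # Kalman gain
    theta = prior_mean + K @ e                      # posterior mean
    sigma = R - K @ F @ R                           # posterior variance (standard form)
    return theta, sigma, K

# Scalar example written with 1x1 matrices: random-walk state, unit variances.
F = G = np.array([[1.0]])
V = W = np.array([[1.0]])
theta, sigma = np.array([0.0]), np.array([[1.0]])

theta, sigma, K = kalman_step(theta, sigma, np.array([3.0]), F, G, V, W)
print(theta, sigma, K)
```

In this example the prior variance is 2 and the observation variance is 1, so the gain is 2/3 and the estimate moves two thirds of the way from the prior mean 0 towards the observation 3. Calling the function repeatedly with each new observation gives the recursive procedure.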
In order to apply the filtering technique to the data described in earlier sections, the values of the coinciding indicator, $CI_t$, are associated with the systems variable $S_t$, and the expectations about $CI_t$, namely $CI_t^{*}$, as generated by Smit and Kotze, are associated with the observed variable $Y_t$. Equations generating $CI_t$ and $CI_t^{*}$, respectively, are introduced and the information is combined via the Kalman filter.
It must be kept in mind that the formulae were derived under the assumption that no past or present observations on the state variable are available. In the problem considered here, in addition to the observations of $Y_t$, it is also possible to observe $S_{t-1}$. This being the case, it is possible to replace $\hat{S}_{t-1|t-1}$ by $S_{t-1}$ in all the equations, and because this modified estimate is obtained by using more information, it is to be expected that the resultant estimates will be more efficient than those containing $\hat{S}_{t-1|t-1}$.
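The efficiency argument can be illustrated with a scalar sketch. It assumes that replacing the estimate of the previous state by its observed value amounts to setting the variance of the previous state to zero, so that the prior variance reduces to the systems variance alone. That interpretation, the function name, the coefficients f and g, and all numbers are assumptions made for illustration.

```python
def filtered_estimate(s_prev, var_prev, y, f=1.0, g=1.0, V=1.0, W=1.0):
    """Scalar Kalman update for S_t = g*S_{t-1} + w_t, Y_t = f*S_t + v_t.

    var_prev is the variance attached to s_prev: positive when s_prev is
    itself an estimate, zero when S_{t-1} has been observed.
    """
    prior = g * s_prev
    R = g * var_prev * g + W                 # prior variance of S_t
    K = R * f / (V + f * R * f)              # Kalman gain
    estimate = prior + K * (y - f * prior)
    variance = R - K * f * R                 # posterior variance
    return estimate, variance

# Previous state only estimated (variance 0.5) versus actually observed (variance 0).
est_unknown, var_unknown = filtered_estimate(10.0, 0.5, 12.0)
est_known, var_known = filtered_estimate(10.0, 0.0, 12.0)

print(est_unknown, var_unknown)
print(est_known, var_known)
```

With the previous state observed, the posterior variance is smaller and the filter puts less weight on the new observation, which is the sense in which the modified estimate is expected to be more efficient.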

The results
In the analysis of the results presented here, it should be kept in mind that not all the forecasts are made at the same
Once more, the filtered estimates outperform the estimates based on the systems equation only. Table 3 also illustrates the improvement in results due to a better specification of the systems equation. In this case, the improvement due to filtering is considerably less than in the case of Model 1.

Model 3
In the period 1983:3-1984:3 the South African economy experienced a mini-boom in the midst of a downswing phase due to excessive government expenditure. This caused most leading indicators to break down², and the indicator based on qualitative data, for one, never registered this phenomenon.
At the same time a quick growth in the money supply also
Once more it is evident that the Kalman filter improves the accuracy of the estimate of $CI_t$.

Conclusion
This article has explored an approach aimed at using qualitative expectations data as well as the lead time in the publication of these data. Three approaches have been compared: an approach using quantitative data, one based on qualitative data and one integrating the two data sets via the Kalman filter. For three different models the Kalman filter has led to improved forecasts, and further research into the application of the Kalman filter to qualitative data seems a worthwhile pursuit.

Notes
1. Models 1 and 2 do not use the same factor loadings as presented in Table 1. This is because the principal components analysis has been done twice over the past ten quarters. The factor loadings have remained very stable over time and with the addition of new information.
2. This breakdown was already evident in Model 2 where the WAND variable switched signs in the systems equation.
event they relate to. The following variables have qualified:
retail: expected volume of sales
retail: stocks relative to expected demand
retail: expected business climate
motor trade: stocks of new cars/used cars/spares relative to expected demand
motor trade: expected business climate
manufacturers of building materials: expected production capacity
BBUSCLI builders: expected business climate

S.Afr.J.Bus.Mgmt. 1986, 17(1)
At the same time the Kalman filter estimate for an unknown $S_{t-1}$ can be calculated. At point B, one month into the next quarter, quantitative information about the lagged variables in the Barr model becomes available and the quantitative model can be applied. At the same time the Kalman filter estimate for a known $S_{t-1}$ can be calculated.