A generic instrument to measure customer satisfaction with the controllable elements of the in-store shopping experience

In this study an attempt is made to develop a generic instrument that could be used to measure customer satisfaction with the controllable elements of the in-store shopping experience. By closely following the most contemporary guidelines for scale development, and involving 11 063 respondents in four different surveys, the authors emerge with a 22-item instrument to measure satisfaction with the in-store shopping experience (ISE). The evidence of the psychometric properties of the proposed ISE instrument offered here is compelling in terms of its uni-dimensionality, within-method convergent validity, cross-validation of dimensions in a cross-validation sample, reliability, discriminant validity and nomological validity.


Introduction
Retailers are under constant pressure to adapt in response to ever-changing and accelerating environmental circumstances (Dabholkar, Thorpe & Rentz, 1996:3). More sophisticated and demanding customers, competition from both domestic and foreign sources and breathtaking new technological developments are just a few of the variables pressurising retailers to find new ways to differentiate themselves from others. These attempts have ranged from a focus on service delivery to loyalty schemes, all with limited success (Egan, 1999; Sopanen, 1996; Berry, 1986; Hummel & Savitt, 1988; Reichheld & Sasser, 1990; Dabholkar et al., 1996). Kerin, Jain and Howard (1992:394) earlier contended that it is largely the store shopping experience itself which determines customer perceptions of a store. This school of thought contends that, from a measurement and management perspective, a comprehensive instrument that captures all the dimensions of a shopping experience should be the focus, as opposed to just one dimension such as service quality (Parasuraman, Zeithaml & Berry, 1988). Our contention is that in a retail environment where a mix of goods and services is offered, the comprehensive approach would be preferable. When considered in this way, service quality, for instance, is only one of several components of the consumer's in-store shopping experience. If only one component of the in-store shopping experience is considered in isolation, it may be detrimental to our understanding of customers' experiences, and this in turn could lead to strategies that either overemphasise or neglect the importance of one or more retail experience components.
We have focused our investigation on the measurement and management of the controllable elements of the in-store shopping experience (ISE). We thus exclude the uncontrollable variables of a retail experience, such as parking facilities. We have also focused our efforts on in-store retailing which, by definition, excludes retail formats such as catalogue and internet retailing, which do not typically have a significant personal interaction component (between customer and sales staff).

The objectives of this study
This study reports on several phases of a long-term examination of the controllable elements of the in-store shopping experience. The outcome of this stream of research is a generic, multi-item instrument that can be used to measure customer satisfaction with the controllable components of the in-store shopping experience in a variety of different retail environments. Based on the disconfirmation paradigm (Oliver, 1980), the objective of the first two phases of this three-phase process was to identify the dimensions of importance to consumers when assessing their satisfaction with an in-store shopping experience. In other words, consumers were asked what their expectations of an in-store shopping experience were. Only once these dimensions had been identified and empirically confirmed could we proceed to the development of a valid and reliable instrument to measure customer satisfaction with the in-store shopping experience at retailer or shop level.

The scale development process
Churchill (1979) proposed a well-accepted procedure for the development of valid and reliable multi-item instruments. This process consists of the following steps: domain specification, generation of questionnaire items, empirical surveying, an iterative process of scale purification based on reliability assessment and validity checks, and the development of norms. This process has been enhanced in recent years by the availability of statistical procedures such as confirmatory factor analysis, which provide additional evidence of construct validity (Tull & Hawkins, 1993: 318). The process suggested by Churchill (1979) has been followed in the first two phases of this study.

The in-store shopping experience
The total retail experience consists of all the elements that encourage or inhibit consumers during their contact with a retailer (Berman & Evans, 1998:19). These elements can be either non-controllable (e.g. street parking, deliveries from suppliers and consumption taxes) or controllable, and include both in-store and external elements. In this study the emphasis is on the controllable elements of the in-store shopping experience.
The retail literature suggests that the controllable components of the in-store shopping experience may be grouped under six dimensions, namely: service quality, merchandise quality, merchandise variety and assortment, internal store environment, product prices, and store policies.
Two dimensions that some may argue are part of the ISE are excluded from this study, namely store image and store location. In the retailing literature it is often suggested that a favorable store image leads to store loyalty (Hirschman, 1981). Store image, in turn, has been described as consisting of the following three general factors: merchandise-related aspects, service-related aspects, and pleasantness of shopping at a store (Mazursky & Jacoby, 1985). Our approach is that all three of these factors are captured in ISE (Figure 1) and that loyalty is an outcome of a positive in-store shopping experience (ISE) rather than an underlying dimension.
Although store location is normally one of the important reasons why customers visit a particular store, it is not addressed in this study because we are of the opinion that over time a favorable location may deteriorate in value because of changes in road patterns, the opening of competitive shops, and changing demographic patterns in the community.
Figure 1 illustrates the theoretical structure of the ISE, and the shaded dimensions are the ones included in the first empirical survey. Because of the size of the complete ISE model, the large number of items involved, and the very real threat of respondent fatigue, it was decided to include only three of the dimensions in the first empirical phase of the project, namely service quality (responsiveness, reliability, empathy, assurance, tangibles), merchandise quality, and merchandise variety and assortment. For a discussion of these three dimensions, see Terblanche and Boshoff (2001a: 101-103). These three dimensions of the in-store shopping experience and their associated items were then subjected to an empirical survey to assess their importance to retail shoppers.

Methodology of the first phase
The first phase of the empirical research was undertaken to assess which dimensions and items measure the three dimensions of ISE shaded in Figure 1 (Service Quality, Merchandise Quality and Merchandise Variety and Assortment). All 22 items of SERVQUAL were used to measure Service Quality. The Merchandise Quality dimension was measured using three Merchandise Quality items suggested by Finn and Kayandé (1997) as well as two self-generated items. Merchandise Variety and Assortment was measured using a self-generated five-item instrument.
All 32 items were linked to a 7-point Likert-type scale.
Respondents were asked to rate the importance of the various components of ISE on a 7-point scale where a 7 meant that the aspect under consideration was 'extremely important' and a 1 meant that it was 'not important'.
Customers from two fast food firms, two clothing firms, two supermarkets and two hardware stores constituted the population for the first phase of the research. A total of 2 063 questionnaires were completed during personal interviews which were conducted after respondents had completed their shopping. In the scale purification process that followed, a Maximum Likelihood Exploratory Factor Analysis was conducted, specifying a Direct Quartimin oblique rotation (Jennrich & Sampson, 1966) of the original factor matrix, using the computer programme BMDP (Jennrich & Sampson, 1966). This was followed by an assessment of the internal consistency of each dimension as suggested by Churchill (1979).

Results of the first empirical survey
As part of the scale purification process, several different factor solutions were considered. The most interpretable factor structure (factor loadings exceeding 0.4 and no cross-loadings) to emerge was a 3-factor solution. All the factors in the 3-factor solution had eigenvalues above 1.00, and a sufficient number of items loading on them to a significant (0.40) extent (Hair, Anderson, Tatham & Black, 1998). The three factors that emerged were named Personal Interaction (measured by 3 assurance, 4 empathy, 2 reliability and 3 responsiveness items of SERVQUAL), Physical Cues (measured by 1 reliability and 2 tangibles items of SERVQUAL and 4 merchandise quality items) and Merchandise Variety and Assortment (represented by 5 variety items). The items that did not load to a significant extent or did not demonstrate sufficient discriminant validity were deleted. The remaining 24 items were then subjected to a reliability analysis to assess the internal consistency of the instrument. All three factors, as well as the overall instrument, returned Cronbach alpha coefficients above the 0.7 level suggested by Peterson (1994). The three remaining empirical factors and their associated items are listed in Table 1.
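The item-retention rule described above (keep an item only if it loads above 0.40 on exactly one factor) can be sketched as follows. This is a minimal illustration, not the authors' BMDP procedure; the function name `retained_items`, the item labels and the loading values are all hypothetical.

```python
def retained_items(loadings, threshold=0.40):
    """Map each retained item to the index of the single factor it loads on.

    An item is kept only if exactly one absolute loading exceeds the
    threshold, i.e. it has a salient loading and no cross-loading.
    """
    kept = {}
    for item, row in loadings.items():
        salient = [j for j, value in enumerate(row) if abs(value) > threshold]
        if len(salient) == 1:
            kept[item] = salient[0]
    return kept


# Hypothetical rotated loadings for four items on three factors.
example_loadings = {
    "item_a": [0.72, 0.10, 0.05],  # clean loading on factor 0
    "item_b": [0.45, 0.48, 0.02],  # cross-loads on factors 0 and 1
    "item_c": [0.20, 0.31, 0.15],  # no salient loading
    "item_d": [0.08, 0.03, 0.66],  # clean loading on factor 2
}

print(retained_items(example_loadings))
```

In this toy input only `item_a` and `item_d` survive; the cross-loading and non-loading items are dropped, which mirrors the deletion step reported for the first survey.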
The outcome of the first survey was that Figure 1 had to be re-configured slightly, as shown in Figure 2, to reflect the results of the scale purification process (Table 1).

The second survey
A second empirical survey was then conducted, as suggested by Churchill (1979). The six items with the highest factor loadings that measured personal interaction were retained for the second empirical survey. Three of the items used to measure physical cues measured product quality, and it was decided to retain only one of these three items. The following item was added to measure physical cues: 'Products that function the way they are supposed to'. Two new items, apart from the five that were used to measure merchandise variety and assortment in the first survey, were added for the second empirical survey. These two items were: 'A choice of different brand names' and 'A good selection of well-known brands'. A total of sixteen items remaining from the scale purification process of phase 1, as well as the three new items (in total nineteen items measuring Personal Interaction, Physical Cues and Merchandise Variety and Assortment), were combined with items to measure the three remaining (untested) dimensions of ISE, namely Internal Store Environment, Product Prices and Store Policies. The latter three dimensions were discussed in depth in Terblanche and Boshoff (2001b: 12-13). Figure 2 is a schematic presentation of the dimensions of ISE that were subjected to empirical assessment during the second survey.

Sample and data collection: Second survey
The sampling procedure used in the second survey was a combination of convenience and random sampling. Respondents were visitors to two regional shopping centers. Individual respondents (visitors) were selected on a simple random basis. Personal interviews, using a structured questionnaire, were conducted with visitors to the shopping center when they exited the center. The interviews were conducted over a period of two days to include all the different types of visitors who usually frequent these shopping centers. Respondents were again asked to rate the importance of the various components of ISE on a 7-point scale. A total of 2 504 questionnaires were completed, of which 1 197 were supermarket customers and 1 307 clothing store customers.
The instrument used in the second survey included the 19 items remaining after the scale purification process subsequent to the first survey. These 19 items measured the dimensions Personal Interaction, Physical Cues and Merchandise Variety and Assortment in the second survey. The 'new' dimensions in the second survey were Internal Store Environment (9 items), Product Prices (5 items) and Store Policies (7 items).

Empirical results: Second survey
The data analysis procedures of the second survey again closely followed the guidelines for scale development suggested by Churchill (1979). To assess the discriminant validity of the instrument, a Maximum Likelihood Exploratory Factor Analysis was again conducted, specifying a Direct Quartimin oblique rotation (Jennrich & Sampson, 1966) of the original factor matrix.
Although it was expected that a six-factor solution would emerge (in line with Figure 2), several different factor solutions were considered. The most interpretable factor structure (factor loadings exceeding 0,4 and no cross-loadings) to emerge was a 5-factor solution (Table 2). All five factors in Table 2 had eigenvalues above or very close to 1.00 and a sufficient number of items loading on them to a significant (0,40) extent (Hair et al., 1998). The three factors retained from the first survey (Personal Interaction, Physical Cues and Merchandise Variety and Assortment) remained fairly stable during the second survey. Of the 'new' dimensions added for the second survey, Store Policies emerged as a separate factor, as expected. Many of the items expected to measure Physical Cues and Product Prices, however, loaded on a common factor that was labeled Merchandise Value (Table 2). As some items expected to measure Physical Cues now loaded on the factor Merchandise Value, the remaining items were in fact measurements of Internal Store Environment, and were therefore labeled as such.
The items that remained to measure the dimension Store Policies after the scale purification process all referred to the narrower concept Complaint Handling, rather than the broader Store Policies, and the dimension was renamed accordingly.
To summarise: the empirical factor structure that emerged after the second phase suggested that the in-store shopping experience was multi-dimensional and consisted of five dimensions, namely: Merchandise Value, Internal Store Environment, Personal Interaction, Merchandise Variety, and Complaint Handling (see Figure 3), all measured with scales that demonstrated more than adequate reliability (Table 2).

Convergent validity
Any measuring instrument should be both reliable and valid (Churchill, 1979). A variety of different types of validity should be considered before any claims of validity can be made (Tull & Hawkins, 1993). To test the convergent validity of the ISE instrument, the total ISE score (mean 153,52; SD 18,04) was correlated with scores that were expected to measure the adequacy of the retailer's parking facilities, satisfaction with in-store promotions, and the image of the store. It was expected that consumer perceptions of these three retail issues would be positively associated with their ISE scores.
The empirical results reported in Table 3 confirm this contention. The Pearson correlation coefficients shown in Table 3 reveal a consistent pattern of significant positive correlations between the total ISE scores and perceptions of the adequacy of parking facilities (PARK, mean 6,30, SD 1,22), satisfaction with special in-store promotions (PROM, mean 5,32, SD 1,51) and the image of the company (IMAGE, mean 5,75, SD 1,41), suggesting at least some evidence of the convergent validity of the ISE instrument.
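The correlation check described above can be illustrated with a small, self-contained sketch of the Pearson coefficient. The function name `pearson_r` and the two short score lists are hypothetical and serve only to show the calculation; they are not data from the survey.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total ISE scores and 7-point store image ratings.
ise_total = [120, 135, 150, 160, 171]
image = [4, 5, 5, 6, 7]

r = pearson_r(ise_total, image)
print(round(r, 3))
```

A positive coefficient for each of the three comparison measures is what the convergent validity argument relies on; significance testing would be a separate step.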
We realize that using the same data set to perform both an exploratory factor analysis and a CFA is often questioned, but it is certainly not without precedent (Finch & West, 1997). In the end we did proceed because the CFA results do provide evidence of construct validity (Tull & Hawkins, 1993). A second consideration was that this study is a long-term project with several phases and surveys, and that the CFA results of this survey would allow some comparisons with future survey (and particularly CFA) results. By closely following the guidelines for multi-item scale development suggested by Churchill (1979), and based on the results of two empirical surveys, we concluded that there are five dimensions of importance to consumers when assessing their expectations of, or satisfaction with, an in-store shopping experience. These dimensions are: Merchandise Value, Internal Store Environment, Personal Interaction, Merchandise Variety and Complaint Handling. These five dimensions, and thus the in-store shopping experience, can be measured by means of 26 items. The proposed instrument in its form at that stage demonstrated high levels of reliability, and some evidence of discriminant validity, convergent validity, and construct validity. Consistent with the guidelines suggested by Churchill (1979), the instrument in its then current form needed to be subjected to a third empirical assessment, and particularly a cross-validation assessment, to provide conclusive evidence of its construct validity.

The third survey
Valid measurement is, according to Peter (1979), the sine qua non of science. Peter even argues that if a discipline does not use instruments that are valid and reliable, it cannot be regarded as a science. At the most basic level it means that the set of items making up a measuring instrument must measure only one thing in common (Hattie, 1985). Steenkamp and Trijp (1991) argue that traditional methods of assessing the uni-dimensionality of an instrument (such as item-to-total correlations, exploratory factor analysis and reliability assessment), as described above and recommended by Churchill (1979), are particularly useful to reduce the original number of items, and to provide preliminary scales that can subsequently be tested and refined by means of a confirmatory factor analysis to assess the construct validity of the instrument. Gerbing and Anderson's Monte Carlo study (1992) produced empirical evidence to support this view. In a later study they concluded that '… exploratory factor analysis can contribute to a useful heuristic strategy for model specification prior to cross validation with confirmatory factor analysis' (Gerbing & Anderson, 1996: 62).

The methodology of the third survey
The sampling procedure used for the third survey was again a combination of convenience and random sampling. Respondents were customers of a national clothing retailer and customers of a major national grocery supermarket. Individual respondents (customers) were selected on a simple random basis. Personal interviews, using a structured questionnaire, were conducted with customers of these stores on two different days to facilitate the cross-validation of results. During the first day 1 686 retail shoppers (sample 1) were interviewed, and during day two 1 657 were interviewed (sample 2).
Unlike the first two phases, respondents were asked to rate their satisfaction with the in-store shopping experience of a particular store on a 7-point Likert-type scale. A total of 3 343 questionnaires were completed, of which 2 096 were clothing shop customers (1 063 on day 1 and 1 033 on day 2) and 1 247 were supermarket customers (623 on day 1 and 624 on day 2).
The instrument used in the third survey included all 26 items remaining after the scale purification process after the first two surveys, measuring the following five dimensions: Personal Interaction (6 items), Merchandise Value (8 items), Merchandise Variety and Assortment (4 items), Internal Store Environment (5 items), and Complaint Handling (3 items).

Empirical results: Third survey
Confirmatory factor analyses results: Sample 1
In line with the recommendation of Steenkamp and Trijp (1991) and others, a confirmatory factor analysis (Maximum Wishart Likelihood estimation) of the data of sample 1 was conducted as it is able to provide, amongst others, evidence of uni-dimensionality. The kurtosis measures (normalized kurtosis = 291,5; relative kurtosis = 1,744) revealed, however, that the data were not adequately normally distributed. The analysis was then re-run using a more robust Maximum Likelihood procedure that analyzes the asymptotic covariance matrix. Maximum Likelihood parameter estimates of an asymptotic covariance matrix are robust against moderate violations of multivariate normality, provided that the sample size exceeds 100 (Gerbing & Anderson, 1988).

Uni-dimensionality
Steenkamp and Trijp (1991) recommend that, in the case of poor model fit, one assess the standardized residuals to identify potential reasons. Although the measurement model fitted the data very well, we nevertheless considered the standardized residuals. An inspection of the standardized residuals did reveal several values higher than the |2.58| cut-off value proposed by Jöreskog and Sörbom (1988), which suggests that some misspecification may have occurred. However, Steenkamp and Trijp (1991) caution that standardized residuals ought to be treated with circumspection as they are heavily influenced by deviations from multivariate normality, large sample sizes (1 686 in this case) and thus by the power of the test. We nevertheless decided to re-run the model after deleting three Merchandise Value items (MEVAL1, MEVAL6 and MEVAL7) for overfitting and one Personal Interaction item (PERIN6) for underfitting.
Although there were still some standardized residuals higher than the |2.58| cut-off value, no pattern was apparent. In other words, the ISE instrument demonstrated sufficient evidence of uni-dimensionality.

Confirmatory factor analyses results: Crossvalidation (sample 2)
When developing a new scale, cross-validation is desirable because there is always the possibility that one has capitalized on chance. The ISE instrument was again administered to a sample of 1 657 respondents (retail shoppers) similar to the previous sample described above.
These results confirm the earlier conclusion that the ISE instrument demonstrates excellent uni-dimensionality.

Within-method convergent validity
Several ways to assess the within-method convergent validity of an item have been proposed. The statistical significance of the regression coefficient, the correlation of the item with the construct and the overall fit of the model are all indicators of within-method convergent validity. In this model all regression coefficients are strongly significant (p < 0,000, with the lowest t-value being 26,32) and all items correlate significantly with their underlying dimension (lowest correlation coefficient is 0,519), whilst the overall model fit has already been alluded to. All these measures point to excellent within-method convergent validity.

Confidence
Steenkamp and Trijp (1991) also suggest that confidence in a model can be enhanced by an analysis in which the meaning of the construct is kept invariant by constraining all the parameters in the measurement model in a cross-validation sample to be equal to the measurement model parameters of the first sample. In other words, an assessment is done (in a second, cross-validation sample) to determine whether or not the measurement model is identical across the two samples. The null hypothesis (H0) is that the factor loadings, measurement error variances, factor variances and co-variances are all identical in both samples. The alternate hypothesis (H1) states that two or more of these parameters are different across the two samples. A Chi-square difference test is used to address H0 and H1. The model under the null hypothesis is fitted to the data by specifying equality constraints across the two samples, while these equality constraints are relaxed to fit the model under H1. The Chi-square difference test statistic value is then obtained as the difference between the Chi-square test statistic values for the H0 and H1 models. The corresponding degrees of freedom are the difference between the degrees of freedom of the H0 and H1 models. The Chi-square difference test results for the in-store shopping experience measurement model are summarized in Table 5. The small p-value (p < 0.001) for the Chi-square difference test statistic value in Table 5 suggests that there is sufficient evidence to conclude that the model differs across the two samples. In other words, the overall cross-validation of the measurement model for the total in-store shopping experience instrument is not supported by the data.
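The Chi-square difference test described in this paragraph can be written compactly as follows. The symbols are introduced here for illustration only: the subscripts H0 and H1 denote the constrained and unconstrained models respectively, and alpha is the chosen significance level.

```latex
\Delta\chi^2 = \chi^2_{H_0} - \chi^2_{H_1},
\qquad
\Delta df = df_{H_0} - df_{H_1},
\qquad
\text{reject } H_0 \text{ if } P\!\left(\chi^2_{\Delta df} \ge \Delta\chi^2\right) < \alpha
```

Because the H0 model is nested within the H1 model (it adds equality constraints), its Chi-square value and its degrees of freedom are at least as large, so both differences are non-negative.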
The results in Table 5 show that the measurement model did not cross-validate in its entirety (factor loadings, factor variances, factor co-variances, and measurement error variances). However, this does not imply that the cross-validations of the four specific parameter types are not supported. To test this proposition (that any one or all of the factor loadings, factor variances, factor correlations and error terms may be different in the two samples), a series of Chi-square difference tests was conducted.
The next step was then to determine whether or not the factor loadings were identical across the two samples. The null hypothesis (H0) states that all the factor loadings were identical across the two samples, while the alternate hypothesis (H1) states that at least two factor loadings are different across the two samples. As before, a Chi-square difference test is used to test H0 and H1. The results for this Chi-square difference test are listed in Table 6. The large p-value (p > 0,01) for the Chi-square difference test statistic value in Table 6 suggests that there is insufficient evidence to reject the null hypothesis (H0) that the factor loadings are identical across both samples. In other words, the data support the cross-validation of the factor loadings, as measured by the in-store shopping experience instrument, across the two samples.
To assess the cross-validation of the factor variances, the null and alternate hypotheses considered were:
H0: The five factor variances are equal across the two samples
H1: The five factor variances differ across the two samples
The p-value (p > 0.01) for the Chi-square difference test statistic value in Table 7 suggests that there is insufficient evidence to reject the null hypothesis (H0) that the factor variances are identical across both samples. In other words, the data support the cross-validation of the factor variances, as measured by the in-store shopping experience instrument, across the two samples.
To assess the cross-validation of the factor co-variances, the null and alternate hypotheses considered were:
H0: The ten factor co-variances are equal across the two samples
H1: The ten factor co-variances differ across the two samples
A summary of the corresponding Chi-square difference test results is provided in Table 8. The small p-value (p < 0,01) for the Chi-square difference test statistic value in Table 8 suggests that there is sufficient evidence to conclude that the factor co-variances differ across the two samples. In other words, the cross-validation of the factor co-variances for the in-store shopping experience instrument is not supported by the data.
Finally, the cross-validation of the measurement error variances had to be considered. For this assessment the following null and alternate hypotheses were tested:
H0: The 22 measurement error variances are equal across the two samples
H1: At least 2 of the 22 measurement error variances differ across the two samples
The small p-value (p < 0.001) for the Chi-square difference test statistic value in Table 9 suggests that there is sufficient evidence to conclude that the measurement error variances differ across the two samples. In other words, the overall cross-validation of the measurement error variances for the in-store shopping experience instrument is not supported by the data.
In summary, the cross-validation of the entire measurement model across the two samples is not supported by the data. This result is due to the fact that the cross-validations of the factor co-variances and the measurement error variances are not supported by the data. However, the cross-validations of the factor loadings and the factor variances are indeed supported by the data.
In other words, although the model did not cross-validate in its entirety, Table 6 offers evidence that the factor loadings are identical in both samples. We argue that the fact that the factor loadings of the ISE measurement model are identical in two separate samples is, from a management perspective, a significant finding.

Reliability
Reliability is generally regarded as a necessary condition for validity (Peter, 1979). Table 10 shows the internal reliability of the dimensions purported to make up the in-store shopping experience construct. All the Cronbach alpha coefficients of the underlying dimensions were above the generally accepted cut-off value of 0,7 (Peterson, 1994), and for the whole scale the coefficient is 0,951.
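As an illustration of the reliability coefficient reported above, the following sketch computes Cronbach's alpha from raw item scores using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the response matrix are hypothetical; the ratings are invented 7-point scores, not survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings of five respondents on a three-item dimension.
ratings = [
    [6, 7, 6],
    [4, 5, 4],
    [7, 7, 6],
    [3, 4, 4],
    [5, 5, 6],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 3), alpha > 0.7)
```

The final comparison applies the 0,7 cut-off cited in the text; a dimension falling below it would normally be a candidate for further item purification.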

Discriminant validity
To assess the discriminant validity of the ISE instrument, each underlying dimension of the in-store shopping experience was considered separately. The null hypothesis (H0) in each instance is that one dimension (for example, PERIN: Personal Interaction) is perfectly correlated with all the other dimensions (MEVAL: Merchandise Value, COHAN: Complaint Handling, STENV: Internal Store Environment, VAROS: Merchandise Variety and Assortment) in each of five separate Chi-square difference tests. The alternate hypothesis (H1) is that at least two of these correlations are not perfect correlations.
The null and alternate hypotheses are tested by means of a Chi-square difference test. The Chi-square test statistic value for the model under H0 is obtained by specifying a model wherein the four correlations are fixed at unity, while that for H1 is obtained by specifying a model in which these four correlations are free parameters to be estimated. The Chi-square difference test statistic value is obtained as the difference between the Minimum Fit Chi-square test statistic values for the models under H0 and H1. The associated degrees of freedom are the difference between the degrees of freedom for the models specified by H0 and H1. In all these Chi-square difference tests, there are 4 degrees of freedom. The results for samples 1 and 2 are shown in Tables 11 and 12 respectively.
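For a test with 4 degrees of freedom, the p-value of a Chi-square difference can be obtained in closed form, since for even degrees of freedom the upper-tail probability is exp(-x/2) times a finite sum. The sketch below assumes nothing beyond that identity; the function name `chi2_sf_even_df` and the two model Chi-square values are hypothetical, not figures from Tables 11 or 12.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a Chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical Chi-square values for the constrained (H0) and free (H1) models.
chi2_h0, chi2_h1 = 1450.0, 1200.0
delta = chi2_h0 - chi2_h1
p_value = chi2_sf_even_df(delta, df=4)
print(delta, p_value < 0.001)
```

With a difference of this size the p-value is vanishingly small, which is the pattern the text describes as evidence against perfect correlation.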
The extremely small p-values in Tables 11 and 12 confirm that each dimension is indeed a separate dimension from at least one other dimension of the in-store shopping experience, and thus provide further evidence that the discriminant validity of the ISE instrument is supported by the data of the two samples. In other words, the dimensions of the ISE are indeed separable from each other and they do not represent a single dimension.
To confirm these findings, we needed to assess whether or not each dimension of ISE is different from ALL the other dimensions (latent variables). This is accomplished by testing whether or not the correlation between any two dimensions is perfect. The corresponding hypotheses are:
H0: The correlation between the two latent variables (dimensions) is perfect
H1: The correlation between the two latent variables (dimensions) is not perfect
As before, a Chi-square difference test is used for each assessment. In this case, the Chi-square test statistic value for the model under H0 is obtained by fitting to the data a model which specifies the correlation to be unity. The Chi-square test statistic value for the model under H1 is obtained by fitting to the data a model which specifies the correlation as a free parameter to be estimated. Each of these Chi-square difference tests has one degree of freedom. The results of the 20 Chi-square difference tests for samples 1 and 2 are shown in Tables 13 and 14 respectively. The extremely small p-values in Tables 13 and 14 show that there is sufficient evidence to reject the perfect correlation hypothesis across both samples. In other words, the data from the two samples support the discriminant validity of the five dimensions of the ISE instrument.
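The count of 20 tests follows from the five dimensions: there are ten distinct pairs, each tested in both samples. The sketch below enumerates the pairs and shows how a one-degree-of-freedom p-value can be computed, using the identity that a Chi-square variable with 1 df is a squared standard normal. The function names and the example test statistic are hypothetical; only the dimension abbreviations come from the text.

```python
import math
from itertools import combinations

# Dimension abbreviations as used in the text.
DIMENSIONS = ["PERIN", "MEVAL", "COHAN", "STENV", "VAROS"]


def dimension_pairs(dimensions):
    """All unordered pairs of dimensions, one per pairwise test."""
    return list(combinations(dimensions, 2))


def chi2_sf_1df(x):
    """Upper-tail probability for a Chi-square variable with 1 df.

    P(Z^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


pairs = dimension_pairs(DIMENSIONS)
print(len(pairs), "pairs per sample,", 2 * len(pairs), "tests over two samples")

# Hypothetical Chi-square difference for one pair of dimensions.
print(chi2_sf_1df(25.0) < 0.001)
```

Running the same pairwise test for every entry in `pairs`, once per sample, reproduces the structure of the analysis summarized in Tables 13 and 14.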

Nomological validity
Nomological validity of a construct is assessed by investigating the relationships of the construct with other constructs in a nomological net. The relationships in the nomological net are based on a theoretical (causal) model for the constructs involved. Although this is often assessed by means of a correlation or regression analysis, these techniques do not allow for formal testing of the nomological net (theory) and they do not incorporate measurement errors for the latent constructs of the nomological net (Steenkamp & Trijp, 1991). On the other hand, Structural Equation Modeling with Latent Variables does allow for measurement error and a formal test of the nomological net. Consequently, Structural Equation Modeling is a powerful statistical tool to assess the nomological validity of a construct.
In this study the nomological net that was tested was based on empirical industry evidence such as the PIMS studies of the 1990s, on academic research (Rust & Zahorik, 1993; Sirohi, McLaughlin & Wittink, 1998:240) and on the anecdotal evidence provided by theorists (Hoffman & Bateson, 1997:290; Heskett, Jones, Loveman, Sasser & Schlesinger, 1994; Oliver, 1997; Zeithaml, Parasuraman & Berry, 1990:9). It is depicted in Figure 4. The bulk of evidence from all these sources suggests that satisfaction with the individual components of an in-store shopping experience will result in customer satisfaction, which will lead to customer retention and loyalty over the long term. It is acknowledged, though, that this relationship is not always linear (Oliva, Oliver & MacMillan, 1992), particularly in a highly competitive industry with limited differentiation potential and low switching costs such as retailing (Jones & Sasser, 1995).
We subjected the ISE instrument to one final empirical assessment of its nomological validity by testing the theoretical model depicted in Figure 4.
This time, the ISE instrument was administered to a sample of customers on the database of a retailer selling mainly cosmetics, houseware and gifts. Customer satisfaction was measured with a three-item instrument based on Anderson, Fornell and Lehmann (1994) and Macintosh and Lockshin (1997); loyalty was measured with a four-item instrument based on Zeithaml, Berry and Parasuraman (1996), Sirohi et al. (1998) and East, Hammond, Harris and Lomax (2000).
In total, 34 000 questionnaires were mailed and 3 153 were returned, for an effective response rate of 9,27%.
Besides altering the type of retailer, we also used a mail survey as opposed to the mall intercept interviews used earlier. We believed that using a different type of retailer as well as a different data collection technique would provide additional insight into the robustness of the ISE instrument.
As proposed by Steenkamp and Trijp (1991), LISREL 8.51 for Windows (Jöreskog & Sörbom, 2001) was used to fit the model depicted in Figure 4 to the data, to avoid the limitations associated with correlation and regression analyses. More specifically, the Weighted Least Squares (WLS) method for Polychoric Correlation Matrices was used. The WLS estimates of the path coefficients are shown in Table 15 (χ² = 2634,1; df = 334; RMSEA = 0,047; GFI = 0,978; AGFI = 0,974; NFI = 0,923). Table 15 shows that all the estimates are in the predicted direction and that all except the influence of Customer Complaint Handling (COHAN) on Customer Satisfaction are statistically significant. In other words, the results summarized in Table 15 provide strong empirical support for the relationships of the dimensions of the ISE instrument with other constructs in a nomological net as predicted by theory.
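As a rough consistency check on the reported fit, the RMSEA can be recomputed from the Chi-square value and its degrees of freedom using the standard point-estimate formula, RMSEA = sqrt(max(χ² - df, 0) / (df (N - 1))). The sketch below is an illustration under stated assumptions: it takes N = 3 153, i.e. it assumes every returned questionnaire was usable (the effective sample size is not stated here), and LISREL's internal computation under WLS may differ in detail. Python uses decimal points rather than the decimal commas used in the text.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model Chi-square, its df and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Reported values: Chi-square = 2634.1, df = 334.
# ASSUMPTION: n = 3153 (all returned questionnaires usable).
value = rmsea(chi2=2634.1, df=334, n=3153)
print(round(value, 3))  # 0.047
```

Under this assumption the recomputed value agrees with the reported RMSEA of 0,047.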

Conclusion
Retailing consists of varied and diverse activities. There is more than anecdotal evidence that Monday morning shoppers are different from Wednesday afternoon shoppers, and that both are different from weekend shoppers. Retail shopping is also varied in terms of the types of products bought. Marketing and retail textbooks point to the differences between convenience goods, shopping goods and specialty goods, not because marketers need to memorize product classifications but because we know that consumers behave differently when buying and shopping for different types of products. The retail environment is further complicated by the diversity of retail formats such as catalogue retailing and, more recently, electronic retailing.
Against this background, an attempt to develop a generic instrument to measure satisfaction with in-store shopping experiences across such a diversity of retail circumstances is indeed an ambitious one. This is particularly true if one considers the problems experienced by others who have attempted to develop generic instruments, such as SERVQUAL.
Despite the daunting challenge, this study attempted to develop a generic instrument that could be used to measure customer satisfaction with the controllable elements of the in-store shopping experience.The final questionnaire consists of 22 items measuring 5 dimensions of the in-store shopping experience that we believe are generic across all in-store retail environments, based on customer expectations (see Appendix A).

Figure 4: Theoretical model to assess nomological validity
We have closely followed the most contemporary guidelines for scale development, involving 11 063 respondents in four different surveys. The evidence of the psychometric properties of the proposed ISE instrument offered here is compelling in terms of its unidimensionality, within-method convergent validity, cross-validation of dimensions in a cross-validation sample (Tables 5-9), the reliability of the instrument (Table 10), its discriminant validity (Tables 11-14), and its nomological validity (Table 15).
We would also like to point out that the instrument has been administered to customers in a variety of different retail industries (fast food, clothing, grocery, hardware, cosmetics, gifts and houseware). It has also proved to be robust across different data collection techniques (mall intercept and mail surveying).
The only area where the instrument did not succeed completely was in terms of the cross-validation of the entire model. From a management perspective, however, it is important that the factor loadings (or ISE dimensions) did indeed cross-validate successfully in the cross-validation sample.
Some critics may point to the fact that one of the five ISE dimensions (Customer Complaint Handling or COHAN) is not a significant predictor of Customer Satisfaction in the nomological net depicted in Figure 4. We must point out that Customer Complaint Handling has consistently been shown to be an important issue to respondents. However, we have to acknowledge that most of our respondents have probably not lodged a complaint with the retailer about whom we asked them, and this may be the reason why there is no significant relationship between Customer Complaint Handling and Customer Satisfaction.
We suggest, however, that any measure of customer satisfaction at store level in a retail environment needs to include all five dimensions of the ISE to ensure that the uniqueness of any retail situation can be adequately captured.

Figure 1: Schematic presentation of the theoretical structure of the in-store shopping experience

Figure 2: Schematic presentation of the dimensions of the ISE tested in the second phase of the study