Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). Thus, in a 1974 review, Wylie concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than specific domains such as PSC. Following the research of Shavelson and colleagues (1976), however, a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has historical significance in that its research incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter’s (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues’ (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test–retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument’s ability to differentiate among the different PSC factors it purports to measure.
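To make the phrase "disattenuated for measurement error" concrete, the following minimal sketch applies the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two scale reliabilities. This is an illustration only: the input values are hypothetical, the function name is ours, and the cited study may instead have estimated correlations among latent factors within a CFA, which also corrects for measurement error but by a different route.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation.

    r_xy  : observed correlation between two scale scores
    rel_x : reliability (e.g., coefficient alpha) of the first scale
    rel_y : reliability of the second scale
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, not taken from any PSPP study.
observed_r = 0.70
corrected_r = disattenuate(observed_r, rel_x=0.85, rel_y=0.88)
print(round(corrected_r, 2))  # 0.81
```

Because reliabilities are below 1, the corrected correlation is always at least as large as the observed one, which is why disattenuated correlations among scales can approach values that call discriminant validity into question.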
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter’s (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument’s factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into, and validated in, other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie’s (1989) review of Harter’s original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Havermans, 1995) but even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses to these items were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
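The scoring rule for the structured alternative format can be sketched in a few lines. The text specifies only the endpoints (1 and 4); the intermediate values of 2 and 3 for "Sort of true of me" responses, the averaging of item scores into a scale score, and all function and label names below are assumptions made for illustration rather than the published scoring key.

```python
# Mapping from (statement chosen, strength of endorsement) to an item score.
# Endpoints 1 and 4 follow the text; 2 and 3 are the assumed middle values.
SCORE_TABLE = {
    ("negative", "really"): 1,
    ("negative", "sort_of"): 2,
    ("positive", "sort_of"): 3,
    ("positive", "really"): 4,
}


def score_item(statement: str, strength: str) -> int:
    """Score one structured alternative item on the 1-to-4 scale."""
    try:
        return SCORE_TABLE[(statement, strength)]
    except KeyError:
        raise ValueError(f"unrecognized response: {statement!r}, {strength!r}")


def scale_score(responses: list[tuple[str, str]]) -> float:
    """Average the item scores of one scale (an assumed aggregation rule)."""
    scores = [score_item(statement, strength) for statement, strength in responses]
    return sum(scores) / len(scores)


# Hypothetical responses to the six items of one scale.
answers = [
    ("positive", "really"),
    ("positive", "sort_of"),
    ("negative", "sort_of"),
    ("positive", "sort_of"),
    ("positive", "really"),
    ("negative", "really"),
]
print(scale_score(answers))
```

The two-step choice (which statement, then how true) is what distinguishes this format from a Likert scale, and it is also the step that respondents reportedly find confusing.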
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and have been used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. It should be used instead of the PSPP for child and adolescent samples, and it might be stronger than the original PSPP even for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response scale using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it. Nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter’s instruments more generally).
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test–retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six latent PSC factors that, according to the authors, bring “into question the real independence of some of the models’ sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the PSI, and particularly its short and very short forms, has made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for its construct validity and its application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true–false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test–retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested with both genders and across a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman’s (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true–false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test–retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
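The two reliability indices reported throughout this section answer different questions: coefficient alpha summarizes the internal consistency of the items within a scale at one administration, whereas a test–retest correlation summarizes the stability of scale scores across two administrations. The sketch below computes both using only the Python standard library. The response data are invented for illustration and do not come from any PSDQ sample; the function names are ours.

```python
from statistics import mean, variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Coefficient alpha for one scale.

    item_scores holds one row per respondent and one column per item.
    """
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the total (summed) scale score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation, used here as a test-retest stability coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


# Invented responses: five respondents, four items, 6-point true-false scale.
time1 = [
    [5, 6, 5, 6],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [6, 5, 6, 6],
]
# Invented scale totals for the same respondents at a later administration.
time2_totals = [21, 10, 16, 8, 22]

time1_totals = [sum(row) for row in time1]
print("alpha:", round(cronbach_alpha(time1), 2))
print("test-retest r:", round(pearson_r(time1_totals, time2_totals), 2))
```

A scale can be high on one index and lower on the other, which is why the instrument summaries in this section report them separately.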
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (n = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than they were in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structures based on the 40 PSDQ-S items and the 70 PSDQ items were invariant. Study 2 demonstrated factorial invariance of responses over 1 y (test–retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and other PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S. The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components of PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure six factors: these five components and overall performance. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes aged 12 y and older. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).