In quantitative psychological research, questionnaires with Likert-style items are commonly used to assess variables such as emotions, cognitions, and dispositions. Sometimes it is possible to fall back on existing questionnaires, sometimes an existing questionnaire needs to be adapted to the question at hand, and sometimes a new questionnaire must be developed for the purpose. Regardless of whether a validated or a new scale is used, it is always advisable to test the scale’s internal consistency with a factor analysis to ensure that a homogeneous variable enters the analyses (and to be able to compute reliability estimates beyond the problematic Cronbach’s alpha).
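To make this concrete, here is a minimal sketch of such a check in Python. I am assuming the semopy package for structural equation modeling; the data file and item names are hypothetical placeholders rather than anything from a real study.

```python
# Minimal one-factor CFA sketch. The semopy package (pip install semopy) is
# assumed; the file name and item names are hypothetical placeholders.
import pandas as pd
import semopy

# one column per item, one row per participant (hypothetical data file)
df = pd.read_csv("questionnaire_data.csv")

# all items are assumed to load on a single latent factor
model = semopy.Model("happiness =~ item1 + item2 + item3 + item4")
model.fit(df)

print(model.inspect())           # factor loadings and error variances
print(semopy.calc_stats(model))  # fit indices such as CFI and RMSEA
```

If the one-factor model fits poorly, or some items show weak loadings, the scale is likely not measuring a single homogeneous variable.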
When I started to test the scales I used in my studies with confirmatory factor analyses in my early PhD days, I observed that reversed (also known as negatively worded) items often failed to achieve satisfactory factor loadings and had to be excluded from further analyses. While this observation puzzled me and made me cautious about using negatively worded items, it was not until a few years ago that I learned the reasons behind it. When I developed a new scale of my own (which includes only positively formulated items), I had to learn that some researchers have very strong opinions about mixed-item scales (i.e., scales that include both positively and negatively worded items). One journal rejected the respective paper on the grounds that the newly developed questionnaire contains only positive items. Afterward, the journal and I had a very constructive conversation, during which I conducted a detailed literature review on negatively worded items. Through this blog post, I aim to share what I learned about negatively worded items to help others make informed decisions about (not) using them in their research.
What are negatively worded items and why are they used?
Negatively worded items (also called reversed or reverse-coded items, because responses to them are recoded in the opposite direction before analysis) are items that are formulated in the opposite direction to the measured construct. For example, if we aim to measure participants’ current state of happiness, we could ask them to respond to the statement “I am happy right now” using a Likert scale ranging from “strongly disagree” to “strongly agree”. Corresponding negatively worded items could be “I am not happy right now” (a negated regular item) or “I am sad right now” (a polar opposite item). Initially, such items were thought to counteract response styles such as the tendency to agree with all items regardless of content, which results in acquiescence bias. However, in recent decades, empirical evidence has emerged highlighting several issues with this technique. Below, I will outline three interrelated key problems with negatively worded items, as well as some alternative solutions recommended in the literature.
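First, a brief practical aside: the recoding that gives these items their name is simple arithmetic. A minimal sketch, assuming an illustrative 1-5 response scale and made-up item names:

```python
import pandas as pd

# Hypothetical responses on a 1-5 Likert scale; item3_rev is negatively worded
df = pd.DataFrame({
    "item1":     [5, 4, 2],
    "item2":     [4, 5, 1],
    "item3_rev": [1, 2, 5],   # e.g., "I am sad right now"
})

# recoded = scale_min + scale_max - raw, so 1 -> 5, 2 -> 4, ..., 5 -> 1
SCALE_MIN, SCALE_MAX = 1, 5
df["item3"] = SCALE_MIN + SCALE_MAX - df["item3_rev"]

# after recoding, a high score on every item indicates more happiness
print(df[["item1", "item2", "item3"]].mean(axis=1))
```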
1) Negatively worded items load on additional factors
The first issue is that positively and negatively worded items often do not load onto the same factors (for an overview, see Dalal & Carter, 2014). Oftentimes, a perfect split between positively and negatively worded items can even be observed. There are two potential explanations for this phenomenon: either negatively worded items produce some sort of method effect (DiStefano & Motl, 2006; Podsakoff et al., 2003; Roszkowski & Soven, 2010), or they measure a different meaningful construct (Lance et al., 2009). The latter explanation is described in detail under point 3. A method effect is the influence of a specific measurement method on responses. A prominent example of a method effect is “common method variance”, which refers to biased estimates of the relations between different constructs measured with the same method (e.g., a questionnaire). In the case of negatively worded items, method effects are attributed to careless responding, to respondents not realizing that a questionnaire contains negatively worded items, or to difficulties in understanding the negated part of the item. Another possibility is that method effects reflect meaningful response styles that differ between individuals. Interestingly, such method effects disappear when negatively worded items are reworded into positive ones. Therefore, it is not the content of the items, but rather the fact that they are negatively formulated, that appears to be the reason behind the alleged method effect. The sketch below shows how this wording split would surface in a confirmatory factor analysis.
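The following hedged sketch compares a one-factor model against a two-factor model in which positively and negatively worded items load on separate, correlated factors. It again assumes semopy, and the item names (p1-p3 positively worded, n1-n3 negatively worded and already recoded) are hypothetical.

```python
import pandas as pd
import semopy

df = pd.read_csv("questionnaire_data.csv")  # hypothetical data file

# Model A: all items load on a single construct factor
one_factor = semopy.Model("happiness =~ p1 + p2 + p3 + n1 + n2 + n3")
one_factor.fit(df)

# Model B: positive and negative items form separate, correlated factors
two_factor = semopy.Model(
    """positive =~ p1 + p2 + p3
       negative =~ n1 + n2 + n3
       positive ~~ negative"""
)
two_factor.fit(df)

# If Model B fits clearly better (e.g., higher CFI, lower RMSEA/AIC), the
# wording split behaves like a method effect or a second construct
print(semopy.calc_stats(one_factor))
print(semopy.calc_stats(two_factor))
```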
2) Negatively worded items decrease reliability and validity
The second issue is partly a consequence of the first. Forcing items that load on different factors into one scale decreases reliability. A scale that includes both positive and negative items appears to produce both systematic and random error. Systematic error increases because negatively worded items share variance with each other that is attributable neither to the underlying construct nor to random measurement error, so the observed scores no longer reflect the construct of interest alone. Interestingly, positively worded items do not seem to share such method variance. Random error increases due to careless responding, confusion caused by the negative wording, or response styles. Both types of error decrease scale reliability and validity (Chyung et al., 2018; Dalal & Carter, 2014). Several studies indicate that using only positively formulated items produces the least measurement error, while using mixed items produces the most (Schriesheim et al., 1991; van Sonderen et al., 2013). Because of this systematic and random error, an unbiased assessment of scale validity is not possible. Ultimately, although negatively worded items were intended to eliminate response bias (e.g., acquiescence), the technique introduces even more bias into a scale (see, e.g., Schriesheim & Hill, 1981). The toy simulation below illustrates both mechanisms.
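This simulation sketches the argument under assumed, deliberately arbitrary numbers: the reversed items carry shared method variance (systematic error), a fraction of respondents overlook the reversal (random error), and the mixed scale ends up less reliable than an all-positive scale of the same length. None of the loadings or proportions come from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
trait = rng.normal(size=n)   # the construct of interest
method = rng.normal(size=n)  # method factor shared only by reversed items

def items(loading, k, method_loading=0.0):
    """Generate k items as loading*trait + method_loading*method + noise."""
    return np.column_stack([
        loading * trait + method_loading * method + rng.normal(scale=0.7, size=n)
        for _ in range(k)
    ])

pos_all = items(0.7, 6)                   # six positively worded items
pos = pos_all[:, :3]                      # first three reused for the mixed scale
neg = items(-0.7, 3, method_loading=0.4)  # three negatively worded items

careless = rng.random(n) < 0.10           # ~10% never notice the reversal
neg[careless] *= -1                       # they answer as if items were positive

def cronbach_alpha(x):
    """Classical alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

mixed = np.hstack([pos, -neg])            # reverse-score the negative items
print(f"all-positive scale: alpha = {cronbach_alpha(pos_all):.3f}")
print(f"mixed scale:        alpha = {cronbach_alpha(mixed):.3f}")
```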
3) Negatively worded items may measure a different meaningful construct
When adding negatively worded items to a scale, an implicit theoretical assumption is made: that a reversed survey item still measures the same theoretical construct on the same continuum, with the reversed item capturing the lower end of this continuum. Whether this assumption holds may depend, in part, on whether a negated or a polar opposite item is used. Returning to the example of happiness from above, the negated item “I am not happy right now” may still measure a negated form of happiness (i.e., the lower end of the continuum). However, using the polar opposite item “I am sad right now” theoretically implies that happiness and sadness are two extremes of the same variable. One easily notices that happiness and sadness are different emotional states with distinct qualities: an absence of happiness does not automatically imply that one is sad, and vice versa. Of course, I chose an easy example for illustrative purposes, in which it becomes relatively clear that these are distinct concepts. Other cases may be much less obvious, and it may not be immediately apparent that a polar opposite item measures another meaningful variable. In many prominent questionnaires, this implicit assumption is made without considering the possibility that the procedure leads to the measurement of a different variable. In conclusion, polar opposite items in particular may measure a different meaningful variable, which is why caution is advised when using scales that include such items.
Conclusion
Negatively worded items introduce method bias and measurement error, and they may even measure a completely different construct than intended. Accordingly, negatively worded items can cause severe problems in quantitative research. Does this mean researchers should avoid using them altogether? It depends. In some cases, it may be useful to include them in questionnaires using Likert scales. However, researchers should carefully consider the issues addressed above before making such decisions. In addition to testing for measurement error and factor structure, there must be a strong theoretical justification for why negatively worded items still measure the same variable. Also, the reasons for including negatively worded items should be clear, as there may be alternatives that do not cause these problems. Dalal and Carter (2014) present several such alternatives. For instance, if the aim is to filter out inattentive participants, instructed response items (e.g., instructing participants to select a particular scale point) may be a better choice. Moreover, Dalal and Carter describe a couple of statistical remedies to identify invalid responses post hoc, such as analyzing response patterns or response times (see also Meade & Craig, 2012), so that negatively worded items are not necessary in the first place. A minimal sketch of such screening checks follows below.
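Here is what two of these post-hoc checks could look like in Python. The column names, the instructed response (“select 4”), and the longstring cutoff are all hypothetical illustrative choices, not prescriptions from the cited sources.

```python
import pandas as pd

df = pd.read_csv("survey_data.csv")  # hypothetical raw survey export

# 1) Instructed response item: participants were told to pick "4" on check_1
failed_check = df["check_1"] != 4

# 2) Longstring index: longest run of identical consecutive answers
item_cols = [c for c in df.columns if c.startswith("item")]

def longstring(row: pd.Series) -> int:
    """Return the length of the longest run of identical consecutive values."""
    run, best = 1, 1
    values = row.to_list()
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# flag respondents who gave (nearly) the same answer to every item;
# the cutoff here is an arbitrary illustrative choice
suspicious = df[item_cols].apply(longstring, axis=1) >= len(item_cols) - 2
print(f"flagged for exclusion: {(failed_check | suspicious).sum()} participants")
```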
If you want to learn more about this topic, I highly recommend the book chapter by Dalal and Carter (2014), who go into more detail on the issues addressed above and present alternatives to negatively worded items.
Cited references
Chyung, S. Y., Barkin, J. R., & Shamsy, J. A. (2018). Evidence‐based survey design: The use of negatively worded items in surveys. Performance Improvement, 57(3), 16-25. https://doi.org/10.1002/pfi.21749
Dalal, D. K., & Carter, N. T. (2014). Negatively worded items negatively impact survey research. In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends: Doctrine, verity and fable in organizational and social sciences (pp. 112-132). Routledge.
DiStefano, C., & Motl, R. W. (2006). Further investigating method effects associated with negatively worded items on self-report surveys. Structural Equation Modeling, 13(3), 440-464. https://doi.org/10.1207/s15328007sem1303_6
Lance, C. E., Baranik, L. E., Lau, A. R., & Scharlau, E. R. (2009). If it ain’t trait it must be method: (Mis)application of the multitrait-multimethod design in organizational research. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 357-380). Routledge.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437-455. https://doi.org/10.1037/a0028085
Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879-903. https://doi.org/10.1037/0021-9010.88.5.879
Roszkowski, M. J., & Soven, M. (2010). Shifting gears: Consequences of including two negatively worded items in the middle of a positively worded questionnaire. Assessment & Evaluation in Higher Education, 35(1), 113-130. https://doi.org/10.1080/02602930802618344
Schriesheim, C. A., Eisenbach, R. J., & Hill, K. D. (1991). The effect of negation and polar opposite item reversals on questionnaire reliability and validity: An experimental investigation. Educational and Psychological Measurement, 51(1), 67-78. https://doi.org/10.1177/0013164491511005
Schriesheim, C. A., & Hill, K. D. (1981). Controlling acquiescence response bias by item reversals: The effect on questionnaire validity. Educational and Psychological Measurement, 41(4), 1101-1114. https://doi.org/10.1177/001316448104100420
van Sonderen, E., Sanderman, R., & Coyne, J. C. (2013). Ineffectiveness of reverse wording of questionnaire items: Let’s learn from cows in the rain. PLoS ONE, 8(7), e68967. https://doi.org/10.1371/journal.pone.0068967