Likert Scale
The Likert Scale is a common attitudinal survey format, requiring survey participants to select an option between “Strongly agree” and “Strongly disagree”. The most common scales have 5 or 7 points, but others, such as 4- or 3-point scales, are also used.
- Examples of Likert scales
- Interpreting and analysing Likert-scale surveys
- Scoring multi-question Likert scale surveys
- How many participants are required for a Likert scale survey?
The method is named after Rensis Likert, who published examples of such surveys in the 1932 paper A Technique For The Measurement of Attitudes. Likert is often incorrectly credited as the inventor of the method by people who have not read the original paper. Instead, the paper presents an overview of several typical types of surveys conducted around the 1930s to poll American audiences on topics such as racial segregation and military interventions, including Yes/No and other types of “surveys of opinions”.
Although Likert did not invent the scales, his important contribution was to popularise them, and to show that simply assigning a numerical value to each point on the scale, then adding up the response values from a set of related questions, provides an easy and useful way to measure people’s attitudes.
Examples of Likert scales
In a wider sense, the Likert scale can refer to any rating between two extreme feelings or behaviours, such as:
- Attitude: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
- Approval: Strongly Approve, Approve, Undecided, Disapprove, Strongly Disapprove
- Satisfaction: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied
- Likelihood: Very Unlikely, Unlikely, Not Sure, Likely, Very Likely
Likert’s original paper actually shows approval scales rather than agreement scales, but the Disagree/Agree opposites are now by far the most commonly used option in surveys (so much so that some authors insist that only Disagree/Agree scales should be called “Likert scales”). Examples of popular surveys using such scales are SUS and UMUX.
Many survey formats do not label all points on the scale, but just label the extreme values.
UMUX-LITE
| # | Question | Strongly Disagree | | | | | | Strongly Agree |
|---|---|---|---|---|---|---|---|---|
| 1 | [This system’s] capabilities meet my requirements | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | [This system] is easy to use | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Generally, scales with fewer points may cause users to be indecisive, as they might not feel strongly about any option. Scales with more points allow for finer-grained selection, but they can also become confusing if the difference between adjacent points is not clear. Scales with five to seven points seem to strike a good balance between those forces. Scales of more than 11 points are rarely used, as it’s difficult for respondents to meaningfully differentiate between the choices.
Scales with an odd number of points usually include a neutral or undecided opinion in the middle. A potential downside of this type of scale is that responses for “not applicable” and “undecided” get grouped with the responses for “neutral”. Some Likert scales include an explicit choice for “not applicable”, set after the regular scale. Some surveys include an even number of scale items, excluding the middle point, forcing the participants to make a decision.
Some surveys consisting of multiple Likert-style questions alternate between positive and negative statements, to detect bias and respondents who lazily tick the same response in each row. However, alternating the statement polarity also carries the risk of misinterpreted and mistaken answers (and, if the surveys are scored manually, miscoded responses).
Interpreting and analysing Likert-scale surveys
In the article The Likert Scale: What It Is and How To Use It, Katherine Batterton and Kimberly Hale point out that a key source of confusion when interpreting Likert scale data is that the individual responses are ordinal, but the aggregate scores from a set of questions can be treated as interval data.
Gail M Sullivan and Anthony R Artino Jr suggest in Analyzing and Interpreting Data From Likert-Type Scales that when looking at individual questions, it’s best to treat data as ordinal and look at the frequency of responses. Effectively, it’s much more useful to know that the most common answer to a question is “Highly Unlikely”, even if the average trends towards neutral.
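As a minimal sketch of this ordinal, frequency-based view (the question and the responses below are invented for illustration), counting responses per label makes the most common answer obvious even when a numeric average would sit near the middle of the scale:

```python
from collections import Counter

# Hypothetical responses to a single 5-point likelihood question
responses = [
    "Very Unlikely", "Very Unlikely", "Very Unlikely", "Unlikely",
    "Not Sure", "Likely", "Very Likely", "Very Unlikely", "Very Likely",
]

# Frequency of each answer, most common first
for label, count in Counter(responses).most_common():
    print(f"{label}: {count} ({count / len(responses):.0%})")
```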
Instead of tracking the frequency of all responses, some single-question surveys reduce variability by comparing just the frequency of the highest item on the scale (top-box) or the two highest items (top-2-box) against all others (e.g. the percentage of respondents who selected “Satisfied” or “Very Satisfied”). The benefit of top-box and top-2-box scoring is that it makes survey responses easy to compare over time and to spot trends, without treating the responses as numeric intervals. The downside of top-box and top-2-box scoring is that it can only spot changes from the undecided/middle part of the scale towards positive attitudes. It cannot easily detect whether the undecided people start having more negative attitudes over time. Some single-question surveys (notably NPS) subtract the number of bottom-box responses from the number of top-box responses to make negative trends visible as well.
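A short sketch of top-box, top-2-box and net scoring, again on hypothetical data for a 5-point satisfaction scale where 5 means “Very Satisfied”:

```python
# Hypothetical 5-point satisfaction ratings (1 = Very Dissatisfied, 5 = Very Satisfied)
ratings = [5, 4, 4, 3, 2, 5, 5, 3, 4, 1, 5, 4]

top_box = sum(r == 5 for r in ratings) / len(ratings)      # "Very Satisfied" only
top_2_box = sum(r >= 4 for r in ratings) / len(ratings)    # "Satisfied" or better
bottom_box = sum(r == 1 for r in ratings) / len(ratings)   # "Very Dissatisfied" only

print(f"top-box: {top_box:.0%}, top-2-box: {top_2_box:.0%}")
# An NPS-style net score also subtracts the bottom-box share to surface negative trends
print(f"net (top-box minus bottom-box): {top_box - bottom_box:+.0%}")
```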
However, when looking at surveys composed of several related questions, treating the data as numeric intervals and computing a single score is quite useful, in order to track relative comparisons. In the book Quantifying The User Experience, Sauro and Lewis suggest computing the mean and standard deviation of the responses, then using confidence intervals based on the t-distribution. The t-distribution is similar to the standard normal distribution for larger sample sizes, but accounts for smaller groups of participants (30 or fewer). With smaller groups, Sauro and Lewis suggest using a geometric mean to reduce variability.
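A minimal sketch of that confidence-interval calculation, assuming hypothetical per-respondent survey scores and using SciPy’s t-distribution:

```python
import math
from statistics import mean, stdev
from scipy.stats import t

# Hypothetical per-respondent scores for a multi-question survey
scores = [72.5, 65.0, 80.0, 55.0, 90.0, 67.5, 77.5, 62.5, 85.0, 70.0]

n = len(scores)
m = mean(scores)
sem = stdev(scores) / math.sqrt(n)  # standard error of the mean (sample standard deviation)

# 95% confidence interval using the t-distribution with n - 1 degrees of freedom
margin = t.ppf(0.975, df=n - 1) * sem
print(f"mean {m:.1f}, 95% CI [{m - margin:.1f}, {m + margin:.1f}]")
```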
Scoring multi-question Likert scale surveys
A common way to score multi-question surveys based on Likert scales is to assign a numerical value to each point in the scale (for example 1 for Strongly Disagree, 2 for Disagree, 3 for Neutral and so on), then calculate an average value for each question from all the survey respondents. The score for the whole survey is then usually the sum of individual question scores.
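As a rough sketch of that scoring approach (the label-to-number mapping and the responses are invented for illustration):

```python
# Map scale labels to numeric values on a 5-point scale
SCALE = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3, "Agree": 4, "Strongly Agree": 5}

# Hypothetical responses: for each question, one label per respondent
responses = {
    "Q1": ["Agree", "Strongly Agree", "Neutral", "Agree"],
    "Q2": ["Disagree", "Neutral", "Agree", "Strongly Agree"],
}

# Average value per question, then sum the question averages for the survey score
question_scores = {
    question: sum(SCALE[label] for label in labels) / len(labels)
    for question, labels in responses.items()
}
survey_score = sum(question_scores.values())
print(question_scores, survey_score)
```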
Some survey formats, such as SUS, alternate between redundant positive and negative statements to detect bias. In such cases, the scoring method requires normalisation. A typical way to do that is to subtract 1 from positive statement scores (normalising the scale to start with 0), and to subtract the negative statement scores from the maximum value (inverting and normalising the scale to start with 0). For example, choosing 1 on a positive statement would score as 0, but choosing 1 on a negative statement with a 5-point scale would score as 4. This allows an average to be calculated from both positive and negative statement responses, and even the entire survey to be scored with a single numerical value.
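A sketch of that normalisation for a 5-point scale with alternating statement polarity, in the spirit of SUS-style scoring (the answers below are hypothetical):

```python
MAX_POINT = 5  # 5-point scale, raw responses range from 1 to 5

def normalise(raw: int, positive: bool) -> int:
    """Normalise a raw 1-5 response to a 0-4 contribution."""
    # Positive statements: shift the scale so it starts at 0.
    # Negative statements: invert, so strong disagreement contributes the most.
    return raw - 1 if positive else MAX_POINT - raw

# Hypothetical answers from one participant: (raw response, is the statement positive?)
answers = [(4, True), (2, False), (5, True), (1, False)]

contributions = [normalise(raw, positive) for raw, positive in answers]
print(contributions)        # [3, 3, 4, 4]
print(sum(contributions))   # single numeric score for the questionnaire
```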
If participants are likely not to complete the entire questionnaire, incomplete questionnaires can still be used, as long as the whole survey is scored using the average rating of the completed questions (instead of the sum), following the suggestions about the PSSUQ in Quantifying the User Experience.
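A minimal sketch of average-based scoring that tolerates unanswered questions, following that PSSUQ-style suggestion (the data and its structure are assumptions for illustration):

```python
# Hypothetical 7-point responses from one participant; None marks an unanswered question
responses = [6, 5, None, 7, 6, None, 5]

answered = [r for r in responses if r is not None]
# Score the questionnaire as the average of the answered questions only,
# so partially completed questionnaires remain usable
score = sum(answered) / len(answered) if answered else None
print(score)  # 5.8
```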
How many participants are required for a Likert scale survey?
If using the standard normal (z) distribution, Batterton and Hale suggest including at least 200 to 300 people in Likert-style surveys, quoting work by Carmen R. Wilson VanVoorhis and Betsy L. Morgan. Wilson VanVoorhis and Morgan actually suggest that the survey sample size should depend on the number of independent factors (questions), but as a general rule of thumb they rate 50 participants as very poor, 100 as poor, 200 as fair, 300 as good, 500 as very good and 1000 as excellent.
In the book Quantifying The User Experience, Jeff Sauro and James R Lewis suggest that even smaller samples of 30 or so can be used with the t-distribution.
Learn more about the Likert Scale
- A Technique For The Measurement of Attitudes, from the Archives of Psychology, Volume 22 by Rensis Likert (1932)
- Quantifying the User Experience: Practical Statistics for User Research, 2nd Edition, ISBN 978-0128023082, by Jeff Sauro, James R Lewis (2016)
- The Likert Scale: What It Is and How To Use It, from Phalanx, Vol. 50, No. 2, by Katherine A. Batterton, Kimberly N. Hale (2017)
- Understanding Power and Rules of Thumb for Determining Sample Sizes, from the Tutorials in Quantitative Methods for Psychology, volume 3 by Betsy L. Morgan, Carmen R. Wilson VanVoorhis (2007)
- Analyzing and Interpreting Data From Likert-Type Scales, from the Journal of Graduate Medical Education by Gail M Sullivan, Anthony R Artino Jr (2013)