Introduction
The overall purpose of the Common Industry Format (CIF) for Usability Test Reports is to promote incorporation of usability as part of the procurement decision-making process for interactive products. Examples of such decisions include purchasing, upgrading and automating. It provides a common format for human factors engineers and usability professionals in supplier companies to report the methods and results of usability tests to customer organizations.
Full Product Description
This section identifies the formal product name and release or
version. It describes what parts of the product were evaluated. This
section should also specify: the user population for which the product
is intended, any groups with special needs, a brief description of the
environment in which it should be used, and the type of user work that
is supported by the product.
Test Objectives
This section describes all of the objectives for the test and any
areas of specific interest. Possible objectives include testing user
performance of work tasks and subjective satisfaction in using the
product. This section should include the functions and components of
the product with which the user directly and indirectly interacted in
this test. If the product component or functionality that was tested is a
subset of the total product, explain the reason for focusing on the
subset.
This is the first key technical section. It must
provide sufficient information to allow an independent tester to
replicate the procedure used in testing.
Participants
This section describes the users who participated in the test in
terms of demographics, professional experience, computing experience and
special needs. This description must be sufficiently informative to
replicate the study with a similar sample of participants. If there are
any known differences between the participant sample and the user
population, they should be noted here, e.g., actual users would attend a
training course whereas test subjects were untrained. Participants
should not be from the same organization as the testing or supplier
organization. Great care should be exercised when reporting differences
between demographic groups on usability metrics. A general description
should include important facts such as:
- The total number of participants tested. A minimum of 8 per cell (segment) is recommended [10].
- Segmentation of user groups tested (if more than one user group was tested). Example: novice and expert programmers.
- The key characteristics and capabilities expected of the user groups being evaluated.
- How participants were selected and whether they had the essential characteristics and capabilities.
- Whether the participant sample included representatives of groups with special needs, such as the young, the elderly, or those with physical or mental disabilities.
A table specifying the characteristics and capabilities of the
participants tested should include a row in the table for each
participant, and a column for each characteristic. Characteristics
should be chosen to be relevant to the product’s usability; they should
allow a customer to determine how similar the participants were to the
customers’ user population; and they must be complete enough so that an
essentially similar group of participants can be recruited. The table
below is an example; the characteristics that are shown are typical but
may not necessarily cover every type of testing situation.
For ‘Gender’, indicate male or female. For ‘Age’, state the
chronological age of the participant, or indicate membership in an age
range (e.g. 25-45) or age category (e.g. under 18, over 65) if the
exact age is not known. For ‘Education’, state the number of years of
completed formal education (e.g., in the US a high school graduate would
have 12 years of education and a college graduate 16 years). For
‘Occupation/role’, describe the user’s job role when using the
product. Use the role title if known. For ‘Professional experience’,
give the amount of time the user has been performing in the role. For
‘Computer experience’, describe relevant background such as how much
experience the user has with the platform or operating system, and/or
the product domain. This may be more extensive than one column. For
‘Product experience’ indicate the type and duration of any prior
experience with the product or with similar products.
| | Gender | Age | Education | Occupation | Professional Experience | Computer Experience | Product Experience |
| P1 | | | | | | | |
| P2 | | | | | | | |
Context of Product Use in the Test
This section describes the tasks, scenarios and conditions under
which the tests were performed, the tasks that were part of the
evaluation, the platform on which the application was run, and the
specific configuration operated by test participants. Any known
differences between the evaluated context and the expected context of
use should be noted in the corresponding subsection.
Tasks
A thorough description of the tasks that were performed by the
participants is critical to the face validity of the test.
- Describe the task scenarios for testing.
- Explain why these tasks were selected (e.g. the most frequent tasks, the most troublesome tasks).
- Describe the source of these tasks (e.g. observation of customers using similar products, product marketing specifications).
- Include any task data given to the participants.
- Include any completion or performance criteria established for each task.
Test Facility
This section refers to the physical description of the test facility.
- Describe the setting and type of space in which the evaluation was conducted (e.g., usability lab, cubicle office, meeting room, home office, home family room, manufacturing floor).
- Detail any relevant features or circumstances which could affect the quality of the results, such as video and audio recording equipment, one-way mirrors, or automatic data collection equipment.
Participant’s Computing Environment
The section should include all the detail required to replicate and
validate the test. It should include appropriate configuration detail on
the participant’s computer, including hardware model, operating system
versions, and any required libraries or settings. If the product uses a
web browser, then the browser should be identified along with its
version and the name and version of any relevant plug-ins.
- Display Devices: If the product has a screen-based visual interface, the screen size, monitor resolution, and color setting (number of colors) must be detailed. If the product has a print-based visual interface, the media size and print resolution must be detailed. If visual interface elements can vary in size, specify the size(s) used in the test. This factor is particularly relevant for fonts.
- Audio Devices: If the product has an audio interface, specify relevant settings or values for the audio bits, volume, etc.
- Manual Input Devices: If the product requires a manual input device (e.g., keyboard, mouse, joystick), specify the make and model of devices used in the test.
- Test Administrator Tools: If a standard questionnaire was used, describe or specify it here. Include customized questionnaires in an appendix. Describe any hardware or software used to control the test or to record data.
Experimental Design
Describe the logical design of the test. Define independent variables
and control variables. Briefly describe the measures for which data were
recorded for each set of conditions.
Procedure
This section details the test protocol. Give operational definitions of
measures and any presented independent variables or control variables.
Describe any time limits on tasks, and any policies and procedures for
training, coaching, assistance, interventions or responding to questions.
- Include the sequence of events from greeting the participants to dismissing them.
- Include details concerning non-disclosure agreements, form completion, warm-ups, pre-task training, and debriefing.
- Verify that the participants knew and understood their rights as human subjects [1].
- Specify the steps that the evaluation team followed to execute the test sessions and record data.
- Specify how many people interacted with the participants during the test sessions and briefly describe their roles.
- State whether other individuals were present in the test environment and their roles.
- State whether participants were paid or otherwise compensated.
Participant General Instructions
Include here or in an appendix all instructions given to the
participants (except the actual task instructions, which are given in
the Participant Task Instructions section). Include instructions on how
participants were to interact with any other persons present, including
how they were to ask for assistance and interact with other
participants, if applicable.
Usability Metrics
Explain what measures have been used for each category of usability
metrics: effectiveness, efficiency and satisfaction. Conceptual
descriptions and examples of the metrics are given below.
Effectiveness
Effectiveness relates the goals of using the product to the accuracy
and completeness with which these goals can be achieved. It does not
take account of how the goals were achieved, only the extent to which
they were achieved. Common measures of effectiveness include percent
task completion, frequency of errors, frequency of assists to the
participant from the testers, and frequency of accesses to help or
documentation by the participants during the tasks.
- Completion Rate: The results must include the percentage of participants who completely and correctly achieve each task goal. If goals can be partially achieved (e.g., by incomplete or sub-optimum results), then it may also be useful to report the average goal achievement, scored on a scale of 0 to 100% based on specified criteria related to the value of a partial result. For example, a spell-checking task might involve identifying and correcting 10 spelling errors, and the completion rate might be calculated based on the percent of errors corrected. Another method for calculating completion rate is weighting; e.g., spelling errors in the title page of the document are judged to be twice as important as errors in the main body of text. The rationale for choosing a particular method of partial goal analysis should be stated, if such results are included in the report.
Note: The unassisted completion rate (i.e. the rate achieved without
intervention from the testers) should be reported as well as the
assisted rate (i.e. the rate achieved with tester intervention) where
these two metrics differ.
- Errors: Errors are instances where test participants did not complete the task successfully, or had to attempt portions of the task more than once. It is recommended that scoring of data include classifying errors according to some taxonomy, such as in [2].
- Assists: When participants cannot proceed on a task, the test administrator sometimes gives direct procedural help in order to allow the test to proceed. This type of tester intervention is called an assist for the purposes of this report. If it is necessary to provide participants with assists, efficiency and effectiveness metrics must be determined for both unassisted and assisted conditions. For example, if a participant received an assist on Task A, that participant should not be included among those successfully completing the task when calculating the unassisted completion rate for that task. However, if the participant went on to successfully complete the task following the assist, that participant could be included in the assisted Task A completion rate. When assists are allowed or provided, the number and type of assists must be included as part of the test results.
In some usability tests, participants are instructed to use support tools such as online help or documentation, which are part of the product, when they cannot complete tasks on their own. Accesses to product features which provide information and help are not considered assists for the purposes of this report. It may, however, be desirable to report the frequency of accesses to different product support features, especially if they factor into participants' ability to use products independently.
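The completion figures described above can be sketched in a few lines of Python. This is only an illustration: the `TaskResult` record layout, the function names, and the sample values are assumptions, not part of the format. The sketch also assumes that the assisted rate counts every participant who completed the task, with or without an assist. The title-page weight of 2 follows the spell-checking example in the text.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One participant's outcome on one task (hypothetical record layout)."""
    participant: str
    completed: bool   # task goal completely and correctly achieved
    assists: int      # number of tester interventions during the task


def completion_rates(results):
    """Return (unassisted %, assisted %) for a single task.

    A participant who received any assist is excluded from the unassisted
    count, but counts toward the assisted rate if the task was completed.
    """
    n = len(results)
    unassisted = sum(1 for r in results if r.completed and r.assists == 0)
    assisted = sum(1 for r in results if r.completed)
    return 100.0 * unassisted / n, 100.0 * assisted / n


def weighted_goal_achievement(corrected, weights):
    """Partial goal achievement (0-100%) as a weighted share of sub-goals met."""
    total = sum(weights)
    achieved = sum(w for done, w in zip(corrected, weights) if done)
    return 100.0 * achieved / total


# Illustrative data only: four participants on Task A.
task_a = [
    TaskResult("P1", completed=True, assists=0),
    TaskResult("P2", completed=True, assists=1),
    TaskResult("P3", completed=False, assists=2),
    TaskResult("P4", completed=True, assists=0),
]
unassisted_pct, assisted_pct = completion_rates(task_a)
print(unassisted_pct, assisted_pct)  # 50.0 75.0

# Spell-checking example: 10 errors, the first 2 on the title page (weight 2).
weights = [2, 2] + [1] * 8
corrected = [True, False] + [True] * 6 + [False] * 2
print(round(weighted_goal_achievement(corrected, weights), 1))  # 66.7
```

Reporting both rates side by side makes the effect of tester intervention visible; in the sample data, one of the three completions depended on an assist.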
Efficiency
Efficiency relates the level of effectiveness achieved to the
quantity of resources expended. Efficiency is generally assessed by the
mean time taken to achieve the task. Efficiency may also relate to other
resources (e.g. total cost of usage). A common measure of efficiency is
time on task.
- Task time: The results must include the mean time taken to complete each task, together with the range and standard deviation of times across participants. Sometimes a more detailed breakdown is appropriate; for instance, the time that users spent looking for or obtaining help (e.g., including documentation, help system or calls to the help desk). This time should also be included in the total time on task.
- Completion Rate / Mean Time-On-Task: The measure Completion Rate / Mean Time-On-Task is the core measure of efficiency. It specifies the percentage of users who were successful (or percentage goal achievement) for every unit of time. This formula shows that as the time on task increases, one would expect users to be more successful. A very efficient product has a high percentage of successful users in a small amount of time. This allows customers to compare fast error-prone interfaces (e.g., command lines with wildcards to delete files) to slow easy interfaces (e.g., using a mouse and keyboard to drag each file to the trash).
Note: Effectiveness and efficiency results must be reported, even when
they are difficult to interpret within the specified context of use. In
this case, the report must specify why the supplier does not consider
the metrics meaningful. For example, suppose that the context of use for
the product includes real time, open-ended interaction between close
associates. In this case, Time-On-Task may not be meaningfully
interpreted as a measure of efficiency, because for many users, time
spent on this task is “time well spent”.
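The two efficiency measures above reduce to simple arithmetic. The following sketch shows one way to compute them; the function names and sample times are assumptions made for illustration, and the use of the sample (n - 1) standard deviation is a choice the report should state, since the text does not prescribe one.

```python
import statistics


def task_time_summary(times_min):
    """Mean, standard deviation, minimum and maximum of task times (minutes)."""
    return {
        "mean": statistics.mean(times_min),
        "std_dev": statistics.stdev(times_min),  # sample standard deviation
        "min": min(times_min),
        "max": max(times_min),
    }


def efficiency(completion_rate_pct, mean_time_min):
    """Completion Rate / Mean Time-On-Task, in percent per minute."""
    return completion_rate_pct / mean_time_min


# Illustrative data only: task times in minutes for four participants.
times = [4.0, 6.0, 5.0, 5.0]
summary = task_time_summary(times)
print(summary["mean"], summary["min"], summary["max"])  # 5.0 4.0 6.0
print(efficiency(75.0, summary["mean"]))  # 15.0
```

With a 75% completion rate and a mean time of 5 minutes, the measure is 15 percentage points of completion per minute; a product reaching the same completion rate in half the time would score twice as high.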
Satisfaction
Satisfaction describes a user’s subjective response when using the
product. User satisfaction may be an important correlate of motivation
to use a product and may affect performance in some cases.
Questionnaires to measure satisfaction and associated attitudes are
commonly built using Likert and semantic differential scales. A variety
of instruments are available for measuring user satisfaction with
interactive software products, and many companies create their own. Whether an
external, standardized instrument is used or a customized instrument is
created, it is suggested that subjective rating dimensions such as
Satisfaction, Usefulness, and Ease of Use be considered for inclusion,
as these will be of general interest to customer organizations. A number
of questionnaires are available that are widely used. They include ASQ
[5], CUSI [6], PSSUQ [6], QUIS [3], SUMI [4], and SUS [7]. While each
offers unique perspectives on subjective measures of product usability,
most include measurements of Satisfaction, Usefulness, and Ease of Use.
Suppliers may choose to use validated published satisfaction measures or
may submit satisfaction metrics they have developed themselves.
Results
This is the second major technical section of the report. It includes a
description of how the data were scored, reduced, and analyzed. It
provides the major findings in quantitative formats.
Data Scoring
The method by which the data collected were scored should be
described in sufficient detail to allow replication of the data scoring
methods by another organization if the test is repeated. Particular
items that should be addressed include the exclusion of outliers,
categorization of error data, and criteria for scoring assisted or
unassisted completion.
Data Reduction
The method by which the data were reduced should be described in
sufficient detail to allow replication of the data reduction methods by
another organization if the test is repeated. Particular items that
should be addressed include how data were collapsed across tasks or task
categories.
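As one possible illustration of collapsing data across task categories, the sketch below averages per-task results within each category. The task names, category labels, and values are invented for the example, and the unweighted mean across tasks is only one of several reduction methods a report might describe.

```python
from collections import defaultdict
from statistics import mean

# Illustrative per-task results: (task, category, completion %, mean time in min).
per_task = [
    ("Create program", "creation", 80.0, 10.0),
    ("Create module", "creation", 60.0, 14.0),
    ("Fix syntax error", "debugging", 90.0, 4.0),
    ("Fix logic error", "debugging", 50.0, 8.0),
]


def collapse_by_category(rows):
    """Average completion rate and task time over the tasks in each category."""
    grouped = defaultdict(list)
    for _task, category, completion, time_min in rows:
        grouped[category].append((completion, time_min))
    return {
        category: {
            "completion_pct": mean(c for c, _ in values),
            "mean_time_min": mean(t for _, t in values),
        }
        for category, values in grouped.items()
    }


collapsed = collapse_by_category(per_task)
print(collapsed["creation"])   # {'completion_pct': 70.0, 'mean_time_min': 12.0}
print(collapsed["debugging"])  # {'completion_pct': 70.0, 'mean_time_min': 6.0}
```

Whatever rule is used, stating it explicitly (here, an equal weight per task) is what allows another organization to reproduce the reduced figures.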
Statistical Analysis
The method by which the data were analyzed should be described in
sufficient detail to allow replication of the data analysis methods by
another organization if the test is repeated. Particular items that
should be addressed include statistical procedures (e.g. transformation
of the data) and tests (e.g. t-tests, F tests and statistical
significance of differences between groups). Scores that are reported as
means must include the standard deviation and optionally the standard
error of the mean.
Presentation of the Results
Effectiveness, Efficiency and Satisfaction results must always be reported. Both
tabular and graphical presentations of results should be included.
Various graphical formats are effective in describing usability data at
a glance. Examples are included in the Sample Test Report in Appendix C.
Bar graphs are useful for describing subjective data such as that
gleaned from Likert scales. A variety of plots can be used effectively
to show comparisons of expert benchmark times for a product vs. the mean
participant performance time. The data may be accompanied by a brief
explanation of the results but detailed interpretation is discouraged.
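The Statistical Analysis section requires that means be accompanied by the standard deviation and, optionally, the standard error of the mean. A minimal sketch of that calculation follows; the function name and the sample scores are assumptions for illustration, and the sample (n - 1) standard deviation is assumed.

```python
import math
import statistics


def mean_sd_sem(scores):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)
    sem = sd / math.sqrt(len(scores))  # SEM = SD / sqrt(n)
    return m, sd, sem


# Illustrative data only: task times in minutes for eight participants.
scores = [3.0, 5.0, 4.0, 6.0, 5.0, 4.0, 7.0, 6.0]
m, sd, sem = mean_sd_sem(scores)
print(f"mean={m:.2f} sd={sd:.2f} sem={sem:.2f}")  # mean=5.00 sd=1.31 sem=0.46
```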
Performance Results
It is recommended that efficiency and effectiveness results be
tabulated across participants on a per unit task basis. A table of
results may be presented for groups of related tasks (e.g. all program
creation tasks in one group, all debugging tasks in another group) where
this is more efficient and makes sense. If a unit task has sub-tasks,
then the sub-tasks may be reported in summary form for the unit task.
For example, if a unit task is to identify all the misspelled words on a
page, then the results may be summarized as a percent of misspellings
found. Finally, a summary table showing total mean task times and
completion rates across all tasks should be presented. Testers should
report additional tables of metrics if they are relevant to the
product’s design and a particular application area.
Task A
| User # | Unassisted Task Effectiveness (% Complete) | Assisted Task Effectiveness (% Complete) | Task Time (min) | Effectiveness / Mean Time-on-Task | Errors | Assists |
| 1 | | | | | | |
| 2 | | | | | | |
| ... | | | | | | |
| Mean | | | | | | |
| Std Dev | | | | | | |
| Min | | | | | | |
| Max | | | | | | |
Satisfaction Results
On some measurement scale.
| User # | Scale 1 | Scale 2 | Scale 3 | ... | Scale N |
| 1 | | | | | |
| 2 | | | | | |
| ... | | | | | |
| Mean | | | | | |
| Std Dev | | | | | |
| Min | | | | | |
| Max | | | | | |
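The summary rows of a satisfaction table like the one above could be derived from per-participant ratings as sketched here. The ratings, the three scale names, and the `summarize` helper are hypothetical; a real report would use the scales of whichever questionnaire was administered.

```python
import statistics

# Illustrative ratings only: one row per participant, one column per scale.
scales = ["Scale 1", "Scale 2", "Scale 3"]
ratings = {
    1: [5, 4, 6],
    2: [6, 4, 7],
    3: [4, 7, 5],
}


def summarize(ratings, scales):
    """Compute the Mean, Std Dev, Min and Max rows for each scale column."""
    columns = list(zip(*ratings.values()))  # transpose rows into columns
    return {
        name: {
            "Mean": statistics.mean(col),
            "Std Dev": statistics.stdev(col),  # sample standard deviation
            "Min": min(col),
            "Max": max(col),
        }
        for name, col in zip(scales, columns)
    }


satisfaction = summarize(ratings, scales)
for name in scales:
    row = satisfaction[name]
    print(f"{name}: mean={row['Mean']:.2f} sd={row['Std Dev']:.2f} "
          f"min={row['Min']} max={row['Max']}")
```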
Appendices
Custom questionnaires, Participant General Instructions and
Participant Task Instructions are appropriately submitted as appendices.
Release Notes, which would include any information the supplier would
like to include since the test was run that might explain or update the
test results (e.g. if the UI design has been fixed since the test),
should be placed in a separate appendix.
References
1. American Psychological Association. Ethical Principles in
the Conduct of Research with Human Participants. 1982.