Documentation for fxl
Visualizing Procedural Fidelity with the fxl R Package
Written by Shawn P. Gilroy (Last Updated: 2024-12-20)
Tags: Procedural Fidelity, Visual Analysis
Clinical and applied research typically discusses the consistency and accuracy of implementation. This is often referred to as procedural fidelity, treatment integrity, or treatment adherence. In single-case research, procedural fidelity is essential for interpreting the effects of an intervention (e.g., positive outcomes despite inconsistent implementation call into question the causal link between treatment and outcomes).
Procedural fidelity is often assessed by recording and summarizing the degree to which a treatment or protocol was implemented as designed. In most cases, this is driven by a list or collection of steps deemed critical and/or relevant to the independent variable (e.g., the core components of an intervention). In practice, one or more observers record the presence or absence of specific steps or components of the intervention for each session/measurement in the study.
Once sufficient data are available, these can be inspected to assess whether there are any noteworthy threats to the internal validity of the study.
Methods for Aggregating Integrity Data
Although most aspects of single-case experimental design (SCED) research emphasize individual-level variability, procedural fidelity is often reported as a percentage. For example, this may consist of the total percentage of steps implemented correctly across all participants, or the percentage of steps implemented correctly within each phase of the study. Various options exist, and each varies in its sensitivity to identifying systematic deviations from the intended research strategy.
To illustrate these differential sensitivities, hypothetical data were prepared to show how each approach to reporting can influence judgments of internal validity. A visual of these hypothetical data is provided below.
# Note: Value = "TRUE" indicates correct implementation, "FALSE" indicates incorrect implementation
# Note: Step = the step number (1-5) on a hypothetical checklist
head(csv_data)
## Participant Session Condition Step Value
## 1 1 1 Baseline 1 TRUE
## 2 1 2 Baseline 1 TRUE
## 3 1 3 Baseline 1 TRUE
## 4 1 4 Baseline 1 TRUE
## 5 1 5 Intervention 1 TRUE
## 6 1 6 Intervention 1 TRUE
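The exact values above were loaded from a CSV file; for readers wishing to follow along, the sketch below shows one way data of this shape could be simulated. The staggered baseline lengths match the phase-change lines in the figure later in this post, but the error rates are illustrative assumptions rather than the values behind the tables that follow.
library(dplyr) # used for the summaries throughout this post
library(knitr) # kable() renders the markdown tables

set.seed(1)

# One record per Participant x Session x Step combination
csv_data <- expand.grid(Participant = 1:3,
                        Session = 1:27,
                        Step = 1:5) |>
  mutate(
    # Staggered baselines of 4, 8, and 12 sessions (an assumption
    # matching the multiple-baseline figure below)
    Condition = case_when(
      Participant == 1 & Session <= 4 ~ "Baseline",
      Participant == 2 & Session <= 8 ~ "Baseline",
      Participant == 3 & Session <= 12 ~ "Baseline",
      TRUE ~ "Intervention"
    ),
    # Step 4 is simulated as less reliably implemented than the
    # others (these rates are illustrative assumptions)
    Value = if_else(Step == 4, runif(n()) < 0.70, runif(n()) < 0.98)
  ) |>
  arrange(Participant, Step, Session)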
Participant
It is often the case in single-case experimental design research that procedural fidelity is broken down at the individual level. This is generally recommended practice, as it allows readers to confirm that the treatment was implemented as intended for each participant.
The table below provides an example of how overall percentages of procedural fidelity can be calculated and reported at the participant level.
csv_data %>%
group_by(Participant) %>%
summarise(Count = n(),
Sum = sum(Value == "TRUE")) %>%
ungroup() %>%
mutate(`Percent Correct Implementation` = round((Sum / Count) * 100, 2)) %>%
kable(format = "markdown")
| Participant | Count | Sum | Percent Correct Implementation |
|---|---|---|---|
| 1 | 135 | 128 | 94.81 |
| 2 | 135 | 124 | 91.85 |
| 3 | 135 | 123 | 91.11 |
This summary provides a high-level overview of the degree to which, on the whole, implementation was correct rather than incorrect.
Participant + Condition
It is often the case that reviewers of research will want information regarding the degree to which a treatment was implemented as intended across conditions (e.g., Baseline and Intervention). This is particularly relevant in studies where the treatment is expected to have a differential effect across conditions, as the contrast is only meaningful if it facilitates a clear comparison of performance with and without the influence of the independent variable.
The table below provides an example of how overall percentages of procedural fidelity can be calculated and reported at the participant level as well as across conditions.
csv_data %>%
group_by(Participant, Condition) %>%
summarise(Count = n(),
Sum = sum(Value == "TRUE")) %>%
ungroup() %>%
mutate(`Percent Correct Implementation` = round((Sum / Count) * 100, 2)) %>%
kable(format = "markdown")
| Participant | Condition | Count | Sum | Percent Correct Implementation |
|---|---|---|---|---|
| 1 | Baseline | 20 | 19 | 95.00 |
| 1 | Intervention | 115 | 109 | 94.78 |
| 2 | Baseline | 40 | 38 | 95.00 |
| 2 | Intervention | 95 | 86 | 90.53 |
| 3 | Baseline | 60 | 57 | 95.00 |
| 3 | Intervention | 75 | 66 | 88.00 |
Participant + Condition + Item
Although aggregate percentages across participants and conditions are useful for communicating the overall degree to which a treatment or protocol was implemented as designed, certain assumptions are made when interpreting such data.
First, focusing on aggregates assumes that errors are random (and equally distributed across items). That is not always the case, and it becomes increasingly unlikely when certain items are more effortful or complex than others.
Second, another assumption is that high accuracy (e.g., 80% or greater) means that systematic deviations in the data are unlikely. For example, a high percentage of correct implementation may suggest that most steps were implemented as designed, yet a single (potentially critical) element may not have been implemented correctly. For complex interventions (e.g., 10+ items tracked), this can be problematic, as trends in errors may be washed out by correct implementation of simpler, less critical, and more numerous items.
The table below breaks down the hypothetical data further by individual items, which reveals systematic differences in correct implementation by item.
csv_data %>%
group_by(Step, Condition) %>%
summarise(Count = n(),
Sum = sum(Value == "TRUE")) %>%
ungroup() %>%
mutate(`Percent Correct Implementation` = round((Sum / Count) * 100, 2)) %>%
kable(format = "markdown")
| Step | Condition | Count | Sum | Percent Correct Implementation |
|---|---|---|---|---|
| 1 | Baseline | 24 | 24 | 100.00 |
| 1 | Intervention | 57 | 57 | 100.00 |
| 2 | Baseline | 24 | 24 | 100.00 |
| 2 | Intervention | 57 | 55 | 96.49 |
| 3 | Baseline | 24 | 24 | 100.00 |
| 3 | Intervention | 57 | 57 | 100.00 |
| 4 | Baseline | 24 | 18 | 75.00 |
| 4 | Intervention | 57 | 35 | 61.40 |
| 5 | Baseline | 24 | 24 | 100.00 |
| 5 | Intervention | 57 | 57 | 100.00 |
An inspection of the table above reveals that, although the overall percentage across participants and conditions may appear sufficient (i.e., >80%), aggregation can obscure significant deviations from the intended research strategy. Specifically, the table reveals issues with Step #4 that persist across both baseline and intervention conditions.
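When many items are tracked, it may also help to flag low-fidelity items programmatically rather than scanning the full table. The sketch below is one hypothetical approach; the 80% cutoff is an illustrative assumption rather than an established criterion.
# Flag any Step x Condition combination falling below a fidelity
# threshold (the 80% cutoff is an illustrative assumption)
csv_data %>%
  group_by(Step, Condition) %>%
  summarise(`Percent Correct` = round(mean(Value == "TRUE") * 100, 2),
            .groups = "drop") %>%
  filter(`Percent Correct` < 80) %>%
  kable(format = "markdown")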
Furthermore, it is even possible that systematic differences exist as a function of time, which could also introduce uncertainty regarding the effects (or non-effects) associated with the independent variable (e.g., errors committed early in a study are likely to be more impactful than those committed later). One hypothetical screen for such drift is sketched below.
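One way to screen for drift over time is to summarize fidelity within successive blocks of sessions; the three equal blocks used here are an arbitrary choice for illustration.
# Summarize fidelity within successive session blocks to screen for
# drift over time (the three-block split is an illustrative choice)
csv_data %>%
  mutate(Block = cut(Session, breaks = c(0, 9, 18, 27),
                     labels = c("Sessions 1-9",
                                "Sessions 10-18",
                                "Sessions 19-27"))) %>%
  group_by(Participant, Block) %>%
  summarise(`Percent Correct` = round(mean(Value == "TRUE") * 100, 2),
            .groups = "drop") %>%
  kable(format = "markdown")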
A Process for Visualizing Procedural Fidelity
Many of the challenges associated with aggregated data stem from assumptions that cannot be verified when only the mean, standard deviation, and range are provided. This poses a challenge for researchers who need to understand the degree to which a treatment was implemented as designed in order to draw valid inferences regarding outcomes.
In recent years, some have proposed visual approaches for inspecting error patterns among research participants (e.g., Mitteer & Greer, 2022); this strategy also has utility for visualizing item-level data related to implementation.
Adopting a ‘heatmap’ style approach to binary data (i.e., correct vs. incorrect), we can visualize the degree to which each item was implemented correctly across participants, conditions, and items throughout the course of a study.
An example of this, drawn using the fxl R package, is illustrated in the space below.
# Note: Styles are applied based on the Correct/Incorrect Value mapped to the 'p' (phase) argument. This allows for flexible styling
scr_plot(csv_data, aesthetics = var_map(x = Session,
y = Step,
p = Value,
facet = Participant),
family = "Times New Roman",
mai = c(0.375,
1.5,
0,
0.5),
omi = c(0.25,
0,
0.25,
0.25)) |>
scr_yoverride(
list(
"1" = list(
y0 = 0.5,
y1 = 5.25,
yticks = c(1, 2, 3, 4, 5)
),
"2" = list(
y0 = 0.5,
y1 = 5.25,
yticks = c(1, 2, 3, 4, 5)
),
"3" = list(
y0 = 0.5,
y1 = 5.25,
yticks = c(1, 2, 3, 4, 5)
)
),
ytickslabs = c(
"Step #1 Hypothetical Item",
"Step #2 Hypothetical Item",
"Step #3 Hypothetical Item",
"Step #4 Hypothetical Item",
"Step #5 Hypothetical Item"
)
) |>
scr_ylabel("") |>
scr_label_phase(
facet = "1",
cex = 1.5,
adj = 0.5,
x = 1,
y = 5.625,
labels = list(
"Baseline" = list(
x = 2.5
),
"Intervention" = list(
x = 15
)
)
) |>
scr_xoverride(c(0.5, 27),
xticks = 1:27,
xtickslabs = as.character(1:27)) |>
# Note: The 'x' icon is slightly smaller for convenience
scr_points(cex = list(
"TRUE" = 3,
"FALSE" = 2
),
# Note: an 'x' is used to depict 'Incorrect'
pch = list(
"TRUE" = 22,
"FALSE" = 4
),
# Note: the 'Correct' marker is made subtle intentionally
fill = list("TRUE" = "white",
"FALSE" = "gray"),
color = list("TRUE" = "gray",
"FALSE" = "black")) |>
scr_plines_mbd(
lines = list(
"A" = list(
"1" = list(
x1 = 4.5,
y1 = 7,
y2 = 0.5
),
"2" = list(
x1 = 8.5,
y1 = 5,
y2 = 0.5
),
"3" = list(
x1 = 12.5,
y1 = 5,
y2 = 0.5
)
)
)
) |>
scr_legend(
panel = "1",
position = list(
x = 22,
y = 6
),
legend = c(
"Incorrect ",
"Correct "
),
col = c(
"black",
"gray"
),
pt_bg = c(
"gray",
"white"
),
lty = 0,
pch = c(
4,
22
),
bty = "n",
pt_cex = c(
2,
3
),
cex = 1.25,
text_col = "black",
horiz = TRUE,
box_lty = 1
)
Summary and Overview
The heatmap visualization provides a straightforward supplement to the visual analysis of data. This approach provides an easy means of communicating procedural integrity in a way that does not rest on the assumptions made with aggregates. Said more directly, systematic patterns can be recognized visually without the need to infer such threats from standard deviations and range statistics. This benefit is routinely realized in the visual analysis of outcome data but has less often been extended to data describing implementation.
The fxl R package provides a simple means of generating these visualizations, which can be easily incorporated into reports and manuscripts. The package is designed to be flexible and can be used to generate visualizations for a variety of research designs and data structures. For example, heatmaps can be constructed using data structures that are not dichotomous (e.g., Incorrect, Partially Correct, Correct) for even more detailed summaries of fidelity data, as sketched below.
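As a hypothetical illustration, the point-styling lists shown earlier could be keyed to three levels rather than two. The level names and marker choices below are assumptions for illustration, not output from the example above; they presume a Value column coded with these three labels.
# A minimal sketch assuming a three-level Value column
# ("Incorrect", "Partially Correct", "Correct"); styles are illustrative
scr_plot(csv_data, aesthetics = var_map(x = Session,
                                        y = Step,
                                        p = Value,
                                        facet = Participant)) |>
  scr_points(pch = list("Correct" = 22,
                        "Partially Correct" = 7,
                        "Incorrect" = 4),
             fill = list("Correct" = "white",
                         "Partially Correct" = "gray",
                         "Incorrect" = "gray"),
             color = list("Correct" = "gray",
                          "Partially Correct" = "black",
                          "Incorrect" = "black"))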
References:
Mitteer, D. R., & Greer, B. D. (2022). Using GraphPad Prism's Heat Maps for Efficient, Fine-Grained Analyses of Single-Case Data. Behavior Analysis in Practice, 15, 505-514. https://doi.org/10.1007/s40617-021-00664-7