Documentation for fxl
Visualizing Procedural Fidelity with the fxl R Package
Written by Shawn P. Gilroy (Last Updated: 2024-12-20)
Tags: Procedural Fidelity, Visual Analysis
Clinical and applied research typically discusses the consistency and accuracy of implementation. This is often referred to as procedural fidelity, treatment integrity, or treatment adherence. In single-case research, procedural fidelity is essential for interpreting the effects of an intervention (e.g., positive outcomes despite inconsistent implementation call into question the causal link between treatment and outcomes).
Procedural fidelity is often assessed by recording and summarizing the degree to which a treatment or protocol was implemented as designed. In most cases, this is driven by a list or collection of steps deemed critical and/or relevant to the independent variable (e.g., the core components of an intervention). In practice, one or more observers record the presence or absence of specific steps or components of the intervention for each session/measurement in the study.
Once sufficient data are available, these can be inspected to assess whether there are any noteworthy threats to the internal validity of the study.
Methods for Aggregating Integrity Data
Although most aspects of single-case experimental design (SCED) research emphasize individual-level variability, procedural fidelity is often reported as a percentage. For example, this may consist of the total percentage of steps implemented correctly across all participants, or the percentage of steps implemented correctly within each phase of the study. Various options exist, and each varies in its sensitivity to identifying systematic deviations from the intended research strategy.
To illustrate these differential sensitivities, hypothetical data were prepared to show how each approach to reporting can influence judgments of internal validity. A visual of these hypothetical data is provided below.
# Note: Value = "TRUE" indicates correct implementation, "FALSE" indicates incorrect implementation
# Note: Step = the step number (1-5) on a hypothetical checklist
head(csv_data)
## Participant Session Condition Step Value
## 1 1 1 Baseline 1 TRUE
## 2 1 2 Baseline 1 TRUE
## 3 1 3 Baseline 1 TRUE
## 4 1 4 Baseline 1 TRUE
## 5 1 5 Intervention 1 TRUE
## 6 1 6 Intervention 1 TRUE
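The exact values above were loaded from a CSV file; for readers wishing to follow along, the sketch below shows one way data of this shape could be simulated. The staggered baseline lengths match the phase-change lines in the figure later in this post, but the error rates are illustrative assumptions rather than the values behind the tables that follow.
library(dplyr) # used for the summaries throughout this post
library(knitr) # kable() renders the markdown tables

set.seed(1)

# One record per Participant x Session x Step combination
csv_data <- expand.grid(Participant = 1:3,
                        Session = 1:27,
                        Step = 1:5) |>
  mutate(
    # Staggered baselines of 4, 8, and 12 sessions (an assumption
    # matching the multiple-baseline figure below)
    Condition = case_when(
      Participant == 1 & Session <= 4 ~ "Baseline",
      Participant == 2 & Session <= 8 ~ "Baseline",
      Participant == 3 & Session <= 12 ~ "Baseline",
      TRUE ~ "Intervention"
    ),
    # Step 4 is simulated as less reliably implemented than the
    # others (these rates are illustrative assumptions)
    Value = if_else(Step == 4, runif(n()) < 0.70, runif(n()) < 0.98)
  ) |>
  arrange(Participant, Step, Session)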
Participant
It is often the case in single-case experimental design research that procedural fidelity is broken down at the individual level. This is generally recommended practice, as it allows readers to confirm that the treatment was implemented as intended for each participant.
The table below provides an example of how overall percentages of procedural fidelity can be calculated and reported at the participant level.
csv_data %>%
group_by(Participant) %>%
summarise(Count = n(),
Sum = sum(Value == "TRUE")) %>%
ungroup() %>%
mutate(`Percent Correct Implementation` = round((Sum / Count) * 100, 2)) %>%
kable(format = "markdown")
| Participant | Count | Sum | Percent Correct Implementation |
|---|---|---|---|
| 1 | 135 | 128 | 94.81 |
| 2 | 135 | 124 | 91.85 |
| 3 | 135 | 123 | 91.11 |
This summary provides a high-level overview of the degree to which, on the whole, implementation was correct rather than incorrect.
Participant + Condition
It is often the case that reviewers of research will want information regarding the degree to which a treatment was implemented as intended across conditions (e.g., Baseline and Intervention). This is particularly relevant in studies where the treatment is expected to have a differential effect across conditions, as the contrast is only meaningful if it facilitates a clear comparison of performance with and without the influence of the independent variable.
The table below provides an example of how overall percentages of procedural fidelity can be calculated and reported at the participant level as well as across conditions.
csv_data %>%
group_by(Participant, Condition) %>%
summarise(Count = n(),
Sum = sum(Value == "TRUE")) %>%
ungroup() %>%
mutate(`Percent Correct Implementation` = round((Sum / Count) * 100, 2)) %>%
kable(format = "markdown")
| Participant | Condition | Count | Sum | Percent Correct Implementation |
|---|---|---|---|---|
| 1 | Baseline | 20 | 19 | 95.00 |
| 1 | Intervention | 115 | 109 | 94.78 |
| 2 | Baseline | 40 | 38 | 95.00 |
| 2 | Intervention | 95 | 86 | 90.53 |
| 3 | Baseline | 60 | 57 | 95.00 |
| 3 | Intervention | 75 | 66 | 88.00 |
Participant + Condition + Item
Although aggregate percentages across participants and conditions are useful for communicating the overall degree to which a treatment or protocol was implemented as designed, certain assumptions are made when interpreting such data.
First, focusing on aggregates assumes that errors are random (and equally distributed across items). That is not always the case, and it becomes increasingly unlikely when certain items are more effortful or complex than others.
Second, another assumption is that high accuracy (e.g., 80% or greater) means that systematic deviations in the data are unlikely. For example, a high percentage of correct implementation may suggest that most steps were implemented as designed, yet a single (potentially critical) element may not have been implemented correctly. For complex interventions (e.g., 10+ items tracked), this can be problematic, as trends in errors may be washed out by correct implementation of simpler, less critical, and more numerous items.
The table below breaks down the hypothetical data further by individual items, which reveals systematic differences in correct implementation by item.
csv_data %>%
group_by(Step, Condition) %>%
summarise(Count = n(),
Sum = sum(Value == "TRUE")) %>%
ungroup() %>%
mutate(`Percent Correct Implementation` = round((Sum / Count) * 100, 2)) %>%
kable(format = "markdown")
| Step | Condition | Count | Sum | Percent Correct Implementation |
|---|---|---|---|---|
| 1 | Baseline | 24 | 24 | 100.00 |
| 1 | Intervention | 57 | 57 | 100.00 |
| 2 | Baseline | 24 | 24 | 100.00 |
| 2 | Intervention | 57 | 55 | 96.49 |
| 3 | Baseline | 24 | 24 | 100.00 |
| 3 | Intervention | 57 | 57 | 100.00 |
| 4 | Baseline | 24 | 18 | 75.00 |
| 4 | Intervention | 57 | 35 | 61.40 |
| 5 | Baseline | 24 | 24 | 100.00 |
| 5 | Intervention | 57 | 57 | 100.00 |
An inspection of the table above reveals that, although the overall percentage across participants and conditions may appear sufficient (i.e., >80%), aggregation can obscure significant deviations from the intended research strategy. Specifically, the table reveals issues with Step #4 that persist across both baseline and intervention conditions.
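When many items are tracked, it may also help to flag low-fidelity items programmatically rather than scanning the full table. The sketch below is one hypothetical approach; the 80% cutoff is an illustrative assumption rather than an established criterion.
# Flag any Step x Condition combination falling below a fidelity
# threshold (the 80% cutoff is an illustrative assumption)
csv_data %>%
  group_by(Step, Condition) %>%
  summarise(`Percent Correct` = round(mean(Value == "TRUE") * 100, 2),
            .groups = "drop") %>%
  filter(`Percent Correct` < 80) %>%
  kable(format = "markdown")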
Furthermore, it is even possible that systematic differences exist as a function of time, which could also introduce uncertainty regarding the effects (or non-effects) associated with the independent variable (e.g., errors committed early in a study are likely to be more impactful than those committed later). One hypothetical screen for such drift is sketched below.
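One way to screen for drift over time is to summarize fidelity within successive blocks of sessions; the three equal blocks used here are an arbitrary choice for illustration.
# Summarize fidelity within successive session blocks to screen for
# drift over time (the three-block split is an illustrative choice)
csv_data %>%
  mutate(Block = cut(Session, breaks = c(0, 9, 18, 27),
                     labels = c("Sessions 1-9",
                                "Sessions 10-18",
                                "Sessions 19-27"))) %>%
  group_by(Participant, Block) %>%
  summarise(`Percent Correct` = round(mean(Value == "TRUE") * 100, 2),
            .groups = "drop") %>%
  kable(format = "markdown")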
A Process for Visualizing Procedural Fidelity
Many of the challenges associated with aggregated data stem from assumptions that cannot be verified when only the mean, standard deviation, and range are provided. This poses a challenge for researchers who need to understand the degree to which a treatment was implemented as designed in order to draw valid inferences regarding outcomes.
In recent years, some have proposed visual approaches for inspecting error patterns among research participants (e.g., Mitteer & Greer, 2022); this strategy also has utility for visualizing item-level data related to implementation.
Adopting a ‘heatmap’ style approach to binary data (i.e., correct vs. incorrect), we can visualize the degree to which each item was implemented correctly across participants, conditions, and items throughout the course of a study.
An example of this, drawn using the fxl R package, is illustrated in the space below.
# Note: Styles are applied based on the Correct/Incorrect Value mapped to the 'p' (phase) argument. This allows for flexible styling
scr_plot(csv_data, aesthetics = var_map(x = Session,
y = Step,
p = Value,
facet = Participant),
family = "Times New Roman",
mai = c(0.375,
1.5,
0,
0.5),
omi = c(0.25,
0,
0.25,
0.25)) |>
scr_yoverride(
list(
"1" = list(
y0 = 0.5,
y1 = 5.25,
yticks = c(1, 2, 3, 4, 5)
),
"2" = list(
y0 = 0.5,
y1 = 5.25,
yticks = c(1, 2, 3, 4, 5)
),
"3" = list(
y0 = 0.5,
y1 = 5.25,
yticks = c(1, 2, 3, 4, 5)
)
),
ytickslabs = c(
"Step #1 Hypothetical Item",
"Step #2 Hypothetical Item",
"Step #3 Hypothetical Item",
"Step #4 Hypothetical Item",
"Step #5 Hypothetical Item"
)
) |>
scr_ylabel("") |>
scr_label_phase(
facet = "1",
cex = 1.5,
adj = 0.5,
x = 1,
y = 5.625,
labels = list(
"Baseline" = list(
x = 2.5
),
"Intervention" = list(
x = 15
)
)
) |>
scr_xoverride(c(0.5, 27),
xticks = 1:27,
xtickslabs = as.character(1:27)) |>
# Note: The 'x' icon is slightly smaller for convenience
scr_points(cex = list(
"TRUE" = 3,
"FALSE" = 2
),
# Note: an 'x' is used to depict 'Incorrect'
pch = list(
"TRUE" = 22,
"FALSE" = 4
),
# Note: the 'Correct' marker is made subtle intentionally
fill = list("TRUE" = "white",
"FALSE" = "gray"),
color = list("TRUE" = "gray",
"FALSE" = "black")) |>
scr_plines_mbd(
lines = list(
"A" = list(
"1" = list(
x1 = 4.5,
y1 = 7,
y2 = 0.5
),
"2" = list(
x1 = 8.5,
y1 = 5,
y2 = 0.5
),
"3" = list(
x1 = 12.5,
y1 = 5,
y2 = 0.5
)
)
)
) |>
scr_legend(
panel = "1",
position = list(
x = 22,
y = 6
),
legend = c(
"Incorrect ",
"Correct "
),
col = c(
"black",
"gray"
),
pt_bg = c(
"gray",
"white"
),
lty = 0,
pch = c(
4,
22
),
bty = "n",
pt_cex = c(
2,
3
),
cex = 1.25,
text_col = "black",
horiz = TRUE,
box_lty = 1
)
Summary and Overview
The heatmap visualization provides a straightforward supplement to the visual analysis of data. This approach provides an easy means of communicating procedural integrity in a way that does not rest on the assumptions made with aggregates. Said more directly, systematic patterns can be recognized visually without the need to infer such threats from standard deviations and range statistics. This benefit is routinely realized in the visual analysis of outcome data but has less often been extended to data describing implementation.
The fxl R package provides a simple means of generating these visualizations, which can be easily incorporated into reports and manuscripts. The package is designed to be flexible and can be used to generate visualizations for a variety of research designs and data structures. For example, heatmaps can be constructed using data structures that are not dichotomous (e.g., Incorrect, Partially Correct, Correct) for even more detailed summaries of fidelity data, as sketched below.
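As a hypothetical illustration, the point-styling lists shown earlier could be keyed to three levels rather than two. The level names and marker choices below are assumptions for illustration, not output from the example above; they presume a Value column coded with these three labels.
# A minimal sketch assuming a three-level Value column
# ("Incorrect", "Partially Correct", "Correct"); styles are illustrative
scr_plot(csv_data, aesthetics = var_map(x = Session,
                                        y = Step,
                                        p = Value,
                                        facet = Participant)) |>
  scr_points(pch = list("Correct" = 22,
                        "Partially Correct" = 7,
                        "Incorrect" = 4),
             fill = list("Correct" = "white",
                         "Partially Correct" = "gray",
                         "Incorrect" = "gray"),
             color = list("Correct" = "gray",
                          "Partially Correct" = "black",
                          "Incorrect" = "black"))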
References:
Mitteer, D. R., & Greer, B. D. (2022). Using GraphPad Prism's Heat Maps for Efficient, Fine-Grained Analyses of Single-Case Data. Behavior Analysis in Practice, 15, 505-514. https://doi.org/10.1007/s40617-021-00664-7