Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

When randomized controlled trials are not possible, quasi-experimental methods like Regression Discontinuity and Difference-in-Difference (DiD) often represent the best alternatives for high-quality evaluation. Researchers using such methods frequently conduct exhaustive robustness checks to make sure the assumptions of the model are met and that results aren’t sensitive to specific choices made in the analysis process. However, less thought is often given to how the outcomes in many quasi-experimental studies are created. For example, in studies that rely on survey data, scores may be created by adding up the item responses to produce total scores, and studies of achievement may rely on scores produced by test vendors. In this study, several item response theory (IRT) models specific to the DiD design are presented to see if they improve on simpler scoring approaches in terms of the bias and statistical significance of impact estimates.

Why might using a simple scoring approach do harm in the quasi-experimental/DiD context?

While most researchers are aware that measurement error can reduce the precision of treatment effect estimates, they may be less aware that measurement model misspecification can introduce bias into scores and, thereby, into treatment effect estimates. Total/sum scores do not technically involve a measurement model, and therefore may seem almost free of assumptions. But in fact, they resemble a constrained measurement model that often makes unsupported assumptions, including that all items should be given the same weight when producing a score. For instance, on a depression survey, total scores would assume that items asking about trouble sleeping and self-harm should get the same weight in the score. Giving all items the same weight can bias scores. For example, if patterns of responses differ between treated and control groups, faulty total score assumptions could bias treatment effect estimates and mute variability in the outcome researchers wish to quantify.
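To make the equal-weight assumption concrete, here is a minimal Python sketch. The item names, the 0 to 3 response scale, and the weights are all hypothetical and chosen only for illustration; in practice a fitted measurement model would estimate the weights, and this is not the scoring approach from the study itself.

```python
# Hypothetical items on a depression survey, each answered on a 0-3 scale.
ITEMS = ["trouble_sleeping", "low_energy", "self_harm"]

# Assumed, illustrative weights only. A sum score implicitly sets all of these to 1.
ILLUSTRATIVE_WEIGHTS = {"trouble_sleeping": 0.5, "low_energy": 1.0, "self_harm": 2.0}


def sum_score(responses):
    """Total score: every item counts equally."""
    return sum(responses[item] for item in ITEMS)


def weighted_score(responses, weights):
    """Score in which items contribute according to their weights."""
    return sum(weights[item] * responses[item] for item in ITEMS)


# Two hypothetical respondents with very different response patterns.
person_a = {"trouble_sleeping": 3, "low_energy": 0, "self_harm": 0}
person_b = {"trouble_sleeping": 0, "low_energy": 0, "self_harm": 3}

print("sum scores:     ", sum_score(person_a), sum_score(person_b))
print("weighted scores:",
      weighted_score(person_a, ILLUSTRATIVE_WEIGHTS),
      weighted_score(person_b, ILLUSTRATIVE_WEIGHTS))
```

The sum score treats the two respondents as identical, while any scoring rule that weights the items differently separates them. If response patterns like these differ between treated and control groups, the choice of scoring rule can change the estimated treatment effect.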

What decisions involved in more sophisticated scoring approaches impact treatment estimates?

The Impact of a Virtual Coaching Program to Improve Instructional Alignment to State Standards

What is the virtual coaching program tested in this study?

Feedback on Alignment and Support for Teachers (FAST) is a virtual coaching program designed to help teachers better align their instruction to state standards and foster student learning. Key components of this 2-year program include collaborative meetings with grade-level teams, individual coaching sessions, instructional logs and video recordings of teachers’ own instruction, and models of aligned instruction provided by an online library of instructional resources. During the collaborative meetings and coaching sessions, teachers and coaches use the logs, video recordings, and models of aligned instruction to discuss ways of improving alignment of their instruction to state standards. Teachers were expected to complete 5 collaborative meetings, 5 individual coaching sessions, 5 video recordings of their instruction, and 5 instructional logs per year.

How did we assess the impact of the virtual coaching program?

We assessed the impact of the FAST program on teachers’ instructional alignment and students’ achievement through a multisite school-level randomized controlled trial, which took place in 56 elementary schools spanning five districts and three states. We randomly assigned 29 of the 56 schools to the treatment group and 27 to the control group. The study focused on Grade 4 math and Grade 5 English language arts (ELA) and used the respective state test scores as student achievement outcomes. We used an instructional survey to measure teachers’ instructional alignment. Teacher attendance, FAST coaching logs, teachers’ instructional logs, and video recordings of teachers’ instruction were collected to describe the implementation of the FAST program.

What did we find?

A recipe for disappointment: policy, effect size and the winner’s curse

Adrian Simpson

Effect size and policy

Standardized effect size estimates are commonly used by the ‘evidence-based education’ community as a key metric for judging the relative importance, effectiveness, or practical significance of interventions across a set of studies: larger effect sizes are taken to indicate more effective interventions. However, this argument rarely applies: it holds only when linearly equatable outcomes, identical comparison treatments, and equally representative samples are used in every study.

Mitigating Illusory Results through Preregistration in Education

Summary by: Claire Chuter

Good researchers thoroughly analyze their data, right? Practices like testing the right covariates, running your analyses in multiple ways to find the best-fitting model, screening for outliers, and testing for mediation or moderation effects are indeed important… but with a massive caveat. The aggregation of many of these rigorous research practices (as well as some more dubious ones) can lead to what the authors call “illusory results” – results that seem real but are unlikely to be reproduced. In other words, these common practices (see Figure 1 in the article) often lead researchers to run multiple analytic tests, which may unwittingly inflate their chances of stumbling upon a significant finding by chance.
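A rough sense of how quickly those chances inflate comes from a standard back-of-the-envelope calculation. The Python sketch below assumes that every null hypothesis is true and that the tests are independent, which real analyses of a single dataset rarely satisfy exactly; the function name is ours, not the authors'.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes all null hypotheses are true and the tests are independent,
    each run at significance level alpha.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    print(f"{n_tests:>2} tests: {familywise_error_rate(n_tests):.2f}")
```

Under these assumptions, the chance of at least one spurious "significant" result is about 0.40 with 10 tests and about 0.64 with 20, even though each individual test is run at the usual 0.05 level.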

Potential Solutions

Partially Identified Treatment Effects for Generalizability

Wendy Chan

Will this intervention work for me?

This is one of the questions that make up the core of generalization research. Generalizations focus on the extent to which the findings of a study apply to people in a different context, in a different time period, or in a different study altogether. In education, one common type of generalization involves examining whether the results of an experiment (e.g., the estimated effect of an intervention) apply to a larger group of people, or a population.

Bounding, an accessible method for estimating principal causal effects, examined and explained

Luke Miratrix, Jane Furey, Avi Feller, Todd Grindal, and Lindsay Page

Estimating program effects for subgroups is hard. Estimating effects for types of people who exist in theory, but whom we can’t always identify in practice (i.e., latent subgroups), is harder. These challenges arise often, with noncompliance being a primary example. Another is estimating effects on groups defined by “counterfactual experience,” i.e., by what opportunities would have been available absent treatment access. This paper tackles this difficult problem. We find that if one can predict latent subgroup membership with some accuracy, then bounding is a nice evaluation approach that relies on weak assumptions. This is in contrast to many alternatives that are tricky, often unstable, and/or rely on heroic assumptions.

What are latent subgroups again?

The Implications of Teacher Selection and the Teacher Effect in Individually Randomized Group Treatment Trials

Michael Weiss

Beware! Teacher effects could mess up your individually randomized trial! Or such is the message of this paper focusing on what happens if you have individual randomization, but teachers are not randomly assigned to experimental groups.

The key idea is that if your experimental groups are systematically different in teacher quality, you will be estimating a combined impact of getting a good/bad teacher on top of the impact of your intervention.
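A toy simulation can make this concrete. The Python sketch below is not the paper's model: the true effect, the teacher effects, and the sample size are all hypothetical numbers chosen for illustration, and students are simply spread evenly across the teachers in their arm.

```python
import random
from statistics import mean


def simulate_naive_estimate(true_effect, treated_teachers, control_teachers,
                            n_per_arm=6000, seed=0):
    """Difference in mean outcomes between arms in a toy trial.

    Students are individually randomized, but each arm has its own
    (non-randomly assigned) teachers with fixed effects on outcomes.
    """
    rng = random.Random(seed)

    def arm_outcomes(teacher_effects, effect):
        # Spread students evenly across the arm's teachers, add unit-variance noise.
        return [
            effect + teacher_effects[i % len(teacher_effects)] + rng.gauss(0.0, 1.0)
            for i in range(n_per_arm)
        ]

    treated = arm_outcomes(treated_teachers, true_effect)
    control = arm_outcomes(control_teachers, 0.0)
    return mean(treated) - mean(control)


# Hypothetical values: treated-arm teachers are better on average by 0.2.
TRUE_EFFECT = 0.25
TREATED_TEACHERS = [0.3, 0.2, 0.1]
CONTROL_TEACHERS = [0.0, -0.1, 0.1]

estimate = simulate_naive_estimate(TRUE_EFFECT, TREATED_TEACHERS, CONTROL_TEACHERS)
teacher_gap = mean(TREATED_TEACHERS) - mean(CONTROL_TEACHERS)
print(f"true effect: {TRUE_EFFECT}, teacher gap: {teacher_gap:.2f}, "
      f"naive estimate: {estimate:.2f}")
```

In this setup the naive difference in means lands near the intervention effect plus the average teacher gap, not near the intervention effect alone, even though students were randomized individually.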
