
Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

When randomized controlled trials are not possible, quasi-experimental methods like Regression Discontinuity and Difference-in-Difference (DiD) often represent the best alternatives for high-quality evaluation. Researchers using such methods frequently conduct exhaustive robustness checks to make sure the assumptions of the model are met and that results aren’t sensitive to specific choices made in the analysis process. However, less thought is often given to how the outcomes for quasi-experimental studies are created. For example, in studies that rely on survey data, scores may be created by adding up item responses to produce total scores, or achievement tests may rely on scores produced by test vendors. In this study, several item response theory (IRT) models specific to the DiD design are presented to see whether they improve on simpler scoring approaches in terms of the bias and statistical significance of impact estimates.

Why might using a simple scoring approach do harm in the quasi-experimental/DiD context?

While most researchers are aware that measurement error can impact the precision of treatment effect estimates, they may be less aware that measurement model misspecification can introduce bias into scores and, thereby, treatment effect estimates. Total/sum scores do not technically involve a measurement model, and therefore may seem almost free of assumptions. But in fact, they resemble a constrained measurement model that oftentimes makes unsupported assumptions, including that all items should be given the same weight when producing a score. For instance, on a depression survey, total scores would assume that items asking about trouble sleeping and self-harm should get the same weight in the score. Giving all items the same weight can bias scores. For example, if patterns of responses differ between treated and control groups, faulty total score assumptions could bias treatment effect estimates and mute variability in the outcome researchers wish to quantify.
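As an illustration of why equal item weights can lose information, here is a small simulation (not one of the paper's IRT models; the item parameters and sample size are invented) comparing an equal-weight total score with a discrimination-weighted score under a 2PL response model:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 2000
theta = rng.normal(0.0, 1.0, n)          # latent trait (e.g., depression severity)

# Invented item parameters: discriminations differ sharply across items,
# so equal weighting (a total score) is misspecified.
a = np.array([0.4, 0.6, 1.0, 1.8, 2.2])    # discrimination
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # difficulty

# 2PL response probabilities and simulated binary item responses
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
y = (rng.uniform(size=p.shape) < p).astype(int)

sum_score = y.sum(axis=1)                # equal weights (total score)
weighted_score = (y * a).sum(axis=1)     # discrimination-weighted (IRT-like)

print("corr(theta, sum score):     ", np.corrcoef(theta, sum_score)[0, 1])
print("corr(theta, weighted score):", np.corrcoef(theta, weighted_score)[0, 1])
```

When discriminations vary this much, the weighted score typically tracks the latent trait at least as well as the equal-weight total, which is the intuition behind fitting a measurement model before estimating impacts.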

What decisions involved in more sophisticated scoring approaches impact treatment estimates?

Read More

Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?

Multi-site randomized controlled trials (RCTs) produce rigorous evidence on whether educational interventions “work.” However, principals and superintendents need evidence that applies to their students and schools. This paper examines whether the average impact of an intervention in a particular site—school or district—can be accurately predicted using evidence from a multi-site RCT.

What Methods Did the Study Use to Predict Impacts?

This paper used three methods to predict the average impact in individual sites: (1) the average of the impact estimates in the other sites, (2) lasso regression, and (3) Bayesian Additive Regression Trees (BART). Lasso and BART used a variety of moderators as predictors, including characteristics of participating students, participating schools, the intervention as implemented, and the counterfactual condition.  
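A minimal sketch of the leave-one-site-out comparison between methods (1) and (2), using invented site-level impacts and moderators (BART is omitted to keep the example self-contained):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Invented site-level data: impact estimates plus two moderators
# (e.g., % low-income students, implementation dosage).
n_sites = 30
X = rng.normal(size=(n_sites, 2))
impact = 0.15 + 0.10 * X[:, 0] + rng.normal(0, 0.05, n_sites)

preds_mean, preds_lasso = [], []
for s in range(n_sites):                      # leave one site out at a time
    train = np.arange(n_sites) != s
    preds_mean.append(impact[train].mean())   # method 1: other-site average
    model = Lasso(alpha=0.01).fit(X[train], impact[train])
    preds_lasso.append(model.predict(X[s:s + 1])[0])  # method 2: lasso

def rmse(preds):
    return np.sqrt(np.mean((impact - np.array(preds)) ** 2))

print("RMSE, other-site average:", rmse(preds_mean))
print("RMSE, lasso:             ", rmse(preds_lasso))
```

Because the simulated impacts truly vary with a moderator, the moderator-based predictor outperforms the simple average here; whether that holds in real multi-site RCTs is exactly what the paper investigates.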

How Was the Accuracy of These Predictions Gauged?

Read More

Supporting Teachers in Argument Writing Instruction at Scale: A Replication Study of the College, Career, and Community Writers Program (C3WP)

This large-scale randomized experiment found that the National Writing Project’s (NWP’s) College, Career, and Community Writers Program (C3WP) improved secondary students’ ability to write arguments drawing from nonfiction texts.

What impacts did C3WP have on student achievement?

The study team collected and scored student writing from an on-demand argument writing task similar to those in some state assessments. At the end of the year, students in C3WP districts outscored students in comparison districts by about 0.24 on a 1- to 6-point scale on each of the four measured attributes (see graph). On average, these effects are equivalent to moving a student from the 50th percentile of achievement to the 58th percentile of achievement.
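The percentile translation can be reproduced with the normal CDF. Note that the 0.24 figure is in raw rubric points; the 50th-to-58th-percentile statement implies a standardized effect of about 0.20 SD, which is an assumption here:

```python
from scipy.stats import norm

# A student at the control-group median (50th percentile) who gains d
# standard deviations moves to the Phi(d) percentile of the control
# distribution. d = 0.20 is assumed from the reported percentile shift.
d = 0.20
percentile = norm.cdf(d) * 100
print(f"50th percentile + {d} SD -> {percentile:.0f}th percentile")
```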

Read More

Which Students Benefit from Computer-Based Individualized Instruction? Experimental Evidence from Public Schools in India

Does computer-based individualized instruction boost math learning?

Yes. In public schools in Rajasthan, India, students who scored in the bottom 25% of their class improved by 22% of a standard deviation in math test scores (top chart). However, over the nine-month study, the average student in grades 6-8 with access to individualized instruction did not outperform students without access. Our results suggest that computer-based individualized instruction is most beneficial for low performers.

What is computer-based individualized instruction?

We provided all students with computer-adaptive math learning software called “Mindspark.” When students first log in, they take a diagnostic test that identifies what they know and can do and the areas in which they can improve. The software then presents them with exercises appropriate for their preparation level based on the diagnostic test. The difficulty and topic of subsequent exercises dynamically adjust to each student’s progress.
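Mindspark's actual algorithm is proprietary; a generic up/down ("staircase") adaptive loop conveys the idea, with the response model and difficulty scale invented for illustration:

```python
import random

random.seed(0)

def staircase_session(true_level, n_items=20, start=5, lo=1, hi=10):
    """Generic adaptive loop: raise difficulty after a correct answer,
    lower it after a miss (a simplification of real adaptive software)."""
    level = start
    for _ in range(n_items):
        # Assumed response model: students answer correctly more often
        # when the exercise sits at or below their preparation level.
        p_correct = 0.9 if level <= true_level else 0.25
        correct = random.random() < p_correct
        level = min(hi, level + 1) if correct else max(lo, level - 1)
    return level

print("final difficulty for a level-7 student:", staircase_session(7))
```

After a few items, the presented difficulty hovers near the student's true level, which is the mechanism by which such software individualizes instruction.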

Read More

Effect of Active Learning Professional Development Training on College Student Outcomes

Is there an effect of participating in Active Learning Professional Development (ALPD) training on student performance?

Students who took a course with an ALPD instructor were three percentage points more likely to take additional classes in the same subject area than students who were taught by a non-participant. Non-participants persisted at a rate of about 68%, so a three percentage point increase represents roughly a 5% relative improvement. Importantly, ALPD training is related to a higher likelihood of implementing active learning instructional practices in the classroom. We do not find any differences in students’ grades in the current course or performance in the next class.


How to read this chart: This figure shows that students who took a course with an ALPD-trained instructor were three percentage points more likely to take another course in the same field of study in the immediate next term (p<0.05). No clear difference in course grades was evident either in the ALPD-instructed course or in the next course taken.

Read More

Experimental Design and Statistical Power for Cluster Randomized Cost-Effectiveness Trials

Cluster randomized trials (CRTs) are commonly used to evaluate educational effectiveness. Recently there has been greater emphasis on using these trials to explore cost-effectiveness. However, methods for establishing the power of cluster randomized cost-effectiveness trials (CRCETs) are limited. This study developed power computation formulas and statistical software to help researchers design two- and three-level CRCETs.

Why are cost-effectiveness analysis and statistical power for CRCETs important?

Policymakers and administrators commonly strive to identify interventions that have maximal effectiveness for a given budget or aim to achieve a target improvement in effectiveness at the lowest possible cost (Levin et al., 2017). Evaluations without a credible cost analysis can lead to misleading judgments regarding the relative benefits of alternative strategies for achieving a particular goal. CRCETs link the cost of implementing an intervention to its effect and thus help researchers and policymakers adjudicate the degree to which an intervention is cost-effective. One key consideration when designing CRCETs is statistical power analysis. It allows researchers to determine the conditions needed to ensure a strong chance (e.g., power > 0.80) of correctly detecting whether an intervention is cost-effective.
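For intuition, here is the standard two-level cluster-randomized-trial power formula (not the paper's CRCET formulas, which additionally fold in cost variance), assuming equal allocation across arms and a standardized outcome:

```python
from scipy.stats import norm

def crt_power(delta, J, n, icc, alpha=0.05):
    """Power of a two-level CRT for standardized effect `delta`, with J
    clusters split evenly across arms, n students per cluster, and
    intraclass correlation `icc`. The paper's CRCET extensions add
    cost-side parameters omitted in this sketch."""
    se = (4.0 * (icc + (1.0 - icc) / n) / J) ** 0.5  # SE of the impact estimate
    z_crit = norm.ppf(1.0 - alpha / 2.0)             # two-sided critical value
    return norm.cdf(delta / se - z_crit)

# Example design: 60 schools, 25 students each, ICC = 0.15
print(f"power: {crt_power(delta=0.25, J=60, n=25, icc=0.15):.2f}")
```

The same structure underlies CRCET power analysis: adding clusters raises power faster than adding students per cluster whenever the ICC is non-trivial.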

How to compute statistical power when designing CRCETs?

Read More

The Impact of a Virtual Coaching Program to Improve Instructional Alignment to State Standards

What is the virtual coaching program tested in this study?

Feedback on Alignment and Support for Teachers (FAST) is a virtual coaching program designed to help teachers better align their instruction to state standards and foster student learning. Key components of this 2-year program include collaborative meetings with grade-level teams, individual coaching sessions, instructional logs and video recordings of teachers’ own instruction, and models of aligned instruction provided by an online library of instructional resources. During the collaborative meetings and coaching sessions, teachers and coaches use the logs, video recordings, and models of aligned instruction to discuss ways of improving alignment of their instruction to state standards. Teachers were expected to complete 5 collaborative meetings, 5 individual coaching sessions, 5 video recordings of their instruction, and 5 instructional logs per year.

How did we assess the impact of the virtual coaching program?

We assessed the impact of the FAST program on teachers’ instructional alignment and students’ achievement through a multisite school-level randomized controlled trial, which took place in 56 elementary schools spanning five districts and three states. We randomly assigned 29 of the 56 schools to the treatment group and 27 to the control group. The study focused on Grade 4 math and Grade 5 English language arts (ELA) and used the respective state test scores as student achievement outcomes. We used an instructional survey to measure teachers’ instructional alignment. Teacher attendance, FAST coaching logs, teachers’ instructional logs, and video recordings of teachers’ instruction were collected to describe the implementation of the FAST program.
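School-level random assignment in multisite trials is often blocked by district so each district contributes a near-even split; the sketch below assumes such blocking (the roster and the blocking scheme are illustrative, not the study's actual procedure):

```python
import math
import random

random.seed(42)

# Invented roster: schools nested in districts (the study's actual
# 56-school roster is not listed in this summary).
districts = {
    "District A": [f"A{i}" for i in range(1, 9)],   # 8 schools
    "District B": [f"B{i}" for i in range(1, 7)],   # 6 schools
    "District C": [f"C{i}" for i in range(1, 8)],   # 7 schools
}

assignment = {}
for schools in districts.values():        # block on district
    shuffled = random.sample(schools, len(schools))
    cut = math.ceil(len(shuffled) / 2)    # near-even split within district
    for s in shuffled[:cut]:
        assignment[s] = "treatment"
    for s in shuffled[cut:]:
        assignment[s] = "control"

n_treat = sum(v == "treatment" for v in assignment.values())
print(n_treat, "treatment /", len(assignment) - n_treat, "control")
```

Blocking by district guarantees that district-level differences cannot be confounded with treatment status, which is why multisite trials commonly randomize within sites.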

What did we find?

Read More