
Estimating Treatment Effects with the Explanatory Item Response Model

How Much Do Scoring Methods Matter for Causal Inference?

The manner in which student outcomes are measured, scored, and analyzed often receives too little attention in randomized experiments. In this study, we explored the consequences of different scoring approaches for causal inference on test score data. We compared the performance of four methods: Classical Test Theory (CTT) sum scores, CTT mean scores, item response theory (IRT) scores, and the Explanatory Item Response Model (EIRM). In contrast to the CTT- and IRT-based approaches, which score the test and estimate treatment effects in two separate steps, the EIRM is a latent variable model that estimates student ability and the treatment effect simultaneously. The EIRM has a long history in psychometric research, but applications to empirical causal inference settings are rare. Our results show that which model performs best depends on the context.
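As a rough illustration of how these scoring approaches can diverge, the sketch below simulates a Rasch-style (1PL) test with a known treatment effect and missing item responses, then recovers the effect from CTT sum and mean scores. This is a minimal sketch under assumed parameter values, not the simulation code from the study; it omits the IRT and EIRM estimators, which require a psychometric modeling library.

```python
# Minimal sketch (not the study's code): simulate a Rasch-style test,
# then estimate a treatment effect from CTT sum and mean scores.
import numpy as np

rng = np.random.default_rng(0)
n, n_items = 1000, 20                      # students, test items (assumed values)
treat = rng.integers(0, 2, n)              # random assignment
theta = rng.normal(0, 1, n) + 0.3 * treat  # latent ability; true effect = 0.3
b = rng.normal(0, 1, n_items)              # item difficulties

# Binary responses from a 1PL (Rasch) model
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
y = (rng.random((n, n_items)) < p).astype(float)

# Drop 20% of item responses completely at random
y[rng.random((n, n_items)) < 0.2] = np.nan

sum_score = np.nansum(y, axis=1)    # CTT sum score (missing counts as zero)
mean_score = np.nanmean(y, axis=1)  # CTT mean score (averages observed items)

def treatment_effect(score, treat):
    """Difference in standardized scores between treatment and control,
    which equals the OLS coefficient on a binary treatment indicator."""
    z = (score - score.mean()) / score.std()
    return z[treat == 1].mean() - z[treat == 0].mean()

print("sum-score effect: ", treatment_effect(sum_score, treat))
print("mean-score effect:", treatment_effect(mean_score, treat))
```

Under missing data, the sum score implicitly treats unanswered items as incorrect, while the mean score averages only the observed responses; this is one mechanism behind the context-dependence the study reports.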



Comparative Model Performance

How to Read this Chart: Statistical power (y-axis) by missing item response rate (x-axis) and estimation method (color and shape) shows that the relative performance of each approach depends on the context. The EIRM and IRT-based scores are more robust to missing data and provide the largest power gains when the latent trait is heteroskedastic. Legend: skew = latent trait is skewed; het = latent trait is heteroskedastic; mar = item responses are missing at random; sum = sum score; mean = mean score; 1PL = IRT theta score; EIRM = explanatory item response model.


Statistical Power Analysis for Univariate Meta-Analysis: A Three-Level Model

Statistical power analysis has long been a requirement for researchers seeking funding from the Institute of Education Sciences (IES). As in individual studies, power analysis is important when conducting meta-analytic reviews to ensure that the study can detect an overall treatment effect of interest across a large group of related studies. For example, if a well-powered meta-analytic review determines that a school intervention significantly improves student performance, researchers will have more confidence in further investing in, developing, and recommending that intervention for wide use. Power analysis can also inform researchers as they design studies; for instance, it can indicate how many studies a meta-analysis needs in its sample to detect an effect across all of those studies. This study extends prior research on power analysis for univariate meta-analysis and adds new aspects that facilitate the calculation of statistical power.

A three-level model in meta-analysis considers heterogeneity across research groups

There are several common approaches to conducting meta-analysis. However, it has recently been recognized that the same authors often publish several studies on a given topic and thus may be represented many times in a meta-analysis. Approaches to calculating statistical power should therefore account for this repeated representation of the same study teams. Thus, in our study, we formally introduce methodology that adds third-level units to the meta-analysis.

In the proposed three-level meta-analysis, effect sizes are nested within studies, which in turn are nested within research groups of investigators (see the illustrative figure). Specifically, in this illustration, one effect size (e.g., ES 1) is extracted from each study (e.g., Study 1), and several studies (e.g., Study 1 to Study i) are linked to a research group (e.g., Research Group 1) because they are conducted by the same authors. The variance between these third-level units (i.e., research groups) may influence the power of the meta-analysis. Consequently, the proposed three-level model takes into account both the between-study (second level) and the between-research-group (third level) variances and produces more accurate power estimates.
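To make the power calculation concrete, here is a minimal sketch of how power for the overall effect could be computed under a balanced three-level model. The closed-form variance of the pooled effect and all numeric values below are illustrative assumptions in the spirit of standard meta-analytic power analysis, not the paper's exact formulas.

```python
# Illustrative sketch (not the paper's formulas): power for the overall
# effect in a balanced three-level meta-analysis.
from math import sqrt
from scipy.stats import norm

def three_level_power(delta, v, tau2, omega2, m, k, alpha=0.05):
    """
    delta  : true overall effect size
    v      : average sampling variance of one effect size
    tau2   : level-2 (between-study, within-group) variance
    omega2 : level-3 (between-research-group) variance
    m      : number of research groups
    k      : studies per research group
    """
    # With equal weights in a balanced design, between-group variance
    # averages out over m groups; study-level variance over m*k studies.
    var_pooled = omega2 / m + (tau2 + v) / (m * k)
    lam = delta / sqrt(var_pooled)       # noncentrality parameter
    z = norm.ppf(1 - alpha / 2)          # two-tailed critical value
    return (1 - norm.cdf(z - lam)) + norm.cdf(-z - lam)

# Example: ignoring the third level (omega2 = 0) overstates power.
print(three_level_power(delta=0.2, v=0.05, tau2=0.02, omega2=0.03, m=10, k=3))
print(three_level_power(delta=0.2, v=0.05, tau2=0.02, omega2=0.00, m=10, k=3))
```

The comparison at the bottom illustrates the paper's point: setting the between-group variance to zero, as a two-level model implicitly does, makes the meta-analysis look better powered than it actually is.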


Reading Aloud to Children, Social Inequalities and Vocabulary Development: Evidence from a Randomized Controlled Trial

The shared book reading intervention

We designed a four-month intervention that combined a school-based book loan with information on the benefits of shared book reading (SBR) for children and tips for effective reading practices, delivered through weekly flyers, a short phone call, and six text messages sent to the parents. The intervention aimed to foster children's language skills by enhancing the frequency and quality of parent-child interactions around books. To assess its impact, we used a randomized experiment involving a large, random sample of 4-year-olds (N = 1,880) attending 47 pre-primary schools in the city of Paris. This evaluation design marks a significant improvement over previous studies in that the results are applicable to a much larger population, an outcome made possible by our large sample size, sampling design, and high participation rates among schools and families.

Important features of the SBR

Three features of this intervention are especially important. First, it made its informational messages accessible to families with low education and an immigrant background. Second, it had an intensive and sustained format, aimed at fostering a persistent change in parenting routines. Third, it focused on parent-child interactions around books and on making this activity enjoyable for both parents and kids.


The Impact of a Standards-based Grading Intervention on Ninth Graders’ Mathematics Learning

What is Standards-based Grading?

Typically, U.S. classroom teachers use some tests and assignments purely for summative purposes, recording scores indelibly to be used in a weighted average that determines final grades. In contrast, under a standards-based grading system the teacher uses such assessment evidence both to evaluate the extent to which a student is proficient in each of the course's learning outcomes at that moment (summative assessment) and to provide students with personalized feedback designed to guide further learning (formative assessment). A key feature of standards-based grading is that students are then given opportunities to do further work, at home or in school, and to be reassessed for full credit. In other words, summative assessments become formative tools designed to promote further learning, not just markers of how much students have learned already.

How did we conduct this study?

We conducted a cluster randomized controlled trial, recruiting 29 schools and randomly assigning 14 to a Treatment condition and 15 to a Control condition. Treatment schools implemented the standards-based grading program, called PARLO, in their ninth-grade algebra and geometry classrooms, while Control schools proceeded with business as usual. In our participating districts, instruction aligned to learning standards and the use of formative assessment were already common, so the PARLO program focused on implementing two necessary components of standards-based grading. The first was Mastery: students were rated as not-yet-proficient, proficient, or high-performance on each learning outcome, and final grades were computed using a formula based on the number of proficient and the number of high-performance learning outcomes. The second was Reassessment: after providing evidence that they had done further studying, any student could be reassessed for full credit on any learning outcome.
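To make the Mastery component concrete, here is a minimal sketch of how a grade formula of this kind could work. The study describes final grades as a function of the two counts but does not give PARLO's actual formula; the thresholds and grade bands below are purely hypothetical.

```python
# Purely hypothetical sketch: the study describes final grades as a function
# of proficient and high-performance outcome counts, but does not give the
# actual PARLO formula. All thresholds below are invented for illustration.

def final_grade(ratings: list[str]) -> str:
    """Map per-outcome ratings to a letter grade.

    Each rating is 'not-yet', 'proficient', or 'high', mirroring the
    study's three-level Mastery scale.
    """
    n = len(ratings)
    proficient = sum(r in ("proficient", "high") for r in ratings)
    high = sum(r == "high" for r in ratings)

    # Hypothetical bands: the grade depends only on the two counts.
    if proficient == n and high >= 0.5 * n:
        return "A"
    if proficient >= 0.9 * n:
        return "B"
    if proficient >= 0.75 * n:
        return "C"
    return "Incomplete"  # reassessment lets students move counts upward

print(final_grade(["high"] * 6 + ["proficient"] * 4))    # A
print(final_grade(["proficient"] * 9 + ["not-yet"] * 1)) # B
```

The design point, consistent with the Reassessment component, is that a grade computed from counts can always improve: reassessing an outcome for full credit moves it from not-yet-proficient into the proficient or high-performance tally.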
