Filtered by tag: Methods Remove Filter

Mitigating Illusory Results through Preregistration in Education

Summary by: Claire Chuter

PDF Version

Good researchers thoroughly analyze their data, right? Practices like testing the right covariates, running your analyses in multiple ways to find the best fitting model, screening for outliers, and testing for mediation or moderation effects are indeed important practices… but with a massive caveat. The aggregation of many of these rigorous research practices (as well as some more dubious ones) can lead to what the authors call “illusory results” – results that seem real but are unlikely to be reproduced. In other words, implementation of these common practices (see Figure 1 in the article), often leads researchers to run multiple analytic tests which may unwittingly inflate their chances of stumbling upon a significant finding by chance.

Potential Solutions

Read More

Partially Identified Treatment Effects for Generalizability

Wendy Chan

PDF Version

Will this intervention work for me?

This is one of the questions that make up the core of generalization research. Generalizations focus on the extent to which the findings of a study apply to people in a different context, in a different time period, or in a different study altogether. In education, one common type of generalization involves examining whether the results of an experiment (e.g., the estimated effect of an intervention) apply to a larger group of people, or a population.

Read More

The Methodological Challenges of Measuring Institutional Value-added in Higher Education

Tatiana Melguizo, Gema Zamarro, Tatiana Velasco, and Fabio J. Sanchez

PDF Version

Assessing the quality of higher education is hard but there is growing pressure for governments to create a ranking system for institutions that can be used for assessment and funding allocations.  Such a system, however, would require a reliable methodology to fairly assess colleges using a wide variety of indicators. Countries with centralized governance structures have motivated researchers to develop “value-added” metrics of colleges’ contributions to student outcomes that can be used for summative assessment (Coates, 2009; Melguizo & Wainer, 2016; Shavelson et al. 2016). Estimating the “value-added” of colleges and programs, however, is methodologically challenging: first, high- and low-achieving students tend to self-select into different colleges– a behavior that if not accounted for, may yield to estimates that capture students’ prior achievement rather than colleges’ effectiveness at raising achievement; second, measures considering gains in student learning outcomes (SLOs) as indicators at the higher education level are scant. In our paper, we study these challenges and compare the methods used for obtaining value-added metrics in the context of higher education in Colombia.

How to best estimate value-added models in higher education?

Read More

Between-School Variation in Students’ Achievement, Motivation, Affect, and Learning Strategies: Results from 81 Countries for Planning Cluster-Randomized Trials in Education

Martin Brunner, Uli Keller, Marina Wenger, Antoine Fischbach & Oliver Lüdtke

PDF Version

Does an educational intervention work?

When planning an evaluation, researchers should ensure that it has enough statistical power to detect the expected intervention effect. The minimally detectable effect size, or MDES, is the smallest true effect size a study is well positioned to detect. If the MDES is too large, researchers may erroneously conclude that their intervention does not work even when it does. If the MDES is too small, that is not a problem per se, but it may mean increased cost to conduct the study.  The sample size, along with several other factors, known as design parameters, go into calculating the MDES. Researchers must estimate these design parameters. This paper provides an empirical bases for estimating design parameters in 81 countries across various outcomes.

Read More

Latent Profiles of Reading and Language and Their Association with Standardized Reading Outcomes in K-10th Grade

Barbara R Foorman, Yaacov Petscher, Christopher Stanley, & Adrea Truckenmiller

PDF Version

Differentiated instruction involves tailoring instruction to individual student’s learning needs. While critical to effective teaching, an understudied first step in differentiated instruction is understanding students’ learning profiles – that is, their strengths and weaknesses in knowledge and skills.  It is only after a student’s learning profile is understood that a teacher can individualize instruction. But how can educators best measure learning profiles to facilitate differentiated instruction?

Descriptive approaches such as informal reading inventories lack the psychometric rigor required for purposes of classification, placement, and monitoring growth.  However, quantitative approaches to classifying and clustering (i.e., grouping) students by skill classes and validating the clusters by relating them to standardized tests is a reliable tool for creating profiles. The objective of this study was twofold. First, to determine the profiles of reading and language skills that characterized 7,752 students in kindergarten through 10th grade. Second, to relate the profiles to standardized reading outcomes.

Read More

Bounding, an accessible method for estimating principal causal effects, examined and explained

Luke Miratrix, Jane Furey, Avi Feller, Todd Grindal, and Lindsay Page

PDF Version

Estimating program effects for subgroups is hard. Estimating effects for types of people who exist in theory, but whom we can’t always identify in practice (i.e., latent subgroups) is harder. These challenges arise often, with noncompliance being a primary example. Another is estimating effects on groups defined by “counterfactual experience,” i.e., by what opportunities would have been available absent treatment access. This paper tackles this difficult problem. We find that if one can predict, with some accuracy, latent subgroup membership, then bounding is a nice evaluation approach, relying on weak assumptions. This is in contrast to many alternatives that are tricky, often unstable, and/or rely on heroic assumptions.

What are latent subgroups again?

Read More

Using Multisite Experiments to Study Cross-Site Variation in Treatment Effects

Howard Bloom, Steve Raudenbush, Michael Weiss, & Kristin Porter

PDF version

Multisite randomized trials are experiments where individuals are randomly assigned to alternative experimental arms within each of a collection of sites (e.g., schools).  They are used to estimate impacts of educational interventions. However, little attention has been paid to using them to quantify and report cross-site impact variation. The present paper, which received the 2017 JREE Outstanding Article Award, provides a methodology that can help to fill this gap.

Why and how is knowledge about cross-site impact variation important?

Read More

The Implications of Teacher Selection and the Teacher Effect in Individually Randomized Group Treatment Trials

Michael Weiss

PDF Version

Beware! Teacher effects could mess up your individually randomized trial! Or such is the message of this paper focusing on what happens if you have individual randomization, but teachers are not randomly assigned to experimental groups.

The key idea is that if your experimental groups are systematically different in teacher quality, you will be estimating a combined impact of getting a good/bad teacher on top of the impact of your intervention.

Read More

Effect Sizes Larger in Developer-Commissioned Studies than in Independent Studies

Rebecca Wolf, Jennifer Morrison, Amanda Inns, Robert Slavin, and Kelsey Risman

PDF Version

Rigorous evidence of program effectiveness has become increasingly important with the 2015 passage of the Every Student Succeeds Act (ESSA). One question that has not yet been addressed is whether findings from program evaluations carried out or commissioned by developers are as trustworthy as those identified in studies by independent third parties. Using study data from the What Works Clearinghouse, we found evidence of a “developer effect,” where program evaluations carried out or commissioned by developers produced average effect sizes that were substantially larger than those identified in evaluations conducted by independent parties.

Why is it important to accurately determine the effect sizes of an educational program?

Read More