Filtered by tag: Methods

Does Early Mathematics Intervention Change the Processes Underlying Children’s Learning?

Summary by: Wen Wen

PDF Version

What are “state-” and “trait-” math achievements in early education?

Interventions can boost early math skills, but the role of these early skills in later math achievement is unclear. Consider that students who demonstrate stronger early math skills tend to demonstrate stronger later math achievement, yet some interventions that improve early math skills do not improve later math achievement – that is, the early benefits fade substantially after 2 or 3 years.

Read More

Design and Analytic Features for Reducing Biases in Skill-Building Intervention Impact Forecasts

Daniela Alvarez-Vargas, Sirui Wan, Lynn S. Fuchs, Alice Klein, & Drew H. Bailey

PDF Version

Despite their policy relevance, long-term evaluations of educational interventions are rare relative to the number of end-of-treatment evaluations. A common approach to this problem is to use statistical models to forecast the long-term effects of an intervention based on the estimated shorter-term effects. Such forecasts typically rely on the correlation between children’s early skills (e.g., preschool numeracy) and medium-term outcomes (e.g., 1st grade math achievement), calculated from longitudinal data available outside the evaluation. This approach sometimes over- or under-predicts the longer-term effects of early academic interventions, raising concerns about how best to forecast the long-term effects of such interventions. The present paper provides a methodological approach to assessing the types of research design and analysis specifications that may reduce biases in such forecasts.
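
The basic forecasting calculation is easy to sketch. The toy example below (with made-up numbers, not estimates from the paper) shows the standard approach: scale the end-of-treatment impact on the early skill by a benchmark coefficient taken from longitudinal data.

```python
# A minimal sketch of the standard forecasting logic, using
# made-up numbers (not estimates from the paper).

# End-of-treatment impact on preschool numeracy, in standard deviation units.
effect_on_early_skill = 0.30

# Standardized coefficient linking preschool numeracy to 1st-grade math,
# estimated from longitudinal data collected outside the evaluation.
benchmark_coefficient = 0.50

# Naive forecast of the medium-term impact on 1st-grade math.
forecasted_effect = effect_on_early_skill * benchmark_coefficient
print(f"Forecasted 1st-grade effect: {forecasted_effect:.2f} SD")  # 0.15 SD
```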

What did we do?

Read More

Quantifying ‘promising trials bias’ in randomized controlled trials in education

Sam Sims, Jake Anders, Matthew Inglis, Hugues Lortie-Forgues

PDF Version

Randomized controlled trials (RCTs) have proliferated in education, in part because they provide an unbiased estimator for the causal impact of interventions. Yet RCTs are only unbiased in expectation (on average across many RCTs).

Estimates of the effect size from specific RCTs will in general diverge from the true effect due to chance differences between the treatment and control groups. In suitably powered trials, this imbalance tends to be small, and statistical inference helps to control the rate of erroneous findings.
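
A small simulation makes the point concrete. The sketch below (illustrative code, not the authors’; the 0.2 “promising” threshold is an arbitrary assumption) draws many null RCTs and shows that selecting trials on promising results inflates the apparent effect.

```python
# A hedged simulation sketch: many RCTs with a true effect of zero,
# where sampling error alone creates a spread of estimated effect sizes.
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_per_arm, true_effect = 10_000, 50, 0.0

# Simulate treatment and control outcomes for each trial.
treat = rng.normal(true_effect, 1.0, size=(n_trials, n_per_arm))
control = rng.normal(0.0, 1.0, size=(n_trials, n_per_arm))

# Standardized effect size estimate (difference in means, unit SD).
estimates = treat.mean(axis=1) - control.mean(axis=1)

# Selecting "promising" trials conditions on chance imbalance,
# so the selected estimates overstate the true (zero) effect.
promising = estimates[estimates > 0.2]
print(f"Mean estimate, all trials:       {estimates.mean():+.3f}")
print(f"Mean estimate, 'promising' only: {promising.mean():+.3f}")
```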

Read More

A Framework for Addressing Instrumentation Biases When Using Observation Systems as Outcome Measures in Instructional Interventions

Mark White, Bridget Maher, Brian Rowan

PDF Version

Many educational interventions seek to directly shift instructional practice. Observation systems are used to measure changes in instructional practice resulting from such interventions. However, the complexity of observation systems creates the risk of instrumentation biases. Instrumentation bias is bias resulting from changes to the ways that an instrument functions across conditions (e.g., from pre-test to post-test or between control and intervention conditions). For example, teachers could intentionally show off intervention-specific practices whenever they are observed, but not otherwise use those practices. Alternatively, an instructional intervention could shift instruction in ways that increase observation scores without impacting the underlying instructional dynamics that support student learning.

This conceptual paper with a case study exemplar provides a validity framework for using observation systems to evaluate the impact of interventions. Inferences about an intervention’s impact generally involve determining whether a teaching practice has changed within some setting. Observation scores, the evidence for these conclusions, are specific raters’ views of how a rubric would describe observed lessons. The conclusions are far more generalized than the observation scores. The framework systematically breaks down the processes necessary to operationalize an aspect of teaching practice and sample from a setting to obtain observation scores that can be generalized to draw conclusions.

Read More

A recipe for disappointment: policy, effect size and the winner’s curse

Adrian Simpson

PDF Version

Effect size and policy

Standardized effect size estimates are commonly used by the ‘evidence-based education’ community as a key metric for judging the relative importance, effectiveness, or practical significance of interventions across a set of studies: larger effect sizes indicate more effective interventions. However, this argument rarely applies: it holds only when every study uses linearly equatable outcomes, identical comparison treatments, and equally representative samples.
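
A one-line formula shows why. In the sketch below, the same raw difference in means is divided by a sample- and measure-specific standard deviation, so the resulting values of d are only comparable under the conditions just listed.

```latex
% Why standardized effect sizes are not directly comparable across
% studies: the raw impact is scaled by a sample- and measure-specific
% standard deviation.
\[
  d \;=\; \frac{\bar{Y}_{T} - \bar{Y}_{C}}{\sigma}
\]
% Holding the raw difference fixed, a more homogeneous sample or a
% narrower outcome measure shrinks sigma and inflates d, so a larger d
% need not mean a more effective intervention.
```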

Read More

The Meta-Analytic Rain Cloud (MARC) Plot: A New Approach to Visualizing Clearinghouse Data

Kaitlyn G. Fitzgerald & Elizabeth Tipton

PDF Version

What type of data do clearinghouses communicate?

As the body of scientific evidence about what works in education grows, so does the need to effectively communicate that evidence to policy-makers and practitioners. Clearinghouses, such as the What Works Clearinghouse (WWC), have emerged to facilitate the evidence-based decision-making process and have taken on the non-trivial task of distilling often complex research findings for non-researchers. Among other things, this involves reporting effect sizes, statistical uncertainty, and meta-analytic summaries. This information is often reported visually. However, existing visualizations often do not follow data visualization best practices or take the statistical cognition of the audience into consideration.

Read More

Modeling and Comparing Seasonal Trends in Interim Achievement Data

James Soland & Yeow Meng Thum

PDF Version

Introduction

Interim achievement tests are often used to monitor student and school performance over time. Unlike end-of-year achievement tests used for accountability, interim tests are administered multiple times per year (e.g., Fall, Winter, and Spring) and vary across schools in terms of when in the school year students take them. As a result, scores reflect seasonal patterns in achievement, including summer learning loss. Despite the prevalence of interim tests, few statistical models are designed to answer questions commonly asked with interim test data (e.g., Do students whose achievement grows the most over several years tend to experience below-average summer loss?). In this study, we compare the properties of three growth models that can be used to examine interim test data.
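
As a rough illustration (not necessarily the authors’ exact specifications), a piecewise growth model with separate school-year and summer slopes captures the seasonal structure such models must accommodate:

```latex
% A sketch of a piecewise growth model with separate school-year and
% summer slopes (illustrative notation, not the authors' exact models).
\[
  Y_{ti} \;=\; \pi_{0i} \;+\; \pi_{1i}\,\text{SchoolMonths}_{ti}
          \;+\; \pi_{2i}\,\text{SummerMonths}_{ti} \;+\; \varepsilon_{ti}
\]
% SchoolMonths and SummerMonths accumulate time in and out of school up
% to test occasion t; a negative pi_2 captures summer learning loss, and
% student-level variation in pi_1 and pi_2 lets the model relate
% multi-year growth to summer loss.
```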

Read More

Examining the Earnings Trajectories of Community College Students Using a Piecewise Growth Curve Modeling Approach

Summary by: Lily An

PDF Version

Traditional methods of estimating the returns to community college remain imprecise.

Historically, to estimate the labor market returns to a community college degree, researchers have compared the earnings of students who completed a degree to those who did not, at a single point in time, while controlling for background characteristics. With the expansion of longitudinal data sets, researchers have begun to consider how earnings before and during community college can affect returns to community college. However, even improved econometric analyses overlook some temporal influences on predicted earnings growth, such as the time between graduation and measured earnings, instead estimating averaged returns over time. These influences are particularly salient for community college students, who vary in their time-to-degree completion and often enter college with pre-existing or concurrent work experiences.
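
As a rough illustration of the piecewise idea (notation mine, not the authors’ exact model), earnings growth can be given separate slopes before enrollment, during college, and after completion:

```latex
% An illustrative piecewise specification for earnings trajectories:
% separate slopes before enrollment, during college, and after
% graduation (a sketch, not the paper's model).
\[
  \ln(w_{ti}) \;=\; \beta_{0i}
    \;+\; \beta_{1i}\,\text{PreYears}_{ti}
    \;+\; \beta_{2i}\,\text{InCollegeYears}_{ti}
    \;+\; \beta_{3i}\,\text{PostYears}_{ti}
    \;+\; \varepsilon_{ti}
\]
% Because the post-graduation slope beta_3 is estimated separately,
% returns can depend on time since completion rather than being
% averaged over the whole observation window.
```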

Read More

Parasympathetic Function: Relevance and Methodology for Early Education Research

Summary by: Lindsay Gomes

PDF Version

The definition of school readiness in the contexts of educational research, practice, and policy has changed considerably over the past 60 years. After a long period of prioritizing academic skills (e.g., letter-shape knowledge), many researchers now emphasize the extent to which young children can control their emotions and behaviors as key to school readiness. This capacity is commonly referred to as self-regulation, which is often defined in terms of volitional, cognitively-mediated processes such as executive functions. In this paper, we assert that understanding children’s parasympathetic function is essential to providing a holistic understanding of self-regulation in the classroom and for informing how the classroom environment can be tailored to most effectively promote young children’s development.

What is parasympathetic function and why is it important?

Read More

Gather-Narrow-Extract: A Framework for Studying Local Policy Variation Using Web-Scraping and Natural Language Processing

Kylie L. Anglin

PDF Version

Many education policy decisions are made at the local level. School districts make policies regarding hiring, resource allocation, and day-to-day operations. However, collecting data on local policy decisions has traditionally been expensive and time-consuming, sometimes leading researchers to leave important research questions unanswered.

This paper presents a framework for efficiently identifying and processing local policy documents posted online – documents like staff manuals, union contracts, and school improvement plans – using web-scraping and natural language processing.
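
A stripped-down version of the gather-narrow-extract idea looks something like the sketch below (the district URL and key phrase are hypothetical, and the paper’s pipeline is considerably richer):

```python
# A minimal gather-narrow-extract sketch (illustrative only).
import re
import requests
from bs4 import BeautifulSoup

URL = "https://www.example-district.org/policies"  # hypothetical site

# Gather: scrape the page and collect its text.
html = requests.get(URL, timeout=10).text
text = BeautifulSoup(html, "html.parser").get_text(" ")

# Narrow: keep only pages mentioning the policy of interest.
if re.search(r"teacher evaluation", text, re.IGNORECASE):
    # Extract: pull sentences containing the key phrase for coding.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    hits = [s for s in sentences if re.search(r"teacher evaluation", s, re.IGNORECASE)]
    print("\n".join(hits[:5]))
```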

Read More

Mitigating Illusory Results through Preregistration in Education

Summary by: Claire Chuter

PDF Version

Good researchers thoroughly analyze their data, right? Practices like testing the right covariates, running your analyses in multiple ways to find the best-fitting model, screening for outliers, and testing for mediation or moderation effects are indeed important practices… but with a massive caveat. The aggregation of many of these rigorous research practices (as well as some more dubious ones) can lead to what the authors call “illusory results” – results that seem real but are unlikely to be reproduced. In other words, implementation of these common practices (see Figure 1 in the article) often leads researchers to run multiple analytic tests, which may unwittingly inflate the chances of stumbling upon a spurious significant finding.
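
The arithmetic behind this inflation is simple. The snippet below uses the textbook independence approximation (real analyses are typically correlated, which changes the exact numbers) to show how fast the familywise false-positive rate grows:

```python
# Probability of at least one spurious "significant" result among k
# independent tests at alpha = .05 (a simplified illustration).
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one false positive) = {p_any_false_positive:.2f}")
# 1 test -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```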

Potential Solutions

Read More

Partially Identified Treatment Effects for Generalizability

Wendy Chan

PDF Version

Will this intervention work for me?

This is one of the questions that make up the core of generalization research. Generalizations focus on the extent to which the findings of a study apply to people in a different context, in a different time period, or in a different study altogether. In education, one common type of generalization involves examining whether the results of an experiment (e.g., the estimated effect of an intervention) apply to a larger group of people, or a population.

Read More

The Methodological Challenges of Measuring Institutional Value-added in Higher Education

Tatiana Melguizo, Gema Zamarro, Tatiana Velasco, and Fabio J. Sanchez

PDF Version

Assessing the quality of higher education is hard, but there is growing pressure on governments to create a ranking system for institutions that can be used for assessment and funding allocations. Such a system, however, would require a reliable methodology to fairly assess colleges using a wide variety of indicators. In countries with centralized governance structures, these pressures have motivated researchers to develop “value-added” metrics of colleges’ contributions to student outcomes that can be used for summative assessment (Coates, 2009; Melguizo & Wainer, 2016; Shavelson et al., 2016). Estimating the “value-added” of colleges and programs, however, is methodologically challenging: first, high- and low-achieving students tend to self-select into different colleges – a behavior that, if not accounted for, may yield estimates that capture students’ prior achievement rather than colleges’ effectiveness at raising achievement; second, measures of gains in student learning outcomes (SLOs) at the higher education level are scant. In our paper, we study these challenges and compare the methods used for obtaining value-added metrics in the context of higher education in Colombia.
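
To see the selection problem in miniature, consider a deliberately simple value-added regression (synthetic data and a toy specification, not the paper’s estimators): adjusting exit scores for entry scores is what keeps the college term from merely reflecting who enrolls where.

```python
# A hedged sketch of the basic value-added idea: adjust college-exit
# scores for students' entry scores, so the college term captures gains
# rather than selection on prior achievement.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "exit_score": [60, 72, 55, 80, 65, 90],
    "entry_score": [50, 65, 48, 70, 60, 78],
    "college": ["A", "A", "B", "B", "C", "C"],
})

# College fixed effects net of entry achievement ~ crude value-added.
fit = smf.ols("exit_score ~ entry_score + C(college)", data=df).fit()
print(fit.params.filter(like="college"))
```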

How to best estimate value-added models in higher education?

Read More

Between-School Variation in Students’ Achievement, Motivation, Affect, and Learning Strategies: Results from 81 Countries for Planning Cluster-Randomized Trials in Education

Martin Brunner, Uli Keller, Marina Wenger, Antoine Fischbach & Oliver Lüdtke

PDF Version

Does an educational intervention work?

When planning an evaluation, researchers should ensure that it has enough statistical power to detect the expected intervention effect. The minimum detectable effect size, or MDES, is the smallest true effect size a study is well positioned to detect. If the MDES is too large, researchers may erroneously conclude that their intervention does not work even when it does. If the MDES is too small, that is not a problem per se, but it may mean increased cost to conduct the study. The sample size, along with several other quantities known as design parameters, goes into calculating the MDES. Researchers must estimate these design parameters. This paper provides an empirical basis for estimating design parameters in 81 countries across various outcomes.
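
For a concrete sense of how design parameters enter the calculation, here is the standard two-level approximation for a cluster-randomized trial (a textbook formula, not this paper’s own derivation):

```python
# A common two-level MDES approximation for cluster-randomized trials.
# M is the multiplier for alpha = .05 (two-tailed) and 80% power.
from math import sqrt

def mdes(J, n, rho, P=0.5, M=2.8):
    """J clusters, n students per cluster, intraclass correlation rho,
    proportion P of clusters assigned to treatment."""
    return M * sqrt(rho / (P * (1 - P) * J) + (1 - rho) / (P * (1 - P) * J * n))

# Design parameters such as rho must be estimated in advance; these are
# exactly the empirical inputs the paper supplies for 81 countries.
print(f"MDES = {mdes(J=40, n=25, rho=0.20):.2f} SD")
```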

Read More

Latent Profiles of Reading and Language and Their Association with Standardized Reading Outcomes in K-10th Grade

Barbara R Foorman, Yaacov Petscher, Christopher Stanley, & Adrea Truckenmiller

PDF Version

Differentiated instruction involves tailoring instruction to individual students’ learning needs. While critical to effective teaching, an understudied first step in differentiated instruction is understanding students’ learning profiles – that is, their strengths and weaknesses in knowledge and skills. It is only after a student’s learning profile is understood that a teacher can individualize instruction. But how can educators best measure learning profiles to facilitate differentiated instruction?

Descriptive approaches such as informal reading inventories lack the psychometric rigor required for purposes of classification, placement, and monitoring growth. However, quantitative approaches that classify and cluster (i.e., group) students by skill classes and validate the clusters by relating them to standardized tests offer a reliable tool for creating profiles. The objective of this study was twofold: first, to determine the profiles of reading and language skills that characterized 7,752 students in kindergarten through 10th grade; second, to relate the profiles to standardized reading outcomes.
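
As a rough sketch of the clustering step (synthetic data; the authors used latent profile analysis on real reading and language measures), a Gaussian mixture model recovers groups of students with similar skill patterns:

```python
# Illustrative profile estimation with a Gaussian mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic scores on two skills (e.g., decoding and vocabulary).
X = np.vstack([
    rng.normal([-1, -1], 0.5, size=(100, 2)),  # low-low profile
    rng.normal([1, 1], 0.5, size=(100, 2)),    # high-high profile
    rng.normal([1, -1], 0.5, size=(100, 2)),   # mixed profile
])

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
print("Profile means:\n", gm.means_.round(2))
print("Profile sizes:", np.bincount(gm.predict(X)))
```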

Read More

Bounding, an accessible method for estimating principal causal effects, examined and explained

Luke Miratrix, Jane Furey, Avi Feller, Todd Grindal, and Lindsay Page

PDF Version

Estimating program effects for subgroups is hard. Estimating effects for types of people who exist in theory, but whom we can’t always identify in practice (i.e., latent subgroups) is harder. These challenges arise often, with noncompliance being a primary example. Another is estimating effects on groups defined by “counterfactual experience,” i.e., by what opportunities would have been available absent treatment access. This paper tackles this difficult problem. We find that if one can predict, with some accuracy, latent subgroup membership, then bounding is a nice evaluation approach, relying on weak assumptions. This is in contrast to many alternatives that are tricky, often unstable, and/or rely on heroic assumptions.
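
A toy version of the bounding logic (all numbers hypothetical; the paper develops the approach much further): because the overall effect is a mixture of the two latent subgroups’ effects, plugging extreme values in for one subgroup brackets the other.

```python
# Bounding a latent subgroup's effect from the mixture identity
#   overall = pi_A * effect_A + (1 - pi_A) * effect_B
# (illustrative numbers, not estimates from the paper).
overall_effect = 0.20     # estimated overall ITT effect (hypothetical)
pi_A = 0.60               # estimated share in latent subgroup A (hypothetical)
b_lo, b_hi = -0.50, 0.50  # assumed plausible range for subgroup B's effect

# Solve the identity for effect_A at each extreme of effect_B.
lower = (overall_effect - (1 - pi_A) * b_hi) / pi_A
upper = (overall_effect - (1 - pi_A) * b_lo) / pi_A
print(f"Effect for subgroup A is bounded in [{lower:.2f}, {upper:.2f}]")
```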

What are latent subgroups again?

Read More

Using Multisite Experiments to Study Cross-Site Variation in Treatment Effects

Howard Bloom, Steve Raudenbush, Michael Weiss, & Kristin Porter

PDF Version

Multisite randomized trials are experiments where individuals are randomly assigned to alternative experimental arms within each of a collection of sites (e.g., schools).  They are used to estimate impacts of educational interventions. However, little attention has been paid to using them to quantify and report cross-site impact variation. The present paper, which received the 2017 JREE Outstanding Article Award, provides a methodology that can help to fill this gap.
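
The modeling idea can be sketched in two lines (notation illustrative): give each site its own impact, and summarize the spread of those impacts.

```latex
% A sketch of the random-coefficient setup behind cross-site impact
% variation: student i in site j, treatment indicator T.
\[
  Y_{ij} \;=\; \alpha_j \;+\; B_j\,T_{ij} \;+\; \varepsilon_{ij},
  \qquad B_j \sim N(\beta,\ \tau^2)
\]
% beta is the average impact across sites; tau, the standard deviation
% of the site-specific impacts B_j, quantifies cross-site variation.
```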

Why and how is knowledge about cross-site impact variation important?

Read More

The Implications of Teacher Selection and the Teacher Effect in Individually Randomized Group Treatment Trials

Michael Weiss

PDF Version

Beware! Teacher effects could mess up your individually randomized trial! Or such is the message of this paper focusing on what happens if you have individual randomization, but teachers are not randomly assigned to experimental groups.

The key idea is that if your experimental groups are systematically different in teacher quality, you will be estimating the impact of your intervention combined with the impact of being assigned a stronger or weaker teacher.
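
In one line (notation mine, not the paper’s), the contaminated contrast looks like this:

```latex
% If teachers are not randomized, the experimental contrast picks up
% the gap in average teacher quality between arms.
\[
  E[\hat{\Delta}] \;=\; \beta \;+\; \big(\bar{u}_{T} - \bar{u}_{C}\big)
\]
% beta is the intervention effect; u-bar_T and u-bar_C are the average
% teacher effects in the treatment and control groups.
```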

Read More

Effect Sizes Larger in Developer-Commissioned Studies than in Independent Studies

Rebecca Wolf, Jennifer Morrison, Amanda Inns, Robert Slavin, and Kelsey Risman

PDF Version

Rigorous evidence of program effectiveness has become increasingly important with the 2015 passage of the Every Student Succeeds Act (ESSA). One question that has not yet been addressed is whether findings from program evaluations carried out or commissioned by developers are as trustworthy as those identified in studies by independent third parties. Using study data from the What Works Clearinghouse, we found evidence of a “developer effect,” where program evaluations carried out or commissioned by developers produced average effect sizes that were substantially larger than those identified in evaluations conducted by independent parties.

Why is it important to accurately determine the effect sizes of an educational program?

Read More