An Introduction to Hierarchical Linear Models for Causal Inference in Multilevel Settings

Stephen Raudenbush & Anthony Bryk

Many studies in education, human development, public health, and allied fields are longitudinal, multilevel, or both. In longitudinal studies, participants are observed repeatedly over time, which allows researchers to assess growth in academic achievement or change in health status. Multilevel data arise when participants are clustered within social settings such as classrooms, schools, and neighborhoods.

Data that are both longitudinal and multilevel arise, for example, in studies of teacher or school effects on students' academic growth and of neighborhood and family effects on changes in health. In some cases, participants move across social settings over time: children, for instance, pass through a sequence of classrooms and schools during their academic careers. Even when participants remain in place, the character of the local environment may change. Hierarchical linear models (HLM) provide a flexible framework for analyzing such longitudinal and multilevel data.

The short course will begin with an introduction to the hierarchical linear model and its application in longitudinal and multilevel research. We will then consider problems of causal inference that arise in longitudinal and multilevel settings. Throughout the three days, we will focus on formulating models and applying them to real data. Participants will run analyses, discuss their findings, and consider the implications for the design and analysis of their own research.

Understanding how to analyze data from randomized experiments provides the foundation for understanding causal inference more generally. We will explore experimental design and analysis in group-randomized trials and multi-site randomized trials. Multi-site trials are now prevalent in education and offer rich opportunities to estimate both the average impact of an intervention and the distribution of impacts across social settings. We will discuss the suitability of certain widely employed methods of analysis in these circumstances and specify the conditions required for the appropriate use of HLM.
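The idea of estimating both an average impact and a distribution of impacts across sites can be made concrete with a small sketch. The Python below is an illustration only, not the HLM estimation taught in the course: it computes a difference-in-means impact estimate and its sampling variance for each site, then pools the site estimates with the method-of-moments (DerSimonian-Laird) estimator, a simpler stand-in technique. The pooled mean estimates the average impact, and `tau2` estimates the between-site variance of impacts. The function names `site_impacts` and `pool_impacts` are hypothetical, and the sketch assumes simple random assignment within each site with at least two participants per arm.

```python
import math


def site_impacts(sites):
    """Per-site impact estimate (difference in means) and its sampling variance.

    Hypothetical helper. `sites` is a list of (treated_outcomes, control_outcomes)
    pairs; each list must hold at least two values so the sample variance exists.
    """
    estimates = []
    for treated, control in sites:
        n_t, n_c = len(treated), len(control)
        mean_t = sum(treated) / n_t
        mean_c = sum(control) / n_c
        var_t = sum((y - mean_t) ** 2 for y in treated) / (n_t - 1)
        var_c = sum((y - mean_c) ** 2 for y in control) / (n_c - 1)
        # Impact estimate and its estimated sampling variance.
        estimates.append((mean_t - mean_c, var_t / n_t + var_c / n_c))
    return estimates


def pool_impacts(estimates):
    """Method-of-moments (DerSimonian-Laird) pooling of site impact estimates.

    Hypothetical helper, a simple stand-in for a fitted two-level model.
    Returns (average impact, its standard error, between-site variance tau2).
    """
    b = [est for est, _ in estimates]
    w = [1.0 / v for _, v in estimates]  # precision weights
    sum_w = sum(w)
    fixed = sum(wi * bi for wi, bi in zip(w, b)) / sum_w
    # Heterogeneity statistic Q and its expectation under no site variation.
    q = sum(wi * (bi - fixed) ** 2 for wi, bi in zip(w, b))
    k = len(estimates)
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / denom)
    # Re-weight each site by its total (sampling + between-site) variance.
    w_star = [1.0 / (v + tau2) for _, v in estimates]
    avg = sum(wi * bi for wi, bi in zip(w_star, b)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return avg, se, tau2
```

For example, `pool_impacts([(0.0, 1.0), (2.0, 1.0)])` pools two hypothetical sites whose impact estimates are 0 and 2, each with sampling variance 1, and returns `(1.0, 1.0, 1.0)`: an average impact of 1 with a between-site variance of 1. A two-level model fit by maximum likelihood targets the same two quantities but estimates them differently.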

We will then turn to causal inference in non-randomized studies. We will examine methods for addressing non-compliance in randomized experiments, as well as the use of instrumental variables to study the impact of participating in a new program. These approaches are now standard in single-level settings but novel in multilevel settings. We will also apply propensity-score matching to approximate group-randomized and multi-site trials.

In longitudinal settings, a key but often overlooked challenge is time-varying confounding. For example, a student's past instruction may shape intervening achievement, which in turn influences both the likelihood of receiving future instruction and later outcomes. Understanding this dynamic process is critical in education research and important for assessing human development and long-term health outcomes. We will demonstrate how weighting methods can remove time-varying confounding by observed covariates. Finally, we will consider value-added models and the difficulties they face in providing valid inferences about teacher and school effects.
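One widely used weighting approach is inverse-probability-of-treatment weighting, the technique behind marginal structural models; the course's own procedures may differ. The minimal sketch below assumes the per-period treatment probabilities have already been estimated elsewhere (for instance, by logistic regressions), and the function name and data layout are hypothetical. Each person's stabilized weight multiplies, across periods, the probability of the treatment actually received given past treatment alone, divided by the probability of that treatment given past treatment and time-varying covariates. Weighting by these values yields a pseudo-population in which the observed covariates no longer predict treatment, so a weighted outcome analysis removes confounding by those covariates, provided there is no unobserved confounding and the probability models are correctly specified.

```python
def stabilized_weights(treatments, p_denominator, p_numerator):
    """Stabilized inverse-probability-of-treatment weights over time.

    Hypothetical helper; the probabilities are assumed to come from models
    fit elsewhere.
      treatments[i][t]    : 1 if person i was treated in period t, else 0
      p_denominator[i][t] : Pr(A_t = 1 | past treatment, time-varying covariates)
      p_numerator[i][t]   : Pr(A_t = 1 | past treatment only)
    """
    weights = []
    for a_i, den_i, num_i in zip(treatments, p_denominator, p_numerator):
        w = 1.0
        for a, p_den, p_num in zip(a_i, den_i, num_i):
            # Probability of the treatment actually received in this period.
            f_den = p_den if a == 1 else 1.0 - p_den
            f_num = p_num if a == 1 else 1.0 - p_num
            w *= f_num / f_den
        weights.append(w)
    return weights
```

For two hypothetical students observed over two periods, one treated both times and one never treated, with covariate-based probabilities of 0.8 and 0.5 and covariate-free probabilities of 0.5 in each period, the weights come out to approximately 0.625 and 2.5. The student whose treatment history was less predictable from covariates receives more weight.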