Causal discovery refers to a set of data-driven methods for uncovering the causal relationships among variables from observational data. Unlike correlation-based analyses, which capture only associations, causal discovery attempts to infer directionality and identify pathways of influence. These methods rest on statistical and algorithmic principles; constraint-based algorithms such as PC, for example, rely on conditional independence tests. The goal is to recover a causal graph that represents the true generative process (from observational data alone, generally only up to a Markov equivalence class).
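To make the role of conditional independence tests concrete, the following is a minimal sketch, assuming linear-Gaussian data and a Fisher-z test on partial correlations. The function name `fisher_z_pvalue`, the simulated chain X -> Y -> Z, and the coefficients are illustrative assumptions, not part of any specific method described here.

```python
import math
import numpy as np

def fisher_z_pvalue(data, i, j, cond=()):
    """Two-sided p-value for H0: column i is independent of column j
    given the columns in `cond` (Fisher-z test, assumes Gaussian data)."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    # Partial correlation of i and j given cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # guard against log(0)
    n = data.shape[0]
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Simulated chain X -> Y -> Z (hypothetical coefficients).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)
data = np.column_stack([x, y, z])

p_marginal = fisher_z_pvalue(data, 0, 2)           # X vs Z, no conditioning
p_conditional = fisher_z_pvalue(data, 0, 2, (1,))  # X vs Z given Y
print(f"X vs Z:         p = {p_marginal:.3g}")
print(f"X vs Z given Y: p = {p_conditional:.3g}")
```

In a chain like this, X and Z are marginally dependent but become independent once Y is conditioned on, so the first p-value should be very small while the second should typically not be. Patterns of this kind are what constraint-based algorithms use to decide which edges to keep.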
Causal analysis can be approached from two complementary directions: causal inference and causal discovery. Causal inference typically estimates the magnitude of effects under a specified or theory-informed structure (e.g., with instrumental variables or structural equation models), whereas causal discovery seeks to learn the causal structure itself from data (e.g., via conditional independence tests). In practice, the two perspectives inform one another: discovery can suggest plausible structures for subsequent inference, and inference can help evaluate and refine the structures that discovery proposes.
Despite steady methodological advances, causal discovery remains underexplored in empirical research relative to causal inference. Many applied studies assume a pre-specified causal model based on theory or prior findings, overlooking the possibility that the true causal structure differs from conventional assumptions. Incorporating causal discovery into empirical analysis complements theory-driven inference with data-driven structure learning, enabling researchers to uncover hidden causal mechanisms and refine existing theoretical models.
Investigate causal influence mechanisms
Guide intervention design by targeting the learned causal parents of the outcome
Validate instrumental variables (IVs)
Identify and quantify a cause-and-effect relationship
Evaluate policy performance and design optimal policy given estimated effects
Design personalized intervention strategies based on heterogeneous treatment effects
In my job-market paper, “Causal Product Networks: A Data-Driven Methodology for Modeling Basket-Shopping Consumer Behavior”, we develop a causal discovery approach to estimate product-level causal networks from observational basket data. Under standard assumptions (acyclicity, the causal Markov condition, and faithfulness), we use conditional independence tests to (i) recover the network’s skeleton and (ii) orient the edges where identifiable from the data, yielding a causal DAG that captures causal relations among products consistent with the latent structure generating the observed baskets.
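The two steps, skeleton recovery and edge orientation, can be illustrated with a generic PC-style sketch. This is not the paper's implementation: it assumes continuous linear-Gaussian data and a Fisher-z test (basket data would call for a test suited to discrete purchases), orients only colliders without applying further orientation rules, and all names (`ci_pvalue`, `pc_skeleton`, `orient_colliders`) and the simulated four-variable example are hypothetical.

```python
import math
from itertools import combinations
import numpy as np

def ci_pvalue(data, i, j, cond=()):
    """Fisher-z p-value for H0: column i independent of column j given `cond`."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def pc_skeleton(data, alpha=0.01):
    """Step (i): start from a complete undirected graph and delete the edge
    i - j whenever some subset of i's neighbours renders i and j independent."""
    d = data.shape[1]
    adj = {v: set(range(d)) - {v} for v in range(d)}
    sepset = {}
    size = 0
    while any(len(adj[i]) - 1 >= size for i in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                for cond in combinations(sorted(adj[i] - {j}), size):
                    if ci_pvalue(data, i, j, cond) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(cond)
                        break
        size += 1
    return adj, sepset

def orient_colliders(adj, sepset):
    """Step (ii), partially: orient i -> k <- j for every unshielded triple
    i - k - j in which k is not in the separating set of (i, j)."""
    directed = set()
    for k in adj:
        for i, j in combinations(sorted(adj[k]), 2):
            if j not in adj[i] and k not in sepset[frozenset((i, j))]:
                directed.add((i, k))
                directed.add((j, k))
    return directed

# Simulated example: A -> C <- B and C -> D (columns 0..3, hypothetical).
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = 0.8 * a + 0.8 * b + rng.normal(size=n)
d = 0.8 * c + rng.normal(size=n)
data = np.column_stack([a, b, c, d])

# A strict threshold; with n = 5000 the true dependencies remain detectable.
adj, sepset = pc_skeleton(data, alpha=0.001)
colliders = orient_colliders(adj, sepset)
print("skeleton:", adj)
print("oriented edges (parent, child):", sorted(colliders))
```

In this simulation, the skeleton should typically keep only the edges A - C, B - C, and C - D, and the collider step should orient A -> C and B -> C. The remaining edge C - D is left undirected in this sketch; a full implementation would apply additional orientation rules to direct every edge that is identifiable from the data.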