Representation Learning in Linear Factor Models
Joint with José Luis Montiel Olea
Abstract: In this work, we analyze recent theoretical developments in the representation learning literature through the lens of a linear Gaussian factor model. First, we derive sufficient representations, defined as functions of the covariates that, upon conditioning, render the outcome variable and the covariates independent. Second, we study the theoretical properties of these representations and establish their asymptotic invariance: the dependence of the representations on the factors' measurement error vanishes as the dimension of the covariates grows. Finally, we take a decision-theoretic approach to assess the extent to which representations are useful for solving downstream tasks. We show that the conditional mean of the outcome variable given the covariates is an asymptotically invariant and sufficient representation that can efficiently solve any downstream task, not only prediction.
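The asymptotic-invariance claim can be illustrated with a small simulation. The sketch below assumes a one-factor Gaussian model X = λf + u with f ~ N(0, 1) and u ~ N(0, σ²I), in which the posterior mean E[f | X] has the closed form λ'X / (λ'λ + σ²); all loadings and parameter values are illustrative choices, not taken from the paper. As the covariate dimension p grows, the mean squared error between E[f | X] and the latent factor f shrinks like σ² / (λ'λ + σ²), so the representation's dependence on the measurement error vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_mean_factor(X, lam, sigma2):
    """E[f | X] in a one-factor Gaussian model X = lam * f + u,
    with f ~ N(0, 1) and u ~ N(0, sigma2 * I).  By Sherman-Morrison
    the posterior mean reduces to lam'X / (lam'lam + sigma2)."""
    return lam @ X / (lam @ lam + sigma2)

sigma2, n = 1.0, 2000
mses = []
for p in [5, 50, 500]:
    lam = rng.uniform(0.5, 1.5, size=p)                 # illustrative loadings
    f = rng.standard_normal(n)                          # latent factor draws
    U = np.sqrt(sigma2) * rng.standard_normal((p, n))   # measurement error
    X = np.outer(lam, f) + U                            # covariates, one column per draw
    fhat = posterior_mean_factor(X, lam, sigma2)        # vectorized over the n draws
    mses.append(np.mean((fhat - f) ** 2))
print(mses)  # MSE shrinks roughly like sigma2 / (lam'lam + sigma2) as p grows
```

With these choices the error drops by roughly an order of magnitude each time p grows tenfold, consistent with the 1/p rate implied by the closed form.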