How to Learn when Data Gradually Reacts to Your Model

Paper by Zachary Izzo, James Zou, Lexing Ying

Presentation by Kimia Kazemian

31/10/2022

what is this about?

  • Problem: training ML models in the performative setting, where deploying a model changes the data distribution.
  • Goal: minimize the performative risk: the model's loss on the distribution it induces.
  • Previous work: assumes the induced data distribution depends only on the currently deployed model.
  • Too simplistic? In practice the distribution also depends on the “state”, i.e. the previous distribution.
  • Example: credit scoring, where the population adapts to a new model gradually rather than all at once.
  • Contribution: a meta-algorithm (stateful PerfGD) for this stateful setting.

Problem setup

  • $D : \Theta \times M(Z) \to M(Z)$: the stateful distribution map.
  • $\Theta$: set of admissible model parameters
  • $Z$: data sample space.
  • $M(Z)$: set of probability measures on $Z$.
  • Assume $\rho_t$ belongs to a parametric family with parameter $\mu_t$ and density $p(\cdot,\mu_t)$.
  • Dynamics: $\rho_t = D(\theta_t, \rho_{t-1})$, or at the parameter level, $\mu_t = m(\theta_t, \mu_{t-1})$.
  • Long-term distribution: $\rho_*(\theta) = \underset{t\to\infty}{\lim}\, \rho_t$ where $\theta_t \equiv \theta$ for all $t$; equivalently $\mu_*(\theta) = \underset{k\to\infty}{\lim}\, m^{(k)}(\theta, \mu_0)$, with $m^{(k)}$ denoting $k$ applications of $m(\theta,\cdot)$.
  • Target: $\theta_{OPT} = \underset{\theta \in \Theta}{\operatorname{argmin}}\, \mathcal{L}^*(\theta)$, where $\mathcal{L}^*(\theta)$ is the long-term performative risk, i.e. the expected loss on $\rho_*(\theta)$.
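To make the state dependence concrete, here is a toy instance (the linear mean dynamics and the constants `delta`, `a`, `b` are illustrative assumptions, not from the paper): holding $\theta$ fixed, the state $\mu_t$ contracts geometrically to a fixed point $\mu_*(\theta)$, matching the definition of $\rho_*(\theta)$ above.

```python
import numpy as np

# Toy state-dependent mean dynamics (illustrative, not the paper's
# exact setting): mu_t = m(theta_t, mu_{t-1}) with
#   m(theta, mu) = (1 - delta) * mu + delta * (a * theta + b).
# For a fixed theta, mu_t converges geometrically to the fixed point
# mu_*(theta) = a * theta + b.
delta, a, b = 0.3, -2.0, 1.0

def m(theta, mu):
    return (1 - delta) * mu + delta * (a * theta + b)

theta = 0.5           # deploy one fixed model
mu = 10.0             # initial state
for _ in range(100):  # iterate mu_t = m(theta, mu_{t-1})
    mu = m(theta, mu)

print(mu, a * theta + b)  # mu has converged to the fixed point
```

Note that a single deployment of $\theta$ does not reveal $\mu_*(\theta)$: after one round the observed state still carries memory of $\mu_{t-1}$, which is exactly what makes the stateful setting harder than the stateless one.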

Problem setup (cont.)

  • Assume $\theta,\mu \in \mathbb{R}^d$
  • $\partial_i f$ denotes derivative wrt $i$th argument
  • $\psi_t = [\theta_t^\top, \mu_t^\top]^\top$ denotes the full input to $m$ at time $t$; for any collection of vectors $v_i$, $v_{i:j}$ denotes the matrix with columns $v_i, v_{i+1}, \ldots, v_j$.

How do we do it?

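The method slides here were largely visual. A minimal numerical sketch of the central computation (the toy linear map `m` and the direct finite differencing below are assumptions for illustration; the paper's algorithm instead estimates these Jacobians from the deployment history): at the fixed point $\mu_* = m(\theta, \mu_*)$, the implicit function theorem gives $d\mu_*/d\theta = (I - \partial_2 m)^{-1}\,\partial_1 m$, the long-term sensitivity that a stateful-PerfGD-style update descends along.

```python
import numpy as np

# Sketch of the long-term Jacobian d mu_* / d theta (assumption: a
# known linear map m for checking; in practice only noisy evaluations
# of m along the deployment trajectory are available).
d = 2
A = 0.4 * np.eye(d)                  # contraction in mu: ||dm/dmu|| < 1
B = np.array([[1.0, 0.2], [0.0, 1.0]])

def m(theta, mu):                    # mu_t = m(theta_t, mu_{t-1})
    return A @ mu + B @ theta

def finite_diff(f, x, eps=1e-5):
    """Jacobian of f at x via central finite differences."""
    J = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d); e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

theta = np.array([0.3, -0.1])
mu = np.linalg.solve(np.eye(d) - A, B @ theta)  # fixed point mu_*(theta)

d1m = finite_diff(lambda t: m(t, mu), theta)    # partial_1 m (wrt theta)
d2m = finite_diff(lambda u: m(theta, u), mu)    # partial_2 m (wrt mu)
# Implicit function theorem at mu_* = m(theta, mu_*):
#   d mu_* / d theta = (I - partial_2 m)^{-1} partial_1 m
dmu_star = np.linalg.solve(np.eye(d) - d2m, d1m)

# For this linear m the exact answer is (I - A)^{-1} B.
exact = np.linalg.solve(np.eye(d) - A, B)
print(np.max(np.abs(dmu_star - exact)))
```

This long-term Jacobian then enters the chain rule for $\nabla \mathcal{L}^*(\theta)$, combined with the explicit dependence of the loss on $\theta$.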

low-dimensional statistics?

  • Observation: individuals modify their behavior based on a low-dimensional proxy, such as a credit score or classification probability

  • How can we apply stateful PerfGD for a high-dimensional model without incurring a large error due to the high dimension?

  • $\mu_t = m(\theta_t,\mu_{t-1}) = \bar{m}(s(\theta_t,\mu_{t-1}),\mu_{t-1})$, where $s(\theta,\mu) \in \mathbb{R}^{d_s}$ is the low-dimensional statistic and $d_s \ll \dim(\theta)$

  • Chain rule: $\partial_1 m(\theta_t,\mu_{t-1}) = \partial_1 \bar{m}(s_t,\mu_{t-1})\,\partial_1 s(\theta_t,\mu_{t-1})$, where $s_t = s(\theta_t,\mu_{t-1})$; since $s$ is a known statistic, only the $d_s$-column factor $\partial_1 \bar{m}$ must be estimated from data
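A quick numeric check of this factorization (the maps `s` and `m_bar` and all dimensions below are hypothetical): with a scalar proxy, $\partial_1 \bar{m}$ has a single column, so it is far cheaper to estimate than the full $d_\mu \times \dim(\theta)$ Jacobian $\partial_1 m$.

```python
import numpy as np

# Toy check of m(theta, mu) = m_bar(s(theta, mu), mu) with a scalar
# proxy s (think: a credit-score threshold); all maps are illustrative.
d_theta, d_mu = 10, 3
w = np.linspace(1.0, 2.0, d_theta)

def s(theta, mu):       # low-dimensional statistic, d_s = 1
    return np.array([w @ theta + mu.sum()])

def m_bar(s_val, mu):   # population reacts to theta only through s
    return 0.5 * mu + np.tanh(s_val[0]) * np.ones(d_mu)

def m(theta, mu):
    return m_bar(s(theta, mu), mu)

theta = np.random.default_rng(0).normal(size=d_theta)
mu = np.zeros(d_mu)

# Chain rule: partial_1 m = partial_1 m_bar @ partial_1 s,
# shapes (3, 10) = (3, 1) @ (1, 10).
d1_mbar = (1 - np.tanh(s(theta, mu)[0]) ** 2) * np.ones((d_mu, 1))
d1_s = w.reshape(1, d_theta)
d1_m_chain = d1_mbar @ d1_s

# Compare against direct finite differences of m in theta.
eps = 1e-6
d1_m_fd = np.zeros((d_mu, d_theta))
for i in range(d_theta):
    e = np.zeros(d_theta); e[i] = eps
    d1_m_fd[:, i] = (m(theta + e, mu) - m(theta - e, mu)) / (2 * eps)

print(np.max(np.abs(d1_m_chain - d1_m_fd)))
```

The design point: the finite-difference estimation error scales with the number of columns being estimated, so working through $s$ replaces a $\dim(\theta)$-dimensional estimation problem with a $d_s$-dimensional one.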

experiments

spam classification:

Reference: Strategic classification

what else?

  • societal impacts: optimizing for the model's own induced distribution could possibly maximize a certain measure of negative externality for the population

  • future work: relaxing the paper's assumptions:

    • the state transitions are deterministic (a deterministic MDP)

    • the batch setting

Reference: Alternative microfoundations for strategic classification

fin