A Proof Proposal for Theorem 7.2
Theorem 7.2: A hypothesis class $\mathcal{H}$ of binary classifiers is nonuniformly learnable if and only if it is a countable union of agnostic PAC learnable hypothesis classes.
Proof Proposal:
“$\Leftarrow$” as in the book.
“$\Rightarrow$” Assume that $\mathcal{H}$ is nonuniformly learnable using some algorithm $A$. For every $n \in \mathbb{N}$ let
$ \mathcal{H}_{n} := \{ h \in \mathcal{H} : m_{\mathcal{H}}^{NUL} (1/10, 1/10, h) \leq n \} .$
It follows that $\mathcal{H} = \bigcup_{n \in \mathbb{N}} \mathcal{H}_{n}$.
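This decomposition can be checked directly (a short sketch using only the definition above; $n_h$ is shorthand introduced here for the sample complexity of $h$):

```latex
% Every h \in \mathcal{H} has finite nonuniform sample complexity,
% so it lies in some \mathcal{H}_n:
h \in \mathcal{H}
  \;\Longrightarrow\;
  n_h := m_{\mathcal{H}}^{NUL}(1/10, 1/10, h) < \infty
  \;\Longrightarrow\;
  h \in \mathcal{H}_{n_h}
  \subseteq \bigcup_{n \in \mathbb{N}} \mathcal{H}_{n}.
% The reverse inclusion is immediate, since each \mathcal{H}_n \subseteq \mathcal{H}.
```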
We prove the “$\Rightarrow$” claim by contradiction: assuming that some $\mathcal{H}_{n}$ is not agnostic PAC (APAC) learnable, we derive a contradiction to the definition of $\mathcal{H}_{n}$:
Suppose $\mathcal{H}_{n}$ is not APAC learnable. Then, by the Fundamental Theorem (Thm. 6.7), $\mathcal{H}_{n}$ has infinite VC-dimension. (Note that this implies that the domain $\mathcal{X}$, on which $\mathcal{H}_{n}$ and $\mathcal{H}$ operate, must be infinite as well.) Thus, $\mathcal{H}_{n}$ shatters sets of size $n'$ for every $n' \in \mathbb{N}$. Since $A$ is a learning algorithm for the class of binary classifiers with $0$-$1$ loss, for any $n'$ the No-Free-Lunch Theorem (Thm. 5.1) holds with respect to some $h' \in \mathcal{H}_{n}$. (In particular, Thm. 5.1 does not require $A$ to output anything more specific than a function $f: \mathcal{X} \to \{0,1\}$.)
That is:
$\forall n' \in \mathbb{N} \ \exists$ a distribution $D$ that is realizable by $\mathcal{H}_{n}$ and $\exists h' \in \mathcal{H}_{n}$ s.t. with probability $\geq 1/7$ over the choice of $S \sim D^{n'}$ it holds that $L_{D}(A(S)) \geq 1/8 = L_{D}(h') + 1/8$.
However, by definition of $\mathcal{H}_{n}$ we have $\forall h \in \mathcal{H}_{n} : m_{\mathcal{H}}^{NUL}(1/10, 1/10, h) \leq n$. Since the nonuniform guarantee holds for every sample size at least $m_{\mathcal{H}}^{NUL}(1/10, 1/10, h)$, it holds in particular at sample size $n$. That is, $\forall D \ \forall h \in \mathcal{H}_{n} : L_{D}(A(S)) \leq L_{D}(h) + 1/10$ with probability $\geq 9/10$ over the choice of $S \sim D^{n}$.
Considering the case $n’ = n$ yields the desired contradiction. Hence, $\mathcal{H}_{n}$ must be agnostic PAC learnable.
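To spell out the contradiction at $n' = n$ (a short check combining the two bounds above, with $D$ and $h'$ as given by the No-Free-Lunch Theorem):

```latex
\Pr_{S \sim D^{n}}\!\left[ L_{D}(A(S)) \geq L_{D}(h') + \tfrac{1}{8} \right] \geq \tfrac{1}{7}
  \quad \text{(NFL)},
\qquad
\Pr_{S \sim D^{n}}\!\left[ L_{D}(A(S)) \leq L_{D}(h') + \tfrac{1}{10} \right] \geq \tfrac{9}{10}
  \quad \text{(NUL)}.
```

Since $\tfrac{1}{7} + \tfrac{9}{10} = \tfrac{73}{70} > 1$, the two events must overlap for some sample $S$; but no single $S$ can satisfy both inequalities, as $\tfrac{1}{8} > \tfrac{1}{10}$.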
Comments
- Looks reasonable to me. Just a small suggestion: since you argue anyway with $L_{D}(h') + 1/8$ and $L_{D}(h) + 1/10$ (and not only with the constants $1/8$ and $1/10$), we could drop the realizability assumption completely and argue directly for all $D$. In my opinion, this would make the proof even a little simpler.