Probably Approximately Correct Learning

PAC learning model

PAC is used to formalize and address following questions: - What can be learned efficiently? - What is inherently hard to learn? - How many examples are needed to learn successfully? - Is there a general model of learning?

We denote be the set of all possible examples, input space. The set of all posible labels or targets is . A concept is a mapping from to . A concept class is a set of concepts is denoted by . The learner considers a fixed set of possible concepts , called a hypothesis set, which might not necessarily coincide with . It

Generalization error Given a hypothesis , a target concept , and an underlying distribution , the generalization error or risk of is defined by

where is the indicator function of the event . The generalization error of a hypothesis is not directly accessible to the learner, but learner can measure the empirical error of a hypothesis on the labeled sample .

Empirical error Given a hypothesis , a target concept , and a sample , the empirical error or empirical risk of is defined by:

Thus, the empirical error of is its average error over the sample , while the generalization error is its expected error based on the distribution . For a fixed , the expectation of the empirical error on is equal to the generalization error:

Indeed, by the linearity of the expectation and the fact that the sample is drawn i.i.d., we can write

for any in sample . Thus

PAC-learning A concept class is said to be PAC-learnable if there exists an algorithm and a polynomial funcion such that for any and , for all distributions on and for any target concept , the following holds for any sample size :

If futher runs in , then is said to be efficiently PAC-learnable. When such an algorithm exists, it is called a PAC-learning algorithm for .

Another explanation: [2]

: Concept class; : Learner; : Hypothesis space; , size of hypothesis space; : distribution over inputs; : error goal; : certainty goal.

Probably Approximately Correct

is PAC-learnable by using iff learner will with probability , output a hypothesis such that in time and samples polynomial

Third explaination: [3]

Probably approximately correct (PAC) learning theory helps analyze whether and under what conditions a learner will probably output an approximately correct classifier.

First, let's define "approximate." A hypothesis is approximately correct if its error over the distribution of inputs is bounded by some , . I.e., , where is the distribution over inputs.

Next, "probably." If will output such a classifier with probability , with , we call that classifier probably approximately correct.

Knowing that a target concept is PAC-learnable allows you to bound the sample size necessary to probably learn an approximately correct classifier, which is what's shown in the formula you've reproduced:

Shattering, VC-dimension

When the hypothesis spaces are infinite, the VC-dimension provides such a measure.

Definition: Shattering (1) - the set of splittings of dataset using concepts from . shatters if .

A set of points is shattered by is there are hypotheses in that split in all of the possible ways; i.e., all possible ways of classifying points in are achievable using concepts in .

Definition: Shattering (2) A set of size of is shattered by if , i.e. all possible labelings of the set are realized by function in .

Definition: VC-dimension (1) The VC-dimension of a hypothesis space is the cardinality of the largest set that can be shattered by . If arbitraily large finite sets can be shattered by , then .

Definition: VC-Dimension (2) VC-dim() = cardinality of the largest set shattered by .

References

[1] @book{mohri2018foundations, title={Foundations of machine learning}, author={Mohri, Mehryar and Rostamizadeh, Afshin and Talwalkar, Ameet}, year={2018} }

[2] https://www.youtube.com/watch?v=e37nlms7Zi0

[3] https://stats.stackexchange.com/questions/142906/what-does-pac-learning-theory-mean