
Occam learning






From Wikipedia, the free encyclopedia
 


In computational learning theory, Occam learning is a model of algorithmic learning where the objective of the learner is to output a succinct representation of received training data. This is closely related to probably approximately correct (PAC) learning, where the learner is evaluated on its predictive power over a test set.

Occam learnability implies PAC learnability, and for a wide variety of concept classes, the converse is also true: PAC learnability implies Occam learnability.

Introduction


Occam learning is named after Occam's razor, the principle that, all other things being equal, a shorter explanation for observed data should be favored over a lengthier one. The theory of Occam learning is a formal and mathematical justification for this principle. It was first shown by Blumer et al.[1] that Occam learning implies PAC learning, which is the standard model of learning in computational learning theory. In other words, parsimony (of the output hypothesis) implies predictive power.

Definition of Occam learning


The succinctness of a concept $c$ in concept class $\mathcal{C}$ can be expressed by the length $size(c)$ of the shortest bit string that can represent $c$ in $\mathcal{C}$. Occam learning connects the succinctness of a learning algorithm's output to its predictive power on unseen data.

Let $\mathcal{C}$ and $\mathcal{H}$ be concept classes containing target concepts and hypotheses respectively. Then, for constants $\alpha \geq 0$ and $0 \leq \beta < 1$, a learning algorithm $L$ is an $(\alpha, \beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$ iff, given a set $S = \{x_1, \dots, x_m\}$ of $m$ samples labeled according to a concept $c \in \mathcal{C}$, $L$ outputs a hypothesis $h \in \mathcal{H}$ such that

1. $h$ is consistent with $c$ on $S$ (that is, $h(x) = c(x)$ for all $x \in S$), and
2. $size(h) \leq (n \cdot size(c))^{\alpha} \cdot m^{\beta}$,

where $n$ is the maximum length of any sample $x \in S$. An Occam algorithm is called efficient if it runs in time polynomial in $n$, $m$, and $size(c)$. We say a concept class $\mathcal{C}$ is Occam learnable with respect to a hypothesis class $\mathcal{H}$ if there exists an efficient Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$.
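
For a concrete example (an illustration, not part of the formal definition), the classical elimination algorithm for monotone conjunctions is an efficient $(1, 0)$-Occam algorithm: the conjunction it outputs mentions at most $n$ variables, so $size(h) \leq n \leq n \cdot size(c)$ independently of $m$. A minimal Python sketch, with illustrative function names:

    # Sketch of the classical elimination algorithm for monotone conjunctions.
    # The output conjunction mentions at most n variables, so its size is
    # bounded by n * size(c) * m**0, making this a (1, 0)-Occam algorithm.
    # (Names here are illustrative, not from any library.)

    def learn_monotone_conjunction(samples):
        """samples: list of (x, label) pairs, x a tuple of n bits in {0, 1},
        label in {0, 1}. Returns the indices of the variables kept in the
        hypothesis conjunction."""
        n = len(samples[0][0])
        kept = set(range(n))  # start from x_1 AND x_2 AND ... AND x_n
        for x, label in samples:
            if label == 1:
                # a positive example eliminates every variable it sets to 0
                kept -= {i for i in kept if x[i] == 0}
        return kept

    def predict(kept, x):
        # the hypothesis h(x) is the conjunction of the kept variables
        return int(all(x[i] == 1 for i in kept))

On data labeled by a monotone conjunction, the returned hypothesis is consistent with the samples, and its representation never exceeds $n$ variables, satisfying the size bound with $\alpha = 1$ and $\beta = 0$.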

The relation between Occam and PAC learning


Occam learnability implies PAC learnability, as the following theorem of Blumer et al.[2] shows:

Theorem (Occam learning implies PAC learning)


Let $L$ be an efficient $(\alpha, \beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$. Then there exists a constant $a > 0$ such that for any $0 < \epsilon, \delta < 1$ and any distribution $\mathcal{D}$, given $m \geq a \left( \frac{1}{\epsilon} \log \frac{1}{\delta} + \left( \frac{(n \cdot size(c))^{\alpha}}{\epsilon} \right)^{\frac{1}{1-\beta}} \right)$ samples drawn from $\mathcal{D}$ and labelled according to a concept $c \in \mathcal{C}$ of length $n$ bits each, the algorithm $L$ will output a hypothesis $h \in \mathcal{H}$ such that $error(h) \leq \epsilon$ with probability at least $1 - \delta$.

Here, $error(h) = \Pr_{x \sim \mathcal{D}}[h(x) \neq c(x)]$ is the error of $h$ with respect to the concept $c$ and the distribution $\mathcal{D}$. This implies that the algorithm $L$ is also a PAC learner for the concept class $\mathcal{C}$ using hypothesis class $\mathcal{H}$.
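
As a numerical illustration, the sample bound above can be evaluated directly. The constant $a$ is not specified by the theorem, so the sketch below (an illustrative helper, not from any library) assumes $a = 1$ purely to show how the bound scales:

    import math

    # Illustrative evaluation of the Occam sample bound
    #   m >= a * ( (1/eps) * log(1/delta)
    #              + ((n * size_c)**alpha / eps)**(1 / (1 - beta)) ).
    # The constant a is unspecified by the theorem; a = 1 is an assumption
    # made here only to show how the bound scales with the parameters.

    def occam_sample_bound(eps, delta, n, size_c, alpha, beta, a=1.0):
        term1 = (1.0 / eps) * math.log(1.0 / delta)
        term2 = ((n * size_c) ** alpha / eps) ** (1.0 / (1.0 - beta))
        return math.ceil(a * (term1 + term2))

    # Example: 100-bit samples, size(c) = 20, a (1, 1/2)-Occam algorithm.
    print(occam_sample_bound(eps=0.1, delta=0.05, n=100, size_c=20,
                             alpha=1.0, beta=0.5))

A slightly more general formulation is as follows: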

Theorem (Occam learning implies PAC learning, cardinality version)


Let $0 < \epsilon, \delta < 1$. Let $L$ be an algorithm such that, given $m$ samples drawn from a fixed but unknown distribution $\mathcal{D}$ and labeled according to a concept $c \in \mathcal{C}$ of length $n$ bits each, $L$ outputs a hypothesis $h \in \mathcal{H}_{n,m}$ that is consistent with the labeled samples, where $\mathcal{H}_{n,m}$ denotes the class of hypotheses that $L$ may output on $m$ samples of length $n$. Then, there exists a constant $b$ such that if $\log |\mathcal{H}_{n,m}| \leq b \epsilon m - \log \frac{1}{\delta}$, then $L$ is guaranteed to output a hypothesis $h \in \mathcal{H}_{n,m}$ such that $error(h) \leq \epsilon$ with probability at least $1 - \delta$.
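
Rearranged for $m$ (and taking the unspecified constant $b$ as $1$ for illustration), the condition says that a consistent learner needs on the order of $\frac{1}{\epsilon} \left( \ln |\mathcal{H}_{n,m}| + \ln \frac{1}{\delta} \right)$ samples, which the following sketch computes:

    import math

    # Cardinality bound rearranged for m, with the unspecified constant b
    # taken as 1 for illustration: a learner that outputs a hypothesis
    # consistent with the samples needs roughly
    #   m >= (1/eps) * (ln|H| + ln(1/delta))
    # samples, where |H| is the size of its hypothesis class.

    def consistent_learner_bound(eps, delta, ln_H, b=1.0):
        return math.ceil((ln_H + math.log(1.0 / delta)) / (b * eps))

    # Example: a hypothesis class of 2**30 hypotheses.
    print(consistent_learner_bound(eps=0.1, delta=0.05, ln_H=30 * math.log(2)))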

While the above theorems show that Occam learning is sufficient for PAC learning, they say nothing about necessity. Board and Pitt showed that, for a wide variety of concept classes, Occam learning is in fact necessary for PAC learning.[3] They proved that for any concept class that is polynomially closed under exception lists, PAC learnability implies the existence of an Occam algorithm for that concept class. Concept classes that are polynomially closed under exception lists include Boolean formulas, circuits, deterministic finite automata, decision lists, decision trees, and other geometrically defined concept classes.

A concept class $\mathcal{C}$ is polynomially closed under exception lists if there exists a polynomial-time algorithm $A$ such that, when given the representation of a concept $c \in \mathcal{C}$ and a finite list $E$ of exceptions, $A$ outputs a representation of a concept $c' \in \mathcal{C}$ such that the concepts $c$ and $c'$ agree except on the set $E$.
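
As a minimal sketch of the closure property (an illustrative pairing construction, not Board and Pitt's, whose result concerns the specific representations of each concept class), one can represent $c'$ as the pair $(c, E)$ and flip $c$'s output on the exception points; the representation then grows only by the length of $E$:

    # Minimal sketch of closure under exception lists: represent c' as the
    # pair (c, E) and flip c's output on the finitely many exception points.
    # The representation of c' grows only by the size of the list E.
    # (This pairing construction is illustrative; Board and Pitt's result
    # concerns specific concept classes with their own representations.)

    def with_exceptions(c, exceptions):
        """c: a predicate on inputs; exceptions: a finite set of inputs.
        Returns c', which agrees with c everywhere except on `exceptions`."""
        E = frozenset(exceptions)
        return lambda x: (not c(x)) if x in E else c(x)

    # Example: c accepts even integers; c' additionally flips 2 and 3.
    c = lambda x: x % 2 == 0
    c_prime = with_exceptions(c, {2, 3})
    print(c(2), c_prime(2))   # True False
    print(c(3), c_prime(3))   # False True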

Proof that Occam learning implies PAC learning


We first prove the cardinality version. Call a hypothesis $h \in \mathcal{H}_{n,m}$ bad if $error(h) \geq \epsilon$, where again $error(h)$ is with respect to the true concept $c$ and the underlying distribution $\mathcal{D}$. The probability that a set of samples $S$ is consistent with a fixed bad hypothesis $h$ is at most $(1-\epsilon)^m$, by the independence of the samples. By the union bound, the probability that there exists a bad hypothesis in $\mathcal{H}_{n,m}$ consistent with $S$ is at most $|\mathcal{H}_{n,m}|(1-\epsilon)^m$, which is less than $\delta$ if $\log |\mathcal{H}_{n,m}| \leq O(\epsilon m) - \log \frac{1}{\delta}$ (using $(1-\epsilon)^m \leq e^{-\epsilon m}$). This concludes the proof of the second theorem above.
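
The key step is that a single bad hypothesis survives all $m$ independent samples with probability at most $(1-\epsilon)^m$; a quick simulation (an illustration, not part of the proof) matches this rate:

    import random

    # Illustrative check of the proof's key step: a hypothesis with error
    # exactly eps agrees with one random sample with probability 1 - eps,
    # so it is consistent with all m independent samples with probability
    # (1 - eps)**m.

    eps, m, trials = 0.1, 50, 100_000
    consistent = 0
    for _ in range(trials):
        # each of the m samples independently "catches" the bad hypothesis
        # with probability eps; consistency means it is never caught
        if all(random.random() > eps for _ in range(m)):
            consistent += 1

    print(consistent / trials)   # empirical estimate
    print((1 - eps) ** m)        # theoretical value, about 0.0052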

Using the second theorem, we can prove the first theorem. Since we have an $(\alpha, \beta)$-Occam algorithm, any hypothesis output by $L$ can be represented by at most $(n \cdot size(c))^{\alpha} m^{\beta}$ bits, and thus $\log |\mathcal{H}_{n,m}| \leq (n \cdot size(c))^{\alpha} m^{\beta}$. This is less than $O(\epsilon m) - \log \frac{1}{\delta}$ if we set $m \geq a \left( \frac{1}{\epsilon} \log \frac{1}{\delta} + \left( \frac{(n \cdot size(c))^{\alpha}}{\epsilon} \right)^{\frac{1}{1-\beta}} \right)$ for some constant $a > 0$. Thus, by the cardinality version of the theorem, $L$ will output a hypothesis $h$ such that $error(h) \leq \epsilon$ with probability at least $1 - \delta$. This concludes the proof of the first theorem above.

Improving sample complexity for common problems


Though Occam and PAC learnability are equivalent, the Occam framework can be used to produce tighter bounds on the sample complexity of classical problems including conjunctions,[2] conjunctions with few relevant variables,[4] and decision lists.[5]
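
For instance (an illustrative computation with all constants taken as $1$), for conjunctions over $n$ Boolean variables the hypothesis class has size at most $3^n$, since each variable appears positively, negatively, or not at all. A learner that outputs conjunctions of at most $k$ literals instead works with a class of size roughly $(2n)^k$, so the cardinality bound shrinks from $O(n/\epsilon)$ to $O((k \log n)/\epsilon)$ samples:

    import math

    # Illustrative comparison of cardinality-based sample bounds (constants
    # taken as 1). For conjunctions over n variables, |H| <= 3^n. If only
    # k variables are relevant and the learner outputs a conjunction of at
    # most k literals, the hypothesis class shrinks to roughly (2n)^k,
    # i.e. log|H| = O(k log n), a much smaller bound when k << n.

    def bound(eps, delta, ln_H):
        return math.ceil((ln_H + math.log(1 / delta)) / eps)

    n, k, eps, delta = 1000, 5, 0.1, 0.05
    print(bound(eps, delta, n * math.log(3)))      # all conjunctions: about 11000
    print(bound(eps, delta, k * math.log(2 * n)))  # k-literal conjunctions: about 410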

Extensions


Occam algorithms have also been shown to be successful for PAC learning in the presence of errors,[6][7] probabilistic concepts,[8] function learning,[9] and Markovian non-independent examples.[10]

References

1. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24(6), 377–380.
2. Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory, chapter 2. MIT Press.
3. Board, R., & Pitt, L. (1990). On the necessity of Occam algorithms. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing (pp. 54–63). ACM.
4. Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36(2), 177–221.
5. Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2(3), 229–246.
6. Angluin, D., & Laird, P. (1988). Learning from noisy examples. Machine Learning, 2(4), 343–370.
7. Kearns, M., & Li, M. (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 807–837.
8. Kearns, M. J., & Schapire, R. E. (1990). Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 382–391). IEEE.
9. Natarajan, B. K. (1993). Occam's razor for functions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory (pp. 370–376). ACM.
10. Aldous, D., & Vazirani, U. (1990). A Markovian extension of Valiant's learning model. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 392–396). IEEE.