One-hot






From Wikipedia, the free encyclopedia
 


Decimal Binary Unary One-hot
0 000 00000000 00000001
1 001 00000001 00000010
2 010 00000011 00000100
3 011 00000111 00001000
4 100 00001111 00010000
5 101 00011111 00100000
6 110 00111111 01000000
7 111 01111111 10000000

In digital circuits and machine learning, a one-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).[1] A similar implementation in which all bits are '1' except one '0' is sometimes called one-cold.[2] In statistics, dummy variables are a similar technique for representing categorical data.
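The definition can be illustrated with a minimal Python sketch. The function names `one_hot`, `one_cold`, and `is_one_hot` are hypothetical helpers chosen for this example, and bit strings are written most significant bit first, as in the table above.

```python
def one_hot(value: int, width: int) -> str:
    """Return the one-hot bit string for `value` using `width` bits."""
    if not 0 <= value < width:
        raise ValueError("value must be in range(width)")
    # Shift a single 1 into position `value`, then pad with leading zeros.
    return format(1 << value, f"0{width}b")


def one_cold(value: int, width: int) -> str:
    """Return the one-cold string: every bit is 1 except a single 0."""
    return "".join("0" if bit == "1" else "1" for bit in one_hot(value, width))


def is_one_hot(bits: str) -> bool:
    """A bit string is a legal one-hot word if exactly one bit is 1."""
    return set(bits) <= {"0", "1"} and bits.count("1") == 1


print(one_hot(3, 8))            # 00001000
print(one_cold(3, 8))           # 11110111
print(is_one_hot("00010100"))   # False
```

The output for `one_hot(3, 8)` matches the row for decimal 3 in the table above.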

Applications

Digital circuitry

One-hot encoding is often used for indicating the state of a state machine. When using binary, a decoder is needed to determine the state. A one-hot state machine, however, does not need a decoder as the state machine is in the nth state if, and only if, the nth bit is high.

A ring counter with 15 sequentially ordered states is an example of a state machine. A 'one-hot' implementation would have 15 flip-flops chained in series with the Q output of each flip-flop connected to the D input of the next and the D input of the first flip-flop connected to the Q output of the 15th flip-flop. The first flip-flop in the chain represents the first state, the second represents the second state, and so on to the 15th flip-flop, which represents the last state. Upon reset of the state machine all of the flip-flops are reset to '0' except the first in the chain, which is set to '1'. The next clock edge arriving at the flip-flops advances the one 'hot' bit to the second flip-flop. The 'hot' bit advances in this way until the 15th state, after which the state machine returns to the first state.
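The behaviour of this ring counter can be simulated in software. The following is a minimal Python sketch, not a hardware description; the generator name `ring_counter` is a hypothetical choice, and each list element stands for the Q output of one flip-flop.

```python
from itertools import islice


def ring_counter(num_states: int = 15):
    """Yield the flip-flop outputs of a one-hot ring counter, one list per clock edge."""
    # Reset: every flip-flop is 0 except the first, which is set to 1.
    state = [1] + [0] * (num_states - 1)
    while True:
        yield state
        # Each D input takes the previous flip-flop's Q output;
        # the first flip-flop takes the Q output of the last one.
        state = [state[-1]] + state[:-1]


states = list(islice(ring_counter(15), 16))
print(states[0].index(1))        # 0  (first state after reset)
print(states[14].index(1))       # 14 (15th state)
print(states[15] == states[0])   # True (wraps back to the first state)
```

Because exactly one element is 1 at any time, the current state is simply the position of that element and no decoder is needed.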

An address decoder converts from binary to one-hot representation. A priority encoder converts from one-hot representation to binary.
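The two conversions can be sketched on Python integers. The names `address_decode` and `priority_encode` are hypothetical, and the sketch assumes a priority encoder that reports the index of the highest set bit, which for a valid one-hot word is the index of its only set bit.

```python
def address_decode(address: int, width: int) -> int:
    """Binary to one-hot: return a word in which only bit `address` is set."""
    if not 0 <= address < width:
        raise ValueError("address must be in range(width)")
    return 1 << address


def priority_encode(word: int) -> int:
    """One-hot to binary: return the index of the highest set bit."""
    if word <= 0:
        raise ValueError("no bit is set")
    return word.bit_length() - 1


word = address_decode(5, 8)
print(format(word, "08b"))     # 00100000
print(priority_encode(word))   # 5
```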

Comparison with other encoding methods

Advantages

Disadvantages

Natural language processing

In natural language processing, a one-hot vector is a 1 × N matrix (vector) used to distinguish each word in a vocabulary from every other word in the vocabulary.[5] The vector consists of 0s in all cells with the exception of a single 1 in a cell used uniquely to identify the word. One-hot encoding ensures that machine learning does not assume that higher numbers are more important. For example, the value '8' is bigger than the value '1', but that does not make '8' more important than '1'. The same is true for words: the value 'laughter' is not more important than 'laugh'.
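As an illustration, the following Python sketch builds one-hot vectors for a small, made-up vocabulary. The three-word vocabulary and the helper names `build_one_hot_vectors` and `dot` are assumptions for this example only.

```python
def build_one_hot_vectors(vocabulary):
    """Map each word to a vector with a single 1 at the word's own position."""
    size = len(vocabulary)
    vectors = {}
    for position, word in enumerate(vocabulary):
        vector = [0] * size
        vector[position] = 1
        vectors[word] = vector
    return vectors


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


vectors = build_one_hot_vectors(["laugh", "laughter", "smile"])
print(vectors["laughter"])                          # [0, 1, 0]
print(dot(vectors["laugh"], vectors["laughter"]))   # 0
```

The dot product of any two different words is 0 and every vector has the same length, so the encoding itself carries no ranking or magnitude for a model to misread.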

Machine learning and statistics

In machine learning, one-hot encoding is a frequently used method for handling categorical data. Because many machine learning models require numeric input variables, categorical variables must be transformed during preprocessing.[6]

Label encoding
Food name    Categorical #    Calories
Apple        1                95
Chicken      2                231
Broccoli     3                50

One-hot encoding
Apple    Chicken    Broccoli    Calories
1        0          0           95
0        1          0           231
0        0          1           50

Categorical data can be either nominal or ordinal.[7] Ordinal data has a ranked order for its values and can therefore be converted to numerical data through ordinal encoding.[8] An example of ordinal data would be the ratings on a test ranging from A to F, which could be ranked using numbers from 6 to 1. Since there is no quantitative relationship between the individual values of a nominal variable, ordinal encoding can create a fictional ordinal relationship in the data.[9] Therefore, one-hot encoding is often applied to nominal variables in order to improve the performance of the algorithm.

This method creates a new column for each unique value in the original categorical column. These dummy variables are then filled with zeros and ones (1 meaning TRUE, 0 meaning FALSE).[citation needed]
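The column-creation step can be sketched in plain Python using the food names from the tables above. The helper names `label_encode` and `one_hot_columns` are hypothetical; real projects would typically rely on a library routine instead.

```python
def label_encode(values):
    """Assign each unique value an integer, starting at 1, in first-seen order."""
    categories = list(dict.fromkeys(values))
    codes = {category: number for number, category in enumerate(categories, start=1)}
    return [codes[value] for value in values]


def one_hot_columns(values):
    """Create one 0/1 column per unique value, in first-seen order."""
    categories = list(dict.fromkeys(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


foods = ["Apple", "Chicken", "Broccoli"]
print(label_encode(foods))      # [1, 2, 3]
print(one_hot_columns(foods))
# {'Apple': [1, 0, 0], 'Chicken': [0, 1, 0], 'Broccoli': [0, 0, 1]}
```

The label-encoded output implies an order (Broccoli > Chicken > Apple) that the food names do not have, whereas the one-hot columns do not.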

Because this process creates multiple new variables, it is prone to creating a 'big p' problem (too many predictors) if there are many unique values in the original column. Another downside of one-hot encoding is that it causes multicollinearity between the individual variables, which potentially reduces the model's accuracy.[citation needed]
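The multicollinearity arises because the one-hot columns of a single variable always sum to 1, so any one column is fully determined by the others. The sketch below demonstrates this on made-up rows for a three-category variable. Dropping one column, shown at the end, is a commonly used remedy that is not described in the text above and is included here as an assumption.

```python
# One-hot rows for a three-category variable (Apple, Chicken, Broccoli).
rows = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

# Every row sums to 1, so the columns are linearly dependent on a
# constant (intercept) term: this is the source of the multicollinearity.
row_sums = [sum(row) for row in rows]
print(row_sums)  # [1, 1, 1, 1]

# Dropping the first column loses no information ...
reduced = [row[1:] for row in rows]

# ... because it can be rebuilt from the remaining columns.
rebuilt_first = [1 - sum(row) for row in reduced]
print(rebuilt_first == [row[0] for row in rows])  # True
```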

Also, if the categorical variable is an output variable, it may be necessary to convert the values back into categorical form in order to present them in an application.[10]
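Converting back amounts to finding the position of the largest entry and looking up the category stored at that position. The following Python sketch assumes the category order is known; the name `decode_one_hot` and the example scores are hypothetical.

```python
def decode_one_hot(vector, categories):
    """Return the category whose position holds the largest entry."""
    if len(vector) != len(categories):
        raise ValueError("vector and categories must have the same length")
    hottest = max(range(len(vector)), key=vector.__getitem__)
    return categories[hottest]


categories = ["Apple", "Chicken", "Broccoli"]
print(decode_one_hot([0, 1, 0], categories))         # Chicken
print(decode_one_hot([0.1, 0.2, 0.7], categories))   # Broccoli
```

Taking the largest entry rather than testing for an exact 1 lets the same function handle model outputs that are scores or probabilities instead of strict one-hot vectors.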

In practice, this transformation is often performed directly by a function that takes categorical data as input and outputs the corresponding dummy variables. An example is the dummyVars function of the caret library in R.[11]

See also

References

  1. Harris, David; Harris, Sarah (2012-08-07). Digital design and computer architecture (2nd ed.). San Francisco, Calif.: Morgan Kaufmann. p. 129. ISBN 978-0-12-394424-5.
  2. Harrag, Fouzi; Gueliani, Selmene (2020). "Event Extraction Based on Deep Learning in Food Hazard Arabic Texts". arXiv:2008.05014.
  3. Xilinx. "HDL Synthesis for FPGAs Design Guide". section 3.13: "Encoding State Machines". Appendix A: "Accelerate FPGA Macros with One-Hot Approach". 1995.
  4. Cohen, Ben (2002). Real Chip Design and Verification Using Verilog and VHDL. Palos Verdes Peninsula, CA, US: VhdlCohen Publishing. p. 48. ISBN 0-9705394-2-8.
  5. Arnaud, Émilien; Elbattah, Mahmoud; Gignon, Maxime; Dequen, Gilles (August 2021). NLP-Based Prediction of Medical Specialties at Hospital Admission Using Triage Notes. 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI). Victoria, British Columbia. pp. 548–553. doi:10.1109/ICHI52183.2021.00103. Retrieved 2022-05-22.
  6. Brownlee, Jason (2017). "Why One-Hot Encode Data in Machine Learning?". Machinelearningmastery. https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
  7. Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science, New Series, 103.2684, pp. 677–680. http://www.jstor.org/stable/1671815.
  8. Brownlee, Jason (2020). "Ordinal and One-Hot Encodings for Categorical Data". Machinelearningmastery. https://machinelearningmastery.com/one-hot-encoding-for-categorical-data//
  9. Brownlee, Jason (2020). "Ordinal and One-Hot Encodings for Categorical Data". Machinelearningmastery. https://machinelearningmastery.com/one-hot-encoding-for-categorical-data//
  10. Brownlee, Jason (2017). "Why One-Hot Encode Data in Machine Learning?". Machinelearningmastery. https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
  11. Kuhn, Max. "dummyVars". RDocumentation. https://www.rdocumentation.org/packages/caret/versions/6.0-86/topics/dummyVars

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=One-hot&oldid=1228357066"

     



    This page was last edited on 10 June 2024, at 20:16 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


