Trojan Source







From Wikipedia, the free encyclopedia
 


Trojan Source
CVE identifier(s)
  • CVE-2021-42574
  • CVE-2021-42694
Date discovered: September 9, 2021
Discoverer: Nicholas Boucher, Ross Anderson
Affected software: Unicode, source code
Website: trojansource.codes

    Trojan Source is a software vulnerability that abuses Unicode's bidirectional characters so that source code is displayed differently from the way it is actually executed.[1] The exploit relies on how text in writing systems with different reading directions is encoded and displayed on computers. It was discovered by Nicholas Boucher and Ross Anderson at the University of Cambridge in late 2021.[2]

    Background

    Unicode is an encoding standard for representing text, symbols, and glyphs. It is the dominant text encoding standard on computers; its UTF-8 encoding was used by over 98% of websites as of September 2023.[3] Because Unicode supports many languages, it must support different methods of writing text. This includes both left-to-right languages, such as English and Russian, and right-to-left languages, such as Hebrew and Arabic. Since Unicode aims to allow more than one writing system in the same text, it must be able to mix scripts with different display orders and resolve conflicting orders. As a solution, Unicode contains bidirectional (Bidi) formatting characters that control the direction in which the surrounding text is displayed. These characters can be abused to change the order in which text is displayed without leaving any visible trace, as the characters themselves are usually invisible.[4]

    Relevant Unicode bidirectional formatting characters

    Abbreviation  Code point  Name                        Description
    LRE           U+202A      LEFT-TO-RIGHT EMBEDDING     Try treating following text as left-to-right.
    RLE           U+202B      RIGHT-TO-LEFT EMBEDDING     Try treating following text as right-to-left.
    LRO           U+202D      LEFT-TO-RIGHT OVERRIDE      Force treating following text as left-to-right.
    RLO           U+202E      RIGHT-TO-LEFT OVERRIDE      Force treating following text as right-to-left.
    LRI           U+2066      LEFT-TO-RIGHT ISOLATE       Force treating following text as left-to-right without affecting adjacent text.
    RLI           U+2067      RIGHT-TO-LEFT ISOLATE       Force treating following text as right-to-left without affecting adjacent text.
    FSI           U+2068      FIRST STRONG ISOLATE        Force treating following text in direction indicated by the next character.
    PDF           U+202C      POP DIRECTIONAL FORMATTING  Terminate nearest LRE, RLE, LRO, or RLO.
    PDI           U+2069      POP DIRECTIONAL ISOLATE     Terminate nearest LRI, RLI, or FSI.
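
    As a rough illustration, the characters listed above can be inspected with Python's standard unicodedata module. The dictionary name BIDI_CONTROLS and the helper describe are hypothetical names chosen for this sketch; only the code points come from the table.

```python
import unicodedata

# Code points of the bidirectional formatting characters listed in the table.
# BIDI_CONTROLS and describe() are illustrative names, not part of any standard API.
BIDI_CONTROLS = {
    "LRE": "\u202a",
    "RLE": "\u202b",
    "LRO": "\u202d",
    "RLO": "\u202e",
    "LRI": "\u2066",
    "RLI": "\u2067",
    "FSI": "\u2068",
    "PDF": "\u202c",
    "PDI": "\u2069",
}


def describe(char: str) -> str:
    """Return 'U+XXXX NAME (bidi class)' for a single character."""
    code_point = f"U+{ord(char):04X}"
    name = unicodedata.name(char)
    bidi_class = unicodedata.bidirectional(char)
    return f"{code_point} {name} ({bidi_class})"


if __name__ == "__main__":
    for abbreviation, char in BIDI_CONTROLS.items():
        print(abbreviation, describe(char))
```

    Each of these characters has its own bidirectional class in the Unicode Character Database, which is how display software recognises them even though they render as nothing.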

    Methodology

    In the exploit, bidirectional characters are abused to visually reorder text in source code so that the order in which the code is executed differs from the order in which it is displayed. Bidirectional characters can be inserted in parts of source code where arbitrary text is allowed, such as string literals. This often applies to documentation, variables, or comments.

    Vulnerable Python code

    Source code with hints (the position of the invisible character is shown as [RLI]):

        def sum(num1, num2):
          '''Add num1 and num2, and [RLI] ''' ;return
          return num1 + num2

    Source code displayed visually:

        def sum(num1, num2):
          '''Add num1 and num2, and return; '''
          return num1 + num2

    Source code interpreted:

        def sum(num1, num2):
          '''Add num1 and num2, and ''' ;
          return
          return num1 + num2

    In the above example, the RLI (right-to-left isolate) mark causes the text that follows it to be displayed in a different order than it is stored and interpreted. In the stored order, the triple quote comes first (ending the string), followed by a semicolon (starting a new statement), and finally the premature return (which returns None and skips any code below it). The line break terminates the effect of the RLI mark, preventing it from carrying over into the code below. Because of the Bidi character, some source code editors and IDEs rearrange the code for display without any visual indication that it has been rearranged, so a human code reviewer would not normally detect the change. However, when the code is passed to a compiler or interpreter, the Bidi character may be ignored and the characters processed in their stored order, which differs from the order displayed. As a result, code that visually appeared to be non-executable, such as text inside a comment or string, could be executed.[5] Formatting marks can be combined multiple times to create complex attacks.[6]
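
    The behaviour described above can be reproduced with a short Python sketch. It builds the example function as a string containing a real RLI character (written as the escape \u2067), then parses and runs it. The variable names are hypothetical, and exec is used only to keep the demonstration in a single file.

```python
import ast

# The example function, with an actual RIGHT-TO-LEFT ISOLATE (U+2067)
# stored inside the docstring, in the order the interpreter reads it.
source = (
    "def sum(num1, num2):\n"
    "    '''Add num1 and num2, and \u2067 ''' ;return\n"
    "    return num1 + num2\n"
)

# Inspect the statements the parser finds in the function body.
function_def = ast.parse(source).body[0]
statement_kinds = [type(node).__name__ for node in function_def.body]

# Run the definition in an isolated namespace and call the function.
namespace = {}
exec(source, namespace)
result = namespace["sum"](2, 3)

if __name__ == "__main__":
    print(statement_kinds)
    print(result)
```

    The parser sees a docstring followed by two return statements, and the bare return runs first, so the call yields None rather than the sum.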

    Impact and mitigation

    Programming languages that support Unicode strings and follow Unicode's Bidi algorithm are vulnerable to the exploit. These include Java, Go, C, C++, C#, Python, and JavaScript.[7]

    Although the behavior that the attack exploits is not strictly an error, many compilers, interpreters, and websites added warnings or mitigations for the exploit. Both GCC and LLVM received requests to deal with the exploit.[8] Shortly after the exploit was published, Marek Polacek submitted a patch to GCC that implemented a warning for potentially unsafe directional characters; this functionality was merged for GCC 12 under the -Wbidi-chars flag.[9][10] LLVM also merged similar patches. Rust addressed the exploit in version 1.56.1, which by default rejects code that includes the characters. The developers of Rust found no vulnerable packages prior to the fix.[11]
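
    A much simpler check in the same spirit as these warnings can be sketched in Python. It only reports where bidirectional formatting characters occur in a piece of source text; it is not the algorithm used by GCC, LLVM, or Rust, and the names BIDI_CONTROL_CHARS and find_bidi_controls are hypothetical.

```python
import unicodedata

# The nine bidirectional formatting characters (LRE, RLE, PDF, LRO, RLO,
# LRI, RLI, FSI, PDI). This set is an assumption of the sketch, not a
# list taken from any particular compiler.
BIDI_CONTROL_CHARS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for each Bidi control found.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in BIDI_CONTROL_CHARS:
                findings.append((line_number, column, unicodedata.name(char)))
    return findings


if __name__ == "__main__":
    sample = "x = 1\ny = '\u202e' # ok\n"
    for line_number, column, name in find_bidi_controls(sample):
        print(f"line {line_number}, column {column}: {name}")
```

    A check like this flags every occurrence, including legitimate right-to-left text in strings or comments, so a real tool would need a way to allow intended uses.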

    Red Hat issued an advisory on its website, labeling the exploit as "moderate".[12] GitHub published a warning on its blog and updated its website to show a dialog box when Bidi characters are detected in a repository's code.[13]

    References

    1. "'Trojan Source' Bug Threatens the Security of All Code – Krebs on Security". November 2021. Archived from the original on 2022-01-14. Retrieved 2022-01-17.
    2. "VU#999008 - Compilers permit Unicode control and homoglyph characters". www.kb.cert.org. Archived from the original on 2022-01-21. Retrieved 2022-01-17.
    3. "Usage Survey of Character Encodings broken down by Ranking". w3techs.com. Archived from the original on 2022-01-21. Retrieved 2022-01-17.
    4. "UAX #9: Unicode Bidirectional Algorithm". www.unicode.org. Archived from the original on 2019-05-02. Retrieved 2022-01-17.
    5. Edge, Jake (2021-11-03). "Trojan Source: tricks (no treats) with Unicode [LWN.net]". lwn.net. Retrieved 2022-03-12.
    6. Stockley, Mark (2021-11-03). "Trojan Source: Hiding malicious code in plain sight". Malwarebytes Labs. Retrieved 2022-03-12.
    7. Tung, Liam. "Programming languages: This sneaky trick could allow attackers to hide 'invisible' vulnerabilities in code". ZDNet. Archived from the original on 2021-12-21. Retrieved 2022-01-21.
    8. "GCC & LLVM Patches Pending To Fend Off Trojan Source Attacks". www.phoronix.com. Archived from the original on 2021-12-01. Retrieved 2022-01-17.
    9. Malcolm, David (2022-01-12). "Prevent Trojan Source attacks with GCC 12". Red Hat Developer. Archived from the original on 2022-01-17. Retrieved 2022-01-17.
    10. "Warning Options (Using the GNU Compiler Collection (GCC))". gcc.gnu.org. Archived from the original on 2018-12-05. Retrieved 2022-01-17.
    11. "Security advisory for rustc (CVE-2021-42574) | Rust Blog". blog.rust-lang.org. Archived from the original on 2021-11-30. Retrieved 2022-01-21.
    12. "RHSB-2021-007 Trojan source attacks (CVE-2021-42574,CVE-2021-42694)". Red Hat Customer Portal. Archived from the original on 2022-01-17. Retrieved 2022-01-21.
    13. "Warning about bidirectional Unicode text | GitHub Changelog". The GitHub Blog. 31 October 2021. Archived from the original on 2022-01-15. Retrieved 2022-01-21.