Soft hyphen






From Wikipedia, the free encyclopedia
 


ISO symbol for soft hyphen

In computing and typesetting, a soft hyphen (Unicode U+00AD SOFT HYPHEN), also called a syllable hyphen, is a code point reserved in some coded character sets for breaking words across lines: it is rendered as a visible hyphen when it falls at the end of a line and remains invisible within the line.

Two alternative ways of using the soft hyphen character for this purpose have emerged, depending on whether the encoded text will be broken into lines by its recipient, or has already been preformatted by its originator.[1][2][3]

Text to be formatted by the recipient


The use of SHY characters in text that will be broken into lines by the recipient is the application context considered by the post-1999 HTML and Unicode specifications, as well as some word-processing file formats. In this context, the soft hyphen may also be called a discretionary hyphen or optional hyphen. It serves as an invisible marker used to specify a place in text where a hyphenated break is allowed without forcing a line break in an inconvenient place if the text is re-flowed. It becomes visible only after word wrapping at the end of a line.[4] The soft hyphen's Unicode semantics and HTML implementation are in many ways similar to Unicode's zero-width space, with the exception that the soft hyphen will preserve the kerning of the characters on either side when not visible. The zero-width space, on the other hand, will not, as it is considered a visible character even if not rendered, thus having its own kerning metrics.

To show the effect of a soft hyphen in HTML, the words of the following text (from the poem Spring and Fall by Gerard Manley Hopkins) have been separated with soft hyphens:

Margaret­Are­You­Grieving­Over­Goldengrove­Unleaving­Leaves­Like­The­Things­Of­Man­You­With­Your­Fresh­Thoughts­Care­For­Can­You­Ah­As­The­Heart­Grows­Older­It­Will­Come­To­Such­Sights­Colder­By­And­By­Nor­Spare­A­Sigh­Though­Worlds­Of­Wanwood­Leafmeal­Lie­And­Yet­You­Will­Weep­And­Know­Why­Now­No­Matter­Child­The­Name­Sorrows­Springs­Are­The­Same­Nor­Mouth­Had­No­Nor­Mind­Expressed­What­Heart­Heard­Of­Ghost­Guessed­It­Is­The­Blight­Man­Was­Born­For­It­Is­Margaret­You­Mourn­For

In web browsers that support soft hyphens, resizing the window re-breaks the above text only at word boundaries and inserts a hyphen at the end of each broken line.
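The recipient-side behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any browser actually implements line breaking: the function name `wrap_at_soft_hyphens` is hypothetical, soft hyphens are treated as the only break opportunities, every character is assumed to occupy one column, and a fragment longer than the line width is simply allowed to overflow.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_at_soft_hyphens(text: str, width: int) -> list[str]:
    """Greedily break `text` at soft hyphens so lines fit in `width` columns.

    A soft hyphen that ends up inside a line is dropped (it stays invisible);
    one that ends a line is shown as a visible "-".
    """
    fragments = text.split(SHY)
    lines: list[str] = []
    current = fragments[0]
    for i, fragment in enumerate(fragments[1:], start=1):
        is_last = i == len(fragments) - 1
        # Unless this is the final fragment, the line will later be broken,
        # so one column must stay free for the visible hyphen.
        needed = len(current) + len(fragment) + (0 if is_last else 1)
        if needed <= width:
            current += fragment
        else:
            lines.append(current + "-")
            current = fragment
    lines.append(current)
    return lines


if __name__ == "__main__":
    sample = SHY.join(["Margaret", "Are", "You", "Grieving"])
    for line in wrap_at_soft_hyphens(sample, 12):
        print(line)
```

Calling the function again with a different `width` corresponds to resizing the window: the same encoded text is re-broken, and hyphens appear only where a line now ends.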

Text preformatted by the originator


The SHY character is also used in text where paragraphs have already been broken into lines, such as certain plain text files, text sent to VT100-style terminal emulators or printers, or pages represented in page description languages. This is the application context originally considered by the EBCDIC and ISO 8859-1 standards and implemented in many VT100 terminal emulators.[1][2]

Here, SHY is a visible hyphen that is usually visually indistinguishable from a regular hyphen, but has been inserted solely for the purpose of line breaking. The purpose of the soft hyphen here is to distinguish it from any regular hyphen that might have been part of the original spelling of the word. This distinction helps re-use of already formatted text, when line breaks and soft hyphens inserted during word wrapping have to be removed to convert the text back into its unformatted form. For example, the copy or paste function of a terminal emulator can offer to replace line breaks with a space character, and remove any soft hyphens including any immediately following whitespace characters.

An example application that outputs soft hyphens for this reason is the groff text formatter as used on many Unix/Linux systems to display man pages.
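The conversion of preformatted text back into its unformatted form, as described above, might look like the following sketch. The function name `unformat` is hypothetical, and the code assumes a single paragraph: it does not try to preserve blank lines between paragraphs.

```python
import re

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def unformat(preformatted: str) -> str:
    """Undo word wrapping in text whose line-end hyphens are soft hyphens."""
    # Remove each soft hyphen together with any whitespace (including the
    # line break) that immediately follows it, rejoining the split word.
    joined = re.sub(SHY + r"\s*", "", preformatted)
    # Replace every remaining line break with a single space.
    return re.sub(r"\s*\n\s*", " ", joined).strip()


if __name__ == "__main__":
    wrapped = "a well-known for" + SHY + "\nmatter wraps\nlong lines"
    print(unformat(wrapped))
```

Because the regular hyphen in "well-known" is a different character from the soft hyphen inserted at the line end, it survives the conversion, which is the distinction this use of SHY is meant to preserve.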

Encodings and definitions


Soft hyphen (SHY) characters in coded character sets, roughly in chronological order:
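As a partial illustration only, the code positions of SHY in a few character sets can be inspected with Python's standard library. The choice of codecs is an assumption made for this sketch: `cp037` is just one EBCDIC code page among many, and `&shy;` is the HTML named character reference for U+00AD.

```python
import html
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

name = unicodedata.name(SHY)          # Unicode character name
category = unicodedata.category(SHY)  # general category (format character)
latin1 = SHY.encode("latin-1")        # ISO 8859-1 byte
utf8 = SHY.encode("utf-8")            # UTF-8 byte sequence
ebcdic = SHY.encode("cp037")          # byte in EBCDIC code page 037
from_entity = html.unescape("&shy;")  # HTML named character reference

if __name__ == "__main__":
    print(name, category)
    print(latin1.hex(), utf8.hex(), ebcdic.hex())
    print(from_entity == SHY)
```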

Other commands for marking hyphenation opportunities in text formatting languages (similar to the HTML 4 and Unicode 4.0 interpretation of SHY):

Security issues


Soft hyphens, like other invisible characters, have been used to obscure malicious domains or URLs in e-mail spam.[9][10]

They are also used in emails to try to defeat spam prevention systems. For example, the phrase "I need your assista­nce discreetly" contains a soft hyphen in the word "assistance", so a mail system that matches text literally may fail to detect the phrase in the email body.[citation needed]
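One possible countermeasure, sketched here as an assumption rather than a description of any particular mail filter, is to strip soft hyphens before matching. The function name `contains_phrase` is hypothetical, and only U+00AD is handled; other invisible characters would need similar treatment.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def contains_phrase(body: str, phrase: str) -> bool:
    """Case-insensitive phrase search that ignores soft hyphens in `body`."""
    normalized = body.replace(SHY, "").casefold()
    return phrase.casefold() in normalized


if __name__ == "__main__":
    body = "I need your assista" + SHY + "nce discreetly"
    phrase = "I need your assistance discreetly"
    print(phrase in body)                 # literal match on the raw text
    print(contains_phrase(body, phrase))  # match after removing soft hyphens
```

The literal substring test fails because the raw text contains an extra code point inside "assistance", while the normalized comparison finds the phrase.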


References

  1. Jukka Korpela (January 2011). "Soft hyphen (SHY) – a hard problem?". Tampere University of Technology. Retrieved 8 April 2011.
  2. Markus G. Kuhn (4 June 2003). "Unicode interpretation of SOFT HYPHEN breaks ISO 8859-1 compatibility" (PDF). Unicode Technical Committee. L2/03-155R.
  3. Eric Muller (14 August 2002). "Yes, SOFT HYPHEN is a hard problem". Unicode Technical Committee. L2/02-279.
  4. "CSS Text Module Level 3 Specification". W3C Candidate Recommendation Draft. World Wide Web Consortium (W3C). Retrieved 7 August 2022.
  5. "Extended Binary-Coded Decimal Interchange Code - S/390". comsci.us. Retrieved 8 April 2011.
  6. "Glossary". IBM. Retrieved 8 April 2011.
  7. DIN (15 July 1979). Additional Control Functions for Bibliographic Use according to German Standard DIN 31626 (PDF). ITSCJ/IPSJ. ISO-IR-40.
  8. "Commonly Confused Characters". Greg Baker, Simon Fraser University. Retrieved 12 July 2011.
  9. "Spammers Using Soft Hyphen To Hide Malicious URLs". Slashdot. 7 October 2010. Retrieved 8 April 2011.
  10. "Soft Hyphen – A New URL Obfuscation Technique". Symantec. Retrieved 8 April 2011.

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Soft_hyphen&oldid=1226646546"




    This page was last edited on 1 June 2024, at 00:23 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


