textarea and pre Elements
●7. Attributes
●7.1 Disallowed Attributes
●7.2 Language Attributes
●7.3 Attributes with Special Considerations
●7.3.1 The id Attribute
●8. Named Entity References
●9. Script and Style
●9.1 External Script and Style
●9.2 In-line Script and Style
●10. Comments in Polyglot Markup
●11. Exceptions from the Foreign Content Parsing Rules
●12. Example Document
●A. Acknowledgements
●B. References
●B.1 Normative references
●B.2 Informative references
text/html (if the content is transmitted to an HTML-aware user agent) or application/xhtml+xml (if the content is transmitted to an XHTML-aware user agent).
Other permissible MIME types are text/xml, application/xml, and any MIME type whose subtype ends with the four characters "+xml". [XML-MT]
Polyglot markup results in:
●a valid HTML document. [HTML5]
●a well-formed XML document. [XML10]
●identical DOMs when processed as HTML and when processed as XML.
A notable exception to this is that HTML and XML parsers generate different DOMs for some xml (xml:lang, xml:space, and xml:base), xmlns (xmlns="" and xmlns:xlink=""), and xlink (such as xlink:href) attributes.
XML requires and HTML5 permits these attributes in certain locations, and the attributes are preserved by HTML parsers.
Polyglot markup is not constrained:
●to be valid XML. [XML10]
●by conformance to any XML DTD.
Polyglot markup is scripted according to the rules of XML (does not use document.write, for example) and excludes HTML elements that are impossible to replicate in an XML parser (does not use the noscript element, for example).
Polyglot markup triggers non-quirks mode in HTML parsers,
as non-quirks mode is closest to XML-mode rendering, in regard to both DOM and CSS.
Polyglot markup results in
the same encoding and the same language in both HTML-mode and XML-mode.
Not all web content needs to be authored in polyglot markup.
Polyglot markup is ideal for publishing when
there's a strong desire to serve both HTML and XML tool chains
without simultaneously having to maintain dual copies of the content: one in HTML and a second in XHTML.
In addition, producing a single polyglot markup output requires less infrastructure than producing both HTML and XHTML output for the same content.
Polyglot markup is also beneficial when lightweight processes, such as quick testing or even hand-authoring, are applied to content intended to be published both as HTML and XHTML, especially if that content is not sent through a tool chain.
<meta charset="UTF-8"/> (the HTML encoding declaration).
●Outside the document
●By adding "charset=utf-8" to the MIME/HTTP Content-Type header [HTTP11], as the following examples show in HTML and XML, respectively:
Content-type: text/html; charset=utf-8
Content-type: application/xhtml+xml; charset=utf-8
The HTML encoding declaration has no effect in XML.
When the HTML encoding declaration is the only encoding declaration,
the encoding default from XML makes XML parsers treat content as UTF-8.
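The UTF-8 default described above can be observed with a small sketch. It assumes Python's standard xml.etree.ElementTree parser stands in for "an XML parser"; the sample document and its title text are made up for illustration. The document carries only the HTML encoding declaration and no XML declaration, yet the XML parser still decodes the bytes as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the HTML encoding declaration is present,
# there is no <?xml ... encoding="..."?> declaration.
source = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xml:lang="fr">'
    '<head><meta charset="UTF-8"/><title>caf\u00e9</title></head>'
    '<body></body></html>'
)

# Serialize to bytes as UTF-8, the way the document would travel on the wire.
data = source.encode("utf-8")

# The XML parser ignores <meta charset> and falls back to its UTF-8 default.
root = ET.fromstring(data)
ns = {"h": "http://www.w3.org/1999/xhtml"}
title = root.find("h:head/h:title", ns).text
print(title)
```

If the same bytes were produced with a different encoding, the XML parser would need an XML declaration or an external charset parameter to decode them, which is why polyglot markup stays with UTF-8.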
The W3C Internationalization (i18n) Group recommends always including a visible encoding declaration in a document, because it helps developers, testers, or translation production managers to check the encoding of a document visually.
DOCTYPE is in uppercase letters.
●The string html is in lowercase letters.
●The string SYSTEM, if present, is in uppercase letters.
●The string PUBLIC, if present, is in uppercase letters.
●A Formal Public Identifier (FPI), if present, is a case-sensitive match of the registered FPI to which it points.
●A URI, if present in the document type declaration, is a case-sensitive match of the URI to which it points.
●If the URI is the string about:legacy-compat, polyglot markup includes the string in lowercase letters, as required by HTML5.
●If the URI is an http URL, the URI points to the correct resource, using case-sensitive letters.
Note that using about:legacy-compat in XML may yield unpredictable parsing results, depending on the XML processing pipeline.
Polyglot markup does not use document type declarations for HTML4, HTML3, or HTML2, regardless of whether they contain a URI or not and
regardless of their effect in HTML5 parsers, as these document type declarations are not compatible with XHTML.
html, the root SVG element, svg, and the root MathML element, math.
Polyglot markup declares the following default namespaces, when the markup languages are included in the document, to maintain XML-compatibility [XML10]:
●<html xmlns="http://www.w3.org/1999/xhtml">
●<math xmlns="http://www.w3.org/1998/Math/MathML">
●<svg xmlns="http://www.w3.org/2000/svg">
Polyglot markup declares the default namespaces on the root HTML element, html, the root SVG element, svg, and the root MathML element, math, and on any HTML elements used as children of SVG or MathML elements.
Polyglot markup does not declare any other default or prefixed element namespace, because
[HTML5] does not natively support the declaring of any other default or prefixed element namespace.
xlink:.
Polyglot markup declares the XLink namespace on the HTML root element (html) or once on the foreign element where it is used (svg or math), to maintain XML-compatibility [XML10].
In polyglot markup, the xlink prefix uses the namespace declaration xmlns:xlink="http://www.w3.org/1999/xlink" before using the xlink prefix for the following attributes:
●xlink:actuate
●xlink:arcrole
●xlink:href
●xlink:role
●xlink:show
●xlink:title
●xlink:type
Furthermore, polyglot markup defines the xlink prefix only on foreign elements (any SVG or MathML element) but not the root html element or any other HTML element.
Note that there are other prefixed attributes that can be used beyond xlink:href (such as xml:base).
Polyglot markup does not declare these prefixes via xmlns. The prefixes are implicitly declared in XML and are automatically applied to the appropriate attributes in HTML.
html, head, title, and body element.
The html element is the root element.
The head and body elements are children of the html element.
The title element is a child of the head element.
Therefore, the following source code would be the most basic polyglot markup document.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>
Whenever it uses a tr element, polyglot markup always wraps the tr element inside a tbody, thead, or tfoot element.
In HTML, if a group of one or more adjacent tr elements is not explicitly wrapped inside a tbody, thead, or tfoot element, the HTML parser creates and wraps a new tbody element around the tr elements.
XML parsers do not create the tbody element, thus offering the potential for creating different DOMs.
Correct:
<table> <tbody> <tr>...
Incorrect:
<table> <tr>...
Whenever it uses col elements within a table element, polyglot markup explicitly uses a colgroup element surrounding groups of the col elements.
In HTML, if a group of one or more adjacent col elements is not explicitly wrapped inside a colgroup element, the HTML parser creates and wraps a new colgroup element around the col elements.
XML parsers do not create the colgroup element, thus offering the potential for creating different DOMs.
Correct:
<table> <colgroup> <col>...
Incorrect:
<table> <col>...
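The tr and col wrapping rules above lend themselves to a mechanical check. The following is a minimal sketch, not part of the specification: it assumes the document is already well-formed XML and uses Python's xml.etree.ElementTree to flag tr or col elements whose parent is not one of the required wrappers. The names REQUIRED_PARENTS, local_name, and find_unwrapped are hypothetical helpers introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Child element name -> parent names that polyglot markup requires.
REQUIRED_PARENTS = {
    "tr": {"tbody", "thead", "tfoot"},
    "col": {"colgroup"},
}

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def find_unwrapped(xml_source):
    """Return descriptions of tr/col elements that lack an explicit wrapper."""
    problems = []
    for parent in ET.fromstring(xml_source).iter():
        parent_name = local_name(parent.tag)
        for child in parent:
            child_name = local_name(child.tag)
            allowed = REQUIRED_PARENTS.get(child_name)
            if allowed is not None and parent_name not in allowed:
                problems.append(f"<{child_name}> inside <{parent_name}>")
    return problems

good = ('<table xmlns="http://www.w3.org/1999/xhtml"><colgroup><col/></colgroup>'
        '<tbody><tr><td>x</td></tr></tbody></table>')
bad = ('<table xmlns="http://www.w3.org/1999/xhtml"><col/>'
       '<tr><td>x</td></tr></table>')

print(find_unwrapped(good))
print(find_unwrapped(bad))
```

The check runs on the XML side only, because that is where the implied tbody and colgroup elements are missing; an HTML parser would insert them and hide the problem.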
noscript element, because the noscript element cannot be used in XML documents. [HTML5]
altGlyph
●altGlyphDef
●altGlyphItem
●animateColor
●animateMotion
●animateTransform
●clipPath
●feBlend
●feColorMatrix
●feComponentTransfer
●feComposite
●feConvolveMatrix
●feDiffuseLighting
●feDisplacementMap
●feDistantLight
●feFlood
●feFuncA
●feFuncB
●feFuncG
●feFuncR
●feGaussianBlur
●feImage
●feMerge
●feMergeNode
●feMorphology
●feOffset
●fePointLight
●feSpecularLighting
●feSpotLight
●feTile
●feTurbulence
●foreignObject
●glyphRef
●linearGradient
●radialGradient
●textPath
definitionurl, which polyglot markup changes to the mixed case definitionURL.
●Polyglot markup uses lowercase letters in attribute names for all SVG elements except the following,
for which polyglot markup uses mixed case:
●attributeName
●attributeType
●baseFrequency
●baseProfile
●calcMode
●clipPathUnits
●contentScriptType
●contentStyleType
●diffuseConstant
●edgeMode
●externalResourcesRequired
●filterRes
●filterUnits
●glyphRef
●gradientTransform
●gradientUnits
●kernelMatrix
●kernelUnitLength
●keyPoints
●keySplines
●keyTimes
●lengthAdjust
●limitingConeAngle
●markerHeight
●markerUnits
●markerWidth
●maskContentUnits
●maskUnits
●numOctaves
●pathLength
●patternContentUnits
●patternTransform
●patternUnits
●pointsAtX
●pointsAtY
●pointsAtZ
●preserveAlpha
●preserveAspectRatio
●primitiveUnits
●refX
●refY
●repeatCount
●repeatDur
●requiredExtensions
●requiredFeatures
●specularConstant
●specularExponent
●spreadMethod
●startOffset
●stdDeviation
●stitchTiles
●surfaceScale
●systemLanguage
●tableValues
●targetX
●targetY
●textLength
●viewBox
●viewTarget
●xChannelSelector
●yChannelSelector
●zoomAndPan
accept
●accept-charset
●charset
●checked
●defer
●dir
●direction
●disabled
●enctype
●hreflang
●http-equiv
●lang
●media
●method
●multiple
●readonly
●rel (for values that do not contain a colon)
●scope
●selected
●shape
●target (keywords only; browsing context names are case-sensitive)
●type (on a, link, object, script, or style elements)
●type (on input)
Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes.
area
●base
●br
●col
●command
●embed
●hr
●img
●input
●keygen
●link
●meta
●param
●source
Polyglot markup uses the minimized tag syntax for void elements, e.g. <br/>, rather than the alternative syntax <br></br>.
Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph), polyglot markup does not use the minimized form (e.g. the document uses <p></p> and not <p />).
Note that MathML and SVG elements may be either self-closing or contain content.
HTTP header: Content-language: ru
Whenever there is an HTTP Content-Language: header (whose value is no more and no less than exactly one language tag), polyglot markup declares both the lang and the xml:lang attributes on the root element.
For more information, see Language Attributes.
As a general practice and for the sake of expediency and simplicity, polyglot markup may always include both the xml:lang and the lang attributes on the root element.
HTTP Content-Type: header has no extra rules or restrictions, whereas polyglot markup does not use the http-equiv="Content-Type" declaration on the meta element.
For more specific information about using the HTTP Content-Type: header, see Specifying a Document's Character Encoding.
textarea and pre Elements
textarea or pre element, the text within the element does not begin with a newline.
&#x9; for a tab rather than the literal character '\t'.
This is because of attribute-value normalization in XML [XML10].
The following example uses numeric character references (escaped characters) for the line feed, tab, and less-than characters within a srcdoc attribute.
<iframe srcdoc="&#x3C;p>Hello &#xA; &#x9; world!&#x3C;/p>" src="demo_iframe_srcdoc.htm"></iframe>
Because of attribute-value normalization in XML [XML10], polyglot markup does not use newline characters within an attribute. Practically speaking, for source code with newlines within attributes, DOMs generated via XML and HTML will be different; however, whitespace differences have no behavioral impact on the page unless explicitly examined by JavaScript, rendering the differences of small consequence. Note that newlines are overtly not allowed in the title attribute or in any attribute containing a URI.
See also Attribute Values.
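Attribute-value normalization can be seen directly in an XML parser. The sketch below assumes Python's standard xml.etree.ElementTree (backed by expat) as a representative XML parser; the p element and its title value are made-up test input. A literal tab and newline in the attribute are replaced by spaces, while the same characters written as numeric character references survive.

```python
import xml.etree.ElementTree as ET

# Literal tab and line feed inside the attribute value.
literal = ET.fromstring('<p title="a\tb\nc"/>').get("title")

# The same characters written as numeric character references.
escaped = ET.fromstring('<p title="a&#x9;b&#xA;c"/>').get("title")

print(repr(literal))  # whitespace characters normalized to spaces
print(repr(escaped))  # tab and line feed preserved
```

An HTML parser keeps the literal tab and newline, so only the escaped form yields the same attribute value in both DOMs.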
xml:space
●xml:base
Note that the xml:space and xml:base attributes are allowed on SVG and MathML elements.
lang and xml:lang attributes.
Neither attribute is to be used without the other, and polyglot markup maintains identical values for both lang and xml:lang.
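The pairing rule for lang and xml:lang can be checked mechanically. The following is a minimal sketch under stated assumptions: the input is well-formed XML, Python's xml.etree.ElementTree is used for parsing, and language_attribute_problems is a hypothetical helper name. It reports every element that carries only one of the two attributes or carries both with different values.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def language_attribute_problems(xml_source):
    """Return (element, lang, xml:lang) tuples where the two attributes disagree."""
    problems = []
    for element in ET.fromstring(xml_source).iter():
        lang = element.get("lang")
        xml_lang = element.get(XML_LANG)
        if lang is None and xml_lang is None:
            continue  # neither attribute is used here
        if lang != xml_lang:
            name = element.tag.rsplit("}", 1)[-1]
            problems.append((name, lang, xml_lang))
    return problems

ok = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
      '<body><p lang="fr" xml:lang="fr">x</p></body></html>')
bad = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
       '<body><p lang="fr" xml:lang="de">x</p></body></html>')

print(language_attribute_problems(ok))
print(language_attribute_problems(bad))
```

The comparison is a plain string equality test, which matches the "identical values" wording above; it does not attempt case-insensitive matching of language tags.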
Polyglot markup uses the language attributes in the html element to set the default language for the document overtly.
Although HTML5 sets the language of the root element via a fallback language mechanism, this mechanism is not required to work in XML.
HTML5 activates the fallback language mechanism whenever the root element lacks language attributes.
For the mechanism to actually set a fallback language, however, it has to locate either an http-equiv="Content-Language" declaration on the meta element or an HTTP Content-Language: header, either of whose content value is no more and no less than exactly one language tag.
Note that although the mechanism can locate either the meta element or the header, the meta element is considered first.
For more information about determining language in HTML5, see the language determination rules. [HTML5]
id Attribute
id attribute.
This is because values for the id attribute may not contain space characters in HTML5. [HTML5]
amp
●lt
●gt
●apos
●quot
For entities beyond the previous list, polyglot markup uses character references.
For example, polyglot markup uses &#160; instead of &nbsp;.
Note that polyglot markup may use decimal values for escape characters (such as &#160; in the previous example); however, the Character Model for the World Wide Web recommends that content should use the hexadecimal form of character escapes rather than the decimal form when both are available. [CHARMOD]
Polyglot markup always uses character references for the less than sign (<) and ampersand (&) when they are used as characters, except when those characters appear inside a CDATA section.
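The reason for avoiding named entities beyond the five predefined ones can be illustrated with an XML parser. This sketch assumes Python's standard xml.etree.ElementTree and no DTD; parses_as_xml is a hypothetical helper name. Without a DTD, the parser rejects &nbsp; as an undefined entity, while the numeric character reference and the predefined entities parse without error.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(fragment):
    """Return True if the fragment is accepted by the XML parser."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Named HTML entity: undefined in XML when no DTD declares it.
named = parses_as_xml("<p>polyglot&nbsp;markup</p>")

# Numeric character reference for the same character.
numeric = parses_as_xml("<p>polyglot&#160;markup</p>")

# The five predefined XML entities are always available.
predefined = parses_as_xml("<p>&amp; &lt; &gt; &apos; &quot;</p>")

print(named, numeric, predefined)
```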
<script src="external.js"></script>
<link rel="stylesheet" href="external.css"/>
Although document.write() and document.writeln() are valid in an HTML document, neither function may be used in XHTML.
Therefore, neither is used in polyglot markup.
Instead, use the innerHTML property for both HTML and XHTML.
Note that the innerHTML property takes a string.
XML parsers parse the string as XML in XHTML.
HTML parsers parse the string as HTML in HTML.
Because of the difference in parsing, if you send the parser content that does not follow the rules for polyglot markup, the results will differ for a DOM created with an XML parser and one created with an HTML parser.
< or & or ]]> or --.
Note that XML parsers are permitted to silently remove the
contents of comments;
therefore, the historical practice of hiding scripts and style
sheets within comments to make the documents backward compatible is
likely to not work as expected in XML-based user agents.
< or & character.
The following example is safe because it does not contain problematic characters within the script tag.
<script>document.body.appendChild(document.createElement("div"));</script>
>" or "->".
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>A Sample Page Using Polyglot Markup</title>
    <!-- The link element is self-closing as described in Section 6.4 Void Elements -->
    <!-- Style commands are included by linking to an external file rather than including them in-line, as described in Section 9. Script and Style -->
    <link type="text/css" rel="stylesheet" href="Sample.css"/>
  </head>
  <body>
    <h1>Sample Page Using Polyglot Markup</h1>
    <p>
      The source code for this document uses polyglot markup, a document that is a stream of bytes that parses into identical document trees (with the exception of the xmlns attribute on the root element) when processed as HTML and when processed as XML. The source code for this document also contains additional comments about the use of polyglot markup.
    </p>
    <h2>Foreign Elements</h2>
    <p>
      The following shapes use SVG elements. Polyglot markup introduces undeclared (native) default namespaces for the root SVG element (&lt;svg>) and respects the mixed-case element names and values when appropriate, as described in sections 5.1 Element-Level Namespaces, 6.3.1 Element Names, and 6.3.3 Attribute Values.
    </p>
    <!-- Polyglot markup declares the xlink: namespace on the <svg> element to maintain XML-compatibility -->
    <svg width="350" height="250" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
      <g>
        <title>Three SVG shapes</title>
        <desc>
          This SVG image contains an ellipse filled with a gradient that goes from white to blue as it moves outward from the center. A yellow rectangle with a black border overlaps the ellipse in the upper-left quadrant, and a red spiral on a white background overlaps the ellipse in the bottom-right quadrant. The red spiral is also a link to the example code for that SVG shape.
        </desc>
        <defs>
          <!-- Note that "radialGradient" and "myGradient" respect mixed-case values. -->
          <radialGradient id="myGradient" cx="50%" cy="50%" r="50%" fx="50%" fy="50%">
            <stop offset="0%" style="stop-color:rgb(200,200,200); stop-opacity:0"/>
            <stop offset="100%" style="stop-color:rgb(0,0,255); stop-opacity:1"/>
          </radialGradient>
        </defs>
        <ellipse cx="50%" cy="50%" rx="50%" ry="42%" style="fill:url(#myGradient)"/>
        <rect x="0" y="0" width="100" height="100" style="fill: yellow; stroke: black;"/>
        <a xlink:href="http://www.w3schools.com/svg/tryit.asp?filename=path2&amp;type=svg">
          <!-- Note that the following attribute contains no newlines. -->
          <path transform="translate(60, -175)" d="M153 334 C153 334 151 334 151 334 C151 339 153 344 156 344 C164 344 171 339 171 334 C171 322 164 314 156 314 C142 314 131 322 131 334 C131 350 142 364 156 364 C175 364 191 350 191 334 C191 311 175 294 156 294 C131 294 111 311 111 334 C111 361 131 384 156 384 C186 384 211 361 211 334 C211 300 186 274 156 274" style="fill:white;stroke:red;stroke-width:2"/>
        </a>
      </g>
    </svg>
    <h2>Void Elements</h2>
    <!-- Given an empty instance of an element whose content model is not EMPTY (in this case, an empty paragraph) polyglot markup does not use the minimized form, as described in Section 6.4 Void Elements -->
    <p></p>
    <p>
      There is an empty &lt;p> element before this paragraph. Polyglot markup uses &lt;p>&lt;/p> and not &lt;p />.
    </p>
    <p>
      Polyglot markup treats certain elements as self-closing, void elements, such as the following &lt;img> element.
    </p>
    <img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home"/>
    <p>
      For more information, see Section 6.4 Void Elements.
    </p>
    <h2>Required Elements</h2>
    <p>
      The following table uses the required &lt;tbody> element, as described in Section 6.1 Required Elements.
    </p>
    <table>
      <tbody>
        <tr> <th>Column One</th> <th>Column Two</th> </tr>
        <tr> <td>Row 1, Column 1</td> <td>Row 1, Column 2</td> </tr>
        <tr> <td>Row 2, Column 1</td> <td>Row 2, Column 2</td> </tr>
        <tr> <td>Row 3, Column 1</td> <td>Row 3, Column 2</td> </tr>
      </tbody>
    </table>
    <p>
      The following table uses the required &lt;colgroup> element, as described in Section 6.1 Required Elements.
    </p>
    <table>
      <colgroup>
        <col style="background-color:silver"/>
        <col style="background-color:gray"/>
        <col style="background-color:yellow"/>
      </colgroup>
      <tbody>
        <tr> <th>ISBN</th> <th>Title</th> <th>Price</th> </tr>
        <tr> <td>3476896</td> <td>My first HTML</td> <td>$53</td> </tr>
        <tr> <td>1234567</td> <td>Intermediate Polyglot</td> <td>$49</td> </tr>
      </tbody>
    </table>
    <h2>Named Entity References</h2>
    <p>
      This paragraph uses the string "&amp;amp;" for ampersands ("&amp;") and uses the string "&amp;#160;" for a nonbreaking space between the words "polyglot&#160;markup," as described in Section 8. Named Entity References.
    </p>
  </body>
</html>