
3/13/2009

Beware of XHTML

Sending XHTML as text/html Considered Harmful

Web servers typically send the text/html content type whenever the file extension is .html, and server-side scripting languages like PHP also typically send documents as text/html by default.
XHTML does not have the same content type as HTML. The proper content type for XHTML is application/xhtml+xml. Currently, many web servers don't have this content type reserved for any file extension, so you would need to modify the server configuration files or use a server-side scripting language to send the header manually. Simply specifying the content type in a meta element will not work over HTTP.
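As a rough illustration of sending the header manually from a server-side script, here is a minimal sketch written as a Python WSGI application. The names XHTML_PAGE and app, and the sample page itself, are hypothetical and not from the original article; the only point is that the Content-Type response header carries application/xhtml+xml.

```python
# Hypothetical sample page; any well-formed XHTML document would do.
XHTML_PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Example</title></head>
<body><p>Served as XHTML.<br /></p></body>
</html>
"""


def app(environ, start_response):
    """WSGI application that sets the XHTML content type explicitly."""
    headers = [
        # The HTTP header, not a meta element, tells the browser which parser to use.
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(XHTML_PAGE))),
    ]
    start_response("200 OK", headers)
    return [XHTML_PAGE]
```

For local experimentation, an application like this could be run with the standard library's wsgiref.simple_server.make_server. A real deployment would more likely set the type in the server configuration.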
When a web browser sees the text/html content type, regardless of what the doctype says, it automatically assumes that it's dealing with plain old HTML. Therefore, rather than using the XML parsing engine, it treats the document like tag soup, expecting HTML content. Because HTML 4.01 and simple XHTML 1.0 are often very similar, the browser can still understand the page fairly well. Most major browsers consider things like the self-closing portion of a tag (as in <br />) as a simple HTML error and strip it out, usually ending up with the HTML equivalent of what the author intended.
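The way the content type selects the parser can be loosely imitated with Python's standard library. This is only an analogy: html.parser is a lenient HTML tokenizer, not a browser's tag-soup parser, and the function parse_like_browser is a hypothetical name introduced for this sketch. It shows the same markup being accepted leniently under text/html and rejected by a strict XML parser under application/xhtml+xml.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records the name of every start tag the lenient HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_like_browser(markup, content_type):
    """Return the element names found, or None on a fatal XML error."""
    if content_type == "application/xhtml+xml":
        try:
            root = ET.fromstring(markup)  # strict: must be well-formed
        except ET.ParseError:
            return None
        return [element.tag for element in root.iter()]
    # Anything else is treated as HTML and parsed leniently.
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


well_formed = "<div><p>one<br />two</p></div>"
sloppy = "<div><p>one<br />two<p>three</div>"  # unclosed p elements

print(parse_like_browser(well_formed, "text/html"))          # ['div', 'p', 'br']
print(parse_like_browser(sloppy, "text/html"))               # ['div', 'p', 'br', 'p']
print(parse_like_browser(sloppy, "application/xhtml+xml"))   # None
```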

A Null End Tag is a special shorthand form of a tag that allows you to save a few characters in the document. Instead of writing <title>My page</title>, you could simply write <title/My page/ to accomplish the same thing. Due to the rules of Null End Tags, a single slash in an empty element's start tag would close the tag right then and there, meaning <br/ is a complete and valid tag in HTML. As a result, if you have <br/> or <br />, a browser supporting Null End Tags would see that as a br element immediately followed by a simple > character. Therefore, an XHTML page treated as HTML could be littered with unwanted > characters.
This problem is often overlooked because most popular browsers today lack support for Null End Tags, as well as some other SGML shorthand features. However, there are still some smaller user agents that properly support Null End Tags. One of the better-known user agents that supports them is the W3C validator. If you send it a page that uses XHTML self-closing tags, but force it to parse the page as HTML/SGML like most user agents do for text/html pages, you can see the results in the outline: immediately after each of the self-closing elements, there is an unwanted > character that would be displayed on the page itself.
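The Null End Tag behaviour described above can be sketched with a toy tokenizer. This is a simplified model written for illustration, not a real SGML parser: it ignores attributes, comments, and entities, and the names sgml_net_tokens and EMPTY_ELEMENTS are hypothetical. It only demonstrates the two rules mentioned: a slash inside a start tag closes that tag, and for a non-empty element the next slash in the content acts as the end tag.

```python
import re

# Assumed subset of elements declared EMPTY in HTML 4.01.
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*(/|>)")
END_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")


def sgml_net_tokens(markup):
    """Tokenize markup under a toy model of SGML Null End Tags."""
    tokens = []
    pending_net = []  # elements opened as "<name/", waiting for a closing "/"
    pos = 0
    while pos < len(markup):
        end = END_TAG.match(markup, pos)
        start = START_TAG.match(markup, pos)
        if end:
            tokens.append(("end", end.group(1).lower()))
            pos = end.end()
        elif start:
            name = start.group(1).lower()
            tokens.append(("start", name))
            # "<name/" on a non-empty element: the next "/" is its end tag.
            if start.group(2) == "/" and name not in EMPTY_ELEMENTS:
                pending_net.append(name)
            pos = start.end()
        elif markup[pos] == "/" and pending_net:
            tokens.append(("end", pending_net.pop()))
            pos += 1
        else:
            # Ordinary character data; merge with a preceding text token.
            if tokens and tokens[-1][0] == "text":
                tokens[-1] = ("text", tokens[-1][1] + markup[pos])
            else:
                tokens.append(("text", markup[pos]))
            pos += 1
    return tokens


print(sgml_net_tokens("<title/My page/"))
# [('start', 'title'), ('text', 'My page'), ('end', 'title')]
print(sgml_net_tokens("one<br />two"))
# [('text', 'one'), ('start', 'br'), ('text', '>two')]
```

In the second call, the > left over from the XHTML-style <br /> ends up as character data, which is the stray character the validator outline shows.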

This XHTML document is well-formed and valid. However, the additional XML features are only handled correctly when it is sent as application/xhtml+xml. When handled correctly, this document should contain a fully formatted MathML equation and an SVG image (for browsers that support those technologies).
Differences in XHTML handling - Example 2
