
1/11/2012

Use Lowercase Markup For Better Compression - avoid uppercase markup to improve XHTML and HTML compression

Lowercase markup compresses more efficiently than uppercase markup. Along with the benefits of XHTML compatibility, lowercase markup allows HTTP compression to work more efficiently by increasing redundancy. In this article we show the benefits of using lowercase markup on five popular sites.

How Lowercase Markup Helps GZIP Compression

The GZIP compression algorithm used in mod_gzip and other HTTP compression programs works by replacing repeated strings with shorter tokens that point back to an earlier identical occurrence. Matching is byte-for-byte, so <TD> and <td> count as different strings. When all of your markup is lowercase, repetitious structures such as tables and divs become identical, and the compressor finds more and longer matches. An all-lowercase HTML file is the same size as its mixed-case equivalent before compression, but it compresses more efficiently. Even if you don't use HTTP compression on your site, your users on dialup accelerators like Earthlink's Accelerator and AOL's Topspeed will benefit from your lowercase markup.
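The effect can be reproduced on a small scale. The following Python sketch is only an illustration, not the procedure used for the tests in this article: it builds a synthetic table whose tag case is chosen pseudo-randomly, lowercases the element names, and compares the gzip -6 sizes. The helper names (lowercase_tag_names, gzip_size, make_sample) and the sample page are hypothetical, and the regular expression is a simplification that touches element names only and ignores comments, scripts, and attribute names.

```python
import gzip
import random
import re

# Matches the start of an opening or closing tag and captures the element name.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tag_names(html: str) -> str:
    """Lowercase element names only; text and attribute values are untouched."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), html)


def gzip_size(html: str, level: int = 6) -> int:
    """Size in bytes after gzip compression at the given level."""
    return len(gzip.compress(html.encode("utf-8"), compresslevel=level))


def make_sample(rows: int = 200, seed: int = 42) -> str:
    """Build a synthetic table whose tags are randomly upper or lower case."""
    rng = random.Random(seed)

    def tag(name: str) -> str:
        return name.upper() if rng.random() < 0.5 else name

    lines = ["<table>"]
    for i in range(rows):
        tr, td = tag("tr"), tag("td")
        lines.append(f"<{tr}><{td}>item {i}</{td}></{tr}>")
    lines.append("</table>")
    return "\n".join(lines)


mixed = make_sample()
lower = lowercase_tag_names(mixed)

print("uncompressed:", len(mixed), "vs", len(lower), "bytes")
print("gzip -6:     ", gzip_size(mixed), "vs", gzip_size(lower), "bytes")
```

The uncompressed sizes are identical because only letter case changes. The compressed sizes differ because the mixed-case version has to encode which case each tag used. Savings on a synthetic page like this one say nothing about what a real page will save.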

How Much Smaller?

To test the effectiveness of lowercase markup we compressed the HTML homepages of five random sites before and after lowercasing all of their HTML markup (see Table 1).

Table 1: Lowercase versus Mixed Uppercase Markup Compression

Homepage         Uncompressed HTML   GZIP -6    GZIP -6 after lowercasing   Percent
                 (bytes)             (bytes)    markup (bytes)              smaller
ABCNews.com*     49,959              11,125     10,785                      3.05
Guardian.co.uk   73,772              14,080     13,808                      1.93
JCPenny.com*     19,728               3,310      3,154                      4.71
Olympics.com*    26,927               6,273      6,126                      2.34
Slashdot.org*    49,291              12,589     12,434                      1.23
Average                                                                     2.65
*Uses HTTP compression. The homepages tested were ABCNews.com, Guardian.co.uk, JCPenny.com, Olympics.com, and Slashdot.org. Note that mod_gzip defaults to gzip -6 for compression to give the best balance between speed and size.

On average, all-lowercase markup saved an additional 2.65 percent off these compressed home pages, with savings ranging from 1.23% (Slashdot.org) to 4.71% (JCPenny.com). Four of the five sites, which we tested with our Web Page Analyzer, used HTTP compression, so most of them would benefit directly from switching to lowercase markup. Guardian.co.uk did not use HTTP compression, but its visitors on dialup accelerators would still benefit.
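The percentages can be rechecked from the compressed sizes in Table 1. This short Python sketch assumes "percent smaller" means the reduction relative to the GZIP -6 size before lowercasing. The names TABLE_1 and percent_smaller are illustrative, and the byte counts are copied from the table.

```python
# (site, gzip -6 bytes before lowercasing, gzip -6 bytes after), from Table 1
TABLE_1 = [
    ("ABCNews.com", 11_125, 10_785),
    ("Guardian.co.uk", 14_080, 13_808),
    ("JCPenny.com", 3_310, 3_154),
    ("Olympics.com", 6_273, 6_126),
    ("Slashdot.org", 12_589, 12_434),
]


def percent_smaller(before: int, after: int) -> float:
    """Reduction in compressed size, as a percentage of the original compressed size."""
    return (before - after) / before * 100.0


savings = {site: percent_smaller(before, after) for site, before, after in TABLE_1}
average = sum(savings.values()) / len(savings)

for site, pct in savings.items():
    print(f"{site:15s} {pct:.2f}%")
print(f"{'Average':15s} {average:.2f}%")
```

Under that assumption the computed values match the table to within 0.01 percentage points. The small differences in the last digit suggest the published figures were truncated rather than rounded.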

Conclusion

On average, using all-lowercase markup saved 2.65% off of compressed HTML file size. JCPenny.com would realize HTML files over 4.7% smaller after compression by using all-lowercase markup. You can achieve higher compression ratios by taking the same approach with the names you choose in your CSS and JavaScript code to maximize the efficiency of GZIP compression. Using identical wording and repetitive markup (like tables, similarly structured divs, or class names) can improve GZIP compression even further.
