All in the Head: Document Type Definitions

A Document Type Definition (DTD) is a set of declarations that conform to a particular markup syntax. These declarations define the grammar for applications of SGML or XML, such as the markup languages HTML and XHTML respectively. A DTD defines the following “building blocks” of an HTML or XHTML document:

  • Elements: such as head, body and p, and how these may be nested (their parent / child relationships).
  • Attributes: these provide extra information about an element and have an associated value, e.g. title="value".
  • Entities: variables used to define common text such as &amp; &lt; and &gt; (&, < and > respectively).
  • PCDATA: Parsed Character Data. This is text that the parser examines for markup and entity references.
  • CDATA: Character Data. This is also text, but the parser does not examine it for markup.
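
To make these building blocks concrete, here is a minimal sketch in Python that feeds a tiny XML document with an inline DTD to the standard library's expat parser and records the declarations it reports. The note and p elements, the title attribute, the site entity and the function name collect_declarations are all made up for illustration; they are not taken from the HTML or XHTML DTDs.

```python
import xml.parsers.expat

# Hypothetical document: the DTD is written inline (an "internal subset")
# so the example needs no network access.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (p+)>
  <!ELEMENT p (#PCDATA)>
  <!ATTLIST p title CDATA #IMPLIED>
  <!ENTITY site "That Standards Guy">
]>
<note><p title="value">Welcome to &site;</p></note>"""


def collect_declarations(xml_text):
    """Return the element, attribute and entity declarations plus the parsed text."""
    found = {"elements": [], "attributes": [], "entities": {}, "text": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_element(name, model):
        # Called once per <!ELEMENT ...> declaration.
        found["elements"].append(name)

    def on_attlist(element, attribute, attr_type, default, required):
        # Called once per attribute named in an <!ATTLIST ...> declaration.
        found["attributes"].append((element, attribute, attr_type))

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        # Called once per <!ENTITY ...> declaration.
        found["entities"][name] = value

    def on_text(data):
        # Character data may arrive in several chunks, so collect them all.
        found["text"].append(data)

    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.EntityDeclHandler = on_entity
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    found["text"] = "".join(found["text"])
    return found


if __name__ == "__main__":
    print(collect_declarations(SAMPLE))
```

Because the content of p is declared as PCDATA, the parser expands the &site; reference before handing the text over, which is the “parsed” part of Parsed Character Data.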


Why do we need a DTD?

It is important to declare a DTD in any HTML or XHTML document to establish that the document is an instance of the defined DTD. This tells the World Wide Web Consortium (W3C) validator which version of (X)HTML you are using. Without it, you cannot validate the markup and Cascading Style Sheets (CSS), and you will fail to meet checkpoints in the W3C web accessibility guidelines to boot. A DTD is also important for the proper rendering and functionality of web documents in browsers. By telling the browser to render in standards-compliant mode, you ensure that the (X)HTML, CSS and Document Object Model (DOM) code you write is treated as expected. Leave out the declaration, or get it wrong, and you put the browser into “quirks” mode: the browser assumes you’ve written invalid markup and code from the 1990s. It will render your CSS as if it were Internet Explorer 4 and use proprietary, browser-specific Document Object Models for your JavaScript. In 2006, this is not good.

How do we use a DTD? 

A Document Type Definition is declared in a web page by using the DOCTYPE (Document Type Declaration). The DOCTYPE is case-sensitive and comprises two parts: the public identifier (its name) and the system identifier (the Uniform Resource Identifier (URI) of the DTD). Although the DOCTYPE contains the URI of the Document Type Definition, the browser holds an internal list of known DOCTYPEs and uses the URI only as a reference. Here’s an example of a correct DOCTYPE, showing the public identifier followed by the system identifier:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
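
The two-part structure can be checked mechanically. The following Python sketch splits a DOCTYPE into its root element, public identifier and system identifier with a regular expression. The function name parse_doctype and the pattern itself are illustrative assumptions: they are far simpler than what a browser or validator really does, and they only handle the PUBLIC form with double-quoted identifiers.

```python
import re

# Simplified pattern (an assumption for this sketch): root element name,
# the keyword PUBLIC, then two double-quoted identifiers.
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+'
    r'"(?P<public_id>[^"]*)"\s+"(?P<system_id>[^"]*)"\s*>'
)


def parse_doctype(declaration):
    """Return a dict with root, public_id and system_id, or None if malformed."""
    match = DOCTYPE_PATTERN.fullmatch(declaration.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    good = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
    print(parse_doctype(good))
    # Dropping the opening quote of the system identifier breaks the match.
    bad = good.replace(' "http', ' http')
    print(parse_doctype(bad))
```

The pattern is deliberately case-sensitive, so a declaration written as <!doctype ... public ...> is rejected as well. That mirrors the advice to preserve the case exactly, but it is a choice made for this sketch rather than a statement about how any particular browser behaves.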

The full list of recommended DOCTYPEs is available from the W3C website [http://www.w3.org/QA/2002/04/valid-dtd-list.html]. I will put my head on the block and recommend a subset from that list for your web pages:

  1. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  3. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  4. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">


As we progress down the list, the set of allowed elements and attributes becomes more tightly controlled. If you are not able to use XHTML, or are not comfortable with it, then HTML 4.01 Strict is the DOCTYPE for you; next week’s article will probably explain why you would still develop a new website to this DOCTYPE. XHTML 1.0 Transitional is ideal for beginners as it remains as flexible as HTML 4 but enforces well-formedness. XHTML 1.0 Strict, the basis of this website and what I use for new work, is stricter in its allowable elements and attributes as it attempts to remove presentational markup from the structure. Purists will probably want to go all the way and use XHTML 1.1.

Conclusion

  1. Producing a standards-compliant web page starts right at the top of the code with a correctly formed DOCTYPE, i.e. one that comprises the public identifier and the system identifier (a URI), with the case properly preserved. Either cut and paste the DOCTYPE declaration from the W3C resource page or ensure that your HTML editor outputs it correctly.
  2. A properly declared DOCTYPE allows the browser to render your markup and code in “standards-compliant” mode — in other words as you would expect it to.
  3. A properly declared DOCTYPE allows a validator to validate your markup and CSS; this is especially important for accessibility purposes.
  4. The only decision to make is “which DOCTYPE do I want to use?”

 

Karl Dawson has been publishing a series titled “From the Top” for his ‘That Standards Guy’ blog at http://www.thatstandardsguy.co.uk/ explaining how and why to construct a high quality, web-standards compliant head section for a web page. He is a member of GAWDS (Guild of Accessible Web Designers).
