All in the Head: Document Type Definitions
A Document Type Definition (DTD) is a set of declarations that conform to a particular markup syntax. These declarations define the syntax of an application of SGML or XML, such as HTML (an SGML application) or XHTML (an XML application). A DTD defines the following “building blocks” of an HTML or XHTML document:
- Elements: such as head, body and p, and how these may be nested (their parent/child relationships).
- Attributes: these provide extra information about an element and have an associated value, e.g. title="value".
- Entities: variables used to define common text such as &amp;, &lt; and &gt; (&, < and > respectively).
- PCDATA: Parsed Character Data. This is text that the parser examines for markup and entity references.
- CDATA: This is also Character Data, but the parser does not examine it; any markup or entity references inside it are left as plain text.
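The building blocks above can be seen working together in a small sketch. It uses Python's standard xml.etree.ElementTree module and a made-up document with an internal DTD subset; the note and para elements, the title attribute and the site entity are invented for illustration and do not come from the HTML or XHTML DTDs. The unparsed text is shown with an XML CDATA section, which is one assumed way of demonstrating the PCDATA/CDATA difference.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every name below is invented for illustration.
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (para+)>
  <!ELEMENT para (#PCDATA)>
  <!ATTLIST para title CDATA #IMPLIED>
  <!ENTITY site "example.org">
]>
<note>
  <para title="parsed">Visit &site; &lt;today&gt;</para>
  <para title="unparsed"><![CDATA[Visit &site; <today>]]></para>
</note>"""

root = ET.fromstring(DOCUMENT)
parsed, unparsed = root.findall("para")

# PCDATA: the parser expands the entity references it finds in the text.
print(parsed.get("title"), "->", parsed.text)      # parsed -> Visit example.org <today>

# CDATA section: the same characters are passed through untouched.
print(unparsed.get("title"), "->", unparsed.text)  # unparsed -> Visit &site; <today>
```

Note that ElementTree reads the declarations but does not validate against them, so the sketch shows parsing behaviour only, not DTD validation.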
Why do we need a DTD?
How do we use a DTD?
A Document Type Definition is referenced in a web page by the DOCTYPE (Document Type Declaration). The DOCTYPE is case-sensitive and comprises two parts: the public identifier (its name) and the system identifier (a Uniform Resource Identifier (URI) pointing to the DTD). Although the DOCTYPE contains the URI of the Document Type Definition, the browser does not download it; it holds an internal list of known DOCTYPEs and uses the URI only as a reference. Here’s an example of a correct DOCTYPE, showing the public identifier followed by the system identifier:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
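To make the two parts concrete, here is a minimal sketch that reads them back out of a page using Python's standard xml.dom.minidom module. The page body is a made-up placeholder; only the DOCTYPE itself is taken from the example above. The parser is never asked to fetch the DTD, so the URI is only read as a string.

```python
from xml.dom import minidom

# Hypothetical minimal XHTML page; only the DOCTYPE comes from the article.
PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Demo</title></head>'
    '<body><p>Hello</p></body></html>'
)

doctype = minidom.parseString(PAGE).doctype

print(doctype.name)      # html (the root element the DOCTYPE applies to)
print(doctype.publicId)  # -//W3C//DTD XHTML 1.0 Strict//EN
print(doctype.systemId)  # http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
```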
The full list of recommended DOCTYPEs is available from the W3C website [http://www.w3.org/QA/2002/04/valid-dtd-list.html]. I will put my head on the block and recommend a subset from that list for your web pages:
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
As we progress down the list, the list of allowed elements etc. becomes more tightly controlled. If you are not able to use XHTML, or are not comfortable with it, then HTML 4.01 Strict is the DOCTYPE for you; next week’s article will probably explain why you would still develop a new website to this DOCTYPE. XHTML 1.0 Transitional is ideal for beginners, as it remains as flexible as HTML 4 but enforces well-formedness. XHTML 1.0 Strict, the basis of this website and what I use for new work, is stricter about allowable elements and attributes, as it attempts to remove presentational markup from the structure. Purists will probably want to go all the way and use XHTML 1.1.
- Producing a standards-compliant web page starts right at the top of the code with a correctly formed DOCTYPE, i.e. one that comprises the public identifier and the system identifier (a URI) and preserves the correct case. Either cut and paste the DOCTYPE declaration from the W3C resource page or ensure that your HTML editor outputs it correctly.
- A properly declared DOCTYPE allows the browser to render your markup and code in “standards-compliant” mode, in other words as you would expect it to.
- A properly declared DOCTYPE allows a validator to validate your markup and CSS; this is especially important for accessibility purposes.
- The only decision to make is “which DOCTYPE do I want to use?”
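As a rough illustration of the first point, the sketch below checks whether a page opens with one of the four DOCTYPEs recommended in this article, comparing case-sensitively. The function name identify_doctype and the dictionary labels are hypothetical. The HTML 4.01 entry uses the uppercase HTML root name, as the W3C publishes it. The check assumes the DOCTYPE is the very first thing in the file (no XML prolog or comments before it) and tolerates only differences in whitespace.

```python
from typing import Optional

# The four DOCTYPEs recommended in this article, keyed by a hypothetical label.
RECOMMENDED_DOCTYPES = {
    "HTML 4.01 Strict": (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">'
    ),
    "XHTML 1.0 Transitional": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    ),
    "XHTML 1.0 Strict": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
    "XHTML 1.1": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    ),
}


def identify_doctype(page: str) -> Optional[str]:
    """Return the label of the recommended DOCTYPE the page starts with, or None."""
    head = page.lstrip()
    end = head.find(">")
    if end == -1:
        return None
    # Collapse runs of whitespace so a DOCTYPE split over two lines still matches.
    declaration = " ".join(head[: end + 1].split())
    for label, doctype in RECOMMENDED_DOCTYPES.items():
        if declaration == doctype:  # exact, case-sensitive comparison
            return label
    return None


good = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html></html>'
)
bad = good.replace("XHTML 1.0 Strict", "xhtml 1.0 strict")  # case not preserved

print(identify_doctype(good))  # XHTML 1.0 Strict
print(identify_doctype(bad))   # None
```

A check like this is no substitute for running the page through a validator; it only catches a mistyped or missing DOCTYPE early.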