Validating a Custom DTD

In his article in this issue, Peter-Paul Koch proposes
adding custom attributes to form elements to allow triggers for specialized
behaviors. The W3C validator won’t validate a document with these
attributes, as they aren’t part of the XHTML specification.

This article will show you how to create a custom
DTD that adds those
custom attributes, and how to validate documents that use those
new attributes. Here is a sample of the HTML with the custom attributes that
let us specify the maximum length of a text area and whether a form element
is required or not:

  <input type="text" name="yourName" size="40" />
  <input type="text" name="email" size="40"
    required="true" />
  <p>
    <textarea maxlength="300" required="false" rows="7" cols="50"></textarea>
  </p>
  <p>
    <input type="submit" value="Send Data" />
  </p>
</form>

What’s a DTD?

A Document Type Definition (DTD) is a file that
specifies which elements and attributes exist in a markup language and
where they can appear. Thus, the XHTML DTD specifies that
<p> is a valid element, and that it can appear
inside a <div>, but not inside a <b>.
The URL at the end of your DOCTYPE declaration points
to a place where you will find the DTD for the flavor of HTML you’re
using. Neither your browser nor the W3C Validator goes out to the web to find
the DTD — they have a “wired-in” list of the valid
DOCTYPEs and use the URL for identification purposes only. As you will see
later, this will change when you make a custom DTD.

Specifying the attributes

Adding attributes to an existing DTD is easy. For each attribute, you
need to specify which element it goes with, what the attribute name is,
what type of values it may have, and whether the attribute is optional or
not.  This information is specified in this model:

  <!ATTLIST elementName attributeName type optionalStatus>

To add the maxlength attribute to the
<textarea> element, you write this:

<!ATTLIST textarea maxlength CDATA #IMPLIED>

The CDATA specification means that the attribute value
can contain any old character data you please; thus
maxlength="300" or maxlength="ten" will both
be valid. For “open-ended” data, DTDs don’t let you
get more specific.  The #IMPLIED specification means that
the attribute is optional.  A required attribute would specify
#REQUIRED instead.

When you have a list of possible values for an attribute, you may specify
them in the DTD.  This is the case with the attribute named
required, which has the values true and false. The values
are case sensitive; in this example only the lowercase values are specified, so
a value of TRUE would not be considered valid.

<!ATTLIST textarea required (true|false) #IMPLIED>

Confusion alert! This attribute is named “required,”
but you don’t have to put it on every <textarea>
element, so it’s an optional attribute.

The attribute named required should also be available to the
<input> and <select> elements. All
in all, the specifications to modify the DTD look like this:

<!ATTLIST textarea maxlength CDATA #IMPLIED>
<!ATTLIST textarea required (true|false) #IMPLIED>
<!ATTLIST input required (true|false) #IMPLIED>
<!ATTLIST select required (true|false) #IMPLIED>
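To see how a validating parser treats these declarations, here is a minimal sketch in Java. It makes several assumptions: it uses the validating parser bundled with the JDK, it puts the four declarations into a tiny stand-in DTD (not the real XHTML DTD), and the names AttlistDemo and validate are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AttlistDemo {
    // A tiny stand-in DTD (NOT the XHTML DTD) that carries the four
    // custom attribute declarations from the article.
    static final String DTD =
        "<!DOCTYPE form [\n"
      + "  <!ELEMENT form (input|select|textarea)*>\n"
      + "  <!ELEMENT input EMPTY>\n"
      + "  <!ELEMENT select EMPTY>\n"
      + "  <!ELEMENT textarea (#PCDATA)>\n"
      + "  <!ATTLIST textarea maxlength CDATA #IMPLIED>\n"
      + "  <!ATTLIST textarea required (true|false) #IMPLIED>\n"
      + "  <!ATTLIST input required (true|false) #IMPLIED>\n"
      + "  <!ATTLIST select required (true|false) #IMPLIED>\n"
      + "]>\n";

    // Parses DTD + body with validation turned on and returns the
    // validity error messages the parser reported (empty if valid).
    public static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // CDATA accepts any text, so maxlength="ten" is valid.
        System.out.println(validate(
            "<form><textarea maxlength=\"ten\" required=\"true\"></textarea></form>"));
        // Enumerated values are case sensitive, so "TRUE" is reported.
        System.out.println(validate("<form><input required=\"TRUE\"/></form>"));
    }
}
```

The first call should report no errors and the second should report one, which matches the rules described above: CDATA is open-ended, while an enumerated list accepts only the exact values it names.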

Note: Adding new attributes to existing
elements is easy; adding new elements is somewhat more difficult and beyond
the scope of this article.

Placing the attributes

Now that you’ve defined the custom attributes, how do you place
them where a validator can find them?  The very best place to put them
would be as the internal subset directly in your document:

<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[
  <!ATTLIST textarea maxlength CDATA #IMPLIED>
  <!ATTLIST textarea required (true|false) #IMPLIED>
  <!ATTLIST input required (true|false) #IMPLIED>
  <!ATTLIST select required (true|false) #IMPLIED>
]>

If you run such a file through the W3C
validator, you find that it validates wonderfully well.
If you download the sample files for this article and validate
the file internal.html, you can see this for yourself.
Unfortunately, when you display the file in a browser, the ]>
shows up on the screen.  There’s no way around this bug, so this
approach is right out.

Modifying the DTD

An approach that does work requires you to obtain the
XHTML transitional DTD and add your modifications to that file.
The original version of the DTD is the file
xhtml1-transitional.dtd in the directory dtd
from this article’s sample files.  You will also find
three files with the .ent extension in that
directory. These three files
define all the entities that you use in HTML,
such as &nbsp; and &ntilde;. You
need to keep all these files together in the same directory.

The customized file, named xhtml1-custom.dtd, was
created by opening the file xhtml1-transitional.dtd and
adding the new attribute specifications at the end of the file. When
adding attributes, put your customizations at the end of the DTD to
ensure that everything they reference
has already been defined.
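If you would rather script this step than edit the file by hand, the copy-and-append can be sketched in a few lines of Java. This is only an illustration: the class name MakeCustomDtd and the method build are invented here, and the paths in main assume you run it from the directory that contains the dtd directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class MakeCustomDtd {
    // The four declarations from the article, appended after everything
    // the original DTD defines.
    static final String CUSTOM_ATTRIBUTES =
        "\n<!-- custom form attributes -->\n"
      + "<!ATTLIST textarea maxlength CDATA #IMPLIED>\n"
      + "<!ATTLIST textarea required (true|false) #IMPLIED>\n"
      + "<!ATTLIST input required (true|false) #IMPLIED>\n"
      + "<!ATTLIST select required (true|false) #IMPLIED>\n";

    // Copies the original DTD to a new file, then appends the custom
    // declarations to the end of the copy. The original is left untouched.
    public static void build(Path original, Path custom) throws IOException {
        Files.copy(original, custom, StandardCopyOption.REPLACE_EXISTING);
        Files.write(custom,
                    CUSTOM_ATTRIBUTES.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: ./dtd/xhtml1-transitional.dtd exists.
        build(Paths.get("dtd", "xhtml1-transitional.dtd"),
              Paths.get("dtd", "xhtml1-custom.dtd"));
    }
}
```

Because the new file is written into the same directory as the original, it stays next to the three .ent files that the DTD refers to.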

Changing the DOCTYPE

You must now change the <!DOCTYPE> in your HTML
file to indicate that you are now using this custom “flavor.”
Since the custom DTD isn’t one of the publicly registered ones,
the DOCTYPE will not use the PUBLIC specifier. Instead,
you use the keyword SYSTEM followed by the location of the
custom DTD. This may be a relative or absolute path name, or, if your
DTD is on a server, a URL.  The path must point to where your
custom DTD really is!
The file custom.html in the sample files for this article
uses a relative path name:

<!DOCTYPE html SYSTEM "dtd/xhtml1-custom.dtd">

When you try to use the W3C validator on
custom.html, it rejects
the document because you aren’t using one of the validator’s
approved DTDs.

Using a different validator

The solution is to use a different validator, one that will actually go
out to the location that you have specified and use the DTD it finds there
to check whether your document is valid or not.
Because the document you’re validating is XHTML,
you can use any XML parser that
does validation. This article uses the Xerces parser,
available from the Apache Xerces project. This parser is written in
Java™, so you will need to have Java installed on your system.
When you unzip the Xerces download file, it will create a directory named
xerces-2_6_2 (or whatever version is current).  In the
following text, the assumption is that you have unzipped it to the top
level of the
C: drive on Windows or to /usr/local on Linux.

One of the sample programs that comes
with Xerces is the Counter program. This program
counts the number of elements,
attributes, ignorable whitespaces, and characters appearing in
an XML (or, in this case, XHTML) document. This program has an option
to turn on validation as it parses the document, making it perfect for
the task at hand.
You run the Counter program (which is going to be your
“validator”) from
a batch file for Windows or a shell script for Linux.
Here is the
batch file, named validate.bat.
It is all on one line, but shown here split across lines to
fit on the page. Please note: there is a blank before the word
dom and after the -v.

java -cp c:\xerces-2_6_2\xercesImpl.jar; »
c:\xerces-2_6_2\xmlParserAPIs.jar; »
c:\xerces-2_6_2\xercesSamples.jar dom/Counter -v »
%1 %2 %3 %4 %5 %6 %7 %8

Here is the Linux shell script, named validate.sh:

java -cp /usr/local/xerces-2_6_2/xercesImpl.jar:\
/usr/local/xerces-2_6_2/xercesSamples.jar \
dom/Counter -v $1 $2 $3 $4 $5 $6 $7 $8

Of course, if you have unzipped Xerces to a different location, you
will have to change the path names.
Once this is all set up, you can validate the file
custom.html by typing
this on a Windows command line:

validate custom.html

Or this at a Linux shell prompt:

./validate.sh custom.html

If your file is valid, you will receive a message giving the
filename and some statistics about the file, like this:

custom.html: 543;50;0 ms
  (15 elems, 20 attrs, 9 spaces, 43 chars)

If the file isn’t valid, you will get error messages as well.
For example, if you try to validate a file named badfile.html
which contains these errors:

<p>Email: <input type="text" name="email" size="40"
 required="yes" /></p>
<textarea maxlength="300" inquirer="false" rows="7" cols="50"></textarea>

You’ll get this output from the validator:

[Error] badfile.html:12:70: Attribute "required"
  with value "yes" must have a value from the
  list "true false ".
[Error] badfile.html:14:63: Attribute "inquirer"
  must be declared for element type "textarea"
  611;82;0 ms (15 elems, 20 attrs, 9 spaces, 43 chars)

Another validation method

If you are using the
jEdit editor,
you may download the XML plugin. If you name your file with the
extension .xhtml, jEdit will validate using your custom
DTD as specified in the DOCTYPE.
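A third possibility, if you have a JDK installed, is a few lines of Java of your own. The sketch below is an assumption-laden illustration rather than part of the sample files: it uses the validating parser that ships with the JDK instead of a separate Xerces download, and the class name ValidateFile and its validate method are invented for this example. Because the parser is given the file itself, a relative SYSTEM path in the DOCTYPE is resolved relative to that file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ValidateFile {
    // Parses the file with DTD validation turned on and returns one
    // "line:column: message" string per validity error (empty if valid).
    public static List<String> validate(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) {
                errors.add(e.getLineNumber() + ":" + e.getColumnNumber()
                           + ": " + e.getMessage());
            }
            // Well-formedness problems stop the parse.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(file);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        for (String name : args) {
            List<String> errors = validate(new File(name));
            if (errors.isEmpty()) {
                System.out.println(name + ": valid");
            }
            for (String error : errors) {
                System.out.println("[Error] " + name + ":" + error);
            }
        }
    }
}
```

After compiling with javac ValidateFile.java, you would run it as java ValidateFile custom.html. The messages come from the parser, so their exact wording may differ from the Counter output shown earlier.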


It is easy to specify additional attributes for XHTML elements; with a little
bit of work, you can set up a validator to check your files against your
custom version of HTML.  Download all the
sample files from this article
and give it a whirl.

