Examining XML
The latest specification of the Extensible Markup Language (XML) is available online from the W3C. The specification describes XML completely, but it can be difficult to read. In this article, we will examine several parts of the XML specification in order to understand the basics of an XML document.
- characters
- a character is one unit of text, such as a letter, numeral, space, tab, or any other Unicode character
- DTD
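- Document Type Definition, the grammar that says which elements, attributes, and entities a class of XML documents may contain
- Document Type Declaration, the <!DOCTYPE ...> statement near the top of a valid XML document that either contains the Document Type Definition or points to where it can be found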
- entity
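- a storage unit for the XML document. Each XML document consists of one or more entities. The document itself is the document entity, and other entities are pulled in by reference. For example, the reference &amp; refers to the predefined entity that stands for the & character. An entity is not the same thing as an element such as <html></html>.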
- XML
- Extensible Markup Language
- XML document
- a document that is well-formed as described in the XML specification
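As a rough illustration of these definitions, the following Python sketch parses a small document that carries a Document Type Declaration, an internal Document Type Definition, and an entity reference. The sample document, the element name note, the entity name author, and the helper is_well_formed are all made up for illustration. The parser is Python's standard xml.etree.ElementTree, which checks well-formedness but does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: "note" and "author" are invented names.
# The <!DOCTYPE ...> statement is the Document Type Declaration; the
# declarations between [ and ] form an internal Document Type Definition.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY author "Jane Doe">
]>
<note>Written by &author;</note>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOC)

# The parser replaces the entity reference &author; with its declared text.
note_text = root.text

print(note_text)
print(is_well_formed(DOC))
print(is_well_formed("<a><b></a></b>"))  # tags overlap, so not well-formed
```

The last call shows why overlapping tags matter: a text whose elements are not properly nested is rejected outright, so by the definition above it is not an XML document at all.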
XML Documents
As mentioned in the definitions, an XML document is composed of entities and is well-formed if it conforms to the rules in the XML specification. The basic aspects of an XML document are:
- white space
XML recognizes four white space characters: space, tab, carriage return, and line feed. Unlike HTML, XML does not collapse runs of white space in element content. The parser passes the white space to the application, which decides whether it is significant.
- markup characters
XML uses the same characters as HTML to delimit markup: < and > enclose tags, and & begins an entity reference. It also uses the colon (:) within XML names to separate a namespace prefix from the rest of the name.
- other characters
Other ASCII and Unicode characters are taken literally unless the DTD or another part of the document gives them a different meaning.
- comments
XML uses the comment style you are familiar with in HTML: a comment begins with <!-- and ends with -->.
- processing instructions
These are special constructs that carry instructions for applications. They begin with <? and end with ?>.
- CDATA
When a block of text contains characters that would otherwise be read as markup, such as a sample of XML code, you can mark it as character data rather than markup. Begin the section with <![CDATA[ and end it with ]]>.
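To see how a parser reports each of these aspects, here is a minimal sketch using Python's standard xml.dom.minidom module. The sample document, its element names (memo, body, code), and the processing instruction target render are hypothetical and chosen only for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical sample document; every name in it is invented.
DOC = """<?xml version="1.0"?>
<?render mode="draft"?>
<memo>
  <!-- reviewed by the editor -->
  <body>two   spaces
and a line break</body>
  <code><![CDATA[if (a < b && b > c) { run(); }]]></code>
</memo>"""

doc = minidom.parseString(DOC)

# Processing instruction: a target name followed by data for the application.
pi = next(n for n in doc.childNodes
          if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE)
pi_target, pi_data = pi.target, pi.data

memo = doc.documentElement

# Comments are reported as nodes of their own, apart from the text content.
comments = [n.data for n in memo.childNodes
            if n.nodeType == Node.COMMENT_NODE]

# White space inside element content reaches the application unchanged.
body_text = memo.getElementsByTagName("body")[0].firstChild.data

# Inside a CDATA section, < and & are ordinary character data.
cdata = memo.getElementsByTagName("code")[0].firstChild
cdata_is_section = cdata.nodeType == Node.CDATA_SECTION_NODE
cdata_text = cdata.data

print(pi_target, pi_data)
print(comments)
print(repr(body_text))
print(cdata_is_section, cdata_text)
```

Note that the three spaces and the line break in body survive parsing. Any collapsing of white space, as a browser does for HTML, would have to be done by the application itself.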
