XML Schema was approved as an official Recommendation by the World Wide Web Consortium (W3C) in 2001. The W3C describes it as follows: "XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content, and semantics of XML documents."
As the definition makes clear, an XML Schema is a mechanism for defining the structure, content, and semantics of the XML documents that conform to it. But a DTD (Document Type Definition) serves a similar purpose, so why was something different needed?
Well, the short answer is that XML Schema evolved from DTDs to provide a more powerful and flexible mechanism than DTDs supported. XML Schemas are themselves XML documents, so all the benefits of XML (parsing, programmatic access, validation, and extensibility) apply to them automatically. These benefits make Schemas a far better alternative to DTDs.
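The point that a Schema is itself an XML document can be illustrated with a minimal sketch. It assumes Python and its standard `xml.etree.ElementTree` parser, and the tiny schema (`note`, `priority`) is made up for illustration. The same general-purpose parser used for any XML instance reads the schema and lists its element declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the XML Schema vocabulary itself.
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical minimal schema, invented for this example.
NOTE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
  <xs:element name="priority" type="xs:positiveInteger"/>
</xs:schema>
"""


def declared_elements(xsd_text):
    """Return {element name: type} for the top-level element declarations."""
    root = ET.fromstring(xsd_text)  # an ordinary XML parse, nothing schema-specific
    return {
        decl.get("name"): decl.get("type")
        for decl in root.findall(f"{{{XS}}}element")
    }


if __name__ == "__main__":
    print(ET.fromstring(NOTE_XSD).tag)   # root element name, with its namespace
    print(declared_elements(NOTE_XSD))
```

A DTD cannot be read this way, because its declarations are written in a separate, non-XML syntax.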
Do we still need DTDs?
Yes, but probably only for legacy applications where they have long been in use. Almost all newer applications use Schemas, for the simple reason that XML Schemas are much more powerful and flexible and therefore offer many advantages over DTDs. DTDs are still supported, however, and they can be used in tandem with XML Schemas.
Anatomy of XML Schemas
An XML Schema is made up of the following declarations and definitions. Some of these are optional.
- Document Type Declaration: An XML Schema is itself an XML document that conforms to the W3C XML Recommendation, so it may contain a document type declaration, but this is not mandatory. A schema document is recognized by its root element, <schema>, which belongs to the http://www.w3.org/2001/XMLSchema namespace and is usually written with a prefix such as <xs:schema> or <xsd:schema>.
- Namespace Declaration: Another optional declaration, used to provide a context for the element and attribute names used within an XML document that conforms to the Schema. It avoids ambiguity in name resolution and so helps in building and extending XML documents, because URIs ensure unique names for elements and attributes. A namespace can be declared inline on an element as a default namespace (an xmlns attribute), or it can be bound to a prefix that is then combined with the local name to identify the name uniquely. The prefix and the local name are separated by a colon (e.g., Prefix:LocalName).
- Type Definitions: These define simple or complex data types or structures, which can then be reused by the content models that refer to them.
- Element/Attribute Declarations: This section of an XML Schema declares the elements and their attributes that appear as tags in the XML instances using the Schema. Various constraints such as id, type, minOccurs/maxOccurs, and substitutionGroup can also be specified here.
- Sequence Definition: As the name suggests, this section enforces the order in which the child elements must appear in XML instances that use the Schema.
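To make the parts listed above concrete, here is a small sketch of a schema that contains each of them, embedded in a Python script that picks the parts out with the standard `xml.etree.ElementTree` module. The schema is hypothetical: the `http://example.com/order` namespace and the names `SkuType`, `OrderType`, and `order` are invented for illustration, and the `summarize` helper is only one possible way to inspect a schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: the target namespace and all names are made up.
ORDER_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/order"
           targetNamespace="http://example.com/order"
           elementFormDefault="qualified">

  <!-- Type definition (simple): a string restricted by a pattern -->
  <xs:simpleType name="SkuType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{3}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Type definition (complex): ordered children plus an attribute -->
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="sku" type="SkuType"/>
      <xs:element name="quantity" type="xs:positiveInteger"/>
      <xs:element name="comment" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>

  <!-- Global element declaration that reuses the complex type -->
  <xs:element name="order" type="OrderType"/>
</xs:schema>
"""


def summarize(xsd_text):
    """Collect the named parts of a schema document, in document order."""
    root = ET.fromstring(xsd_text)
    ns = {"xs": XS}  # prefix mapping used only by the search paths below

    def names(path):
        return [node.get("name") for node in root.findall(path, ns)]

    return {
        "target_namespace": root.get("targetNamespace"),
        "simple_types": names("xs:simpleType"),
        "complex_types": names("xs:complexType"),
        "global_elements": names("xs:element"),
        "sequence_order": names("xs:complexType/xs:sequence/xs:element"),
        "attributes": names("xs:complexType/xs:attribute"),
    }


if __name__ == "__main__":
    for part, value in summarize(ORDER_XSD).items():
        print(part, value)
```

The `sequence_order` entry lists the children in the order an instance would have to supply them: `sku`, then `quantity`, then any number of `comment` elements.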
XML Schema applications/uses
- Data Validation: An XML Schema defines the structure of the elements and of the attributes within them, and this definition lets XML parsers validate the structure of an XML document, the datatypes used by the XML document/instance, and the presence of mandatory elements or attributes. Application designers can therefore delegate some of the basic data validation (and data sufficiency) tasks to the XML Schema rather than doing all of them programmatically.
- Content Model Definition: XML Schema supports both simple and complex data type definitions, which makes it possible to apply concepts like inheritance to the data syntax. This helps in building Schemas that define extensible models well suited to large and complex applications.
- Data Exchange/Integration: Since XML Schemas are themselves XML documents, they can be parsed and accessed like any other XML instance by a variety of XML companion tools. For example, an XSD used in conjunction with an appropriate XSLT and an XML-enabled database allows changes made to the global elements defined by the XSD to be processed consistently. In addition, output can be produced in several formats at once (PDF, Doc, RTF, HTML, etc.) using the single-source publishing methodology. The data-oriented datatypes provided in XML Schema 1.1, in addition to the document-oriented datatypes supported in previous versions, facilitate complex document exchange and data integration scenarios. Namespaces let an XML instance use more than one vocabulary at a time, because they give the names in XML documents unique identifiers. This enables entire XML frameworks to co-exist within the same architecture, which opens up ample opportunities for data exchange and integration. The feature is extremely helpful in mergers and acquisitions and in supply chain scenarios, where there is generally a plethora of heterogeneous data constructs.
- Industry XML Standards: These standards aim to provide a basis for industry-wide data integration by implementing common XML vocabularies that let business partners exchange data seamlessly across different systems and architectures. Several industry standards are being widely followed and are paving the way for seamless data integration and interoperability. Some of these standards are DITA, DocBook, SCORM, ACORD, cXML, FIXML, and XBRL.
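The namespace point made under Data Exchange/Integration can be sketched as follows. This is an illustrative example only: the two vocabularies (`http://example.com/order` and `http://example.com/inventory`) and the instance document are invented, and Python's standard `xml.etree.ElementTree` is assumed as the parser. Both vocabularies define an element called `name`, yet the two stay distinct because each is qualified by its own namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance that mixes two made-up vocabularies.
SHIPMENT_XML = """\
<ord:order xmlns:ord="http://example.com/order"
           xmlns:inv="http://example.com/inventory">
  <ord:item>
    <ord:name>Widget</ord:name>
    <inv:name>WID-0042</inv:name>
  </ord:item>
</ord:order>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag  # element that is in no namespace


def names_by_vocabulary(xml_text):
    """Map each namespace URI to the text of its <name> element."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_tag(elem.tag)
        if local == "name":
            found[uri] = elem.text
    return found


if __name__ == "__main__":
    for uri, text in names_by_vocabulary(SHIPMENT_XML).items():
        print(uri, "->", text)
```

The prefixes `ord` and `inv` are only local shorthand. The parser compares the namespace URIs, so two partners can choose different prefixes for the same vocabulary and still exchange documents.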