Web Languages Overview

Web Languages

Marking Up Documents

Information on the World Wide Web is encoded via markup languages, which use tags (markup) to embed metadata into a document. "Marking up" a document was a traditional practice within the publishing industry, where it served as a means of communication about printed work between authors, editors, and printers. This practice was later expanded into more sophisticated schemes that evolved into the languages currently used on the World Wide Web and in other data exchange applications.

Historical Background

The concept of markup languages (1) was first implemented by IBM in 1969 with the development of the Generalized Markup Language (Goldfarb, 1996), which gained popularity throughout the seventies. The growing demand for a more powerful standard then led to the development of the Standard Generalized Markup Language (SGML), which was adopted as an ISO standard in 1986 (ISO 8879). SGML was a powerful language but also a very complex one, which hindered its use in popular applications.

The breakthrough that sparked the popularization of markup languages was the creation of the Hypertext Markup Language (HTML) in 1989 by Tim Berners-Lee and Robert Cailliau (Connolly et al., 1997). HTML is a very simple application of SGML that is focused on the presentation of documents. It rapidly became the standard language for the World Wide Web. Yet, as the WWW became ubiquitous, the limitations of HTML became apparent, the major one being its inability to deal with data interchange due to its limited support for metadata. Even though the W3C launched new HTML versions, these were not aimed at supporting data exchange, since HTML was not originally designed for data interchange (2). Although the WWW's original intent was clearly focused on documents, HTML’s inaptitude for data interchange became a major shortcoming as the WWW turned into an ideal medium for data interchange.

The answer to HTML's limitations was the development of the Extensible Markup Language (XML), which is much simpler than SGML but still capable of expressing information about the contents of a document and of supporting user-defined markup. XML became a W3C recommendation in 1998. In addition to its use for data packaging (e.g. the .plist files in Mac OS X and many configuration files in Windows XP), it has become the acknowledged standard for data interchange.
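As a minimal sketch of what user-defined markup looks like in practice, the Python snippet below parses a small XML document with the standard library's xml.etree.ElementTree module. The record and all of its tag and attribute names (order, customer, item, sku, quantity) are hypothetical and were invented for this illustration; they are not drawn from any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record. The tag and attribute names are made up for
# this illustration; they describe what the data is, not how to display it.
ORDER_XML = """
<order id="A-17">
  <customer>Jane Doe</customer>
  <item sku="X100" quantity="2">Widget</item>
  <item sku="Y200" quantity="1">Gadget</item>
</order>
"""


def read_order(xml_text):
    """Parse the order document into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": [
            {
                "sku": item.get("sku"),
                "quantity": int(item.get("quantity")),
                "name": item.text,
            }
            for item in root.findall("item")
        ],
    }


order = read_order(ORDER_XML)
total_units = sum(item["quantity"] for item in order["items"])
print(order["customer"], total_units)
```

Because the tags name the data they enclose, a receiving program can pick out the customer or the quantities without knowing anything about how the document would be rendered, which is the property that HTML's presentation-oriented tags lack.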

Adding Semantics

With the establishment of the Semantic Web road map by the W3C in 1998, it became clear that more expressive markup languages were needed. As a result, the first Model and Syntax Specification for the Resource Description Framework (RDF) was released in 1999 as a W3C recommendation. Unlike XML, which is data-centric, RDF is intended to represent information and to exchange knowledge. Accounts of the differences between RDF and XML are widely available on the WWW (e.g. Gil & Ratnakar, 2004).
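One way to picture the difference is that RDF describes information as statements of the form subject, predicate, object, rather than as a tree of nested elements. The Python sketch below models such statements as plain tuples and follows a chain of two statements to answer a simple question. It is an illustration only and does not use an RDF library; every URI, constant, and function name in it is hypothetical.

```python
# Hypothetical identifiers, used only to show the shape of the data.
DOC = "http://example.org/doc/1"
JANE = "http://example.org/people/jane"
CREATOR = "http://example.org/terms/creator"
TITLE = "http://example.org/terms/title"
NAME = "http://example.org/terms/name"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = {
    (DOC, TITLE, "Web Languages"),
    (DOC, CREATOR, JANE),
    (JANE, NAME, "Jane Doe"),
}


def objects(triples, subject, predicate):
    """Return, in sorted order, every object stated for subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def creator_names(triples, document):
    """Follow document -> creator -> name across two statements."""
    names = []
    for person in objects(triples, document, CREATOR):
        names.extend(objects(triples, person, NAME))
    return names


print(creator_names(TRIPLES, DOC))
```

The statements form a set with no required order or nesting, so statements from different sources can be merged simply by taking the union of their triples.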

In addition to a knowledge representation language, the Semantic Web effort also needed an ontology language to support advanced Web search, software agents, and knowledge management. The latest step towards fulfilling that requirement was the release of OWL as a W3C recommendation in 2004. OWL superseded DAML+OIL (Horrocks, 2002), a language that merged the two ontology languages being developed in the US (DAML) and Europe (OIL) (3).

According to Hendler (2004), earlier languages were used to develop tools and ontologies for specific user communities, and therefore were not defined to be compatible with the architecture of the World Wide Web in general, and the Semantic Web in particular. In contrast, OWL uses the RDF framework to provide a more general, interoperable approach by making ontologies compatible with web standards, scalable to web needs, and distributable across many systems. The interested reader will find information on OWL at the W3C OWL website. Yet, as we stated before, OWL suffers from the limitations of deterministic languages and thus lacks the advantages of probabilistic reasoning.

Footnotes

(1) Although both are called languages, markup languages are very different from programming languages. They are static and do not process information, but only store it in a structured way.

(2) HTML has a strong focus on displaying information. Even its limited, implied semantics are largely ignored. As an example, the tags h1, h2, …, h6 are commonly employed as a formatting tool rather than to identify heading levels in a document structure.

(3) The interested reader will find further information on DAML at http://www.daml.org/ and on OIL at http://www.ontoknowledge.org/oil/

References

Connolly, D., Khare, R., & Rifkin, A. (1997). The Evolution of Web Documents: The Ascent of XML. World Wide Web Journal (special issue on XML), 2(4), 119-128.

Goldfarb, C. F. (1996). The Roots of SGML - A Personal Recollection.

Hendler, J. (2004). Frequently Asked Questions on W3C's Web Ontology Language (OWL).

Horrocks, I. (2002). DAML+OIL: A Reasonable Web Ontology Language. Keynote talk at the WES/CAiSE Conference. Toronto, Canada.