SOX is an alternative syntax for XML. It is useful for reading and creating XML content in a text editor. It is then easily transformed into proper XML.
SOX was created because developers can spend a great deal of time with raw XML. For many of us, the popular XML editors have not reached a point where their tree views, tables and forms can completely substitute for the underlying markup language. This is not surprising when one considers that developers still use a text view, albeit enhanced, for editing other languages such as Java.
SOX uses indenting to represent the structure of an XML document, which eliminates the need for closing tags and a number of quoting devices. The result is surprisingly clear. For example, here is an XSLT script written in SOX form:
    stylesheet>
        xmlns=http://www.w3.org/1999/XSL/Transform
        version=1.0
        template>
            match=node()
            copy>
                apply-templates>
                    select=node()
Here is a scrap of XHTML written in a slightly more compact style:
    html>
        head>
            title> My Home Page
        body>
            h1> Contact Details
            p>
                I can be contacted at
                a> href=mailto:me@myplace.net
                    this address
                except when on vacation.
SOX can be used to write a subset of XML consisting of elements, attributes and text. Other parts of XML such as processing instructions, comments and entities are not covered by SOX at this stage.
An implementation of SOX as a SAX reader is provided.
In the basic SOX grammar, each line represents one XML element, attribute, or text node: an element is written as its name followed by '>', an attribute as name=value, and any other line is text. The full SOX grammar adds quoted text and the single-line forms detailed in the following sections.
Whitespace consists of spaces and tabs. Whitespace is treated as follows:
Lines consisting only of whitespace are ignored.
Indentation is represented by whitespace at the beginning of a line, counting tabs as equivalent to 8 spaces.
In unquoted text, leading and trailing whitespace (other than the indent) is ignored and each internal span of whitespace is treated as a single space.
A single space is unconditionally appended to the unquoted text forming an XML text node. (This can be prevented by quoting.)
All other whitespace is ignored.
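The basic grammar and the whitespace rules above can be sketched in a few lines of Python. This is a minimal illustration, not the Java SAX implementation mentioned in this document: the names `indent_width` and `parse_basic_sox` are invented here, and the patterns used to tell elements, attributes, and text apart are assumptions inferred from the examples. Quoted text and the single-line forms are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Assumed line shapes for the basic grammar (inferred from the examples).
ELEMENT = re.compile(r"^([^\s=>]+)>$")         # name>
ATTRIBUTE = re.compile(r"^([^\s=>]+)=(\S*)$")  # name=value


def indent_width(line: str) -> int:
    """Width of the leading whitespace, counting each tab as 8 spaces."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += 8
        else:
            break
    return width


def parse_basic_sox(source: str) -> ET.Element:
    """Build an element tree from basic SOX (one node per line)."""
    root = None
    stack = []  # open elements as (indent, element), outermost first
    for line in source.splitlines():
        content = " ".join(line.split())  # trim, collapse internal whitespace
        if not content:
            continue  # lines consisting only of whitespace are ignored
        indent = indent_width(line)
        # Close every open element that is indented at least as far as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        element = ELEMENT.match(content)
        attribute = ATTRIBUTE.match(content)
        if element:
            if parent is None:
                if root is not None:
                    raise ValueError("only one root element is allowed")
                node = root = ET.Element(element.group(1))
            else:
                node = ET.SubElement(parent, element.group(1))
            stack.append((indent, node))
        elif parent is None:
            raise ValueError("attribute or text outside any element")
        elif attribute:
            parent.set(attribute.group(1), attribute.group(2))
        else:
            text = content + " "  # unquoted text gets a single trailing space
            if len(parent):
                parent[-1].tail = (parent[-1].tail or "") + text
            else:
                parent.text = (parent.text or "") + text
    if root is None:
        raise ValueError("no element found")
    return root


SAMPLE = """\
stylesheet>
    xmlns=http://www.w3.org/1999/XSL/Transform
    version=1.0
    template>
        match=node()
        copy>
            apply-templates>
                select=node()
"""

doc = parse_basic_sox(SAMPLE)
print(ET.tostring(doc, encoding="unicode"))
```

Note that a text line that happens to look like `name=value` would be read as an attribute by this sketch; a real reader would need the quoting rules to disambiguate.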
A string of text can be quoted by enclosing it within a pair of " or ' characters. A quoted string can be used wherever unquoted text may appear.
A quoted string may be used for an attribute value, following the '='.
SOX:
    template> match="html:p[class='note']"
XML:
    <template match="html:p[class='note']"/>
A quoted string on a line by itself represents a text node. (No space is appended to the string.)
SOX:
    pre>
        "controlled sp"
        "acing"
XML:
    <pre>controlled spacing</pre>
A quoted string may appear within unquoted text (including at the beginning and end of the unquoted text). The string is inserted into the unquoted text (without quotes).
SOX:
    p> Whole ">" the parts.
XML:
    <p>Whole > the parts. </p>
Adjacent quoted strings are concatenated without any intervening space.
SOX:
    p> "This" "and" "that"
XML:
    <p>Thisandthat </p>
Within a quoted string:
Whitespace is preserved.
The '=' and '>' characters are preserved.
The ' character (for a string quoted with ") and the " character (for a string quoted with ') are preserved.
No line breaks are allowed.
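The interaction between quoted and unquoted text can be illustrated with a small Python sketch. It follows one possible reading of the rules, chosen to reproduce the examples given in this section: whitespace between two adjacent quoted strings is dropped, any other span of whitespace becomes a single space, and a trailing space is appended unless the text is a single quoted string on its own. The name `text_value` and the tokenizing pattern are assumptions made for this illustration, and unbalanced quotes are not reported as errors.

```python
import re

# A token is a "double-quoted" string, a 'single-quoted' string,
# or a run of unquoted non-whitespace characters.
TOKEN = re.compile(r"""
    "([^"\n]*)"      # group 1: body of a double-quoted string
  | '([^'\n]*)'      # group 2: body of a single-quoted string
  | ([^\s"']+)       # group 3: unquoted word
""", re.VERBOSE)


def text_value(source: str) -> str:
    """Compute the text node value for the text part of one SOX line."""
    tokens = []  # (is_quoted, text, preceded_by_whitespace)
    prev_end = 0
    for m in TOKEN.finditer(source):
        gap = source[prev_end:m.start()]
        spaced = any(ch.isspace() for ch in gap)
        if m.group(3) is not None:
            tokens.append((False, m.group(3), spaced))
        else:
            body = m.group(1) if m.group(1) is not None else m.group(2)
            tokens.append((True, body, spaced))
        prev_end = m.end()
    if not tokens:
        return ""
    if len(tokens) == 1 and tokens[0][0]:
        return tokens[0][1]  # lone quoted string: exact value, no space appended
    parts = []
    prev_quoted = False
    for index, (quoted, text, spaced) in enumerate(tokens):
        # Keep one space between tokens unless both neighbours are quoted.
        if index > 0 and spaced and not (prev_quoted and quoted):
            parts.append(" ")
        parts.append(text)
        prev_quoted = quoted
    return "".join(parts) + " "  # unquoted text ends with a single space


print(repr(text_value('Whole ">" the parts.')))
print(repr(text_value('"This" "and" "that"')))
print(repr(text_value('"controlled sp"')))
```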
A multiline string of text is quoted with triple quote marks. Each multiline string represents an XML text node. For example:
SOX:
    pre>
        """Text spanning
        several lines
        forming a single
        XML 'so-called' text node"""
XML:
    <pre>Text spanning
    several lines
    forming a single
    XML 'so-called' text node</pre>
A multiline string is introduced by a (suitably indented) triple quote, ''' or """.
All text following the triple quote up to a matching triple quote forms part of the string. This includes newlines, but indentation is treated specially, as follows.
Indentation within the multiline string is adjusted to form the string value. Any indentation less than or equal to the current indentation level is removed. Indentation greater than the current level is reduced by the current indentation level.
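The indentation adjustment can be sketched as follows. This is an illustrative Python fragment under two assumptions that the text does not spell out: the "current indentation level" is taken to be the indentation of the line carrying the opening triple quote, and indentation is assumed to consist of spaces only. The name `multiline_value` is invented for this example.

```python
def multiline_value(body: str, level: int) -> str:
    """Adjust indentation inside a multiline string.

    body  -- the raw text between the triple quotes, newlines included
    level -- assumed indentation (in spaces) of the opening triple quote
    """
    lines = body.split("\n")
    adjusted = [lines[0]]  # the first line follows the opening quote directly
    for line in lines[1:]:
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Indentation up to the current level is removed; any excess is kept.
        adjusted.append(" " * max(indent - level, 0) + stripped)
    return "\n".join(adjusted)


raw = "Text spanning\n    several lines\n      forming a single\n  text node"
print(multiline_value(raw, 4))
```

In this example the third line keeps two spaces of indentation (six minus four), while the last line, indented less than the current level, loses all of its indentation.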
For clarity, an attribute or child text node of an element may appear on the same line as the element name, following the '>'. Additional children of the element may follow in an indented block as usual. (Children including any on the same line as the element must still appear in the correct order.) For example:
Basic SOX:
    template>
        name=item
        html:p>
            ITEM:
            apply-templates>
                select=node()
Alternative:
    template> name=item
        html:p> ITEM:
            apply-templates> select=node()
An element may also appear on the same line as its parent. In this case, the element is the only child of the parent and the following indented block, if any, belongs to the (innermost) child. For example, an XML schema fragment:
Basic SOX:
    element>
        name=doc
        annotation>
            documentation>
                the document element
        complexType>
            sequence>
                element>
                    name=body
                    type=bodyType
Alternative:
    element> name=doc
        annotation> documentation>
            the document element
        complexType> sequence> element>
            name=body
            type=bodyType
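One way to think about the single-line forms is as shorthand that can be expanded mechanically into basic SOX. The Python sketch below does this expansion; it is an illustration only, with an invented function name (`to_basic_sox`), and it assumes space-only indentation and no quoted strings. Each leading `name>` on a line opens one more nesting level, and the indented block that follows is attached to the innermost element of that line.

```python
import re

# A leading element name followed by '>' (assumed shape, no quoting support).
LEADING_ELEMENT = re.compile(r"""([^\s=>"']+)>\s*""")


def to_basic_sox(source: str, step: int = 4) -> str:
    """Expand single-line forms so that every line holds exactly one node."""
    out = []
    # For each open source line: (source indent, output indent of its
    # innermost element).
    stack = []
    for line in source.splitlines():
        rest = line.strip()
        if not rest:
            continue
        src_indent = len(line) - len(line.lstrip(" "))
        while stack and stack[-1][0] >= src_indent:
            stack.pop()
        out_indent = stack[-1][1] + step if stack else 0
        innermost = None
        m = LEADING_ELEMENT.match(rest)
        while m:
            out.append(" " * out_indent + m.group(1) + ">")
            innermost = out_indent
            out_indent += step
            rest = rest[m.end():]
            m = LEADING_ELEMENT.match(rest)
        if rest:
            # Whatever remains is an attribute or text child of the innermost element.
            out.append(" " * out_indent + rest)
        if innermost is not None:
            stack.append((src_indent, innermost))
    return "\n".join(out)


COMPACT = """\
element> name=doc
    annotation> documentation>
        the document element
    complexType> sequence> element>
        name=body
        type=bodyType
"""

print(to_basic_sox(COMPACT))
```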
A Java implementation of a SAX parser and a SAX serialiser is provided. The source is here: SOX-20020331.zip.
A convenient way to parse and generate SOX is to use styler. Styler can be used from the command line or as an Ant task to process SOX.
The foregoing definition of SOX is in the public domain and may be copied and used freely.
The acronym "SOX" also refers to a circa 1999 XML Schema proposal: