nexml schema 0.9



NeXML is an exchange standard for representing phyloinformatic data — inspired by the commonly used NEXUS format, but more robust and easier to process.

Last updated: Tue Sep 23 12:49:28 IST 2014

NeXML 0.9 schema documentation

On this page: Documentation roadmap | Complex type index | Simple type index | Schema design discussion

Documentation roadmap

This is the starting page for the autogenerated schema documentation. Each page of the documentation describes the types from a single schema file, which form a logical unit - such as all types that have to do with DNA. At the top of the page is a brief description taken from the root <xs:annotation/> element of the schema file. This section closes with three icon links, which open in a separate window:

organisation chart: links to a graph of the inheritance tree of the complex types on the page, recursing up to the root class (which might be on a different page). Abstract types are shown in grey, concrete types in black. Inheritance by extension is shown in blue, inheritance by restriction in red. The graph is a clickable image map.

file hierarchy: links to a graph of file inclusions. XML schema files can include other files, and these inclusions are displayed here (with the file described on the page shown in black). The graph is a clickable image map.

xml schema code: links to the XML schema source of the described file.

The remainder of the page consists of type definitions. Each type definition has the following subsections:

Description

A brief description of the type.

Inheritance

Describes how the type is derived (extension or restriction). Links to the parent class of the type, and child classes (if any).

Attributes

Lists the attributes that may occur on element instances of the type; their name, data type of the value, and their usage (required/optional/forbidden). Only applies to complex types.

Facets

Lists constraining facets. For example, a regular expression that a string must match, or the lower and upper bounds for a number. Only applies to simple types.

Substructures

Lists the immediate child elements, sequences and choices.

Definition source

The raw code of the type definition.
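As an illustration of what the Facets subsection summarizes, the following minimal sketch reads the constraining facets from a simple type definition. The type `ExampleDNAToken` and the helper `list_facets` are hypothetical and are not taken from the nexml schema; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical simple type, written for this example only.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ExampleDNAToken">
    <xs:restriction base="xs:string">
      <xs:pattern value="[ACGT]+"/>
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

def list_facets(schema_text, type_name):
    """Return (facet name, value) pairs for one named simple type."""
    root = ET.fromstring(schema_text)
    for simple_type in root.iter(XS + "simpleType"):
        if simple_type.get("name") == type_name:
            restriction = simple_type.find(XS + "restriction")
            if restriction is None:
                return []
            # Strip the namespace prefix from each facet tag.
            return [(facet.tag[len(XS):], facet.get("value"))
                    for facet in restriction]
    raise KeyError(type_name)

facets = list_facets(SCHEMA, "ExampleDNAToken")
print(facets)
```

Here the pattern facet is the regular expression a value must match, and the length facet is a lower bound, which are the two kinds of constraint the Facets subsection mentions.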

Complex type index

Simple type index

Schema design discussion

The design of the nexml schema is guided by a handful of simple principles. Having some understanding of what these are will help you make the most of the documentation. You will want to find out how inheritance is used in the schema and how to traverse the inheritance tree, how nexml elements are nested, and how the schema is modularized into files. By reading this section, you will learn the organization of the schema files and the type definitions in them so you will be able to find what you need quickly.

XML schemas are generally designed following one of three patterns. If you sit down and design a schema for a rigid format where things only ever have one place, you might start by writing the type definition of the root element. Inside that type definition you would define which child elements are allowed, inside them you would define their children, and so on.

The end result would be a schema that mirrors the instance documents you had in mind: one big nested structure. This is known as the "Russian Doll" pattern. The downside of this approach is that you can't break your schema down into different files or reuse type definitions, so it is not very practical for large schemas. This is not how nexml is designed.
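A minimal sketch of the Russian Doll pattern may make the trade-off concrete. The schema below is hypothetical (the element names `document`, `section` and `item` are invented for this example and do not occur in nexml). It has a single global element, and every type is anonymous and nested, so nothing can be referenced from elsewhere.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical Russian Doll schema: one global element,
# with anonymous type definitions nested inside each other.
RUSSIAN_DOLL = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="section" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" type="xs:string"
                          maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(RUSSIAN_DOLL)

# Global declarations are the direct children of xs:schema.
doll_global_elements = [e.get("name") for e in root.findall(XS + "element")]

# Named complex types anywhere in the schema (there are none here).
doll_named_types = [t.get("name") for t in root.iter(XS + "complexType")
                    if t.get("name")]

print(doll_global_elements)
print(doll_named_types)
```

Because no complex type carries a name, there is nothing for another schema file to reuse or extend.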

The second approach is the very opposite of the first. You might take this approach if what you are building is a loosely coupled collection of snippets, for example because each of them is a type of small message you send to a web service. Following this design you would write your schema as a library of type definitions and elements.

Although this is useful for messaging protocols and the like, it's not very practical for complex structured data, because every globally declared element can be the root element and there isn't an obvious superstructure. Phylogenetic data like that contained in NEXUS files consists of blocks of fundamentally different types that relate to each other in different ways. To make sense of these relationships, and to process and query them efficiently, things need to be in predictable locations within documents (or streams, records, or messages). The nexml schema is therefore also not designed following this "Salami Slice" pattern.
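For comparison, here is a hedged sketch of the same hypothetical vocabulary (`document`, `section`, `item`, all invented for this example) written in the Salami Slice style. Every element is declared globally and pulled in by reference, so each one is a legal root for an instance document.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical Salami Slice schema: every element is a global
# declaration, and content models point at them with ref="...".
SALAMI_SLICE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item" type="xs:string"/>
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="item" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="section" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(SALAMI_SLICE)

# Every direct child xs:element of xs:schema may serve as a document root.
possible_roots = [e.get("name") for e in root.findall(XS + "element")]
print(possible_roots)
```

With three possible roots in a three-element vocabulary, a processor cannot assume where in a document any given element will appear.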

The third approach is intermediate between the two. Types are defined as a library of snippets, just like in the Salami Slice pattern, and exist as reusable, named things, but they indicate what other named types their immediate children can be.

Taken as a whole, such a design has a superstructure where one type slides into another, and that into another, like the lattices in blinds: the "Venetian Blinds" pattern, which is how nexml is designed. The basic units in the nexml schema are complexType definitions. These definitions consist of a group of element declarations (the allowed children within the type) and attribute declarations, which jointly define the structure of an element that is an instance of that complexType. Other type definitions then refer to this type by name to declare which child elements they allow.
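The sketch below shows a Venetian Blinds layout for a small hypothetical vocabulary (the type names `Document`, `Section` and `Item` are invented for this example and are not nexml types). Types are named and global, child elements are declared locally with a reference to a named type, and only one element is global.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical Venetian Blinds schema: a library of named complex
# types, local element declarations, and a single global element.
VENETIAN_BLINDS = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Item">
    <xs:attribute name="label" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="Section">
    <xs:sequence>
      <xs:element name="item" type="Item" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Document">
    <xs:sequence>
      <xs:element name="section" type="Section" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="document" type="Document"/>
</xs:schema>
"""

root = ET.fromstring(VENETIAN_BLINDS)

# Reusable building blocks: the named, global complex types.
blinds_named_types = [t.get("name")
                      for t in root.findall(XS + "complexType")]

# Predictable superstructure: exactly one possible root element.
blinds_global_elements = [e.get("name")
                          for e in root.findall(XS + "element")]

print(blinds_named_types)
print(blinds_global_elements)
```

The named types can be split across files and extended, while the single global element keeps the document structure predictable.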

has-a: Assuming a finite, non-recursive set of these definitions, there must be a "top lattice": the Nexml complexType. Starting from this top-level type, we can navigate the schema by traversing the path of types allowed within other types. The documentation shows this by listing, where applicable, the immediate substructures of each complex type. For example, the Nexml type allows one or more child elements of type Taxa, which in instance documents are implemented by elements called "otus". If we follow the link to the Taxa complex type, we can look at what child elements are allowed in the "otus" element, follow the links to their type definitions, and so on.
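The traversal described above can be sketched in a few lines. The fragment below is a simplified stand-in, not the real nexml schema: the Nexml and Taxa types and the "otus" element come from the text, while the `otu` child of type `Taxon` and the helpers `child_types` and `walk` are assumptions made for illustration. Like the text, the walk assumes a non-recursive set of definitions.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Simplified stand-in for part of the schema (not the real definitions).
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Nexml">
    <xs:sequence>
      <xs:element name="otus" type="Taxa" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Taxa">
    <xs:sequence>
      <xs:element name="otu" type="Taxon" minOccurs="0"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Taxon"/>
</xs:schema>
"""

def child_types(schema_root):
    """Map each named complex type to its (element name, type name) pairs."""
    table = {}
    for ctype in schema_root.findall(XS + "complexType"):
        table[ctype.get("name")] = [
            (el.get("name"), el.get("type"))
            for el in ctype.iter(XS + "element")
        ]
    return table

def walk(table, type_name, depth=0, lines=None):
    """Follow has-a links depth first; assumes no recursive types."""
    lines = [] if lines is None else lines
    for element_name, child_type in table.get(type_name, []):
        lines.append("  " * depth + f"{element_name} : {child_type}")
        walk(table, child_type, depth + 1, lines)
    return lines

outline = walk(child_types(ET.fromstring(SCHEMA)), "Nexml")
print("\n".join(outline))
```

Each output line pairs an element name as it appears in instance documents with the complex type that defines it, which is the same pairing the Substructures subsection lists.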

is-a: Because the nexml schema is designed in a modular way with named types, type definitions can be reused and extended to derive other types. This is done extensively in the schema. You can explore the resulting inheritance tree by following the links in the Inheritance subsection of each type definition, which specifies what superclass the type was derived from (and how, namely through restriction or extension) and what other types derive from it.
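As a rough sketch of how such an inheritance tree can be read from schema source, the code below records for each complex type its base type and whether it was derived by extension or restriction. The three types (`ExampleBase`, `ExampleLabelled`, `ExampleTaxa`) and the helpers `derivations` and `lineage` are hypothetical and only mimic the kind of derivation chain the text describes.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical derivation chain, not copied from the nexml schema.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ExampleBase" abstract="true"/>
  <xs:complexType name="ExampleLabelled" abstract="true">
    <xs:complexContent>
      <xs:extension base="ExampleBase">
        <xs:attribute name="label" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ExampleTaxa">
    <xs:complexContent>
      <xs:restriction base="ExampleLabelled">
        <xs:attribute name="label" type="xs:string" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""

def derivations(schema_root):
    """Map each derived type name to (base type name, derivation method)."""
    table = {}
    for ctype in schema_root.findall(XS + "complexType"):
        for how in ("extension", "restriction"):
            node = ctype.find(f"{XS}complexContent/{XS}{how}")
            if node is not None:
                table[ctype.get("name")] = (node.get("base"), how)
    return table

def lineage(table, name):
    """List the ancestors of a type, nearest first, up to the root class."""
    chain = []
    while name in table:
        base, how = table[name]
        chain.append((base, how))
        name = base
    return chain

table = derivations(ET.fromstring(SCHEMA))
ancestors = lineage(table, "ExampleTaxa")
children_of_base = [n for n, (base, _) in table.items()
                    if base == "ExampleBase"]
print(ancestors)
print(children_of_base)
```

The `ancestors` list corresponds to following the parent links in an Inheritance subsection upwards, and `children_of_base` corresponds to the child classes listed for a type.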