UBIF_TypeLib.xsd schema file overview
(Version: Unified Biosciences Information Framework (UBIF) 1.1)
TDWG working group: Structure of Descriptive Data (SDD)
Introduction
This document gives an overview of the schema components present in a single schema file, similar to the entry view provided by graphical schema editors. It documents only the root level annotations and components (elements, global attributes, simple and complex types, and groups). The definition of the components listed here is documented separately (hyperlinking could not yet be implemented).
Because the UBIF schema is designed as a type library, complex types represent class definitions and most schema files contain only a single root-level element.
Please see the schema documentation resource directory for schema overviews of other files and detailed component documentation.
Schema file content
The following content is generated automatically from the documentation inside the schema file:
Unified Biosciences Information Framework (UBIF) XML schema. This part provides a type library of fundamental simple and complex types. See the main UBIF.xsd file for complete information, copyright and licensing.
Copyright © 2006 TDWG (Taxonomic Databases Working Group, www.tdwg.org). See the file UBIF_(c).xsd for authorship and licensing information.
Note: if multiple namespaces are to be used, all of which make use of this library, it would be possible to remove both xmlns="http://rs.tdwg.org/UBIF/2006/" and targetNamespace="http://rs.tdwg.org/UBIF/2006/" from xs:schema. In this 'chameleon pattern' (http://www-106.ibm.com/developerworks/library/x-flexschema/ or http://www.xfront.com/ZeroOneOrManyNamespaces.html), the included type libraries acquire the target namespace of the including schema. However, when this pattern was tested in 2003-2004, several validators had problems handling it; for the time being, UBIF and related schemata like SDD use only a single namespace.
Imported or included schemata:
The following import of xml namespace allows use of xml:lang directly. That schema defines an attribute lang of type="xs:language". The enumerated language values of this type are extensible using "x-" plus identifier. For the case of language-neutral elements (scientific taxon names) the value 'x-neutral' is recommended. To express unknown or mixed language, the special values 'mul' (multiple/mixed) and 'und' (undetermined/unknown) already exist (see http://wiki.tdwg.org/twiki/bin/view/UBIF/ExtendLanguageWithNeutralAndUnknown). Note: the import uses a local schema version, to ensure validation at times when not connected to the internet. The original schemaLocation is "http://www.w3.org/2001/xml.xsd".
Imports: w3c-schema/xml.xsd (http://www.w3.org/XML/1998/namespace)
Includes: UBIF_EnumLib.xsd
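The language conventions described for xml:lang can be illustrated with a minimal Python sketch. The helper name classify_language and the category labels are hypothetical; the private-extension pattern is an assumption derived from the general xs:language lexical form, not taken from the UBIF schema itself.

```python
import re

# Special values named in the schema documentation.
NEUTRAL = "x-neutral"  # recommended for language-neutral content
SPECIAL = {"mul": "multiple/mixed", "und": "undetermined/unknown"}

# Assumption: a private extension is "x-" followed by one or more
# alphanumeric subtags of 1 to 8 characters each.
_PRIVATE = re.compile(r"x-[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")


def classify_language(value: str) -> str:
    """Return a rough category for an xml:lang value (hypothetical helper)."""
    if value == NEUTRAL:
        return "neutral"
    if value in SPECIAL:
        return SPECIAL[value]
    if _PRIVATE.fullmatch(value):
        return "private extension"
    return "regular language tag"
```

A consuming application might use such a classification to decide whether a label can be shown regardless of the user's language, as would be the case for scientific taxon names tagged 'x-neutral'.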
Basic type library:
Basic generic types:
- LongString (Simple type, based on xs:normalizedString by restriction): Normalized string required to contain at least 1 character, currently also limited to a length of 64000 characters to reduce implementation costs for some applications. Requiring a minimum length of 1 removes the "xml string anomaly", i. e., that required elements/attributes may have no content, which differs from the behavior of number/date types.
- ShortString (Simple type, based on LongString by restriction): Normalized string of limited length (currently 1..255 characters). The main reason to declare a limited-length string type is to reduce implementation cost on some systems. Most database management systems (dbms) limit string length, either per field or per sum of fields in a record. Although probably all dbms also support long strings, these usually have different properties (slower, occasionally not sortable or indexable). Thus designers of logical or physical database models need information about the expected string length of data they intend to import. This cost is usually incurred only by consuming applications where it is not acceptable to simply ignore part of the information.
- ZeroToOne (Simple type, based on xs:double by restriction): Double precision numeric value in the range of [0..1], for probabilities, etc.
- ColorRGB (Simple type, based on ShortString by restriction): Colors defined as RGB (red-green-blue) values, hex-encoded and combined into a string, as in HTML. Example: #EE88FF. Colors may also be expressed as HSV (hue-saturation-value), but this is convertible to RGB. RGB is preferred because it is used in HTML.
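The constraints of the basic generic types can be mirrored in a consuming application. The following Python sketch is an illustration only: the function names are hypothetical, and the ColorRGB pattern ("#" plus exactly six hex digits) is an assumption inferred from the example #EE88FF, since the actual facet is not shown in this overview.

```python
import re
from typing import Tuple

LONG_MAX = 64000  # LongString upper bound stated in the overview
SHORT_MAX = 255   # ShortString upper bound stated in the overview

# Assumption: ColorRGB is "#" followed by exactly six hex digits.
_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_normalized(text: str) -> bool:
    """xs:normalizedString values contain no tab, line feed or carriage return."""
    return not any(ch in "\t\n\r" for ch in text)


def is_long_string(text: str) -> bool:
    """At least 1 and at most 64000 characters."""
    return is_normalized(text) and 1 <= len(text) <= LONG_MAX


def is_short_string(text: str) -> bool:
    """At least 1 and at most 255 characters."""
    return is_normalized(text) and 1 <= len(text) <= SHORT_MAX


def is_zero_to_one(value: float) -> bool:
    """Closed interval [0..1]; NaN fails both comparisons and is rejected."""
    return 0.0 <= value <= 1.0


def color_to_rgb(color: str) -> Tuple[int, int, int]:
    """Split a hex color string into its red, green and blue components."""
    if not _COLOR.fullmatch(color):
        raise ValueError(f"not a ColorRGB value: {color!r}")
    return (int(color[1:3], 16), int(color[3:5], 16), int(color[5:7], 16))
```

Because ShortString is derived from LongString by restriction, every value accepted by is_short_string is also accepted by is_long_string.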
Derived string types with restricting patterns:
- NumericFormatPattern (Simple type, based on ShortString by restriction): String containing a format pattern of the type used in the xslt format-number function.
The resource media type currently carries only semantics, no syntax or regular expression pattern:
- ResourceMediaType (Simple type, based on ShortString by restriction): Resource media type (MIME, "text/html", "image/png", etc.). Compare www.w3.org/TR/xml-media-type.
The following Range, Date, and Coordinate types describe frequently recurring simple type combinations in an element with attributes:
Elements defining value ranges:
- ValueRangeOrVerbatim (Complex type): A value range as upper/lower value, optional, plus additional/alternative verbatim text.
- ValueRange (Complex type, based on ValueRangeOrVerbatim by restriction): Restricted to required upper/lower value (no verbatim)
- ZeroToOneRange (Complex type, based on ValueRangeOrVerbatim by restriction): Required lower and upper attributes in the range 0-1. Used, e. g., for probabilities or values commonly expressed as percent
- ZeroToOneEstimateRange (Complex type): Optional lower/upper estimate attributes in the range 0-1, with default values. Used, e. g., for certainty and frequency; the default values 0 and 1, resp., indicate that no estimate was possible.
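The default-value convention of ZeroToOneEstimateRange can be sketched as a small Python data class. The field names lower and upper and the property has_estimate are hypothetical (the actual attribute names are not given in this overview), and the check that lower does not exceed upper is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroToOneEstimateRange:
    # Defaults 0 and 1 together mean "no estimate was possible".
    lower: float = 0.0
    upper: float = 1.0

    def __post_init__(self) -> None:
        for name in ("lower", "upper"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0..1], got {value}")
        # Assumption: the overview does not state this ordering rule.
        if self.lower > self.upper:
            raise ValueError("lower must not exceed upper")

    @property
    def has_estimate(self) -> bool:
        """False when both bounds still carry their default values."""
        return (self.lower, self.upper) != (0.0, 1.0)
```

For example, ZeroToOneEstimateRange() represents a missing estimate, while ZeroToOneEstimateRange(0.2, 0.4) represents a frequency or certainty estimated at 20 to 40 percent.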
Types for composite Gregorian calendar date/time (points in time where parts may be missing), following the seven property model described, e. g., in XML Schema 1.1 (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). Instead of gYear, gMonth, and gDay, integer types with constraining facets are used for two reasons: a) each of them may have a timezone, which may lead to inconsistent data with multiple timezones; b) the lexical representation seems to be occasionally poorly implemented (e.g. where '31' or '---5' are accepted, whereas valid examples are '---31', '---05', and '---05+02:00'). In addition to the seven property model, text attributes are added for either imprecise additions or complete verbatim dates. Note that incomplete dates are in most cases calendar-specific, and incomplete non-Gregorian dates cannot be expressed. Furthermore, for complete dates it may be unclear whether a reformed or unreformed date has been used (e.g. in Russia in the 19th century).
- CompositeDate (Complex type): Date separated into attributes so that any part of the date may be missing
[ATTR: year = four digit year;
month = two digit month of year;
day = two digit day of month;
verbatim = unparsed textual date representation;
supplement = text adding to or modifying the exact dates, e. g., 'end of summer', 'first half of year', 'first decade of month', '1888-1892';
timezone = expressed as integer according to the xml schema seven property model]
- CompositeDateTime (Complex type, based on CompositeDate by extension): Date + Time separated into attributes so that any part of the date may be missing. Note: adding a single time attribute of type xs:time would be simpler, but a duplication of the timezone information would be possible.
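The idea of a date whose parts may each be missing can be sketched in Python as follows. This is a minimal illustration, not the schema's definition: the class mirrors the attribute names listed for CompositeDate, omits timezone for brevity, and the method as_date is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CompositeDate:
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    verbatim: Optional[str] = None    # unparsed textual date representation
    supplement: Optional[str] = None  # e.g. 'end of summer'

    def __post_init__(self) -> None:
        # Range checks stand in for the constraining facets on the integers.
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError(f"month out of range: {self.month}")
        if self.day is not None and not 1 <= self.day <= 31:
            raise ValueError(f"day out of range: {self.day}")

    def as_date(self) -> Optional[date]:
        """Return a calendar date only if year, month and day are all known."""
        if self.year is None or self.month is None or self.day is None:
            return None
        return date(self.year, self.month, self.day)
```

A record such as CompositeDate(year=1888, supplement="end of summer") keeps the known year and the imprecise addition without forcing the application to invent a month or day.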
Types for geographical coordinates:
- DecimalLatitude (Simple type, based on xs:double by restriction): Latitude of geographical coordinates in signed decimal degrees (e.g. 30° 30' S would be expressed as -30.5). The value range is -90 to 90°, South latitude being negative, North latitude being positive.
- DecimalLongitude (Simple type, based on xs:double by restriction): Longitude of geographical coordinates in signed decimal degrees (e.g. 30° 30' W would be expressed as -30.5). The value range is -180 to 180°, West longitude being negative, East longitude being positive.
- GeographicalCoordinates (Complex type): ATTR: latitude, longitude (in decimal degrees), geodeticdatum (esp. if different from a Greenwich-based datum).
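The sign convention for decimal degrees can be illustrated with a short Python sketch that converts degrees, minutes and seconds plus a hemisphere letter into a signed value. The function names are hypothetical and not part of the schema.

```python
def dms_to_decimal(degrees: int, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West are negative, North and East are positive.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def is_decimal_latitude(value: float) -> bool:
    """Latitude range is -90 to 90 degrees."""
    return -90.0 <= value <= 90.0


def is_decimal_longitude(value: float) -> bool:
    """Longitude range is -180 to 180 degrees."""
    return -180.0 <= value <= 180.0
```

With this sketch, dms_to_decimal(30, 30, 0, "S") yields -30.5, matching the example given for DecimalLatitude.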
Complex types closely related to enumerations (these may alternatively be placed in UBIF_TypeLib)
- AbstractStringOrCode (Abstract Complex type): Three attributes provide options to express a value as a term from a constrained (enumerated/extensible) vocabulary, as simple free-form text (perhaps interpreted), or as verbatim text (uninterpreted original version). At least one attribute should be present; this cannot be validated by the schema (external validation is required for this and for all types derived from this).
- Sex (Complex type, based on AbstractStringOrCode by restriction): Expressing sex as code (enumerated vocabulary) or free-form literal or verbatim text. At least one attribute should be present; this cannot be validated by the schema (external validation).
- TaxonomicRank (Complex type, based on AbstractStringOrCode by restriction): Expressing taxon rank as code (enumerated vocabulary), simple free-form text (perhaps interpreted), or verbatim (uninterpreted original version). At least one attribute should be present; this cannot be validated by the schema (external validation).
- RevisionStatus (Complex type, based on AbstractStringOrCode by restriction): Expressing actual revision level relative to intended revision level as code (enumerated vocabulary: RevisionStatusEnum), optionally plus literal text (free-form comment).
- DataStatus (Complex type, based on AbstractStringOrCode by restriction): Expressing reasons why data are missing (not coded) as code (enumerated vocabulary: DataStatusEnum) only (text equivalents currently not supported!)
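The external validation mentioned for AbstractStringOrCode and its derived types (at least one attribute present) could be performed after schema validation, for instance with a Python sketch like the one below. The attribute names code, literal and verbatim are assumptions based on the descriptions, the function names are hypothetical, and the sample elements carry no namespace, unlike real UBIF documents.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Hypothetical attribute names; the real names are defined in the schema.
CHOICE_ATTRS = ("code", "literal", "verbatim")


def has_any_value(element: ET.Element) -> bool:
    """True if at least one choice attribute is present and non-empty."""
    return any(element.get(name) for name in CHOICE_ATTRS)


def find_empty(xml_text: str, tag_names: Iterable[str]) -> List[str]:
    """List the tags of matching elements that carry none of the attributes."""
    wanted = set(tag_names)
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter()
            if el.tag in wanted and not has_any_value(el)]
```

Applied to a fragment such as `<Specimen><Sex code="male"/><Sex/></Specimen>` with the tag name Sex, the sketch reports the second Sex element as lacking any value.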
Complex types referring to UnivarStatMeasureEnum (used, e. g., by SDD):
Other complex types
- TelephoneNumber (Complex type): Telephone, fax, etc. number
ATTR: number = should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number") (vCard:Tel.Number)
ATTR: devicetype = voice, fax, mobile, pager, modem (identical with vCard:Tel.Voice etc.; if several flags apply to a single phone number list the phone number multiple times!)
ATTR: usagenote = free-form text for constraints on use e. g. "weekdays only" or "home number" (partly: vCard:Tel.Home/Work flags)
ATTR: preferred = preferred number, may occur multiple times for different device types (vCard:Tel.Pref)
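A consuming application might normalize and check telephone numbers along the following lines. This Python sketch rests on assumptions: it treats spaces as the only separators, applies the general E.164 limits (leading "+", at most 15 digits, no leading zero in the country code), and uses hypothetical names throughout.

```python
import re

# Device type flags listed for the devicetype attribute.
DEVICE_TYPES = {"voice", "fax", "mobile", "pager", "modem"}

# Assumption: "+" followed by 2 to 15 digits, the first one non-zero.
_E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def normalize_number(number: str) -> str:
    """Remove whitespace and verify the international format."""
    compact = "".join(number.split())
    if not _E164.fullmatch(compact):
        raise ValueError(f"not in international format: {number!r}")
    return compact


def is_device_type(value: str) -> bool:
    """Check a single devicetype flag against the listed values."""
    return value in DEVICE_TYPES
```

Since a number with several device flags is to be listed multiple times, each listing would carry exactly one devicetype value.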
Base type and derived types for all document internal cross reference (using id/ref attributes):
- LocalInstanceID (Simple type, based on ShortString by restriction): This allows defining (and redefining) the value type for instance IDs and refs to these
Language and audience attributes form the basis of text representations of labels and other types:
Note: the use of attribute groups instead of globally defined and referred attributes is a work-around for namespace problems occurring with attribute definitions in included library schemata.
- language (Attribute group): The enumerated language values of this type are extensible using "x-" plus identifier. For the case of language-neutral elements (scientific taxon names) the value 'x-neutral' is recommended. To express unknown or mixed language, the special values 'mul' (multiple/mixed) and 'und' (undetermined/unknown) already exist (see http://wiki.tdwg.org/twiki/bin/view/UBIF/ExtendLanguageWithNeutralAndUnknown).
- optional_language (Attribute group)
- multilingual (Attribute group): (multilingual support attributes)
Audience is also available as an object type to define label and expertise level for audiences. However, audience values may be used even if no Audience object with a corresponding id can be found.
The reason for this is that all object labels and representations may already use audience in addition to language. To avoid circular dependencies or introducing special cases for audience objects, it was considered acceptable not to validate the correspondence using schema identity constraints (= referential integrity) here.
(Note: If audience definitions are present, a missing attribute (or one explicitly containing the default set in this schema, e.g. "-") in multilingual or AudienceRef should be treated as pointing to the first audience with expertiselevel=0 (undefined).)
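The resolution rule for missing or default audience values can be sketched in Python. The class Audience, the function resolve_audience and the constant DEFAULT_AUDIENCE are hypothetical; the value "-" is the example default quoted in the note, and returning None for an unmatched reference reflects that the correspondence is not enforced by the schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DEFAULT_AUDIENCE = "-"  # example default value quoted in the note


@dataclass
class Audience:
    id: str
    expertiselevel: int  # 0 means undefined


def resolve_audience(ref: Optional[str],
                     audiences: Sequence[Audience]) -> Optional[Audience]:
    """Map an audience value to an Audience definition, if one exists."""
    if ref is None or ref == DEFAULT_AUDIENCE:
        # Missing or default: first audience with expertiselevel 0.
        return next((a for a in audiences if a.expertiselevel == 0), None)
    # Explicit value: a matching definition may legitimately be absent.
    return next((a for a in audiences if a.id == ref), None)
```

A caller would treat a None result as "audience used without a definition" rather than as an error, in line with the decision not to validate referential integrity here.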
Complex types that add language/audience or 'preferred' attributes to the simple types LongString, ShortString, anyURI:
- LongStringL (Complex type, based on LongString by extension): Long string (i. e. xs:string with minimum length=1) extended with *optional* language attribute
- ShortStringL (Complex type, based on ShortString by extension): ShortString (i. e. xs:string with limited length), extended with *optional* xml:lang attribute
(Generated on 23. May 2006 by DiversitySchemaTools Version 0.5. Copyright (c) G. Hagedorn 2006.)