SDD_TypeLib.xsd schema file overview
(Version: Unified Biosciences Information Framework (UBIF) 1.1 and SDD 1.1)
TDWG working group: Structure of Descriptive Data (SDD)
Introduction
This document gives an overview of the schema components present in a single schema file, similar to the entry view provided by graphical schema editors. It documents only the root-level annotations and components (elements, global attributes, simple and complex types, and groups). The definitions of the components listed here are documented separately (hyperlinking has not yet been implemented).
Because the UBIF schema is designed as a type library, complex types represent class definitions and most schema files contain only a single root-level element.
Please see the schema documentation resource directory for schema overviews of other files and detailed component documentation.
Schema file content
The following content is generated automatically from the documentation inside the schema file:
This file will be included into the UBIF/SDD integration schema 'SDD.xsd' (SDD uses the same namespace as UBIF).
Copyright © 2006 TDWG (Taxonomic Databases Working Group, www.tdwg.org). See the file SDD_(c).xsd for authorship and licensing information.
Due to problems with key/keyrefs when using two namespaces (see the documentation on the SDD wiki: http://wiki.tdwg.org/twiki/bin/view/SDD/UBIFDesignRequirements), the SDD schema is based on the UBIF namespace and thus uses include rather than import!
Includes: UBIF_CoreExtensions.xsd
Includes: SDD_EnumLib.xsd
UBIF insertion groups
The two SDD groups are used inside the UBIF top-level Datasets/Dataset structure to define the object collections used by SDD.
- SDD-DescriptiveTerminology (Element group): Defines the operational terminology (concepts, characters, states, etc.) in which descriptions are expressed.
- SDD-DescriptiveData (Element group): Describes individuals (specimens) or classes (taxa) using the terms defined in the operational terminology (concepts, characters, states, etc.).
For all first-class objects in SDD, collections of type set are defined. These form root-level collections in the Dataset object.
TERMINOLOGY START
DescriptiveConcepts, Characters and dependent objects (states, modifiers, statistical measures)
1. a) DescriptiveConcept definitions. Note: relations between concepts may be defined in the operational character tree. Independent ontologies of concepts may be created through Link rel=Subclass, etc. A further plan for the future is to allow defining concept relations inside characters.
- DescriptiveConcept (Complex type, based on VersionedAbstractObject by extension): DescriptiveConcepts may be basic properties (color, shape, texture), structural types (fruit types), methods (naked eye, hand lens, microscope), or other hierarchical generalizations that can be applied to characters (e. g., relative region: tip versus base of structure).
Note that a number of ontological relations between concepts may be expressed using the general link structure of the base type (subclass, part of, etc.).
- DescriptiveConceptRef (Complex type, based on AbstractLocalRef by extension): Refers to a concept node
- DescriptiveConceptRefSeq (Complex type, based on Seq by extension): (A sequence of references to descriptive concepts)
Inner classes of DescriptiveConcept: ModifierSeq, ConceptStateSeq, and RecommendedMeasureSeq are second-class objects embedded in first-class objects.
1. b) Character tree definitions, references (plus internal types)
- CharacterTree (Complex type, based on VersionedAbstractObject by extension): Defines an entire character tree (which may be a tree or a single tree node containing a flat list)
- CharTree_NodeRef (Complex type, based on AbstractLocalRef by extension): Refers to a node in the character tree
- CharTree_Node (Complex type): Inner nodes (or terminal nodes if no characters follow)
- CharTree_NodeSeq (Complex type, based on Seq by extension)
- CharTree_Character (Complex type, based on CharacterRef by extension): A character reference, creating a terminal tree node
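The relation between inner nodes and terminal character references can be pictured with a minimal Python sketch. It is an analogy, not the schema's serialization: nested objects stand in for CharTree_Node and CharTree_Character, the class names, the `label` field, and the helper `character_order` are hypothetical, and the way the schema actually links nodes (e. g., through CharTree_NodeRef) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class CharTreeCharacter:
    """Terminal node: holds only a reference to a character id (hypothetical model)."""
    ref: str


@dataclass
class CharTreeNode:
    """Inner node; it acts as a terminal node when no characters follow."""
    label: str  # hypothetical label, e.g. the concept the node stands for
    children: List[Union["CharTreeNode", CharTreeCharacter]] = field(default_factory=list)


def character_order(node: CharTreeNode) -> List[str]:
    """Return the referenced character ids in depth-first (tree) order."""
    refs: List[str] = []
    for child in node.children:
        if isinstance(child, CharTreeCharacter):
            refs.append(child.ref)
        else:
            refs.extend(character_order(child))
    return refs


# A flat character list is a single node holding all characters directly.
flat = CharTreeNode("root", [CharTreeCharacter("c1"), CharTreeCharacter("c2")])

# A nested tree groups characters under inner nodes.
nested = CharTreeNode("root", [
    CharTreeNode("leaf", [CharTreeCharacter("c1"), CharTreeCharacter("c2")]),
    CharTreeNode("fruit", [CharTreeCharacter("c3")]),
])

print(character_order(flat))    # ['c1', 'c2']
print(character_order(nested))  # ['c1', 'c2', 'c3']
```

Both forms yield the same kind of ordered character list, which is why a single node with a flat list is a valid degenerate tree.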
Inner classes of CharacterTree and CharTree_Node:
2. --- Character definitions (characters = data recording and analysis variables, depending on observed part, property, and observation or measurement methodology)
a) Abstract base type and derived types to be used in instance documents.
- AbstractCharacterDefinition (Abstract Complex type, based on VersionedAbstractObject by extension): Defines a character in the terminology. Abstract base type; one of the extensions below must be used in instance documents.
- CategoricalCharacter (Complex type, based on AbstractCharacterDefinition by extension): # Derived from AbstractCharacterDefinition to be used in instance documents (non-abstract type).
Categorical data include nominal and ordinal data (DELTA types UM/OM and NEXUS types). Other terms for categorical data in statistics are 'qualitative data' or 'attributes'. The term 'attribute' has been avoided in SDD because it has different definitions in statistics, programming, databases, DELTA, etc. Both 'qualitative' and 'attribute' are ambiguous as to whether ordinal/ranked variables are included or excluded.
- QuantitativeCharacter (Complex type, based on AbstractCharacterDefinition by extension): # Derived from AbstractCharacterDefinition to be used in instance documents (non-abstract type)
Quantitative data include data like the DELTA types IN/RN. They are not supported by NEXUS.
- MolecularSequenceCharacter (Complex type, based on AbstractCharacterDefinition by extension): # Derived from AbstractCharacterDefinition to be used in instance documents (non-abstract type)
- TextCharacter (Complex type, based on AbstractCharacterDefinition by extension): # Derived from AbstractCharacterDefinition to be used in instance documents (non-abstract type). In coded descriptions, these characters only support a Text element for unconstrained text.
For applications not capable of analyzing unconstrained natural language, text data cannot be used in identification.
Note: The ColorRangeCharacter above is only an example of other derivations expected, like algorithmically described shapes, molecular sequences (genome/proteome), or molecular patterns (RFLP, AFLP, etc.).
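Because complex types act as class definitions in this type library, the pattern of an abstract base with non-abstract derivations maps naturally onto a class hierarchy. The Python sketch below is only an analogy under that assumption: the constructor parameters (`states`, `unit`) and the helper `usable_for_identification` are hypothetical and do not reproduce the schema's content models.

```python
class AbstractCharacterDefinition:
    """Abstract base: like the schema type, it must not be used directly."""

    def __init__(self, char_id: str, label: str):
        if type(self) is AbstractCharacterDefinition:
            raise TypeError("abstract base type; use one of the derived character types")
        self.char_id = char_id
        self.label = label


class CategoricalCharacter(AbstractCharacterDefinition):
    """Nominal or ordinal data; states are listed by id (hypothetical field)."""

    def __init__(self, char_id: str, label: str, states: list):
        super().__init__(char_id, label)
        self.states = list(states)


class QuantitativeCharacter(AbstractCharacterDefinition):
    """Numeric data such as DELTA IN/RN; `unit` is a hypothetical field."""

    def __init__(self, char_id: str, label: str, unit: str = ""):
        super().__init__(char_id, label)
        self.unit = unit


class MolecularSequenceCharacter(AbstractCharacterDefinition):
    """Letter-sequence data (nucleotide or protein sequences)."""


class TextCharacter(AbstractCharacterDefinition):
    """Unconstrained text only."""


def usable_for_identification(char: AbstractCharacterDefinition) -> bool:
    # Assumes an application without natural-language analysis,
    # so text characters are excluded from identification.
    return not isinstance(char, TextCharacter)


chars = [
    CategoricalCharacter("c1", "flower color", ["s1", "s2"]),
    QuantitativeCharacter("c2", "leaf length", unit="mm"),
    TextCharacter("c3", "remarks"),
]
print([c.char_id for c in chars if usable_for_identification(c)])  # ['c1', 'c2']
```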
b) inner classes, one-time use within character definitions above
c) State definitions within CategoricalCharacter. Abstract base type and derived types to be used in instance documents.
d) Character and state references
- CharacterRef (Complex type, based on AbstractLocalRef by extension): Refers to a character (e. g., from within concept trees or from descriptions). It consists only of a reference to a Character definition id.
- CharacterStateRef (Complex type, based on AbstractLocalRef by extension): Refers to a character state (e. g., from descriptions). It consists only of a reference to a Character state definition id.
- CharacterStateRefSet (Complex type, based on Set by extension): A collection of state references (CharacterStateRef type)
- ConceptStateRef (Complex type, based on AbstractLocalRef by extension): Refers to a project-wide definition of a categorical state at a concept node
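Since CharacterRef and CharacterStateRef carry nothing but an id, a consuming application will usually want to confirm that every reference resolves. A minimal Python sketch of such an application-level check follows, assuming the terminology has already been read into a dictionary that maps each character id to the set of its state ids; this representation and the function name `dangling_refs` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dangling_refs(
    terminology: Dict[str, Set[str]],
    statements: List[Tuple[str, List[str]]],
) -> List[str]:
    """Report character refs without a definition and state refs that the
    referenced character does not define."""
    problems: List[str] = []
    for char_id, state_ids in statements:
        if char_id not in terminology:
            problems.append(f"unknown character {char_id}")
            continue
        for state_id in state_ids:
            if state_id not in terminology[char_id]:
                problems.append(f"state {state_id} is not defined in {char_id}")
    return problems


terminology = {"c1": {"s1", "s2"}, "c2": set()}
statements = [("c1", ["s2"]), ("c1", ["s3"]), ("c9", [])]
print(dangling_refs(terminology, statements))
# ['state s3 is not defined in c1', 'unknown character c9']
```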
e) Modifiers cover expressions of certainty, frequency, manner, degree, etc. that can be added to existing character value or state data in descriptions.
Modifier reference (single, and group with multiple) to be used in coded descriptions:
- ModifierRefWithData (Complex type, based on AbstractLocalRef by extension): Actual modification of a statement. Refers to a modifier of any type (frequency, certainty, spatial, temporal, etc.).
Modifier reference extended with Text element, used in natural language markup:
- ModifierMarkupRef (Complex type, based on ModifierRefWithData by extension): Actual modification of a statement for markup of natural language descriptions (with Text inside). Refers to a modifier of any type (frequency, certainty, spatial, temporal, etc.).
(Note on ModifierRefWithData/ModifierMarkupRef: Although semantics for the lower/upper attributes are defined only for frequency and certainty modifiers, the schema allows them in all statement modifications. Additional validation by means other than xml schema may be provided, and applications should use the lower/upper attributes only in modifiers of type Certainty or Frequency. In other modifier types, the values may be discarded upon import. XML schema validation of this rule was attempted in SDD up to 1.0 beta 2, but it resulted in a complex system of multiple derived base types and was considered too complicated.)
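The note leaves the lower/upper rule to validation outside xml schema. One way an application might implement it is sketched below in Python; the `ModifierUse` record, the type name 'Spatial', and the assumption that each modifier's type has already been looked up from its definition are all hypothetical. The sketch warns about misplaced bounds and, mirroring the import behavior mentioned in the note, can discard them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only these modifier types have defined semantics for lower/upper.
RANGED_TYPES = {"Certainty", "Frequency"}


@dataclass(frozen=True)
class ModifierUse:
    modifier_ref: str
    modifier_type: str  # assumed to be resolved from the referenced modifier
    lower: Optional[float] = None
    upper: Optional[float] = None


def check_modifier(use: ModifierUse) -> List[str]:
    """Warn when lower/upper appear on a modifier type without defined semantics."""
    has_bounds = use.lower is not None or use.upper is not None
    if has_bounds and use.modifier_type not in RANGED_TYPES:
        return [f"{use.modifier_ref}: lower/upper are undefined for {use.modifier_type} modifiers"]
    return []


def import_modifier(use: ModifierUse) -> ModifierUse:
    """Keep bounds only for Certainty/Frequency; drop them otherwise."""
    if use.modifier_type in RANGED_TYPES:
        return use
    return ModifierUse(use.modifier_ref, use.modifier_type)


# 'Spatial' is a hypothetical type name used for illustration.
print(check_modifier(ModifierUse("m1", "Frequency", 0.2, 0.5)))  # []
print(check_modifier(ModifierUse("m2", "Spatial", lower=0.1)))
print(import_modifier(ModifierUse("m2", "Spatial", lower=0.1)).lower)  # None
```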
f) Statistical measures: The base semantics and labels are already available through UBIF. At concept nodes, further elaboration may occur: a) wording and value formatting; b) definition of recommended measure sets.
- UnivarStatMeasureElaboration (Complex type, based on AbstractVocabularyBase by extension): A kind of local extension of the base definition of a statistical measure; used inside concepts, adding, e. g., formatting information.
- ValueRangeWithClass (Complex type, based on ValueRange by extension): ValueRange extended with a specification for the kind of value (broad enumerated statistical concepts).
TERMINOLOGY END
TERMINOLOGY-BASED DATA
The following types are used in descriptions or identification keys to code descriptive data by reference to characters, states, and modifiers defined in the Terminology.
3. --- Character references in coded descriptions: SummaryData
a) abstract and non-abstract derived types used in coded descriptions
Note: The non-abstract derived types are to be used in instance documents. The type names have been shortened to simplify instance documents, especially if an xsi:type would be used (Char xsi:type='CatSummaryData').
- AbstractCharSummaryData (Abstract Complex type, based on CharacterRef by extension): Abstract base type. Used in CodedDescription/CodedData/Char to make statements for a single character in a class or specimen.
- CatSummaryData (Complex type, based on AbstractCharSummaryData by extension): # Derived from AbstractCharSummaryData to be used for categorical (char. state) data in instance documents (non-abstract type)
- QuantSummaryData (Complex type, based on AbstractCharSummaryData by extension): # Derived from AbstractCharSummaryData to be used for numerical (statistical measures) data in instance documents (non-abstract type)
- MolecularSequenceData (Complex type, based on AbstractCharSummaryData by extension): # Derived from AbstractCharSummaryData to be used for letter-sequence data (especially nucleotide and protein sequences)
- TextCharData (Complex type, based on AbstractCharSummaryData by extension): # Derived from AbstractCharSummaryData to be used for unconstrained text ("Text-fields")
b) types used inside the CharSummaryData-derived types
- StateData (Complex type, based on CharacterStateRef by extension): A categorical state including frequency, state modifier, and Notes
- DataStatusData (Complex type, based on DataStatus by extension): Similar to StateData, but for status values like '-' (= inapplicable) or '?' (= data unavailable). It supports notes, but no modifiers!
c) A collection of summary character data, containing a choice of derived character data types (polymorphic structure, choice options are equivalent to use of base type plus xsi:type).
- SummaryDataSet (Complex type, based on Set by extension): A collection of character summary data elements (all of which are derived from the abstract type AbstractCharSummaryData)
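The shortened type names and the polymorphic SummaryDataSet both rely on xsi:type. A consumer could dispatch on that attribute as in the Python sketch below, which uses only the standard library. The unqualified element names, the `ref` attribute name, and the handler labels are assumptions for illustration; real instance documents use the UBIF namespace, which this sketch omits.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Map derived type names from this overview to hypothetical handler labels.
HANDLERS = {
    "CatSummaryData": "categorical",
    "QuantSummaryData": "quantitative",
    "MolecularSequenceData": "sequence",
    "TextCharData": "text",
}

SNIPPET = f"""<CodedData xmlns:xsi="{XSI}">
  <Char ref="c1" xsi:type="CatSummaryData"/>
  <Char ref="c2" xsi:type="QuantSummaryData"/>
  <Char ref="c3" xsi:type="TextCharData"/>
</CodedData>"""


def classify(xml_text: str) -> list:
    """Return (character ref, handler label) pairs based on xsi:type."""
    root = ET.fromstring(xml_text)
    result = []
    for char in root.iter("Char"):
        xsi_type = char.get(f"{{{XSI}}}type")
        if xsi_type is None:
            # The base type is abstract, so a concrete xsi:type is required.
            raise ValueError(f"Char {char.get('ref')} lacks xsi:type")
        local_name = xsi_type.split(":")[-1]  # drop an optional namespace prefix
        result.append((char.get("ref"), HANDLERS[local_name]))
    return result


print(classify(SNIPPET))
# [('c1', 'categorical'), ('c2', 'quantitative'), ('c3', 'text')]
```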
4. --- Character references in coded descriptions: SampleData
a) abstract and non-abstract derived types used in sample data
- AbstractCharSampleData (Abstract Complex type, based on CharacterRef by extension): Abstract base type. Used in CodedDescription/SampleData/Sample/SamplingUnit.
- CatSampleData (Complex type, based on AbstractCharSampleData by extension): # Derived from abstract CharSampleData to be used for categorical (char. state) data in instance documents (non-abstract type)
- QuantSampleData (Complex type, based on AbstractCharSampleData by extension): # Derived from abstract CharSampleData to be used for numerical data in instance documents (non-abstract type) in coded descriptions (Sample/SamplingUnit). Attribute value (xs:double) is for directly measured/observed values. It is not for statistical measures; these cannot occur in sampling units!
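Because sampling units hold only directly observed values, statistical measures belong to summary data and would be derived from the samples. The Python sketch below illustrates that step under stated assumptions: the list-of-dictionaries form of the sampling units, the character ids, and the measure labels (`Min`, `Max`, `Mean`, `SampleSize`) are hypothetical and are not taken from the UBIF vocabulary.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical sampling units: character id -> directly observed value.
sampling_units: List[Dict[str, float]] = [
    {"leaf_length": 4.0, "leaf_width": 1.5},
    {"leaf_length": 5.0},  # leaf width not observed in this unit
    {"leaf_length": 6.0, "leaf_width": 2.5},
]


def summarize(units: List[Dict[str, float]], char_id: str) -> Optional[Dict[str, float]]:
    """Compute simple summary measures for one character across sampling units."""
    values = [unit[char_id] for unit in units if char_id in unit]
    if not values:
        return None
    return {
        "Min": min(values),
        "Max": max(values),
        "Mean": mean(values),
        "SampleSize": len(values),
    }


print(summarize(sampling_units, "leaf_length"))
# {'Min': 4.0, 'Max': 6.0, 'Mean': 5.0, 'SampleSize': 3}
```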
5. --- Character references in natural language descriptions: Markup
a) abstract and non-abstract derived types used in natural language descriptions. Lacking multiple inheritance mechanisms in xml schema, these Markup versions have been derived independently. They are designed to be closely related to corresponding types in the coded description, however.
- AbstractCharacterMarkup (Abstract Complex type, based on CharacterRef by extension): Abstract base type. Used in NaturalLanguageDescriptions.
Note: although Text and DataStatus scoring is common to all derived types, it cannot be defined here. The markup of natural language should follow the original text sequence, whereas type derivation would impose an xml schema sequence constraint.
- CategoricalMarkup (Complex type, based on AbstractCharacterMarkup by extension): # Extends the abstract CharacterMarkup for use with categorical (char. state) data
- QuantitativeMarkup (Complex type, based on AbstractCharacterMarkup by extension): # Extends the abstract CharacterMarkup for use with numerical (statistical measures) data as well as a list of sample measurement values.
("ColorRangeMarkup" (color polygon measurement data) or "SequenceMarkup" (molecular or other sequences) are not supported at the moment, since the authors do not expect to find them in natural language descriptions. If necessary, these types will be added.)
b) The following NLD type refers to concept nodes and has no corresponding types in SummaryData/SampleData:
- MarkupGroup (Element group): Used in ConceptMarkup and root of NLD (without a ref to concept). (Note: Modeling through class derivation alone would require multiple inheritance.)
- NaturalLanguageMarkup (Complex type): The root of natural language markup is identical to ConceptMarkup, except that the concept ref attribute is prohibited.
- ConceptMarkup (Complex type, based on DescriptiveConceptRef by extension): Used in NaturalLanguageDescriptions. Refers to concepts (i. e. nodes defined in concept trees)
c) types used inside the CharacterMarkup types
- MarkupText (Complex type, based on LongString by extension): Formatted text with an additional optional attribute "parsed" (default=false). Used for Text and Note elements inside the NaturalLanguageDescription container.
- DataStatusMarkup (Complex type, based on DataStatus by extension): Variant of DataStatusData to be used inside the NaturalLanguageDescription markup container.
- StateMarkup (Complex type, based on CharacterStateRef by extension): Variant of StateData to be used inside the NaturalLanguageDescription markup container.
- ValueMarkup (Complex type): For single values (singleton observation or values in a sample).
- (Complex type, based on UnivarSimpleStatMeasureData by extension): (Used inside Quantitative markup)
- (Complex type, based on UnivarParamStatMeasureData by extension): (Used inside Quantitative markup)
TERMINOLOGY-BASED DATA END
DESCRIPTIONS START
Descriptions are either natural language with optional markup or coded descriptions. Both are derived from the same base type:
- AbstractDescription (Abstract Complex type, based on VersionedAbstractObject by extension): Abstract base type for NaturalLanguageDescription and CodedDescription.
The id attribute is currently not used in keyrefs from within this schema. However, it is considered generally useful to uniquely identify descriptions in federated situations.
- DescriptionScopeSet (Complex type, based on ExtendedScopeSet by extension): Extension of Scope base classes for descriptive concepts
- NaturalLanguageDescription (Complex type, based on AbstractDescription by extension): Descriptions entered as free-form text with optional (and potentially incomplete) markup referring to concepts (= char. tree nodes), characters, and states as defined in the terminology.
- CodedDescription (Complex type, based on AbstractDescription by extension): Coded description data are highly controlled by the vocabulary and structures defined in the Terminology, using references to characters, states, and modifiers, and numerical values for measurements. They also support a limited amount of free-form text (in Notes or Annotation only). Separating data and terminology allows rearranging and refactoring the terminology, multilingual support through central terminology translations, and multiple hierarchical views.
Coded descriptions must fulfill more rigorous consistency requirements than natural language descriptions and are more suitable for analysis. Furthermore, language-dependent annotations are minimized so that data can be easily reorganized and translated into multiple languages.
Note: Representation/MediaObject of entire description may contain media like images that are not specific to a character (else add them to character elements below).
Original sampling data, a special kind of CodedDescription content, are organized into referable SamplingEvent containers:
- SamplingEventSet (Complex type, based on Set by extension): Collection of Sampling events
- SamplingEvent (Complex type, based on AbstractEvent by extension): A container for sample data, with repeated sampling units, each of which may record multiple characters that are observed together.
- SamplingUnitDataSet (Complex type, based on Set by extension): (This is equivalent to the SummaryDataSet - the sample data structure has the intermediate event for which no equivalent exists in summary data!)
- SamplingEventRef (Complex type, based on AbstractLocalRef by extension): Refers to a specific SampleData/SamplingEvent in a CodedDescription.
DESCRIPTIONS END
IDENTIFICATION KEYS START
Stored identification keys (especially manually designed ones, as opposed to automatically generated ones) are kept in a separate section:
- StoredKey (Complex type, based on VersionedAbstractObject by extension): Defines a stored identification key (dichotomous or multifurcating key) that has been digitized from printed publications or manually created to express expert knowledge that would not be available in dynamically created dichotomous keys (which use Ratings from the terminology and a 'find next best character' strategy to minimize the average length of the search path).
- StoredKey_LeadSeq (Complex type, based on Seq by extension)
- StoredKeyAbstractNode (Complex type): Base type common to both lead and leaf nodes in a stored identification key.
- StoredKeyNode (Complex type, based on StoredKeyAbstractNode by extension): An inner node in a stored identification key, containing the lead statement to follow and optionally a next question answered by the following statements. The terminal nodes of the tree are defined by a separate Leaf type.
- StoredKeyLeaf (Complex type, based on StoredKeyAbstractNode by extension): A leaf (terminal node) in a stored identification key, containing the lead statement to follow plus the result of the key. Most frequently the result will point to a taxon/class name, but it may also point to a subkey, or create a 'reticulation' directly into this or another key.
Leaves have no id attribute and cannot be referenced.
- StoredKeyRef (Complex type, based on AbstractLocalRef by extension): Refers to an entire stored identification key (e. g., if a key is referenced as a subkey from within another key)
- StoredKeyNodeRef (Complex type, based on AbstractLocalRef by extension): Refers to a node in a stored key (e. g., for reticulating keys)
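How a stored key is followed can be sketched in Python. The sketch models inner nodes (cf. StoredKeyNode) with their child leads and leaves (cf. StoredKeyLeaf) carrying a result; the class names, the `choose` callback, and the example taxa are hypothetical, and subkeys and reticulations are not modeled. Because a node may have any number of leads, the same loop serves dichotomous and multifurcating keys.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class KeyLeaf:
    statement: str
    result: str  # e.g. a taxon name


@dataclass
class KeyNode:
    statement: str
    question: str = ""  # optional question answered by the following leads
    leads: List[Union["KeyNode", KeyLeaf]] = field(default_factory=list)


def run_key(leads: List[Union[KeyNode, KeyLeaf]],
            choose: Callable[[List[str]], int]) -> str:
    """Follow the chosen lead at each step until a leaf is reached."""
    while True:
        picked = leads[choose([lead.statement for lead in leads])]
        if isinstance(picked, KeyLeaf):
            return picked.result
        leads = picked.leads


key = [
    KeyNode("Leaves simple", leads=[
        KeyLeaf("Flowers white", "Taxon A"),
        KeyLeaf("Flowers yellow", "Taxon B"),
    ]),
    KeyLeaf("Leaves compound", "Taxon C"),
]

observed = {"Leaves simple", "Flowers yellow"}


def choose(statements: List[str]) -> int:
    # Pick the first lead whose statement matches an observation.
    return next(i for i, s in enumerate(statements) if s in observed)


print(run_key(key, choose))  # Taxon B
```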
IDENTIFICATION KEYS END
Other basic types used by SDD (compare also the types used by UBIF)
Character rating (equivalent to DELTA weight, reliability, etc., but characters are scored taxon-specifically in descriptions rather than for all taxa)
- Rating (Complex type): A rating of 1 (low) to 5 (high), with 3 as central value, the topic that is being rated, plus an optional indication whether inherited (= calculated based on related definitions) or defined directly.
- RatingSet (Complex type, based on Set by extension): A collection of ratings to rate characters for convenience, etc. This is especially relevant during interactive identification to rank the remaining characters for discriminative power and convenience.
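The Rating and RatingSet descriptions suggest how ratings might drive character ranking during interactive identification. The following Python sketch assumes hypothetical topic labels and treats a missing rating as the central value 3; the averaging rule is an illustration only, not a method defined by SDD.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rating:
    rating: int              # 1 (low) to 5 (high); 3 is the central value
    topic: str               # what is being rated (hypothetical labels below)
    inherited: bool = False  # True if calculated from related definitions

    def __post_init__(self) -> None:
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


def rank_characters(ratings: Dict[str, List[Rating]], topics: List[str]) -> List[str]:
    """Order character ids by their mean rating over the given topics.
    A topic without a rating counts as the central value 3 (assumption)."""

    def score(char_id: str) -> float:
        by_topic = {r.topic: r.rating for r in ratings[char_id]}
        return sum(by_topic.get(t, 3) for t in topics) / len(topics)

    return sorted(ratings, key=score, reverse=True)


ratings = {
    "c1": [Rating(5, "convenience"), Rating(2, "reliability")],  # mean 3.5
    "c2": [Rating(4, "convenience"), Rating(4, "reliability")],  # mean 4.0
    "c3": [Rating(1, "convenience")],                           # mean 2.0
}
print(rank_characters(ratings, ["convenience", "reliability"]))  # ['c2', 'c1', 'c3']
```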
Special types for natural language wordings:
- AbstractVocabularyBase (Abstract Complex type, based on AbstractObject by extension): Base of modifiers, stat. meas. elaborations, and categorical character states (both referrable concept and local character state definitions).
Any use of a modifier or character state in descriptions is a reference to an object derived from this class.
- NatLangPhraseString (Complex type, based on LongStringL by extension): A text element used to define wordings for natural language output. Currently the type only adds a role attribute. Further attributes may be required if the handling of leading and trailing blanks should not work in an interoperability context (e. g., attributes like BlankBefore / BlankAfter of type BooleanTripleState?).
- NaturalLanguagePhraseSet (Complex type, based on Set by extension)
(Generated on 23. May 2006 by DiversitySchemaTools Version 0.5. Copyright (c) G. Hagedorn 2006.)