UBIF_EnumLib.xsd schema file overview
(Version: Unified Biosciences Information Framework (UBIF) 1.1)
TDWG working group: Structure of Descriptive Data (SDD)
Introduction
This document gives an overview of the schema components present in a single schema file, similar to the entry view provided by graphical schema editors. It documents only the root level annotations and components (elements, global attributes, simple and complex types, and groups). The definition of the components listed here is documented separately (hyperlinking could not yet be implemented).
Because the UBIF schema is designed as a type library, complex types represent class definitions and most schema files contain only a single root-level element.
Please see the schema documentation resource directory for schema overviews of other files and detailed component documentation.
Schema file content
The following content is generated automatically from the documentation inside the schema file:
Unified Biosciences Information Framework (UBIF) XML schema. This part provides controlled vocabularies (= enumerated values) both specific to biosciences and for general topics required in the context of biosciences data. See the main UBIF.xsd file for complete information, copyright and licensing.
Copyright © 2006 TDWG (Taxonomic Databases Working Group, www.tdwg.org). See the file UBIF_(c).xsd for authorship and licensing information.
Enumerations to support interoperability
Internal formatting note: Annotations of individual enumerated values should be written as "short label" + " -- " + "detailed information" or "[abbreviation]" + " -- " + "short label" + " -- " + "detailed information". The abbreviation must be enclosed in square brackets []. Additional specification information may be given for some enumerations in xs:appinfo. An XSLT script transforms such schema annotations into a data document that can be used directly in user interfaces.
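The annotation convention described above can be sketched as a small parser. This is only an illustration in Python, not part of the UBIF tools; the names EnumAnnotation and parse_annotation and the sample strings are hypothetical.

```python
from typing import NamedTuple, Optional

SEPARATOR = " -- "  # separator required by the formatting note


class EnumAnnotation(NamedTuple):
    abbreviation: Optional[str]  # text inside [], or None if absent
    label: str                   # short label
    details: str                 # detailed information, may be empty


def parse_annotation(text: str) -> EnumAnnotation:
    """Split an annotation into abbreviation, short label and details."""
    parts = [part.strip() for part in text.strip().split(SEPARATOR)]
    abbreviation = None
    # An abbreviation is recognized only when enclosed in square brackets.
    if parts[0].startswith("[") and parts[0].endswith("]"):
        abbreviation = parts[0][1:-1]
        parts = parts[1:]
    label = parts[0] if parts else ""
    # Anything after the label is treated as detailed information.
    details = SEPARATOR.join(parts[1:])
    return EnumAnnotation(abbreviation, label, details)


# Hypothetical annotation strings, written in the two documented forms.
with_abbreviation = parse_annotation("[k] -- kilo -- factor 1000")
without_abbreviation = parse_annotation("ltr -- left to right")
```

An annotation consisting of a label only yields an empty details field rather than an error, which seems the more forgiving choice for user-interface code.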
An important feature of this schema file is that these enumerations may be turned into data, many of them including extra specification data. Please see UBIF-EnumerationTools for further information. Using the data files in application development rather than hardcoding enumerations in code enables simple adaptation to future versions of UBIF.
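As a rough sketch of treating enumerations as data, the following Python reads enumerated values and their documentation from a schema fragment instead of hardcoding them. It is a stand-in for illustration only and does not reproduce the XSLT-based UBIF-EnumerationTools; the schema fragment, its annotation texts, and the function name load_enumerations are assumptions.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Assumed fragment for illustration; the real schema file contains more
# types and richer annotations.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="TextDirectionalityEnum">
    <xs:restriction base="xs:QName">
      <xs:enumeration value="ltr">
        <xs:annotation><xs:documentation>left to right</xs:documentation></xs:annotation>
      </xs:enumeration>
      <xs:enumeration value="rtl">
        <xs:annotation><xs:documentation>right to left</xs:documentation></xs:annotation>
      </xs:enumeration>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def load_enumerations(xsd_text: str) -> dict:
    """Map each root-level simple type name to {value: documentation}."""
    root = ET.fromstring(xsd_text)
    result = {}
    for simple_type in root.findall("xs:simpleType", XS):
        values = {}
        for enum in simple_type.findall("xs:restriction/xs:enumeration", XS):
            doc = enum.findtext(
                "xs:annotation/xs:documentation", default="", namespaces=XS
            )
            values[enum.get("value")] = doc.strip()
        if values:  # skip simple types without enumeration facets
            result[simple_type.get("name")] = values
    return result


enumerations = load_enumerations(SAMPLE_XSD)
```

An application built this way only needs a new data file when a later UBIF version adds or renames values.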
a) Generic enumerations
- RevisionStatusEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the revision status as assessed by creators or editors of objects. It may apply to an entire data set as well as to individual objects (a specimen, a taxon name, a description, etc.). Exact semantics are defined only for the first and the last two categories. The semantics of the intermediate levels (1 to 5) may be chosen freely by the user (and associated with the actual, project-dependent workflow, e.g. 'draft', 'review', 'approved', 'stable'). The added semantics should, however, conserve the order of revision status values. If, for example, three revision steps are planned (2 intermediate, reaching FullyRevised on third), it is recommended to use RevisionLevel2, RevisionLevel4, FullyRevised.
- ExpertiseLevelEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the expertise expected or required from human consumers of data or services. Values are restricted to integer values from 0 to 5. 0 is defined as unspecified level, and 1 to 5 indicates expertise from schoolchildren to taxonomic expert. See the description of the values for recommendations for interpreting and choosing the expert level.
- ResourceTypeEnum (Simple type, based on xs:QName by restriction): This enumeration is identical with the DCMI Type Vocabulary (http://dublincore.org/documents/dcmi-terms/, as of 6/2004), except that an additional type "Other" has been added. Its purpose is to provide a framework of broad media or resource type terms, without the technical detail provided by the large number of MIME types. The annotations are largely based on those from the DublinCore metadata initiative vocabulary.
- TelephoneDeviceEnum (Simple type, based on xs:QName by restriction): Kind of phone number: voice, fax, mobile, pager, modem. These enumerated values are identical with vCard 3.0 flags (several of which can be added to a single phone number; to represent this in the UBIF interface duplicate the phone number itself!)
- Rating1to5Enum (Simple type, based on xs:QName by restriction): Enumeration restricted to integer values from 1 to 5, indicating an arbitrary rating (meaning, e. g., 1 = disagree strongly, 2 = rather disagree, 3 = neutral or undecided, 4 = rather agree, 5 = agree strongly). This enumeration is of limited usefulness and could be replaced by a restriction on integer, but using the enumeration the semantics of agreement/disagreement or positive/negative rating can be communicated in a culture-neutral way (in German 1 is generally considered best and 5 worst, in English 1 worst, 5 best...).
- TextDirectionalityEnum (Simple type, based on xs:QName by restriction): Values are ltr (left to right), rtl (right to left). Compare CSS2 and the XHTML 2.0 bi-directional text module. Note: A future UBIF version may also include lro/rlo = left-right-override/right-left-override, if this is found to be necessary.
- StringFormattingTypeEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing whether and which kind of inline formatting may be contained in a text literal (plain, inline-entity-encoded, html-level-entity-encoded). In the absence of an attribute providing further specification, most UBIF text elements may contain "inline" entity-escaped formatting! (Other standards: ~=atom:title/@type)
- MeasurementUnitPrefixEnum (Simple type, based on xs:QName by restriction): Multiplication factor prefixes used in the scientific SI system (T, G, M, k, h, c, m, µ, n, p, f, a).
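For the prefixes listed under MeasurementUnitPrefixEnum, a conversion to the unprefixed unit might look as follows in Python. The powers of ten are the standard SI definitions; the dictionary and function names are hypothetical, and it is assumed that the enumerated values are spelled exactly like the prefix symbols above.

```python
# Power of ten for each SI prefix symbol named in the enumeration.
SI_PREFIX_EXPONENTS = {
    "T": 12, "G": 9, "M": 6, "k": 3, "h": 2,
    "c": -2, "m": -3, "µ": -6, "n": -9, "p": -12, "f": -15, "a": -18,
}


def to_base_unit(value: float, prefix: str) -> float:
    """Convert a value given with an SI prefix to the unprefixed unit."""
    return value * 10 ** SI_PREFIX_EXPONENTS[prefix]


# 2.5 km expressed in metres, 3 mm expressed in metres.
kilometres_in_metres = to_base_unit(2.5, "k")
millimetres_in_metres = to_base_unit(3, "m")
```

Because the negative powers produce floating-point values, comparisons on the results should use a tolerance rather than exact equality.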
b) Statistical and data analysis categories
- StatisticalMeasurementScaleEnum (Simple type, based on xs:QName by restriction): In statistical analysis it is often vital to know some basic properties of the values that are being analyzed. Some of these properties can be summarized in the form of a measurement scale. Higher scales can always be analyzed under the assumptions of a lower scale (ordinal data can be analyzed as nominal, ratio as interval).
- QuantitativeMeasurementScaleEnum (Simple type, based on StatisticalMeasurementScaleEnum by restriction): Those values from StatisticalMeasurementScaleEnum addressing numerical data ('ratio' and 'interval').
Note: Occasionally "integer" or "cardinal" (versus real numbers) are also considered part of the measurement scale. This should be avoided because: a) All combinations of interval/ratio and discrete/continuous are possible. b) The important distinction is whether a measurement is based on a continuous or discrete scale. Although in most cases this is equivalent with integer versus real numbers, it is not necessarily so. An ANOVA will report false significance not only when values come from "1, 2, 3 and 4", but also when they come from "1.2, 2.4, 3.6 and 4.8".
- CategoricalMeasurementScaleEnum (Simple type, based on StatisticalMeasurementScaleEnum by restriction): Those values from StatisticalMeasurementScaleEnum addressing categorical data ('nominal' and 'ordinal').
- DataStatusEnum (Simple type, based on xs:QName by restriction): Data status values (examples: "unknown", "not applicable") identify standardized reasons why data are missing. Alternative names are 'missing data indicators', 'special states', 'Null-values'. The annotation labels of the values can be freely changed as long as the semantics are preserved.
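The rule stated for StatisticalMeasurementScaleEnum, that data on a higher scale can be analyzed under the assumptions of a lower one, can be sketched in Python as an ordered list. The ordering nominal, ordinal, interval, ratio is the conventional one; the function name and the lower-case spelling of the values are assumptions.

```python
# Measurement scales from lowest to highest.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Subsets corresponding to the two restricted enumerations.
CATEGORICAL_SCALES = {"nominal", "ordinal"}
QUANTITATIVE_SCALES = {"interval", "ratio"}


def can_analyze_as(actual: str, assumed: str) -> bool:
    """True if data on the 'actual' scale may be analyzed as 'assumed'."""
    return SCALE_ORDER.index(assumed) <= SCALE_ORDER.index(actual)


ratio_as_interval = can_analyze_as("ratio", "interval")    # allowed
ordinal_as_nominal = can_analyze_as("ordinal", "nominal")  # allowed
nominal_as_ordinal = can_analyze_as("nominal", "ordinal")  # not allowed
```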
Statistical methods
- UnivarAnyStatMeasureEnum (Simple type, based on xs:QName by restriction): Enumerated list of univariate statistical methods. The list is intended to be more complete than normally necessary at least in biological morphometrics. If you still miss some measures, please request additions in a future version of this schema. Note: No satisfying external ontology for statistical methods could be found; the statistics section of MathML 2.0 (statistics.xsd) seems rather incomplete!
- UnivarSimpleStatMeasureEnum (Simple type, based on UnivarAnyStatMeasureEnum by restriction): An enumeration of univariate statistical measures supported by UBIF (esp. used by SDD). Compare also UnivarParamStatMeasureEnum, containing further statistical measures that use an additional parameter (for percentage of percentile or confidence interval, etc.).
- UnivarParamStatMeasureEnum (Simple type, based on UnivarAnyStatMeasureEnum by restriction): An enumeration of parameterized univariate statistical measures supported by UBIF (esp. used by SDD). This enumeration is similar to UnivarSimpleStatMeasureEnum, but here a parameter value is supported. Abbreviation, Label and Details (within xs:documentation) may contain the string "{ParameterValue}". Labels are expected to become meaningful if this is replaced with the actual parameter value.
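The "{ParameterValue}" convention of UnivarParamStatMeasureEnum amounts to a plain string substitution. A minimal Python sketch follows; the label text is a made-up example, not one taken from the schema.

```python
PLACEHOLDER = "{ParameterValue}"


def fill_parameter(template: str, value) -> str:
    """Replace every occurrence of the placeholder with the actual value."""
    # str.replace is used instead of str.format so that other braces in
    # the documentation text are left untouched.
    return template.replace(PLACEHOLDER, str(value))


# Hypothetical label template for a percentile measure.
label = fill_parameter("{ParameterValue}% percentile", 95)
```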
c) Agent role codes
- AgentRoleEnum (Simple type, based on xs:QName by restriction): Provides codes for roles like author, editor, photographer, advisor, or copyright holder. The roles and their codes used here are based on http://www.loc.gov/marc.relators/ (as of 2004/6 available at http://dublincore.org/usage/meetings/2004/03/Relator-codes.html). For example, the enumerated code "aut" for author corresponds to http://www.loc.gov/marc.relators/aut. The DublinCore Agents group is considering using the same codes (see e. g. http://www.loc.gov/marc/dc/Agent-roles.html), but as of 2004/6 the DublinCore Agents subgroup did not yet agree on a Creator/Contributor refinement as qualified DublinCore. Note that the roles selected here are a subset of the roles that are available in MARC. See second annotation for reasons of not using a union-design, which would be easier to maintain!
- AgentCreatorContribRoleEnum (Simple type, based on AgentRoleEnum by restriction): Union of AgentCreatorRoleEnum and AgentContributorRoleEnum, but no Owner roles. Technical note: currently this is modeled somewhat strangely as an xml-schema restriction of AgentRoleEnum (= union of all three basic role groups). This is a workaround for a problem in Xerces 2.6.2 detected by Jacob Asiedu! We hope that in the future the whole 'xs:restriction' below can be replaced again with the much more straightforward 'xs:union memberTypes="AgentCreatorRoleEnum AgentContributorRoleEnum"/' (= union of the two intended AgentRole enumerations).
- AgentCreatorRoleEnum (Simple type, based on AgentRoleEnum by restriction): Enumeration of roles supported for creator agents. See AgentRoleEnum for information about the MARC relator codes.
- AgentContributorRoleEnum (Simple type, based on AgentRoleEnum by restriction): Enumeration of supported roles for contributor agents. See AgentRoleEnum for information about the MARC relator codes.
- AgentOwnerRoleEnum (Simple type, based on AgentRoleEnum by restriction): Enumeration of supported roles for owner/copyright agents. See AgentRoleEnum for information about the MARC relator codes.
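The correspondence between an enumerated role code and its MARC relator URI, as described for AgentRoleEnum, is a simple concatenation. The Python sketch below uses the base URI quoted above; only the code "aut" is confirmed by this document, and the function name is hypothetical.

```python
MARC_RELATORS_BASE = "http://www.loc.gov/marc.relators/"


def relator_uri(code: str) -> str:
    """Build the MARC relator URI for an agent role code."""
    if not code:
        raise ValueError("role code must not be empty")
    return MARC_RELATORS_BASE + code


author_uri = relator_uri("aut")
```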
d) Label/Title/Detail roles, Media representation roles, Link relationships, IPR Statement roles
- LabelRoleEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the kind of label text. These are currently highly constrained, but either additional values or free extensibility (by union of this enum type with xs:anyURI) are expected for future releases of UBIF.
- DetailRoleEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the kind of long 'detail' text. In contrast to label text, detail text to which these values are applied is not restricted in length. Note that the text length of detail text elements is not limited by the schema. It is, however, recommended that the length does not exceed 30000 characters because longer text may lead to interoperability problems. Currently the enumerated values are highly constrained, but either extension or free extensibility (by union of this enum type with xs:anyURI) is expected for future releases of UBIF. Note that a 'description' is usually presented instead of the object, and a 'caption' always together with the object. An 'abstract' may be presented instead of or together with the object.
- MediaRepresentationRoleEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the object representation role a media object (image, audio/video, rich text) may have.
- LinkingRelEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the semantics of a Link uri. The vocabulary includes all values from IRPStatementRoleEnum. These are currently highly constrained, but future versions may add either more values or free extensibility (union of this type with xs:anyURI).
- IRPStatementRoleEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing the kind of IPR (= intellectual property right) statements.
e) Codes expressing sexual status (SexStatusEnum is a union of the following types)
- SexStatusEnum (Simple type, based on xs:QName by restriction):
Codes for sex value in humans (clinical status) or animals. The codes are largely based on those defined in DICOM (Digital Imaging and Communications in Medicine, http://medical.nema.org/, Coding Scheme Designator DCM Version 01, PS3.16 Annex B, CID 7455) and ASTM E1633 (= "Standard Specification for Coded Values Used in the Electronic Health Record. Document Number: ASTM E1633-02a. ASTM International, 10-Nov-2002, 76 pages"). Additional codes specific to biology have been added.
An alternative standard is ISO 5218, which provides only four codes: "0 = Not known, 1 = Male, 2 = Female, 9 = Not specified". The difference between 0 and 9 is: "(0) implies that the sex of the person is not provided in the personal details i.e. the data has not been supplied and sex cannot be ascertained from the data provided"; "(9) implies that the sex of the person cannot be determined for physical reasons, e. g. a new born baby". ISO 5218 contains fewer and less intuitive codes. For biological purposes many codes would have to be arbitrarily added. G. Hagedorn, 10. August 2004
- BasicSexStatusEnum (Simple type, based on SexStatusEnum by restriction): Contains basic sex type codes, sufficient for recording human sexes in most administrative contexts (used, e. g., in the Agent type data interface)
- AdditionalSexStatusEnum (Simple type, based on SexStatusEnum by restriction): Contains codes in addition to those defined in BasicSexStatusEnum that are necessary for animals and clinical sex descriptions of humans. Additional codes "S, I, HM, HF, HT" have been added to those defined in DICOM/ASTM E1633. On the other hand, the DICOM/ASTM E1633 codes "MP = male pseudohermaphrodite" and "FP = female pseudohermaphrodite" are omitted here because they are limited to human sex and express a politically contentious perspective (see http://en.wikipedia.org/wiki/Pseudohermaphrodite). See the UBIF type SexStatusEnum for a union of the enumerated values in this type and those in BasicSexStatusEnum.
f) Enumerations specific to the biological domain
- TaxonHierarchyTypeEnum (Simple type, based on xs:QName by restriction): Defines the type of a taxon hierarchy (list of enumerated values to support application interoperability).
- IdentificationCertaintyEnum (Simple type, based on xs:QName by restriction): Identifications of a specimen (object/unit) as belonging to a taxon concept may be uncertain. This is especially important in biology, where identification qualifiers like 'cf.' or 'aff.' are often used as part of the scientific name. The following enumerated list provides general categories not restricted to scientific organism names. Note: In biology additional information is often expressed through the choice of placement of the certainty qualifier. For example, 'Echinonema ferruginea var. campestris' may be qualified as 'cf. Echinonema ferruginea var. campestris', 'Echinonema cf. ferruginea var. campestris', 'Echinonema ferruginea cf. var. campestris'. The first presumably means that the entire name is uncertain, but the infraspecific name may be appropriate; the second indicates that the genus is certain and the species uncertain; and the final that the species is certain and only the infraspecific rank is uncertain. To achieve this level of expressiveness, it is recommended that an additional data element 'IdentificationUncertainTaxonomicRank' of type TaxonomicRankEnum be combined with an element of IdentificationCertaintyEnum. IdentificationUncertainTaxonomicRank should be optional and omitted to express that an identification is uncertain but the rank of the uncertainty is not known (e. g. in 'Echinonema ferruginea?'). In ABCD 1.44 a special rank with enumeration beforeName, beforeFirstEpithet, beforeSecondEpithet is used instead.
- NomenclaturalTypeStatusOfSpecimensEnum (Simple type, based on xs:QName by restriction):
This list is a first version of a constrained vocabulary to express typifying relations between taxonomic names and specimens (objects/units preserved in collections). Beyond those type categories explicitly governed by nomenclatural codes (Zoology, Botany, Bacteriology, Virology), the list also includes some additional type status terms. These categories may be helpful when interpreting the original circumscription (topotypes, ex-types), but do not have the same binding status as terms governed by the nomenclatural codes. The enumeration attempts to strike a balance between listing all possible terms, and remaining comprehensible. In general, including too many terms was considered less problematic than omitting terms. Applications may easily select a subset for presentation in their user interface.
This list is intended as a first version and it is hoped that in the review process through TDWG it will achieve sufficient maturity to be truly useful. It is expected that over time revisions will have to be made. Please use the WIKI (http://wiki.tdwg.org/twiki/bin/view/UBIF/NomenclaturalTypeStatusOfSpecimensDiscussion) to discuss the current list and the lists of synonymous, doubtful, or excluded type terms provided therein.
Some background information: A type provides the objective standard of reference to determine the application of a taxon name. The type status of a specimen is only meaningful in combination with the name that is being typified (a specimen may have been designated type for multiple names in different publications). The type status of an object may be designated in the original description of a scientific name (original designation), or - under rules laid out in the respective nomenclatural codes - at a later time (subsequent designation). -- For taxa above species rank the type is always a lower rank taxon (e. g., species for genus, genus for family). The type terms for this situation are not included in the enumeration. Ultimately, typification of all taxa goes back to physical type specimens, but this should not be recorded as such in data sets. The indirect type reference in higher taxa means that typification changes to the lower taxon automatically affect the higher taxon.
The exact definitions of type status differ between nomenclatural codes (ICBN, ICZN, ICNP/ICNB, etc.). The term definitions are intended to be informative and generally applicable across the different codes. They should not be interpreted as authoritative; in nomenclatural work the exact definitions in the respective codes have to be consulted. A duplication of status codes (bot-holo, zoo-holo, bact-holo, etc.) is not considered desirable or necessary. Since the application of the type status terms is constrained by the relationship of the typified name with a specific code, the exact definition can always be unambiguously retrieved.
The following publications have been consulted to determine the number of type terms that should be included and to prepare the semantic definitions:
- Nomenclatural Glossary for Zoology (January 18 2000; ftp://ftp.york.biosis.org/sysgloss.txt; verified 17. June 2004)
- ICBN St. Louis Code (http://www.bgbm.fu-berlin.de/iapt/nomenclature/code/SaintLouis/0013Ch2Sec2a009.htm; verified 17. June 2004)
- Draft BioCode 4th version (Greuter et al., 1997; http://www.rom.on.ca/biodiversity/biocode/biocode1997.html)
- Glossary of 'type' terminology (Ronald H. Petersen; http://fp.bio.utk.edu/mycology/Nomenclature/nom-type.htm)
- Dictionary of Ichthyology (Brian W. Coad and Don E. McAllister, 2004; http://www.briancoad.com/Dictionary/introduction.htm)
- A useful resource that was not available when writing this proposal might be: Hawksworth, D.L., W.G. Chaloner, O. Krauss, J. McNeill, M.A. Mayo, D.H. Nicolson, P.H.A. Sneath, R.P. Trehane and P.K. Tubbs. 1994. A draft Glossary of terms used in Bionomenclature. (IUBS Monogr. 9) International Union of Biological Sciences, Paris. 74 pp.
"not a type" was added from the enumeration published in TaxonConceptSchema v 0.8 by J. Kennedy & Robert Kukla in October 2004
Dr. Miguel A. Alonso-Zarazaga and Dr. Walter Gams are thanked for review and help. - Gregor Hagedorn, 13.7.2004-17.11.2004
- NomenclaturalStatusEnum (Simple type, based on xs:QName by restriction): Controlled vocabulary expressing nomenclatural status of a biological taxon name. Note: This enumeration urgently needs revision! Enumeration of possible values for nomenclatural status. (Source: initial LinneanCore.)
- NomenclaturalCodesEnum (Simple type, based on xs:QName by restriction): Enumeration of nomenclatural codes under which a name is considered valid. (Source: comparison of enumerations in ABCD 1.49 and first LinneanCore draft.) - Gregor Hagedorn
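The qualifier placement discussed under IdentificationCertaintyEnum above can be illustrated with a few lines of Python. This is only a sketch: the function name is hypothetical, and describing the placement as an index into the space-separated parts of the name is an assumption made for the example, not a UBIF or ABCD mechanism.

```python
def insert_qualifier(name_parts, position, qualifier="cf."):
    """Insert a certainty qualifier before the name part at 'position'."""
    parts = list(name_parts)  # copy, so the caller's list is unchanged
    parts.insert(position, qualifier)
    return " ".join(parts)


name = ["Echinonema", "ferruginea", "var.", "campestris"]

entire_name_uncertain = insert_qualifier(name, 0)
species_uncertain = insert_qualifier(name, 1)
infraspecific_uncertain = insert_qualifier(name, 2)
```

The three results reproduce the three qualified forms of 'Echinonema ferruginea var. campestris' quoted in the annotation.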
TaxonomicRankEnum is the superset of the values in the types following it:
- TaxonomicRankEnum (Simple type, based on xs:QName by restriction):
Enumerated codes to express the rank of a taxon (scientific organism name) in a taxonomic hierarchy. The list is intended to be interoperable between name providers for bacteria, viruses, fungi, plants, and animals. It is not assumed that in each taxonomic group all ranks have to be used. Individual applications may select appropriate subsets (which may be based on information given inside the enumerated values, see Specifications/BioCode-, Botany-, Zoology-, and BacteriaStatus). The enumeration attempts to strike a balance between listing all possible rank terms, and remaining comprehensible. For example, the "infra-" ranks specifically mentioned in BioCode have been included (although very rarely used), but the additional intermediate zoological ranks (micro, nano, pico, etc.) are not included. The selection of infraspecific ranks (some informal ranks, esp. from bacteriology, may be missing!) probably needs some discussion. However, it is believed that this list may help to start developing data sets that can easily be integrated across the barriers of language and taxonomic traditions.
Not included in the list are the botanical "notho-" ranks, which are used to designate hybrids (nothospecies, nothogenus). It is assumed they can be generated from separate information that the taxon is a hybrid. ICBN §4.4 states: "The subordinate ranks of nothotaxa are the same as the subordinate ranks of non-hybrid taxa, except that nothogenus is the highest rank permitted".
The following publications have been consulted to determine the number of rank terms that should be included and to prepare the semantic definitions:
- The Berlin Taxonomic Information Model, MoReTax view (Berendsohn & al., http://www.bgbm.org/scripts/ASP/BGBMModel/Catalogues.asp?Cat=MT)
- DiversityTaxonomy model version 0.7 (G. Hagedorn & T. Gräfenhan 2002, http://www.diversitycampus.net/Workbench/Taxonomy/Model/InformationModels.html)
- ABCD version 1.44, types HigherTaxonRankType and RankAbbreviationType, by W. Berendsohn, reviewed by D. Hobern
- TaxCat2 - Database of Botanical Taxonomic Categories by Jörg Ochsmann, IPK Gatersleben; http://mansfeld.ipk-gatersleben.de/TaxCat2/default.htm
Many thanks for review and help go to Dr. Walter Gams. - Gregor Hagedorn, 13.7.2004-17.11.2004.
Note: the list of all ranks is implemented as a union of all following rank subsets. Note that although BioCode has been used to define the partition into subsets, the ranks are not limited to BioCode but should be an interoperable superset of ranks used in Virology, Bacteriology, Botany and Zoology.
Technical note: It would be preferable to define the values in separate types and define this type as xs:union memberTypes="TaxonomicRankBelowSubspeciesEnum TaxonomicRankSpeciesGroupEnum TaxonomicRankGenusSubdivisionEnum TaxonomicRankGenusGroupEnum TaxonomicRankFamilySubdivisionEnum TaxonomicRankFamilyGroupEnum TaxonomicRankAboveSuperfamilyEnum". However, as explained in the annotation of AgentRoleEnum, Stylus Studio and Xerces have problems with type derivations involving union types. Therefore the current work-around was chosen.
- TaxonomicRankBelowSubspeciesEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "infra-subspecific", i.e. below the species group
- TaxonomicRankSpeciesGroupEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "species group", i.e. only species and subspecies
- TaxonomicRankGenusSubdivisionEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "subdivision of a genus", i.e. all ranks between genus and species group (i.e. not including subgenus and species)
- TaxonomicRankGenusGroupEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "genus group", i.e. infragenus to genus
- TaxonomicRankFamilySubdivisionEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "subdivision of a family", i.e. ranks between genus group and family group
- TaxonomicRankFamilyGroupEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "family group", i.e. infrafamily to superfamily
- TaxonomicRankAboveSuperfamilyEnum (Simple type, based on TaxonomicRankEnum by restriction): Subset of ranks; equivalent to BioCode "suprafamilial". This rank group includes all ranks higher than superfamily (class, phylum/division, kingdom, domain)
(Generated on 23. May 2006 by DiversitySchemaTools Version 0.5. Copyright (c) G. Hagedorn 2006.)