NaturalLanguageDescriptions

GregorHagedorn - Sat Jun 13 2009 - Version 1.14

SDD Part 0: Introduction and Primer to the SDD Standard

2.3 Natural language descriptions.

2.3.1 Traditional natural language descriptions.

Natural language descriptions (Box 2.3.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be short, simple and written in plain language, as in a popular field guide, or long, highly formal and full of specialised terminology, as in a taxonomic monograph or other treatment.

Box 2.3.1 - Typical natural language descriptions

Red Knot (Calidris canutus)
Stout wader with bill same length as head, crown unstreaked, narrow white bar in wing, pale rump with grey barring, shortish olive legs. Non-breeding: grey above with narrow pale edging to feathers, pale eyebrow, smudged sides to neck with faint spotting. Juvenile: feathers of back edged white with dark subterminal bar, breast more heavily spotted pale buff and flanks barred, crown faintly streaked. Breeding: rufous underparts, feathers of back rufous patterned with black. Voice: 'knut-knut', 'nyui', high-pitched 'toowit-wit'.

from Slater, P., Slater, P. & Slater, R. (2001) The Slater Field Guide to Australian Birds (Reed New Holland: Sydney)

Discaria pubescens (Brongn.) Druce
Rigid, spreading shrub to c. 1 m high and wide; stems glabrous. Leaves soon deciduous, c. oblong, to 10 mm long, 3 mm wide, obtuse or minutely mucronate within an apical notch, margins minutely toothed, surfaces glabrous or a few hairs present near tip; stipules dark reddish-brown, c. 1 mm long, often shallowly joined around the node, pubescent on inner face; spines stout, 1.5-4 cm long. Flowers white, solitary or in few-flowered axillary cymes, sometimes congested on short apical shoots; pedicels 2-3 mm long; hypanthium c. 1.5 mm long; sepals somewhat spreading, 1-1.5 mm long; petals attached at throat of hypanthium, c. 1 mm long; stamens subequal to and weakly hooded by petals; disc prominent, lining base of hypanthium, obscurely 5-angled; style minute. Capsule prominently 3-lobed, 4-5 mm diam., the valves separating incompletely at maturity and splitting dorsally and medially.

from Walsh, N.G. (1999) Rhamnaceae, in N.G.Walsh & T.J.Entwisle, Flora of Victoria Volume 4, Dicotyledons, Cornaceae to Asteraceae (Inkata Press: Melbourne)

SDD supports two ways of providing natural language descriptions:

  • Descriptions may be produced elsewhere and simply stored within an SDD instance document. These are termed "authored natural language descriptions".
  • Descriptions may be generated from data and text snippets held within the SDD instance document. These are termed "marked up natural language descriptions".

2.3.2 Authored natural language descriptions.

Authored natural language descriptions are descriptions written by hand, either within an application or imported into one, including legacy descriptions taken from existing publications. Within SDD, "authored" descriptions must never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated. Both "authored" and "generated" descriptions may contain markup (data supplied from a coded data source), but markup is not required. All natural language descriptions are nested within the <NaturalLanguageDescriptions> element inside <Dataset>.
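The overwrite rule above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory record (`NLDescription`, carrying an `authored` flag) and a hypothetical reporting step (`run_reporting`); how an SDD instance document actually records whether a description is authored or generated is not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NLDescription:
    """Hypothetical in-memory record of one natural language description.

    The 'authored' flag is an assumption made for this sketch only; it is
    not SDD's own encoding of the authored/generated distinction.
    """
    desc_id: str
    text: str
    authored: bool  # True = written by hand or imported from a publication


def run_reporting(descriptions: list[NLDescription],
                  generate: Callable[[NLDescription], str]) -> list[NLDescription]:
    """Regenerate the text of generated descriptions; keep authored ones as they are."""
    result = []
    for d in descriptions:
        if d.authored:
            # Authored text must never be overwritten by the reporting process.
            result.append(d)
        else:
            # Generated text may be refreshed from the underlying data.
            result.append(NLDescription(d.desc_id, generate(d), authored=False))
    return result
```

Passing in a `generate` callable keeps the rule (which descriptions may change) separate from the question of how new text is produced.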

A natural language description requires only two essential items: the names of the taxa being described, and the descriptions themselves.

The basic structure of the natural language descriptions in a simple SDD instance document is shown in Example 2.3.2.

Example 2.3.2 - Authored natural language descriptions

		<NaturalLanguageDescriptions>
			<NaturalLanguageDescription id="nat1">
				<Representation>
					<Label>Acalypha L.</Label>
				</Representation>
				<Scope>
					<TaxonName ref="t1"/>
				</Scope>
				<NaturalLanguageData>
					<Text>Herbs, shrubs; or trees, monoecious or rarely dioecious.
    Leaves alternate, margins usually dentate or crenate. Flowers small,
    males and females in separate axillary spikes or females solitary in
    separate axils or one or more at or near base of male spikes; male
    flowers clustered in axillary spikes with small bract under each cluster,
    perianth of 4 segments, glands absent, stamens 8 or rarely 8-16 inserted
    on a raised central receptacle, filaments free; female flowers 1-4
    together within a leafy bract, bracts solitary or in spikes,
    perianth of 3 segments, rarely 4, styles distinct, finely branched.
    Fruits capsules.</Text>
				</NaturalLanguageData>
			</NaturalLanguageDescription>
			<NaturalLanguageDescription id="nat2">
				<Representation>
					<Label>Acalypha australis L.</Label>
				</Representation>
				<NaturalLanguageData>
					<Text>Herb up to 30 cm tall, stems and leaves often pink or red. 
    Leaves with petioles 1-2 cm long; blades ovate or subrhomboid, apex 
    acuminate, base acute or obtuse, margin serrulate-crenate, 2-6 cm X 1-3.5 cm. 
    Spikes short, 1-3 per axil, peduncles ca 0.5-1 cm long or longer; bracts up
    to ca 1.5 cm long.</Text>
				</NaturalLanguageData>
			</NaturalLanguageDescription>
		</NaturalLanguageDescriptions>
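As a rough illustration of how software might read this structure, the following Python sketch uses the standard library's `xml.etree.ElementTree` to extract each description's id, label, taxon references and text. It works on a shortened copy of Example 2.3.2 and assumes, as in the example, that no XML namespace is declared; a complete SDD instance document declares one, and the element paths would then need a namespace mapping. The function name `read_descriptions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Example 2.3.2 (description texts abbreviated).
SDD_FRAGMENT = """<NaturalLanguageDescriptions>
  <NaturalLanguageDescription id="nat1">
    <Representation><Label>Acalypha L.</Label></Representation>
    <Scope><TaxonName ref="t1"/></Scope>
    <NaturalLanguageData><Text>Herbs, shrubs; or trees, monoecious or rarely dioecious.
      Leaves alternate, margins usually dentate or crenate.</Text></NaturalLanguageData>
  </NaturalLanguageDescription>
  <NaturalLanguageDescription id="nat2">
    <Representation><Label>Acalypha australis L.</Label></Representation>
    <NaturalLanguageData><Text>Herb up to 30 cm tall, stems and leaves often pink or red.</Text></NaturalLanguageData>
  </NaturalLanguageDescription>
</NaturalLanguageDescriptions>"""


def read_descriptions(xml_text: str) -> list[dict]:
    """Return id, label, taxon refs and whitespace-normalised text per description."""
    root = ET.fromstring(xml_text)
    result = []
    for nld in root.findall("NaturalLanguageDescription"):
        text = nld.findtext("NaturalLanguageData/Text", default="")
        result.append({
            "id": nld.get("id"),
            "label": nld.findtext("Representation/Label"),
            # <Scope> is missing from nat2, so this list may be empty.
            "taxon_refs": [t.get("ref") for t in nld.findall("Scope/TaxonName")],
            # Collapse the line breaks and indentation inside <Text>.
            "text": " ".join(text.split()),
        })
    return result


for d in read_descriptions(SDD_FRAGMENT):
    print(d["id"], d["label"], d["taxon_refs"])
# nat1 Acalypha L. ['t1']
# nat2 Acalypha australis L. []
```

Because the second description has no <Scope>, its list of taxon references comes back empty, so a reader of this kind should not assume that every description is linked to a taxon name.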

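The ref attribute of <TaxonName> within <Scope> (ref="t1" in the example) points to a taxon name defined elsewhere in the document using the <TaxonNames> element. For more information, see the topic Defining taxon names.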

Note that taxa can also be arranged into hierarchies. See the topic Defining taxon hierarchies for more information.

The <Representation> element provides a label for the description. This may be useful if the instance document includes multiple descriptions for different purposes.

<Scope> describes the taxon or set of taxa to which the description applies.

The <NaturalLanguageData> element contains the text of the natural language description.
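The three child elements described above can also be written out programmatically. This Python sketch, again using `xml.etree.ElementTree` and a hypothetical helper `make_description`, builds one description in the same layout as Example 2.3.2. It omits the namespace declaration and the enclosing <Dataset>, so its output is a fragment rather than a complete instance document.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_description(desc_id: str, label: str, text: str,
                     taxon_refs: Optional[list[str]] = None) -> ET.Element:
    """Build one <NaturalLanguageDescription> following the layout of Example 2.3.2."""
    nld = ET.Element("NaturalLanguageDescription", id=desc_id)

    # <Representation>: a human-readable label for the description.
    rep = ET.SubElement(nld, "Representation")
    ET.SubElement(rep, "Label").text = label

    # <Scope>: references to taxon names defined elsewhere (under <TaxonNames>).
    if taxon_refs:
        scope = ET.SubElement(nld, "Scope")
        for ref in taxon_refs:
            ET.SubElement(scope, "TaxonName", ref=ref)

    # <NaturalLanguageData>: the text of the description itself.
    data = ET.SubElement(nld, "NaturalLanguageData")
    ET.SubElement(data, "Text").text = text
    return nld


container = ET.Element("NaturalLanguageDescriptions")
container.append(make_description("nat1", "Acalypha L.",
                                  "Herbs, shrubs; or trees, monoecious or rarely dioecious.",
                                  ["t1"]))
ET.indent(container)  # pretty-print (Python 3.9+)
print(ET.tostring(container, encoding="unicode"))
```

Leaving out <Scope> when no taxon references are given mirrors the second description in Example 2.3.2; whether a particular application should allow that is a separate decision.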

2.3.3 Marked up natural language descriptions.

Marking up natural language descriptions allows coded (matrix) data to be rendered as natural language text, and allows character and state names to be reworded for inclusion in that text. As noted in section 2.3.2, "authored" descriptions must never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated; both kinds may carry markup, but neither needs to. The SDD standard can store descriptions with partial markup, resulting from any mixture of automatic markup by a processor and manual markup.
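The idea of rendering coded data as text, with character and state names reworded along the way, can be sketched in a few lines of Python. The dictionaries and the `render` function below are purely hypothetical stand-ins for matrix data and wording overrides; they do not reproduce SDD's own markup elements, which are not shown in this section.

```python
# Hypothetical coded data for one taxon: character -> recorded states.
# (Illustrative structures only; this is not SDD's markup syntax.)
coded = {
    "leaf arrangement": ["alternate"],
    "flower sex": ["monoecious", "dioecious"],
}

# Hypothetical overrides: how character and state names should read in text.
character_wording = {"leaf arrangement": "Leaves", "flower sex": "Plants"}
state_wording = {"dioecious": "rarely dioecious"}


def render(coded: dict[str, list[str]],
           character_wording: dict[str, str],
           state_wording: dict[str, str]) -> str:
    """Turn coded data into sentences, applying any wording overrides."""
    phrases = []
    for char, states in coded.items():
        # Fall back to the raw character name when no override exists.
        lead = character_wording.get(char, char.capitalize())
        words = [state_wording.get(s, s) for s in states]
        phrases.append(f"{lead} {' or '.join(words)}")
    if not phrases:
        return ""
    return ". ".join(phrases) + "."


print(render(coded, character_wording, state_wording))
# Leaves alternate. Plants monoecious or rarely dioecious.
```

Keeping the wording overrides separate from the coded data reflects the point made above: the same data can be reported with different character and state names without altering the data itself.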

-- Main.DonovanSharp - 01 Jun 2006

  • Long, expanded version of the natural language description (NLD) structures:
    NaturalLanguageMarkupLong.png