Network Activity D - Collections Standards

NA D is engaging and aiding the active participation by all European Countries in the delivery of information systems about their taxonomic collections. This system is bringing the information held by SYNTHESYS partners to all the user sectors that depend on organism-related information for their research and/or decision making purposes.

NA D builds on and actively incorporates the results of initiatives such as BioCASE and ENHSIN, which have laid the base for a European collection information service. It is taking the research work from these initiatives as the core of a practical, sustainable, Europe-wide information network for users of taxonomic information.

Aims and objectives

NA D will engage and aid the active participation by all European Countries in the delivery of information systems about their taxonomic collections. This system will bring the information held by SYNTHESYS partners to all the user sectors that depend on organism-related information for their research and/or decision making purposes.

NA D will build on and actively incorporate results of initiatives such as BioCASE and ENHSIN, which have laid the base for a European collection information service, and aims to take the research work from these initiatives as the core of a practical, sustainable, Europe-wide information network for users of taxonomic information.

There will also be interaction with global biodiversity informatics infrastructure, represented by GBIF, and related European projects, such as ENBI.

  • Extend the amount and increase the technical quality and availability of networked data resources
  • Pool, extend, and synchronise semantic definitions, standard data, and data standards
  • Develop User interfaces

Progress report

Extend the amount and increase the technical quality and availability of existing data

NA D runs two helpdesks to increase the technical quality and availability of data. The SYNTHESYS Technology Helpdesk supports collection holders who intend to use the BioCASE provider software to connect their collections to the BioCASE/GBIF network infrastructure. It helps with understanding the principles of the architecture, planning the work and preparing collection databases for publication, installing and configuring the provider software, troubleshooting, and mapping the collection database to the ABCD schema. It also bridges the gap between the collection holder and the GBIF secretariat in Copenhagen.

The number of BioCASE records in GBIF reached 16.5 million in 2008 (14% of the total records in GBIF), a figure partly credited to the SYNTHESYS Helpdesk. The Technology Helpdesk has provided support to institutions in Austria, Belgium, Denmark, Estonia, Germany, Hungary, Israel, Luxembourg, the Netherlands, Spain, Portugal, Slovakia, the UK, and Australia.

The Content Helpdesk checks data and helps providers to improve it. So far, 386 data sets comprising more than 12 million records have been checked, and the errors found have been communicated to provider IT personnel and the responsible biologists. The response to this feedback has been very positive and work is underway to correct the entries.

For both helpdesks contact: Jörg Holetschek, phone: +49 (0)30/838-50150, email: j.holetschek@bgbm.org

On the data provision side, the cache generation system was modified to allow nightly updates, and the slicing rules for the three caches currently maintained at the BGBM were updated. Alpha and beta versions of the new GBIF index were installed and the cache generator system was adapted to the new data model, which allowed the SYNTHESYS User interfaces to be switched to the production version of the new GBIF index. A generic query tool for the protocols used in the GBIF/SYNTHESYS context was developed and put into use. A first prototype is running, is already used in other BGBM projects, and is being actively improved.

Three GBIF mirror sites have been established in Madrid, Paris and Tervuren.

Pool, extend & synchronize semantic data definitions, standard data sources, and data standards

Work has been completed on linking the collection infrastructure to systems with name-based data, harvesting historical names from the collection network, and harmonising conceptual models that treat scientific organism names. The aim was to provide the base for concept-based User interfaces. The SYNTHESYS Transmission Engine calculates implicit concept relationships within a “Berlin model” database. The Transmission Engine can be tuned to give some references greater weight than others, and the break-off points at which a relationship is no longer considered valid can be fine-tuned. Examples of concept relationships generated by the Engine are: “contradiction”, “congruent”, “included in”, “includes”, and “overlapping”. These relationships are in turn used by the thesaurus for “fuzzy” searching, i.e. presenting not just collection data that match the search string, but also all data that are conceptually related.
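
The idea of deriving implicit relationships from explicit ones can be illustrated with a minimal Python sketch. It is not the Transmission Engine's actual algorithm: the composition table, the multiplication of reference weights, and the names COMPOSE, infer_relationships, reference_weight and cutoff are all assumptions made for illustration. The sketch chains two known relationships (A to B, B to C) into an implied one (A to C) and drops any result whose combined weight falls below the break-off point. Detection of “contradiction” is left out.

```python
# Hypothetical composition table: relation(A, B) followed by relation(B, C)
# implies relation(A, C). Only combinations that follow from plain set logic
# are listed; any other combination is treated as undetermined.
COMPOSE = {
    ("congruent", "congruent"): "congruent",
    ("congruent", "included in"): "included in",
    ("included in", "congruent"): "included in",
    ("included in", "included in"): "included in",
    ("congruent", "includes"): "includes",
    ("includes", "congruent"): "includes",
    ("includes", "includes"): "includes",
    ("congruent", "overlapping"): "overlapping",
    ("overlapping", "congruent"): "overlapping",
}


def infer_relationships(explicit, reference_weight, cutoff):
    """Derive implicit concept relationships from explicit ones.

    explicit: list of (concept_a, relation, concept_b, reference) tuples.
    reference_weight: dict mapping a reference to a weight (default 1.0).
    cutoff: inferred relationships weighted below this value are discarded.
    Returns a dict {(concept_a, concept_b): (relation, weight)}.
    """
    known = {}
    for a, rel, b, ref in explicit:
        known[(a, b)] = (rel, reference_weight.get(ref, 1.0))

    changed = True
    while changed:  # repeat until no new relationship can be derived
        changed = False
        for (a, b), (rel1, w1) in list(known.items()):
            for (b2, c), (rel2, w2) in list(known.items()):
                if b != b2 or a == c or (a, c) in known:
                    continue
                rel = COMPOSE.get((rel1, rel2))
                weight = w1 * w2  # assumption: weights multiply along a chain
                if rel is not None and weight >= cutoff:
                    known[(a, c)] = (rel, weight)
                    changed = True
    return known


if __name__ == "__main__":
    statements = [
        ("A", "congruent", "B", "ref1"),
        ("B", "included in", "C", "ref2"),
    ]
    weights = {"ref1": 1.0, "ref2": 0.5}
    for pair, value in infer_relationships(statements, weights, 0.4).items():
        print(pair, value)
```

Raising the cutoff in this sketch makes the engine more conservative: a chain that passes through a low-weight reference no longer produces an implied relationship.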

Considerable progress has been made with NCD (Natural Collections Descriptions) during this period. Much of the development work was undertaken by the SYNTHESYS NCD subgroup of Biodiversity Information Standards (TDWG). This builds on the work already achieved through SYNTHESYS and prepares NCD for use throughout the world.

The standard developed by NA D was transformed from an XML Schema into one based on the Resource Description Framework (RDF), culminating in NCD version 0.80. This was done to ensure that NCD becomes a stable part of the technical architecture being promoted globally for biodiversity information. NCD is of broad interest to European institutions outside SYNTHESYS. For example, ETI (Amsterdam) has created a Toolkit based on NCD so that institutions and organizations can manage their collection description records. This, in turn, will be the basis of the Biodiversity Collections Index (BCI), which will use NCD to aggregate data about biodiversity collections around the world and become a central resource for anyone wishing to know where to find collections of biodiversity material of interest to their research.

The development of comprehensive information models for neglected collection information domains (anthropology and earth sciences) has been completed. For earth sciences collections, ABCDEFG is proposed to Biodiversity Information Standards (TDWG) as a data standard for geoscientific collections. The purpose of the EFG Task Group is to foster accessibility of existing and emerging geoscientific collection databases at the international level by developing and maintaining a comprehensive and commented schema for geoscientific collection records (extending the ABCD Schema). ETH Zurich has joined GeoCASE (a mirror version of the SYNTHESYS BioCASE Portal for use by geologists), which is an important step in extending the network. Two large geoscientific collection databases can now be queried through the prototype of the GeoCASE portal.

User Interface development

NA D launched the SYNTHESYS/BioCASE Portal (Biological Collection Access Service Europe) in 2008. The BioCASE portal uses basic web services constructed by GBIF in addition to some specialized ones. It complements the GBIF portal by providing more detailed information about specimen and observation data in the GBIF network, especially when these are based on the rich ABCD standard.

The BioCASE portal demonstrates the modular and distributed nature of the GBIF infrastructure, which allows regional networks such as BioCASE to both contribute to and build upon the global efforts in biodiversity informatics led by GBIF. Work on the SYNTHESYS-BioCASE User interface concentrated on bug fixing, optimization of database queries, and enhancement of the suggestion tool. Internationalisation continued, and translations of the User interfaces into Nordic and Baltic languages as well as Chinese have been completed. The development will culminate in a launch of the SYNTHESYS/BioCASE portal planned for M50, jointly presented by SYNTHESYS with GBIF and CETAF; the latter has now formally endorsed BioCASE as its contribution to GBIF.

A new access portal to European specimens and observations, based on the existing SYNTHESYS User interface and TOQE (Thesaurus Optimized Query Enhancer), has been developed. This new query system can expand Latin names into queries for related terms and concepts, using the European checklists Fauna Europaea for zoology and Euro+Med for botanical names and taxa. The concept relationships present in the databank are queried through TOQE and then used by the portal to extend the search, so that it returns collection data matching the search string and all data that are conceptually related. To incorporate the Fauna Europaea checklist data into the SYNTHESYS search tool, downloads of Fauna Europaea have been prepared and converted into the relevant "Berlin Model" data structure. These include around 130,000 accepted animal species names, a selection of their associated synonyms, their higher classification, and their European occurrence details at country level.
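
The search extension described above can be sketched in a few lines of Python. This is an illustration only and does not reflect the TOQE interface: the function names expand_query and search, the record field scientific_name, the choice of which relationship types to follow, and the placeholder names ("Aus bus" and so on) are all assumptions.

```python
def expand_query(name, relations, follow=("congruent", "includes", "overlapping")):
    """Return the searched name followed by all conceptually related names.

    relations: list of (source_name, relation, target_name) tuples, as a
    thesaurus lookup might return them (hypothetical structure).
    follow: relationship types used for expansion (an assumed default).
    """
    terms = [name]
    for source, relation, target in relations:
        if source == name and relation in follow and target not in terms:
            terms.append(target)
    return terms


def search(records, name, relations):
    """Return records matching the name or any name it expands to."""
    wanted = set(expand_query(name, relations))
    return [record for record in records if record["scientific_name"] in wanted]


if __name__ == "__main__":
    # Placeholder names, not real taxa.
    relations = [
        ("Aus bus", "congruent", "Xus yus"),
        ("Aus bus", "included in", "Aus cus"),
    ]
    records = [
        {"unit_id": "1", "scientific_name": "Aus bus"},
        {"unit_id": "2", "scientific_name": "Xus yus"},
        {"unit_id": "3", "scientific_name": "Aus cus"},
    ]
    print(expand_query("Aus bus", relations))
    print([r["unit_id"] for r in search(records, "Aus bus", relations)])
```

In this sketch a plain string search for "Aus bus" would find only the first record, whereas the expanded search also returns the record filed under the congruent name.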

A prototype of the itinerary tool has been built. This module for 'itinerary retracing' for subsequent data quality assessment is to become part of the EDIT online tool kit for e-Taxonomy. It still requires implementation of the algorithm and testing on the local GBIF cache.

NA D also supports GBIF in optimising the GBIF and SYNTHESYS index database (MySQL) and helps to set up the GeoCASE network, particularly by implementing the BioCASE Simple User Interface for GeoCASE.

Outcomes and deliverables

Extend the amount and increase the technical quality and availability of networked data resources

The SYNTHESYS Technology Helpdesk provides support for collection holders intending to use the BioCASE provider software for connecting their collections to the BioCASE/GBIF network infrastructure.

For both helpdesks contact: Jörg Holetschek, phone: +49 (0)30/838-50150, email: j.holetschek@bgbm.org

Three GBIF mirror sites have been established in Madrid, Paris and Tervuren.

Using the GBIF infrastructure to set up special interest networks

Pool, extend, and synchronise semantic definitions, standard data, and data standards

User interface: analysis and modularization study
Develop authentication services for system access
Duplicant detection software
Report on the development of BiogML: an XML Schema for biographical information
Biographical information-XML schemaV1.0.xsd

Outline Proposals for an Anthropology Extension to the ABCD Schema

ABCDEFG standard

User Interface development

SYNTHESYS launched the BioCASE Portal (Biological Collection Access Service Europe) in 2008. The portal allows regional networks such as BioCASE to both contribute to and build upon the global efforts in biodiversity informatics led by GBIF.

http://search.biocase.org/Europe

The SYNTHESYS Specimen and Observation Portal (Paper)
Detailed UI architecture study including html mock-up
UI implementation

Development of a Novel Cache Prototype and its Rapid Implementation

Access portal to European specimens and observations, based on the existing SYNTHESYS User interface and TOQE (Thesaurus Optimized Query Enhancer): http://search.biocase.org/toto.

PUBLICATIONS

Güntsch, A., Berendsohn, W. G., Ciardelli, P., Hahn, A., Kusber, W.-H. & Li, J. 2009: Adding content to content - a generic annotation system for biodiversity data. - Studi Trent. Sci. Nat. 84: 123-128.

Güntsch, A., Berendsohn, W.G., Ciardelli, P., Hahn, A., Kusber, W.-H., Li, J. & Oancea, C. 2008: Adding content to content - a generic annotation system for biodiversity data. - P. 77 in Cantonati, M., Scalfi, A. & Bertuzzi, E. (ed.): 2nd Central European Diatom Meeting, Abstract Book, 12 June 2008 - 15 June 2008 Trento (Italy), Trento. [Abstract and Poster]

Holetschek, Jörg; Kelbert, Patricia; Müller, Andreas; Ciardelli, Pepe; Güntsch, Anton; Berendsohn, Walter G.: International Networking of Large Amounts of Primary Biodiversity Data

Kelbert, P. 2008: The new EDIT specimen and observation explorer for taxonomists. EDIT Newsletter 11: 10-12. Available from [http://www.e-taxonomy.eu/files/newsletter11.pdf]

Kusber, W.-H., Zippel, E., Kelbert, P., Holetschek, J., Güntsch, A. & Berendsohn, W. G. 2009: From cleaning the valves to cleaning the data: Case studies using diatom biodiversity data on the Internet (GBIF, BioCASE). - Studi Trent. Sci. Nat. 84: 111-122.

Kusber, W.-H., Zippel, E., Kelbert, P., Holetschek, J., Güntsch, A. & Berendsohn, W.G. 2008: Europäische Beobachtungs-, Multimedia- und Belegdaten in Internetportalen (GBIF und SYNTHESYS/BioCASE) für die limnologische Forschung: Stand, Potential, Qualitätssicherung. - P. 97 in N.N. (ed.): Abstractband. Jahrestagung 2008 der Deutschen Gesellschaft für Limnologie. 22.-26. September 2008. Konstanz. [Abstract and Talk]

Kusber, W.-H., Zippel, E., Kelbert, P., Holetschek, J., Güntsch, A. & Berendsohn, W.G. (subm.): Europäische Beobachtungs-, Multimedia- und Belegdaten in Internetportalen (GBIF und SYNTHESYS/BioCASE/EDIT) für die limnologische Forschung: Stand, Potential und Qualitätssicherung. - Deutsche Gesellschaft für Limnologie (DGL): Erweiterte Zusammenfassungen der Jahrestagung 2008 (Konstanz), Hardegsen 2009.

Kusber, W.-H., Zippel, E., Güntsch, A. & Berendsohn, W.G. 2008: Data portals for biological systematics. - P. 374 in Gradstein, S.R. et al. (ed.): Systematics 2008. Programme and Abstracts, Göttingen 7-11 April 2008. Universitätsverlag Göttingen, Göttingen. [Abstract and Poster]

Submitted: Zippel, E., Kelbert, P., Kusber, W.-H., Holetschek, J., Güntsch, A. & Berendsohn, W.G. 2008: "EDIT Specimen and Observation Explorer for Taxonomists" - eine nützliche Komponente der taxonomischen EDIT-Arbeitsplattform im Internet. - GfBS Newsletter 21

Submitted to Workshop “Biodiversitätsinformatik” of INFORMATIK 2009, 39. Jahrestagung der Gesellschaft für Informatik

https://journals.ku.edu/index.php/jbi/article/view/1631/3472 .

Events

There are no further NA D events planned.

Where next

The work of NA D will continue in the SYNTHESYS FP7 contract as NA 3.

The European Distributed Institute of Taxonomy (EDIT) will support further development with the aim to integrate the BioCASE portal into the EDIT Internet Platform for Cybertaxonomy. This is to provide an interface for taxonomic researchers, who need to access more detailed data than do many other users of the GBIF portal.

The XML Schema for biographical information (BiogML) has been offered to EDIT and to the Human History Information (HHI) extension to the ABCD (Access to Biological Collections Data) standard, both of which require sections for gathering biographical information about biodiversity researchers.