
HCMR-µCT

 

 

Pipeline

Existing protocols for standardisation: Yes, HCMR has standardized protocols for metadata archiving. 

Figure 2: Schema of the steps involved in creating the metadata management system


 

  • If no, describe the protocols you are planning to adopt during Synthesys+: n/a

  • Prepare graphical representation (workflow) of standardisation protocols: for HCMR see Fig. 2.

The metadata collected for each micro-CT project are maintained in a relational database with well-defined semantics for its tables and columns. Including this information in the metadata catalogue of the LifeWatchGreece portal is of paramount importance because (a) the metadata will be integrated with information from other sources, which expands the knowledge about them (e.g. the taxonomic information of a species will be “linked” to the species referred to in a particular micro-CT project), and (b) the metadata will gain visibility and become searchable and browsable through the LifeWatch Data Services.

Because the metadata catalogue of the LifeWatchGreece portal has been implemented using semantic web technologies, a set of sub-activities is required:

  • Data Normalization: in this step the metadata harvested from the microCT relational databases are normalized with regard to their structure. More specifically, the harvested data are exported from the relational database as CSV resources and are then structurally transformed to XML. This is required for the subsequent steps (i.e. implementation of schema mappings and data transformation). Further activities, such as data cleaning or normalization of specific types (e.g. dates), can also be carried out during this step, although they are not needed in the case of microCT.

  • Schema Mappings: As described above, the metadata in the catalogues of the LifeWatchGreece project are modelled using semantic web technologies. More specifically, they are modelled using MarineTLO (Tzitzikas et al. 2016), an extension of the CIDOC-CRM (ISO 21127:2014) that can be used for modelling marine domain resources. In this step we therefore define the schema mappings needed to transform the microCT XML resources (derived from the previous step) into MarineTLO-based descriptions. This is done using the X3ML mapping definition language (Marketakis et al. 2017), which allows describing declaratively which parts of the source data (i.e. the XML resources) are mapped to particular classes and instances of the target model (i.e. MarineTLO), and how. The result of this step is a set of X3ML descriptions that are used in the next step to carry out the transformation. Note that as long as the structure of the microCT relational databases (and therefore of the corresponding CSV and XML resources) does not change, the X3ML definitions remain the same and no updates are required.

  • Data Transformation: this step takes as input the microCT XML resources (derived from the first step) and the X3ML definitions (derived from the second step), and generates the MarineTLO-based descriptions in the form of an RDF dataset. This activity is carried out using the X3ML engine[8].

  • Transformed Data Ingest: the last step of this workflow imports the transformed RDF datasets with the microCT metadata into the metadata catalogues of the LifeWatchGreece portal. This is carried out using the LifeWatchGreece Data Services API. From this point onwards, the microCT metadata are also searchable and browsable from the Data Services.
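
As an illustration of the Data Normalization step (CSV resources exported from the relational database and structurally transformed to XML), the following is a minimal sketch using only the Python standard library. It is not the actual HCMR implementation: the function name `csv_to_xml`, the element names `records` and `record`, and the sample column names are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Structurally transform CSV text into an XML string, one element per row.

    Hypothetical helper: the real pipeline's XML layout is not described
    in the text, so the tag names used here are illustrative only.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Column names become element names and values are copied as-is.
            # This assumes every column name is a valid XML element name;
            # ElementTree does not check this.
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical export of a specimen table (columns and values are made up).
sample_csv = "specimen_id,scientific_name\n1,Example species\n"
xml_text = csv_to_xml(sample_csv)
print(xml_text)
```

No cleaning or type normalization is performed here, in line with the description above that such activities are not needed for microCT. An XML file of this kind would then serve as the source side of the X3ML mappings.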

The following figures (Fig. 3 and 4) show the indicative modelling, with respect to MarineTLO, of a microCT Specimen resource and of a microCT scan event.

Figure 3: The indicative modelling with respect to MarineTLO of a microCT Specimen resource

 

Figure 4: The indicative modelling with respect to MarineTLO of a microCT scan event

The HCMR server that hosts and distributes the raw data produced by the micro-CT is a virtual machine on the central computer-systems hosting infrastructure of HCMR (Proxmox cluster), with 4 CPUs, 4 GB RAM and 16 TB of storage on the central storage infrastructure (96 TB raw in RAID-6).