Reusability (Project Progress)#
Our reusability efforts focused on two main topics. The first was to demonstrate that the developed assistant system can handle data other than the meteorological observational data for which it was developed. For this purpose, the system was used as the publication environment for CMIP5 model data. The second topic was the definition of quality assurance procedures for general environmental data.
Coupled Model Intercomparison Project Phase 5 - CMIP5#
CMIP5 data are relevant for the next assessment report, number 5 (AR5), of the IPCC (Intergovernmental Panel on Climate Change, http://www.ipcc.ch) on the Earth's climate. These data are replicated among the three core data archives (PCMDI, BADC, WDCC) for quality assurance and data dissemination. The CMIP5/IPCC-AR5 data acceptance and DOI publication mainly involve three activities: ingest control of data and metadata (Quality Control Level 1, QC L1), additional quality checks of data and metadata (QC L2), and the final STD-DOI publication (QC L3) of CMIP5 core data.
The results of the level 2 quality checks are used directly in the STD-DOI data publication review process at WDCC (the publication agency).
The most essential part of the quality assurance process leading to a data publication is the communication between agents and data authors, together with the authors' approval of metadata and data.
The Atarrabi system could be adapted to meet these needs. The adaptations of Atarrabi to the CMIP5 conditions are:
- Approval that the metadata filled in the CMIP5 questionnaire and sent to CMIP5 are correct and up to date
- Preservation of entity description
- Support of quality flags QC L2 and QC L3
- Splitting of entry title and entry_name so that the unique identifier within CMIP5, the DRS (Data Reference Syntax) id, is preserved in the entry_name
- Deletion of instrument and platform fields
The following Atarrabi fields remain:
- General Information of entity
- Authors of the data
- DOI contact
- Contributors to the entity
- Relations to other publications
- Coverage of the entity
- Additional data quality assurance
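The adapted entry described above can be pictured as a small record. The following is a minimal sketch only, not Atarrabi's actual data model: the class and function names, the example identifier, and the assumption that a DRS id is a dot-separated string are all illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class QCLevel(IntEnum):
    """Quality control levels as named in the text."""
    L1 = 1  # ingest control of data and metadata
    L2 = 2  # additional quality checks of data and metadata
    L3 = 3  # final STD-DOI publication


@dataclass
class Entry:
    """Hypothetical entry record with title and name kept separate."""
    entry_title: str  # human-readable title
    entry_name: str   # holds the DRS id, the unique identifier within CMIP5
    qc_level: QCLevel

    def drs_components(self) -> List[str]:
        # Assumption: the DRS id is a dot-separated string.
        return self.entry_name.split(".")


def ready_for_publication_review(entry: Entry, author_approved: bool) -> bool:
    """Illustrative rule: QC L2 reached and the author has approved."""
    return entry.qc_level >= QCLevel.L2 and author_approved


# Hypothetical example values, not a real CMIP5 data set.
example = Entry(
    entry_title="Example model output",
    entry_name="cmip5.output1.INSTITUTE.MODEL.historical",
    qc_level=QCLevel.L2,
)
print(example.drs_components())
print(ready_for_publication_review(example, author_approved=True))
```

Keeping the identifier in its own field means the title can be edited during the review without touching the identifier.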
The workflow of the administration and publication parts needs further adaptation. This work is in progress.
Quality Assurance on Data#
Ensuring data quality is a very important part of a publication process. To implement this in a publication workflow, we started with the following question: who should perform the quality assurance? Our answer is that quality assurance should be divided into two parts. The first is scientific quality assurance (SQA), which reviews the scientific content of the data files. The second is technical quality assurance (TQA), which covers the technical side of the data. The SQA should be done by the scientists themselves, because they know what to expect of the data. The TQA is performed by the data centre to ensure common standards.
- Scientific data quality assurance
A scientist who wants to publish data should approve the scientific data quality and provide an associated description of that quality. Therefore, tests should be performed and their results documented.
To support both, we developed a proof-of-concept implementation as an extension package for the statistical language R. The main advantages of R are that it runs on all major operating systems, that it is free and open source software, and that it is widely accepted in the scientific community.
Our understanding of quality assurance is strictly test-based. Every test requires documentation that contains its name, a description, information on the algorithm used, the parameters used, results such as plots or data and, most importantly, a comment by the author on what the test shows. The R package makes this much easier because it assists the user in performing and documenting the quality tests.
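The documentation items listed above map naturally onto a simple record. The sketch below is written in Python for illustration and does not reproduce the R package's interface; the class name, field names, and example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class QualityTestRecord:
    """Hypothetical documentation record for one quality test."""
    name: str
    description: str
    algorithm: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    results: Dict[str, Any] = field(default_factory=dict)
    author_comment: str = ""  # what the test shows, in the author's words

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that are still empty."""
        required = {
            "name": self.name,
            "description": self.description,
            "algorithm": self.algorithm,
            "author_comment": self.author_comment,
        }
        return [key for key, value in required.items() if not value.strip()]


# Hypothetical example: the author has not yet commented on the result.
record = QualityTestRecord(
    name="range check",
    description="Flags values outside a plausible physical range",
    algorithm="comparison against fixed lower and upper bounds",
    parameters={"lower": -40.0, "upper": 40.0},
    results={"flagged_indices": [3]},
)
print(record.missing_fields())
```

Treating the author's comment as a required field reflects its weight in the text: a test without an interpretation is reported as incomplete.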
After 'Perform Tests', the 'Documentation' of the tests should be inserted into the repository of the data centre.
It should be supplemented with information on the approval status of the data (approved by author) and on the level of the data.
Currently, the R package supports various quality checks on one-dimensional data such as 1D time series or vertical profiles. This is especially useful in the atmospheric sciences but may also be of interest in other environmental sciences.
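To give an idea of what a check on one-dimensional data can look like, here is a minimal Python sketch of two common tests, a range check and a step check. These are generic illustrations chosen for this example; the text does not state which checks the R package implements, and the function names and thresholds are assumptions.

```python
from typing import List, Sequence


def range_check(values: Sequence[float], lower: float, upper: float) -> List[int]:
    """Return the indices of values outside the closed interval [lower, upper]."""
    return [i for i, v in enumerate(values) if not (lower <= v <= upper)]


def step_check(values: Sequence[float], max_step: float) -> List[int]:
    """Return indices i where the jump from values[i - 1] to values[i] exceeds max_step."""
    return [
        i
        for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > max_step
    ]


# Hypothetical air temperature series in degrees Celsius with one spike.
temperatures = [12.1, 12.4, 12.3, 45.0, 12.6]
print(range_check(temperatures, lower=-40.0, upper=40.0))  # [3]
print(step_check(temperatures, max_step=5.0))              # [3, 4]
```

The step check flags both the jump into the spike and the jump out of it, which is why two indices are reported for a single outlier.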
Additional information on the R package can be found in its documentation.
- Technical quality assurance
TQA check list
- The number of data sets is correct and not equal to 0
- The size of every data set is not equal to 0
- The data sets and the corresponding metadata are all accessible via the internet
- The data size has been checked and is correct
- The time description (metadata) and the existence of data are consistent and complete, start and stop dates are consistent, and continuous time steps are correct
- The format is correct
- Variable description and data are consistent
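Several items of this checklist can be automated. The following Python sketch covers only the count, size, and time axis checks and is not the data centre's actual tooling; the `DataSet` record, the function names, and the message texts are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass
class DataSet:
    """Hypothetical minimal description of one data set."""
    name: str
    size_bytes: int
    timestamps: Sequence[datetime]


def check_counts(data_sets: Sequence[DataSet], expected_count: int) -> List[str]:
    """Check the number of data sets and that none of them is empty."""
    problems = []
    if len(data_sets) == 0:
        problems.append("no data sets found")
    elif len(data_sets) != expected_count:
        problems.append(f"expected {expected_count} data sets, found {len(data_sets)}")
    for ds in data_sets:
        if ds.size_bytes == 0:
            problems.append(f"{ds.name}: size is 0")
    return problems


def check_time_axis(ds: DataSet, start: datetime, stop: datetime,
                    step: timedelta) -> List[str]:
    """Compare the time steps in the data with the start, stop, and step from the metadata."""
    ts = list(ds.timestamps)
    if not ts:
        return [f"{ds.name}: no time steps"]
    problems = []
    if ts[0] != start:
        problems.append(f"{ds.name}: first time step does not match start date")
    if ts[-1] != stop:
        problems.append(f"{ds.name}: last time step does not match stop date")
    for earlier, later in zip(ts, ts[1:]):
        if later - earlier != step:
            problems.append(f"{ds.name}: irregular time step after {earlier.isoformat()}")
    return problems
```

Each function returns a list of problem descriptions rather than a single pass or fail flag, so the result can be stored alongside the data set as part of the quality documentation.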




