Since then, considerable effort has gone into making workflows more reusable and thereby making results reproducible. Our TOAR-II (Tropospheric Ozone Assessment Report, phase II) database infrastructure supports this approach. By redesigning the database and its related services, we have integrated more of the FAIR principles (https://www.go-fair.org/fair-principles/) and lifted the infrastructure to a new level of FAIRness. In addition, new concepts were developed to achieve reproducibility and reusability via standardized workflows and objects.
Canonical workflows consist of automated workflows or workflow fragments that can be reused in different contexts. The development of reusable workflows and software for scientific data analysis depends on reusable data, which must be described appropriately and standardized to ensure reliable and meaningful analysis results.
We, therefore, developed a concept where we focus on two important, indispensable, and inseparable prerequisites for workflow sharing: data harmonization and documentation.
In our concept paper, we show that the data harmonization needed to establish online data analysis services goes much deeper than the obvious issues of common data formats, variable names, and measurement units. We also explore how the generation of FAIR Digital Objects (FDO) and Research Objects (RO), together with automatically generated documentation, may support Canonical Analysis Workflows for air quality and related data. We are convinced that our experience with the TOAR database will show that data harmonization together with documentation is a big step towards realizing the potential of canonical workflows.
Schröder et al., Enabling Canonical Analysis Workflows – Documented data harmonization on global air quality data, Data Intelligence Journal. 2022; in print
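To make the "obvious" layer of harmonization concrete, the following minimal sketch maps provider-specific variable names to one canonical name and converts mole-fraction units to ppb. It is an illustration only and not the TOAR implementation: the mapping tables, the record layout, and the function name `harmonize` are assumptions made for this example.

```python
# Hypothetical mapping from provider-specific variable names to one
# canonical name (illustrative entries, not the TOAR vocabulary).
CANONICAL_NAMES = {"O3": "o3", "ozone": "o3", "Ozone": "o3"}

# Conversion factors from supported mole-fraction units to ppb.
TO_PPB = {"ppb": 1.0, "nmol mol-1": 1.0, "ppm": 1000.0}


def harmonize(record: dict) -> dict:
    """Return a copy of `record` with canonical variable name and unit.

    `record` is assumed to have the keys "variable", "unit" and "value".
    Unknown names or units raise ValueError instead of being guessed.
    """
    name = CANONICAL_NAMES.get(record["variable"])
    if name is None:
        raise ValueError(f"unknown variable name: {record['variable']!r}")

    factor = TO_PPB.get(record["unit"])
    if factor is None:
        # Mass concentrations (e.g. ug m-3) are rejected on purpose:
        # converting them needs temperature and pressure, which is one
        # reason harmonization goes beyond simple lookup tables.
        raise ValueError(f"unsupported unit: {record['unit']!r}")

    return {"variable": name, "unit": "ppb", "value": record["value"] * factor}


if __name__ == "__main__":
    print(harmonize({"variable": "ozone", "unit": "ppm", "value": 0.04}))
```

Rejecting unknown names and units, rather than passing them through, keeps every harmonization decision explicit and therefore documentable.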
The foundation of the IntelliAQ project is the TOAR database, the world’s largest collection of surface observation data of ozone, ozone precursor gases, meteorological variables, selected tracers for pollution source attribution, and selected results from numerical models of atmospheric dynamics and chemical composition. Most of these data are time series of measurements at specific point locations, the “stations”. Various researchers submit their data directly to the TOAR data centre, where they are reformatted, quality controlled, and inserted into the TOAR database. If requested by the data submitters, the reformatted and augmented files from these direct submissions are also published in a FAIR data service, including a DOI for reference in journal publications, presentations, and elsewhere. However, the majority of data in the TOAR database is not “primary data”, but a copy of data from other databases and repositories.
With GeoDataServices (Schultz, M.G. et al., 2018), we enable an automated and flexible characterisation of an arbitrary point location using high-resolution data. These services are accessible through a standardised REST API and can therefore easily be used by both humans and machines. In the current version, GeoDataServices includes geographical information on topography and dominant land surface cover, anthropogenic data on urbanisation (human settlements, built-up areas, nighttime light brightness, population density, and streets) and agricultural yields (rice and wheat), and climatological and environmental data such as NOx emissions and climatic zones. By combining these data, GeoDataServices can characterise any point location and therefore enables users to compare locations in a personalised, use-case-driven way. For this personalisation, each query specifies a radius around the point location, within which a given statistical aggregation function is applied. Besides this personalised preparation of information, GeoDataServices is also used in the creation of the TOAR database metadata. GeoDataServices is currently under development and not yet accessible to the public. However, interested people are encouraged to contact us for more insight into GeoDataServices.
Reference: Schultz, Martin G., et al. “A web service architecture for objective station classification purposes.” 2018 IEEE 14th International Conference on e-Science (e-Science). IEEE, 2018.
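The radius-based aggregation described for GeoDataServices can be pictured with a small, self-contained sketch. It does not call the GeoDataServices REST API, which is not public. It only illustrates the idea of applying a statistical function to all data cells within a given radius of a point. The function names, the `(lat, lon, value)` cell layout, and the sample values are assumptions made for this example.

```python
import math
from statistics import mean
from typing import Callable, Iterable, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Cell = Tuple[float, float, float]  # (latitude, longitude, value)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def aggregate_around(
    lat: float,
    lon: float,
    radius_km: float,
    cells: Iterable[Cell],
    agg: Callable[[Sequence[float]], float] = mean,
) -> Optional[float]:
    """Apply `agg` to the values of all cells within `radius_km` of the point.

    Returns None when no cell falls inside the radius.
    """
    values = [
        value
        for cell_lat, cell_lon, value in cells
        if haversine_km(lat, lon, cell_lat, cell_lon) <= radius_km
    ]
    return agg(values) if values else None


# Hypothetical sample cells, e.g. population density on a coarse grid.
cells = [
    (50.90, 6.40, 120.0),
    (50.91, 6.41, 180.0),
    (50.95, 6.45, 300.0),
    (52.50, 13.40, 4000.0),
]

if __name__ == "__main__":
    # Same location, two different "personalised" views of it.
    print(aggregate_around(50.90, 6.40, 5.0, cells, mean))
    print(aggregate_around(50.90, 6.40, 10.0, cells, max))
```

Passing the aggregation function as a parameter mirrors the use-case-driven idea: the same location can be characterised by a mean over a small radius for one study and by a maximum over a larger radius for another.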