A decoupled, modular and scriptable architecture for tools to curate data platforms

Published: Sept. 29, 2020, 6:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.28.282699v1?rss=1

Authors: Langenstein, M., Hermjakob, H., Bernal Llinares, M.

Copyright belongs to the original authors. Visit the link for more information.

Abstract:

Motivation: Curation is essential for any data platform to maintain the quality of the data it provides. Existing databases, which require maintenance, and the amount of newly published information that needs to be surveyed, are growing rapidly. More efficient curation is often vital to keep up with this growth, requiring modern curation tools. However, curation interfaces are often complex and difficult to develop further. Furthermore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources, or a reluctance to change sensitive production systems.

Results: We propose a decoupled, modular and scriptable architecture to build curation tools on top of existing platforms. Instead of modifying the existing infrastructure, our architecture treats the existing platform as a black box and relies only on its public APIs and web application. As a decoupled program, the tool's architecture gives more freedom to developers and curators. This added flexibility allows for quickly prototyping new curation workflows as well as adding all kinds of analysis around the data platform. The tool can also streamline and enhance the curator's interaction with the web interface of the platform. We have implemented this design in cmd-iaso, a command-line curation tool for the identifiers.org registry.
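To make the "decoupled, modular and scriptable" idea more concrete, here is a minimal Python sketch of how such a tool could be organised. It is not taken from cmd-iaso: every name in it (Record, Finding, curate, the two checks, fake_fetch) is hypothetical, and the record fields "id", "description" and "url" are assumptions made for illustration. The platform is treated as a black box behind a fetch callable, which in a real tool would presumably wrap the platform's public API; here it returns canned data so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# One entry as returned by the platform's public API (field names are assumed).
Record = Dict[str, str]


@dataclass
class Finding:
    """One issue reported by one check for one record."""
    record_id: str
    check: str
    message: str


# Modular: a check is just a function from a record to a list of messages.
Check = Callable[[Record], List[str]]


def missing_description(record: Record) -> List[str]:
    """Flag records whose description is absent or blank."""
    if not record.get("description", "").strip():
        return ["description is empty"]
    return []


def insecure_url(record: Record) -> List[str]:
    """Flag records whose URL uses plain HTTP."""
    url = record.get("url", "")
    if url.startswith("http://"):
        return [f"URL is not HTTPS: {url}"]
    return []


def curate(fetch: Callable[[], Iterable[Record]],
           checks: Dict[str, Check]) -> List[Finding]:
    """Run every check on every record obtained through `fetch`.

    Decoupled: the platform is only reached through `fetch`, so nothing
    here modifies or depends on the platform's internals.
    """
    findings: List[Finding] = []
    for record in fetch():
        for name, check in checks.items():
            for message in check(record):
                findings.append(Finding(record.get("id", "?"), name, message))
    return findings


def fake_fetch() -> List[Record]:
    """Stand-in for a call to the platform's public API (canned data)."""
    return [
        {"id": "a", "description": "Example resource", "url": "https://example.org/a"},
        {"id": "b", "description": "", "url": "http://example.org/b"},
    ]


if __name__ == "__main__":
    # Scriptable: a curator picks which checks to run in a short script.
    selected = {
        "missing_description": missing_description,
        "insecure_url": insecure_url,
    }
    for finding in curate(fake_fetch, selected):
        print(f"{finding.record_id}: [{finding.check}] {finding.message}")
```

Because checks are plain functions and the data source is injected, a new curation workflow could be prototyped by adding one function and one dictionary entry, without touching the production platform. Whether cmd-iaso is structured this way is not stated in the abstract.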