pmparser and PMDB: resources for large-scale, open studies of the biomedical literature

Published: Sept. 9, 2020, 11:01 p.m.

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.07.285924v1?rss=1

Authors: Schoenbachler, J. L., Hughey, J. J.

Abstract:

Summary: PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses, and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature.

Availability and implementation: pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is stored in PostgreSQL, and compressed dumps are available on Zenodo (https://doi.org/10.5281/zenodo.4008109).

Copyright belongs to the original authors.
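To make the idea of converting nested PubMed XML into a relational database more concrete, here is a minimal Python sketch that uses only the standard library (xml.etree.ElementTree and sqlite3). It is not pmparser, which is an R package that writes to PostgreSQL, and it does not reproduce the PMDB schema. The table names article and author, the helper functions, and the two sample records are hypothetical. The element names follow the general shape of PubMed XML, but only a tiny subset (PMID, title, author names) is handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample records that mimic a small subset of PubMed XML.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">101</PMID>
      <Article>
        <ArticleTitle>An example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Smith</LastName><ForeName>Ann</ForeName></Author>
          <Author><LastName>Jones</LastName><ForeName>Bo</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">102</PMID>
      <Article>
        <ArticleTitle>Another example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Smith</LastName><ForeName>Ann</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def create_schema(conn):
    """Create two hypothetical tables keyed by PMID (not the real PMDB schema)."""
    conn.executescript("""
        CREATE TABLE article (
            pmid  INTEGER PRIMARY KEY,
            title TEXT
        );
        CREATE TABLE author (
            pmid      INTEGER REFERENCES article(pmid),
            position  INTEGER,
            last_name TEXT,
            fore_name TEXT,
            PRIMARY KEY (pmid, position)
        );
    """)


def load_xml(conn, xml_text):
    """Flatten each PubmedArticle into one article row plus one row per author."""
    root = ET.fromstring(xml_text.strip())
    for rec in root.iter("PubmedArticle"):
        citation = rec.find("MedlineCitation")
        pmid = int(citation.findtext("PMID"))
        title = citation.findtext("Article/ArticleTitle")
        conn.execute("INSERT INTO article VALUES (?, ?)", (pmid, title))
        # Keep author order explicit, since it is lost once rows leave the XML tree.
        authors = citation.findall("Article/AuthorList/Author")
        for position, author in enumerate(authors, start=1):
            conn.execute(
                "INSERT INTO author VALUES (?, ?, ?, ?)",
                (pmid, position, author.findtext("LastName"), author.findtext("ForeName")),
            )
    conn.commit()


def author_counts(conn):
    """Number of authors per PMID, computed with a join instead of walking XML."""
    query = """
        SELECT a.pmid, COUNT(au.position) AS n_authors
        FROM article AS a
        LEFT JOIN author AS au ON au.pmid = a.pmid
        GROUP BY a.pmid
        ORDER BY a.pmid
    """
    return conn.execute(query).fetchall()


def pmids_by_author(conn, last_name):
    """All PMIDs that list an author with the given last name."""
    rows = conn.execute(
        "SELECT DISTINCT pmid FROM author WHERE last_name = ? ORDER BY pmid",
        (last_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_xml(conn, SAMPLE_XML)
    print(author_counts(conn))             # [(101, 2), (102, 1)]
    print(pmids_by_author(conn, "Smith"))  # [101, 102]
```

Once records are flattened into tables keyed by PMID, questions that would otherwise require traversing nested XML become short SQL queries and joins, which is the kind of complex query the abstract describes as inconvenient against the raw XML.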