CROssBAR Use-case: COVID-19 Knowledge Graphs

As a use case of the CROssBAR system, we present the CROssBAR knowledge graph for COVID-19, the disease caused by SARS-CoV-2 infection. We constructed two versions of the COVID-19 knowledge graph (KG): (i) a large-scale version that includes nearly all of the related information in the CROssBAR-integrated data sources, which is suited to further network-based or machine-learning-based analysis, and (ii) a small-scale version distilled to include only the most relevant genes/proteins, as listed in the UniProt COVID-19 portal (https://covid-19.uniprot.org), which is suited to fast interpretation.

The finalized large-scale COVID-19 KG includes 987 nodes (i.e., genes/proteins, drugs/compounds, pathways, diseases/phenotypes) and 3639 edges (i.e., various types of relations). The simplified COVID-19 KG includes a total of 178 nodes and 298 edges. Detailed statistics can be found in Table S.3 of the project paper. As of July 2020, most COVID-19 related data had not yet been integrated into the regular pipelines of the source biological databases, so the data could not be pulled into the CROssBAR database automatically in its entirety. We therefore obtained the data from these resources manually, applied the same knowledge graph construction methodology used in CROssBAR to build the networks, and saved the pre-constructed graphs, which are accessible through the links below:

For more information about the COVID-19 knowledge graphs, please refer to our project paper or visit the CROssBAR project GitHub repository at: https://github.com/cansyl/CROssBAR
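
To put the node and edge counts reported above in perspective, the following minimal Python sketch compares the two graphs by average degree (edge endpoints per node). It uses only the counts stated in the text. Treating the edges as undirected is an assumption made for illustration, and the names GRAPHS, average_degree and summarize are hypothetical, not part of CROssBAR.

```python
# Node and edge counts as reported in the text for the two COVID-19 KGs.
GRAPHS = {
    "large-scale": {"nodes": 987, "edges": 3639},
    "simplified": {"nodes": 178, "edges": 298},
}


def average_degree(nodes: int, edges: int) -> float:
    """Mean number of edge endpoints per node.

    Assumption: edges are undirected, so each edge contributes two endpoints.
    """
    return 2 * edges / nodes


def summarize(graphs: dict) -> dict:
    """Return the average degree of each graph, rounded to two decimals."""
    return {
        name: round(average_degree(g["nodes"], g["edges"]), 2)
        for name, g in graphs.items()
    }


if __name__ == "__main__":
    for name, degree in summarize(GRAPHS).items():
        print(f"{name}: average degree ~ {degree}")
```

Under that assumption, the large-scale graph has roughly twice as many connections per node as the simplified graph, which is consistent with the simplified version being the easier one to interpret visually.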