HumanBase Data Sources


Term definitions
Ontology definitions

Name | Version

Gene annotations
Gene-term associations used in enrichment analyses, term predictions, etc.

Name | Version

GIANT data compendia
Genomics datasets used in the network integrative analyses

Name | Dataset Count | Download
Human Compendium (GIANT v1, 2015) | 990 | human_compendium.tsv
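
As a rough illustration, a downloaded copy of human_compendium.tsv could be inspected with a few lines of Python. This is only a sketch: it assumes the file is tab-separated, starts with a header row, and holds one row per dataset. The file layout is not documented on this page, so the function names and the row-per-dataset assumption are hypothetical.

```python
import csv
from pathlib import Path


def load_compendium(path):
    """Read a tab-separated file with a header row into a list of dicts.

    Assumption: the first line of the file names the columns.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)


def count_datasets(rows):
    """Count entries, assuming (hypothetically) one row per dataset."""
    return len(rows)
```

If the assumptions hold, `count_datasets(load_compendium("human_compendium.tsv"))` should agree with the dataset count listed above; a mismatch would suggest the file is organized differently.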

Variant effect prediction models
Training data sources for deep learning frameworks

HumanBase integrates multiple variant effect prediction frameworks that our group previously developed, each trained on distinct but complementary data modalities.

Model | Training Data Sources
DeepSEA (Beluga) | Trained on 2,002 chromatin profile features, including DNase hypersensitivity and ChIP-seq profiles. The complete list of features used to train Beluga can be found at this CSV file.
Sei | Trained on 21,907 chromatin profiles, including DNase hypersensitivity, ChIP-seq, and ATAC-seq profiles. The list of profiles incorporated in the Sei model can be found in Supplementary Table 1 of Chen et al. 2022.
ExPecto | Incorporated the Beluga chromatin profile model as well as 218 tissue expression profiles from GTEx, Roadmap Epigenomics and ENCODE (Zhou et al. 2018).
ExPectoSC | Incorporated the Beluga chromatin profile model as well as single-cell gene expression profiles for over 100 cell types from seven organ systems (listed in Supplementary Table 1 of Sokolova et al. 2023).
Seqweaver | Trained using data from 232 cross-linking immunoprecipitation (CLIP) based RNA binding protein datasets (Supplementary Table 1 of Park et al. 2021).