**The Human Virome Protein Cluster**
--------------------------------
The human virome protein cluster (HVPC) is a protein cluster database created to improve the functional annotation and characterization of human viromes. The database was built from hundreds of virome datasets covering six different body sites.
Please check our [Frontiers in Microbiology article][1] for more details.
## **README** ##
Sequences from virome datasets were assembled, and ORFs were called from the resulting contigs. ORFs longer than 60 aa were clustered, and one representative ORF (the longest) from each cluster was collected in the FASTA file HVPC.faa.
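
The representative-selection step described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the HVPC: the function `pick_representatives`, its `cluster_of` input (ORF id to cluster id), and the tie-breaking rule are assumptions made for the example, and the clustering itself is taken as already done by an external tool.

```python
from collections import defaultdict

MIN_LENGTH_AA = 60  # only ORFs longer than this were clustered


def pick_representatives(orfs, cluster_of):
    """Return {cluster_id: orf_id}, keeping the longest ORF of each cluster.

    orfs       -- dict mapping ORF id to its amino-acid sequence
    cluster_of -- dict mapping ORF id to a cluster id (hypothetical input;
                  real assignments would come from a clustering tool)
    """
    members = defaultdict(list)
    for orf_id, seq in orfs.items():
        # Drop ORFs that are too short or that have no cluster assignment.
        if len(seq) > MIN_LENGTH_AA and orf_id in cluster_of:
            members[cluster_of[orf_id]].append(orf_id)
    # Assumption: equal-length ties go to the largest ORF id, which keeps
    # the result deterministic.
    return {
        cluster: max(ids, key=lambda i: (len(orfs[i]), i))
        for cluster, ids in members.items()
    }
```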
Functional annotation data for each ORF were collected from four different sources, namely nr, the RAST server with SEED subsystem analysis, Pfam with mappings to Gene Ontology, and pVOG. These annotations are found in the file HVPCs_annotations.txt.
This project therefore contains two files:
1. HVPC.faa: FASTA file with the representative ORFs for all clusters.
2. HVPCs_annotations.txt: tab-separated file with annotations from the sources outlined above. "-" means there is no annotation available from this particular source for this ORF.
Abbreviations: nr, NCBI's non-redundant protein database; pVOG, Prokaryotic Virus Orthologous Groups; GO, Gene Ontology.
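
As a starting point for working with the two files, the sketch below reads FASTA records and tab-separated annotation rows using only the Python standard library. The function names `parse_fasta` and `parse_annotations` are hypothetical. No column layout is assumed for HVPCs_annotations.txt: rows are returned as plain lists, a header row (if the file has one) comes back as the first row, and "-" fields are converted to `None`.

```python
def parse_fasta(lines):
    """Return {header: sequence} from an iterable of FASTA lines.

    The full header line (without '>') is used as the key; no assumption
    is made about how ORF identifiers are encoded in it.
    """
    records, header, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def parse_annotations(lines):
    """Return a list of rows from tab-separated lines; '-' becomes None."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        rows.append([None if field == "-" else field
                     for field in line.split("\t")])
    return rows


# Possible usage with the files from this project:
#   with open("HVPC.faa") as handle:
#       sequences = parse_fasta(handle)
#   with open("HVPCs_annotations.txt") as handle:
#       annotations = parse_annotations(handle)
```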
[1]: https://doi.org/10.3389/fmicb.2018.01110