# Supplementary Files for Predicting Host-Pathogen Interactions Between *C. difficile* and Mouse
These supplementary data are part of the thesis submitted by Sri Harsha Vishwanath. To use these data, please contact [Fiona McCarthy](mailto:fionamcc@arizona.edu).
## Supplementary Table 1: Human:Mouse Orthologs
Description:
Mouse orthologs of the human proteins in PSICQUIC were identified with Ensembl BioMart 112. We first selected the Ensembl Genes 108 database, then chose the Human genes (GRCh38.p10) dataset, and kept only strict 1:1 ortholog types (`ortholog_one2one`).
| Human UniProt ID | Ensembl Human ID | Mouse Ensembl ID | Ortholog type |
| ----------------- | ----------------- | ----------------- | ------------- |
| A0AV96 | ENSG00000163694 | ENSMUSG00000070780 | ortholog_one2one |
| A1A4Z1 | ENSG00000164675 | ENSMUSG00000046192 | ortholog_one2one |
| A4D0V7 | ENSG00000106034 | ENSMUSG00000062980 | ortholog_one2one |
| A6H8Y1 | ENSG00000145734 | ENSMUSG00000049658 | ortholog_one2one |
__Human UniProt ID__ - This column identifies the human protein from PSICQUIC by its UniProt ID.
__Ensembl Human ID__ - This column gives the Ensembl gene ID for the human protein from PSICQUIC.
__Mouse Ensembl ID__ - This column gives the Ensembl ID of the mouse gene that is orthologous to the human gene.
__Ortholog type__ - This column identifies the ortholog type between mouse and human.
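
The 1:1 filtering step described for this table can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used (the filter was applied in BioMart itself): it assumes a tab-separated export whose column names match the table header above, and the second row is a hypothetical non-1:1 record added only to show the filter.

```python
import csv
import io

def one_to_one_orthologs(tsv_text):
    """Keep only rows whose ortholog type is strict 1:1."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["Ortholog type"] == "ortholog_one2one"]

# First data row is taken from Supplementary Table 1.
# The second row uses placeholder IDs (hypothetical) to show a rejected record.
EXPORT = "\n".join([
    "Human UniProt ID\tEnsembl Human ID\tMouse Ensembl ID\tOrtholog type",
    "A0AV96\tENSG00000163694\tENSMUSG00000070780\tortholog_one2one",
    "HYPOTHETICAL_1\tHYPOTHETICAL_HUMAN_GENE\tHYPOTHETICAL_MOUSE_GENE\tortholog_one2many",
])

kept = one_to_one_orthologs(EXPORT)
for row in kept:
    print(row["Human UniProt ID"], row["Mouse Ensembl ID"])
```
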
## Supplementary Table 2: Reciprocal Best Matches for *C. difficile* 630 vs Bacillota (Firmicute) Proteins
Description:
To perform Reciprocal Best BLAST, we first installed the NCBI BLAST tool, BLAST 2.12.0+. Pathogens were identified through a preliminary search, and their FASTA sequences were downloaded from UniProt release 2022-03. Specifically, we obtained the FASTA sequences of *Clostridioides difficile* 630 (taxid: 272563) and labeled them `Cdiff_630` (S1). Using the ID mapping feature on UniProt, we downloaded the sequences of the identified pathogens, creating a pathogen protein database referred to as `pathogen_proteins` (S2). These sequences were then organized, and BLAST databases were created for both the pathogen proteins and the *C. difficile* 630 proteins.
For BLAST processing, the combined FASTA files were split into individual files. Forward BLAST searches involved querying pathogen proteins against the *C. difficile* 630 database, while reverse BLAST searches involved querying *C. difficile* 630 proteins against the pathogen protein database, both using an e-value threshold of 0.001. The results were saved in a tabular format. To identify the best BLAST hits, a Python script from the Simple Reciprocal Best Blast Hit Pairs repository was used. This script processed the forward and reverse BLAST output files to produce an output file that contained matching FASTA headers from *C. difficile* 630 and the pathogen proteins, indicating orthologous relationships.
| PSICQUIC_Firmicute Protein | Bacillota (Firmicute) Species | C. difficile Reciprocal Best Match |
| ------------------------------- | ------------------------ | ---------------------|
| Q81VX3 | Bacillus anthracis | Q17ZW5 |
| Q81UH1 | Bacillus anthracis | Q17ZY4 |
| Q81K75 | Bacillus anthracis | Q180C9 |
| A0A0F7R7N1 | Bacillus anthracis | Q180E4 |
__PSICQUIC_Firmicute Protein__ - This column gives the UniProt ID of each Firmicute protein in PSICQUIC that is orthologous to a *C. difficile* 630 protein.
__Bacillota (Firmicute) Species__ - This column identifies the species. Because the BLAST databases were built from a protein set drawn from a mixture of bacterial species, this column records which species the interacting protein belongs to.
__C. difficile Reciprocal Best Match__ - This column identifies the orthologous *C. difficile* 630 protein.
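
The reciprocal best match logic described for this table can be sketched as follows. This is not the script from the Simple Reciprocal Best Blast Hit Pairs repository; it is a minimal stand-in that assumes standard 12-column BLAST tabular output (`-outfmt 6`, with the E-value in column 11 and the bit score in column 12) and picks the best hit per query by bit score. The protein IDs (`P1`, `C1`, and so on) and scores are made-up placeholders, and `make_row` is a hypothetical helper used only to build the toy input.

```python
import csv
import io

def best_hits(tabular_text, max_evalue=1e-3):
    """Return {query: subject}, keeping the highest-bit-score hit per query."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward_text, reverse_text, max_evalue=1e-3):
    """Pairs (pathogen, cdiff) that are each other's best hit in both directions."""
    forward = best_hits(forward_text, max_evalue)
    reverse = best_hits(reverse_text, max_evalue)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)

def make_row(query, subject, evalue, bitscore):
    # Hypothetical helper: columns 3-10 are unused here, so fill them with zeros.
    return "\t".join([query, subject] + ["0"] * 8 + [str(evalue), str(bitscore)])

# Forward: pathogen proteins queried against the C. difficile 630 database.
FORWARD = "\n".join([
    make_row("P1", "C1", 1e-50, 200),
    make_row("P1", "C2", 1e-10, 80),
    make_row("P2", "C2", 1e-30, 150),
    make_row("P3", "C1", 1e-5, 50),
])
# Reverse: C. difficile 630 proteins queried against the pathogen database.
REVERSE = "\n".join([
    make_row("C1", "P1", 1e-50, 200),
    make_row("C2", "P3", 1e-40, 170),
    make_row("C2", "P2", 1e-30, 150),
])

pairs = reciprocal_best_hits(FORWARD, REVERSE)
print(pairs)
```

In this toy input only `P1` and `C1` select each other in both directions, so they form the single reciprocal pair; `P2` and `P3` each have a best forward hit that prefers a different partner on the way back.
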
## Supplementary Table 3: Mouse - *C. difficile* 630 Interologs
Description:
To identify interologs, we matched mouse orthologs to human proteins from the PSICQUIC database and *C. difficile* 630 orthologs to pathogen proteins from the PSICQUIC database using Excel's XLOOKUP function. When both partners of a PSICQUIC host-pathogen interaction have such a match, the corresponding mouse and *C. difficile* 630 proteins are reported as an interolog, that is, a predicted conserved protein-protein interaction.
| PSICQUIC Host protein | Mouse ortholog Ensembl ID | Mouse ortholog Uniprot ID | PSICQUIC Pathogen protein | C. difficile homolog |
| ------ | ------------------ | ------ | ------ | ------ |
| Q9Y6X8 | ENSMUSG00000071757 | Q8C0C0 | Q81ZD6 | Q180P1 |
| Q9Y6X8 | ENSMUSG00000071757 | Q8C0C0 | Q81KK6 | Q182K8 |
| Q9Y6X8 | ENSMUSG00000071757 | Q8C0C0 | Q81W25 | Q18C97 |
__PSICQUIC Host protein__ - Protein from the host organism involved in PSICQUIC (Proteomics Standard Initiative Common QUery InterfaCe) interactions.
__Mouse ortholog Ensembl ID__ - Ensembl database identifier for the mouse gene ortholog.
__Mouse ortholog Uniprot ID__ - Uniprot database identifier for the mouse protein ortholog.
__PSICQUIC Pathogen protein__ - Protein from the pathogen involved in PSICQUIC interactions.
__C. difficile homolog__ - Homologous protein found in the bacterium *C. difficile* strain 630.
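
The lookup described for this table was done with Excel's XLOOKUP; the same join can be sketched in Python with dictionary lookups. This is an illustrative equivalent under assumptions, not the workflow that produced the table: the function name and input layout are hypothetical, the first interaction uses IDs from the first row of Supplementary Table 3, and `HYPOTHETICAL_HOST` is a placeholder with no mouse ortholog, included to show that unmatched pairs are dropped.

```python
def predict_interologs(interactions, human_to_mouse, pathogen_to_cdiff):
    """Map each (host, pathogen) interaction onto mouse and C. difficile proteins.

    A pair is kept only when both partners have a match, mirroring two
    lookups per row followed by removal of rows with a missing value.
    """
    rows = []
    for host, pathogen in interactions:
        mouse = human_to_mouse.get(host)
        cdiff = pathogen_to_cdiff.get(pathogen)
        if mouse is None or cdiff is None:
            continue
        mouse_ensembl, mouse_uniprot = mouse
        rows.append((host, mouse_ensembl, mouse_uniprot, pathogen, cdiff))
    return rows

# Human UniProt ID -> (mouse Ensembl ID, mouse UniProt ID), as in Table 3.
human_to_mouse = {"Q9Y6X8": ("ENSMUSG00000071757", "Q8C0C0")}
# PSICQUIC pathogen protein -> C. difficile 630 reciprocal best match.
pathogen_to_cdiff = {"Q81ZD6": "Q180P1"}

interactions = [
    ("Q9Y6X8", "Q81ZD6"),             # both partners have a match
    ("HYPOTHETICAL_HOST", "Q81ZD6"),  # placeholder host without an ortholog
]

interologs = predict_interologs(interactions, human_to_mouse, pathogen_to_cdiff)
for row in interologs:
    print("\t".join(row))
```
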
## Supplementary Table 4: Host-pathogen interaction network
Description: The host-pathogen interaction network between *C. difficile* 630 and mouse was generated by merging mouse protein-protein interactions from STRING (Version 12.0) and *C. difficile* 630 protein-protein interactions from STRING (Version 12.0) with the predicted host-pathogen interactions. The network was visualized in Cytoscape Version 3.10.2. The interacting partners in the network are listed in this table.
| Name of Interacting Partner 1 | Accession of Interacting Partner 1 | Name of Interacting Partner 2 | Accession of Interacting Partner 2 |
| ---- | ------ | ------ | ------ |
| Cd46 | O88174 | Slamf1 | Q9QUM4 |
| Cd46 | O88174 | Gopc | Q8BH60 |
| Cd46 | O88174 | Serping1 | P97290 |
| Cd46 | O88174 | Cfi | Q61129 |
__Name of Interacting Partner 1__ - The name of the first protein involved in the interaction.
__Accession of Interacting Partner 1__ - The unique identifier for the first protein in a protein database.
__Name of Interacting Partner 2__ - The name of the second protein involved in the interaction.
__Accession of Interacting Partner 2__ - The unique identifier for the second protein in a protein database.
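
The merging step described for this table can be sketched as combining three edge lists into one de-duplicated, undirected list. This is only an illustration of the idea; the actual merge may have been done differently (for example, inside Cytoscape), and treating edges as undirected is an assumption. The mouse edge uses accessions from the first row of this table, the predicted edge pairs a mouse protein and a *C. difficile* 630 protein from the first row of Supplementary Table 3, and `CD_A` and `CD_B` are placeholder *C. difficile* IDs.

```python
def merge_edge_lists(*edge_lists):
    """Concatenate edge lists, dropping duplicates regardless of direction."""
    seen = set()
    merged = []
    for edges in edge_lists:
        for a, b in edges:
            key = frozenset((a, b))  # (a, b) and (b, a) share one key
            if key in seen:
                continue
            seen.add(key)
            merged.append((a, b))
    return merged

mouse_ppi = [("O88174", "Q9QUM4"), ("Q9QUM4", "O88174")]  # same edge listed twice
cdiff_ppi = [("CD_A", "CD_B")]                            # placeholder IDs
predicted_hpi = [("Q8C0C0", "Q180P1")]                    # mouse and C. difficile pair

network = merge_edge_lists(mouse_ppi, cdiff_ppi, predicted_hpi)
for partner_1, partner_2 in network:
    print(partner_1, partner_2)
```
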