Benchmarking dataset
-------
**`clusters.yaml`**: binding site clusters

**`protein-data`**: protein features:

* `*.fasta` protein sequence in the [FASTA][1] format
* `*.pdb` protein structure in the [PDB][2] format
* `*.pops` protein surface calculated by the [POPS][3] program
* `*.profile` protein sequence profile calculated by the [PROFILpro][4] program

**`pocket-data`**: pocket features:

* `*.lpc` ligand-protein contacts calculated by the [LPC][5] program
* `*.mol2` pocket structure in the [mol2][6] format
* `*.pdb` pocket structure in the [PDB][2] format
* `*.sdf` binding ligand in the [SDF][7] format

Unseen dataset
-------
**`unseen_data_results.txt`**: prediction results for unseen data

**`unseen-data`**: protein and pocket features:

* `*.fasta` protein sequence in the [FASTA][1] format
* `*.pdb` protein structure in the [PDB][2] format
* `*.pops` protein surface calculated by the [POPS][3] program
* `*.profile` protein sequence profile calculated by the [PROFILpro][4] program
* `*.mol2` pocket structure in the [mol2][6] format

Negative dataset
-------
**`negative_pocket_list.txt`**: list of non-binding pockets

**`negative-data`**: protein and pocket features:

* `*.pops` protein surface calculated by the [POPS][3] program
* `*.profile` protein sequence profile calculated by the [PROFILpro][4] program
* `*.mol2` pocket structure in the [mol2][6] format

**`negative_data_output_probs.yaml`**: output class probabilities for the negative dataset

[1]: https://en.wikipedia.org/wiki/FASTA_format
[2]: https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)
[3]: https://pubmed.ncbi.nlm.nih.gov/12824328/
[4]: http://download.igb.uci.edu/
[5]: https://pubmed.ncbi.nlm.nih.gov/10320401/
[6]: http://chemyang.ccnu.edu.cn/ccb/server/AIMMS/mol2.pdf
[7]: https://en.wikipedia.org/wiki/Chemical_table_file
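
Reading the sequence files
-------
The `*.fasta` files listed above are plain text, so they can be read without extra dependencies. The following is a minimal sketch, not part of the dataset tooling: the function name `read_fasta` and the example path are hypothetical, and it assumes the files follow the usual FASTA convention of a `>` header line followed by one or more sequence lines.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) pairs from a FASTA file.

    Hypothetical helper for illustration; assumes '>'-prefixed header
    lines, each followed by one or more lines of sequence.
    """
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence may be wrapped over several lines; collect the pieces.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `read_fasta("protein-data/example.fasta")` (a placeholder file name) would return one `(header, sequence)` tuple per entry in that file.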