GraphDTI dataset
-------
**`training_data.tar.gz`**: pickle files with feature vectors for positive and negative instances in the GraphDTI dataset. Names of the pickle files are formatted as `UniProtID_DrugID_ENSPID_SignatureID.pkl`.
The structure of each feature vector (1412 dimensions in total):
* 300-dimensional vector for ProtVec (index 0-299)
* 512-dimensional vector for Bionoi-AE (index 300-811)
* 300-dimensional vector for Mol2Vec (index 812-1111)
* 300-dimensional vector for Graph2vec (index 1112-1411)
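
The index ranges above can be turned into named slices. The following is a minimal sketch, not part of the dataset tooling: the names `FEATURE_SLICES` and `split_features` are hypothetical, and a placeholder list stands in for a real feature vector. It assumes each vector is a flat sequence of 1412 values; the same slices also work on a NumPy array.

```python
# Offsets follow the layout listed above; all names here are hypothetical.
FEATURE_SLICES = {
    "ProtVec": slice(0, 300),        # indices 0-299
    "Bionoi-AE": slice(300, 812),    # indices 300-811
    "Mol2Vec": slice(812, 1112),     # indices 812-1111
    "Graph2vec": slice(1112, 1412),  # indices 1112-1411
}


def split_features(vector):
    """Split one 1412-dimensional feature vector into its four named blocks."""
    if len(vector) != 1412:
        raise ValueError(f"expected 1412 values, got {len(vector)}")
    return {name: vector[s] for name, s in FEATURE_SLICES.items()}


# Placeholder vector for illustration only (not real data).
blocks = split_features(list(range(1412)))
print({name: len(block) for name, block in blocks.items()})
```

Keeping the offsets in one dictionary avoids repeating the index arithmetic wherever a single feature type is needed.
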

PubChem BioAssay dataset
-------
**`test_data.tar.gz`**: pickle files with feature vectors for positive and negative instances in the PubChem BioAssay dataset. The format and structure of pickle files are the same as in the GraphDTI dataset.
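
Because both archives use the same file naming scheme, one helper can recover the identifiers from a file name and load the pickled object. This is a sketch under stated assumptions: the function names are hypothetical, the first three IDs are assumed to contain no underscores, and the type of the pickled object (list, NumPy array, or other) is not specified here.

```python
import pickle
from pathlib import Path


def parse_instance_name(path):
    """Split 'UniProtID_DrugID_ENSPID_SignatureID.pkl' into its four fields."""
    # Assumption: the first three IDs contain no underscores, so anything
    # after the third underscore is treated as the signature ID.
    parts = Path(path).stem.split("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {path}")
    keys = ("uniprot_id", "drug_id", "ensp_id", "signature_id")
    return dict(zip(keys, parts))


def load_instance(path):
    """Return (parsed IDs, unpickled object) for one extracted instance file."""
    with open(path, "rb") as handle:
        return parse_instance_name(path), pickle.load(handle)


# Placeholder IDs for illustration only.
print(parse_instance_name("P00000_D00000_ENSP00000000000_SIG0.pkl"))
```

As with any pickle file, only unpickle archives obtained from a trusted source.
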

Feature optimization and integration
-------
### Graph2vec generation
**`graph2vec_generation.tar.gz`**: data for generating the Graph2vec features used in training and testing:
- `gene_expression`: numerical values for the differential gene expression
- `target_node_shortest_path`: the shortest distances to the target node
- `input_file`: input files in JSON format to generate Graph2vec features
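
To illustrate what a shortest distance to the target node means, the sketch below computes hop counts with a breadth-first search. It is an illustration only, not the procedure used to build `target_node_shortest_path`: it assumes an unweighted, undirected graph given as an edge list, and the function name and toy graph are hypothetical.

```python
from collections import deque


def shortest_distances_to_target(edges, target):
    """Return {node: hop count to target} for every node reachable from target."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Breadth-first search outward from the target node.
    distances = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Toy graph for illustration only; "T" plays the role of the target node.
toy_edges = [("A", "B"), ("B", "C"), ("C", "T"), ("B", "T")]
toy_distances = shortest_distances_to_target(toy_edges, "T")
print(toy_distances)
```

Nodes that cannot reach the target are absent from the result, so a caller can decide how to encode them.
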
### Graph2vec optimization
**`graph2vec_optimization.tar.gz`**: Graph2vec features generated with varying numbers of connected nodes, ranging from 10 to 70.
### Feature integration
**`feature_integration_training.tar.gz`**: the original features, including Mol2Vec, ProtVec, Bionoi-AE, and Graph2vec, for training instances.

**`feature_integration_test.tar.gz`**: the original features, including Mol2Vec, ProtVec, Bionoi-AE, and Graph2vec, for testing instances.