**Benchmarking and testing datasets**:
- `fda_approved_nr.smi` - 1515 FDA-approved drugs obtained from [DrugBank][1].
- `toxnet_nr.smi` - 3035 toxic molecules obtained from [HSDB][2].
- `kegg_nr.smi` - 3682 drugs obtained from [KEGG Drug][3].
- `t3db_nr.smi` - 1283 toxic molecules obtained from [T3DB][4].
- `dude_actives_nr.smi` - 17499 bioactive molecules obtained from [DUD-E][5].
- `nat_nubbe_nr.smi` - 1008 natural products obtained from [NuBBE][6].
- `nat_unpd_nr.smi` - 81372 natural products obtained from [UNPD][7].
- `tcm600_nr.smi` - 5883 traditional Chinese medicine compounds obtained from [TCM Database@Taiwan][8].

**Training datasets**:
- `train_SA_data.smi` - dataset used to train the synthetic accessibility score (SAscore) predictor.
- `train_Tox_data.smi` - dataset used to train the toxicity classifier.

[1]: https://www.drugbank.ca/
[2]: https://toxnet.nlm.nih.gov/newtoxnet/hsdb.htm
[3]: http://www.genome.jp/kegg/drug/
[4]: http://www.t3db.ca/
[5]: http://dude.docking.org/
[6]: http://nubbe.iq.unesp.br/portal/nubbedb.html
[7]: http://pkuxxj.pku.edu.cn/UNPD
[8]: http://tcm.cmu.edu.tw/
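
The `.smi` files listed above can be read with a few lines of Python. The following is a minimal sketch, assuming each file holds one SMILES string per line, optionally followed by whitespace and a molecule name; that layout is a common convention for `.smi` files and is not confirmed by this README. The helper names `parse_smi_lines` and `load_smi` are hypothetical and are not part of this repository.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_smi_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse SMILES records into (smiles, name) pairs.

    Assumed layout (not confirmed by the README): one record per line,
    the SMILES string first, then an optional whitespace-separated name.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else ""  # empty when no name is given
        records.append((smiles, name))
    return records


def load_smi(path: str) -> List[Tuple[str, str]]:
    """Read a .smi file from disk and parse it with parse_smi_lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_smi_lines(handle)
```

For example, `len(load_smi("fda_approved_nr.smi"))` should match the molecule count given for that file if the assumed layout holds and the file has no header line.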