# Description of the dataset

## Structure

| Folder | Content |
|---|---|
| alignments | Tir receptor family MSA |
| anchor | Anchor motif predictions |
| disopred | DISOPRED disorder predictions |
| disopred_agg-clas | DISOPRED aggregation and classes |
| fasta | Sequence collections from UniProt |
| figures | Plots derived from the analysis |
| iupred | IUPred 1.0 disorder predictions |
| iupred_agg-clas | IUPred 1.0 aggregation and classes |
| maps | Species and taxa in each FASTA collection |
| motif_vs_disorder | Merged data from `anchor` and aggregated DISOPRED predictions |

## Logic

**The code for data processing can be found in [this repository](https://osf.io/cxkjf/)**

Sequences were fetched from UniProt and sorted into collections under `fasta`. Three effector collections were assembled: *E. coli* EHEC, *E. coli* EPEC, and *C. rodentium*. For each collection, the corresponding taxon and species names were extracted, and the resulting dictionaries were saved under `maps`.

The taxon lists were then used to fetch the available UniProt reference proteomes for each collection. The human proteome was also collected as a reference. All of these sequence collections are also found under `fasta`.

Then, each collection was processed with IUPred 1.0 (*short* and *long* modes) and DISOPRED 3.1.
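The taxon and species extraction step can be illustrated with a minimal sketch. It assumes the collections under `fasta` use the standard UniProtKB FASTA header layout, where the organism name follows `OS=` and the NCBI taxonomy identifier follows `OX=`. The helper `build_taxon_map` is hypothetical and is not taken from the linked repository, and the on-disk format of the dictionaries under `maps` is not described here, so the sketch stops at building the dictionary in memory.

```python
import re
from typing import Dict, Iterable

# UniProtKB FASTA headers carry the organism name after "OS=" and the
# NCBI taxonomy identifier after "OX=", e.g.
# >sp|ACCESSION|ENTRY_NAME Protein name OS=Escherichia coli OX=562 GN=... PE=1 SV=1
_OS_RE = re.compile(r"\bOS=(.+?)\s+OX=")
_OX_RE = re.compile(r"\bOX=(\d+)")


def build_taxon_map(lines: Iterable[str]) -> Dict[int, str]:
    """Hypothetical helper: map taxon ID -> species name from FASTA header lines."""
    taxa: Dict[int, str] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no taxonomy information
        os_match = _OS_RE.search(line)
        ox_match = _OX_RE.search(line)
        if os_match and ox_match:
            # Later headers with the same taxon ID overwrite earlier ones
            taxa[int(ox_match.group(1))] = os_match.group(1).strip()
    return taxa


if __name__ == "__main__":
    # Illustrative headers only; accessions and sequences are made up
    example = [
        ">sp|X00001|EX1_ECOLX Example protein OS=Escherichia coli OX=562 GN=ex1 PE=1 SV=1",
        "MKT",
        ">tr|X00002|X00002_HUMAN Example protein OS=Homo sapiens OX=9606 PE=4 SV=1",
        "MAA",
    ]
    print(build_taxon_map(example))
    # {562: 'Escherichia coli', 9606: 'Homo sapiens'}
```

Keying the dictionary by taxon ID makes the resulting taxon list directly usable for the next step, looking up reference proteomes per collection; strain designations that UniProt includes in `OS=` (for example serotypes) are kept as part of the species name.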