## Machine-learning annotation of human splicing branchpoints ##

Bethany Signal1,2, Brian S Gloss1,2, Marcel E Dinger1,2,\* and Timothy R Mercer1,2,\*

1 Garvan Institute of Medical Research, Sydney, Australia
2 St Vincent’s Clinical School, University of New South Wales, Sydney, Australia

\* These authors contributed equally to this work. Correspondence should be addressed to M.E.D. (m.dinger@garvan.org.au) or T.R.M. (t.mercer@garvan.org.au).

#### Abstract ####

The branchpoint element is required for the first lariat-forming reaction in splicing. We have developed a machine-learning algorithm trained with empirical human branchpoint annotations to identify branchpoint elements from primary genome sequence alone. Using this approach, we can accurately locate branchpoint elements in 85% of introns in current gene annotations. Consistent with branchpoints as basal genetic elements, we find our annotation is unbiased towards gene type and expression levels. A fraction of introns was found to encode multiple branchpoints, raising the prospect that mutational redundancy is encoded in key genes. We also identify cases of deleterious branchpoint mutations in clinical variant databases that may explain disease pathogenicity. We propose that the broad annotation of branchpoints constitutes a valuable resource for interpreting the impact of common and disease-causing human genetic variation on gene splicing.

#### **Branchpoint annotations** ####

Branchpoint annotations are available here as tab-delimited files. Predictions for all tested sites are available in <Species>_predictions.txt, and only branchpoint sites (branchpoint_prob >= 0.5) in <Species>_branchpoints.txt.
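
The relationship between the two files can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: it assumes the files carry a header row with the documented column names, uses a made-up three-row sample with only a subset of those columns, and the helper names `read_predictions` and `filter_branchpoints` are hypothetical.

```python
import csv
import io

# Hypothetical sample in the tab-delimited layout (values are made up for
# illustration; only a subset of the documented columns is shown, and a
# header row is assumed).
SAMPLE = (
    "id\tbranchpoint_prob\tnucleotide\tdistance\n"
    "exon_A\t0.91\tA\t24\n"
    "exon_A\t0.12\tC\t25\n"
    "exon_B\t0.50\tA\t30\n"
)


def read_predictions(handle):
    """Parse tab-delimited rows into dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["branchpoint_prob"] = float(row["branchpoint_prob"])
        row["distance"] = int(row["distance"])
        rows.append(row)
    return rows


def filter_branchpoints(rows, threshold=0.5):
    """Keep only the sites whose probability score meets the threshold."""
    return [r for r in rows if r["branchpoint_prob"] >= threshold]


predictions = read_predictions(io.StringIO(SAMPLE))
branchpoints = filter_branchpoints(predictions)
for site in branchpoints:
    print(site["id"], site["distance"], site["branchpoint_prob"])
```

To run this against a downloaded file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="")`, where `path` points to one of the predictions files.
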

**Format**

Annotations were generated using the [branchpointer R package][9], and have the following format:

| Column name | Data description |
|-----------|----------------|
| id | Identifier for the 3' exon that the branchpoint prediction is related to |
| branchpoint_prob | branchpoint probability score |
| nucleotide | nucleotide at the tested site |
| distance | distance (in nucleotides) to the closest annotated 3' exon |
| allele_status | REF (reference sequence) or ALT (alternative sequence) |
| chromosome | chromosome name (e.g. chr1) |
| strand | chromosome strand |
| end | chromosome location of the site-specific query |
| exon_3prime | exon_id of the closest annotated 3' exon |
| exon_5prime | exon_id of the closest annotated 5' exon |
| U2_binding_energy | binding energy of the sequence surrounding the tested site to the U2 snRNA |

**Species**

Branchpoint annotations were generated for the following species and genome annotations:

| Species | Annotation | File name prefix |
| ------- | ---------- | --------- |
| Human (*Homo sapiens*) | [Gencodev12 / hg19][1] | gencode.v12 |
| Human (*Homo sapiens*) | [Gencodev19 / hg19][2] | gencode.v19 |
| Human (*Homo sapiens*) | [Gencodev24 / hg38][3] | gencode.v24 |
| Mouse (*Mus musculus*) | [GencodevM10 / mm10][4] | gencode.vM10 |
| Zebrafish (*Danio rerio*) | [Ensembl release 85 / GRCz10][5] | Danio_rerio |
| Fruitfly (*Drosophila melanogaster*) | [Ensembl release 85 / BDGP6][6] | drosophila_melanogaster |
| Chicken (*Gallus gallus*) | [Ensembl release 85 / Galgal4][7] | gallus gallus |
| Xenopus (*Xenopus tropicalis*) | [Ensembl release 85 / JGI 4.2][8] | xenopus tropicalis |

#### **Supplementary Tables** ####

**TableS1**: Heptamers and their corresponding scores from HSF.

**TableS2**: ClinVar SNPs with a predicted effect on a branchpoint.

**TableS3**: GTEx SNPs with a predicted effect on a branchpoint.

[1]: http://www.gencodegenes.org/releases/12.html
[2]: http://www.gencodegenes.org/releases/19.html
[3]: http://www.gencodegenes.org/releases/24.html
[4]: http://www.gencodegenes.org/mouse_releases/10.html
[5]: http://jul2016.archive.ensembl.org/Danio_rerio/Info/Index
[6]: http://jul2016.archive.ensembl.org/Drosophila_melanogaster/Info/Index
[7]: http://jul2016.archive.ensembl.org/Gallus_gallus/Info/Index
[8]: http://jul2016.archive.ensembl.org/Xenopus_tropicalis/Info/Index
[9]: http://github.com/betsig/branchpointer