**This repository contains data associated with the paper: Javierre\* / Burren\* / Wilder\* / Kreuzhuber\* / Hill\* et al. Genomic regulatory architecture links enhancers and disease variants to target gene promoters (*Cell* 167, 2016).**

The individual folders and files are described below. All genomic coordinates are for the GRCh37 assembly.

**CHiCAGO objects**

The PCHi-C and reciprocal CHi-C data (in the respective subfolders) as objects produced by the Chicago R package, saved as rds archives. *Note*: Summary interaction data are also available as text files in the Detected interactions folder.

**Capture design**

Capture design files for the Chicago package, for the PCHi-C and reciprocal CHi-C data respectively. Each tar.gz archive expands into a folder that can be provided to Chicago as the "design folder".

**Chromatin annotations**

Contains Ensembl Regulatory Build and BLUEPRINT chromatin state annotations of interacting regions in the nine cell types from the BLUEPRINT project.

*ActiveTriosRegBuild.txt.gz*. A table of interactions between active promoters and active enhancers.

*RegBuild_BLUEPRINT_annotations.txt.gz*. A full list of the Ensembl Regulatory Build annotations of the interacting promoters and PIRs, with promoter and enhancer activities defined from chromHMM segmentations generated on the BLUEPRINT data.

**Detected interactions**

Contains the CHiCAGO interaction scores for PCHi-C and reciprocal CHi-C in each cell type. The data are presented as "peak matrices", which list the CHiCAGO scores in all cell types for every interaction that exceeds a given score cutoff in at least one cell type. A cutoff of 5 was used to define significant interactions in the study.

The *PCHi-C* subfolder contains the PCHi-C peak matrices with cutoffs of 0 and 5 applied, respectively.
The *Reciprocal capture* subfolder contains the reciprocal capture peak matrix, as well as a matrix listing the scores for all interactions between the promoter and reciprocal capture baits, both with a cutoff of 0 applied.

**Gene clustering**

*geneSpecScores_interactionsWithActiveEnhancers.csv*. The gene-level specificity scores based on interactions with active enhancers, and the cluster IDs of the respective genes from the analysis presented in Figures 4B-E and S4.

*geneSpecScores_expression.csv*. Expression-based gene specificity scores from the same analysis.

**Gene expression**

A matrix of the gene expression data used in the study, containing expression quantifications generated with MMSEQ (Turro et al., Genome Biol 2011).

**GWAS gene prioritisation**

*COGS_gene_scores.txt*. Gene scores generated by the COGS algorithm (note: also listed in Table S2, Tab 2 in the paper).

*core_AI_disease_network.cys*. A network file for the core AI disease network presented in Figure 6E, generated with Cytoscape 3.3.0 and the GeneMANIA Cytoscape plugin v3.4.0 (Montojo et al., Bioinformatics 2010 and F1000Res 2014).

**TAD definitions**

Contains the definitions of topologically associated domains (TADs) for the nine cell types, generated as described in the Quantification and Statistical Analysis section of the paper.
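
**Example: applying a score cutoff to a peak matrix**

The cutoff logic described under Detected interactions (keep an interaction if its CHiCAGO score reaches the cutoff in at least one cell type) can be sketched in Python as follows. This is a minimal illustration, not the code used in the study. The column names (`baitID`, `oeID`, `cellA`, `cellB`), the tab delimiter, the toy values and the inclusive `>=` comparison are all assumptions made for the example; check the header of the actual peak matrix file before adapting it.

```python
import csv
import io


def filter_peak_matrix(handle, score_columns, cutoff=5.0):
    """Yield rows whose score reaches the cutoff in at least one cell type.

    handle        -- open text file containing a delimited table with a header row
    score_columns -- names of the per-cell-type score columns (assumed names)
    cutoff        -- score threshold; the inclusive comparison is an assumption
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        scores = [float(row[col]) for col in score_columns]
        if max(scores) >= cutoff:
            yield row


# Toy stand-in for a peak matrix; column names and values are made up.
DEMO = (
    "baitID\toeID\tcellA\tcellB\n"
    "1\t10\t6.2\t0.4\n"
    "1\t11\t1.3\t2.8\n"
    "2\t12\t0.0\t5.7\n"
)

kept = list(filter_peak_matrix(io.StringIO(DEMO), ["cellA", "cellB"]))
print([(row["baitID"], row["oeID"]) for row in kept])
```

For a gzip-compressed file, the same function can be given a handle opened with `gzip.open(path, "rt")` from the Python standard library in place of the in-memory `io.StringIO` used here.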