Files and data associated with Khalfan et al., 2021 "Modifying Reference Sequence and Annotation Files Quickly and Reproducibly with *ref*orm"
**The files are as follows:**
Figure 2 files
- **Mus_musculus.GRCm38_mb190.fa.gz:** reformed genome sequence
- **Mus_musculus.GRCm38_mb190.gff3:** reformed genome annotation
- **mb190.bam:** alignment on unreformed genome
- **mb190_reform.bam:** alignment using reformed genome
- **mb190_indels_gatk.vcf:** indels on unreformed genome alignments
- **mb190_indels_reform_gatk.vcf:** indels using reformed genome alignments
- **mb190_snps_gatk.vcf:** SNPs on unreformed genome alignments
- **mb190_snps_reform_gatk.vcf:** SNPs using reformed genome alignments
Figure 3 files
- **Saccharomyces_cerevisiae_DGY1910.fa:** reformed genome sequence
- **Saccharomyces_cervisiae_DGY1910.gff:** reformed genome annotation
- **DGY22203_sniffles.vcf:** variant calls on unreformed genome alignments
- **DGY2203_reform_sniffles.vcf:** variant calls using reformed genome
----------
**The data for Figure 2 was generated as follows:**
BAM and VCF files were generated using https://github.com/gencorefacility/variant-calling-pipeline.
The number of reads in each BAM file was obtained using samtools:
`samtools view mb190.bam 6:52257694-52260880 | wc -l`
(6:52257694-52260880 are the coordinates of the HOXA13 gene)
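
As a hedged alternative to piping into `wc -l`, samtools can count matching records itself with `view -c`, which should give the same number. The sketch below assumes `mb190.bam` is coordinate-sorted and sits in the working directory; region queries need a BAM index, so one is built if it is missing. The `REGION` variable is introduced here for illustration only and is not part of the original commands.

```shell
# Illustrative variable (not in the original commands): the HOXA13 region.
REGION="6:52257694-52260880"

# Run only when samtools and the BAM file are actually available.
if command -v samtools >/dev/null 2>&1 && [ -f mb190.bam ]; then
    # Region queries require an index; create it if it does not exist yet.
    [ -f mb190.bam.bai ] || samtools index mb190.bam
    # -c prints the number of matching records instead of the records themselves.
    samtools view -c mb190.bam "$REGION"
fi
```
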
The number of variants in each VCF file was obtained using tabix:
`bgzip mb190_snps_gatk.vcf`
`tabix -p vcf mb190_snps_gatk.vcf.gz`
`tabix -R hoxa13-region.txt mb190_snps_gatk.vcf.gz | wc -l`
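
The contents of `hoxa13-region.txt` are not listed here. A minimal sketch, assuming the file simply holds the HOXA13 coordinates given above: for a non-BED file passed to `tabix -R`, tabix expects tab-delimited columns of chromosome, start and (optionally) end, with 1-based inclusive positions. The exact file used for the paper may differ.

```shell
# Assumed contents of hoxa13-region.txt (not taken from the original data set):
# chromosome, start, end; tab-separated, 1-based and inclusive.
printf '6\t52257694\t52260880\n' > hoxa13-region.txt

# Count variants in the region only when tabix and the indexed VCF are present.
if command -v tabix >/dev/null 2>&1 && [ -f mb190_snps_gatk.vcf.gz.tbi ]; then
    tabix -R hoxa13-region.txt mb190_snps_gatk.vcf.gz | wc -l
fi
```

Because `tabix` does not print header lines unless `-h` is given, the `wc -l` count reflects variant records only.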