**ddRADseq Data Processing**
We followed the STACKS workflow established by Portik et al. (2017, "Evaluating mechanisms of diversification in a Guineo-Congolian tropical forest frog using demographic model selection", Molecular Ecology 26: 5245–5263) to demultiplex, filter, and process our sequencing data. The scripts used to accomplish these steps, along with details of their usage, are linked in the files section (GitHub: dportik).
In brief, the pipeline ran the following STACKS v1.35 commands on our demultiplexed data:
`ustacks -t fastq -f SAMPLE.trim.fq -o . -r -m 5 -M 2`
The above command assembled loci from the short reads in SAMPLE.trim.fq, requiring a minimum stack depth of 5X (-m 5) and allowing at most two mismatches between stacks (-M 2); the -r flag enables the removal algorithm, which drops highly repetitive stacks. This was run once per sample, as sketched below.
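For reference, a minimal sketch of that per-sample loop (not one of the published workflow scripts; it assumes trimmed reads named SAMPLE.trim.fq sit in the working directory):

```python
# Hedged sketch: run ustacks once per trimmed fastq file in the current directory.
import glob
import subprocess

for i, fq in enumerate(sorted(glob.glob("*.trim.fq")), start=1):
    cmd = [
        "ustacks",
        "-t", "fastq",
        "-f", fq,
        "-o", ".",
        "-i", str(i),   # unique integer ID per sample
        "-r",           # removal algorithm: drop highly repetitive stacks
        "-m", "5",      # minimum stack depth of 5X
        "-M", "2",      # at most two mismatches between stacks
    ]
    subprocess.run(cmd, check=True)
```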
`cstacks -b 1 -s [name of all samples together] -o .`
The above command built a catalogue of consensus loci by merging the loci from all samples (one -s flag per sample); see the sketch below.
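One way the repeated -s flags might be assembled, assuming ustacks wrote plain .tags.tsv files next to the reads (a sketch, not the workflow's own script):

```python
# Hedged sketch: collect the per-sample ustacks output prefixes and pass each
# one to cstacks with its own -s flag to build the batch-1 catalogue.
import glob
import subprocess

samples = sorted(f.rsplit(".tags.tsv", 1)[0] for f in glob.glob("*.trim.tags.tsv"))
cmd = ["cstacks", "-b", "1", "-o", "."]
for s in samples:
    cmd += ["-s", s]
subprocess.run(cmd, check=True)
```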
`sstacks -b 1 -c batch_1 -s SAMPLE.trim -o .`
The above command matched the sample's loci against the catalogue, assigning each one a catalogue locus ID. This was run once per sample, as sketched below.
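A minimal per-sample loop for this step (the sample prefixes shown are hypothetical placeholders):

```python
# Hedged sketch: match every sample against the batch-1 catalogue with sstacks.
import subprocess

samples = ["sample1.trim", "sample2.trim"]  # hypothetical prefixes, one per sample
for s in samples:
    subprocess.run(
        ["sstacks", "-b", "1", "-c", "batch_1", "-s", s, "-o", "."],
        check=True,
    )
```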
`populations -b 1 -P . -M [path to population map] -r [for each value: 50, 60, 70, 80, 90, 100] --min_maf 0.05`
The above command was run six times, once for each -r threshold (50-100% in increments of 10%), exporting haplotypes for loci present in at least that percentage of samples and enforcing a minor allele frequency cutoff of 0.05; a sketch of the loop follows.
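A sketch of that loop, assuming a population map at popmap.txt (a hypothetical path) and renaming each run's haplotypes file so successive runs do not overwrite it (batch_1.haplotypes.tsv is the default output name for -b 1):

```python
# Hedged sketch: one populations run per -r threshold, keeping each output.
import os
import subprocess

popmap = "popmap.txt"  # hypothetical path to the population map
for r in ("50", "60", "70", "80", "90", "100"):
    subprocess.run(
        ["populations", "-b", "1", "-P", ".", "-M", popmap,
         "-r", r, "--min_maf", "0.05"],
        check=True,
    )
    os.rename("batch_1.haplotypes.tsv", f"batch_1.haplotypes.r{r}.tsv")
```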
The resulting haplotypes.tsv file from each r-value run of the populations module was then custom filtered using the 2_haplo_summary.py and 3_haplotype_converter.py scripts in the workflow. In short, these scripts removed invariant, non-biallelic, and putatively paralogous loci, and selected a single random SNP per locus to create input files for downstream programs; a simplified illustration follows.
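For readers without the scripts at hand, a greatly simplified illustration of the per-locus logic (the authoritative implementation is in the linked scripts; paralog screening, e.g. flagging loci with more than two haplotypes within one individual, is omitted here):

```python
# Hedged sketch of the filtering idea: keep only biallelic variable loci and
# draw a single random variable site from each.
import random

def pick_random_snp(haplotypes):
    """haplotypes: per-sample haplotype strings for one catalogue locus,
    e.g. ["AG", "AT", "AG/AT"]; "consensus" marks an invariant locus and
    "-" marks missing data. Returns a random variable column index, or
    None if the locus should be discarded."""
    alleles = set()
    for h in haplotypes:
        alleles.update(h.split("/"))
    alleles.discard("-")
    if "consensus" in alleles or len(alleles) != 2:
        return None                     # invariant or non-biallelic: discard
    a1, a2 = sorted(alleles)
    snp_columns = [i for i, (x, y) in enumerate(zip(a1, a2)) if x != y]
    return random.choice(snp_columns)   # keep a single random variable site
```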