Dataset Title: Ploidy and clonal membership in Populus tremuloides from RADseq data.

Name and contact information of PI:
a. Name: James A. Walton
b. Institution: Utah State University
c. Email: Utah State University

Name and contact information of Co-PI:
a. Name: Benjamin Blonder
b. Institution: Arizona State University
c. Email: bblonder@asu.edu

Name and contact information of Co-PI:
a. Name: Karen E. Mock
b. Institution: Utah State University
c. Email: karen.mock@usu.edu

Funding source: Arizona State University School of Life Sciences

Abstract: Ipyrad pipeline parameters files, raw sequence data, barcodes, and supporting scripts for clonal and cytotype sample assignment of Populus tremuloides.

Methodology: Leaf samples (n=503) were collected from three watersheds in southwestern Colorado. Genomic DNA was extracted and a ddRAD library was prepared following Parchman et al. 2012. Individually barcoded samples were then pooled and sequenced in three separate libraries on an Illumina HiSeq2500. Various bioinformatic pipelines were used to determine clonal membership and ploidy. Additional details can be found in the forthcoming publication.

File types: .fastq, .py, .txt, .R, .xlsx, .pdf

Instructions to concatenate files for use: To compile the individual sequence files into the three original data files, enter the following commands on the command line. Wait for each command to complete before entering the next one. Because of the file sizes, individual steps may take several minutes.
Unzip all split fastq.gz files:
    gunzip BP01*.fastq.gz
    gunzip BP02*.fastq.gz
    gunzip BP03*.fastq.gz

Combine the unzipped files into 3 sequence files (BP01, BP02, BP03):
    cat BP01* > p1815-BP01_S40_L006_R1_001.fastq
    cat BP02* > p1815-BP02_S41_L007_R1_001.fastq
    cat BP03* > p1815-BP03_S42_L008_R1_001.fastq

Rezip the 3 files for downstream analysis:
    gzip *BP01*001.fastq
    gzip *BP02*001.fastq
    gzip *BP03*001.fastq

Remove all files except the 3 compiled, zipped sequence files:
    rm BP*.fastq

Viewing Directions: This data set contains the following:
1. Raw compressed amplicon sequence data (fastq.gz) from 3 individual Illumina HiSeq2500 lanes ("BP01", "BP02" and "BP03"). Each lane is presented here as 7 individual sequencing files ("aa" to "ag") for ease of data transfer. Once downloaded, the command line instructions in this README can be used to compile the required fastq.gz files for downstream analysis (p1815-BP01_S40_L006_R1_001.fastq.gz, p1815-BP02_S41_L007_R1_001.fastq.gz and p1815-BP03_S42_L008_R1_001.fastq.gz).
2. Barcode files corresponding to each raw amplicon sequence data file (barcodes01.txt, barcodes02.txt and barcodes03.txt).
3. Ipyrad (http://ipyrad.readthedocs.io) parameters files used to run steps 2 through 7 of Ipyrad. The separate sequence files were demultiplexed in step 1 of Ipyrad, then merged before the remaining steps were run. One parameters file was used for clonal assignment (msl_250_Parameters.txt) and the other for cytotype analysis (msl_10_Parameters.txt).
4. Python script to remove a list of individuals from .vcf files for downstream analysis (removeIndVcf.py).
5. R script to transform Jaccard pairwise similarity indices into clonal group assignments (clone_assignment.R).
6. Ipyrad VCF output files with additional individuals removed following post-Ipyrad quality control (msl_10_lessDroppedInds.vcf, msl_250_lessDroppedInds.vcf).
7. Results file.
8. Raw ploidy estimation results (estploidy_pp98.csv; see the Results file for final calls).
9. Pre- and post-size selection distributions of sequence data from the TapeStation (2019-02-13-01 Walton_Pre.pdf, 2019-02-14-01 Walton_post.pdf).

Additionally, the following scripts were implemented as part of this pipeline:
1. vcf2Jaccard (https://github.com/carol-rowe666/vcf2Jaccard, Carol Rowe)
2. vcf2hetAlleleDepth (https://github.com/carol-rowe666/vcf2hetAlleleDepth, Carol Rowe)
3. gbs2ploidy (Gompert, Z., Mock, K. Detection of individual ploidy levels with genotyping-by-sequencing (GBS) analysis. Mol. Ecol. Res. 17:1156-67.)

Special software required to use data: A program capable of running R files, plus software that can open the other listed file types.

Publications that cite or use this data: Parchman, T.L., Gompert, Z., Mudge, J., Schilkey, F.D., Benkman, C.W. & Buerkle, C.A. (2012). Genome‐wide association genetics of an adaptive trait in lodgepole pine. Molecular Ecology, 21, 2991-3005.
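
Optional Python sketch for the concatenation step: for users without a Unix shell, the reassembly described under "Instructions to concatenate files for use" could be approximated in Python as shown here. This is an illustrative sketch, not a script distributed with the data set. The function name combine_lane and the LANES table are hypothetical. The piece pattern (for example BP01*.fastq.gz) is taken from the gunzip commands, and the output names are the ones listed in this README. It assumes the split pieces sort alphabetically in their original order ("aa" to "ag"). Unlike the shell instructions, it leaves the split pieces in place.

```python
import gzip
import shutil
from pathlib import Path

# Output file names as listed in this README, keyed by lane prefix.
LANES = {
    "BP01": "p1815-BP01_S40_L006_R1_001.fastq.gz",
    "BP02": "p1815-BP02_S41_L007_R1_001.fastq.gz",
    "BP03": "p1815-BP03_S42_L008_R1_001.fastq.gz",
}


def combine_lane(directory, lane, out_name):
    """Decompress every split piece of one lane, in sorted order,
    and write the joined reads to a single gzip-compressed file."""
    directory = Path(directory)
    # Same pattern as the gunzip commands; it does not match the
    # combined output, whose name starts with "p1815-".
    pieces = sorted(directory.glob(f"{lane}*.fastq.gz"))
    if not pieces:
        raise FileNotFoundError(f"no split files found for lane {lane}")
    out_path = directory / out_name
    with gzip.open(out_path, "wb") as out:
        for piece in pieces:
            with gzip.open(piece, "rb") as src:
                # Stream in chunks so large files are never held in memory.
                shutil.copyfileobj(src, out)
    return out_path, pieces


if __name__ == "__main__":
    # Run from the directory that holds the downloaded split files.
    for lane, out_name in LANES.items():
        try:
            out_path, pieces = combine_lane(".", lane, out_name)
            print(f"{out_path.name}: combined {len(pieces)} pieces")
        except FileNotFoundError as err:
            print(f"skipped {lane}: {err}")
```

The split pieces are left untouched so that the combined files can be checked (for example by comparing read counts) before anything is deleted.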