Supporting Data

Mash: fast genome and metagenome distance estimation using MinHash

RefSeqSketches.msh.gz: Mash sketch database (k=16, s=400) for RefSeq release 70 (48MB)

RefSeqSketchesDefaults.msh.gz: Mash sketch database (k=21, s=1000) for RefSeq release 70 (255MB)

Escherichia.tar.gz: Names and accessions for 500 selected Escherichia genomes, pairwise ANI, and pairwise Jaccard indexes for various k-mer and sketch sizes (24MB)

mash-1.0.tar.gz: Mash version 1.0 codebase (93KB)

SRR2671867.BaAmes.poretools.fastq.gz: Nanopore 1D + 2D sequences generated by poretools (157MB)

SRR2671868.Bc10987.poretools.fastq.gz: Nanopore 1D + 2D sequences generated by poretools (250MB)
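
The pairwise Jaccard indexes in Escherichia.tar.gz can be related to Mash distances for a given k-mer size. The following is a minimal Python sketch of that relationship, not the Mash codebase itself. It assumes the Mash distance formula D = -(1/k) ln(2j / (1 + j)) from the paper and a bottom-s sketch represented as a plain collection of integer hashes. The function names estimate_jaccard and mash_distance are hypothetical, and returning 1.0 when the Jaccard index is zero is a convention assumed here.

```python
import math


def estimate_jaccard(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from two bottom-s MinHash sketches.

    Each sketch is assumed to be a collection of integer hash values
    (the s smallest hashes seen for that genome).
    """
    a, b = set(sketch_a), set(sketch_b)
    # Bottom-s sketch of the union: the s smallest hashes overall.
    merged = sorted(a | b)[:s]
    if not merged:
        return 0.0
    # Count merged hashes that are present in both input sketches.
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(jaccard, k):
    """Convert a Jaccard index j into a Mash distance for k-mer size k."""
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be in [0, 1]")
    if jaccard == 0.0:
        # Assumed convention: no shared k-mers means maximal distance.
        return 1.0
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k


if __name__ == "__main__":
    # Toy hash values for illustration only, not real sketch data.
    j = estimate_jaccard([1, 2, 3, 4], [3, 4, 5, 6], s=4)
    print(j, mash_distance(j, k=21))
```

With the k and s values listed for the sketch databases above, the same two functions would apply with k=16, s=400 or k=21, s=1000.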

Note: For sketches of more recent RefSeq releases, please see third party repositories, such as:

Public data sources

The BLAST nr database was downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.

HMP data were downloaded from ftp://public-ftp.ihmpdcc.org/, reads from the Ilumina/ directory and coding sequences from the HMGI/ directory. Within these folders, sample SRS015937 resides in tongue_dorsum/ and SRS020263 in right_retroauricular_crease/.

SRA runs were downloaded with the SRA Toolkit.

RefSeq genomes were downloaded from the genomes/refseq/ directory of ftp.ncbi.nlm.nih.gov.

Public data products

Quebec Polyomavirus has been submitted to GenBank under accession BK010702.