Supporting Data
Mash: fast genome and metagenome distance estimation using MinHash
RefSeqSketches.msh.gz: Mash sketch database (k=16, s=400) for RefSeq release 70 (48MB)
RefSeqSketchesDefaults.msh.gz: Mash sketch database (k=21, s=1000) for RefSeq release 70 (255MB)
Escherichia.tar.gz: Names and accessions for 500 selected Escherichia genomes, pairwise ANI, and pairwise Jaccard indexes for various k-mer and sketch sizes (24MB)
mash-1.0.tar.gz: Mash version 1.0 codebase (93KB)
SRR2671867.BaAmes.poretools.fastq.gz: Nanopore 1D + 2D sequences generated by poretools (157MB)
SRR2671868.Bc10987.poretools.fastq.gz: Nanopore 1D + 2D sequences generated by poretools (250MB)
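The k and s values in the sketch database descriptions above are the k-mer length and the sketch size (the number of minimum hashes kept per genome). The following is a minimal, illustrative sketch of how a bottom-s MinHash sketch yields a Jaccard estimate and a Mash distance, D = -(1/k) ln(2j / (1 + j)). It is not Mash's implementation: it assumes plain (non-canonical) k-mers, hashes with `blake2b` from the Python standard library rather than MurmurHash3, and clamps the distance to 1 as a simplification. All function names are hypothetical.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every substring of length k in seq (no reverse complements)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(seq, k, s):
    """Return the s smallest distinct 64-bit k-mer hashes, in ascending order."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a, b, s):
    """Estimate the Jaccard index from two bottom-s sketches.

    The s smallest hashes of the union form a sketch of the union;
    the fraction of those hashes present in both inputs is the estimate.
    """
    merged = sorted(set(a) | set(b))[:s]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate j into a distance, clamped to [0, 1]."""
    if j <= 0.0:
        return 1.0
    d = -math.log(2.0 * j / (1.0 + j)) / k
    return min(1.0, max(0.0, d))


if __name__ == "__main__":
    # Toy sequences; real genomes would use values such as k=21, s=1000.
    k, s = 4, 100
    a = sketch("ACGTACGTTT", k, s)
    b = sketch("ACGTACGAAA", k, s)
    j = jaccard_estimate(a, b, s)
    print(j, mash_distance(j, k))
```

Larger sketch sizes reduce the variance of the Jaccard estimate at the cost of a bigger database, and larger k makes k-mer matches more specific, which is consistent with the two RefSeq databases above differing in both parameters and in file size.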
Note: For sketches of more recent RefSeq releases, please see third-party repositories.
Public data sources
The BLAST nr database was downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.
HMP data were downloaded from ftp://public-ftp.ihmpdcc.org/, reads from the Illumina/ directory
and coding sequences from the HMGI/ directory. Within these folders, sample SRS015937 resides in
tongue_dorsum/ and SRS020263 in right_retroauricular_crease/.
SRA runs were downloaded with the SRA Toolkit.
RefSeq genomes were downloaded from the genomes/refseq/ directory of ftp.ncbi.nlm.nih.gov.
Public data products
Quebec Polyomavirus has been submitted to GenBank as BK010702.