Supporting Data
===============

Mash: fast genome and metagenome distance estimation using MinHash
------------------------------------------------------------------

`RefSeqSketches.msh.gz `_: Mash sketch database (k=16, s=400) for RefSeq release 70 (48 MB)

`RefSeqSketchesDefaults.msh.gz `_: Mash sketch database (k=21, s=1000) for RefSeq release 70 (255 MB)

`Escherichia.tar.gz `_: Names and accessions for 500 selected Escherichia genomes, pairwise ANI, and pairwise Jaccard indexes for various k-mer and sketch sizes (24 MB)

`mash-1.0.tar.gz `_: Mash version 1.0 codebase (93 KB)

`SRR2671867.BaAmes.poretools.fastq.gz `_: Nanopore 1D + 2D sequences generated by poretools (157 MB)

`SRR2671868.Bc10987.poretools.fastq.gz `_: Nanopore 1D + 2D sequences generated by poretools (250 MB)

**Note:** For sketches of more recent RefSeq releases, please see third-party repositories, such as:

- https://github.com/ayixon/Mash-sketched-reference-databases
- https://zenodo.org/records/17081323

Public data sources
~~~~~~~~~~~~~~~~~~~

The BLAST ``nr`` database was downloaded from ``ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*``.

HMP data were downloaded from ``ftp://public-ftp.ihmpdcc.org/``, with reads taken from the ``Ilumina/`` directory and coding sequences from the ``HMGI/`` directory. Within these folders, sample SRS015937 resides in ``tongue_dorsum/`` and SRS020263 in ``right_retroauricular_crease/``.

SRA runs were downloaded with the `SRA Toolkit `_.

RefSeq genomes were downloaded from the ``genomes/refseq/`` directory of ``ftp.ncbi.nlm.nih.gov``.

Public data products
~~~~~~~~~~~~~~~~~~~~

Quebec Polyomavirus has been submitted to GenBank under accession BK010702.