|Title:||Indexed reference databases for KMA and CCMetagen|
|Authors:||Rossetto Marcelino, Vanessa|
|Abstract:||This database was built to identify taxa in metagenome samples using the CCMetagen pipeline. Using the whole NCBI nt collection allows a complete taxonomic overview, including microbial eukaryotes that may be present in the dataset. The database is already indexed and ready to use with KMA and CCMetagen.
A manual describing how to use this dataset can be found at: https://github.com/vrmarcelino/CCMetagen
Additionally, a tutorial on the whole analysis of a set of metatranscriptome samples can be found at: https://github.com/vrmarcelino/CCMetagen/tree/master/tutorial
The database was built as follows:
The partially non-redundant nucleotide database was downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz) in January 2018. This database was formatted to include taxids in sequence headers.
Indexing was then performed with KMA using the command:
kma_index -i nt_taxid.fas -o ncbi_nt -NI -Sparse TG
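The header-formatting step above can be illustrated with a minimal Python sketch. It is not the script used to build this database: the function name `add_taxids_to_headers`, the `>taxid|original header` layout, and the in-memory accession-to-taxid dictionary are assumptions made for illustration only.

```python
import io


def add_taxids_to_headers(fasta_in, fasta_out, acc2taxid):
    """Copy FASTA records, prefixing each header with its taxid.

    Assumed output layout (illustrative): >TAXID|original header
    Records whose accession is missing from acc2taxid are skipped.
    Returns a (written, skipped) tuple of record counts.
    """
    written = skipped = 0
    keep = False
    for line in fasta_in:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The accession is taken to be the first whitespace-delimited token.
            accession = header.split(None, 1)[0] if header.strip() else ""
            taxid = acc2taxid.get(accession)
            keep = taxid is not None
            if keep:
                fasta_out.write(f">{taxid}|{header}\n")
                written += 1
            else:
                skipped += 1
        elif keep:
            # Sequence lines are copied unchanged for records that are kept.
            fasta_out.write(line)
    return written, skipped


# Toy input: two records, only the first has a known taxid.
demo_in = io.StringIO(
    ">AB000001.1 Example sequence one\nACGT\nTTGA\n"
    ">ZZ999999.1 No taxid known\nGGGG\n"
)
demo_out = io.StringIO()
demo_counts = add_taxids_to_headers(demo_in, demo_out, {"AB000001.1": 562})
print(demo_out.getvalue(), end="")
```

For a collection the size of nt, the mapping would not fit comfortably in a plain dictionary; an on-disk lookup would be a more realistic choice.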
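Three indexed databases are provided:
1 - NCBI nucleotide collection
2 - RefSeq database of bacterial and fungal genomes
3 - NCBI nucleotide collection without environmental, unclassified and artificial sequences (added in V 2.0)|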
V 1.0 - Initial Upload
V 2.0 - Addition of a filtered NCBI nt database: The NCBI nucleotide collection contains many environmental and artificial sequence entries without taxonomic information (e.g. uncultured marine bacteria). We therefore compiled a database without them. The file ncbi_nt_no_env_11jun2019.zip contains all NCBI nt entries except the descendants of environmental eukaryotes (taxid 61964), environmental prokaryotes (taxid 48479), unclassified sequences (taxid 12908) and artificial sequences (taxid 28384).
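The exclusion rule described for V 2.0 can be sketched in Python as a walk up the taxonomy tree. This is an illustration, not the procedure used to build the file: the function name `has_excluded_ancestor`, the `parent_of` dictionary (child taxid to parent taxid, as could be derived from an NCBI taxonomy dump), and the toy lineage with the made-up taxid 900001 are all assumptions.

```python
# Taxids named in the V 2.0 note whose descendants are excluded.
EXCLUDED_TAXIDS = {61964, 48479, 12908, 28384}


def has_excluded_ancestor(taxid, parent_of, excluded=EXCLUDED_TAXIDS):
    """Return True if taxid, or any ancestor of it, is in excluded.

    parent_of maps each taxid to its parent taxid. The walk stops at a
    missing parent or when a taxid repeats (the root is its own parent).
    """
    seen = set()
    while taxid is not None and taxid not in seen:
        if taxid in excluded:
            return True
        seen.add(taxid)
        taxid = parent_of.get(taxid)
    return False


# Toy, heavily simplified parent map. Taxid 900001 is a hypothetical
# entry placed under environmental prokaryotes (48479) for illustration.
toy_parent_of = {
    1: 1,          # root points to itself
    2: 1,
    562: 2,        # lineage shortened for the example
    48479: 2,
    900001: 48479,
}

toy_entries = [562, 900001]
toy_kept = [t for t in toy_entries if not has_excluded_ancestor(t, toy_parent_of)]
print(toy_kept)
```

Applied to sequence headers that carry taxids, the same check would decide which entries are kept before indexing.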
|Description:||Full dataset accessible via: http://dx.doi.org/10.25910/5cc7cd40fca8e|
|Rights and Permissions:||CC BY-NC-SA: Attribution-Noncommercial-Share Alike 4.0|
|Type of Work:||Dataset|
|Type of Publication:||Publisher version|
|Appears in Collections:||Research Papers and Publications. Sydney Medical School|