Field | Value | Language
dc.contributor.author | Rossetto Marcelino, Vanessa |
dc.contributor.author | Buchmann, Jan |
dc.contributor.author | Clausen, Philip |
dc.date.accessioned | 2019-04-30 |
dc.date.available | 2019-04-30 |
dc.date.issued | 2019-04-30 |
dc.identifier.uri | http://hdl.handle.net/2123/20336 |
dc.description | Full dataset accessible via: http://dx.doi.org/10.25910/5cc7cd40fca8e |
dc.description.abstract | These databases were built to identify taxa in metagenome samples using the CCMetagen pipeline. The whole NCBI nt collection provides a comprehensive taxonomic overview, including microbial eukaryotes that may be present in a sample. The databases are already indexed and ready to use with KMA and CCMetagen. A manual describing how to use this dataset is available at: https://github.com/vrmarcelino/CCMetagen. Additionally, a tutorial covering the complete analysis of a set of metatranscriptome samples is available at: https://github.com/vrmarcelino/CCMetagen/tree/master/tutorial. The NCBI nt database was built as follows. The partially non-redundant nucleotide database was downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz) in January 2018 and formatted to include taxids in the sequence headers. Indexing was then performed with KMA using the command "kma_index -i nt_taxid.fas -o ncbi_nt -NI -Sparse TG". Three indexed databases are provided: 1 - NCBI nucleotide collection (ncbi_nt_kma.zip); 2 - RefSeq database of bacterial and fungal genomes (RefSeq_bf.zip); 3 - NCBI nucleotide collection without environmental, unclassified and artificial sequences (ncbi_nt_no_env_11jun2019.zip). Version history: V1.0 - Initial upload. V2.0 - Addition of the filtered NCBI nucleotide database. The NCBI nucleotide collection contains many environmental and artificial sequence entries without precise taxonomic information (e.g. uncultured marine bacteria), so we compiled a database without them. The file ncbi_nt_no_env_11jun2019.zip therefore contains all NCBI nt entries except the descendants of environmental eukaryotes (taxid 61964), environmental prokaryotes (48479), unclassified sequences (12908) and artificial sequences (28384). | en
dc.language.iso | en | en
dc.publisher | The University of Sydney |
dc.rights | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 | en
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.subject | metagenomics | en
dc.subject | metatranscriptomics | en
dc.title | Indexed reference databases for KMA and CCMetagen | en
dc.type | Dataset | en
dc.subject.asrc | FoR::060501 - Bacteriology | en
dc.identifier.doi | 10.25910/5cc7cd40fca8e |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/ncbi_nt_kma.zip |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/ncbi_nt_no_env_11jun2019.zip |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/RefSeq_bf.zip |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/Simulated_datasets.zip |
usyd.faculty | SeS faculties schools::Faculty of Medicine and Health, Sydney Medical School | en
usyd.department | Marie Bashir Institute for Infectious Diseases and Biosecurity | en
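
The abstract describes two preparation steps for the filtered nt database. The first adds taxids to the sequence headers. The second removes every entry that descends from taxids 61964, 48479, 12908 or 28384. Both steps can be sketched in Python as below.

This is a minimal sketch under stated assumptions, not the authors' actual script:
- It assumes the NCBI taxonomy nodes.dmp lines and an accession-to-taxid mapping are already available in memory.
- The helper names load_parents, descends_from and filter_fasta are hypothetical.
- The rewritten header layout ">taxid|accession description" is an illustrative assumption. It is not necessarily the exact format used to build these databases.

The filter walks each taxid up the taxonomy tree and keeps the record only if no excluded taxid is reached.

```python
from typing import Dict, Iterable, Iterator, Set

# Taxids whose descendants are excluded from ncbi_nt_no_env_11jun2019 (from the abstract).
EXCLUDED_TAXIDS: Set[int] = {61964, 48479, 12908, 28384}


def load_parents(nodes_lines: Iterable[str]) -> Dict[int, int]:
    """Build a child -> parent map from NCBI taxonomy nodes.dmp lines.

    nodes.dmp fields are delimited by '|' (padded with tabs); the first field
    is the taxid and the second is its parent taxid. The root (1) is its own parent.
    """
    parents: Dict[int, int] = {}
    for line in nodes_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("|")]
        parents[int(fields[0])] = int(fields[1])
    return parents


def descends_from(taxid: int, targets: Set[int], parents: Dict[int, int]) -> bool:
    """Return True if taxid is in targets or has an ancestor in targets."""
    seen: Set[int] = set()
    current = taxid
    while current not in seen:  # stops at the self-parented root
        if current in targets:
            return True
        seen.add(current)
        parent = parents.get(current)
        if parent is None:  # assumption: unknown taxids are not excluded
            return False
        current = parent
    return False


def filter_fasta(lines: Iterable[str], acc2taxid: Dict[str, int],
                 parents: Dict[int, int],
                 excluded: Set[int] = EXCLUDED_TAXIDS) -> Iterator[str]:
    """Yield FASTA lines, dropping records that descend from an excluded taxid.

    Kept headers are rewritten as '>taxid|accession description' (hypothetical
    layout). Assumption: records whose accession has no taxid are dropped.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            taxid = acc2taxid.get(parts[0]) if parts else None
            keep = taxid is not None and not descends_from(taxid, excluded, parents)
            if keep:
                yield f">{taxid}|{line[1:]}"
        elif keep:
            yield line  # sequence line of a kept record


if __name__ == "__main__":
    # Toy taxonomy with simplified parent links; 999999 is a made-up taxid.
    nodes = [
        "1\t|\t1\t|\tno rank\t|",
        "2\t|\t1\t|\tsuperkingdom\t|",
        "48479\t|\t2\t|\tno rank\t|",
        "999999\t|\t48479\t|\tspecies\t|",
        "562\t|\t2\t|\tspecies\t|",
    ]
    fasta = [">AB000001.1 uncultured bacterium", "ACGT",
             ">NC_000913.3 Escherichia coli K-12", "GGCC"]
    acc2taxid = {"AB000001.1": 999999, "NC_000913.3": 562}
    for out_line in filter_fasta(fasta, acc2taxid, load_parents(nodes)):
        print(out_line)
```

In the toy run, the record assigned to the made-up taxid 999999 sits under 48479 and is dropped. The E. coli record is kept with its taxid prepended. An ancestor walk with a visited set is used so that the self-parented root (taxid 1) terminates the loop. For the full nt collection, caching the descends_from results per taxid would avoid repeated walks.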


Licence

Except where otherwise noted, this item's licence is described as Creative Commons Attribution-NonCommercial-ShareAlike 4.0
