Field | Value | Language
dc.contributor.author | Rossetto Marcelino, Vanessa |
dc.contributor.author | Buchmann, Jan |
dc.contributor.author | Clausen, Philip |
dc.date.accessioned | 2019-04-30 |
dc.date.available | 2019-04-30 |
dc.date.issued | 2019-04-30 |
dc.identifier.uri | http://hdl.handle.net/2123/20336 |
dc.description | Full dataset accessible via: http://dx.doi.org/10.25910/5cc7cd40fca8e |
dc.description.abstract | This database was built to identify taxa in metagenome samples using the CCMetagen pipeline. Using the whole NCBI nt collection allows a complete taxonomic overview, including microbial eukaryotes that may be present in the dataset. The database is already indexed and ready to use with KMA and CCMetagen. The manual at https://github.com/vrmarcelino/CCMetagen describes how to use this dataset, and the tutorial at https://github.com/vrmarcelino/CCMetagen/tree/master/tutorial walks through the whole analysis of a set of metatranscriptome samples. The database was built as follows. The partially non-redundant nucleotide database was downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz) in January 2018. It was formatted to include taxids in sequence headers. Indexing was then performed with KMA (kma_index -i nt_taxid.fas -o ncbi_nt -NI -Sparse TG). Three indexed databases are provided: (1) the NCBI nucleotide collection; (2) the RefSeq database of bacterial and fungal genomes; (3) the NCBI nucleotide collection without environmental, unclassified or artificial sequences (added in V 2.0). Version history: V 1.0, initial upload. V 2.0, addition of a filtered NCBI database: the NCBI nucleotide collection contains many environmental and artificial sequence entries without taxonomic information (e.g. uncultured marine bacteria), so a database without them was compiled. The file ncbi_nt_no_env_11jun2019.zip therefore contains all NCBI nt entries except the descendants of environmental eukaryotes (taxid 61964), environmental prokaryotes (taxid 48479), unclassified sequences (taxid 12908) and artificial sequences (taxid 28384). | en_AU
dc.language.iso | en | en_AU
dc.publisher | The University of Sydney |
dc.rights | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 | en_AU
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.subject | metagenomics | en_AU
dc.subject | metatranscriptomics | en_AU
dc.title | Indexed reference databases for KMA and CCMetagen | en_AU
dc.type | Dataset | en_AU
dc.subject.asrc | FoR::060501 - Bacteriology | en_AU
dc.identifier.doi | 10.25910/5cc7cd40fca8e |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/ncbi_nt_kma.zip |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/ncbi_nt_no_env_11jun2019.zip |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/RefSeq_bf.zip |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/Simulated_datasets.zip |
usyd.faculty | SeS faculties schools::Faculty of Medicine and Health, Sydney Medical School | en_AU
usyd.department | Marie Bashir Institute for Infectious Diseases and Biosecurity | en_AU
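
The abstract states that the filtered database excludes every descendant of taxids 61964, 48479, 12908 and 28384, but the record does not say how that filtering was carried out. The following is a minimal sketch of one way such a descendant check could work, not the authors' method. It assumes a child-to-parent taxid map such as the one in the nodes.dmp file of the NCBI taxonomy dump; the function names and the toy taxonomy (including taxid 900001) are hypothetical.

```python
# Sketch only: the record does not describe how the authors filtered the database.
EXCLUDED_TAXIDS = {61964, 48479, 12908, 28384}  # taxids listed in the abstract


def load_parents(nodes_dmp_path):
    """Read child -> parent taxid pairs from an NCBI taxdump nodes.dmp file.

    Assumes the standard taxdump layout, where fields are separated by
    tab, pipe, tab and the first two fields are tax_id and parent tax_id.
    """
    parents = {}
    with open(nodes_dmp_path) as handle:
        for line in handle:
            fields = line.split("\t|\t")
            parents[int(fields[0])] = int(fields[1])
    return parents


def is_excluded(taxid, parents, excluded=EXCLUDED_TAXIDS):
    """Return True if taxid is, or descends from, any excluded taxid."""
    seen = set()
    while taxid not in seen:  # the root is its own parent, so this terminates
        if taxid in excluded:
            return True
        seen.add(taxid)
        parent = parents.get(taxid)
        if parent is None:  # taxid missing from the map: treat as not excluded
            return False
        taxid = parent
    return False


# Toy taxonomy for illustration; the parent links are simplified and are
# not the real NCBI lineage. 900001 is a made-up "uncultured" taxid.
toy_parents = {1: 1, 2: 1, 48479: 2, 900001: 48479, 562: 2}

keep_ecoli = not is_excluded(562, toy_parents)
keep_uncultured = not is_excluded(900001, toy_parents)
```

With a check like this, a sequence would be kept only when the taxid in its header is not flagged by `is_excluded`. Walking up the parent links avoids building the full descendant set in advance, at the cost of one lineage walk per sequence.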


Associated file/s

There are no files associated with this item.

Associated collections

Licence

Creative Commons Attribution-NonCommercial-ShareAlike 4.0
Except where otherwise noted, this item's licence is described as Creative Commons Attribution-NonCommercial-ShareAlike 4.0

There are no previous versions of the item available.