Indexed reference databases for KMA and CCMetagen
Field | Value | Language |
dc.contributor.author | Rossetto Marcelino, Vanessa | |
dc.contributor.author | Buchmann, Jan | |
dc.contributor.author | Clausen, Philip | |
dc.date.accessioned | 2019-04-30 | |
dc.date.available | 2019-04-30 | |
dc.date.issued | 2019-04-30 | |
dc.identifier.uri | http://hdl.handle.net/2123/20336 | |
dc.description | Full dataset accessible via: http://dx.doi.org/10.25910/5cc7cd40fca8e | |
dc.description.abstract | This database was built to identify taxa in metagenome samples using the CCMetagen pipeline. Using the whole NCBI nt collection allows a complete taxonomic overview, including microbial eukaryotes that may be present in the dataset. The database is already indexed and ready to use with KMA and CCMetagen. A manual describing how to use this dataset is available (https://github.com/vrmarcelino/CCMetagen), as is a tutorial covering the whole analysis of a set of metatranscriptome samples (https://github.com/vrmarcelino/CCMetagen/tree/master/tutorial). The database was built as follows: the partially non-redundant nucleotide database was downloaded from the NCBI website (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz) in January 2018. This database was formatted to include taxids in sequence headers. Indexing was then performed with KMA using the command: kma_index -i nt_taxid.fas -o ncbi_nt -NI -Sparse TG. Three indexed databases are provided: 1 - NCBI nucleotide collection; 2 - RefSeq database of bacterial and fungal genomes; 3 - NCBI nucleotide collection without environmental and artificial sequences (added in V 2.0, see below). Version history: V 1.0 - Initial upload. V 2.0 - Addition of a filtered NCBI database: the NCBI nucleotide collection contains many environmental and artificial sequence entries without taxonomic information (e.g. uncultured marine bacteria), so we compiled a database without them. The file ncbi_nt_no_env_11jun2019.zip therefore contains all NCBI nt entries except the descendants of environmental eukaryotes (taxid 61964), environmental prokaryotes (48479), unclassified sequences (12908) and artificial sequences (28384). | en_AU |
dc.language.iso | en | en_AU |
dc.publisher | The University of Sydney | |
dc.rights | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 | en_AU |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ | |
dc.subject | metagenomics | en_AU |
dc.subject | metatranscriptomics | en_AU |
dc.title | Indexed reference databases for KMA and CCMetagen | en_AU |
dc.type | Dataset | en_AU |
dc.subject.asrc | FoR::060501 - Bacteriology | en_AU |
dc.identifier.doi | 10.25910/5cc7cd40fca8e | |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/ncbi_nt_kma.zip | |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/ncbi_nt_no_env_11jun2019.zip | |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/RefSeq_bf.zip | |
dc.bitstream.url | https://ses-data.library.sydney.edu.au/public/20336_RossettoMarcelino/Simulated_datasets.zip | |
usyd.faculty | SeS faculties schools::Faculty of Medicine and Health, Sydney Medical School | en_AU |
usyd.department | Marie Bashir Institute for Infectious Diseases and Biosecurity | en_AU |
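The V 2.0 filtering step described in the abstract (dropping every entry that descends from taxids 61964, 48479, 12908 or 28384) can be illustrated with a minimal Python sketch. This is not the authors' script. It assumes a child-to-parent taxid map is already loaded, that the four root taxids themselves are excluded along with their descendants, and that each sequence header starts with the taxid followed by `|`. The header layout, the function names and the toy taxids (900000 and above) are all hypothetical.

```python
from collections import defaultdict

# Root taxids named in the dataset description (V 2.0).
EXCLUDED_ROOTS = {61964, 48479, 12908, 28384}


def collect_descendants(parent_of, roots):
    """Return the roots plus every taxid that descends from them."""
    children = defaultdict(list)
    for taxid, parent in parent_of.items():
        if taxid != parent:  # the taxonomy root is listed as its own parent
            children[parent].append(taxid)
    excluded = set()
    stack = list(roots)
    while stack:
        taxid = stack.pop()
        if taxid in excluded:
            continue
        excluded.add(taxid)
        stack.extend(children[taxid])
    return excluded


def keep_record(header, excluded):
    """Assumed (hypothetical) header layout: '>taxid|rest of header'."""
    taxid = int(header.lstrip(">").split("|")[0])
    return taxid not in excluded


# Toy child -> parent map; the tree shape and the 9000xx taxids are made up.
toy_parents = {1: 1, 900000: 1, 48479: 900000, 900001: 48479, 900002: 900000}
excluded = collect_descendants(toy_parents, EXCLUDED_ROOTS)

headers = [">900001|seqA", ">900002|seqB"]
kept = [h for h in headers if keep_record(h, excluded)]
```

With a real taxonomy, the parent map would come from the NCBI taxonomy dump rather than a hand-written dictionary, and the header parsing would need to match the actual layout of nt_taxid.fas.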