Show simple item record

Field: Value [Language]
dc.contributor.author: Musumeci, Lucia
dc.contributor.author: Arthur, Jonathan W
dc.contributor.author: Cheung, Florence SG
dc.contributor.author: Hoque, Ashraful
dc.contributor.author: Lippman, Scott
dc.contributor.author: Reichardt, Juergen KV
dc.date.accessioned: 2009-09-23
dc.date.available: 2009-09-23
dc.date.issued: 2009-01-01
dc.identifier.issn: 1059-7794
dc.identifier.uri: http://hdl.handle.net/2123/5419
dc.description: This file contains a complete list of single nucleotide differences (SNDs) identified in the research described in the paper:

"Single Nucleotide Differences (SNDs) in the dbSNP Database May Lead to Errors in Genotyping and Haplotyping Studies"

Lucia Musumeci[1,6,*], Jonathan W. Arthur[2,3,*], Florence SG Cheung[1], Ashraful Hoque[4], Scott Lippman[4], and Juergen KV Reichardt[1,5]

[1] Plunkett Chair of Molecular Biology (Medicine), Bosch Institute, The University of Sydney, Medical Foundation Building (K25), 92–94 Parramatta Road, Camperdown, NSW 2006, Australia
[2] Discipline of Medicine, Sydney Medical School, The University of Sydney, Camperdown, NSW 2006, Australia
[3] Sydney Bioinformatics, The University of Sydney, Camperdown, NSW 2006, Australia
[4] The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030, USA
[5] Corresponding Author: [email protected]
[*] LM and JWA contributed equally to this work
[6] Current Affiliation: Immunology and Infectious Diseases Unit, GIGA-R, Liège University, Liège, Belgium

This paper has been accepted for publication in the journal Human Mutation, September 2009.

Each row of the data file corresponds to a reported SNP in the dbSNP database subsequently identified in the paper as a SND. The columns contain the following information (in column order):

1) The RefSNP id (rs#) for the SND.
2) The base pair position of the reported polymorphic residue within the full sequence up- and down-stream of the SNP contained in the dbSNP database.
3) The reported polymorphic alleles.
4) The number of times the SNP and its surrounding sequence aligned to the genome within the criteria of sequence identity and sequence coverage defined in the Materials and Methods section of the paper.
5) Whether this SNP has been identified as a SND or not. (Note: this data file contains only SNDs, so all entries in this column are listed as "SND".)
6) The heterozygosity of this SND as reported in the dbSNP chromosome report. The quantity is used to determine "very strong" or "strong" subgroups of SNDs according to the procedure outlined in the Materials and Methods section.
7) The standard error of the heterozygosity of this SND as reported in the dbSNP chromosome report. This quantity is not used in this study.
8) The maximum reported probability of this SND being real as reported in the dbSNP chromosome report. This quantity is not used in this study.
9) The validation codes of this SND as reported in the dbSNP chromosome report. The quantity is used to determine "very strong" or "strong" subgroups of SNDs according to the procedure outlined in the Materials and Methods section.
10) The positions in the genome sequence where the SND aligns. This is a string containing multiple entries separated by a pipe (|). Each entry (position) includes four pieces of information, separated by colons. This information is:
    * The chromosome number to which the SND aligns
    * The strand the SND aligns to, where 1 = sense and -1 = antisense
    * The base pair position to which the SND aligns in the chromosome
    * The allele found at that position
[en]
dc.description.abstract: The creation of single-nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that there are over 17 million human reference (rs) SNPs reported to date (Build 129 of dbSNP). SNP databases are unfortunately not always complete and/or accurate. In fact, half of the reported SNPs are still only candidate SNPs and are not validated in a population. We describe the identification of SNDs (Single Nucleotide Differences) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogous (highly similar duplicated) sequence in the genome. Using sequencing, we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence/absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs. Our identification of SNDs in the database will not only allow researchers to select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes. [en]
dc.publisher: The University of Sydney
dc.subject: single nucleotide polymorphism [en_AU]
dc.subject: SNP [en_AU]
dc.subject: paralogue [en_AU]
dc.subject: single nucleotide difference [en_AU]
dc.subject: SND [en_AU]
dc.subject: alignment [en_AU]
dc.title: Single Nucleotide Differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies [en]
dc.type: Dataset [en_AU]
usyd.faculty: Faculty of Science [en_AU]
usyd.department: Sydney Bioinformatics [en_AU]
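
The layout of column 10 given in dc.description (pipe-separated entries, each holding four colon-separated values) could be read with a short parser. The Python sketch below is illustrative only and is not part of the dataset or the paper: the names `Alignment` and `parse_alignments` are hypothetical, the example string is made up, and the sketch assumes that every entry has exactly four fields and that the strand and position are integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    chromosome: str  # kept as text; the record does not say how chromosomes are labelled
    strand: int      # 1 = sense, -1 = antisense
    position: int    # base pair position within the chromosome
    allele: str      # allele found at that position


def parse_alignments(field: str) -> list[Alignment]:
    """Split a column-10 string into one Alignment per pipe-separated entry."""
    alignments = []
    for entry in field.strip().split("|"):
        if not entry:
            continue  # tolerate an empty string or a stray leading/trailing pipe
        # Assumption: exactly four colon-separated values per entry.
        chromosome, strand, position, allele = entry.split(":")
        strand_value = int(strand)
        if strand_value not in (1, -1):
            raise ValueError(f"unexpected strand {strand!r} in entry {entry!r}")
        alignments.append(Alignment(chromosome, strand_value, int(position), allele))
    return alignments


# Hypothetical value, invented for illustration (not taken from the data file).
example = "7:1:1500:A|7:-1:98000:G"
for alignment in parse_alignments(example):
    print(alignment)
```

A SND is expected to yield two or more alignments, so the length of the returned list can be compared with the count in column 4 as a consistency check, assuming the two columns are meant to agree.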


There are no previous versions of the item available.