|Title:||Single Nucleotide Differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies|
Arthur, Jonathan W
Cheung, Florence SG
Reichardt, Juergen KV
|Keywords:||single nucleotide polymorphism|
single nucleotide difference
|Publisher:||John Wiley & Sons|
|Citation:||in Press. Human Mutation|
|Abstract:||The creation of single-nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that there are over 17 million human reference (rs) SNPs reported to date (Build 129 of dbSNP). SNP databases are unfortunately not always complete and/or accurate. In fact, half of the reported SNPs are still only candidate SNPs and are not validated in a population. We describe the identification of SNDs (Single Nucleotide Differences) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogue (highly similar duplicated) sequence in the genome. Using sequencing, we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence/absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs. Our identification of SNDs in the database will not only allow researchers to select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes.|
|Description:||This file contains a complete list of single nucleotide differences (SNDs) identified in the research described in the paper:
"Single Nucleotide Differences (SNDs) in the dbSNP Database May Lead to Errors in Genotyping and Haplotyping Studies"
Lucia Musumeci[1,6,*], Jonathan W. Arthur[2,3,*], Florence SG Cheung, Ashraful Hoque, Scott Lippman, and Juergen KV Reichardt[1,5]
[1] Plunkett Chair of Molecular Biology (Medicine), Bosch Institute, The University of Sydney, Medical Foundation Building (K25), 92–94 Parramatta Road, Camperdown, NSW 2006, Australia
[2] Discipline of Medicine, Sydney Medical School, The University of Sydney, Camperdown, NSW 2006, Australia
[3] Sydney Bioinformatics, The University of Sydney, Camperdown, NSW 2006, Australia
[4] The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030, USA
[5] Corresponding Author: email@example.com
[*] LM and JWA contributed equally to this work
[6] Current Affiliation: Immunology and Infectious Diseases Unit, GIGA-R, Liège University, Liège, Belgium
This paper has been accepted for publication in the journal Human Mutation, September 2009.
Each row of the data file corresponds to a reported SNP in the dbSNP database subsequently identified in the paper as a SND. The columns contain the following information (in column order):
1) The RefSNP id (rs#) for the SND.
2) The base pair position of the reported polymorphic residue within the full sequence up- and down-stream of the SNP contained in the dbSNP database.
3) The reported polymorphic alleles.
4) The number of times the SNP and its surrounding sequence aligned to the genome within the criteria of sequence identity and sequence coverage defined in the Materials and Methods section of the paper.
5) Whether this SNP has been identified as a SND or not. (Note: this data file contains only SNDs, so all entries in this column are listed as "SND".)
6) The heterozygosity of this SND as reported in the dbSNP chromosome report. The quantity is used to determine "very strong" or "strong" subgroups of SNDs according to the procedure outlined in the Materials and Methods section.
7) The standard error of the heterozygosity of this SND as reported in the dbSNP chromosome report. This quantity is not used in this study.
8) The maximum reported probability of this SND being real as reported in the dbSNP chromosome report. This quantity is not used in this study.
9) The validation codes of this SND as reported in the dbSNP chromosome report. The quantity is used to determine "very strong" or "strong" subgroups of SNDs according to the procedure outlined in the Materials and Methods section.
10) The positions in the genome sequence where the SND aligns. This is a string containing multiple entries separated by a pipe (|). Each entry (position) includes four pieces of information, separated by colons. This information is:
* The chromosome number to which the SND aligns
* The strand the SND aligns to, where 1 = sense and -1 = antisense
* The base pair position to which the SND aligns in the chromosome
* The allele found at that position|
|Type of Work:||Dataset|
|Appears in Collections:||Research Papers and Publications. CCS Medicine|
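The alignment string described for column 10 of the data file can be unpacked with a few lines of Python. The following is a minimal sketch, not part of the dataset or its README: the names `Alignment` and `parse_alignments` are hypothetical, the example value is invented for illustration, and the chromosome is kept as text on the assumption that non-numeric labels such as X may occur.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    chromosome: str  # kept as text so a label such as "X" would also parse
    strand: int      # 1 = sense, -1 = antisense
    position: int    # base pair position in the chromosome
    allele: str      # allele found at that position


def parse_alignments(field: str) -> List[Alignment]:
    """Parse column 10: pipe-separated entries of chromosome:strand:position:allele."""
    alignments = []
    for entry in field.strip().split("|"):
        if not entry:
            continue  # tolerate a leading or trailing pipe
        chromosome, strand, position, allele = entry.split(":")
        strand_value = int(strand)
        if strand_value not in (1, -1):
            raise ValueError(f"unexpected strand {strand!r} in {entry!r}")
        alignments.append(Alignment(chromosome, strand_value, int(position), allele))
    return alignments


# Hypothetical value, invented for illustration only (not taken from the data file).
example = "1:1:1234567:A|7:-1:7654321:G"
for hit in parse_alignments(example):
    print(hit.chromosome, hit.strand, hit.position, hit.allele)
```

The delimiter between the ten columns is not stated in this record, so splitting a full row is left to the reader; the README.txt file should be consulted for that detail.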
This work is protected by Copyright. All rights reserved. Access to this work is provided for the purposes of personal research and study. Except where permitted under the Copyright Act 1968, this work must not be copied or communicated to others without the express permission of the copyright owner. Use the persistent URI in this record to enable others to access this work.
|README.txt||Instructions for using dataset||3.16 kB||Text|
Items in Sydney eScholarship Repository are protected by copyright, with all rights reserved, unless otherwise indicated.