Please use this identifier to cite or link to this item:
http://hdl.handle.net/2123/5419
|
| Title: | Single Nucleotide Differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies |
| Authors: | Musumeci, Lucia; Arthur, Jonathan W; Cheung, Florence SG; Hoque, Ashraful; Lippman, Scott; Reichardt, Juergen KV |
| Keywords: | single nucleotide polymorphism; SNP; paralogue; single nucleotide difference; SND; alignment |
| Issue Date: | 2009 |
| Publisher: | John Wiley & Sons |
| Citation: | Human Mutation (in press) |
| Abstract: | The creation of single-nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that over 17 million human reference (rs) SNPs have been reported to date (Build 129 of dbSNP). Unfortunately, SNP databases are not always complete or accurate. In fact, half of the reported SNPs are still only candidate SNPs that have not been validated in a population.
We describe the identification of SNDs (Single Nucleotide Differences) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogous (highly similar, duplicated) sequence in the genome. Using sequencing, we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence or absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs.
Our identification of SNDs in the database will not only allow researchers to select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes. |
| Description: | This file contains a complete list of single nucleotide differences (SNDs) identified
in the research described in the paper:
"Single Nucleotide Differences (SNDs) in the dbSNP Database May Lead to Errors in Genotyping and Haplotyping Studies"
Lucia Musumeci[1,6,*], Jonathan W. Arthur[2,3,*], Florence SG Cheung[1], Ashraful Hoque[4], Scott Lippman[4], and Juergen KV Reichardt[1,5]
[1] Plunkett Chair of Molecular Biology (Medicine)
Bosch Institute
The University of Sydney
Medical Foundation Building (K25)
92–94 Parramatta Road
Camperdown, NSW 2006
Australia
[2] Discipline of Medicine, Sydney Medical School
The University of Sydney
Camperdown, NSW 2006
Australia
[3] Sydney Bioinformatics
The University of Sydney
Camperdown, NSW 2006
Australia
[4] The University of Texas M. D. Anderson Cancer Center
Houston, TX 77030
USA
[5] Corresponding Author: jreichardt@med.usyd.edu.au
[*] LM and JWA contributed equally to this work
[6] Current Affiliation: Immunology and Infectious Diseases Unit, GIGA-R, Liège University, Liège, Belgium
This paper has been accepted for publication in the journal Human Mutation, September 2009.
Each row of the data file corresponds to a reported SNP in the dbSNP database
subsequently identified in the paper as a SND. The columns contain the following
information (in column order):
1) The RefSNP id (rs#) for the SND.
2) The base pair position of the reported polymorphic residue within the full
flanking sequence (up- and down-stream of the SNP) held in the dbSNP database.
3) The reported polymorphic alleles.
4) The number of times the SNP and its surrounding sequence aligned to the genome
within the criteria of sequence identity and sequence coverage defined in the
Materials and Methods section of the paper.
5) Whether this SNP has been identified as a SND or not. (Note: this data file
contains only SNDs, so all entries in this column are listed as "SND")
6) The heterozygosity of this SND as reported in the dbSNP chromosome report. This
quantity is used to determine "very strong" or "strong" subgroups of SNDs
according to the procedure outlined in the Materials and Methods section.
7) The standard error of the heterozygosity of this SND as reported in the dbSNP
chromosome report. This quantity is not used in this study.
8) The maximum reported probability of this SND being real as reported in the
dbSNP chromosome report. This quantity is not used in this study.
9) The validation codes of this SND as reported in the dbSNP chromosome report. This
quantity is used to determine "very strong" or "strong" subgroups of SNDs
according to the procedure outlined in the Materials and Methods section.
10) The positions in the genome sequence where the SND aligns. This is a string
containing multiple entries separated by a pipe (|). Each entry (position) includes
four pieces of information, separated by colons. This information is:
* The chromosome number to which the SND aligns
* The strand the SND aligns to, where 1 = sense and -1 = antisense
* The base pair position to which the SND aligns in the chromosome
* The allele found at that position |
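The layout of column 10 described above can be illustrated with a minimal parsing sketch. It is not part of the original data release: the names `Alignment` and `parse_alignments` are hypothetical, the example string is made up for illustration, and the chromosome is kept as text on the assumption that non-numeric labels such as X or Y may occur.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """One genomic position to which an SND aligns (hypothetical helper type)."""
    chromosome: str  # kept as text; assumes labels such as "X" may occur
    strand: int      # 1 = sense, -1 = antisense
    position: int    # base pair position on the chromosome
    allele: str      # allele found at that position


def parse_alignments(field: str) -> List[Alignment]:
    """Split a column-10 string into its pipe-separated, colon-delimited entries."""
    alignments = []
    for entry in field.strip().split("|"):
        if not entry:
            continue  # tolerate a leading or trailing pipe
        parts = entry.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated fields, got: {entry!r}")
        chromosome, strand, position, allele = parts
        strand_value = int(strand)
        if strand_value not in (1, -1):
            raise ValueError(f"strand must be 1 or -1, got: {strand!r}")
        alignments.append(
            Alignment(chromosome, strand_value, int(position), allele)
        )
    return alignments


# Made-up example value, not taken from the data file.
example = "1:1:1000:A|1:-1:2000:G"
hits = parse_alignments(example)
```

Because an SND is expected to align at more than one position, the length of the returned list should normally agree with the alignment count in column 4; comparing the two may be a useful sanity check when reading the file.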
| URI: | http://hdl.handle.net/2123/5419 |
| ISSN: | 1059-7794 |
| Appears in Collections: | Research and Papers -- CCS Medicine
|