<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://hdl.handle.net/2123/5413">
    <title>Sydney eScholarship Collection:</title>
    <link>http://hdl.handle.net/2123/5413</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://hdl.handle.net/2123/6443" />
        <rdf:li rdf:resource="http://hdl.handle.net/2123/5419" />
      </rdf:Seq>
    </items>
    <dc:date>2013-05-22T09:10:06Z</dc:date>
  </channel>
  <item rdf:about="http://hdl.handle.net/2123/6443">
    <title>Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration</title>
    <link>http://hdl.handle.net/2123/6443</link>
    <description>Title: Harvest: an open-source tool for the validation and improvement of peptide identification metrics and fragmentation exploration
Authors: McHugh, Leo C; Arthur, Jonathan W
Abstract: Background//&#xD;
Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of correctly identified proteins depends on the ability to incorporate new knowledge about the mass spectrometry fragmentation process into the computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying “metrics” that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset.&#xD;
&#xD;
Results//&#xD;
We present Harvest: an open source software tool for analysing fragmentation patterns and assessing the power of a new piece of information about the MS/MS fragmentation process to more clearly differentiate between correct and random peptide assignments. We demonstrate this functionality using data metrics derived from the properties of individual datasets in a peptide identification context. Using Harvest, we demonstrate how the development of such metrics may improve correct peptide assignment confidence in the context of a high-throughput proteomics experiment and characterise properties of peptide fragmentation.&#xD;
&#xD;
Conclusions//&#xD;
Harvest provides a simple framework in C++ for analysing and prototyping metrics for peptide matching, the core of the protein identification problem. It is not a protein identification package and answers a different research question to packages such as Sequest, Mascot, and X!Tandem. It does not aim to maximise the number of assigned peptides from a set of unknown spectra, but instead provides a method by which researchers can explore fragmentation properties and assess the power of novel metrics for peptide matching in the context of a given experiment. Metrics developed using Harvest may then become candidates for later integration into protein identification packages.
Description: Source code for Harvest 1.0</description>
    <dc:date>2010-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2123/5419">
    <title>Single Nucleotide Differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies</title>
    <link>http://hdl.handle.net/2123/5419</link>
    <description>Title: Single Nucleotide Differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies
Authors: Musumeci, Lucia; Arthur, Jonathan W; Cheung, Florence SG; Hoque, Ashraful; Lippman, Scott; Reichardt, Juergen KV
Abstract: The creation of single-nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that over 17 million human reference (rs) SNPs have been reported to date (Build 129 of dbSNP). Unfortunately, SNP databases are not always complete or accurate. In fact, half of the reported SNPs are still only candidate SNPs and have not been validated in a population.&#xD;
&#xD;
We describe the identification of SNDs (Single Nucleotide Differences) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogous (highly similar, duplicated) sequence in the genome. Using sequencing, we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence or absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs.&#xD;
&#xD;
Our identification of SNDs in the database will not only allow researchers to select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes.
Description: This file contains a complete list of single nucleotide differences (SNDs) identified&#xD;
in the research described in the paper:&#xD;
&#xD;
"Single Nucleotide Differences (SNDs) in the dbSNP Database May Lead to Errors in Genotyping and Haplotyping Studies"&#xD;
&#xD;
Lucia Musumeci[1,6,*], Jonathan W. Arthur[2,3,*], Florence SG Cheung[1], Ashraful Hoque[4], Scott Lippman[4], and Juergen KV Reichardt[1,5]&#xD;
&#xD;
[1] Plunkett Chair of Molecular Biology (Medicine)&#xD;
Bosch Institute&#xD;
The University of Sydney&#xD;
Medical Foundation Building (K25) &#xD;
92–94 Parramatta Road&#xD;
Camperdown, NSW 2006&#xD;
Australia&#xD;
&#xD;
[2] Discipline of Medicine, Sydney Medical School&#xD;
The University of Sydney&#xD;
Camperdown, NSW 2006&#xD;
Australia&#xD;
&#xD;
[3] Sydney Bioinformatics&#xD;
The University of Sydney&#xD;
Camperdown, NSW 2006&#xD;
Australia&#xD;
&#xD;
[4] The University of Texas M. D. Anderson Cancer Center&#xD;
Houston, TX 77030&#xD;
USA&#xD;
&#xD;
[5] Corresponding Author: jreichardt@med.usyd.edu.au&#xD;
&#xD;
[*] LM and JWA contributed equally to this work&#xD;
&#xD;
[6] Current Affiliation: Immunology and Infectious Diseases Unit, GIGA-R, Liège University, Liège, Belgium&#xD;
&#xD;
This paper has been accepted for publication in the journal Human Mutation, September 2009.&#xD;
&#xD;
Each row of the data file corresponds to a reported SNP in the dbSNP database &#xD;
subsequently identified in the paper as a SND. The columns contain the following&#xD;
information (in column order):&#xD;
&#xD;
1)	The RefSNP id (rs#) for the SND.&#xD;
&#xD;
2)	The base pair position of the reported polymorphic residue within the full&#xD;
	sequence up- and down-stream of the SNP contained in the dbSNP database.&#xD;
	&#xD;
3)	The reported polymorphic alleles.&#xD;
&#xD;
4)  The number of times the SNP and its surrounding sequence aligned to the genome&#xD;
	within the criteria of sequence identity and sequence coverage defined in the&#xD;
	Materials and Methods section of the paper.&#xD;
	&#xD;
5)  Whether this SNP has been identified as a SND or not. (Note: this data file&#xD;
    contains only SNDs, so all entries in this column are listed as "SND")&#xD;
	&#xD;
6)	The heterozygosity of this SND as reported in the dbSNP chromosome report. This&#xD;
    quantity is used to determine "very strong" or "strong" subgroups of SNDs &#xD;
	according to the procedure outlined in the Materials and Methods section.&#xD;
	&#xD;
7)	The standard error of the heterozygosity of this SND as reported in the dbSNP&#xD;
	chromosome report. This quantity is not used in this study.&#xD;
	&#xD;
8)	The maximum reported probability of this SND being real as reported in the&#xD;
	dbSNP chromosome report. This quantity is not used in this study.&#xD;
	&#xD;
9)	The validation codes of this SND as reported in the dbSNP chromosome report. This&#xD;
    quantity is used to determine "very strong" or "strong" subgroups of SNDs &#xD;
	according to the procedure outlined in the Materials and Methods section.&#xD;
&#xD;
10) The positions in the genome sequence where the SND aligns. This is a string&#xD;
    containing multiple entries separated by a pipe (|). Each entry (position) includes&#xD;
	four pieces of information, separated by colons. This information is:&#xD;
	&#xD;
	* The chromosome number to which the SND aligns&#xD;
	* The strand the SND aligns to, where 1 = sense and -1 = antisense&#xD;
	* The base pair position to which the SND aligns in the chromosome&#xD;
	* The allele found at that position</description>
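    <!-- Editorial sketch, kept in an XML comment so that the feed content itself is unchanged.
The column 10 format described above (entries separated by pipes, each entry holding four
colon-separated values) could be read roughly as follows. This is a minimal sketch under
stated assumptions, not code from the data set's authors: the names Alignment and
parse_alignments are hypothetical, the sample string is invented for illustration, and
chromosome labels are kept as text on the assumption that values such as "X" may occur.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    chromosome: str  # kept as text; labels such as "X" are assumed possible
    strand: int      # 1 = sense, -1 = antisense
    position: int    # base pair position in the chromosome
    allele: str      # allele found at that position


def parse_alignments(field: str) -> List[Alignment]:
    """Split a column 10 string into one Alignment per pipe-separated entry."""
    alignments = []
    for entry in field.strip().split("|"):
        if not entry:
            continue  # tolerate an empty field or a trailing pipe
        parts = entry.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated values in {entry!r}")
        chromosome, strand, position, allele = parts
        strand_value = int(strand)
        if strand_value not in (1, -1):
            raise ValueError(f"strand must be 1 or -1 in {entry!r}")
        alignments.append(
            Alignment(chromosome, strand_value, int(position), allele)
        )
    return alignments


# Hypothetical value, invented for illustration; not taken from the data file.
sample = "1:1:1234567:A|7:-1:7654321:G"
for alignment in parse_alignments(sample):
    print(alignment)
```

Raising an error on malformed entries, rather than skipping them, is a design choice of this
sketch; a real data file may call for more lenient handling. -->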
    <dc:date>2009-01-01T00:00:00Z</dc:date>
  </item>
</rdf:RDF>

