Novel Description Logic Formalisms and their Application to Lipidomics
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Campbell, Alexander A
Abstract
Lipids are a major class of biomolecule and potentially the most varied by structure, function, and origin. High-throughput techniques for analysing genes and proteins gave rise to the fields of genomics and proteomics. In the same way, high-throughput techniques for lipids have given rise to lipidomics. The large increase in the volume of data being produced requires new methods of processing and automated analysis, but lipidomics faces a major problem in characterisation because lipid structures are so varied. The task of assigning identified lipids to classification schemes has traditionally fallen to human experts, but this approach will not scale with the volume of data being produced and increases the likelihood of error. Automated software has recently been produced to classify all molecules, including lipids, but it remains to be seen whether the community will adopt it. The growth in data volume also necessitates automating other tasks, beyond classification, that are usually performed by humans. Ontologies have been used for modelling in the life sciences, most notably in the Gene Ontology. They are noted for their ability to capture the meaning of a domain of interest formally and in a machine-readable way, which human prose cannot do, because they have a formal syntax and semantics. This further permits machine reasoning, in which a program interprets the ontology and draws additional inferences, a form of interpretation usually reserved for humans. In the present work, the Protein Data Bank’s (PDB’s) small-molecule dictionary is studied, and resolved lipids are identified by cross-referencing it with external sources of lipidomic data and classification. The consistency of classification within these external data sources is also analysed.
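The cross-referencing and consistency analysis described above can be illustrated with a minimal sketch. It assumes that PDB ligands and external lipid sources can be joined by exact InChIKey match, and it only flags disagreements *between* sources. The ligand codes, keys, and source dictionaries are hypothetical placeholders, not real PDB or database entries. The thesis does not state that this is its matching method.

```python
from collections import defaultdict

# Hypothetical toy records: identifiers are placeholders, not real PDB entries
# or real database content. The class labels merely borrow category names.
pdb_ligands = {"LIG1": "KEY-A", "LIG2": "KEY-B", "LIG3": "KEY-C"}  # ligand code -> InChIKey
source_x = {"KEY-A": "Fatty acyls", "KEY-B": "Glycerophospholipids"}  # InChIKey -> class
source_y = {"KEY-A": "Fatty acyls", "KEY-B": "Sphingolipids", "KEY-D": "Sterol lipids"}


def find_lipids(ligands, *sources):
    """Return ligand codes whose InChIKey appears in at least one lipid source."""
    known = set().union(*(s.keys() for s in sources))
    return sorted(code for code, key in ligands.items() if key in known)


def find_conflicts(*sources):
    """Return InChIKeys that different sources assign to different classes."""
    classes = defaultdict(set)
    for source in sources:
        for key, cls in source.items():
            classes[key].add(cls)
    return sorted(key for key, found in classes.items() if len(found) > 1)


print(find_lipids(pdb_ligands, source_x, source_y))  # ['LIG1', 'LIG2']
print(find_conflicts(source_x, source_y))            # ['KEY-B']
```

Exact-key joining is the simplest design choice. Detecting inconsistency *within* a single source would additionally require that source's class hierarchy, which this sketch does not model.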
To permit a reasoner to make inferences about relations between entries in data sources, a chemical relation ontology based on the International Chemical Identifier (InChI) is developed. For the chemical relation ontology to function correctly, the reasoning power of the underlying ontologies was extended. These extensions lead to a novel form of logical rule for use with ontologies, whose properties and implementation are demonstrated and discussed. The present work aims to demonstrate the usefulness of ontologies to data-intensive pursuits beyond their ability to model domains. The chemical relation ontology and the associated work on rules show that ontologically aware modelling of opaque or difficult-to-understand data can lead to the inference of information usually reserved for human experts. The study of the PDB’s small-molecule dictionary and of the consistency of the lipid data sources shows the utility of integrating data from external sources by logical, ontological means. It also shows the capability of detecting inconsistency between and within data sources.
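To suggest how InChI can expose chemical relations, here is a minimal sketch in plain Python. It is not a description-logic reasoner and not the thesis's ontology. It relies on the standard InChI layout, in which "/"-separated layers follow the formula and carry prefixes such as `c` (connections), `h` (hydrogens), and `b`, `t`, `m`, `s` (stereochemistry). The relation names are hypothetical, and isotopic and fixed-H sublayers are deliberately ignored.

```python
STEREO_PREFIXES = {"b", "t", "m", "s"}


def parse_inchi(inchi):
    """Split a standard InChI into version, formula and prefixed layers.

    Simplification: only the first occurrence of each prefix is kept, so
    isotopic and fixed-H sublayers are not handled separately.
    """
    version, formula, *rest = inchi.split("/")
    layers = {}
    for part in rest:
        if part:
            layers.setdefault(part[0], part[1:])
    return version, formula, layers


def relate(inchi_a, inchi_b):
    """Classify the relation between two molecules from their InChI layers."""
    if inchi_a == inchi_b:
        return "identical"
    va, fa, la = parse_inchi(inchi_a)
    vb, fb, lb = parse_inchi(inchi_b)
    # Drop stereo layers: equal remaining layers mean same constitution.
    core_a = {k: v for k, v in la.items() if k not in STEREO_PREFIXES}
    core_b = {k: v for k, v in lb.items() if k not in STEREO_PREFIXES}
    if (va, fa, core_a) == (vb, fb, core_b):
        return "stereo variant"
    if fa == fb:
        return "same formula"
    return "unrelated"


L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
BETA_ALA = "InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6)"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

print(relate(L_ALA, D_ALA))     # stereo variant
print(relate(L_ALA, BETA_ALA))  # same formula
print(relate(L_ALA, ETHANOL))   # unrelated
```

In an ontology, these string comparisons would instead become axioms or rules that a reasoner applies to every pair of entries.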
Date
2017-03-31
Licence
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Sydney Medical School
Department, Discipline or Centre
Discipline of Pharmacology
Awarding institution
The University of Sydney