Eamonn Maguire, Alejandra Gonzalez-Beltran, Patricia L. Whetzel, Susanna-Assunta Sansone, Philippe Rocca-Serra
OntoMaton: a BioPortal powered ontology widget for Google Spreadsheets
Bioinformatics (2012).
ABSTRACT:
Motivation: Data collection in spreadsheets is ubiquitous, but current solutions lack support for collaborative semantic annotation that would promote shared and interdisciplinary annotation practices, supporting geographically distributed players.
Results: OntoMaton is an open source solution that brings ontology look-up and tagging capabilities into a cloud-based collaborative editing environment, harnessing Google Spreadsheets and the NCBO BioPortal Web services. It is a general purpose, format-agnostic tool that may serve as a component of the ISA software suite. OntoMaton can also be used to assist the ontology development process.
Availability: OntoMaton is freely available from Google widgets under the CPAL open source license; documentation and examples at: https://github.com/ISA-tools/OntoMaton
Contact: isatools@googlegroups.com
Kenneth Haug, Reza M. Salek, Pablo Conesa, Janna Hastings, Paula de Matos, Mark Rijnbeek, Tejasvi Mahendraker, Mark Williams, Steffen Neumann, Philippe Rocca-Serra, Eamonn Maguire, Alejandra González-Beltrán, Susanna-Assunta Sansone, Jules L. Griffin, Christoph Steinbeck.
MetaboLights: An open-access general-purpose repository for Metabolomics studies and associated meta-data
Nucleic Acids Research (2012).
ABSTRACT:
MetaboLights (http://www.ebi.ac.uk/metabolights) is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays, and spans over 8 different species including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
Alejandra González-Beltrán, May Yong, Gairin Dancey, Richard Begent
Guidelines for Information About Therapy Experiments: A proposal on best practice for recording experimental data on cancer therapy
BMC Research Notes.
ABSTRACT:
Background: Biology, biomedicine and healthcare have become data-driven enterprises, where scientists and clinicians need to generate, access, validate, interpret and integrate different kinds of experimental and patient-related data. Thus, recording and reporting of data in a systematic and unambiguous fashion is crucial to allow aggregation and re-use of data. This paper reviews the benefits of existing biomedical data standards and focuses on key elements to record experiments for therapy development. Specifically, we describe the experiments performed in molecular, cellular, animal and clinical models. We also provide an example set of elements for a therapy tested in a phase I clinical trial.
Results: We introduce the Guidelines for Information About Therapy Experiments (GIATE), a minimum information checklist creating a consistent framework to transparently report the purpose, methods and results of the therapeutic experiments. A discussion on the scope, design and structure of the guidelines is presented, together with a description of the intended audience. We also present complementary resources such as a classification scheme, and two alternative ways of creating GIATE information: an electronic lab notebook and a simple spreadsheet-based format. Finally, we use GIATE to record the details of the phase I clinical trial of CHT-25 for patients with refractory lymphomas. The benefits of using GIATE for this experiment are discussed.
Conclusions: While data standards are being developed to facilitate data sharing and integration in various aspects of experimental medicine, such as genomics and clinical data, no previous work has focused on therapy development. We propose a checklist for therapy experiments and demonstrate its use in the iodine-131-labeled CHT-25 chimeric antibody cancer therapy. As future work, we will expand the set of GIATE tools to continue to encourage its use by cancer researchers, and we will engineer an ontology to annotate GIATE elements and facilitate unambiguous interpretation and data integration.
Alejandra González-Beltrán, Ben Tagger, Anthony Finkelstein
Federated Ontology-based Queries over Cancer Data
BMC Bioinformatics Volume 13 Supplement 1, Semantic Web Applications and Tools for Life Sciences (SWAT4LS) 2010.
ABSTRACT:
Background: Personalised medicine provides patients with treatments that are specific to their genetic
profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines,
such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer
the safest and most effective therapeutic strategy based on the gene variations of each subject. This is
particularly true in oncology, where knowledge about genetic mutations has already led to new
therapeutics. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and
improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast
amounts of data, however, coupled with the use of different terms – or semantic heterogeneity – in each
discipline makes the retrieval and integration of information difficult.
Results: Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support
access to distributed information. Each data source in caGrid is associated with metadata at increasing
levels of abstraction, including syntactic, structural, reference and domain metadata. The domain
metadata consists of ontology-based annotations associated with the structural information of each data
source. However, caGrid’s current querying functionality is given at the structural metadata level, without
capitalising on the ontology-based annotations. This paper presents the design of and theoretical
foundations for distributed ontology-based queries over cancer research data. Concept-based queries are
translated to the target query language, where join conditions between multiple data sources are found by
exploiting the semantic annotations. The system has been implemented, including a graphical user
interface, over the caGrid infrastructure providing a proof of concept. The approach is applicable to other
model-driven architectures. An extensive evaluation of the rewriting technique is included.
Conclusions: To support personalised medicine in oncology, it is crucial to retrieve and integrate
molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the
data makes this a challenging task. Ontologies provide a formal framework to support querying and
integration. This paper provides an ontology-based solution for querying distributed databases over
service-oriented, model-driven infrastructures.
May Yee Yong, Alejandra González-Beltrán, Richard Begent
Establishing a knowledge trail from molecular experiments to clinical trials
New Biotechnology (2011)
ABSTRACT:
During the development cycle of a new antibody therapy, the therapeutic agent is tested on successively more complex biological models. The design of new experiments is based on data gathered from prior models. New researchers who inherit the data, and researchers from groups with different cultures or expertise, are often called upon to interpret these data. Experiments that are not recorded consistently or that employ ambiguous terminology can make these results difficult to interpret. The researcher who originally collected the data may not be at hand to correct any misunderstanding or offer clarification, and data can be unknowingly misused. This introduces an element of risk into the therapy development process. We have developed a reporting guideline for recording therapy experiments. This guideline can be used for antibody therapy experiments performed in molecular, cellular, animal and clinical models. Adherence to a checklist of experiment elements enables consistent documentation of data and helps encourage better behaviours in reporting results [1]. A shared controlled vocabulary makes the meaning of data unambiguous to all users, ensuring that data are used as intended.
Carsten Kettner, Dawn Field, Susanna Sansone, Chris Taylor, Jan Aerts, Nigel Binns, Andrew Black, Cedrik M. Britten, Ario de Marco, Jennifer Fostel, Pascale Gaudet, Alejandra González-Beltrán, Nigel Hardy, Jan Hellemans, Henning Hermjakob, Nick Juty, Jim Leebens-Mack, Eamonn Maguire, Steffen Neumann, Sandra Orchard, Helen Parkinson, William Piel, Shoba Ranganathan, Philippe Rocca-Serra, Annapaola Santarsiero, David Shotton, Peter Sterk, Andreas Untergasser, Patricia L. Whetzel
Meeting Report from the Second “Minimum Information for Biological and Biomedical Investigations” (MIBBI) workshop
Standards in Genomic Sciences, Vol 3, No 3 (2010)
ABSTRACT:
This report summarizes the proceedings of the second workshop of the 'Minimum Information for Biological and Biomedical Investigations' (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) re-emphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at http://mibbi.org/
James P. McCusker, Joshua A. Phillips, Alejandra González-Beltrán, Anthony Finkelstein, Michael Krauthammer
“Semantic web data warehousing for caGrid”
BMC Bioinformatics, Vol 10, Supp 10, 2009.
From Semantic Web Applications and Tools for Life Sciences 2008, Edinburgh, UK, 28 November 2008
ABSTRACT:
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and
services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this
information. Although the data models exposed on caGrid are semantically well-annotated, it is currently up to
the caGrid client to infer relationships between the different models and their classes. In this paper, we present a
Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is
accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML)
information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We
demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data
from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus.
We argue that semantic integration is necessary for integration of data from distributed web services and that
Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers
facing similar integration challenges.
Alejandra González-Beltrán, Peter Milligan, Paul Sage
“Range queries over skip tree graphs”
Computer Communications,
Volume 31 Issue 2, pp 358-374, February 2008.
Special Issue: Foundation of Peer-to-Peer Computing.
ABSTRACT:
The support for complex queries, such as range, prefix and aggregation queries, over structured
peer-to-peer systems is currently an active and significant topic of research. This paper demonstrates
how Skip Tree Graph, as a novel structure, presents an efficient solution to that problem area
through provision of a distributed search tree functionality on decentralised and dynamic environments.
Since Skip Tree Graph is based on skip trees, a concurrent approach to skip lists, it constitutes
an augmentation of skip graphs that extends their functionality and allows for important
performance improvements. This work presents a thorough comparison between these two related
peer-to-peer overlay networks, their construction, search algorithms and properties.
Being based on tree structures, skip tree graphs support aggregation queries and multicast/broadcast
operations, which cannot be directly implemented in their predecessor. The repair mechanism for healing
the structure in case of failures is more efficient and harnesses the parallelism inherent in P2P
networks. Particular consideration is given to the performance of different range-query schemes
over the two related structures. Theoretical and experimental results show that Skip Tree Graphs
outperform skip graphs on both exact-match and range searches.
Alejandra González-Beltrán, Eamonn Maguire, Philippe Rocca-Serra and Susanna-Assunta Sansone
“The open source ISA software suite and its international user community: knowledge management of experimental data”
In proceedings
Network Tools and Applications in Biology (NETTAB) 2012.
Como, Lombardia, Italy. 13-16 November 2012.
ABSTRACT:
Motivation and Objectives
Both in academia and industry, biomedical data generation is currently on the order of petabytes. The availability of this massive amount of data brings with it many challenges, especially for data sharing and integration aimed at later re-use. In this context, the adoption of standard formats, minimum information guidelines and terminologies/ontologies for the rich annotation of experimental data is crucial. Annotation is a time-consuming task that must be supported by software tools, which should also enable querying, linking, integrating, reasoning over and analysing both the data and the information about it.
The Investigation/Study/Assay (ISA) infrastructure (Rocca-Serra et al 2010) aims to facilitate this rich description of heterogeneous experimental data and to support the different steps of the data management workflow. The infrastructure revolves around a general-purpose file format (ISA-Tab) and includes an open source software suite that supports compliance with community standards and the harmonization of experimental metadata. The ultimate goal is to allow for the gradual progression from unstructured, usually non-digital metadata kept in lab notebooks to structured data that can be interpreted by machines (see Figure 1). The success of the ISA infrastructure is evidenced by the growing ISA Commons community (Sansone et al 2012), which encompasses increasingly diverse domains, ranging from metabolomics, (meta)genomics, proteomics and systems biology to environmental health, environmental genomics and stem cell discovery (Ho Sui et al 2012).
We will present the components of the ISA infrastructure, the rationale behind them and their evolution. In particular, we will introduce our efforts to expand the infrastructure in three important directions: collaboration in a cloud environment, support for analysis with R, and integration with the semantic web. We will show use cases that exemplify the usage of the ISA infrastructure.
Alejandra González-Beltrán, Ben Tagger, Anthony Finkelstein
“Ontology-based Queries over Cancer Data”
In proceedings
Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2010).
Berlin, Germany. December 10, 2010.
Best paper award
ABSTRACT:
The ever-increasing amount of data in biomedical research, and in cancer research in particular, needs to be managed to support efficient data access, exchange and integration. Existing software infrastructures, such as caGrid, support access to distributed information annotated with a domain ontology. However, caGrid's current querying functionality depends on the structure of individual data resources without exploiting the semantic annotations. In this paper, we present the design and development of an ontology-based querying functionality that consists of: the generation of OWL2 ontologies from the underlying data resources’ metadata, and a query rewriting and translation process, based on reasoning, which converts a query at the domain ontology level into queries at the software infrastructure level. We present a detailed analysis of our approach as well as an extensive performance evaluation. While the implementation and evaluation were performed for the caGrid infrastructure, the approach could be applicable to other model- and metadata-driven environments for data sharing.
Joshua Phillips, Alejandra González-Beltrán, Anthony Finkelstein, Jyotishman Pathak
“Exposing caGrid Data Services as Linked Data”
In the proceedings of the
2010 AMIA Summit on Clinical Research Informatics (AMIA CRI 2010).
San Francisco, CA, USA. March 12-13, 2010.
ABSTRACT:
The National Cancer Institute (NCI) enables the sharing of cancer-related data through an open, federated information network based on the caGrid middleware. caGrid supports interoperability by building standards-based services with precise semantic definitions. However, rapid yet flexible integration of data is not supported. In this research, we address this requirement by exposing the caGrid data services as Linked Data, and illustrate the approach by integrating data from a tissue bank repository (caTissue) with a microarray/gene expression database (caArray).
Alejandra González-Beltrán, Anthony Finkelstein, J Max Wilkinson, Jeff Kramer
“Domain Concept-Based Queries for Cancer Research Data Sources”
In the proceedings of the
22nd IEEE International Symposium on Computer-Based Medical Systems 2009 (CBMS 2009).
Special track on HealthGrid Computing - Applications to Biomedical Research and Healthcare
August 3-4, 2009, Albuquerque, New Mexico, USA
ABSTRACT:
Biomedical scientists generate, access, validate and interpret
multiple distributed and heterogeneous data sets. Semantic
annotations for these data sets are paramount for exchanging
and using the data, and take the form of concepts
from a domain ontology. ONIX is a platform that facilitates
access to cancer research data resources, and one of its
goals is to interoperate with caGrid, a grid computing infrastructure
for data sharing. In this paper, we present the
ONIX approach to building a semantic layer with support
for concept-based queries, which exploit semantic annotations
of resources, focusing on caGrid resources. The main
contributions of this work are: the automatic generation of
OWL ontologies from resource metadata; concept-based
query construction and validation; and rewriting and translation
from concept-based queries to the caGrid query language.
Giulio Napolitano, Alejandra González-Beltrán, Colin Fox, Adele Marshall, Anthony Finkelstein, Peter McCarron
“Biomedical Ontologies and Grid Computing as New Resources for Cancer Registries”
In proceedings of the International Conference on Health Informatics 2009 (HEALTHINF 2009).
14-17 January 2009, Porto, Portugal
ABSTRACT:
Cancer registry information systems need to deal with several data sets annotated with different coding
systems. Designing, maintaining and linking these datasets involves dealing with semantic issues, tackling
the shortcomings exhibited by coding systems as well as considering an appropriate computing
infrastructure. We argue that biomedical ontologies and a Grid service infrastructure, together with a clear
separation between semantic and coding models, can prove beneficial to cancer registries in terms of
accuracy of knowledge modelling, interoperability and knowledge sharing with other registries and related
data sources, and automation of information retrieval. A real-life example is illustrated and a brief review of
related projects is provided. We conclude that a formal semantic layer, which is the basis of large scale
meaning-oriented projects such as the Semantic Web, is the key to the provision of a uniform, science-based
view across cancer registries and related systems.
Alejandra González-Beltrán, Paul Sage, Peter Milligan
“Skip Tree Graph: a Distributed and Balanced Search Tree for Peer-to-Peer Networks”
In proceedings of the IEEE International Conference on Communications 2007 (ICC 2007), pages 1881-1886.
24-28 June 2007, Glasgow, Scotland
ABSTRACT:
Skip Tree Graph is a novel distributed data structure for peer-to-peer systems that efficiently supports exact-match
and order-based queries such as range queries. It is based on skip trees, which are randomised
balanced search trees equivalent to skip lists and designed to provide improved concurrency.
Skip tree graphs constitute an extension of skip graphs, enhancing their performance in both exact-match
and range queries. Moreover, skip tree graphs maintain the underlying balanced tree structures using
randomisation and local operations, which provides a greater degree of concurrency and scalability.
Alejandra González-Beltrán, Peter Milligan, Paul Sage
“Heterogeneity-Aware Distributed Access Structure”
In proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing 2005 (P2P 2005), pages 152-153.
31 August - 2 September 2005, Konstanz, Germany
ABSTRACT:
Efficient access to distributed and dynamic multidimensional data is vital for applications in large,
heterogeneous, decentralised, resource-sharing environments such as Grids and Peer-to-Peer systems.
Most systems providing this functionality assume homogeneous participants. This paper proposes HADAS,
an access structure exploiting heterogeneity to build a self-aware adaptive information system.
May Yong, Alejandra González-Beltrán, Richard Begent
“Integrating Comprehensive Cancer Imaging Centre's data: Semantic linkage of images and experimental data to aid translational research”
The 3rd Annual Cancer Research UK & EPSRC Cancer Imaging Conference. Imperial College London, April 26th, 2012.
Alejandra González-Beltrán
“Book Review: Semantic Web Information Management”
BCS Informer, Newsletter of the British Computer Society Information Retrieval Specialist Group. Winter 2012.
Alejandra González-Beltrán, May Yong, Richard Begent
“Towards a reporting structure for cancer therapy experiments”
UCL Computational Biology Symposium 15th February 2011.
ABSTRACT:
In biomedical research, it is important to be able to reproduce,
compare, reuse and integrate data from different experiments. Thus,
recording and reporting experiments - their design, context and results
- in an unambiguous manner is crucial for the advancement of biomedical
research. This helps to avoid unnecessary repetition and makes reliable
analysis possible due to improved statistical power of the data. While
the description of experiments should not be open to different
interpretations, data exchange formats must also allow sharing,
integration and interoperability.
In this work, we focus on cancer therapy experiments. The Guidelines on
Information about Therapy Experiments (GIATE) arose from a collaboration
among members of the Antibody Society with the objective to identify the
main elements needed to record antibody therapy experiments. Then, GIATE
was extended to consider therapy experiments in more general terms.
GIATE represents information ranging from the molecular target and
agents involved in the therapy and their interaction; the anatomical and
disease states of the models used for testing, which include molecular,
cellular, pre-clinical and clinical models; equipment and experimental
setups; and the measurements used to determine the therapeutic effects and outcomes of the agents.
This work presents a series of GIATE resources, which establish best
practices to record cancer therapy experiments. These resources revolve
around the three basic components of a reporting structure (Taylor et
al, Nat Biotechnol 2008): a) a minimum information (MI) specification,
b) data standards that capture the MI in non-proprietary formats, c)
controlled vocabularies or ontologies, which use unambiguous, standard
terms to describe the MI specification.
For GIATE, we introduce:
• A minimum information checklist enumerating the elements that should
be recorded for cancer therapy experiments
• A list of Common Data Elements (CDEs), as per the ISO 11179 metadata
registries’ standard, which correspond to the elements in the checklist
above. Some CDEs were extracted from the caBIG® semantic infrastructure;
others were created specifically for GIATE. The CDEs are annotated with
the NCI thesaurus ontology.
• A simple data format (spreadsheet-based) for therapy data called
GIATE-TAB.
• A formal conceptualization (or ontology) for cancer therapy
experiments, which supports the unambiguous description of experiments
following the MI checklist above. We used the Web Ontology Language
(OWL), which is based on description logics and recommended by the
World Wide Web Consortium. By using OWL as the data exchange format,
following a semantic web/linked data approach, sharing, integration and
interoperability are guaranteed.
GIATE is part of the ‘Minimum Information for Biological and Biomedical
Investigations’ (MIBBI) project (http://mibbi.org).
Alejandra González-Beltrán, Ben Tagger, Anthony Finkelstein
“Querying distributed cancer databases using domain concepts”
UCL Computational Biology Symposium 15th February 2011.
Alejandra González-Beltrán, Ben Tagger, Anthony Finkelstein
“Ontology-based queries for the caGrid infrastructure”
In “Building a Collaborative Biomedical Network”,
caBIG Annual Meeting, September 13-15, 2010, Washington, D.C., U.S.A.
ABSTRACT:
Sharing, searching and integrating data is important for the advancement of biomedical research. To facilitate these tasks, the UK National Cancer Research Institute Informatics Initiative (NCRI II) promotes standards and tools and collaborates with the US National Cancer Institute caBIG® programme. This study presents our work at University College London on ontology-based queries over the caBIG® infrastructure, caGrid, as part of the NCRI II plan to maximise the impact of cancer research.
In caGrid, data are exposed as services following a model-driven architecture: they are based on information models described by the Unified Modeling Language (UML) and annotated with concepts from the NCI thesaurus (NCIt) ontology. The annotations provide unambiguous meaning and support semantic interoperability. A metadata registry, caDSR, maintains the mappings between UML models and their semantic annotations. In this way, the NCIt ontology serves as a conceptual unified view of the data services. However, the caGrid query functionality does not currently consider this conceptual view and is based solely on the UML models.
Our approach for ontology-based queries over caGrid is based on semantic web technologies and, in particular, the Web Ontology Language (OWL). OWL is a formal language for knowledge modeling based on description logics, whose latest version, OWL2, has been a W3C recommendation since October 2009. Our approach involves:
- module extraction from the NCIt ontology,
- UML to OWL conversion and
- query rewriting from queries expressed with NCIt concepts to caGrid queries.
As an extension to previous work, we developed a caGrid analytical service offering methods that, given a project in caDSR, extract modules from the NCIt and convert the UML models into OWL. Additionally, we have revised the query rewriting process in light of the OWL2 profiles and the theoretical results on the trade-off between expressiveness and computational complexity.
Alejandra González-Beltrán, May Yong, Richard Begent
“Towards an unambiguous and formal description of cancer therapy experiments”
In “Building a Collaborative Biomedical Network”,
caBIG Annual Meeting, September 13-15, 2010, Washington, D.C., U.S.A.
ABSTRACT:
Recording and reporting experiments - their design, context and results - in an unambiguous manner is crucial for the advancement of biomedical research, as it enables reproduction, well-grounded comparisons, reuse and integration of data from different experiments. Thus, unnecessary repetition is avoided and reliable analysis is possible due to improved statistical power of the data. While the description of experiments should not be open to different interpretations, data exchange formats must also allow sharing, integration and interoperability.
In this work, we focus on cancer therapy experiments. Previously, some of the authors presented the Guidelines on Information about Therapy Experiments (GIATE). GIATE consists of a list of Common Data Elements (CDEs), as per the ISO 11179 metadata registries’ standard. Some CDEs were extracted from the caBIG® semantic infrastructure; others were created specifically for GIATE. The CDEs are annotated with the NCI thesaurus ontology.
We present GIATE as an ontology, or formal conceptualization, of cancer therapy experiments. As opposed to a list of information elements, the ontology supports an unambiguous and formal description. We used the Web Ontology Language (OWL), which is based on description logics and recommended by the World Wide Web Consortium. By using OWL as the data exchange format, following a semantic web/linked data approach, sharing, integration and interoperability are guaranteed.
While GIATE was developed independently of the NCI thesaurus to focus on the specific sub-domain related to therapies, we also produced a matching between the two ontologies. This matching will facilitate interoperability between GIATE-compliant knowledge bases and caBIG® data services. This poster also includes a case study demonstrating how the GIATE ontology is used to model two different experiments.
It is our view that the GIATE ontology is a further step towards achieving integrative translational research and that it should undergo comprehensive inspection by the cancer research community to be used as a recording standard.
Alejandra González-Beltrán, Anthony Finkelstein, J Max Wilkinson, Jeff Kramer
“Semantic concept-based queries for ONIX - caGrid case”
In “Solving Basic and Clinical Research Challenges in Cancer and Beyond”,
caBIG Annual Meeting, July 20-22, 2009, Washington, D.C., U.S.A.
James McCusker, Michael Krauthammer, Joshua Phillips, Alejandra González-Beltrán, Anthony Finkelstein
“Semantic Web Data Warehousing for caGrid”
In “Solving Basic and Clinical Research Challenges in Cancer and Beyond”,
caBIG Annual Meeting, July 20-22, 2009, Washington, D.C., U.S.A.
Alejandra González-Beltrán
“Have you found what you meant?”
In NCRI Informatics Initiative Newsletter. Issue 11. Autumn 2008.
Giulio Napolitano, Alejandra González-Beltrán, Colin Fox, A Marshall, Anthony Finkelstein, P. McCarron.
“Biomedical Ontologies as New Resources for Cancer Registries”
UK Association of Cancer Registries (UKACR) Annual Conference “Using Information to Improve Cancer Outcomes”. 10-11 September 2008. Keble College, Oxford, UK.
Alejandra González-Beltrán, Anthony Finkelstein, Jeff Kramer, J. Max Wilkinson.
“ONIX Semantic Query Infrastructure”
NCI/NCRI Joint Conference: Biomedical Research Without Borders. Bethesda, USA. 2-3 September 2008.
Alejandra González-Beltrán, Anthony Finkelstein, J Max Wilkinson
“Platform Architecture and Requirements Testing (PART2) for ONIX Federated Queries”
In “Getting Connected with caBIG”,
caBIG Annual Meeting, June 23-25, 2008, Washington, D.C., U.S.A.
Steven Johnstone, Fionn Murtagh, Alejandra González-Beltrán, Pedro Contreras, Peter Milligan, Paul Sage, Marta Turcsányi-Szabó, Lydia Montandon, Ann Jones
“Learning Objects in the Form of Code”
TRAILS, Kaleidoscope Deliverable D22.3.1, 30 September 2004
Personalised and Collaborative Trails of Digital and Non-Digital Learning Objects
Kaleidoscope Network of Excellence, Shaping the scientific evolution of Technology Enhanced Learning