Bioinformatics for Glycan Expression

Integrated Technology Resource for Biomedical Glycomics: Technological Research and Development Project IV, A Project funded by NIH

Subproject 4: Bioinformatics of Glycan Expression

William S. York, Senior Investigator. Amit Sheth, John Miller and Krzysztof J. Kochut, Investigators

A detailed description of our progress on this project is given in the file Project4-Progress.pdf.

The basics of Semantic Technology. What is an ontology, and why do we use an ontology?

We have developed a highly expressive ontology, called GlycO, for the semantic representation of knowledge in the glycomics domain. Together with ProPreO, a second ontology that we are developing, GlycO is the central organizing component of our glycoinformatics system. Open the GlycO demo.

As clearly demonstrated by the last demo, the routine use of ontologies for the retrieval and exploration of complex data will require significant advances in semantic visualization and browsing technologies. Our approach to this challenge is the development of a new tool, called GlycoVista. Open the GlycoVista demo.

The ProPreO ontology captures knowledge of the experimental proteomics and glycomics analysis process. Together with GlycO, it describes the fundamental relationships between glycomics concepts and their association with experimental data, allowing individual elements of the data to be classified and viewed in the overall context of the biological/biochemical system. These ontologies will serve as the glue that ties the components of our bioinformatics system together and as a semantic basis for a portal that we will develop to facilitate data access and to reveal relationships within the data. Open the ProPreO demo.

We have begun developing a web portal that will provide a gateway to the semantic organization and visualization tools that will make up our glycoinformatics system. This portal will include features that will be of interest to scientists in a broad range of biological disciplines, within and beyond glycomics. Open the Stargate demo.

As part of our ongoing efforts to streamline the high-throughput analysis of glycomics data, we have developed a suite of tools to convert mass spectral data into matrix formats that facilitate processing (e.g., for quantitative analysis of glycopeptides) and visualization. For example, LC-MS data is "binned" to create a matrix that can be processed and visualized with programs written in the R language. Open the MS visualization demo.
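To make the binning step concrete, the following is a minimal Python sketch, not the project's own tool. It assumes each scan is a retention time paired with a list of (m/z, intensity) peaks; the function name bin_scans, the bin width, and the m/z range are hypothetical choices made for illustration. Each row of the result corresponds to one scan and each column to one m/z bin, with intensities summed within a bin.

```python
import math

def bin_scans(scans, mz_min, mz_max, bin_width):
    """Sum peak intensities into fixed-width m/z bins, one row per scan.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples.
    Returns a list of rows; row i holds the binned intensities of scan i.
    The input layout is an assumption made for this sketch.
    """
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    matrix = []
    for _retention_time, peaks in scans:
        row = [0.0] * n_bins
        for mz, intensity in peaks:
            # Peaks outside [mz_min, mz_max) are ignored.
            if mz_min <= mz < mz_max:
                # Guard against floating-point rounding at the upper edge.
                col = min(int((mz - mz_min) / bin_width), n_bins - 1)
                row[col] += intensity
        matrix.append(row)
    return matrix

# Two toy scans, binned from m/z 400 to 402 in 0.5-wide bins.
example_scans = [
    (1.0, [(400.1, 10.0), (400.3, 5.0), (401.6, 7.0)]),
    (2.0, [(400.6, 3.0), (402.5, 99.0)]),
]
example_matrix = bin_scans(example_scans, 400.0, 402.0, 0.5)
```

A matrix of this shape could then be written out as a delimited text file and read into R for further processing and plotting.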

We have developed a simple genomic database conversion utility that facilitates MS/MS ion searching when the investigator is attempting to identify deglycosylated N-glycans. Genomic databases modified with this utility produce results that are easier to analyze and have a lower false-discovery rate. Open the database utility demo.
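The text above does not say how the utility modifies the database, so the following Python sketch shows only one plausible kind of conversion, offered as an assumption rather than as a description of the actual tool. It rewrites the asparagine in each N-glycosylation sequon (N-X-S/T, where X is any residue except proline) to a placeholder letter in a FASTA-formatted protein database, so that a search can treat only those sites as candidates. The placeholder "J" and the function names are hypothetical; the search engine would still need to be told what mass that letter carries, which is outside this sketch.

```python
import re

# N followed by any residue except P, then S or T. The lookahead leaves the
# two following residues unconsumed, so adjacent sequons are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")

def mark_sequons(sequence, marker="J"):
    """Replace the Asn of every N-X-S/T sequon with a placeholder letter."""
    return SEQUON.sub(marker, sequence)

def convert_fasta(text, marker="J"):
    """Apply mark_sequons to every record of FASTA-formatted text.

    Wrapped sequence lines are joined first, so a sequon that spans a
    line break is still detected. Each sequence is emitted on one line.
    """
    out, seq = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq:
                out.append(mark_sequons("".join(seq), marker))
                seq = []
            out.append(line)
        elif line:
            seq.append(line)
    if seq:
        out.append(mark_sequons("".join(seq), marker))
    return "\n".join(out)

converted = convert_fasta(">p1\nMKNG\nSAK\n>p2\nANPTK\n")
```

In this toy input, the first record contains one sequon (N-G-S) that spans a line break and is marked, while the N-P-T in the second record is left unchanged because proline follows the asparagine.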

The manipulation and processing of LC-MS data is part of a larger initiative to develop workflow protocols for high-throughput glycomics analysis. This is necessary because of the extremely large amount of data that will be collected at the CCRC and other glycomics facilities worldwide. Workflow protocols define how tasks are structured, who performs them, what their relative order is, how they are synchronized, how information flows to support the tasks, and how tasks are tracked. We have developed preliminary workflow protocols for the LC-MS/MS analysis of glycopeptides. These will serve as a prototype for more complex workflows that include parallel chromatography steps and comparative quantitation schemes as they mature in the laboratory. Open the Workflow demo.
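As an illustration of how task order and synchronization can be expressed, the sketch below models a workflow as a dependency graph and groups the tasks into stages, where every task in a stage can run once all earlier stages are complete. It is not the project's workflow system: the task names are hypothetical, and the use of Python's standard graphlib module (Python 3.9 or later) is an assumption made for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for an LC-MS/MS glycopeptide analysis.
# Each task maps to the set of tasks that must finish before it starts.
workflow = {
    "acquire_lcms_run": set(),
    "convert_raw_data": {"acquire_lcms_run"},
    "bin_spectra": {"convert_raw_data"},
    "database_search": {"convert_raw_data"},
    "quantify_glycopeptides": {"bin_spectra", "database_search"},
}

def execution_stages(graph):
    """Group tasks into stages; tasks within one stage may run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        stages.append(ready)
        sorter.done(*ready)  # mark the stage complete to unlock dependents
    return stages

stages = execution_stages(workflow)
```

Here the binning and database search steps land in the same stage because both depend only on the converted data, while quantitation waits for both of them. Parallel chromatography steps could be represented the same way, as additional tasks that share a predecessor.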