PADIS


Please use this identifier to cite or link to this item: http://hdl.handle.net/10805/1372

Title: Resource for benchmarking the applicability of protein structure models
Authors: CARBAJO, DANIEL
Tutor: TRAMONTANO, ANNA
Keywords: protein structure prediction
Issue Date: 29-Feb-2012
Abstract: The function of a protein is closely related to the structure it adopts. The sequence of a protein is of limited biological relevance without some knowledge of both its structure and its function: protein structures provide a wealth of information that cannot be deduced from the primary sequence alone, so protein roles are best understood by analyzing them in structural terms. Structure-based methodologies are consequently regarded as more robust than sequence-based ones. Their limiting step, however, is obtaining the structure of the protein in the first place. Given the ever-increasing gap between known protein sequences and known structures, and the growing number of increasingly accurate protein structure prediction methods, the use of protein structure models is unavoidable. Despite progress in the field of protein structure prediction, computed models often contain inaccuracies in both backbone and side-chain coordinates. Rather than being discarded, such models can still provide important insights into the function of the native protein. This calls for robust methods that can make effective use of the medium- and low-accuracy models routinely produced by proteome-scale structure modeling projects. Any structure-based algorithm that does not require high-resolution structures therefore has a clear advantage and considerable practical value. ModelDB, the tool introduced here, is designed as a resource for testing any structure-based method (such as an active-site or ligand-binding-site predictor) on protein structure models of different quality, with the final goal of benchmarking how applicable protein structure models are for a given novel algorithm.
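The benchmarking goal described above, finding the lowest model quality at which a method still reproduces its native-structure results, can be illustrated with a minimal Python sketch. It assumes that a hypothetical site predictor has already been run on the native structure and on each decoy, that each decoy carries a quality score where higher means more native-like (for distance scores such as RMSD the sort order would be reversed), and it uses Jaccard overlap with an arbitrary 0.7 threshold as the agreement measure. None of these choices are taken from ModelDB itself.

```python
"""Hypothetical sketch: down to which decoy quality does a site predictor's
output still agree with its output on the native structure?"""


def jaccard(a: set, b: set) -> float:
    """Overlap between two residue sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lowest_usable_quality(native_sites: set, decoy_results: list,
                          min_agreement: float = 0.7):
    """Return the lowest decoy quality whose predicted sites still agree
    with the native prediction at or above `min_agreement`.

    native_sites  -- residue numbers predicted on the native structure
    decoy_results -- (quality_score, predicted_site_set) pairs, where a
                     higher quality_score means a more native-like decoy
    Returns None if not even the best decoy reaches the threshold.
    """
    usable = None
    # Walk from the best decoy to the worst; stop at the first failure,
    # assuming predictions degrade as model quality decreases.
    for quality, sites in sorted(decoy_results, key=lambda r: r[0], reverse=True):
        if jaccard(native_sites, sites) < min_agreement:
            break
        usable = quality
    return usable


native = {10, 11, 45, 78}
decoys = [(0.9, {10, 11, 45, 78}), (0.7, {10, 11, 45, 78, 80}), (0.5, {10, 12, 50})]
print(lowest_usable_quality(native, decoys))  # 0.7
```

Stopping at the first failing decoy encodes the assumption that results only get worse as quality drops; scanning every decoy and reporting all failures would be the more cautious alternative.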
ModelDB builds sets of models of decreasing quality, which we call decoys, given the sequences of proteins whose structures have been experimentally determined. A decoy is a computer-generated protein structure that possesses some characteristics of native proteins but is not biologically real. Our system is implemented so that any existing structure-based method can be tested both on the real structure and on the decoy models. The next step is to automatically assess at which level of quality the results of the tested method start to differ from those obtained with the native structure. Each decoy model is compared directly to its corresponding native structure, and precise quality scores are computed. To give visual insight into what models of different quality look like and how they differ from the native structure in a spatial context, they are "colored" according to color schemes defined by the following spatial descriptors: solvent accessibility, secondary structure, cavity occurrence, average depth, protrusion index and burial index. This makes the variation of these parameters easy to visualize and understand in the context of the protein structure. In addition, functional annotation is provided when available, in terms of catalytic sites, ligand-binding sites and other relevant sites such as glycosylation sites. The tool is publicly available both as an on-line tool and as a local application for larger calculations; it relies on other in-house tools that also exist independently, on-line and for local use. One of these tools, mappON, colors input structures according to diverse descriptors and outputs a table with the descriptors of selected residues (and of the residues surrounding them); it thus serves to analyze the properties of key residues in their structural context and to examine the results visually.
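The abstract does not name the quality scores computed for each decoy. As one illustrative, superposition-free example, the following Python sketch computes the distance RMSD (dRMSD) between a decoy and its native structure from C-alpha coordinates. The function name and the input format (two ordered coordinate lists covering the same residues) are assumptions made for illustration only.

```python
"""Hypothetical sketch of a superposition-free decoy quality score:
distance RMSD (dRMSD) between decoy and native C-alpha atoms."""
import math
from itertools import combinations

Coord = tuple[float, float, float]


def drmsd(native: list[Coord], decoy: list[Coord]) -> float:
    """Root-mean-square difference of all pairwise C-alpha distances.

    Both lists must hold the same residues in the same order. Only
    intra-molecular distances are compared, so no superposition is needed.
    """
    if len(native) != len(decoy):
        raise ValueError("native and decoy must have the same number of residues")
    if len(native) < 2:
        raise ValueError("need at least two residues")
    squared_diffs = [
        (math.dist(native[i], native[j]) - math.dist(decoy[i], decoy[j])) ** 2
        for i, j in combinations(range(len(native)), 2)
    ]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


native = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(drmsd(native, stretched))
```

Because only internal distances are compared, a rigidly translated or rotated copy of the native scores 0, and lower values indicate a more native-like decoy. The all-pairs comparison is quadratic in chain length, which is acceptable for single protein chains.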
The other, MAP, offers features for dealing with a common problem in bioinformatics: mapping sequence residues onto structures, or the residues of one structure onto another. Very few other public resources exist for readily retrieving decoy sets of protein structures, and we know of no other automated pipeline for producing such decoys in an easy and user-friendly fashion. Besides allowing users to build new decoy sets for any protein of interest, our tool covers more proteins, representing a larger portion of protein structural space, than any other resource. Furthermore, the on-line version lets the user visually inspect and compare all the models of varying quality for a given protein in the same spatial frame. The decoy sets are conceived to test structure-based methods and to determine to what extent these methods can make use of predicted protein structure models. However, the functional annotation, the model quality estimates and the different color schemes also make many large-scale analyses possible.
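The abstract does not describe how MAP performs its mapping. One common approach, sketched below in Python as an assumption rather than MAP's actual method, is a global (Needleman-Wunsch) alignment of the query sequence against the sequence of residues observed in the structure, followed by translating aligned positions into the structure's own residue numbers. The scoring values (match +1, mismatch -1, linear gap -2) and the function names are hypothetical.

```python
"""Hypothetical sketch of sequence-to-structure residue mapping by global
alignment (Needleman-Wunsch with a linear gap penalty)."""


def align(seq_a: str, seq_b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return aligned index pairs (i, j); i indexes seq_a, j indexes seq_b."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, keeping aligned pairs
    # (matches and substitutions) and skipping gap positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def map_to_structure(query_seq: str, struct_seq: str, struct_resnums: list) -> dict:
    """Map 1-based query positions to the structure's own residue numbers.

    struct_resnums[k] is the residue number (e.g. from a PDB file) of the
    k-th residue in struct_seq; query positions falling in gaps are omitted.
    """
    return {i + 1: struct_resnums[j] for i, j in align(query_seq, struct_seq)}
```

For example, `map_to_structure("MKTAYIAK", "KTAYIA", [5, 6, 7, 8, 9, 10])` returns `{2: 5, 3: 6, 4: 7, 5: 8, 6: 9, 7: 10}`: query residues 1 and 8 fall in gaps and are left unmapped. A substitution matrix such as BLOSUM62 and affine gap penalties would be more realistic for distantly related sequences.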
Research interests: My research interests include, but are not restricted to:
- protein structure prediction/modeling
- structural analysis and comparison of protein surfaces
- identification/prediction of functional sites
- relationships between sequence, structure, function and evolution in proteins
- drug design
- effects of mutation on protein structures
Skills short description: I am a molecular biologist by training with strong expertise in bioinformatics and computational biology. My multidisciplinary training, with experience in both "wet" and "dry" (in silico) laboratories, allows me to perform tasks ranging from PCR and gel electrophoresis to programming, web design and management, and accurate statistical analysis, covering most of the duties in current biological and biochemical research. I am currently more focused on bioinformatics and am particularly skilled at software development.
Personal skills keywords: Licentiate in Biology (5 years, ≈ BSc + MSc), UAM (Universidad Autónoma de Madrid), specialized (≈ MSc) in Molecular Genetics and Cell Biology.
MSc in Bioinformatics and Computational Biology, UCM (Universidad Complutense de Madrid).
PhD in Bioinformatics, Pasteurian Sciences School
Appears in PhD: SCIENZE PASTEURIANE

Files in This Item:

File | Description | Size | Format
DanielCarbajo_CompelteThesis.pdf | Daniel Carbajo Pedrosa - PhD thesis | 2.14 MB | Adobe PDF

Curriculum Vitae file:

File | Size | Format
CurriculumVitae.pdf | 113.72 kB | Adobe PDF


This item is protected by original copyright


Items in PADIS are protected by copyright, with all rights reserved, unless otherwise indicated.