Friday, October 26, 2012
Genomics GM_D0004
Title : Mining the Structural Genomics Pipeline: Identification of Protein Properties that Affect High-throughput Experimental Analysis
Author : Chern-Sing Goh1,2, Ning Lan1,2, Shawn M. Douglas1,2,3, Baolin Wu4,
Nathaniel Echols1,2, Andrew Smith1,2,3, Duncan Milburn1,2,
Gaetano T. Montelione2,5,6,7, Hongyu Zhao4,8 and Mark Gerstein1,2,3*
Year : 2004
Place of publication :
Abstract :
Structural genomics projects represent major undertakings that will
change our understanding of proteins. They generate unique datasets
that, for the first time, present a standardized view of proteins in terms
of their physical and chemical properties. By analyzing these datasets
here, we are able to discover correlations between a protein’s characteristics
and its progress through each stage of the structural genomics pipeline,
from cloning, expression, purification, and ultimately to structural
determination. First, we use tree-based analyses (decision trees and
random forest algorithms) to discover the most significant protein features
that influence a protein’s amenability to high-throughput experimentation.
Based on this, we identify potential bottlenecks in various
stages of the structural genomics process through specialized “pipeline
schematics”. We find that the properties of a protein that are most significant
are: (i) whether it is conserved across many organisms; (ii) the
percentage composition of charged residues; (iii) the occurrence of hydrophobic
patches; (iv) the number of binding partners it has; and (v) its
length. Conversely, a number of other properties that might have been
thought to be important, such as nuclear localization signals, are not
significant. Thus, using our tree-based analyses, we are able to identify
combinations of features that best differentiate the small group of
proteins for which a structure has been determined from all the currently
selected targets. This information may prove useful in optimizing high-throughput
experimentation. Further information is available from
http://mining.nesg.org/.
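
To make the idea of tree-based feature ranking more concrete, here is a minimal sketch of the split criterion that decision trees typically rely on: for each feature, find the single threshold that most reduces Gini impurity, then rank features by that reduction. This is not the authors' method or code. The abstract does not say which impurity measure they used, a random forest would average many full trees rather than score one split per feature, and the feature names (`length`, `charged_pct`, `has_nls`) and all toy values below are invented purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (impurity_decrease, threshold) for the best single-threshold split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (0.0, None) if no split reduces impurity.
    """
    n = len(labels)
    parent = gini(labels)
    best = (0.0, None)
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - weighted
        if decrease > best[0]:
            best = (decrease, threshold)
    return best


def rank_features(rows, labels):
    """Rank feature names by their best single-split impurity decrease."""
    scores = {
        name: best_split([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1][0], reverse=True)


# Hypothetical toy targets: NOT data from the paper.
# Label 1 = structure determined, 0 = not determined.
TOY_ROWS = [
    {"length": 120, "charged_pct": 30.0, "has_nls": 0},
    {"length": 150, "charged_pct": 28.0, "has_nls": 1},
    {"length": 180, "charged_pct": 27.0, "has_nls": 0},
    {"length": 640, "charged_pct": 18.0, "has_nls": 1},
    {"length": 720, "charged_pct": 29.0, "has_nls": 0},
    {"length": 810, "charged_pct": 17.0, "has_nls": 1},
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for name, (decrease, threshold) in rank_features(TOY_ROWS, TOY_LABELS):
        print(f"{name}: impurity decrease {decrease:.3f} at threshold {threshold}")
```

On real pipeline data one would more likely fit full trees or a random forest with an established library and read off its feature importances; the sketch only shows why a feature that cleanly separates successful from unsuccessful targets ends up ranked first.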