
Comparison between multiple protein datasets requires the choice of an appropriate reference system and a number of variables to describe their differences. Here we introduce an innovative approach to discriminate multiple protein datasets (multiCM) and to measure enrichments in Gene Ontology terms (cleverGO) using semantic similarities. We illustrate the power of our approach by investigating the links between RNA-binding ability and other protein features, such as structural disorder and aggregation, in S. […] Our results are in striking agreement with available experimental evidence and unravel features that are key to understanding the mechanisms regulating cellular homeostasis.

multiCM and cleverGO provide intuitive and accurate classifications of physico-chemical features and annotations of biological processes, molecular functions and cellular components, which are useful for discovering and characterizing new trends in protein datasets. The multiCM and cleverGO are freely accessible on the Web at […] and […]. Each page links to the corresponding documentation and tutorial.

There is a growing gap between the amount of proteomic data and the availability of tools for its analysis. While several application programming interfaces are available to analyse computational and experimental results, a simple and intuitive interface is currently lacking. Our goal is to start bridging this gap by providing algorithms for the analysis of protein sets and the discovery of mechanisms that regulate protein function and interactions. The first method presented here, the multiCleverMachine (multiCM), extends the cleverMachine approach (CM) to classify multiple protein datasets using physico-chemical properties. The second algorithm, cleverGO, is motivated by the need to simplify Gene Ontology (GO) annotation output. GO statistics are important for characterizing the functional role of proteins, but they are difficult to interpret without further downstream processing. Current tools do not provide a single interface that combines GO term analysis with intuitive interpretation and visualization. For instance, GOrilla calculates GO term enrichments, but other tools are needed to summarize the results (e.g. […]). cleverGO integrates multiple analyses in one platform and facilitates GO processing through an interactive analysis accessible via a web browser.

We demonstrate the usefulness of our methods by investigating the RNA-binding abilities of S. cerevisiae chaperones and their substrates, the physico-chemical determinants of protein insolubility in S. […] sapiens, and the relationship between aggregation and longevity in C. […] The purpose of our analysis is twofold. First, we provide examples that can be used as a reference in other studies. Second, we shed light on the link between nucleic-acid binding abilities and protein features such as structural disorder and aggregation, which are increasingly recognized as key factors for cellular function and homeostasis.

The multiCM accepts multiple protein sets in FASTA format. Individual sets are labelled as positive or negative for binary comparison; the assignment only serves to create two groups and does not influence the calculations. For each list, the CM screens the physico-chemical properties encoded by the protein sequences to identify those that best discriminate the positive and negative classes. The currently supported properties are:
- nucleic acid binding propensity
- membrane propensity
- alpha-helix propensity
- aggregation propensity
- beta-sheet propensity
- burial propensity
- hydrophobicity

Custom properties can also be included, as explained in the online Tutorial. For a detailed description of CM performance, we refer to our previous publication. In each multiCM run, the information from the individual models is compiled into a high-level overview, so the user can see which trend is detected in the data for each physico-chemical feature. The indicators collate 10 predictors for each selected feature and represent their consensus with a colour, akin to a microarray slide (Fig. […]). The colour of each array spot represents the differential enrichment state for a dataset pair. This makes an increase, a decrease or an insufficient signal easy to read.

The analysis is not restricted to the consensus information. A link to the full CM view is provided in the main panel, with details on p-values, cross-validation performance, ROC curves and other statistics. The detail view contains the ID number of the CM run, which can be used to create a cleverClassifier for studying new datasets. It also contains a link to perform Gene Ontology analysis with cleverGO, the second part of our toolkit.
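The screening step that compares positive and negative protein sets on a physico-chemical property can be illustrated with a small Python sketch. It reads two FASTA sets, scores each protein with one scale, and measures how well that score separates the sets. This is not the CM algorithm. The Kyte-Doolittle hydropathy scale stands in for the hydrophobicity property, and per-protein averaging replaces the CM's own profile construction. The rank-based AUC (the Mann-Whitney U statistic divided by the number of positive/negative pairs) is an illustrative choice of discrimination measure. All function names are hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale, used here only as an example property.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining multi-line sequences."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def mean_propensity(seq: str, scale: Dict[str, float]) -> float:
    """Average scale value over residues; non-standard residues are skipped."""
    values = [scale[aa] for aa in seq if aa in scale]
    return sum(values) / len(values) if values else 0.0


def discrimination_auc(pos: List[float], neg: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    if not pos or not neg:
        raise ValueError("both sets must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy sets: hydrophobic "positives" versus charged/polar "negatives".
positive = parse_fasta(">p1\nIIVLLF\n>p2\nAVLIM\n")
negative = parse_fasta(">n1\nKKDEER\n>n2\nNQSTKD\n")
pos_scores = [mean_propensity(s, KYTE_DOOLITTLE) for s in positive.values()]
neg_scores = [mean_propensity(s, KYTE_DOOLITTLE) for s in negative.values()]
print(discrimination_auc(pos_scores, neg_scores))  # 1.0: every positive is more hydrophobic
```

An AUC near 0.5 means the property does not separate the two sets. Values near 1.0 or 0.0 indicate a strong increase or decrease in the positive set, which is the kind of signal the consensus indicators summarize.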
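The consensus indicators, which collate 10 predictors per feature into one coloured spot per dataset pair, can be sketched as follows. The sketch assumes three things:
- Each predictor returns +1 (feature increased in the positive set), -1 (decreased) or 0 (no significant difference).
- Every unordered pair of input sets is one binary comparison.
- The agreement threshold of 7 out of 10, the state labels and the `run_predictors` callable are hypothetical placeholders, not multiCM parameters.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Tuple

N_PREDICTORS = 10  # the text states that 10 predictors are collated per feature
AGREEMENT = 7      # hypothetical: minimum number of predictors that must agree


def consensus(calls: List[int], agreement: int = AGREEMENT) -> str:
    """Collapse predictor calls (+1, -1, 0) into one spot state."""
    if len(calls) != N_PREDICTORS:
        raise ValueError(f"expected {N_PREDICTORS} calls, got {len(calls)}")
    counts = Counter(calls)
    if counts[1] >= agreement:
        return "increase"
    if counts[-1] >= agreement:
        return "decrease"
    return "insufficient"


def build_array(
    datasets: List[str],
    features: List[str],
    run_predictors: Callable[[str, str, str], List[int]],
) -> Dict[Tuple[str, str, str], str]:
    """One spot per (positive set, negative set, feature), like a microarray grid."""
    array: Dict[Tuple[str, str, str], str] = {}
    for pos, neg in combinations(datasets, 2):
        for feature in features:
            array[(pos, neg, feature)] = consensus(run_predictors(pos, neg, feature))
    return array


def fake_predictors(pos: str, neg: str, feature: str) -> List[int]:
    # Hypothetical stand-in for 10 trained predictors.
    if feature == "hydrophobicity":
        return [1] * 8 + [0] * 2
    return [0] * 10


spots = build_array(["set_A", "set_B", "set_C"], ["hydrophobicity", "burial"], fake_predictors)
for key, state in sorted(spots.items()):
    print(key, state)
```

Requiring a clear majority before a spot is called "increase" or "decrease" means that weak or conflicting predictor outputs show up as insufficient signal rather than as a trend.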
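The text does not say which statistic cleverGO uses for GO term enrichment. A common choice is the one-sided hypergeometric test, shown here as an illustration using only the standard library. The function names and the toy annotation map are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def go_enrichment(
    study: Set[str], population: Set[str], annotations: Dict[str, Set[str]]
) -> List[Tuple[str, int, float]]:
    """Return (term, annotated genes in study, p-value), most significant first."""
    study = study & population
    N, n = len(population), len(study)
    results = []
    for term, genes in annotations.items():
        annotated = genes & population
        k = len(annotated & study)
        if k == 0:
            continue
        results.append((term, k, hypergeom_sf(k, N, len(annotated), n)))
    return sorted(results, key=lambda r: r[2])


population = {f"g{i}" for i in range(10)}
study = {"g0", "g1", "g2"}
annotations = {"term_A": {"g0", "g1", "g2"}, "term_B": {"g3", "g4"}}
for term, k, p in go_enrichment(study, population, annotations):
    print(term, k, p)
```

A ranked list like this is the raw output that cleverGO is meant to condense. In practice a correction such as Benjamini-Hochberg would be applied before interpretation.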
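Summarizing an enrichment list through semantic similarity can be illustrated with a deliberately simple measure, simUI: the number of ancestors two terms share in the GO graph divided by the number of ancestors of either term. This measure was chosen here for brevity and may not be the one cleverGO uses. The greedy clustering, its 0.5 threshold and the toy graph are also assumptions.

```python
from typing import Dict, List, Set

# Toy GO-like graph: child -> parents (hypothetical term names).
PARENTS: Dict[str, List[str]] = {
    "binding": ["root"],
    "rna_binding": ["binding"],
    "mrna_binding": ["rna_binding"],
    "transport": ["root"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable through parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def sim_ui(a: str, b: str, parents: Dict[str, List[str]]) -> float:
    """simUI: shared ancestors divided by the union of ancestors."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def cluster_terms(
    terms: List[str], parents: Dict[str, List[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Greedy pass: each term joins the first cluster whose representative is similar enough."""
    clusters: List[List[str]] = []
    for term in terms:
        for cluster in clusters:
            if sim_ui(term, cluster[0], parents) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


print(cluster_terms(["rna_binding", "mrna_binding", "transport"], PARENTS))
```

If the terms are passed in order of increasing p-value, each cluster is represented by its most significant term. This is one simple way to turn a long enrichment list into a short overview.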
