The group of Prof. Domany developed a new unsupervised learning tool that uses clustering to analyze large datasets when only very limited known data is available. The solution was originally demonstrated by inferring pathway deregulation scores for individual tumor samples from expression data.
Nearly all existing methods analyze pathway activity in a global, atomistic manner, based on an entire sample set, and do not attempt to characterize individual tumors. Other methods rely on detailed information about pathway activity mechanisms and on other data that are unavailable in the vast majority of cancer datasets.
The new algorithm described here transforms gene-level information into pathway-level information, generating a compact and biologically relevant representation of each sample. This can be used as an effective prognostic and predictive tool that helps healthcare providers to find optimal treatment strategies for cancer patients. Furthermore, this method can be generically used for reducing the degrees of freedom in order to derive meaningful output from multi-dimensional data using limited knowns.
- Personalized cancer treatment.
- A tool for mining insight from large datasets with limited knowns.
- Provides personalized solutions.
- Can be utilized for rare conditions with very limited known information.
- Proved on real oncologic datasets.
- A generic unsupervised learning tool.
The algorithm analyzes N_P pathways, one at a time, assigning to each sample i and pathway P a score D_P(i) that estimates the extent to which the behavior of pathway P in sample i deviates from normal. To determine this pathway deregulation score, the algorithm uses the expression levels of the d_P genes that belong to P, as identified from available databases. Each sample i is a point in this d_P-dimensional space; the entire set of samples forms a cloud of points, and the principal curve that captures the variation of this cloud is calculated. Each sample is then projected onto this curve. The pathway deregulation score D_P(i) is defined as the distance, measured along the curve, between the projection of sample i and the projection of the normal samples.
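The scoring steps can be illustrated with a minimal Python sketch. It is not the group's implementation: as a loudly stated simplification, it replaces the principal curve with a straight line (the first principal component), so that "distance along the curve" reduces to distance along that line. The function names, the `pathways` mapping from pathway name to gene column indices, and the use of the mean projection of the normal samples as the reference point are all assumptions made for illustration.

```python
import numpy as np

def deregulation_scores(expr, is_normal):
    """Score one pathway.

    expr      : (n_samples, d_P) expression levels of the pathway's genes.
    is_normal : boolean array marking the normal samples.

    ASSUMPTION: a straight line (first principal component) stands in
    for the principal curve described in the text.
    """
    expr = np.asarray(expr, dtype=float)
    is_normal = np.asarray(is_normal, dtype=bool)

    # Center the cloud of points in the d_P-dimensional space.
    centered = expr - expr.mean(axis=0)

    # The first right singular vector is the direction of largest variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]

    # Project each sample onto the line; this is its position along it.
    position = centered @ direction

    # ASSUMPTION: the reference is the mean projection of the normal samples.
    reference = position[is_normal].mean()

    # Score = distance along the line from the normal reference.
    return np.abs(position - reference)

def pathway_score_matrix(expr, pathways, is_normal):
    """Analyze pathways one at a time.

    pathways : dict mapping a pathway name to the column indices of its
               genes in expr (a hypothetical input format).
    Returns a dict mapping each pathway name to its per-sample scores,
    i.e. a pathway-level representation of every sample.
    """
    expr = np.asarray(expr, dtype=float)
    return {
        name: deregulation_scores(expr[:, cols], is_normal)
        for name, cols in pathways.items()
    }
```

With a genuine principal curve the projection and arc-length steps would follow the curve's bends rather than a single direction, so this sketch is only adequate where the sample cloud is roughly linear.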