Searching for differentially expressed patterns (Chuang et al., 2007) is a state-of-the-art method for network-based analysis of gene expression data. The method performs a simple search in a protein-protein interaction (PPI) network, guided by differential expression. The search starts from a single (e.g. randomly chosen) seed gene, which initializes the pattern (subgraph) being constructed. The pattern, in effect the set of genes it contains, is iteratively expanded by a gene whose protein interacts with at least one member of the set. A gene is added only if it improves the overall score of the subgraph built so far. The score is computed on the aggregated (specifically, averaged) expression profile of the genes within the pattern, and may be implemented as an arbitrary function quantifying differential expression (e.g. a t-test statistic or mutual information). The search is repeated from several (possibly all) seed genes, and the resulting patterns are reported. The authors claim that the subgraph features extracted by their method are better than single-gene markers in terms of stability, replicability and predictive performance. Moreover, the subgraph features are more interpretable in terms of the cellular processes crucial for oncogenesis.
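The greedy search described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each gene's profile is z-scored before averaging, the score is an absolute Welch t-statistic (the method allows any differential-expression function, e.g. mutual information), at each step the single best-improving neighbour is added, class labels are encoded as 0/1, and the PPI network is a plain dict mapping each gene to its interaction partners. All function names are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Set, Tuple

Profile = List[float]
Score = Callable[[Profile, List[int]], float]


def zscore(values: Profile) -> Profile:
    """Standardise one gene's profile across samples (assumption: z-scoring before averaging)."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0  # constant genes would otherwise divide by zero
    return [(v - mu) / sd for v in values]


def metaprofile(genes: Set[str], expr: Dict[str, Profile]) -> Profile:
    """Per-sample average of the z-scored profiles of all genes in the pattern."""
    columns = zip(*(zscore(expr[g]) for g in sorted(genes)))
    return [statistics.mean(col) for col in columns]


def welch_t(profile: Profile, labels: List[int]) -> float:
    """Absolute Welch t-statistic between class-0 and class-1 samples (needs >= 2 per class)."""
    a = [x for x, y in zip(profile, labels) if y == 0]
    b = [x for x, y in zip(profile, labels) if y == 1]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(b) - statistics.mean(a)) / se if se > 0 else 0.0


def grow_pattern(seed: str, ppi: Dict[str, Set[str]], expr: Dict[str, Profile],
                 labels: List[int], score: Score = welch_t) -> Tuple[Set[str], float]:
    """Greedily expand a pattern from one seed gene while the score keeps improving."""
    pattern = {seed}
    best = score(metaprofile(pattern, expr), labels)
    while True:
        # Candidates: interaction partners of current members that have expression data.
        frontier = {n for g in pattern for n in ppi.get(g, set()) if n in expr} - pattern
        if not frontier:
            break
        cand_score, gene = max((score(metaprofile(pattern | {c}, expr), labels), c)
                               for c in sorted(frontier))
        if cand_score <= best:  # no neighbour improves the subgraph score: stop
            break
        pattern.add(gene)
        best = cand_score
    return pattern, best


def search_all(ppi: Dict[str, Set[str]], expr: Dict[str, Profile],
               labels: List[int]) -> List[Tuple[Set[str], float]]:
    """Run the search from every gene as a seed and report all resulting patterns."""
    return [grow_pattern(seed, ppi, expr, labels) for seed in sorted(expr)]
```

Picking the best-improving neighbour at each step is one reasonable reading of the greedy rule; accepting the first improving neighbour would also fit the description. Z-scores are recomputed on every call for brevity; caching them per gene would be the obvious optimisation.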

This study finds and reports gene-network patterns following the procedure of the original method paper (Chuang et al., 2007).

1 Domain Description

The pattern search workflow is demonstrated on a breast cancer domain (Wang et al., 2005). The dataset is the same as the one used in the original article (Chuang et al., 2007). It contains 286 samples, each measured over 12,500 genes. The phenotype considered here is the discretized time of relapse, which yields 180 samples with early relapse (<= 5 years) and 106 samples with late relapse (> 5 years). The protein-protein interactions used to construct the patterns were obtained from the Human Protein Reference Database (Prasad et al., 2009) and comprise 39,240 curated interactions.
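The class assignment described above amounts to thresholding the time of relapse. A minimal Python sketch, assuming relapse times are given in years; the function name and the example times are made up for illustration and are not taken from the dataset:

```python
from collections import Counter
from typing import List, Sequence


def discretize_relapse(years: Sequence[float], cutoff: float = 5.0) -> List[str]:
    """Label each sample 'early' (relapse within cutoff years, inclusive) or 'late'."""
    return ["early" if t <= cutoff else "late" for t in years]


labels = discretize_relapse([1.5, 5.0, 8.2])  # made-up relapse times, for illustration only
print(Counter(labels))
```

A relapse at exactly 5 years falls into the early class, matching the <= 5 years rule.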

2 Workflow Setting

This workflow mines the 10 best-scoring differential patterns. First, the expression data and the protein-protein interactions are loaded. Then the search itself is run. As the patterns are sufficiently defined by their underlying gene-sets, the output of the pattern-search block is a list of modules (i.e. gene-sets). To filter this potentially huge list, a dedicated pattern-filter block is used. Within this block, the modules are re-evaluated by an arbitrary statistic (related to the one used in the pattern-search block), and the n top-scoring modules (here, n = 10) are passed on. The re-evaluation requires the expression data as an additional input, because it is based on the gene-expression (GE) metaprofile of each module. The patterns corresponding to the filtered modules are visualized by another block, whose inputs, besides the modules, are the feature interactions and, again, the data set. The data set is used to colour the pattern genes according to their differential expression, while the interactions are used to draw each pattern as a gene subnetwork.
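The pattern-filter and visualization steps can be sketched as follows, again as a hypothetical illustration rather than the workflow's actual blocks. Assumptions: the metaprofile is the plain per-sample average of the module's expression profiles, the re-evaluation statistic is the absolute difference of class means (any statistic can be plugged in via the `statistic` parameter), duplicate gene-sets are merged before ranking, labels are encoded as 0/1, and the red/blue colour scheme is our own choice.

```python
import statistics
from typing import Callable, Dict, Iterable, List, Set, Tuple

Profile = List[float]


def average_profile(module: Set[str], expr: Dict[str, Profile]) -> Profile:
    """GE metaprofile of a module: per-sample mean of its genes' expression values."""
    return [statistics.mean(col) for col in zip(*(expr[g] for g in sorted(module)))]


def class_gap(profile: Profile, labels: List[int]) -> float:
    """Hypothetical re-evaluation statistic: |mean(class 1) - mean(class 0)|."""
    a = [x for x, y in zip(profile, labels) if y == 0]
    b = [x for x, y in zip(profile, labels) if y == 1]
    return abs(statistics.mean(b) - statistics.mean(a))


def filter_modules(modules: Iterable[Set[str]], expr: Dict[str, Profile], labels: List[int],
                   n: int = 10,
                   statistic: Callable[[Profile, List[int]], float] = class_gap,
                   ) -> List[Tuple[float, List[str]]]:
    """Re-score each distinct module on its metaprofile and return the n top-scoring ones."""
    unique = {frozenset(m) for m in modules}  # different seeds often yield the same gene-set
    scored = [(statistic(average_profile(set(m), expr), labels), sorted(m)) for m in unique]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by gene names
    return scored[:n]


def colour_genes(module: Set[str], expr: Dict[str, Profile],
                 labels: List[int]) -> Dict[str, str]:
    """Hypothetical colouring: red if a gene is higher in class 1, blue otherwise."""
    colours = {}
    for g in sorted(module):
        a = [x for x, y in zip(expr[g], labels) if y == 0]
        b = [x for x, y in zip(expr[g], labels) if y == 1]
        colours[g] = "red" if statistics.mean(b) > statistics.mean(a) else "blue"
    return colours
```

Merging identical gene-sets before ranking keeps several seeds that converge to the same module from occupying more than one of the n output slots.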

Bibliography

Han-Yu Chuang, Eunjung Lee, Yu-Tsueng Liu, et al.
Network-based classification of breast cancer metastasis
Molecular Systems Biology, 3 (1), 2007
URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2063581/
Yixin Wang, Jan Klijn, Yi Zhang, et al.
Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer
The Lancet, 365 (9460): 671 - 679, 2005
Keshava Prasad, Renu Goel, Kumaran Kandasamy, et al.
Human Protein Reference Database - 2009 update
Nucleic Acids Research, 37 (Database issue): D767 - D772, 2009
URL http://www.ncbi.nlm.nih.gov/pubmed/18988627
Christine Staiger, Sidney Cadot, Raul Kooter, et al.
A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer
PLoS ONE, 7 (4), 2012
URL http://journals.plos.org/plosone/article?id=10.1371/