miXGENE

Regulatory co-modules represent a possible way to interpret heterogeneous omics data. When dealing with mRNA and miRNA data, we extract and interpret whole sets of miRNAs and their co-regulated genes instead of the individual mRNAs and miRNAs whose expression is available. The co-modules stem from both the measurements and prior knowledge of protein-protein interactions and miRNA targets. They tend to cover mRNAs and miRNAs with correlated expression profiles as well as validated or predicted interactions.

In this case study, we employ an external method, SNMNMF, to extract such co-modules from TCGA ovarian cancer data. miXGENE delivers the co-modules as lists of genes and miRNAs and interprets them by functional enrichment analysis. Two workflow examples are shown. The first is a simplified task that emphasizes short runtime and responsiveness of the tool for real-time demonstration. Both the mRNA and miRNA sets were substantially reduced, as was the set of annotation terms. The SNMNMF method itself was also simplified: it works with user-given thresholds (e.g., for the co-module membership definition or the evaluation of annotation term significance) instead of tuning them automatically, which is time-consuming. The second workflow aims at complete and realistic results. It deals with the full input data and leaves all the automated optimization procedures enabled. This workflow is intended for off-line interaction with the tool.

The demonstration workflow for 50 mRNAs is presented here. The workflow starts with the upload of the mRNA and miRNA expression matrices, which the user provides as zipped comma-separated-values files. These are real-valued matrices of the following sizes: 385 samples x 12,454 mRNAs and 385 samples x 559 miRNAs. The biological samples in the two matrices match. Then, the corresponding mRNA/miRNA target matrix is provided. It is a binary interaction matrix of size 12,454 x 559. Next, the binary protein-protein interaction (PPI) matrix is uploaded. It has the size 12,454 x 12,454. Then, the standard data filters recommended by the SNMNMF authors are applied. They remove from both the mRNA and miRNA sets all the features that have low variance (less than the given threshold percentile) or low absolute expression values (all the values lower than the given percentile taken from the whole expression set). Finally, the mRNA expression matrix is internally reduced to 50 mRNAs only. Analogously, the interaction matrices are shrunk as well. This concludes the pre-processing phase.
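The two filters and the matching reduction of the interaction matrices can be sketched as follows. This is a minimal illustration in plain Python, not the miXGENE implementation: the function names (`percentile`, `keep_features`, `select`), the nearest-rank percentile convention, and the toy data are all assumptions made for the example. How the final 50 mRNAs are chosen is not described here, so the sketch does not cover that step.

```python
from math import ceil
from statistics import pvariance


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def keep_features(expr, var_pct, expr_pct):
    """Return the column indices that survive both filters.

    expr is a list of rows (samples); each row holds one value per feature.
    """
    columns = list(zip(*expr))
    variances = [pvariance(col) for col in columns]
    var_cut = percentile(variances, var_pct)
    # The expression cut-off is taken over the whole matrix, not per feature.
    expr_cut = percentile([abs(v) for row in expr for v in row], expr_pct)
    kept = []
    for j, col in enumerate(columns):
        low_variance = variances[j] < var_cut
        low_expression = all(abs(v) < expr_cut for v in col)
        if not (low_variance or low_expression):
            kept.append(j)
    return kept


def select(matrix, rows=None, cols=None):
    """Sub-matrix restricted to the given row and/or column indices."""
    rows = range(len(matrix)) if rows is None else rows
    cols = range(len(matrix[0])) if cols is None else cols
    return [[matrix[i][j] for j in cols] for i in rows]


# Hypothetical toy data: 3 samples x 4 mRNAs, plus matching interaction matrices.
mrna = [[1.0, 5.0, 0.1, 9.0],
        [1.0, 7.0, 0.2, 2.0],
        [1.0, 6.0, 0.1, 5.0]]
targets = [[1, 0], [0, 1], [1, 1], [0, 0]]  # 4 mRNAs x 2 miRNAs
ppi = [[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1], [0, 1, 1, 0]]  # 4 x 4

kept = keep_features(mrna, var_pct=50, expr_pct=30)
mrna_small = select(mrna, cols=kept)             # keep mRNA columns
targets_small = select(targets, rows=kept)       # keep mRNA rows
ppi_small = select(ppi, rows=kept, cols=kept)    # keep rows and columns
```

The point of `select` is that every matrix indexed by mRNAs must be cut with the same index list, so that rows and columns of the interaction matrices keep referring to the mRNAs that remain in the expression matrix.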

The SNMNMF method is applied in the NIMFA_SNMNMF working block. The method takes the pre-processed expression and interaction datasets and constructs the co-modules, i.e., sets of mRNAs and miRNAs that are closely related both in terms of their prior interactions and their expression profile correlations. The rank parameter gives the number of output co-modules; l1 and l2 are regularization parameters that set the weight of the interaction matrices; g1 and g2 regularize the sparsity of the factorization matrices. When the l parameters are increased, the role of the interaction matrices grows in comparison with the expression profiles (l1 is for PPIs and l2 for the miRNA target matrix). When the g parameters are increased, the factorization matrices become sparser. This means that the co-modules tend to contain fewer mRNAs and miRNAs (g2) and the original expressions are re-expressed in terms of fewer co-modules (g1).
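For orientation, the roles of these parameters can be read off an objective of roughly the following form. This is a sketch of how the SNMNMF objective is usually written, not something stated on this page, and the exact form used by the NIMFA_SNMNMF block may differ. The symbols are assumed names: X_1 and X_2 are the miRNA and mRNA expression matrices (samples in rows), W is the shared basis matrix with one column per co-module, H_1 and H_2 are the miRNA and mRNA coefficient matrices, A is the PPI matrix, B is the miRNA target matrix oriented as miRNAs x mRNAs, and h_j, h'_{j'} are the columns of H_1 and H_2. The parameters l1, l2, g1 and g2 correspond to \lambda_1, \lambda_2, \gamma_1 and \gamma_2.

```latex
\min_{W,\,H_1,\,H_2 \,\ge\, 0}\;
  \|X_1 - W H_1\|_F^2 + \|X_2 - W H_2\|_F^2
  - \lambda_1 \operatorname{Tr}\!\left(H_2 A H_2^{\top}\right)
  - \lambda_2 \operatorname{Tr}\!\left(H_1 B H_2^{\top}\right)
  + \gamma_1 \|W\|_F^2
  + \gamma_2 \Big( \sum_{j} \|h_j\|_1^2 + \sum_{j'} \|h'_{j'}\|_1^2 \Big)
```

Under this reading, the two trace terms reward co-modules whose members are linked in the interaction matrices, which is why larger l values shift the balance from the expression data towards prior knowledge, while the two penalty terms grow with the size and density of the factors.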

The final phase of the workflow presents the generated co-modules to the user. It takes the co-module incidence matrices H1 and H2 (the first gives the degree of co-module membership of miRNAs, the second that of mRNAs), thresholds them to limit the size of the output (the standard deviation is used for this purpose: the higher the threshold value, the fewer members we get), and presents all the above-threshold members of each co-module. For mRNAs, a gene ontology (GO) term enrichment can additionally be computed for each co-module. The tool reports all the terms whose mRNA representatives are significantly over-represented in the given co-module with respect to the whole set of mRNAs being analyzed. The final table provides these three types of output (mRNAs, miRNAs, GO terms) for each co-module.
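The thresholding and the enrichment test can be illustrated with a short sketch. Both parts rest on assumptions: membership is decided here by a z-score within one row of an incidence matrix (a value must lie more than `t` standard deviations above the row mean), and over-representation is scored with a one-sided hypergeometric test. The page does not state which exact rule or test the tool uses, and the function names and numbers below are hypothetical.

```python
from math import comb
from statistics import mean, pstdev


def comodule_members(h_row, names, t):
    """Names whose membership degree exceeds the row mean by more than t SDs."""
    mu = mean(h_row)
    sd = pstdev(h_row)
    if sd == 0:
        return []  # all degrees equal, nothing stands out
    return [name for name, h in zip(names, h_row) if (h - mu) / sd > t]


def enrichment_p_value(n_total, n_annotated, n_module, n_overlap):
    """One-sided hypergeometric tail P(X >= n_overlap).

    n_total      all mRNAs being analyzed
    n_annotated  mRNAs annotated with the term
    n_module     mRNAs in the co-module
    n_overlap    co-module mRNAs annotated with the term
    """
    upper = min(n_annotated, n_module)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_module)


# Hypothetical row of H2 for one co-module and five mRNAs.
degrees = [0.1, 0.1, 0.1, 0.9, 0.1]
genes = ["a", "b", "c", "d", "e"]
members = comodule_members(degrees, genes, t=1.5)

# Hypothetical term: 5 of 10 analyzed mRNAs carry it, and so do all
# 5 members of a co-module.
p = enrichment_p_value(n_total=10, n_annotated=5, n_module=5, n_overlap=5)
```

In practice many terms are tested per co-module, so a multiple-testing correction would normally be applied to the resulting p-values before a term is reported as significant.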

The extended workflow for 5000 genes can be found here. The workflow is conceptually the same as its demonstration predecessor. The main difference is that the number of mRNAs is internally reduced to 5000.

Identification of miRNA regulatory co-modules from ovarian cancer data Oct. 3, 2014, 9:28 a.m. by Jiří Kléma

This software was created by the IDA group at the Czech Technical University in Prague. The source code of this project is available on GitHub.