Common biochemical properties of metabolic genes recurrently dysregulated in tumors

Background: Tumor initiation and progression are associated with numerous metabolic alterations. However, the biochemical drivers and constraints that contribute to metabolic gene dysregulation are unclear.

Methods: Here, we present MetOncoFit, a computational model that integrates 142 metabolic features that can impact tumor fitness, including enzyme catalytic activity, pathway association, network topology, and reaction flux. MetOncoFit uses genome-scale metabolic modeling and machine learning to quantify the relative importance of various metabolic features in predicting cancer metabolic gene expression, copy number variation, and survival data.

Results: Using MetOncoFit, we performed a meta-analysis of 9 cancer types and over 4500 samples from the TCGA, Prognoscan, and COSMIC tumor databases. MetOncoFit accurately predicted enzyme differential expression and its impact on patient survival using the 142 attributes of metabolic enzymes. Our analysis revealed that enzymes with high catalytic activity were frequently upregulated in many tumors and associated with poor survival. Topological analysis also identified specific metabolites that were hot spots of dysregulation.

Conclusions: MetOncoFit integrates a broad range of datasets to understand how biochemical and topological features influence metabolic gene dysregulation across various cancer types. MetOncoFit achieved significantly higher accuracy in predicting differential expression, copy number variation, and patient survival than traditional modeling approaches. Overall, MetOncoFit illuminates how enzyme activity and metabolic network architecture influence tumorigenesis.


Supplemental Figure 2. MetOncoFit predictions for colorectal cancer
Top | Differential expression: Similar to other cancers, several topological features are associated with differential expression in colon cancer. Genes with low topological distances to biomass components (UMP and phosphatidyl serine) were upregulated. 10-fold cross-validation accuracy is 98%.
Middle | Copy number variation: The biomass epicenter score is associated with a gain in copy number; genes that were farther from the network center showed a significant gain in copy number. Fatty acid metabolism showed a non-linear association with copy number, wherein genes that decreased flux through this pathway displayed copy number alterations, which can be either a loss or a gain in copy number. 10-fold cross-validation accuracy is 94%.
Bottom | Patient survival: Flux change in several metabolic pathways, including the urea cycle and arginine and proline metabolism, is predictive of patient survival. The sum of topological distances to biomass components is associated with negative patient outcomes; downregulation of genes closer to the network center was associated with positive patient survival. 10-fold cross-validation accuracy is 98%.
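
The topological distances mentioned in these captions can be illustrated with a breadth-first search over a metabolite graph. The sketch below is a minimal illustration only: the toy network, its metabolite names, and the `topological_distance` helper are hypothetical and are not taken from MetOncoFit or from any genome-scale model. It assumes an undirected graph in which two metabolites are linked if they share a reaction.

```python
from collections import deque


def topological_distance(graph, source, target):
    """Return the number of edges on a shortest path from source to target.

    graph maps each metabolite to a list of neighbouring metabolites.
    Returns None when target cannot be reached from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical toy network (assumption, for illustration only).
toy_network = {
    "glucose": ["g6p"],
    "g6p": ["glucose", "r5p", "pyruvate"],
    "r5p": ["g6p", "ump"],
    "ump": ["r5p"],
    "pyruvate": ["g6p", "alanine"],
    "alanine": ["pyruvate"],
}

distance_to_ump = topological_distance(toy_network, "glucose", "ump")
```

In this toy setting, a "low topological distance to UMP" would correspond to a small value returned by the helper; how MetOncoFit assigns distances to genes rather than metabolites is not specified in these captions.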

Supplemental Figure 3. MetOncoFit predictions for B-cell lymphoma
Top | Differential expression: Several topological features are associated with differential expression in B-cell lymphoma. Genes topologically closer to biomass and media components (lysine, ammonia and monoacyl glycerol) were upregulated. 10-fold cross-validation accuracy is 97%.
Middle | Copy number variation: Increased flux through pyruvate metabolism and high biomass and media epicenter scores are associated with a gain in copy number. 10-fold cross-validation accuracy is 75%.
Bottom | Patient survival: Flux through pyruvate metabolism is associated with patient mortality; furthermore, downregulation of genes topologically closer to alanine and glycogen is associated with positive patient survival. 10-fold cross-validation accuracy is 95%.
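
Each panel reports a 10-fold cross-validation accuracy. The sketch below shows, under stated assumptions, how such a figure can be computed: the samples are shuffled into ten folds, each fold is held out once, and accuracy is pooled over all held-out predictions. The nearest-class-mean classifier, the single "distance" feature, and the toy labels are stand-ins chosen for brevity; they are not the model or data used by MetOncoFit.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Pooled accuracy over k held-out folds."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return correct / len(labels)


def fit_nearest_mean(xs, ys):
    """Stand-in classifier (assumption): store the mean feature value per class."""
    sums, counts = Counter(), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


# Hypothetical toy data: 20 genes at short distances labelled "up",
# 20 genes at long distances labelled "down".
toy_features = [1 + (i % 3) for i in range(20)] + [7 + (i % 3) for i in range(20)]
toy_labels = ["up"] * 20 + ["down"] * 20

accuracy = cross_val_accuracy(
    toy_features, toy_labels, fit_nearest_mean, predict_nearest_mean, k=10
)
```

Because the two toy classes are well separated, every held-out sample is classified correctly here; real data would generally give a lower value, as in the copy number panels.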

Supplemental Figure 4. MetOncoFit predictions for ovarian cancer
Top | Differential expression: Similar to other cancers, several topological features are associated with differential expression in ovarian cancer. Genes with low topological distances to biomass components (cholesterol, glycogen and Isophosphatidyl choline) were upregulated. 10-fold cross-validation accuracy is 97%.
Middle | Copy number variation: Genes with high media epicenter scores are associated with a gain in copy number. 10-fold cross-validation accuracy is 72%.
Bottom | Patient survival: Metabolic flux through arginine and proline metabolism and through the urea cycle is predictive of patient survival. 10-fold cross-validation accuracy is 93%.

Supplemental Figure 5. MetOncoFit predictions for prostate cancer
Top | Differential expression: Flux through folate metabolism is associated with gene upregulation. Glycolytic flux showed a non-linear association with differential gene expression, wherein genes that decreased flux through this pathway were either up- or down-regulated. Hence, flux through this pathway is predictive of dysregulation, but not in a direct linear fashion. 10-fold cross-validation accuracy is 99%.
Middle | Copy number variation: Increased flux through glutamate metabolism and glycolysis is associated with a gain in copy number. Metabolic flux through fatty acid metabolism is associated with copy number loss. 10-fold cross-validation accuracy is 64%.
Bottom | Patient survival: Several topological features are associated with both positive patient outcomes and patient mortality. Downregulation of genes with low topological distances to biomass components (proline, glycogen and Isophosphatidyl choline) is associated with better patient survival. 10-fold cross-validation accuracy is 98%.

Supplemental Figure 6. MetOncoFit predictions for renal cancer
Top | Differential expression: Several topological and metabolic flux features are associated with gene upregulation. For example, genes away from the network center (i.e. with a high topological distance to all biomass components) were likely to be upregulated, while those closer to the amino acid proline were downregulated. 10-fold cross-validation accuracy is 97%.
Middle | Copy number variation: Flux through arginine and proline metabolism, the urea cycle and the glutamate pathway is predictive of copy number alterations. Similar to other cancers, genes away from the network center were likely to show a gain in copy number. 10-fold cross-validation accuracy is 83%.
Bottom | Patient survival: Upregulation of genes with low topological distances to biomass components (lysine, water and glutamate) is associated with better patient survival. Increased flux through the oxidative phosphorylation pathway is also predictive of increased patient survival. 10-fold cross-validation accuracy is 99%.

Supplemental Figure 8. Gain in copy number for metabolic genes in the urea cycle is a recurring metabolic rewiring strategy in NSCLC
Genes selected in the heatmaps are from the KEGG gene set for arginine and proline metabolism, which contains the urea cycle. Several genes that directly participate in or are proximal to the urea cycle, including GLS2, GOT2, and NOS3, display a gain in copy number and have high metabolic flux through the urea cycle.