Metabolic network reconstruction

KEGG provides gene annotation files for most sequenced organisms, which can be downloaded from its FTP site (ftp://ftp.genome.jp/pub/kegg/genes/organisms). For metabolic network reconstruction, the most useful files are those containing the gene-KO (KEGG Orthology) and gene-EC relationships. By combining these relationships with the reaction-KO and reaction-EC relationships from the Reaction file (ftp://ftp.genome.jp/pub/kegg/ligand/reaction/reaction), one can readily obtain the metabolic network for a specific genome, represented as a list of reactions and the genes encoding the corresponding enzymes. A network obtained by the KO-based method may differ from an EC-based network. For example, many genes are annotated as enzymes with incomplete EC numbers (such as 1.-.-.-), so the exact reactions they catalyze cannot be determined from the EC number alone. Through the gene-KO and reaction-KO relationships, however, the exact reactions can often be identified. To obtain a more complete network, users are advised to merge the networks generated by both methods.
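The KO-based and EC-based reconstructions and their merge can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the gene names and the identifiers KO_a, EC_a, RXN_1 and so on are made-up placeholders, and parsing of the KEGG files is assumed to have already produced the dictionaries.

```python
def reactions_for_genes(gene_to_keys, key_to_reactions):
    """Map each reaction to the genes whose annotation keys (KO or EC) link to it."""
    network = {}
    for gene, keys in gene_to_keys.items():
        for key in keys:
            # Keys without a known reaction (e.g. "1.-.-.-") contribute nothing.
            for reaction in key_to_reactions.get(key, ()):
                network.setdefault(reaction, set()).add(gene)
    return network


def merge_networks(*networks):
    """Union of reaction lists; the gene sets of shared reactions are combined."""
    merged = {}
    for network in networks:
        for reaction, genes in network.items():
            merged.setdefault(reaction, set()).update(genes)
    return merged


# Hypothetical toy annotations (placeholders, not real KEGG entries).
gene_ko = {"geneA": ["KO_a"], "geneB": ["KO_b"]}
gene_ec = {"geneA": ["EC_a"], "geneB": ["1.-.-.-"], "geneC": ["EC_c"]}
ko_reaction = {"KO_a": ["RXN_1"], "KO_b": ["RXN_2"]}
ec_reaction = {"EC_a": ["RXN_1"], "EC_c": ["RXN_3"]}

ko_network = reactions_for_genes(gene_ko, ko_reaction)
ec_network = reactions_for_genes(gene_ec, ec_reaction)
merged_network = merge_networks(ko_network, ec_network)
```

In this toy case geneB has only the incomplete EC number 1.-.-.-, so its reaction is found only by the KO route, while geneC has no KO assignment and is found only by the EC route. The merged network contains both.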

Metabolic network analysis

For large-scale, genome-based metabolic networks, there are two main types of analysis methods: (1) stoichiometric matrix based methods such as flux balance analysis (FBA); (2) graph theory based structural analysis methods.
Flux balance analysis is a popular tool for functional analysis of genome-scale metabolic networks. However, FBA requires adding extra transport and exchange reactions and deciding which metabolites are external. These steps are not straightforward for KEGG-based metabolic networks. This web tool therefore focuses on graph theory based structural analysis methods.

Graph representation of metabolic network

Graph theory based methods are mainly used to examine the system-level organization of metabolic networks. A graph is a simplified representation of a metabolic network. For example, the reaction A+B=C+D can be represented as a graph with four metabolic links: A-C, A-D, B-C and B-D. Many reactions include so-called currency metabolites such as H2O, CO2 and ATP. Links through currency metabolites in a metabolic graph may lead to biologically meaningless pathways. For example, if ADP is included in the graph of the glycolysis pathway, we may get a two-step path from glucose to pyruvate via ADP, as shown in the figure. To obtain a metabolic graph that captures the true biological connectivity, the connections through currency metabolites should be excluded. Two approaches are used in this web tool. One is based on the metabolic connection database compiled by Ma and Zeng, Bioinformatics, 19:270. In this database, the reactions were manually examined to determine which metabolic connections should be included. The other is based on the KEGG Rpair database. For each reaction, only the "main" Rpairs (those that appear in the KEGG pathway maps) are considered.
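The conversion of a reaction into metabolic links can be sketched as follows. Note that the name-based currency filter used here is a deliberately naive stand-in chosen for illustration. It is neither the manually curated connection database nor the KEGG Rpair approach described above, and the function name and the contents of the CURRENCY set are assumptions.

```python
from itertools import product

# Illustrative currency metabolite set (an assumption, not a curated list).
CURRENCY = {"H2O", "CO2", "ATP", "ADP"}


def reaction_links(substrates, products, currency=frozenset()):
    """Return all substrate-product links, skipping any that involve a currency metabolite."""
    return [
        (s, p)
        for s, p in product(substrates, products)
        if s not in currency and p not in currency
    ]


# A + B = C + D gives the four links A-C, A-D, B-C and B-D.
generic_links = reaction_links(["A", "B"], ["C", "D"])

# Hexokinase step of glycolysis: glucose + ATP = glucose 6-phosphate + ADP.
unfiltered = reaction_links(["glucose", "ATP"], ["glucose 6-phosphate", "ADP"])
filtered = reaction_links(["glucose", "ATP"], ["glucose 6-phosphate", "ADP"], CURRENCY)
```

Without the filter, the hexokinase reaction contributes a glucose-ADP link, which is the kind of link that creates shortcuts through ADP. With the filter, only the glucose to glucose 6-phosphate link remains. A fixed exclusion list is cruder than per-reaction curation, because a metabolite such as ATP can be a genuine main metabolite in some reactions.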

Network structure analysis

Many structural features of the reconstructed metabolic networks can be calculated using the web tool. A brief description of each network structure property is given below, with links to more detailed descriptions in Wikipedia.
Connection degree: the number of links connected to a node. In a directed network, the direction of the links is taken into account, giving an in-degree and an out-degree. Nodes with a high degree are often important nodes in a network.
Degree distribution: the distribution of node degrees in a network. Many complex networks, including metabolic networks, are scale-free networks, whose degree distribution follows a power law.
Average path length: the path length is defined as the number of steps in the shortest path from one node to another in a graph. The average path length is the average of the path lengths over all connected pairs of nodes in a graph.
Centrality: Closeness centrality measures how close a node is to the other nodes connected to it. Betweenness centrality is the fraction of shortest paths between pairs of nodes that pass through a given node or edge. Load centrality is a variant of betweenness centrality. For details, see Ulrik Brandes: On Variants of Shortest-Path Betweenness Centrality and their Generic Computation. Social Networks 30(2):136-145, 2008.
Connected component: a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices. Such a subgraph is strongly connected if the link direction is considered in a directed graph and is weakly connected if direction is ignored.
Output/input domain: the output domain of a node is defined as the number of nodes that can be reached from the node through paths. The input domain of a node is defined as the number of nodes from which the node can be reached through paths.
Bow-tie structure: a common global organization found in many directed networks. A bow-tie structure has four main subsets: the giant strongly connected component, the input subset, the output subset and the isolated subset. For details, see Ma and Zeng, Bioinformatics 19:1423.
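Several of the properties above can be computed with a plain breadth-first search, as in the following sketch. It is a minimal illustration on a made-up seven-node directed graph, not the implementation used by the web tool. The function names are hypothetical, and the bow-tie split is a simplified reading of the definition: the isolated subset is taken to be every node that neither reaches nor is reached from the giant strongly connected component.

```python
from collections import deque


def bfs_distances(graph, start):
    """Shortest-path length from start to every other reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # the start node is not counted as reachable from itself
    return dist


def reverse(graph):
    """Graph with every link direction flipped."""
    rev = {node: set() for node in graph}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, set()).add(node)
    return rev


def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs joined by a directed path."""
    lengths = [d for node in graph for d in bfs_distances(graph, node).values()]
    return sum(lengths) / len(lengths) if lengths else 0.0


def bow_tie(graph):
    """Split a non-empty directed graph into (gsc, input, output, isolated) node sets."""
    rev = reverse(graph)
    gsc = set()
    for node in graph:
        # Strongly connected component of node: nodes it reaches that also reach it.
        component = (set(bfs_distances(graph, node)) & set(bfs_distances(rev, node))) | {node}
        if len(component) > len(gsc):
            gsc = component
    seed = next(iter(gsc))
    output_set = set(bfs_distances(graph, seed)) - gsc
    input_set = set(bfs_distances(rev, seed)) - gsc
    isolated = set(graph) - gsc - input_set - output_set
    return gsc, input_set, output_set, isolated


# Toy directed graph: s feeds the cycle a -> b -> c -> a, c produces p, x -> y is separate.
toy = {"s": {"a"}, "a": {"b"}, "b": {"c"}, "c": {"a", "p"}, "p": set(), "x": {"y"}, "y": set()}
toy_rev = reverse(toy)

out_degree = {node: len(targets) for node, targets in toy.items()}
in_degree = {node: len(sources) for node, sources in toy_rev.items()}
output_domain_s = len(bfs_distances(toy, "s"))     # nodes reachable from s
input_domain_p = len(bfs_distances(toy_rev, "p"))  # nodes that can reach p
apl = average_path_length(toy)
gsc, input_set, output_set, isolated = bow_tie(toy)
```

For the toy graph, the cycle a, b, c forms the giant strongly connected component, s is the input subset, p is the output subset, and x and y are isolated. Computing one component per node is quadratic in the worst case, which is acceptable for a sketch but too slow for large networks, where a linear-time strongly connected component algorithm would be preferable.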

Pathway analysis and network decomposition

A genome-scale metabolic network often contains hundreds or thousands of reactions, which makes it very difficult to examine the biological function of the network as a whole. This web tool provides two approaches to functional analysis. The first is to find the possible pathways from one metabolite to another and thereby assess the metabolic capability of the network. Multiple shortest pathways can be found and visualized automatically so that users can easily inspect them.
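Finding multiple shortest pathways between two metabolites can be sketched with a breadth-first search that records every predecessor lying on a shortest route. This is an illustrative sketch on a made-up graph. The function name is hypothetical, and the search procedure actually used by the web tool is not described here.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Return every shortest directed path from source to target as a list of nodes."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                # First visit: record the distance and the first predecessor.
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another predecessor at the same shortest distance.
                preds[nxt].append(node)
    if target not in dist:
        return []

    def walk(node):
        # Rebuild paths backwards from node to source through the predecessor lists.
        if node == source:
            return [[source]]
        return [path + [node] for pred in preds[node] for path in walk(pred)]

    return walk(target)


# Toy metabolite graph with two equally short routes from A to E.
toy_paths_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
paths_a_to_e = all_shortest_paths(toy_paths_graph, "A", "E")
paths_e_to_a = all_shortest_paths(toy_paths_graph, "E", "A")
```

Here two pathways of three steps each are found from A to E, one through B and one through C, while no pathway exists in the reverse direction because the links are directed.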

The second approach is to decompose the network into several small modules that are functionally relatively independent. By examining the biological function of each module, visually or statistically, one can obtain a functional overview of the whole network. Network decomposition is often a computationally expensive process. We have developed a new, fast decomposition method and made it available here. Furthermore, the method can generate partitions with different numbers of modules rather than a single optimal partition, which offers users more flexibility.