KEGG has produced gene annotation files for most sequenced organisms, which can be
downloaded from its FTP site (ftp://ftp.genome.jp/pub/kegg/genes/organisms).
For metabolic network reconstruction, the most useful files are those
containing the gene-KO (KEGG orthology) and gene-EC relationships. Based on
these relationships and the reaction-KO, reaction-EC relationships obtained
from the Reaction file (ftp://ftp.genome.jp/pub/kegg/ligand/reaction/reaction),
one can easily obtain the metabolic network (represented as a reaction list and
the corresponding enzyme-coding genes) for a specific genome. A network obtained
by the KO-based method may differ from one obtained by the EC-based method. For
example, many genes are annotated as enzymes with incomplete EC numbers
(1.-.-.-), so the exact reactions catalyzed by these genes cannot be determined
from the EC annotation alone. However, through the gene-KO and reaction-KO
relationships we may still find the exact reactions for them. To get a more
complete network, users are advised to merge the networks generated by both methods.
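
The joining and merging steps described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names `link_genes_to_reactions` and `merge_networks` are hypothetical, the input is assumed to be already parsed into (gene, annotation) and (reaction, annotation) pairs, and all identifiers in the toy data are invented.

```python
from collections import defaultdict


def link_genes_to_reactions(gene_annot, reaction_annot):
    """Join (gene, annotation) and (reaction, annotation) pairs.

    The annotation is either a KO identifier or an EC number. Returns a
    dict mapping each reaction to the set of genes that encode it.
    """
    reactions_by_annot = defaultdict(set)
    for reaction, annot in reaction_annot:
        reactions_by_annot[annot].add(reaction)

    network = defaultdict(set)
    for gene, annot in gene_annot:
        # Incomplete EC numbers such as "1.-.-.-" do not identify a reaction.
        if "-" in annot:
            continue
        for reaction in reactions_by_annot.get(annot, ()):
            network[reaction].add(gene)
    return dict(network)


def merge_networks(*networks):
    """Union several reaction -> genes mappings into one network."""
    merged = defaultdict(set)
    for network in networks:
        for reaction, genes in network.items():
            merged[reaction] |= genes
    return dict(merged)


# Invented toy data, not taken from KEGG.
gene_ko = [("geneA", "KO_1"), ("geneB", "KO_2")]
reaction_ko = [("R1", "KO_1"), ("R2", "KO_2")]
gene_ec = [("geneA", "1.1.1.1"), ("geneB", "1.-.-.-"), ("geneC", "2.7.1.1")]
reaction_ec = [("R1", "1.1.1.1"), ("R3", "2.7.1.1")]

ko_network = link_genes_to_reactions(gene_ko, reaction_ko)
ec_network = link_genes_to_reactions(gene_ec, reaction_ec)
merged = merge_networks(ko_network, ec_network)
```

In this toy case `geneB` has only an incomplete EC number, so reaction `R2` enters the merged network through the KO route alone, while `R3` enters through the EC route alone.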

For large-scale genome-based metabolic networks, there are two main types of
analysis methods: (1) stoichiometric matrix based methods such as flux balance analysis
(FBA); (2) graph theory based structural analysis methods.

Flux balance analysis is a popular tool for functional analysis of genome-scale metabolic networks. However, FBA requires adding extra transport/exchange reactions and choosing which metabolites are external. These steps are not straightforward for KEGG-based metabolic networks. Therefore, this web tool focuses on graph theory based structural analysis methods.

## Graph representation of metabolic network

Graph theory based methods are mainly used to examine the system-level organization
of metabolic networks. A graph is a simplified representation of a metabolic
network. For example, the reaction A+B=C+D can be represented as a graph
with four metabolic links: A-C, A-D, B-C and B-D. Many reactions
include so-called currency metabolites such as H2O, CO2 and ATP. Links
through currency metabolites in a metabolic graph may lead to biologically
meaningless pathways. For example, in the glycolysis pathway, if ADP is included
in the graph we may get a two-step path from glucose to pyruvate via ADP, as
shown in the figure. To obtain a metabolic graph which captures the true
biological connectivity, the connections through currency metabolites should be
excluded. Two approaches are used in this web tool. One is based on the metabolic
connection database compiled by Ma and Zeng, Bioinformatics,
19:270. In this database, the reactions were manually examined to determine
which metabolic connections should be included. The other is based on the KEGG Rpair database. For each
reaction, only the "main" Rpairs (those that appear in the KEGG pathway maps) are
considered.
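
To illustrate how a reaction becomes a set of links, here is a minimal sketch. It uses a blanket exclusion list for currency metabolites, which is a simplification and not either of the two approaches used by the tool (manual curation or "main" Rpairs). The function name `reaction_to_links` and the contents of `CURRENCY` are assumptions made for illustration.

```python
from itertools import product

# Illustrative exclusion list only; not the list used by the web tool.
CURRENCY = {"H2O", "CO2", "ATP", "ADP"}


def reaction_to_links(substrates, products, currency=CURRENCY):
    """Return every substrate -> product link that avoids currency metabolites."""
    main_substrates = [m for m in substrates if m not in currency]
    main_products = [m for m in products if m not in currency]
    return list(product(main_substrates, main_products))


# The reaction A + B = C + D yields four links.
generic_links = reaction_to_links(["A", "B"], ["C", "D"])

# Glucose + ATP = glucose 6-phosphate + ADP keeps only the carbon-backbone link.
hexokinase_links = reaction_to_links(
    ["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]
)
```

A fixed exclusion list is easy to apply but cannot handle reactions in which a metabolite such as ATP is the main product, which is one reason a curated or Rpair-based approach is preferable.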
## Network structure analysis

Many structural features of the reconstructed metabolic networks can be calculated
using the web tool. Brief descriptions of the network structure properties are
given below, with links to detailed descriptions in Wikipedia.

**Connection Degree**: the number of links
connected to a node. In a directed network, in-degree and out-degree are
distinguished according to the direction of the links. Nodes with high degree are often
important nodes in a network.

**Degree distribution**: the distribution of node degrees in a network. Many
complex networks, including metabolic networks, are scale-free networks, which
have a power-law degree distribution.
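
As a minimal sketch of the two degree-based properties above, the following computes in-degree, out-degree and the empirical degree distribution from a list of directed links. The function names and the toy link list are assumptions for illustration, and total degree is taken here as in-degree plus out-degree.

```python
from collections import Counter


def degrees(links):
    """Return node -> (in_degree, out_degree) for a list of directed links."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in links:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(in_deg) | set(out_deg)
    return {node: (in_deg[node], out_deg[node]) for node in nodes}


def degree_distribution(node_degrees):
    """Return total degree -> fraction of nodes having that degree."""
    totals = [i + o for i, o in node_degrees.values()]
    counts = Counter(totals)
    return {k: counts[k] / len(totals) for k in sorted(counts)}


# Toy directed graph: links of A + B = C + D plus one extra link C -> E.
links = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "E")]
node_degrees = degrees(links)
distribution = degree_distribution(node_degrees)
```

For a real network, plotting the distribution on log-log axes is a common first check for power-law behavior, although a straight-looking plot alone does not prove that a network is scale-free.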

**Average Path Length**: path length is defined as the number of steps in the
shortest path from one node to another in a graph. The average path length is
the average of the path lengths for all connected pairs of nodes in a graph.
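
The definition above can be sketched with a breadth-first search from every node. This is a minimal illustration on a directed graph, assuming unweighted links and averaging over ordered pairs (source, target) for which a path exists; the function names are hypothetical.

```python
from collections import deque


def build_adjacency(links):
    """Return node -> set of successors, with every node present as a key."""
    adj = {}
    for src, dst in links:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set())
    return adj


def bfs_distances(adj, source):
    """Return target -> number of steps on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_path_length(links):
    """Average shortest path length over all connected ordered node pairs."""
    adj = build_adjacency(links)
    total = pairs = 0
    for source in adj:
        for target, steps in bfs_distances(adj, source).items():
            if target != source:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0


# Chain A -> B -> C: path lengths are 1 (A to B), 2 (A to C) and 1 (B to C).
apl = average_path_length([("A", "B"), ("B", "C")])
```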

**Centrality**: Closeness centrality measures
how close a node is to the other nodes it is connected to. Betweenness centrality is the fraction of shortest paths
between pairs of nodes that pass through a given node or edge. Load
centrality is a variant of betweenness centrality. For details, see Ulrik
Brandes: On Variants of Shortest-Path
Betweenness Centrality and their Generic Computation. Social
Networks 30(2):136-145, 2008.
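
A minimal sketch of two of these measures on an unweighted directed graph follows. The conventions are assumptions, since several variants exist: closeness is computed here from outgoing distances as the number of reachable nodes divided by the sum of their distances, and node betweenness is left unnormalized, summing over ordered node pairs the fraction of shortest paths that pass through the node. The betweenness routine follows the well-known accumulation scheme of Brandes; load centrality is not covered.

```python
from collections import deque


def closeness(adj, source):
    """Reachable-node count divided by the sum of shortest distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Unnormalized node betweenness for an unweighted directed graph."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {node: [] for node in adj}  # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)       # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    sigma[nxt] += sigma[node]
                    preds[nxt].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(adj, 0.0)
        for node in reversed(order):
            for pred in preds[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    return score


# Diamond graph: two equally short paths from A to D, via B and via C.
adj = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
bc = betweenness(adj)
close_a = closeness(adj, "A")
```

In the diamond graph, B and C each lie on one of the two shortest paths from A to D, so each receives a betweenness of 0.5.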

**Connected component**: a subgraph in which any two vertices are
connected to each other by paths, and which is connected to no additional
vertices. In a directed graph, such a subgraph is strongly connected if the link
direction is considered and weakly connected if direction is ignored.
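
The difference between weak and strong connectivity can be sketched as below. Weak components are found by ignoring direction; strong components use a two-pass scheme in the style of Kosaraju's algorithm, written iteratively. This is an illustrative sketch with hypothetical function names, not the tool's implementation.

```python
def build_graphs(links):
    """Return (forward, backward) adjacency dicts covering every node."""
    forward, backward = {}, {}
    for src, dst in links:
        for adj in (forward, backward):
            adj.setdefault(src, set())
            adj.setdefault(dst, set())
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward


def collect(neighbors, root, assigned):
    """Gather all unassigned nodes reachable from root via neighbors()."""
    component, todo = set(), [root]
    assigned.add(root)
    while todo:
        node = todo.pop()
        component.add(node)
        for nxt in neighbors(node):
            if nxt not in assigned:
                assigned.add(nxt)
                todo.append(nxt)
    return component


def weak_components(links):
    forward, backward = build_graphs(links)
    assigned, components = set(), []
    for root in forward:
        if root not in assigned:
            both = lambda n: forward[n] | backward[n]  # ignore direction
            components.append(collect(both, root, assigned))
    return components


def strong_components(links):
    forward, backward = build_graphs(links)
    # Pass 1: record nodes in order of depth-first finishing time.
    finished, seen = [], set()
    for root in forward:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                finished.append(node)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for root in reversed(finished):
        if root not in assigned:
            components.append(collect(lambda n: backward[n], root, assigned))
    return components


links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "F")]
weak = weak_components(links)
strong = strong_components(links)
```

Here the cycle A, B, C forms one strong component, while D, E and F are each a strong component of size one; ignoring direction gives two weak components.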

__Output/input domain__:
the output domain of a node is defined as the number of nodes which can be reached
by the node through paths. The input domain of a node is defined as the number of
nodes which can reach the node through paths.
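
Both domains can be computed with the same reachability search, run once on the graph and once on its reverse. The sketch below assumes that a node is not counted in its own domain, which the definition above leaves open; the function names are hypothetical.

```python
from collections import deque


def count_reachable(adj, source):
    """Number of nodes other than source that can be reached from it."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1


def domains(links):
    """Return node -> (output_domain, input_domain) sizes."""
    forward, backward = {}, {}
    for src, dst in links:
        for adj in (forward, backward):
            adj.setdefault(src, set())
            adj.setdefault(dst, set())
        forward[src].add(dst)
        backward[dst].add(src)
    return {
        node: (count_reachable(forward, node), count_reachable(backward, node))
        for node in forward
    }


# Chain A -> B -> C with a side branch B -> D.
sizes = domains([("A", "B"), ("B", "C"), ("B", "D")])
```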

__Bow-tie structure__:
A common global level organization structure found in many directed networks.
There are mainly four subsets in a bow-tie structure: giant strongly connected
component, the input, the output and the isolated subsets. For details, see Ma
and Zeng, Bioinformatics 19:1423.
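
A simplified reading of the four subsets can be sketched as follows. The assumptions here are that the giant strongly connected component (GSC) is the largest set of mutually reachable nodes, that the input subset contains the nodes that can reach the GSC, that the output subset contains the nodes reachable from it, and that everything else is isolated. The cited paper may define the subsets in more detail. The approach favors brevity over speed, and `bow_tie` is a hypothetical name.

```python
from collections import deque


def reachable(adj, source):
    """Return the set of nodes reachable from source, including source."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(links):
    """Split the nodes of a directed graph into GSC, in, out and isolated."""
    forward, backward = {}, {}
    for src, dst in links:
        for adj in (forward, backward):
            adj.setdefault(src, set())
            adj.setdefault(dst, set())
        forward[src].add(dst)
        backward[dst].add(src)

    # A strong component is the overlap of forward and backward reachability.
    gsc = set()
    unassigned = set(forward)
    while unassigned:
        node = unassigned.pop()
        component = reachable(forward, node) & reachable(backward, node)
        unassigned -= component
        if len(component) > len(gsc):
            gsc = component

    if not gsc:
        return {"gsc": set(), "in": set(), "out": set(), "isolated": set()}
    seed = next(iter(gsc))
    out_set = reachable(forward, seed) - gsc
    in_set = reachable(backward, seed) - gsc
    isolated = set(forward) - gsc - in_set - out_set
    return {"gsc": gsc, "in": in_set, "out": out_set, "isolated": isolated}


links = [("I", "A"), ("A", "B"), ("B", "C"), ("C", "A"), ("C", "O"), ("X", "Y")]
parts = bow_tie(links)
```

When two strong components tie for the largest size, this sketch picks one of them arbitrarily.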

A genome-scale metabolic network often contains hundreds or thousands of reactions, and
it is very difficult to check the biological function of such a large network.
This web tool provides two different ways of performing functional analysis. One is to find
the possible pathways from one metabolite to another and thus to analyze the
metabolic capability. Multiple shortest pathways can be found and visualized
automatically so that users can easily check them.
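
Finding multiple shortest pathways between two metabolites can be sketched with a breadth-first search that records every predecessor lying on a shortest path, followed by backtracking from the target. This is an illustrative sketch on an unweighted directed metabolite graph, not the tool's own search routine; `all_shortest_paths` is a hypothetical name.

```python
from collections import deque


def all_shortest_paths(adj, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way of reaching nxt.
                preds[nxt].append(node)
    if target not in dist:
        return []

    def walk(node):
        """Backtrack from node to source along recorded predecessors."""
        if node == source:
            return [[source]]
        return [path + [node] for pred in preds[node] for path in walk(pred)]

    return sorted(walk(target))


# Toy metabolite graph with two equally short routes from A to D.
adj = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": {"E"}}
paths = all_shortest_paths(adj, "A", "D")
```

An empty list means that the target cannot be produced from the source in the given graph.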

The other way is to decompose the network into several small modules that are functionally more or less independent. By visually or statistically examining the biological function of each module, one can obtain a functional overview of the whole network. Network decomposition is often a computationally expensive process. We have developed a new fast decomposition method and made it available here. Furthermore, the decomposition method can generate partitions with different numbers of modules rather than just one optimal partition, and thus offers users more flexibility.
