
The purpose of JACOP is the automated classification of a set of protein sequences.

In contrast with approaches based on multiple sequence alignment (MSA) or phylogeny, JACOP does not require that the sequences be arranged in a "meaningful" multiple sequence alignment. JACOP is especially suited to modular proteins.
In addition to a possible classification, JACOP also provides diagnostic clues about the different regions of each sequence with respect to the whole classification (send matches to Catalogue factory).

General advice
Interpretation
So-called "independent groups" are defined such that no homology is detected between any two sequences that belong to different groups.
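
The definition above can be illustrated with a minimal sketch. It assumes that independent groups correspond to the connected components of a graph whose edges are the detected homologies; this page does not say how JACOP actually computes them. The function name `independent_groups` and the toy input are hypothetical.

```python
def independent_groups(sequence_ids, homologous_pairs):
    """Group sequences so that no detected homology links two different groups.

    Assumption: groups are the connected components of the homology graph.
    """
    parent = {s: s for s in sequence_ids}

    def find(s):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for s in sequence_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Toy example: homology detected for A-B, B-C and D-E only.
groups = independent_groups(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
print(groups)
```

Note that A and C end up in the same group even though no direct homology between them is listed, because both are linked to B.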

Within an independent group, the sequences are further partitioned into sub-groups using the PAM (Partitioning Around Medoids) method. The "silhouette coefficient" is used as an indicator of the "quality" of the clustering.
  • If the silhouette coefficient is close to 1, the sequence is assigned to a very appropriate cluster.
  • If the silhouette coefficient is about 0, the sequence lies about equally far from two clusters and could just as well be assigned to the other one. The next best cluster is also given.

The overall average silhouette width for the entire plot is simply the average of the silhouette coefficients of all objects in the whole dataset.
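
As an illustration, the silhouette coefficient of an object is usually defined as s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below applies that standard definition to a toy distance matrix. The function name `silhouette` and the one-dimensional toy data are hypothetical, and the distances JACOP actually uses are not described here.

```python
def silhouette(dist, labels):
    """Return one (coefficient, next_best_cluster) pair per object.

    dist   : square symmetric matrix (list of lists) of pairwise distances
    labels : cluster label of each object; at least two clusters are required
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    results = []
    for i, own_label in enumerate(labels):
        own = [j for j in clusters[own_label] if j != i]
        # b: smallest mean distance to another cluster (the "next best" one)
        b, neighbour = min(
            (sum(dist[i][j] for j in members) / len(members), lab)
            for lab, members in clusters.items()
            if lab != own_label
        )
        if not own:
            # Convention: an object alone in its cluster gets coefficient 0.
            results.append((0.0, neighbour))
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        denom = max(a, b)
        results.append(((b - a) / denom if denom > 0 else 0.0, neighbour))
    return results


# Toy data: points on a line; the point at 5.5 sits halfway between two clusters.
points = [0.0, 1.0, 5.5, 10.0, 11.0]
labels = [0, 0, 0, 1, 1]
dist = [[abs(p - q) for q in points] for p in points]

coefficients = silhouette(dist, labels)
average_width = sum(s for s, _ in coefficients) / len(coefficients)
```

In this toy case the point at 5.5 gets a coefficient of 0 with cluster 1 as its next best cluster, while the points at 10 and 11 get coefficients close to 1.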

In addition, a hierarchical representation (i.e. a tree) of the sequences is provided to complete the picture, even though the classification implied by this dendrogram is less robust than the one produced by the PAM method.

Within each independent group, the optimal partitioning is searched for using the PAM method. The minimal and maximal numbers of clusters to evaluate are indicated by the vertical gray lines on the tree picture.
JACOP default behavior, gray lines
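
The search for an optimal partitioning can be sketched as follows. This is a naive illustration, not JACOP's implementation: it assumes that the "optimal" partition is the one with the highest average silhouette width among the evaluated numbers of clusters, and it uses a simplified PAM (greedy build followed by swaps). The names `pam`, `average_silhouette`, `best_partition`, `k_min` and `k_max` are hypothetical.

```python
def pam(dist, k):
    """Naive k-medoids: greedy build, then swaps while the total cost decreases."""
    n = len(dist)

    def cost(medoids):
        # Sum over all objects of the distance to the closest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    # Build phase: add, one at a time, the medoid that lowers the cost most.
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: cost(medoids + [c]))
        medoids.append(best)

    # Swap phase: replace a medoid by a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True

    # Each object is labelled with the index of its closest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


def average_silhouette(dist, labels):
    """Average silhouette width; objects alone in their cluster count as 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        means = {}
        for lab in set(labels):
            members = [j for j in range(n) if labels[j] == lab and j != i]
            if members:
                means[lab] = sum(dist[i][j] for j in members) / len(members)
        a = means.pop(labels[i], None)
        if a is None or not means:
            continue
        b = min(means.values())
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def best_partition(dist, k_min, k_max):
    """Assumption: keep the k in [k_min, k_max] with the highest average width."""
    scored = []
    for k in range(k_min, k_max + 1):
        _, labels = pam(dist, k)
        scored.append((average_silhouette(dist, labels), k, labels))
    return max(scored, key=lambda t: t[0])


# Toy data: three well-separated groups of points on a line.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
dist = [[abs(p - q) for q in points] for p in points]
width, k, labels = best_partition(dist, k_min=2, k_max=4)
```

On this toy data the search settles on three clusters, which matches the three visible groups of points.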

When an independent group contains too many sequences (i.e. more than 200), the PAM method is not performed. Instead, the clusters are obtained by cutting the tree. This partition is less robust and less reproducible. The number of clusters is determined by a single cut-off value, chosen by the user on the query screen (default is 0.50). A vertical red line highlights this value on the tree picture.
JACOP behavior for large independent groups, red line
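
Cutting a tree at a single value can be sketched as below. The sketch assumes that the user-chosen value is a height on the same scale as the tree, and that the height of a node never exceeds that of its parent; how JACOP builds and scales its tree is not described here. The nested-tuple tree representation, the sequence names and the function names are hypothetical.

```python
def leaves(node):
    """List the leaf names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def cut_tree(node, threshold):
    """Return the clusters obtained by cutting the tree at `threshold`.

    An internal node is a tuple (height, left, right). Any subtree whose
    root height does not exceed the threshold is kept whole as one cluster.
    """
    if isinstance(node, str) or node[0] <= threshold:
        return [leaves(node)]
    _, left, right = node
    return cut_tree(left, threshold) + cut_tree(right, threshold)


# Toy tree with made-up heights, cut at the default value of 0.50.
tree = (0.9,
        (0.3, "seq1", "seq2"),
        (0.6, "seq3", (0.2, "seq4", "seq5")))
clusters = cut_tree(tree, 0.50)
print(clusters)
```

Moving the threshold changes the number of clusters directly: in this toy tree a value of 0.70 merges seq3 with seq4 and seq5, which is one reason a partition obtained this way depends strongly on the chosen value.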
JACOP: a simple and robust method for the automated classification of protein sequences with modular architecture.
Sperisen P, Pagni M.
BMC Bioinformatics. 2005 Aug; 6:216.