Institut de minéralogie, de physique des matériaux et de cosmochimie
UMR 7590 - Sorbonne Université/CNRS/IRD/MNHN

Hydrophobic Cluster Analysis (HCA)

Hydrophobic Cluster Analysis (HCA) was developed in the late 1980s under the inspiration of Jean-Paul Mornon (1, 2, 3). It provides information about the regular secondary structures of a protein from a single amino acid sequence alone, that is, without any knowledge of homologous sequences (4, 5).

HCA is well suited to predicting foldable domains (i.e. regions with a high density of hydrophobic clusters, which mainly correspond to regular secondary structures) (6, 7) and to highlighting the structural invariants of protein folds. It is particularly useful for deciphering the information contained in "orphan" sequences and for identifying remote relationships (8, 9).

HCA (or SEG-HCA) has been implemented in a few other tools, in particular for predicting order and disorder (MeDor (10), FELLS (11)). A new implementation of SEG-HCA (6) and TREMOLO-HCA (8), pyHCA (12), has been developed; it also introduces HCA scores to assess the foldability of sequence segments. HCA plots can be drawn at IMPMC and at RPBS.


HCA relies on a simple dichotomy between hydrophobic and non-hydrophobic amino acids, using a two-dimensional representation of the protein sequence (Figure 1). From the 1D amino acid sequence (panel A), a 2D plot (panel D) is created by writing the sequence along an α-helix (panel B) and cutting the helix along its horizontal axis. Once unrolled, the helix forms a plane (panel C) on which each line of amino acids corresponds to one helical turn. The plane is duplicated, and hydrophobic clusters are defined by joining contiguous strong hydrophobic amino acids (V, I, L, F, M, Y, W). Hydrophobic clusters mainly correspond to regular secondary structures, and their shape is generally indicative of the α (horizontal) or β (vertical) state (1, 2, 3). The α-helical net, with a connectivity distance of 4 (the minimal distance between two hydrophobic clusters), together with the (V, I, L, F, M, Y, W) alphabet, provides the best correspondence between clusters and regular secondary structures (4).
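The unrolling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used by the HCA servers or by pyHCA: it assumes an ideal α-helix of about 3.6 residues per turn (100° per residue), places each residue at its position along the helix axis and at its angle around the helix, and duplicates the plane by adding a second copy of every residue shifted by one full turn. The names `helical_net` and `NetPoint` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Strong hydrophobic alphabet used by HCA (V, I, L, F, M, Y, W)
STRONG_HYDROPHOBIC = frozenset("VILFMYW")
# Assumption: ideal alpha-helix, ~3.6 residues per turn -> 100 degrees per residue
DEG_PER_RESIDUE = 100.0


@dataclass(frozen=True)
class NetPoint:
    index: int          # position in the 1D sequence (0-based)
    residue: str        # one-letter amino acid code
    x: float            # position along the helix axis, in residues
    y: float            # angle around the helix, in degrees
    hydrophobic: bool   # True for V, I, L, F, M, Y, W


def helical_net(sequence: str, duplicate: bool = True) -> list[NetPoint]:
    """Place each residue on the unrolled alpha-helix (hypothetical helper)."""
    points: list[NetPoint] = []
    for i, aa in enumerate(sequence.upper()):
        # Cutting the cylinder at 0/360 degrees maps every residue into [0, 360)
        angle = (i * DEG_PER_RESIDUE) % 360.0
        is_hydrophobic = aa in STRONG_HYDROPHOBIC
        points.append(NetPoint(i, aa, float(i), angle, is_hydrophobic))
        if duplicate:
            # Second copy one turn higher, so residues on either side of the cut stay neighbours
            points.append(NetPoint(i, aa, float(i), angle + 360.0, is_hydrophobic))
    return points
```

For example, `helical_net("ACDEV")` returns ten points, and the valine at index 4 appears at 40° and again at 400°. Using the residue index rather than the axial rise in ångströms for `x` only changes the horizontal scale of the plot.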


© T. Bitard-Feildel


Figure 1. Principle of the HCA plot

Panel A: the protein sequence (1D), in which hydrophobic amino acids are shown as white letters, is written on an α-helix displayed on a cylinder (panel B). The cylinder is cut along the horizontal axis and unrolled, so that the full environment of each amino acid, as it exists in the 1D sequence, is preserved (panel C). Strong hydrophobic amino acids (V, I, L, F, M, Y, W) are encircled and their contours are joined (panel D), forming clusters. Horizontal and vertical clusters are mainly associated with α-helices and β-strands, respectively. Symbols highlight amino acids with particular structural behavior (star = P, black diamond = G, square = T, dotted square = S).
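The drawing convention described in the caption amounts to a small lookup table. The sketch below encodes it in Python, assuming nothing beyond the caption itself; the function names and the string labels are hypothetical, and residues that fall into none of the caption's categories are simply drawn as plain letters.

```python
from __future__ import annotations

# Strong hydrophobic amino acids are encircled on the HCA plot
STRONG_HYDROPHOBIC = frozenset("VILFMYW")
# Residues with particular structural behaviour and their symbols, as listed in the caption
SPECIAL_SYMBOLS = {
    "P": "star",
    "G": "black diamond",
    "T": "square",
    "S": "dotted square",
}


def plot_symbol(aa: str) -> str:
    """Return the drawing convention for one residue (hypothetical labels)."""
    aa = aa.upper()
    if aa in STRONG_HYDROPHOBIC:
        return "encircled letter"
    return SPECIAL_SYMBOLS.get(aa, "letter")


def annotate(sequence: str) -> list[tuple[str, str]]:
    """Pair every residue of the sequence with its plot symbol."""
    return [(aa, plot_symbol(aa)) for aa in sequence.upper()]
```

Checking hydrophobicity first means a residue is never assigned two symbols; in the caption the two sets do not overlap, so the order only matters if the table is extended.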


Principle of the method

Because they are defined with a connectivity distance, hydrophobic clusters can be described as constrained binary patterns and, as such, carry much more discriminating information about regular secondary structures than unconstrained binary patterns (13). Some hydrophobic clusters show strong preferences for the α or β state, as reported in an HCA dictionary (5, 14). These clusters are generally associated with a typical periodicity of polar and non-polar amino acids in the sequence. The leading role of binary-pattern periodicity in the formation of regular secondary structures was supported more generally by breaking down the whole set of HCA hydrophobic clusters into basic units called quarks (11, 101, 1001 and 10001, where 1 denotes a hydrophobic and 0 a non-hydrophobic amino acid) (15).
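The binary-pattern view can be illustrated with a short Python sketch. It is a one-dimensional approximation, not the published HCA procedure: it assumes that a connectivity distance of 4 means that a run of four or more non-hydrophobic amino acids separates two clusters, it ignores the two-dimensional helical net and any special treatment of residues such as proline, and it reads the quarks as the pairs of consecutive hydrophobic positions inside a cluster. All function names are hypothetical.

```python
from __future__ import annotations

import re

# Strong hydrophobic alphabet (V, I, L, F, M, Y, W) encoded as '1', everything else as '0'
STRONG_HYDROPHOBIC = frozenset("VILFMYW")
# Assumption: 4 or more consecutive non-hydrophobic residues separate two clusters
CONNECTIVITY_DISTANCE = 4


def binary_pattern(sequence: str) -> str:
    """Encode each residue as '1' (strong hydrophobic) or '0' (other)."""
    return "".join("1" if aa in STRONG_HYDROPHOBIC else "0" for aa in sequence.upper())


def hydrophobic_clusters(sequence: str, gap: int = CONNECTIVITY_DISTANCE) -> list[tuple[int, int, str]]:
    """Return (start, end_exclusive, pattern) for each cluster of the binary pattern.

    A cluster starts and ends on '1' and contains no run of `gap` consecutive '0's.
    Isolated hydrophobic residues come out as one-residue clusters.
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    pattern = binary_pattern(sequence)
    # '1' followed by any number of (at most gap-1 zeros, then '1')
    regex = re.compile(r"1(?:0{0,%d}1)*" % (gap - 1))
    return [(m.start(), m.end(), m.group()) for m in regex.finditer(pattern)]


def quarks(cluster_pattern: str) -> list[str]:
    """Split a cluster into consecutive hydrophobic pairs: '11', '101', '1001' or '10001'."""
    ones = [i for i, c in enumerate(cluster_pattern) if c == "1"]
    return [cluster_pattern[a:b + 1] for a, b in zip(ones, ones[1:])]
```

With the sequence `LLALKKKKVGGGF` (binary pattern `1101000010001`), the sketch returns two clusters, `1101` and `10001`, which decompose into the quarks `11`, `101` and `10001`. Because clusters never contain four zeros in a row, every quark produced this way is one of the four units listed above.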


Application examples

HCA has been used in many applications, including the identification of novel domain families (e.g. BRCT (16), TUDOR (17), LOTUS (18), BAH (19), BETA-CASP (20), RUN (21), FERM (22), dDENN/DENN/uDENN (23), REPULS (24), OCRE (25), ZP-N (26), EMI (27), HEBO (28)) and the detection of hidden relationships starting from orphan sequences (e.g. 29-39).



1 Gaboriaud C, Bissery V, Benchetrit T, Mornon JP. Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett. 1987 224:149-55.

2 Lemesle-Varloot L, Henrissat B, Gaboriaud C, Bissery V, Morgat A, Mornon JP. Hydrophobic cluster analysis: procedures to derive structural and functional information from 2-D-representation of protein sequences. Biochimie. 1990 72:555-74.

3 Callebaut I, Labesse G, Durand P, Poupon A, Canard L, Chomilier J, et al. Deciphering protein sequence information through hydrophobic cluster analysis (HCA): current status and perspectives. Cell Mol Life Sci. 1997 53:621-45.

4 Woodcock S, Mornon JP, Henrissat B. Detection of secondary structure elements in proteins by hydrophobic cluster analysis. Protein Eng. 1992 5:629-35.

5 Eudes R, Le Tuan K, Delettre J, Mornon JP, Callebaut I. A generalized analysis of hydrophobic and loop clusters within globular protein sequences. BMC Struct Biol. 2007 7:2.

6 Faure G, Callebaut I. A comprehensive repertoire of foldable segments within genomes. PLoS Comput Biol. 2013 9:e1003280.

7 Bitard-Feildel T, Callebaut I. Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep. 2017 7:41425.

8 Faure G, Callebaut I. Identification of hidden relationships from the coupling of Hydrophobic Cluster Analysis and Domain Architecture information. Bioinformatics. 2013 29:1726-33.

9 Bitard-Feildel T, Heberlein M, Bornberg-Bauer E, Callebaut I. Detection of orphan domains in Drosophila using "hydrophobic cluster analysis". Biochimie. 2015 119:244-53.

10 Lieutaud P, Canard B, Longhi S. MeDor: a metaserver for predicting protein disorder. BMC Genomics. 2008 9:S25.

11 Piovesan D, Walsh I, Minervini G, Tosatto SCE. FELLS: fast estimator of latent local structure. Bioinformatics. 2017 33:1889-91.

12 Bitard-Feildel T, Callebaut I. HCAtk and pyHCA: a toolkit and Python API for the Hydrophobic Cluster Analysis of protein sequences. Submitted.

13 Hennetin J, Le Tuan K, Canard L, Colloc'h N, Mornon JP, Callebaut I. Non-intertwined binary patterns of hydrophobic/nonhydrophobic amino acids are considerably better markers of regular secondary structures than nonconstrained patterns. Proteins. 2003 51:236-44.

14 Lamiable A, Rebehmed J, Quintus F, Mornon JP, Callebaut I. DISCO: a topology-based investigation of protein interaction sites using Hydrophobic Cluster Analysis. In preparation.

15 Rebehmed J, Quintus F, Mornon JP, Callebaut I. The respective roles of polar/nonpolar binary patterns and amino acid composition in protein regular secondary structures explored exhaustively using hydrophobic cluster analysis. Proteins. 2016 84:624-38.

16 Callebaut I, Mornon JP. From BRCA1 to RAP1: a widespread BRCT module closely associated with DNA repair. FEBS Lett. 1997 400:25-30.

17 Callebaut I, Mornon JP. The human EBNA-2 coactivator p100: multidomain organization and relationship to the staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Biochem J. 1997 321:125-32.

18 Callebaut I, Mornon JP. LOTUS, a new domain associated with small RNA pathways in the germline. Bioinformatics. 2010 26:1140-4.

19 Callebaut I, Courvalin JC, Mornon JP. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication and transcriptional regulation. FEBS Lett. 1999 446:189-93.

20 Callebaut I, Moshous D, Mornon JP, de Villartay JP. Metallo-beta-lactamase fold within nucleic acids processing enzymes: the beta-CASP family. Nucleic Acids Res. 2002 30:3592-601.

21 Callebaut I, de Gunzburg J, Goud B, Mornon JP. RUN domains: a new family of domains involved in Ras-like GTPase signaling. Trends Biochem Sci. 2001 26:79-83.

22 Girault JA, Labesse G, Mornon JP, Callebaut I. The N-termini of FAK and JAKs contain divergent band 4.1 domains. Trends Biochem Sci. 1999 24:54-7.

23 Levivier E, Goud B, Souchet M, Calmels TP, Mornon JP, Callebaut I. uDENN, DENN, and dDENN: indissociable domains in Rab and MAP kinases signaling pathways. Biochem Biophys Res Commun. 2001 287:688-95.

24 Lescasse R, Pobiega S, Callebaut I, Marcand S. End-joining inhibition at telomeres requires the translocase and polySUMO-dependent ubiquitin ligase Uls1. EMBO J. 2013 32:805-15.

25 Callebaut I, Mornon JP. OCRE: a novel domain made of imperfect, aromatic-rich octamer repeats. Bioinformatics. 2005 21:699-702.

26 Callebaut I, Mornon JP, Monget P. Isolated ZP-N domains constitute the N-terminal extensions of Zona Pellucida proteins. Bioinformatics. 2007 23:1871-4.

27 Callebaut I, Mignotte V, Souchet M, Mornon JP. EMI domains are widespread and reveal the probable orthologs of the Caenorhabditis elegans CED-1 protein. Biochem Biophys Res Commun. 2003 300:619-23.

28 Zhang S, Pondarre C, Pennarun G, Labussiere-Wallet H, Vera G, France B, Chansel M, Rouvet I, Revy P, Lopez B, Soulier J, Bertrand P, Callebaut I, de Villartay JP. A nonsense mutation in the DNA repair factor Hebo causes mild bone marrow failure and microcephaly. J Exp Med. 2016 213:1011-28.

29 Faure G, Revy P, Schertzer M, Londono-Vallejo A, Callebaut I. The C-terminal extension of human RTEL1, mutated in Hoyeraal-Hreidarsson syndrome, contains harmonin-N-like domains. Proteins. 2014 82:897-903.

30 Burgess A, Mornon JP, de Saint-Basile G, Callebaut I. A concanavalin A-like lectin domain in the CHS1/LYST protein, shared by members of the BEACH family. Bioinformatics. 2009 25:1219-22.

31 Callebaut I, Mornon JP. The PWAPA cassette: Intimate association of a PHD-like finger and a winged-helix domain in proteins included in histone-modifying complexes. Biochimie. 2012 94:2006-12.

32 Wojcik J, Girault JA, Labesse G, Chomilier J, Mornon JP, Callebaut I. Sequence analysis identifies a ras-associating (RA)-like domain in the N-termini of band 4.1/JEF domains and in the Grb7/10/14 adapter family. Biochem Biophys Res Commun. 1999 259:113-20.

33 Callebaut I, Malivert L, Fischer A, Mornon JP, Revy P, de Villartay JP. Cernunnos interacts with the XRCC4 x DNA-ligase IV complex and is homologous to the yeast nonhomologous end-joining factor Nej1. J Biol Chem. 2006 281:13857-60.

34 Ye Q, Callebaut I, Pezhman A, Courvalin JC, Worman HJ. Domain-specific interactions of human HP1-type chromodomain proteins and inner nuclear membrane protein LBR. J Biol Chem. 1997 272:14983-9.

35 Sallon C, Callebaut I, Boulay I, Fontaine J, Logeart-Avramoglou D, Henriquet C, Pugnière M, Cayla X, Monget P, Harichaux G, Labas V, Canepa S, Taragnat C. Thrombospondin-1 (TSP-1), a new bone morphogenetic protein-2 and -4 (BMP-2/4) antagonist identified in pituitary cells. J Biol Chem. 2017 292:15352-68.

36 Callebaut I, Mornon JP. The V(D)J recombination activating protein RAG2 consists of a six-bladed propeller and a PHD fingerlike domain, as revealed by sequence analysis. Cell Mol Life Sci. 1998 54:880-91.

37 Calmels TP, Callebaut I, Léger I, Durand P, Bril A, Mornon JP, Souchet M. Sequence and 3D structural relationships between mammalian Ras- and Rho-specific GTPase-activating proteins (GAPs): the cradle fold. FEBS Lett. 1998 426:205-11.

38 Thoreau E, Petridou B, Kelly PA, Djiane J, Mornon JP. Structural symmetry of the extracellular domain of the cytokine/growth hormone/prolactin receptor family and interferon receptors revealed by hydrophobic cluster analysis. FEBS Lett. 1991 282:26-31.

39 Rebehmed J, Revy P, Faure G, de Villartay JP, Callebaut I. Expanding the SRI domain family: a common scaffold for binding the phosphorylated C-terminal domain of RNA polymerase II. FEBS Lett. 2014 588:4431-7.





