ProDom, the Protein Domain Database
 

The ProDom Help File

If you find bugs or wish to make comments or suggestions about ProDom, please send a message to Aurelie Laugraud.

Help on the available Web Services can be found here: http://prodom.prabi.fr/prodom/current/html/webservices.html

ProDom is a protein domain family database constructed automatically by clustering homologous segments. The ProDom building procedure, MKDOM2, is based on recursive PSI-BLAST searches [ALTS2]. The source protein sequences are non-fragmentary sequences derived from UniProtKB (the Swiss-Prot and TrEMBL databases). ProDom was first established in 1993 [SONN] and was maintained by the Laboratoire de Génétique Cellulaire and the Laboratoire des Interactions Plantes-Microorganismes (INRA/CNRS) in Toulouse. It is now maintained by the PRABI (the bioinformatics center of Rhône-Alpes). The ProDom database consists of domain family entries. Each entry provides a multiple sequence alignment of homologous domains and a family consensus sequence.
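This help file does not describe how the family consensus sequences are derived. As a purely hypothetical illustration of what a consensus is, the Python sketch below computes a simple majority-rule consensus over the columns of a multiple alignment. The function name `majority_consensus` and the rule of dropping gap-dominated columns are assumptions, not the MKDOM2 procedure.

```python
from collections import Counter


def majority_consensus(alignment: list[str], gap: str = "-") -> str:
    """Majority-rule consensus of equal-length aligned sequences (illustration only).

    For each column, keep the most frequent residue; columns in which gaps
    are the majority are left out of the consensus.
    """
    if not alignment:
        raise ValueError("empty alignment")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(column)
        if counts[gap] > len(column) / 2:
            continue  # mostly gaps: skip this column
        counts.pop(gap, None)
        residue, _ = counts.most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Toy alignment of three homologous segments.
print(majority_consensus(["MKV-LA", "MRV-LA", "MKVGIA"]))
```

In this toy alignment the fourth column is mostly gaps and is dropped, so the consensus is `MKVLA`.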

 

The Main form

The main form is divided into two or three parts:
  1. ProDom Browsing
  2. Compare your sequence with ProDom
  3. Search by Kingdom (ProDom-CG only)

ProDom Browsing

With this form, you can select one or several ProDom entries using different search criteria:
  • Display a ProDom entry
    Type a ProDom accession number (AC) to display the corresponding entry
    Ex: PD000039 (or, in short form, PD39)
  • All Proteins in ProDom families
    Type one or several ProDom ACs to display:
    • The domain decomposition of proteins found in all of those families (AND button selected)
    • The domain decomposition of proteins found in at least one of those families (OR button selected)
      Ex: PD39 PD309
  • Search by related databases
    Type one or several IDs, ACs or entry names from a cross-referenced database (i.e. InterPro, Pfam-A, PROSITE, PDB) to get the list of ProDom families linked to those entries.
    Ex:
    1. kringle: all the ProDom families linked to any related-database entry named kringle
    2. interpro kringle: restricts the previous request to InterPro entries
    3. 1aac: all the ProDom families linked to this PDB entry.
    4. IPR000001: all the ProDom families linked to the IPR000001 entry of InterPro.
    5. PS00010: all the ProDom families linked to the PS00010 entry of PROSITE.
    6. PF00034: all the ProDom families linked to the PF00034 entry of Pfam-A.
  • UniProtKB entry
    Type one or several UniProtKB IDs or ACs to retrieve the domain decomposition of those proteins:
    1. fixj: all the FIXJ proteins
    2. mouse: all the proteins from the MOUSE organism
    3. UFO_HUMAN: only the UFO protein from HUMAN (from little green men?).
    4. P30530: The same request as before, but with a UniProtKB AC instead of an ID.
  • Keyword search
    This request looks for the keyword(s) you type in the KW lines of the database. You may type one or several keywords, connected with the AND or OR boolean operators.
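The accession examples above (PD000039, shortened to PD39) and the AND/OR buttons can be illustrated with a small Python sketch. It assumes, from those examples only, that a ProDom AC is "PD" followed by six digits, and it works on a hypothetical in-memory index `family_members` mapping each family AC to the set of proteins having a domain in that family. This is not the server's actual implementation.

```python
import re

# Assumption based on the examples above: a ProDom AC is "PD" + 6 digits,
# and the short form (e.g. PD39) is padded with leading zeros.
_PRODOM_AC = re.compile(r"PD(\d{1,6})", re.IGNORECASE)


def normalize_prodom_ac(token: str) -> str:
    """Turn 'PD39' or 'pd000039' into the canonical 'PD000039'."""
    match = _PRODOM_AC.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a ProDom accession: {token!r}")
    return "PD" + match.group(1).zfill(6)


def proteins_in_families(acs, family_members, mode="AND"):
    """Combine the protein sets of several families.

    family_members is a hypothetical index: normalized ProDom AC -> set of
    protein IDs with a domain in that family. "AND" keeps proteins found in
    all the families (intersection); "OR" keeps proteins found in at least
    one of them (union).
    """
    sets = [family_members.get(normalize_prodom_ac(ac), set()) for ac in acs]
    if not sets:
        return set()
    if mode == "AND":
        return set.intersection(*sets)
    if mode == "OR":
        return set.union(*sets)
    raise ValueError("mode must be 'AND' or 'OR'")


# Toy index with made-up protein names.
index = {"PD000039": {"prot1", "prot2", "prot3"},
         "PD000309": {"prot2", "prot3", "prot4"}}
print(sorted(proteins_in_families(["PD39", "PD309"], index, "AND")))
print(sorted(proteins_in_families(["PD39", "PD309"], index, "OR")))
```

With this toy index, AND returns `prot2` and `prot3`, while OR returns all four proteins.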

Compare your sequence with ProDom

With this form, you can start a BLASTP or BLASTX search against:
  • The consensus sequence provided with the ProDom families
  • The multiple alignments provided with each ProDom family
The first search is faster, as it is less CPU-intensive, but the second is more sensitive.
We use the BLAST program from NCBI.
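If you prefer to run a comparison locally, for instance with NCBI BLAST+ and its tabular output (`-outfmt 6`), the hits can be read with a few lines of Python. This sketch assumes a local BLAST run against a FASTA copy of the ProDom consensus sequences (an assumption: the web form does this for you) and the twelve default `-outfmt 6` columns; the `Hsp` class and the subject name in the sample line are hypothetical.

```python
from dataclasses import dataclass

# Column order of the default BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hsp:
    query: str
    subject: str      # hypothetically, a ProDom consensus identifier
    pident: float     # percent identity of the HSP
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines):
    """Parse tab-separated BLAST+ -outfmt 6 lines into Hsp objects."""
    hsps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hsps.append(Hsp(row["qseqid"], row["sseqid"], float(row["pident"]),
                        int(row["qstart"]), int(row["qend"]),
                        float(row["evalue"]), float(row["bitscore"])))
    return hsps


sample = "myseq\tPD000039\t45.20\t120\t60\t3\t10\t129\t1\t118\t2e-30\t112.5\n"
for hsp in parse_outfmt6([sample]):
    print(hsp.subject, hsp.qstart, hsp.qend, hsp.evalue)
```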
The Results page shows:
  1. The results as a graphical representation: the HSPs (high-scoring segment pairs) are superimposed, with lower scores at the bottom and higher scores on top, so lower-scoring HSPs may be hidden by higher-scoring ones.
  2. A form that lets you run the MultAlin program to align your sequence with the HSPs found.
  3. If the retrieved ProDom families have PDB links, a form that lets you run Swiss-Model jobs on your sequence.
  4. If there is a PDB link with your sequence, a form that lets you run a Geno3D job with your sequence.
  5. The results as a textual representation. Be careful, however: the original output is filtered to yield non-redundant similarities.
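The exact filter is not documented here, but the two ideas above (higher-scoring HSPs drawn over lower-scoring ones, and a non-redundant list of similarities) can be sketched in Python. The `Hsp` fields, the toy family names and the greedy rule that drops an HSP sharing more than 50% of its length with a better one are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    family: str   # ProDom family of the hit (toy names below)
    qstart: int   # 1-based start on your sequence
    qend: int     # 1-based end on your sequence (inclusive)
    score: float  # bit score


def drawing_order(hsps):
    """Lowest scores first: drawn in this order, higher-scoring HSPs end up on top."""
    return sorted(hsps, key=lambda h: h.score)


def _overlap(a, b):
    """Number of query positions shared by two HSPs (<= 0 if disjoint)."""
    return min(a.qend, b.qend) - max(a.qstart, b.qstart) + 1


def non_redundant(hsps, max_shared=0.5):
    """Greedy filter: scan HSPs from best to worst and drop any HSP that shares
    more than max_shared of its own length with an HSP already kept."""
    kept = []
    for hsp in sorted(hsps, key=lambda h: h.score, reverse=True):
        length = hsp.qend - hsp.qstart + 1
        if all(_overlap(hsp, other) <= max_shared * length for other in kept):
            kept.append(hsp)
    return kept


hits = [Hsp("PD_A", 1, 100, 50.0), Hsp("PD_B", 10, 90, 80.0), Hsp("PD_C", 150, 250, 40.0)]
print([h.family for h in drawing_order(hits)])   # ['PD_C', 'PD_A', 'PD_B']
print([h.family for h in non_redundant(hits)])   # ['PD_B', 'PD_C']
```

Sorting once and keeping the best hit first is the simplest greedy choice; a real filter could also take E-values or family membership into account.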

Search by Kingdom (ProDom-CG only)

When using ProDom-CG, you can search the database with kingdom-related criteria: for each kingdom (archaea, bacteria, eukaryotes), you can select:
  • None: only the families without any protein from an organism belonging to this kingdom.
  • Some: only the families with at least one protein from one or several organisms belonging to this kingdom.
  • Maybe: this kingdom is not taken into account in the search.
  • All: only the families with at least one protein from every organism of this kingdom present in ProDom-CG.
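The four kingdom criteria can be read as set tests on the organisms represented in each family. The Python sketch below assumes a hypothetical index of families (family AC mapped to the set of organisms having a protein in it) and of kingdoms (kingdom mapped to the set of ProDom-CG organisms in it). A family is kept when it satisfies the criterion chosen for every kingdom, which is our reading of the form, not documented server behavior.

```python
CRITERIA = {"None", "Some", "Maybe", "All"}


def kingdom_ok(family_orgs, kingdom_orgs, criterion):
    """Test one family against the criterion chosen for one kingdom."""
    present = family_orgs & kingdom_orgs  # family organisms belonging to this kingdom
    if criterion == "None":
        return not present
    if criterion == "Some":
        return bool(present)
    if criterion == "Maybe":
        return True  # kingdom ignored
    if criterion == "All":
        return present == kingdom_orgs
    raise ValueError(f"unknown criterion {criterion!r}; expected one of {sorted(CRITERIA)}")


def select_families(families, kingdoms, criteria):
    """families: family AC -> set of organisms having a protein in the family.
    kingdoms: kingdom -> set of ProDom-CG organisms in that kingdom.
    criteria: kingdom -> "None" | "Some" | "Maybe" | "All".
    Keeps the families satisfying the criterion of every kingdom."""
    return sorted(
        ac for ac, orgs in families.items()
        if all(kingdom_ok(orgs, kingdoms[k], c) for k, c in criteria.items())
    )


# Toy data with made-up organism and family names.
kingdoms = {"archaea": {"arc1", "arc2"}, "bacteria": {"bac1", "bac2"}, "eukaryotes": {"euk1"}}
families = {"fam1": {"bac1", "bac2"}, "fam2": {"arc1", "bac1"}, "fam3": {"euk1", "bac2"}}
print(select_families(families, kingdoms,
                      {"archaea": "None", "bacteria": "All", "eukaryotes": "Maybe"}))  # ['fam1']
```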
 

The ProDom entry (upper frame)

 

What do all those icons mean?

This motif is the graphic representation of the family. Several families have a motif representation, used consistently throughout the whole database.
Graphic representation of all proteins containing this domain, with their decomposition into domains.
The list of ProDom families related to the current family. "Related" means that they share distant homologies.
This button is not drawn if the family is not related to any other family, or for families with poor alignment quality or homogeneity (low norMD values).
If no PDB links are found for this family or its related families, and if this family passes several quality checks, then this family could be a good candidate for structure determination. If present, click the button for more information.
When in ProDom, use this button to access the corresponding ProDom-CG family, if it exists.
When in ProDom-CG, use it to access the corresponding ProDom family, which should always exist.
To retrieve the ProDom family in MSF format.
To retrieve the ProDom family in Fasta format.
To compute a profile using PSI-BLAST against this family.
Warning! You should retrieve the binary file; unfortunately, this file will not work on every architecture. This functionality is still experimental.
The norMD value is computed for each ProDom family where possible. If this stamp is displayed here, the norMD may be considered "high" (> 0.4), meaning the alignment is of "good quality".
To access the Predict Protein server, through a pre-filled form.
Fill ESPript with this family, to print a high quality representation of this family.
Fill STRAP with this family, to see the alignment and phylogenetic tree of this family based on structure.
This frame may be printed: just press this button.

The consistency indicators

  • Distances are measured in PAM (percent accepted mutations, i.e. the number of accepted point mutations per hundred residues).
    For example, 20 PAM corresponds to 82% identity.
  • The DIAMETER is the largest distance between two domains in the family.
  • The RADIUS OF GYRATION is the root mean square distance between the consensus and all members of the family.
    The smaller these two values, the more homogeneous the family.
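Given PAM distances between domains, and between the consensus and each domain (how those distances are estimated is not covered here), both indicators are simple to compute. This Python sketch takes the radius of gyration to be the root-mean-square of the consensus-to-member distances, the usual definition of that quantity; the function names and toy distances are hypothetical.

```python
import math
from itertools import combinations


def diameter(dist):
    """Largest PAM distance between two domains, from a symmetric distance matrix."""
    n = len(dist)
    return max((dist[i][j] for i, j in combinations(range(n), 2)), default=0.0)


def radius_of_gyration(to_consensus):
    """Root-mean-square PAM distance between the consensus and each family member."""
    return math.sqrt(sum(d * d for d in to_consensus) / len(to_consensus))


# Toy distances for a three-domain family.
dist = [[0, 10, 30],
        [10, 0, 25],
        [30, 25, 0]]
print(diameter(dist))                              # 30
print(round(radius_of_gyration([10, 20, 20]), 2))  # 17.32
```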

Gene Ontology Links

  • We tried to compute Gene Ontology annotations for as many ProDom families as possible.
  • The data displayed are:
    • The entry name
    • The ontology: F for Molecular Function, P for Biological Process, C for Cellular Component
    • The precision: from 0 to 1; the higher the value, the more precise the term is within the Gene Ontology graph.
    • The probability of assignment

InterPro Links

  • The links between InterPro and ProDom were computed using the MatchDom program (search for overlaps between InterPro and ProDom domains).
  • We scanned all InterPro domain families (release 18.0) with each ProDom family.

Pfam-A Links

  • Links between Pfam-A and ProDom were computed using the MatchDom program (search for overlaps between Pfam-A and ProDom domains).
  • We scanned all Pfam-A domains with each ProDom family.
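The MatchDom program itself is not described here. As a rough, hypothetical illustration of "searching for overlaps" between two domain collections (ProDom versus InterPro or Pfam-A), the Python sketch below counts, for each pair of families, the domains on the same protein that overlap by at least half of the shorter domain. The 0.5 threshold, the `Domain` layout and the toy identifiers are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Domain(NamedTuple):
    protein: str  # protein identifier
    family: str   # family AC (ProDom, Pfam-A, InterPro, ...)
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


def overlap_fraction(a, b):
    """Shared length divided by the length of the shorter domain (0.0 if disjoint)."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if a.protein != b.protein or shared <= 0:
        return 0.0
    return shared / min(a.end - a.start + 1, b.end - b.start + 1)


def family_links(prodom_domains, other_domains, threshold=0.5):
    """Count overlapping domain pairs for each (ProDom family, other family) pair."""
    by_protein = defaultdict(list)
    for dom in other_domains:
        by_protein[dom.protein].append(dom)
    links = defaultdict(int)
    for pd in prodom_domains:
        for other in by_protein.get(pd.protein, []):
            if overlap_fraction(pd, other) >= threshold:
                links[(pd.family, other.family)] += 1
    return dict(links)


# Toy data: made-up proteins and families, for illustration only.
prodom = [Domain("prot1", "PD_toy", 1, 100), Domain("prot2", "PD_toy", 5, 110)]
others = [Domain("prot1", "PF_toy", 20, 90), Domain("prot2", "PF_toy", 200, 260)]
print(family_links(prodom, others))  # {('PD_toy', 'PF_toy'): 1}
```

Indexing the second collection by protein avoids comparing every domain with every other one.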

PROSITE Links

  • Links between PROSITE and ProDom were computed using the LASSAP program.
  • We scanned ProDom consensus sequences with PROSITE patterns, excluding the most frequent patterns.
  • We also scanned ProDom consensus sequences with PROSITE profiles, using the pfsearch program.
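LASSAP and pfsearch are not reproduced here, but scanning a consensus sequence with a PROSITE pattern can be sketched by translating the pattern syntax into a Python regular expression. This translator is an illustration only: it covers the common elements (`x`, `[...]`, `{...}`, repeat counts `(n)` and `(n,m)`, and the `<` and `>` anchors) but not every corner of the syntax, such as `>` inside brackets, and it reports non-overlapping matches only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'C-x(2,4)-C-x(3)-[LIVMFYWC].' into a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            atom = "."                       # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            atom = core                      # single residue or [..] class: same syntax in regex
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + atom + suffix)
    return "".join(parts)


def scan(sequence: str, pattern: str):
    """1-based (start, end) positions of non-overlapping matches in a sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]


print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]."))   # C.{2,4}C.{3}[LIVMFYWC]
print(scan("AACGGCAAAL", "C-x(2,4)-C-x(3)-[LIVMFYWC]."))  # [(3, 10)]
```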

PDB Links

  • A FASTA file is generated from the current PDB release, using the ATOM lines (NOT SEQRES).
  • Links between the PDB and ProDom were then computed using the LASSAP program.
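Building such sequences from ATOM records rather than SEQRES can be sketched in Python using the fixed columns of the PDB format: residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26 and insertion code in column 27. This is a hypothetical illustration, not the script used for ProDom: it maps only the 20 standard residues (anything else becomes `X`) and ignores HETATM records, so modified residues such as selenomethionine are skipped. The `toy_atom` helper exists only to build example lines.

```python
# Standard residues only; anything else becomes X.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_sequences(pdb_lines):
    """One sequence per chain, built from ATOM records (not SEQRES)."""
    chains = {}
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue                                  # skip HETATM, SEQRES, headers, ...
        res_name = line[17:20].strip()                # columns 18-20
        chain = line[21]                              # column 22
        residue = (chain, line[22:26], line[26:27])   # resSeq (23-26) + insertion code (27)
        if residue in seen:
            continue                                  # one letter per residue, not per atom
        seen.add(residue)
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {chain: "".join(seq) for chain, seq in chains.items()}


def to_fasta(entry_id, sequences):
    """FASTA text with one record per chain, named <entry_id>_<chain>."""
    return "".join(f">{entry_id}_{chain}\n{seq}\n" for chain, seq in sequences.items())


def toy_atom(serial, atom, res, chain, res_seq):
    """Hypothetical helper: a minimal fixed-column ATOM line for the example."""
    return (f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


lines = [toy_atom(1, "N", "MET", "A", 1), toy_atom(2, "CA", "MET", "A", 1),
         toy_atom(3, "N", "LYS", "A", 2), toy_atom(4, "N", "GLY", "B", 1)]
print(to_fasta("toy", atom_sequences(lines)), end="")
```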
 

The ProDom entry (lower frame)

What is the use of those tools?

If you press this button, the family, or the current subfamily, is represented as a tree using the displayFam program.
Develop the whole family, or only a subfamily (a cluster of domains).
Please note that you can adjust the parameters of the clustering process with the form found below the alignment.
Lost inside a very deep subfamily? Press this button to return to the default start display.

The ProDom-CG evolutionary scenario (upper frame, ProDom-CG only)

A most probable evolutionary scenario is proposed for every non-unique protein domain family. It is computed using a Bayesian network algorithm and displayed superposed on the species tree. Nodes are colored red when the protein domain is present.
 

Requests producing a list of proteins

What is the use of those tools?

The simplified output display mode (default) shows one line per protein architecture, not one line per protein. Click this button to switch to the complete output mode, which shows one line per protein.
When in complete output display, press this button to switch to the simplified output display.
When in simplified output display, press one of these buttons (drawn next to some architecture representations) to open the "same architecture" window.
Press this button to display the list of proteins related to the protein displayed; "related" here means that they share at least one ProDom domain.
FIXJ_AZOCA The name of the protein is a link; follow it to go to the corresponding entry on the UniProtKB server.
Protein or family lists may be very long. No more than 200 items are displayed at once. Press this button to display the first 200 items of the list.
Shift the displayed list by 200. There is no overlap between the current and the new displays. Convenient for rapidly scanning the retrieved proteins or families.
Shift the displayed list by 50. There is a large overlap between the current and the new displays. Convenient for carefully comparing lines.
|New Window| Open a new window identical to the current window. Combine this with the shift buttons to compare objects far apart in the list.
|Close| Close this window.
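The 200/50 shift buttons amount to moving a fixed-size window over the list. A minimal Python sketch, assuming (this is not documented) that the window is clamped so that it never starts before the first item or runs past the last full page; the function names are hypothetical.

```python
def page(items, start, size=200):
    """The items currently displayed (at most `size` of them)."""
    return items[start:start + size]


def shift(start, step, total, size=200):
    """Move the window by `step` (+/-200: no overlap, +/-50: 150 items in common).

    Assumption: the window start is clamped to [0, total - size].
    """
    last_start = max(total - size, 0)
    return min(max(start + step, 0), last_start)


items = list(range(1000))                   # stand-in for 1000 retrieved proteins
start = shift(0, 200, len(items))           # next page: no overlap with items 0-199
close_look = shift(start, 50, len(items))   # small step: overlaps the previous display
print(len(set(page(items, start)) & set(page(items, close_look))))  # 150
```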
 

The same architecture screen

This screen presents, for each architecture shared by several proteins, the list of proteins sharing this architecture.
The list of protein IDs is in fact a list of links: you may click any name to go to the ExPASy server.
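Both the simplified display (one line per architecture) and this "same architecture" screen group proteins by the ordered list of ProDom domains along their sequence, while "related" proteins share at least one ProDom family. A small Python sketch, with a hypothetical data layout (protein ID mapped to a list of `(start, family AC)` pairs) and toy identifiers:

```python
from collections import defaultdict


def architecture(domains):
    """Ordered tuple of family ACs along the sequence; domains are (start, family_ac) pairs."""
    return tuple(ac for _, ac in sorted(domains))


def same_architecture(proteins):
    """Architectures shared by several proteins -> sorted list of those proteins."""
    groups = defaultdict(list)
    for protein_id, domains in proteins.items():
        groups[architecture(domains)].append(protein_id)
    return {arch: sorted(ids) for arch, ids in groups.items() if len(ids) > 1}


def related(protein_id, proteins):
    """Proteins sharing at least one ProDom family with protein_id."""
    families = {ac for _, ac in proteins[protein_id]}
    return sorted(other for other, domains in proteins.items()
                  if other != protein_id and families & {ac for _, ac in domains})


# Toy data with made-up protein and family identifiers.
proteins = {
    "protA": [(1, "PD_1"), (120, "PD_2")],
    "protB": [(130, "PD_2"), (5, "PD_1")],
    "protC": [(1, "PD_2")],
    "protD": [(10, "PD_9")],
}
print(same_architecture(proteins))  # {('PD_1', 'PD_2'): ['protA', 'protB']}
print(related("protC", proteins))   # ['protA', 'protB']
```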
 

Requests producing a list of families

What is the use of those tools?

Press this button to display the list of proteins containing a domain of this family.
Press this button to display the ProDom entry of this family.
The logo of this family, if any.
PD001842 The accession number of this family.
64 The size (number of domains) of this family.
The "norMD quality" logo, if the norMD of this family is above 0.4.
Please note that the logo is never printed for families with fewer than 3 domains, as computing the norMD makes no sense in this case.
Protein or family lists may be very long. No more than 200 items are displayed at once. Press this button to display the first 200 items of the list.
Shift the displayed list by 200. There is no overlap between the current and the new displays. Convenient for rapidly scanning the retrieved proteins or families.
Shift the displayed list by 50. There is a large overlap between the current and the new displays. Convenient for carefully comparing lines.
|New Window| Open a new window identical to the current window. Combine this with the shift buttons to compare objects far apart in the list.
|Close| Close this window.
 

Known bugs

Something wrong? Not clear? Found a bug? Please send us an email!
  Due to some problems in the ProDom build process, some families or proteins may be missing from this release. We apologize for any inconvenience.
The biggest families in ProDom are not correctly printed in ESPript.
The biggest families in ProDom are not correctly delivered in MSF format.
You should retrieve only binary profile files, as text profile files are of little use. However, binary files will not work on every computer architecture.
The norMD value could not be computed for every family. If no value is displayed, this value could not be computed.
  Some very looooooooooooooooooooong proteins (see Q8WZ42_HUMAN for instance) are not correctly displayed by any browser
  This graphical user interface was tested with the following browsers:
  • Netscape 4.7x
  • Mozilla 1.x and others (Netscape 7.x, Mozilla Firefox)
  • Opera 6.x
  • Internet Explorer 5.5
Please note that your browser must have JavaScript and CSS enabled and must be able to display PNG graphics. Older versions of these browsers are therefore not supported.

© The ProDom database is copyrighted by INRA and CNRS
© UniProtKB copyright (c) 2002-2011 UniProt Consortium
ProDom - Server maintained by Dominique Guyot, on behalf of the ProDom team
Graphics design: Sandrine Dalmar
Last updated on December 22nd, 2011.