Description and Usage of Proteomic Tools Available Online

Note: The documentation and references for the algorithms and methods used are usually listed on the webpage

Useful information for using the tools listed

Comparison of BLAST and Smith/Waterman - compares the BLAST and Smith/Waterman similarity search algorithms and when to use each

E-Value - explanation of scores used in Pfam and other databases, and how it applies to standard or fragment searches

NScores - explanation of normalized match scores

SWISS-PROT list of species - for when you need to enter the abbreviation of the species as a search field

SWISS-PROT list of keywords - for when you need to find the keyword that best fits your search

Protein identification and characterization

AACompIdent - a tool which allows identification of a protein from its amino acid composition

AACompSim - a tool which allows the comparison of the amino acid composition of a SWISS-PROT entry with all other SWISS-PROT entries to find the proteins whose amino acid compositions are closest

CombSearch - an experimental unified interface to query several protein identification tools accessible on the web

FindMod - predict potential protein post-translational modifications and potential single amino acid substitutions in peptides

FindPept - identify peptides that result from unspecific cleavage of proteins from their experimental masses

GlycoMod - predict possible oligosaccharide structures that occur on proteins from their experimentally determined masses

GlycanMass - calculates the mass of an oligosaccharide structure

MultiIdent - a tool that allows the identification of proteins using pI, MW, amino acid composition, sequence tag and peptide mass fingerprinting data. One or more species and a SWISS-PROT keyword can also be specified for the search.

PepMAPPER, Mascot, PepSea, PeptideSearch - various peptide mass fingerprinting tools from UMIST, UK; Matrix Science Ltd., London; Protana, Denmark; and EMBL, Heidelberg respectively

PeptIdent - identify proteins with peptide mass fingerprinting data, experimentally measured pI and Mw

PeptideMass - calculate masses of peptides and their post-translational modifications for a SWISS-PROT entry or for a user sequence

TagIdent - identify proteins with pI, Mw and sequence tag, or generate a list of proteins close to a given pI and Mw

DNA to Protein

Backtranslation - translates a protein sequence back to a nucleotide sequence

FSED - frameshift error detection

Genewise - compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors

Protein Machine, MBS translator - various other nucleotide to protein translators from EBI, and MBS respectively

Translate - translates a nucleotide sequence to a protein sequence

Similarity searches

BLAST and WU-BLAST - interfaces to various versions of the Basic Local Alignment Search Tool

Bic ultra-fast rigorous (Smith/Waterman) - similarity searches using the Bioccelerator [At EBI]

Fasta3 - similarity search using FASTA version 3 at the EBI

FDF - Smith/Waterman type searches on Paracel's Fast Data Finder (FDF) at EMBnet-CH

PropSearch - searches for structural homologs using a 'properties' approach

SAMBA - Systolic Accelerator for Molecular Biological Applications (S/W search of Swiss-Prot)

SAWTED - Structure Assignment With Text Description

Scanps - similarity searches using Barton's algorithm

Pattern and Profile Searches

InterPro Scan - integrated profile search (functional sites and domains) in PROSITE, Pfam, PRINTS, SMART, ProDom, etc.

FPAT - regular expression searches in protein databases

Frame-ProfileScan - scans a short DNA sequence against protein profile databases (including PROSITE)

Hits - protein sequences and motifs site with querying tools to search the Hits database for domains and motifs

Pfam HMM Search - scans a sequence against the Pfam protein families database

PATTINPROT - scans a protein sequence or a protein database for one or several pattern(s) at PBIL

PPSearch - scans a sequence against PROSITE (allows a graphical output) at EBI

PRATT - interactively generates conserved patterns from a series of unaligned proteins

ProfileScan - scans a sequence against protein profile databases (including PROSITE)

PROSITE Scan - scans a sequence against PROSITE (allows mismatches) at PBIL

ScanProsite - scans a sequence against PROSITE or a pattern against SWISS-PROT and TrEMBL

SMART - Simple Modular Architecture Research Tool at EMBL, domain and architecture analysis

TEIRESIAS - generate patterns from a collection of unaligned protein or DNA sequences at IBM

Post-translational modification prediction

big-PI Predictor - GPI Modification Site Prediction

ChloroP - prediction of chloroplast transit peptides

MITOPROT - prediction of mitochondrial targeting sequences

NetOGlyc - prediction of mucin type O-glycosylation sites in mammalian proteins

NetPhos - prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins

NetPicoRNA - prediction of protease cleavage sites in picornaviral proteins

Predotar - prediction of mitochondrial and plastid targeting sequences

PSORT - prediction of protein sorting signals and localization sites

SignalP - prediction of signal peptide cleavage sites

Primary Structure Analysis

Colorseq - tool to highlight (in red) a selected set of residues in a protein sequence

Coils - prediction of coiled coil regions in proteins (Lupas's method) at EMBnet-CH [Also available at PBIL]

Compute pI/Mw - compute the theoretical pI and Mw from a SWISS-PROT or TrEMBL entry or for a user sequence

drawhca - draw an HCA (Hydrophobic Cluster Analysis) plot of a protein sequence

HelixWheel / HelixDraw - representations of a protein fragment as a helical wheel

HLA_Bind - prediction of MHC type I (HLA) peptide binding

Multicoil - prediction of two- and three-stranded coiled coils

Paircoil - prediction of coiled coil regions in proteins (Berger's method)

PEST - identification of PEST regions, which are found in proteins with intracellular half-lives of less than two hours

ProtParam - physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, pI, extinction coefficient, etc.)

ProtScale - amino acid scale representation (Hydrophobicity, other conformational parameters, etc.)

RandSeq - random protein sequence generator

REP - searches a protein sequence for repeats

Secondary Structure Prediction

GOR IV (Garnier et al, 1996)

HNN - Hierarchical Neural Network method (Guermeur, 1997)

Jpred - a consensus method for protein secondary structure prediction at EBI

nnPredict - University of California at San Francisco (UCSF)

Predator - protein secondary structure prediction from single or multiple sequences at EMBL (Argos' group)

Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction

PSA - BioMolecular Engineering Research Center (BMERC) / Boston

PSIpred - various protein structure prediction methods at Brunel University

SOPMA (Geourjon and Deléage, 1995)

Tertiary Structure

3D-PSSM - protein fold recognition using 1D and 3D sequence profiles coupled with secondary structure information (Foldfit)

CPHmodels - automated neural-network based protein modelling server

Geno3d - automatic modelling of protein three-dimensional structure

SWEET - constructing 3D models of saccharides from their sequences

SWISS-MODEL - an automated knowledge-based protein modelling server

Swiss-PdbViewer - a program to display, analyse and superimpose protein 3D structures, works together with SWISS-MODEL

Transmembrane regions detection

DAS - prediction of transmembrane regions in prokaryotes using the Dense Alignment Surface method (Stockholm University)

HMMTOP - prediction of transmembrane helices and topology of proteins (Hungarian Academy of Sciences)

PredictProtein - prediction of transmembrane helix location and topology (Columbia University)

TMAP - transmembrane detection based on multiple sequence alignment (Karolinska Institut; Sweden)

TMHMM - prediction of transmembrane helices in proteins (CBS; Denmark)

TMpred - prediction of transmembrane regions and protein orientation (EMBnet-CH)

TopPred2 - topology prediction of membrane proteins (Stockholm University)

Sequence alignment

Binary

LALIGN - finds multiple matching subsegments in two sequences

SIM + LALNVIEW - alignment of two protein sequences with SIM, results can be viewed with LALNVIEW

Multiple

ALIGN - multiple sequence alignment at Genestream (IGH)

AMAS - Analyse Multiply Aligned Sequences

Bork's alignment tools - various tools to enhance the results of multiple alignments (including consensus building).

CINEMA - Color Interactive Editor for Multiple Alignments

CLUSTALW - multiple sequence alignment [at EBI, PBIL, EMBnet-CH or at MBS (MBSALIGNER)]

DIALIGN - multiple sequence alignment based on segment-to-segment comparison, at University of Bielefeld, Germany

ESPript - tool to print a multiple alignment

Match-Box - multiple sequence alignment at University of Namur, Belgium

MSA - multiple sequence alignment at Washington University

Multalin - multiple sequence alignment [At INRA or at PBIL]

MUSCA - multiple sequence alignment using pattern discovery, at IBM

plogo - sequence logos at CBS/Denmark

T-Coffee - multiple sequence alignment [At EMBnet Switzerland or at GPCR]

WebLogo - sequence logos at Cambridge/UK

Other

Boehringer Mannheim "Biochemical Pathways" - digitized version of the Boehringer Mannheim "Biochemical Pathways"

AACompIdent
  • searches the SWISS-PROT database for the proteins that are closest in amino acid composition to the query
  • enter protein sequence, protein identifier, the pI and Mw of that protein, if known, error ranges, and the species or group of species for which you would like to perform the search
  • the keyword for which you would like to perform the search (ex. ZINC-FINGER) producing a list of proteins matching this keyword (or ALL may be specified)
  • calibration protein, a known protein obtained in the same run as the amino acid composition of the unknown protein; if you do not have a calibration protein, leave NULL
  • the SWISS-PROT identifier (ID) of the calibration protein (example: ALBU_HUMAN)
  • e-mail address, the search results will be mailed back to you (this should take about 15 minutes).
AACompSim
  • the comparison of the amino acid composition of a SWISS-PROT entry with all other SWISS-PROT entries so as to find the proteins whose amino acid compositions are closest to that of the selected entry
  • enter the SWISS-PROT identifier, your email address, and the SWISS-PROT abbreviation for the species for which you would like to perform your search
  • use the amino acid constellations provided to start your search
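
The idea of ranking database entries by closeness of amino acid composition can be sketched as follows. This is a minimal illustration, not the scoring actually used by AACompSim or AACompIdent: the sum-of-squared-differences score, the function names and the toy database are all assumptions made for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the percentage of each of the 20 standard amino acids in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: 100.0 * counts[a] / total for a in AMINO_ACIDS}

def composition_distance(seq_a, seq_b):
    """Assumed score: sum of squared differences between percentage compositions."""
    comp_a, comp_b = composition(seq_a), composition(seq_b)
    return sum((comp_a[a] - comp_b[a]) ** 2 for a in AMINO_ACIDS)

def rank_by_composition(query, database):
    """Sort database entries (name -> sequence) from closest to farthest."""
    scored = [(composition_distance(query, seq), name)
              for name, seq in database.items()]
    return [name for _, name in sorted(scored)]

# Hypothetical toy database; the names and sequences are made up.
toy_db = {
    "PROT_A": "MKKLLAAGGKKLLAAGG",
    "PROT_B": "MDEDEDESSTTPPEEDD",
    "PROT_C": "MKKLLAAGGKKLLAAGS",
}
print(rank_by_composition("MKKLLAAGGKKLLAAGG", toy_db))
```

Because only percentages are compared, two sequences with the same residues in a different order score as identical, which is why such searches are usually narrowed further by pI, Mw, species or keyword.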
MultiIdent
  • the identification of proteins using pI, MW, amino acid composition, sequence tag and peptide mass fingerprinting data
  • one or more species and a SWISS-PROT keyword can also be specified for the search
  • enter the protein sequence, name for this protein, the pI and Mw of that protein, if known, and error ranges
  • enter the species or group of species for which you would like to perform the search producing a list of proteins from this species, as well as a list of proteins independently of species. You may also just specify ALL
  • keyword for which you would like to perform the search (ex. ZINC-FINGER) producing a list of proteins matching this keyword. You may also just specify ALL.
  • calibration protein, a known protein obtained in the same run as the amino acid composition of the unknown protein; if you do not have a calibration protein, leave NULL
  • the SWISS-PROT identifier (ID) of the calibration protein (example: ALBU_HUMAN)
  • set of experimentally determined peptide masses corresponding to the unknown protein.
  • e-mail address, the search results will be mailed back to you (this should take about 15 minutes).
PeptIdent
  • the identification of proteins using pI, Mw and peptide mass fingerprinting data
  • user-specified peptide masses are compared with the theoretical peptides calculated for all proteins in SWISS-PROT making extensive use of database annotations
  • when calculating the theoretical peptides, signal sequences and/or propeptides are removed before computing pI, Mw and peptide masses for each of the resulting chains
  • takes into account post-translational modifications and alternative splicing events
  • results are displayed on-line in your browser window or can be sent by email in the form of an HTML table (email is recommended for queries with many peptide masses, large pI/Mw windows or all species)
  • the result file contains direct links to FindMod, GlycoMod and FindPept, which further characterize matching proteins by predicting potential post-translational modifications and finding potential single amino acid substitutions or non-specific cleavage, and to PeptideMass; it also links to the BioGraph tool, which represents the results of the PeptIdent query graphically
  • enter the pI, molecular weight, species, and peptide masses
TagIdent
  • generates a list of proteins close to a given pI and Mw
  • identifies proteins by matching a short sequence tag of up to 6 amino acids against proteins in the SWISS-PROT databases close to a given pI and Mw
  • the identification of proteins by their mass, if this mass has been determined by mass spectrometric techniques
  • enter the pI, molecular weight, species or group of species, and/or a keyword to restrict your search
FindMod
  • predicts potential protein post-translational modifications (PTM) and finds potential single amino acid substitutions in peptides
  • experimentally measured peptide masses are compared with the theoretical peptides calculated from a specified SWISS-PROT entry or from a user-entered sequence
  • mass differences are used to better characterize the protein of interest
  • enter the protein sequence or SWISS-PROT ID, as well as peptide masses

GlycoMod
  • predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses
  • compares the mass of the glycan to a list of pre-computed masses of glycan compositions
  • can be used for free or derivatized oligosaccharides and for glycopeptides
  • fill out the fields on the page to run a query; each section has a link explaining what its fields expect
GlycanMass
  • calculates the mass of an oligosaccharide structure
  • specify monosaccharide composition
FindPept
  • identifies peptides that result from unspecific cleavage of proteins from their experimental masses
  • takes into account artefactual chemical modifications, post-translational modifications (PTM) and protease autolytic cleavage
  • if you wish to take into account only specific cleavage, please use FindMod instead
  • experimentally measured peptide masses are compared with the theoretical peptides calculated from a specified SWISS-PROT entry or from a user-entered sequence
  • if autolysis is to be taken into account, an enzyme entry must be specified from the drop-down list of enzymes for which the sequence is known
  • enter protein sequence and the peptide masses of the protein as well as any other post-translational modifications
PeptideMass
  • cleaves one or more protein sequences from the SWISS-PROT or a user-entered protein sequence with a chosen enzyme, and computes the masses of the generated peptides
  • returns theoretical isoelectric point and mass values for the proteins of interest
  • can return the mass of peptides known to carry posttranslational modifications, and can highlight peptides whose masses may be affected by database conflicts, isoforms or splicing variants.
  • enter protein sequence or a SWISS-PROT protein ID, and an enzyme for cleavage
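
The cleave-then-weigh step described for PeptideMass can be sketched as below. This is a simplified illustration under stated assumptions, not the tool's implementation: it assumes trypsin as the enzyme with the common rule "cleave after K or R unless the next residue is P", ignores missed cleavages and modifications, and reports neutral monoisotopic masses. The function names and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def trypsin_digest(seq):
    """Split after K or R, except when the next residue is P (no missed cleavages).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq.upper()) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[r] for r in peptide) + WATER

# Illustrative sequence only; any protein sequence in one-letter code works.
for pep in trypsin_digest("MKWVTFISLLFLFSSAYSRGVFRR"):
    print(f"{pep:20s} {peptide_mass(pep):10.4f}")
```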
CombSearch
  • queries several protein identification tools available on the net
  • includes peptide mass fingerprinting, amino acid composition, and tagging
  • fill out the fields that apply to your search
Translate
  • translates nucleotide to protein sequence
  • enter the DNA/RNA sequence
  • you can adapt the genetic code if your organism codes it differently
  • gives a verbose or compact option for output
Backtranslation
  • translates protein sequences to nucleotide sequences
  • allow the applet to load
  • enter the protein sequence
  • you can select, import, or edit the codon usage tables (CUT)
  • the output tab contains the backtranslated sequence
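
Translation and backtranslation are inverse lookups in a genetic code table, which the following sketch makes concrete. It is an illustration only and not the code behind either tool: the function names are hypothetical, and the backtranslation simply picks one arbitrary codon per amino acid where a real codon usage table would pick the preferred codon for the organism.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna, code=STANDARD_CODE):
    """Translate one reading frame; unknown codons become 'X', stops become '*'."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frames(dna, code=STANDARD_CODE):
    """Translate the three forward and three reverse-complement frames."""
    dna = dna.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:], code)
        frames[f"-{offset + 1}"] = translate(rc[offset:], code)
    return frames

def backtranslate(protein, preferred=None):
    """Map each amino acid to one codon; 'preferred' stands in for a codon usage table."""
    if preferred is None:
        preferred = {}
        for codon, aa in STANDARD_CODE.items():
            preferred.setdefault(aa, codon)  # arbitrary choice: first codon listed
    return "".join(preferred[aa] for aa in protein.upper())

# Adapting the genetic code: the vertebrate mitochondrial code differs in four codons.
VERTEBRATE_MITO = {**STANDARD_CODE, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

print(six_frames("ATGGCCTGATAA"))
print(translate("ATGGCCTGATAA", VERTEBRATE_MITO))
```

Backtranslation is one-to-many, so translating a backtranslated sequence recovers the protein, but backtranslating a translated gene does not in general recover the original DNA.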
Genewise
  • compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors
  • for DNA sequences greater than 6 kb you have to use the email return; sequences shorter than 6 kb can be processed interactively
  • DNA sequences greater than 80 kb cannot be processed, apparently because of a problem somewhere between the browser and the web server
  • enter the DNA and protein sequence
  • there are options for the alignment output and gene prediction output (gene structure, translation, cDNA...)
  • the advanced version of this form gives more options for gene prediction output, and allows you to specify organism, intronic bias, splice site, null (random) model, and the algorithm
  • the advanced form and the Genewise algorithm documentation are both linked from the page
FSED
  • frame shift error detection
  • suited to the analysis of newly determined sequences before their submission to the databases, the potential frameshift errors being readily resolved by examination of raw data such as gel readings
  • you need to enter the sequence, ORF description (optional), parameters of the output, and the factor file

BLAST

  • blastp - compares an amino acid query sequence against a protein sequence database
  • blastn - compares a nucleotide query sequence against a nucleotide sequence database
  • blastx - compares a nucleotide query sequence translated in all reading frames against a protein sequence database
  • tblastn - compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
  • tblastx - compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that tblastx is extremely slow and cpu-intensive
  • enter the protein/DNA sequence or ID or accession number depending on the type of search, amino acid or nucleotide
  • options on the type of comparison matrix, and gap extensions and openings
  • each of the listed sites links to a different interface or version, but all use BLAST to search
Bic_SW
  • enter DNA or Protein sequence
  • enter gap penalties, the comparison matrix, the number of alignments, whether the sequence is DNA or protein, the normalization method if used, and output format and filtering
  • you also select the databases it searches through
  • descriptions of each parameter are linked from the page
Fasta3
  • fasta3 scans a protein or DNA sequence library for similar sequences
  • fastx/y3 compares a DNA sequence to a protein sequence database, comparing the translated DNA sequence in forward and reverse frames
  • tfastx/y3 compares a protein to a translated DNA data bank
  • fasts3 compares linked peptides to a protein databank
  • fastf3 compares mixed peptides to a protein databank
  • enter your DNA or Protein sequence, the gap penalties, the comparison matrix, the number of alignments, the databases it searches through
  • you can decide to receive the output by email or interactively
  • there is the option to display a histogram of search matches
  • you can decide which strand of DNA to search
  • other options are output formatting and filtering
  • descriptions of each parameter are linked from the page
FDF
  • swps - used to check for the presence in the database of protein sequences related to a protein query sequence
  • swx - used to check for the presence of a possible protein sequence encoded in an unknown or low-quality DNA query sequence
  • tswn - used to check for the presence of a DNA sequence or clone matching with a protein query sequence
  • specify protein sequence, protein database, comparison matrix, cutoff score, number of summaries, and number of alignments as well as output format

PropSearch

At EMBL
At Montpellier

  • searches for structural homologs using a 'properties' approach
  • finds the putative protein family if querying a new sequence has failed using alignment methods.
  • neglects the order of amino acid residues in a sequence, using the amino acid composition instead
  • molecular weight, content of bulky residues, content of small residues, average hydrophobicity, average charge and so on, and the content of selected dipeptide groups are calculated from the sequence
  • 144 such properties are weighted individually and used as the query vector; the weights have been trained on a set of protein families with known structures, using a genetic algorithm
  • sequences in the database are transformed into vectors as well, and the Euclidean distance between the query and each database sequence is calculated
  • distances are rank ordered, and sequences with lowest distance are reported on top
  • enter your sequence
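
The property-vector search described above can be illustrated with a much smaller model. The sketch below uses three properties instead of 144, and the residue groupings, the weights and the database entries are all assumptions chosen for the example; PropSearch's actual properties and trained weights are not reproduced here.

```python
import math

SMALL = set("AGS")      # assumed definition of "small" residues
BULKY = set("FHRWY")    # assumed definition of "bulky" residues
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def property_vector(seq):
    """Order-independent properties computed from composition alone."""
    n = len(seq)
    return [
        sum(r in SMALL for r in seq) / n,          # content of small residues
        sum(r in BULKY for r in seq) / n,          # content of bulky residues
        sum(CHARGE.get(r, 0) for r in seq) / n,    # average charge
    ]

def weighted_distance(u, v, weights):
    """Euclidean distance with one weight per property."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))

def search(query, database, weights):
    """Return (distance, name) pairs, lowest distance first."""
    q = property_vector(query)
    hits = [(weighted_distance(q, property_vector(seq), weights), name)
            for name, seq in database.items()]
    return sorted(hits)

# Hypothetical weights and database entries, for illustration only.
weights = [1.0, 2.0, 0.5]
toy_db = {"SHUFFLED": "FEDKA", "OTHER": "WWWWW"}
for distance, name in search("AKDEF", toy_db, weights):
    print(f"{name:10s} {distance:.3f}")
```

The entry named SHUFFLED is a permutation of the query and comes out at distance zero, which shows concretely what neglecting the order of residues means.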
SAMBA
  • SAMBA is a 128 processor array for speeding up the comparison of biological sequences
  • the hardware implements a parameterized version of the Smith and Waterman algorithm allowing the computation of local or global alignments with or without gap penalty
  • enter the search title, protein sequence, comparison matrix and gap options
  • searches only the SWISS-PROT database
SAWTED
  • a method to improve the coverage of the detection of remote homologues of known structure by sequence searches and fold recognition programs
  • returns only hits with scores worse than an accepted threshold for reliability
  • compares what is known about the function of the query sequence with what is known about the poor scoring hits, comparing the text of SWISS-PROT annotations related to the query and to the poor scoring hits
  • a single E-value is given for the user to assess the similarity of function
  • click on the "submit-sequence" link up top
  • you have to enter your email address, search title, give a description on the function of your protein, keywords for your protein, amino acid sequence, and number of iterations
Scanps
  • implements various flavours of dynamic programming algorithm such as the Smith-Waterman local alignment method
  • you need to enter the protein sequence, your email, search title, whether to get results via email or interactively, number of iterations, database to search, comparison matrix, cutoff value (EValue), gap options, alignment options, and formatting of output
  • There are 2 modes available in this service for scanps
  • Simple - Scan protein sequence against protein sequence database with simple gap penalty. Default and fastest method.
  • Affine - Same as Simple, but with Affine gaps - i.e. penalties for opening and extending the gap.
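
For readers unfamiliar with the underlying method, here is a compact sketch of Smith-Waterman local alignment with a simple (linear) gap penalty, corresponding to the "Simple" mode in spirit. It is not the scanps implementation: a match/mismatch score stands in for a real comparison matrix, and the default scores are arbitrary assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("XXHEAGXX", "YYHEAGYY"))
```

An affine variant would charge a larger penalty for opening a gap than for extending it, which is usually done by tracking separate matrices for the gap states; that extension is left out to keep the sketch short.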
InterPro Scan
  • queries your protein sequence against common signature databases
  • enter your sequence, either in the text box or as a file
  • enter an email address, whether you want the results interactively or through email, and whether you want the Smith & Waterman search included in your run
ScanProsite
  • scan a protein sequence (either from SWISS-PROT or TrEMBL or provided by the user) for the occurrence of patterns stored in the PROSITE database
  • scan the SWISS-PROT and TrEMBL databases (including weekly releases of SWISS-PROT) for the occurrence of a pattern that can originate from PROSITE or be provided by the user
  • choose whether to search for a protein or pattern and click on the appropriate link
  • enter the sequence or AC (accession number) for scanning a protein
  • enter the AC or entry name or your own pattern for scanning a pattern, there is a link on the page for the syntax for patterns
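
To show what scanning a sequence with a PROSITE-style pattern involves, the sketch below converts a pattern into a Python regular expression and reports all matches, including overlapping ones. It is a simplified illustration, not ScanProsite itself: it covers the basic elements (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors at the ends of a pattern) and ignores rarer constructs. The function names and the test sequence are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as N-{P}-[ST]-{P} to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")   # pattern anchored at the N-terminus
        at_end = element.endswith(">")       # pattern anchored at the C-terminus
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        # [ABC] alternatives and single letters are already valid regex syntax
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(parts)

def scan(sequence, pattern):
    """Return (1-based start position, matched text) for every match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# N-{P}-[ST]-{P} is the N-glycosylation site pattern; the sequence is made up.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(scan("MNGSANPSA", "N-{P}-[ST]-{P}"))
```

The lookahead wrapper in `scan` is what allows overlapping occurrences to be reported, since an ordinary regex search would resume only after the end of the previous match.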
ProfileScan
  • uses the pfscan program to search a single protein sequence against currently available profile databases
  • searchable databases contain generalized profiles and allow the computation of normalized match score (NScores)
  • enter your sequence, choose the databases to search, sensitivity to matches, and how the results will be sorted
Frame-ProfileScan
  • uses the frame-search capabilities of pfscan to query the collection of prosite profiles with a single DNA sequence
  • The six reading frames of the DNA query are inspected, coding frameshifts in the DNA sequence are supported
  • since frame-tolerant searches consume lots of cpu-time, DNA sequence length is limited to about 2400 bases
  • enter your sequence, choose the databases to search, sensitivity to matches, and how the results will be sorted

Pfam HMM Search

At Washington University (currently down)
At Sanger Centre

  • enter your sequence or AC (accession number)
  • select the sensitivity of your search
  • choose the type of search; your choice depends on whether the domain is complete or partial
  • you can choose to include the SMART and TIGR databases
  • choose the type of output and the priority for the graphical display
Pratt
  • allows the user to search for patterns conserved in a set of protein sequences
  • user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported
  • enter your sequence
  • options for pattern conservation, restrictions, number of pattern symbols, flexible spacers, etc.
  • options for format of output
SMART
  • allows the identification and annotation of genetically mobile domains and the analysis of domain architectures
  • domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues
  • for sequence analysis: enter your sequence, ID or AC (accession number), you can also find outlier homologues and homologues of known structure, PFAM domains, signal peptides, and internal repeats
  • for architecture analysis: search for proteins with combinations of specific domains in different species or taxonomic ranges
  • you can also search for domains in the database using a keyword search
TEIRESIAS
  • sequence pattern analysis
  • enter your sequence
  • the "options", "parameters" and "equivalency sets" sections each link to a help file
  • the help files pertain to all the programs on the site, so look for the options that are available to this program
Hits
  • a database devoted to protein domains, also a collection of tools for the investigation of the relationships between protein sequences and motifs described on them
  • motifs are defined by a heterogeneous collection of predictors, which currently include regular expressions, generalized profiles and hidden Markov models
  • tools for querying and exploring the Hits database:
    • Query by protein produces a list of motifs present in one or several proteins
    • Query by motif produces a list of proteins that contain one or several motifs
    • "At least" query is another query by motif form that produces a list of proteins that share a minimal number of motifs
    • Pattern search using a user-supplied regular expression to search protein databases
    • Metamotif search looking for arrangements of motifs in protein databases
    • Blastp search to detect local similarity of sequences in protein databases
    • Relationships between motifs provides basic information about co-occurrence of motifs
    • Clusters of Identical Proteins deals with database redundancy as found in trEST and trGEN
    • Easy metamotif is meant to teach, step by step, how to compose a metamotif expression
    • To further explore query results, two platforms are provided. One can start from there as well
      • The protein hub
      • The motif hub.
    • To mine your own protein sequences, the SIB-Lausanne offers the following services
      • Motif scan in a protein sequence
      • The EMBnet server
      • The ISREC server
PSORT
  • prediction of protein localization sites in cells by applying the stored rules for various sequence features of known protein sorting signals
  • PSORT - for bacterial and plant sequences
  • PSORT II - for animal and yeast sequences
  • iPSORT - for detection of N-terminal sorting signals
  • choose the appropriate psort link
  • enter your sequence or AC (accession) number
  • choose the type of organism the sequence comes from
SignalP
  • predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms
  • the form is at the bottom of the page
  • submit a sequence name and the sequence (submit only the N-terminal part of your protein, not more than 50-70 amino acids)
  • you can specify whether to use networks trained on sequences from gram-negative prokaryotes, gram-positive prokaryotes, eukaryotes, or all three
ChloroP
  • predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites
  • input one or several sequences or submit a file
  • at most 50 sequences and 200,000 amino acids per submission; each sequence not more than 4,000 amino acids
MITOPROT
  • calculates the N-terminal protein region that can support a Mitochondrial Targeting Sequence and the cleavage site
  • enter the sequence
Predotar
  • prediction by identifying putative mitochondrial and plastid targeting sequences
  • enter the sequence
NetOGlyc
  • neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins
  • input one or several sequences or submit a file
  • at most 50 sequences and 70,000 amino acids per submission; each sequence not more than 4,000 amino acids
big-PI Predictor
  • GPI modification site predictor
  • select either metazoa or protozoa as the taxon; if neither applies, try both in independent runs
  • to be GPI-lipid anchor modified, a protein has to enter the endoplasmic reticulum in eukaryotes; please verify that this condition is fulfilled in the biological context of your query protein. Typically, the existence of a signal peptide leader is sufficient
  • enter your sequence
NetPhos
  • neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins
  • enter your sequence
  • choose whether to predict on tyrosine, serine, threonine or all
  • at most 50 sequences and 200,000 amino acids per submission; each sequence not more than 4,000 amino acids
NetPicoRNA
  • neural network predictions of cleavage sites of picornaviral proteases
  • enter your sequence, with a minimum length of 9 or 15 residues depending on prediction type
  • types of prediction: (default is 2A and 3C-ER options)
    • prediction of 2Apro sites
    • prediction of 3Cpro sites (entero+rhino)
    • prediction of 3Cpro sites (aphtho)
    • prediction of autocatalytic site
ProtParam
  • a tool which allows the computation of various physical and chemical parameters for a given protein stored in SWISS-PROT or TrEMBL or for a user entered sequence
  • the computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY)
  • enter the accession number (AC) or sequence ID or your own sequence
Compute pI/Mw
  • tool which allows the computation of the theoretical pI (isoelectric point) and Mw (molecular weight) for a list of SWISS-PROT and/or TrEMBL entries or for a user entered sequence
  • enter one or more SWISS-PROT protein identifiers (ID) or SWISS-PROT/TrEMBL accession numbers (AC)
  • alternatively, enter a protein sequence in single letter code
REP
  • searches a protein sequence for a collection of repeats
  • currently implemented repeat families are: Ankyrin, Armadillo, HAT, HEAT, HEAT_AAA, HEAT_ADB, HEAT_IMB, Kelch, Leucine Rich Repeats, PFTA, PFTB, RCC1, TPR, WD40
  • enter the sequence
Coils
  • compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score
  • comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation
  • you can change the output format
  • enter your sequence
Paircoil
  • predicts the location of coiled-coil regions in amino acid sequences
  • enter your sequence
  • you can choose the probability cutoff for the search
Multicoil
  • predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric
  • enter the sequence and sequence name (optional)
PEST
  • proteins with intracellular half-lives of less than two hours are found to contain regions rich in proline, glutamic acid, serine and threonine (P, E, S and T)
  • these so called PEST regions are generally flanked by clusters of positively charged amino acids
  • identifies possible PEST regions in a submitted probe using the Molecular fraction of the P, E, S and T components, and the hydrophobicity index of the region
  • click the run PEST link and enter your sequence and the cutoffs and minimum size
HLA_Bind
  • rank potential 8-mer, 9-mer, or 10-mer peptides based on a predicted half-time of dissociation to HLA class I molecules
  • enter your sequence, choose the HLA molecule class, and cutoffs for half life and number of outputs
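  The first step of such a ranking is to enumerate every 8-mer, 9-mer and 10-mer in the query. The sketch below shows only that enumeration plus a generic ranking hook. The score argument is a placeholder supplied by the caller; the coefficient tables the server uses to estimate the half-time of dissociation are not reproduced here, and both function names are hypothetical.

```python
def candidate_peptides(seq, lengths=(8, 9, 10)):
    """List (start position, peptide) for every window of the given lengths.

    Start positions are 1-based, as is usual for protein sequences.
    """
    seq = seq.upper()
    peptides = []
    for k in lengths:
        for start in range(len(seq) - k + 1):
            peptides.append((start + 1, seq[start:start + k]))
    return peptides


def rank_peptides(seq, score, lengths=(8, 9, 10), top=20):
    """Rank candidate peptides by a caller-supplied score, highest first.

    `score` is a placeholder: any function mapping a peptide string to a number.
    """
    ranked = sorted(
        candidate_peptides(seq, lengths),
        key=lambda item: score(item[1]),
        reverse=True,
    )
    return ranked[:top]
```

  The top argument plays the role of the cutoff on the number of outputs; a cutoff on the score itself could be added by filtering the ranked list.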
ProtScale
  • compute and represent the profile produced by any amino acid scale on a selected protein
  • an amino acid scale is defined by a numerical value assigned to each type of amino acid
  • the most frequently used scales are the hydrophobicity or hydrophilicity scales and the secondary structure conformational parameters scales, but many other scales exist which are based on different chemical and physical properties of the amino acids
  • this program provides 50 predefined scales entered from the literature
  • enter the accession number (AC) or protein ID or your sequence
  • choose the scale from the available list and the format of the output
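  The idea of a scale profile can be sketched as a sliding-window average: each window of residues is replaced by the mean of its scale values and the result is assigned to the window centre. The example below is a minimal sketch under that assumption, using the Kyte-Doolittle hydropathy scale as one example scale; the function name and the default window of 9 are illustrative choices, and any per-residue weighting the server may offer is left out.

```python
# Example scale: Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_profile(seq, scale, window=9):
    """Return (position, mean scale value) for each full window.

    `position` is the 1-based index of the residue at the window centre.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = seq.upper()
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half):
        segment = seq[centre - half:centre + half + 1]
        mean = sum(scale[aa] for aa in segment) / window
        profile.append((centre + 1, mean))
    return profile
```

  Any other scale can be passed in as a dictionary with one numerical value per amino acid type, which is exactly the definition of a scale given above.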
drawhca
  • enables you to draw an HCA plot
  • you can upload the file of your amino acid sequence or you can paste the sequence by clicking on the appropriate link
Colorseq
  • tool to highlight selected regions of your protein sequence
  • enter your protein sequence
  • select a predefined residue set (hydrophobic, aromatic, etc.) or your own amino acid set
HelixWheel
  • draws a helical wheel, i.e. an axial projection of a regular alpha-helix, for a given sequence, starting number and selected coloring scheme
  • the hydrophobic scale is used to color
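  The geometry behind a helical wheel is simple: a regular alpha-helix has about 3.6 residues per turn, so successive residues are placed 100 degrees apart around the helix axis. The sketch below computes those positions on a unit circle. It is illustrative only; the function name and the choice to place the first residue at the top of the wheel are assumptions, not a description of how this server draws its figure.

```python
import math


def helical_wheel(seq, start=1, degrees_per_residue=100.0):
    """Return (residue number, residue, angle in degrees, x, y) per residue.

    Angles are measured clockwise from the top of the wheel; x and y lie on
    a circle of radius 1.
    """
    positions = []
    for i, aa in enumerate(seq):
        angle = (i * degrees_per_residue) % 360.0
        rad = math.radians(angle)
        positions.append((start + i, aa, angle, math.sin(rad), math.cos(rad)))
    return positions
```

  Coloring by hydrophobicity then amounts to looking up each residue in a scale and mapping the value to a color before drawing at (x, y).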
HelixDraw
  • display your amino acid sequence as a helical wheel
  • enter your sequence into the text box at the top, choose the number of the amino acid to start the helix from
  • maximum sequence length displayed is 19 amino acids
RandSeq
  • generates a random protein sequence
  • option between equal composition for all amino acids, composition of a specific sequence, average amino acid composition, or user specified composition in percent
  • if user specified, enter the percentages for corresponding amino acid residues
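  Two of these options, equal composition and a user specified composition in percent, can be sketched with weighted random sampling. The example is a minimal illustration and not the RandSeq implementation; the function name, the seed argument and the check that the percentages sum to 100 are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length, composition=None, seed=None):
    """Generate a random protein sequence of the given length.

    composition: optional dict mapping residue -> percentage (must sum to 100).
                 When omitted, all 20 amino acids are equally likely.
    seed:        optional seed so that a sequence can be reproduced.
    """
    rng = random.Random(seed)
    if composition is None:
        residues = list(AMINO_ACIDS)
        weights = [1.0] * len(residues)
    else:
        if abs(sum(composition.values()) - 100.0) > 1e-6:
            raise ValueError("percentages must sum to 100")
        residues = list(composition)
        weights = list(composition.values())
    return "".join(rng.choices(residues, weights=weights, k=length))
```

  The composition of a specific sequence could be used in the same way by first counting its residues and converting the counts to percentages.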
Prof
  • predicts secondary structure given an amino acid sequence
  • enter your sequence
  • results are sent back via email
GOR IV
  • uses all possible pair frequencies within a window of 17 amino acid residues
  • one output is eye-friendly giving the sequence and the predicted secondary structure in rows, H=helix, E=extended or beta strand and C=coil
  • the second gives the probability values for each secondary structure at each amino acid position
  • enter the sequence
HNN
  • gives a secondary structure prediction
  • the abstract for this program and how it predicts the structure is found here
  • enter the sequence
Jpred
  • Jnet is a neural network prediction algorithm that works by applying multiple sequence alignments, alongside PSIBLAST and HMM profiles
  • Consensus techniques are applied that predict the final secondary structure more accurately
  • Jnet can also predict 2 state solvent exposure at 25, 5 and 0% relative exposure
  • this is software that must be downloaded to be used
nnPredict
  • predicts the secondary structure type for each residue in an amino acid sequence
  • basis of the prediction is a two-layer, feed-forward neural network
  • predicted type will be either: 'H', a helix element; 'E', a beta strand element, or '-', a turn element
  • uses the tertiary class of the protein (either none, all-alpha, all-beta, or alpha/beta) for prediction
  • enter the sequence
Predator
  • a secondary structure prediction program that can optimally use a set of unaligned sequences as additional information to predict the query sequence
  • it relies on careful pairwise local alignments of the sequences in the set with the query sequence to be predicted
  • enter your sequence, the results are returned via email
  • options for format and for the sequence set
PSA
PSIpred
  • incorporates methods PSIPRED, GenTHREADER and MEMSAT 2 for predicting structural information about any given protein from its amino acid sequence alone
  • PSIPRED carries out a reliable secondary structure prediction on a protein incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST
  • MEMSAT 2 (Jones, 1994; Jones, 1998) is the latest version of a method for inferring the topology of transmembrane proteins
  • click on the access the server link to get to the form page
  • enter the sequence and email as well as the type of prediction
SOPMA
  • secondary structure prediction program called self-optimized prediction method
  • the abstract for this program and how it predicts the structure is found here
  • enter your sequence, the number of conformational states and similarity threshold for the prediction
SWISS-MODEL
  • first approach mode
    • enter your email to which the results will be sent as well as name and title
    • enter your sequence
    • set the lower BLAST limit
    • define the templates you wish to use, you can search for one, pick from a list or use your own which must follow the guidelines provided
  • optimise (project) mode - lets you submit a project file made in Swiss-PdbViewer for analysis
  • oligomer modelling - using the latest version of Swiss-PdbViewer, the instructions for modelling of oligomeric proteins can be found here
  • GPCR mode - modelling of 7TM/GPCR proteins requires choosing a template that aligns your 7 helices and another template that models your protein
  • you'll need Swiss-PdbViewer, which is described in the next entry below along with the link to download it
Swiss-PdbViewer
  • Swiss-PdbViewer is an application that provides a user friendly interface for analysing several proteins at the same time
  • the proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts
  • amino acid mutations, H-bonds, angles and distances between atoms are easy to obtain thanks to the intuitive graphic and menu interface
  • it is possible to thread a protein primary sequence onto a 3D template and get an immediate feedback of how well the threaded protein will be accepted by the reference structure before submitting a request to build missing loops and refine sidechain packing
  • Swiss-PdbViewer can also read electron density maps, and provides various tools to build into the density
  • In addition, various modelling tools are integrated and command files for popular energy minimisation packages can be generated
Geno3D
  • Geno3D server release 1, generates models of no more than 300 amino acids
  • template and query sequences must share more than 35% of pairwise identity
  • database : NPSA 3D SEQUENCES AT 100% HOMOLOGY (from PDB)
  • generates up to three 3D models
  • the user provides a sequence to be modeled, which is compared using the PSI-BLAST method against a protein sequence database issued from PDB (all entries and entries with no more than 95% homology)
  • the user selects a PDB entry as the template for molecular modeling (in this release, only templates with more than 35% pairwise sequence identity can be selected by the user)
  • the template and query sequences are aligned using the ClustalW program
  • distance and dihedral angle restraints on the query sequence are calculated from the alignment with the template 3D structure
  • for gaps, statistical restraints are used
  • these restraints are used as input for CNS software
  • the output is a set of 3D models which satisfy these restraints as well as possible
  • the results are returned via email as an attachment
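  The 35% identity requirement can be checked on an existing pairwise alignment before choosing a template. The sketch below counts identical residues over the aligned, gap-free columns. This is only one of several common definitions of percent identity and may differ from the one the server applies; the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def usable_template(aligned_query, aligned_template, threshold=35.0):
    """True when the pair shares more than `threshold` percent identity."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

  Both inputs are rows taken from the same alignment, for example two lines of ClustalW output with the gaps left in place.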
CPHmodels
  • predicts protein structure using comparative (homology) modelling
  • enter the sequence and the sequence name
3D-PSSM
  • protein fold recognition
  • enter your sequence, email address and protein title
  • in the advanced interface, you can choose whether it is global or local, whether to filter low complexity regions, and the number of iterations of PSI-Blast
SWEET
  • constructs saccharide models
  • the concept is shown visually in a flow chart here
  • there are 3 modes of input, all correspond to the branches of the saccharide
  • click on input/work at the side to query a model
  • the format is in pdb, click on the examples to see how the inputs should be entered, it helps to visualize the branches of your model
DAS
  • predicts transmembrane alpha-helices regions in sequence
  • based on low-stringency dot-plots of the query sequence against a collection of non-homologous membrane proteins using a previously derived, special scoring matrix
  • enter your protein sequence in one letter code
HMMTOP
  • an automatic server for predicting transmembrane helices and topology of proteins
  • the simple form is accessed by clicking on the submit link
  • the advanced form is accessed by clicking on advanced
  • the advanced form contains options for the sequence format, type of sequence, speed of prediction, output format and allows you to specify localization of sequence parts
PredictProtein
  • predict secondary structure, base threading, solvent accessibility, as well as transmembrane helices
  • it can also evaluate prediction accuracy
  • the default form allows input of sequence only and performs all predictions
  • the advanced form adds options to format the output
  • the expert form adds options for each type of prediction
  • a very informative description of the methods used, the databases it searches, what it sweeps through in each database, and how the results are put together is here
TMAP
  • predicts transmembrane helices based on multiple sequence alignment
  • enter the sequence
TMHMM
  • prediction of transmembrane helices
  • enter the sequence
  • the documentation and how to interpret the output is linked here
TMpred
  • prediction of membrane-spanning regions and their orientation
  • the algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins
  • the prediction is made using a combination of several weight-matrices for scoring
  • choose the output format and enter your sequence
TopPred2
  • prediction of location and orientation of transmembrane helices
  • enter the sequence
  • options for organism type, cutoffs, and output
  • the only documentation is a reference to the original publication which is shown on the main page
SIM + LALNVIEW
  • program which finds a user-defined number of best non-intersecting alignments between two protein sequences or within a sequence
  • enter your 2 sequences
  • options for gaps and number of alignments
LALIGN
  • compare two sequences looking for local sequence similarities
  • enter your 2 sequences
  • options for gaps, number of alignments, and scoring matrix
  • documentation and manual are linked here
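  To make "local sequence similarities" concrete, the sketch below computes the score of the best local alignment between two sequences with a basic Smith/Waterman recurrence. It is a teaching example only: LALIGN has its own algorithm, scoring matrices and gap options, whereas this version uses a flat match/mismatch score and a linear gap penalty whose values are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score of the best local alignment between sequences a and b.

    Only the previous row of the dynamic-programming table is kept,
    because only the score (not the alignment itself) is returned.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            current.append(max(0, diagonal, up, left))
            best = max(best, current[j])
        previous = current
    return best
```

  Resetting negative scores to zero is what makes the alignment local: a shared region scores the same whether or not the flanking residues match.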

CLUSTALW

EBI, PBIL, EMBnet-CH or MBS (MBSALIGNER)

  • aligns multiple sequences, the abstract is linked here
  • enter your sequences, each preceded by a >name of sequence line, and make sure each name is different (click the help link for an example of how to input sequences)
  • you can also upload a file
  • options for alignment, gaps, scoring matrix, output, and cpu (multiprocessing or single process)
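  The input rule above (every sequence preceded by a >name line, with all names different) can be verified before submitting. The following is a small sketch of such a check, not code from any of the servers listed; the function name parse_fasta is an assumption.

```python
def parse_fasta(text):
    """Parse '>name' formatted input into a dict of name -> sequence.

    Raises ValueError for an empty name, a duplicate name, or sequence
    data that appears before the first '>name' line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError("empty sequence name")
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first >name line")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}
```

  Sequences split over several lines are joined, so the result can also be used to check sequence lengths before upload.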

T-Coffee

EMBnet Switzerland
GPCR

  • aligns multiple sequences
  • enter your sequence
  • more accurate than ClustalW for sequences with less than 30% identity, but it is slower
ALIGN
  • aligns two sequences
  • enter your 2 sequences
  • options for alignment
  • interactive or email the results
DIALIGN
  • constructs pairwise and multiple alignments by comparing whole segments of the sequences
  • uses no gap penalties; efficient when the sequences are not globally related but share local similarities, as with genomic DNA and many proteins
  • enter your sequences (at most 100)
  • options for threshold and similar regions
Match-Box
  • protein sequence multiple alignment tools based on strict statistical criteria
  • a reliability score is provided below each aligned position
  • the Match-Box program is particularly suitable for finding and aligning conserved structural motifs, in particular in the protein core
  • enter your sequences, help on input can be found here
  • options for output
MSA
  • a multiple sequence alignment using the algorithm originally proposed by Altschul, Lipman, Kececioglu and Miner (1989) of the NCBI, and later modified by Gupta, Kececioglu and Schäffer
  • enter your protein sequences (up to 8)
  • options for alignment, gaps, and weight of alignments
  • enter the appropriate code in code column depending on the format of your sequence or database ID

Multalin

INRA
PBIL

  • based on the conventional dynamic-programming method of pairwise alignment
  • options for output, gaps, scoring method, conservation levels
  • enter your sequences
MUSCA
  • enter your sequence and equivalency set (either selected or your own)
  • in "options" the help file is here
  • in "parameters" the help file is here
  • in "equivalency sets" the help file is here
  • the help files pertain to all the programs on the site so look for the options that are available to this program
AMAS
  • documentation and manual are here
  • enter your sequences
  • define which sets of sequences in the alignment AMAS will compare to which others
  • options for property table, conservation threshold, and output
Bork's alignment tools
  • Pasting positions: adds x positions to the left and/or y positions to the right of a CLUSTAL multiple sequence alignment
  • Inter-block gap sizes: calculates inter-block gap sizes for blocks in a CLUSTAL multiple alignment and checks for mismatches between aligned sequences and master sequences.
  • Consensus: calculates the consensus for the CLUSTAL or MSF multiple alignment.
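  As an illustration of what a consensus calculation involves, the sketch below takes the rows of a multiple alignment and reports, for each column, the most frequent residue when it occurs in more than a given fraction of the sequences. The threshold of 0.5, the placeholder character and the function name are assumptions; the rules used by the tool above are not described here and may differ.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, unknown="x"):
    """Majority consensus of equally long aligned sequences.

    A column yields its most common residue if that residue is present in
    more than `threshold` of all sequences, `unknown` otherwise, and '-'
    if the column contains only gaps.
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        residues = [c.upper() for c in column if c != "-"]
        if not residues:
            result.append("-")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append(residue if count / len(column) > threshold else unknown)
    return "".join(result)
```

  Raising the threshold gives a stricter consensus with more placeholder positions.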
CINEMA
  • the program allows visualisation and manipulation of both protein and DNA sequences
  • CINEMA allows you to build alignments interactively, either using a free-format Cut and Paste facility to import your own protein or DNA sequences, or by adding sequences directly from the OWL composite database
  • the site has detailed instructions for the controls and methods
  • very computer intensive, needs a fairly fast machine
ESPript
  • prints a multiple alignment
  • upload the file for aligned sequences and secondary structure
  • options for similarity calculations and output
plogo / WebLogo
  • aligned sequences made into logos
  • parameters for logo representation (height proportional to frequency, or to the ratio of frequency to expected frequency, and various graphical formatting)
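  One common way to set letter heights in a sequence logo is by information content: each column gets a total height of log2(alphabet size) minus its Shannon entropy, and each letter takes a share proportional to its frequency. The sketch below follows that convention for protein alignments. It is an assumption-based illustration, not the code of either tool; it omits the small-sample correction and the expected-frequency option mentioned above, and the function name is made up.

```python
import math
from collections import Counter


def logo_heights(aligned, alphabet_size=20):
    """Per-column letter heights in bits for a sequence logo.

    Returns one dict per alignment column mapping residue -> height.
    Gap characters ('-') are ignored; an all-gap column gives an empty dict.
    """
    heights = []
    for column in zip(*aligned):
        residues = [c.upper() for c in column if c != "-"]
        if not residues:
            heights.append({})
            continue
        total = len(residues)
        freqs = {aa: n / total for aa, n in Counter(residues).items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        information = math.log2(alphabet_size) - entropy
        heights.append({aa: f * information for aa, f in freqs.items()})
    return heights
```

  A fully conserved column reaches the maximum height of log2(20), about 4.32 bits, while a column with many different residues is drawn close to zero.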
Boehringer Mannheim "Biochemical Pathways"
  • you can search for keywords matching the entries in the "Biochemical Pathways" wall chart; if more than one word is used, it will search for matches containing both keywords
  • the result links to all maps in which the entry appears as well as the ENZYME database (nomenclature database)