trRosetta

The trRosetta server is a web-based platform for fast and accurate protein structure prediction.

Help

About trRosetta

   Workflow
The input to trRosetta is the amino acid sequence or a multiple sequence alignment (MSA) of the query protein. As shown in Figure 1a, trRosetta works as follows.
(1) When the query sequence or MSA is submitted, a deep residual neural network (Figure 1b) predicts the inter-residue distance and orientation distributions. An option to include templates was added recently.
(2) The predicted distance and orientation distributions are converted into smooth restraints, which guide Rosetta in building 3D structure models by direct energy minimization.
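As an illustration of step (2), the sketch below turns one binned distance distribution into an energy-like function by taking the negative log of the bin probabilities. This is only one common way to derive a restraint from a probability distribution, not necessarily the exact conversion used by the server. The function name `distance_potential`, the `eps` pseudo-count, and the use of linear interpolation (a simple stand-in for a truly smooth fit such as a spline) are all assumptions made for this example.

```python
import numpy as np

def distance_potential(probs, bin_centers, eps=1e-6):
    """Build an energy-like function from binned distance probabilities.

    Hypothetical helper: negative log-probability per bin, linearly
    interpolated between bin centers.
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()        # renormalize defensively
    energy = -np.log(probs + eps)      # likely bins get low energy
    energy -= energy.min()             # shift so the best bin has energy 0
    centers = np.asarray(bin_centers, dtype=float)

    def potential(d):
        # Piecewise-linear between bin centers, constant outside the range.
        return np.interp(d, centers, energy)

    return potential

# Hypothetical 4-bin distribution peaked at 6 Å.
centers = [4.0, 6.0, 8.0, 10.0]
potential = distance_potential([0.1, 0.6, 0.2, 0.1], centers)
```

Calling `potential(d)` for a trial distance `d` returns a penalty that is lowest where the predicted probability is highest, which is the property an energy minimizer relies on.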



Figure 1. The flowchart and network architecture of the trRosetta algorithm.


   MSA Generation
The trRosetta server runs HHblits against the uniclust30_2018_08 database for MSA generation. Six MSAs are generated in total: two default MSAs at two e-value cutoffs (1 and 0.001), plus four filtered MSAs obtained by applying coverage cutoffs of 75% and 50% to the default MSAs. Each of these MSAs is submitted to the network to predict the 2D geometries. The MSA with the highest average probability of the top-predicted contacts is selected as the final MSA.
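The selection rule can be sketched as follows. The text does not state how many contacts count as "top-predicted" or whether near-diagonal pairs are excluded, so `top_k` and `min_separation` are assumptions of this example, as are the function names and the dictionary keys.

```python
import numpy as np

def top_contact_score(contact_probs, top_k, min_separation=3):
    """Average probability of the top_k most confident residue pairs.

    Hypothetical helper: only pairs (i, j) with j - i >= min_separation
    are considered, so trivially close neighbors do not dominate.
    """
    probs = np.asarray(contact_probs, dtype=float)
    n = probs.shape[0]
    i, j = np.triu_indices(n, k=min_separation)
    values = np.sort(probs[i, j])[::-1]   # descending order
    return float(values[:top_k].mean())

def select_msa(contact_maps, top_k):
    """Return the name of the best-scoring MSA and all scores."""
    scores = {name: top_contact_score(m, top_k)
              for name, m in contact_maps.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy 6-residue contact maps for two hypothetical MSAs.
maps = {
    "evalue_1": np.full((6, 6), 0.2),
    "evalue_0.001": np.full((6, 6), 0.9),
}
best, scores = select_msa(maps, top_k=3)
```

In this toy case the second MSA is selected because its predicted contacts are more confident on average.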

   Template Detection
Template detection in the trRosetta server is performed by running HHsearch against the PDB70 database. A template is used for further prediction only when it satisfies the following conditions: Confidence > 0.6, E-value < 0.001 and Coverage > 0.3.
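The three conditions amount to a simple filter over the detected hits. In the sketch below, the `TemplateHit` class, its field names, and the example hits are hypothetical, and Confidence and Coverage are assumed to be expressed as fractions between 0 and 1.

```python
from dataclasses import dataclass

@dataclass
class TemplateHit:
    name: str          # hypothetical template identifier
    confidence: float  # assumed to be a fraction in [0, 1]
    e_value: float
    coverage: float    # assumed to be a fraction in [0, 1]

def passes_filter(hit):
    """Apply the thresholds stated above; all three must hold."""
    return (hit.confidence > 0.6
            and hit.e_value < 0.001
            and hit.coverage > 0.3)

hits = [
    TemplateHit("hitA", confidence=0.95, e_value=1e-20, coverage=0.80),
    TemplateHit("hitB", confidence=0.70, e_value=0.01, coverage=0.90),
    TemplateHit("hitC", confidence=0.99, e_value=1e-8, coverage=0.25),
]
usable = [h.name for h in hits if passes_filter(h)]
```

Here only `hitA` is kept: `hitB` fails the E-value condition and `hitC` fails the coverage condition.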

When to choose modeling with templates?
Figure 2 shows a head-to-head comparison of trRosetta modeling with and without templates on 143 CAMEO targets. In general, including homologous templates benefits targets at all levels of difficulty, and the improvement is larger for the easier ones. We therefore recommend modeling with templates as the first choice.



Figure 2. A head-to-head comparison of trRosetta modeling with and without templates.


   Confidence Estimation
We estimate the TM-score of the predicted models from the probability of the top predicted distance and the convergence of the top models. The TM-score ranges from 0 to 1, and a TM-score higher than 0.5 usually indicates a model with correctly predicted topology.
Figure 3 shows the relationship between the estimated and the real TM-score on 161 CAMEO targets for modeling without templates (Figure 3a) and with templates (Figure 3b). The Pearson correlation coefficients are 0.85 without templates and 0.73 with templates.



Figure 3. The relationship between the estimated TM-score and the real TM-score for de novo prediction (a) or using templates (b).


Job submission

Job submission guide:
(1) Input a protein sequence in FASTA format or an MSA in A3M/FASTA/A2M/STO format.
(2) Specify your input type.
(3) (Optional) Provide your email address.
(4) (Optional) Assign your target name.
(5) Choose whether to use templates when homologous templates are available.
(6) Choose whether to keep your results private.
(7) Submit.

Notice: Due to limited computing resources, each user may have no more than 20 running or pending jobs at the same time.
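Before choosing the input type in step (2), it can help to check locally whether a FASTA-style file holds a single sequence or an alignment. The sketch below is a hypothetical pre-submission check, not the server's own validation. It handles only FASTA-style text with ">" header lines (which also covers A3M and A2M), not STO, and the function names are assumptions.

```python
def parse_fasta(text):
    """Parse FASTA-style text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)   # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def guess_input_type(text):
    """Return 'sequence' for one record, 'msa' for several."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    return "sequence" if len(records) == 1 else "msa"

single = ">query\nMKTAYIAK\nQRQISFVK\n"
alignment = ">query\nMKTAYIAK\n>hit1\nMKT-YIAR\n"
kinds = (guess_input_type(single), guess_input_type(alignment))
```

For the two toy inputs, `kinds` is `("sequence", "msa")`.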


Figure 4. The "Submit" section on the trRosetta home page.

Output explanation

The trRosetta modeling results are summarized on a webpage. If an email address was provided during submission, a link to this page is sent to the user upon job completion (see an example of the trRosetta output). A tarball containing the key modeling results can be downloaded from the top of this page.

   Predicted Structure Models
This section contains:
(1) Top-predicted model visualization.
(2) Model quality estimation.
(3) Modeling method description. Note that a template is used only when it satisfies Confidence > 0.6, E-value < 0.001 and Coverage > 0.3.
(4) Separate download links for the models, the MSA (multiple sequence alignment), and the predicted inter-residue distances and orientations.


Figure 5. The "Predicted Structure Models" section on the trRosetta result page.



   Predicted 2D Information
This section visualizes predicted 2D information including:
(1) Contact map, which displays the predicted probability that a residue pair is in contact, i.e., that the distance between their C-beta (C-alpha for glycine) atoms is less than 8 Å.
(2) Distance map, which displays the predicted real distance (4 - 20 Å) between residue pairs.
(3) Orientation maps, containing maps of omega ([-180°, 180°]), theta ([-180°, 180°]) and phi ([0°, 180°]).
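The contact definition in (1) suggests how a contact probability relates to a binned distance distribution: sum the probability of all bins that lie below 8 Å. The sketch below assumes a hypothetical bin layout, described by the upper edge of each bin, and counts a bin as "in contact" when its upper edge is at or below the cutoff. The actual binning of the server's output files is not specified here and should be checked before reuse.

```python
import numpy as np

def contact_probability(dist_probs, bin_upper_edges, cutoff=8.0):
    """Sum the probability mass of distance bins at or below the cutoff.

    dist_probs may have shape (B,) for one residue pair or (L, L, B)
    for a whole protein; the last axis indexes the distance bins.
    """
    probs = np.asarray(dist_probs, dtype=float)
    edges = np.asarray(bin_upper_edges, dtype=float)
    in_contact = edges <= cutoff          # boolean mask over the bins
    return probs[..., in_contact].sum(axis=-1)

# Hypothetical 4-bin distribution for a single residue pair.
upper_edges = [6.0, 8.0, 10.0, 20.0]
pair_probs = [0.1, 0.5, 0.3, 0.1]
p_contact = contact_probability(pair_probs, upper_edges)  # 0.1 + 0.5
```

The same function applied to an array of shape (L, L, B) yields an L x L contact map.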


Figure 6. The "Predicted 2D Information" section on the trRosetta result page.



   Predicted 1D Information
This section only exists when models are built by de novo folding. Otherwise, the predicted 1D information will be integrated into the following section "Templates used by trRosetta".

Predicted 1D information includes:
(1) Secondary structure, which is predicted by PSSpred.
(2) Disorder region, which is predicted by DISOPRED.


Figure 7. The "Predicted 1D Information" section on the trRosetta result page.



   Templates used by trRosetta
This section only exists when models are built with restraints from both deep learning and homologous templates.

This section contains:
(1) Predicted 1D information, i.e., secondary structure and disorder region, as described above.
(2) Top 5 homologous templates detected by HHsearch. For each template, detailed information is displayed, and a model built by MODELLER from the query-template alignment can be downloaded.

Note:
Confidence   The probability that a template is a true positive.
Coverage     The number of aligned (non-gap) positions divided by the query sequence length.
Identity     The number of aligned identical residues divided by the query sequence length.
E-value      The statistical significance of the alignment. An E-value closer to 0 indicates a more significant hit.
Z-score      The normalized raw alignment score. A Z-score > 11 indicates a reliable hit.
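The Coverage and Identity definitions can be worked through on a toy pairwise alignment. In this sketch, the function name and the example strings are hypothetical, and "-" is assumed to mark a gap in either row of the alignment.

```python
def alignment_stats(query_aln, template_aln, query_length):
    """Return (coverage, identity) for a pairwise alignment.

    Both values are normalized by the full query sequence length,
    following the definitions above.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    aligned = identical = 0
    for q, t in zip(query_aln, template_aln):
        if q == "-" or t == "-":
            continue                      # gap column: not aligned
        aligned += 1
        if q.upper() == t.upper():
            identical += 1
    return aligned / query_length, identical / query_length

# Toy alignment for a query of 10 residues, 9 of which appear here.
query_aln = "ACDEFG-HIK"
template_aln = "ACDQ--WHIK"
coverage, identity = alignment_stats(query_aln, template_aln, query_length=10)
```

Seven columns are gap-free and six of them are identical, so the coverage is 0.7 and the identity is 0.6.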


Figure 8. The "Templates used by trRosetta" section on the trRosetta result page.

How to cite trRosetta?

Please cite the following articles when you use the trRosetta server:
  • Z Du, H Su, W Wang, L Ye, H Wei, Z Peng, I Anishchenko, D Baker, J Yang, The trRosetta server for fast and accurate protein structure prediction, Nature Protocols, in press (2021).
  • J Yang, I Anishchenko, H Park, Z Peng, S Ovchinnikov, D Baker, Improved protein structure prediction using predicted interresidue orientations, PNAS, 117: 1496-1503 (2020).

Need more help?

If you have more questions or comments about the server, please email yangjy@nankai.edu.cn.