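Each line of the prediction file consists of 13 numbers:

i j p0 p1 p2 p3 p4 p5 p6 p7 p8 p9 p10

where i and j are the indices of the two residues, p0 is the predicted probability that the two residues are in contact, and p1..p10 are the predicted probabilities of the 10 distance bins (bin1..bin10). The first nine bins cover distances up to 20 Å, and bin10 covers distances beyond 20 Å.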

Note that p1+p2+p3 should equal p0, and p1+...+p10 should equal 1, each within a tolerance of 0.005.

For example:

28 116 0.906 0.190 0.597 0.118 0.042 0.018 0.010 0.005 0.004 0.003 0.012

65 101 0.906 0.022 0.670 0.215 0.071 0.018 0.004 0.001 0.000 0.000 0.000

...
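The two consistency checks above can be applied line by line. The following Python sketch parses one line and enforces both sums; the function name `parse_prediction_line` is hypothetical, and rejecting a line whose sums deviate by more than 0.005 is an assumed reading of the stated tolerance.

```python
from typing import List, Tuple

TOLERANCE = 0.005  # tolerance stated in the format description


def parse_prediction_line(line: str) -> Tuple[int, int, float, List[float]]:
    """Parse 'i j p0 p1 ... p10' and check the two sum constraints."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 numbers, got {len(fields)}")
    i, j = int(fields[0]), int(fields[1])
    p0 = float(fields[2])
    probs = [float(x) for x in fields[3:]]  # p1..p10
    # p1 + p2 + p3 must match p0
    if abs(sum(probs[:3]) - p0) > TOLERANCE:
        raise ValueError(f"p1+p2+p3 does not match p0 for pair ({i}, {j})")
    # all ten bin probabilities must sum to 1
    if abs(sum(probs) - 1.0) > TOLERANCE:
        raise ValueError(f"p1+...+p10 does not sum to 1 for pair ({i}, {j})")
    return i, j, p0, probs


line = "28 116 0.906 0.190 0.597 0.118 0.042 0.018 0.010 0.005 0.004 0.003 0.012"
print(parse_prediction_line(line)[:3])  # (28, 116, 0.906)
```

Both example lines pass: their first three bins sum to 0.905 and 0.907, and their ten bins sum to 0.999 and 1.001.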

The input protein structure file should be in PDB format.

A detailed description of the PDB format can be found in the official PDB file format documentation.
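The native distances Dij have to be extracted from this file. The sketch below reads ATOM records using the fixed PDB column positions. Using the Cβ atom (Cα for glycine) as each residue's representative atom, reading only the first model, and ignoring insertion codes are assumptions made for illustration, since this section does not specify them; the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float, float]


def read_representative_atoms(pdb_text: str, chain: Optional[str] = None) -> Dict[int, Coord]:
    """Map residue number -> coordinates of its CB atom (CA for glycine).

    Assumptions: atom choice, first model only, insertion codes ignored.
    """
    coords: Dict[int, Coord] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()  # columns 13-16
        res_name = line[17:20].strip()   # columns 18-20
        chain_id = line[21]              # column 22
        res_seq = int(line[22:26])       # columns 23-26
        if chain is not None and chain_id != chain:
            continue
        wanted = "CA" if res_name == "GLY" else "CB"
        if atom_name == wanted:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.setdefault(res_seq, xyz)  # keep the first matching atom per residue
    return coords


def native_distance(coords: Dict[int, Coord], i: int, j: int) -> float:
    """Euclidean distance (in Å) between the representative atoms of residues i and j."""
    return math.dist(coords[i], coords[j])
```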

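To facilitate the description, the following notation is used: L is the sequence length of the protein, S is the set of residue pairs being evaluated, and Dij and dij are the native (experimental) and predicted distances between residues i and j, respectively.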

In the prediction-oriented assessment, all metrics except contact precision are evaluated on the top 15L residue pairs ranked by p1+...+p9; contact precision uses the top L residue pairs ranked by p1+p2+p3.
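Both selection rules can be sketched as one helper: rank the pairs by the summed probability of their first few bins and keep the top multiple of L. The helper name `top_pairs` and its parameters are hypothetical; pairs with equal scores keep their input order because Python's sort is stable.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def top_pairs(pairs: Sequence[Pair], probs: Sequence[Sequence[float]],
              seq_len: int, n_bins: int, factor: int) -> List[Pair]:
    """Keep the top factor*L pairs ranked by p1 + ... + p_{n_bins}."""
    limit = factor * seq_len
    ranked = sorted(zip(pairs, probs), key=lambda item: sum(item[1][:n_bins]), reverse=True)
    return [pair for pair, _ in ranked[:limit]]


# Contact precision: top_pairs(pairs, probs, L, n_bins=3, factor=1)
# Other metrics:     top_pairs(pairs, probs, L, n_bins=9, factor=15)
```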

A residue pair (i, j) is defined as "correctly predicted" if the absolute difference between Dij and dij is less than a tolerance threshold (2 Å here). Distance precision is the fraction of residue pairs in the set S that are correctly predicted. Here P(dij ≤ 20) is the cumulative probability of the first nine bins (bin1..bin9), i.e., p1+...+p9, which reflects the confidence of the predicted distance dij.
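Distance precision reduces to counting pairs within the 2 Å tolerance. This sketch assumes that the native distances Dij and the predicted distances dij of the pairs in S are already available as two aligned lists; how dij is derived from the predicted distribution is not covered here, and the name `distance_precision` is hypothetical.

```python
from typing import Sequence


def distance_precision(native: Sequence[float], predicted: Sequence[float],
                       tolerance: float = 2.0) -> float:
    """Fraction of pairs whose predicted distance lies within `tolerance` Å of the native one."""
    if len(native) != len(predicted) or not native:
        raise ValueError("need two non-empty vectors of equal length")
    correct = sum(1 for D, d in zip(native, predicted) if abs(D - d) < tolerance)
    return correct / len(native)


print(distance_precision([5.0, 10.0, 15.0, 20.0], [6.0, 13.0, 15.5, 17.0]))  # 0.5
```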

To make effective use of the predicted probabilities, we define the fuzzy certainty of a predicted distance distribution. As in fuzzy analysis, the bins adjacent to the native distance bin are also taken into account, with a weight of 0.5, to reflect the dynamic nature of protein structures. Here P(.) is the predicted probability of the corresponding distance bin.
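Read literally, the fuzzy certainty of one residue pair is P(native bin) + 0.5 × P(each adjacent bin). The sketch below implements that reading. Averaging over the set S, giving bin1 and bin10 a single neighbour, and passing the native bin index (1..10) in directly are assumptions, because the exact formula and the bin boundaries are not reproduced in this section; the function names are hypothetical.

```python
from typing import Sequence


def pair_fuzzy_certainty(probs: Sequence[float], native_bin: int) -> float:
    """P(native bin) + 0.5 * P(each adjacent bin); bins are numbered 1..len(probs)."""
    k = native_bin - 1  # 0-based index of the native bin
    fc = probs[k]
    if k - 1 >= 0:
        fc += 0.5 * probs[k - 1]
    if k + 1 < len(probs):
        fc += 0.5 * probs[k + 1]
    return fc


def fuzzy_certainty(all_probs: Sequence[Sequence[float]], native_bins: Sequence[int]) -> float:
    """Average pair-level fuzzy certainty over the evaluated pairs (assumed aggregation)."""
    values = [pair_fuzzy_certainty(p, b) for p, b in zip(all_probs, native_bins)]
    return sum(values) / len(values)
```

For the first example line, a native bin of 2 gives 0.597 + 0.5 × (0.190 + 0.118) = 0.751.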

For each of the first nine distance bins (distance ≤ 20 Å), a fuzzy precision (fPRE) and a fuzzy recall (fREC) are defined.

To define these metrics, the set S is first divided into at most nine subsets: a residue pair belongs to subset Sk if, among the first nine distance bins, its predicted probability is highest for the k-th bin. The word fuzzy has the same meaning as in fuzzy certainty: a predicted class (i.e., distance bin) that is incorrect but adjacent to the native class (i.e., the native distance bin) receives a weight of 0.5. Here Nk is the number of residue pairs that belong to the k-th class according to the experimental structure, and lij is the true class label of the residue pair (i, j).
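The grouping and the 0.5 weight can be sketched as follows. Since the exact formulas are not reproduced in this section, this is an assumed reading: fPRE_k averages the fuzzy weights over the pairs in Sk, fREC_k sums the weights of the pairs whose native label lij equals k and divides by Nk (counted within S), and a class with no pairs gets 0.0. All names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fuzzy_weight(pred_class: int, native_class: int) -> float:
    """1 for the native class, 0.5 for an adjacent class, 0 otherwise."""
    gap = abs(pred_class - native_class)
    return 1.0 if gap == 0 else 0.5 if gap == 1 else 0.0


def predicted_class(probs: Sequence[float], n_classes: int = 9) -> int:
    """1-based index of the most probable bin among the first n_classes bins."""
    first = list(probs[:n_classes])
    return first.index(max(first)) + 1


def fuzzy_precision_recall(all_probs: Sequence[Sequence[float]], native_labels: Sequence[int],
                           n_classes: int = 9) -> Dict[int, Tuple[float, float]]:
    """Return {k: (fPRE_k, fREC_k)} for k = 1..n_classes under the assumed reading."""
    pred = [predicted_class(p, n_classes) for p in all_probs]
    result = {}
    for k in range(1, n_classes + 1):
        in_sk = [l for c, l in zip(pred, native_labels) if c == k]      # native labels of pairs in Sk
        native_k = [c for c, l in zip(pred, native_labels) if l == k]   # predictions for native class k
        prec = sum(fuzzy_weight(k, l) for l in in_sk) / len(in_sk) if in_sk else 0.0
        rec = sum(fuzzy_weight(c, k) for c in native_k) / len(native_k) if native_k else 0.0
        result[k] = (prec, rec)
    return result
```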

The fuzzy F1 score of each class is the harmonic mean of its fuzzy precision and fuzzy recall. The macro fuzzy precision, recall and F1 are the averages of the corresponding per-class values over the first nine bins.
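The per-class F1 and the macro averages can be sketched as below. The input is a mapping from class index to its (fPRE, fREC) pair; returning 0 when precision and recall are both 0 is an assumed convention, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def fuzzy_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_scores(per_class: Dict[int, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Average fPRE, fREC and fuzzy F1 over the given classes (intended: bins 1..9)."""
    n = len(per_class)
    macro_p = sum(p for p, _ in per_class.values()) / n
    macro_r = sum(r for _, r in per_class.values()) / n
    macro_f1 = sum(fuzzy_f1(p, r) for p, r in per_class.values()) / n
    return macro_p, macro_r, macro_f1
```

Note that the macro F1 averages the per-class F1 values; it is not the harmonic mean of the macro precision and macro recall.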

The absolute error is the absolute difference between the native and the predicted distance, averaged over the set S. The relative error is defined similarly, except that each absolute difference is divided by the native distance.
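Both errors can be sketched directly from their definitions, assuming aligned lists of native distances Dij (all positive) and predicted distances dij for the pairs in S; the function names are hypothetical.

```python
from typing import Sequence


def absolute_error(native: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |Dij - dij| over the evaluated pairs."""
    return sum(abs(D - d) for D, d in zip(native, predicted)) / len(native)


def relative_error(native: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |Dij - dij| / Dij over the evaluated pairs (Dij assumed > 0)."""
    return sum(abs(D - d) / D for D, d in zip(native, predicted)) / len(native)
```

For native distances [4, 10] and predictions [5, 8], the absolute error is (1 + 2) / 2 = 1.5 Å and the relative error is (0.25 + 0.2) / 2 = 0.225.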

Here DS and dS denote the vectors of native and predicted distances, respectively, for the residue pairs in the set S, and Cov(.) and Var(.) denote the covariance and variance of the corresponding vectors.
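The covariance and variance terms presumably combine into the Pearson correlation coefficient, Cov(DS, dS) / sqrt(Var(DS) · Var(dS)). A sketch under that assumption, with a hypothetical function name:

```python
import math
from typing import Sequence


def pearson_correlation(native: Sequence[float], predicted: Sequence[float]) -> float:
    """Cov(DS, dS) / sqrt(Var(DS) * Var(dS)) using population (1/n) moments."""
    n = len(native)
    mean_native = sum(native) / n
    mean_pred = sum(predicted) / n
    cov = sum((D - mean_native) * (d - mean_pred) for D, d in zip(native, predicted)) / n
    var_native = sum((D - mean_native) ** 2 for D in native) / n
    var_pred = sum((d - mean_pred) ** 2 for d in predicted) / n
    return cov / math.sqrt(var_native * var_pred)
```

Whether 1/n or 1/(n-1) normalisation is used does not matter here, because the factor cancels between numerator and denominator.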

Contact precision is defined as the number of correctly predicted residue pairs divided by the number of residue pairs being evaluated. Note that here S refers to the set of the top L residue pairs ranked by p1+p2+p3.
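A contact precision sketch follows. It assumes that a selected pair is correct when its native distance is below 8 Å, the usual contact convention and consistent with p1+p2+p3 serving as the contact score, but this threshold is not stated in this section. `contact_precision` is a hypothetical name, and the selected pairs are assumed to be the top L pairs already ranked by p1+p2+p3.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]


def contact_precision(selected: Sequence[Pair], native_dist: Dict[Pair, float],
                      cutoff: float = 8.0) -> float:
    """Fraction of the selected pairs whose native distance is below `cutoff` Å (assumed)."""
    if not selected:
        raise ValueError("no residue pairs to evaluate")
    hits = sum(1 for pair in selected if native_dist[pair] < cutoff)
    return hits / len(selected)
```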

For the native-oriented assessment, the five metrics defined above can also be calculated.

DLDDT is calculated similarly to the model quality measure LDDT. Here Ri is the set of residues within 20 Å of the i-th residue and with a sequence separation of at least 12.
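An LDDT-style score on distances might look like the sketch below. Several choices are assumptions, not taken from this section: Ri is built from native distances, the four tolerance thresholds (0.5, 1, 2 and 4 Å) are borrowed from the standard LDDT definition, fractions are pooled over all scored pairs, and pairs without a prediction count as not preserved. The name `dlddt` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]


def dlddt(native: Dict[Pair, float], predicted: Dict[Pair, float],
          radius: float = 20.0, min_sep: int = 12,
          thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Average over thresholds of the fraction of scored pairs whose distance is preserved.

    A pair (i, j) is scored when j is in Ri: native distance <= radius and
    sequence separation >= min_sep.
    """
    scored = [p for p, D in native.items() if D <= radius and abs(p[0] - p[1]) >= min_sep]
    if not scored:
        raise ValueError("no residue pairs satisfy the Ri criteria")
    fractions = []
    for t in thresholds:
        preserved = sum(1 for p in scored if p in predicted and abs(native[p] - predicted[p]) < t)
        fractions.append(preserved / len(scored))
    return sum(fractions) / len(fractions)
```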

For the full-list assessment, the three metrics defined above can also be calculated.

Macro fuzzy certainty (MFC) is an extension of the fuzzy certainty metric defined earlier. First, the fuzzy certainty is calculated separately for each class (distance bin).

Here Sk is the set of residue pairs that fall in the k-th distance bin according to the native structure, and Pk(i, j) is the predicted probability of the k-th class for the residue pair (i, j).

The MFC is then the average of the per-class fuzzy certainties over all classes.
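Note that Sk here groups pairs by their native bin, unlike the subsets used for fuzzy precision. The sketch below assumes the per-class fuzzy certainty is the mean of Pk(i, j) + 0.5 × (Pk-1(i, j) + Pk+1(i, j)) over Sk, and that only classes with at least one native pair enter the average; both are assumptions, as is the hypothetical name `macro_fuzzy_certainty`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def macro_fuzzy_certainty(all_probs: Sequence[Sequence[float]], native_bins: Sequence[int]) -> float:
    """Average over native classes of the mean fuzzy certainty within each class."""
    per_class: Dict[int, List[float]] = defaultdict(list)
    for probs, k in zip(all_probs, native_bins):
        idx = k - 1  # 0-based index of the native bin
        fc = probs[idx]
        if idx > 0:
            fc += 0.5 * probs[idx - 1]
        if idx + 1 < len(probs):
            fc += 0.5 * probs[idx + 1]
        per_class[k].append(fc)
    class_fc = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_fc) / len(class_fc)
```

Because every populated class has equal weight, a rare class (for example a few pairs in bin10) influences MFC as much as a common one, which is the point of the macro average.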