README.md: 28 additions & 26 deletions
@@ -16,11 +16,10 @@ Ligand-binding site prediction based on machine learning.
16
16
17
17
### Description
18
18
19
-
P2Rank is a stand-alone command line program that predicts ligand-binding pockets from a protein structure.
20
-
It achieves high prediction success rates without relying on an external software for computation of complex features
21
-
or on a database of known protein-ligand templates.
19
+
P2Rank is a stand-alone command-line program for fast and accurate prediction of ligand-binding sites from protein structures.
20
+
It achieves high prediction success rates without relying on external software for computation of complex features or on a database of known protein-ligand templates.
22
21
23
-
### 📰 What's new?
22
+
### ✨ What's new?
24
23
25
24
* Version **2.5** brings speed optimizations (~2x faster prediction), ChimeraX visualizations, and improvements to rescoring (`fpocket-rescore` command).
26
25
* Version **2.4.2** adds support for BinaryCIF (`.bcif`) input and rescoring of fpocket predictions in `.cif` format.
@@ -52,8 +51,8 @@ See more usage examples below...
52
51
### Algorithm
53
52
54
53
P2Rank makes predictions by scoring and clustering points on the protein's solvent accessible surface.
55
-
Ligandability score of individual points is determined by a machine learning based model trained on the dataset of known protein-ligand complexes.
56
-
For more details see the slides and publications.
54
+
Ligandability score of individual points is determined by a machine learning model trained on a dataset of known protein-ligand complexes.
55
+
For more details, see the slides and publications.
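The scoring-and-clustering idea described above can be sketched in a few lines of Python. This is only an illustration, not P2Rank's implementation: the function name `cluster_and_score`, the score threshold `min_score`, the linkage distance `link_dist`, and the choice to rank pockets by the sum of squared point scores are all assumptions made for this example. The per-point scores are taken as given; in P2Rank they would come from the machine learning model.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_and_score(
    points: Sequence[Point],
    scores: Sequence[float],
    min_score: float = 0.5,   # assumed threshold, for illustration only
    link_dist: float = 3.0,   # assumed linkage distance, for illustration only
) -> List[Tuple[float, List[int]]]:
    """Group high-scoring points by single linkage and rank the groups."""
    # Keep only points whose score passes the threshold.
    keep = [i for i, s in enumerate(scores) if s >= min_score]
    unvisited = set(keep)
    pockets: List[List[int]] = []

    for seed in keep:
        if seed not in unvisited:
            continue
        unvisited.remove(seed)
        stack = [seed]
        members: List[int] = []
        # Flood-fill: a point joins the cluster if it lies within
        # link_dist of any point already in the cluster.
        while stack:
            i = stack.pop()
            members.append(i)
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= link_dist]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
        pockets.append(sorted(members))

    # Assumed aggregate: sum of squared point scores per cluster.
    scored = [(sum(scores[i] ** 2 for i in m), m) for m in pockets]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

The function returns `(pocket_score, point_indices)` pairs ordered from the highest-scoring cluster to the lowest, which mirrors the idea of a ranked list of predicted pockets.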
57
56
58
57
Presentation slides introducing the original version of the algorithm: [Slides (pdf)](https://bit.ly/p2rank-slides)
59
58
@@ -107,17 +106,17 @@ prank predict -c alphafold test.ds # use alphafold config and model (confi
107
106
108
107
### Prediction output
109
108
110
-
For each structure file `{struct_file}` in the dataset, P2Rank produces several output files:
111
-
* `{struct_file}_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates
112
-
of their centers together with a list of adjacent residues, list of adjacent protein surface atoms, and a calibrated probability of being a ligand-binding site.
113
-
* `{struct_file}_residues.csv`: contains a list of all residues from the input protein with their scores,
114
-
mapping to predicted pockets, and a calibrated probability of being a ligand-binding residue.
115
-
* PyMol and ChimeraX visualizations in `visualizations/` directory (`.pml` and `.cxc` scripts with data files in `data/`).
116
-
* Generating visualizations can be turned off with the `-visualizations 0` parameter.
117
-
* `-vis_renderers 'pymol,chimerax'` parameter can be used to turn individual visualization renderers on/off.
118
-
* `-vis_copy_proteins 0` parameter can be used to turn off copying of protein structures to the visualizations directory (faster, but visualizations won't be portable).
119
-
* Coordinates and ligandability scores of SAS points can be found in `visualizations/data/{struct_file}_points.pdb.gz`. Here, the "Residue sequence number" (23-26 of HETATM record)
120
-
is the rank of the corresponding pocket (0 means the point doesn't belong to any pocket) and the b-factor column corresponds to the ligandability score.
109
+
For each structure file `{struct_file}` in the dataset, P2Rank generates several output files:
110
+
* `{struct_file}_predictions.csv`: lists **predicted pockets** in order of score, including each pocket's score, center coordinates, adjacent residues, adjacent protein surface atoms, and a calibrated probability of being a ligand-binding site.
111
+
* `{struct_file}_residues.csv`: lists **all residues** from the input protein along with their scores, mapping to predicted pockets, and a calibrated probability of being a ligand-binding residue.
112
+
* **PyMol and ChimeraX visualizations**: `.pml` and `.cxc` scripts in `visualizations/` directory with additional files in `data/`.
113
+
* Optional settings:
114
+
* Use `-visualizations 0` to disable visualization generation.
115
+
* Use `-vis_renderers 'pymol,chimerax'` to toggle specific renderers on/off.
116
+
* Use `-vis_copy_proteins 0` to prevent copying protein structures to the visualizations directory (faster, but visualizations won't be portable).
117
+
* **SAS points data**: coordinates and ligandability scores for solvent-accessible surface (SAS) points are saved in `visualizations/data/{struct_file}_points.pdb.gz`. Here:
118
+
* Residue sequence number (position 23-26) represents the pocket rank (0 indicates no pocket).
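As a rough illustration of how the SAS points file described above could be read, here is a minimal Python sketch. It assumes the file follows the standard fixed-column PDB layout for `HETATM` records (residue sequence number in columns 23-26, coordinates in columns 31-54, b-factor in columns 61-66); the names `SasPoint`, `parse_sas_points`, and `read_sas_points` are hypothetical helpers introduced for this example and are not part of P2Rank.

```python
import gzip
from typing import Iterable, List, NamedTuple


class SasPoint(NamedTuple):
    pocket_rank: int  # 0 means the point does not belong to any pocket
    x: float
    y: float
    z: float
    score: float      # ligandability score, read from the b-factor column


def parse_sas_points(lines: Iterable[str]) -> List[SasPoint]:
    """Parse HETATM records, assuming standard PDB fixed columns."""
    points = []
    for line in lines:
        if not line.startswith("HETATM"):
            continue  # skip any other record types
        points.append(SasPoint(
            pocket_rank=int(line[22:26]),  # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
            score=float(line[60:66]),      # columns 61-66 (b-factor)
        ))
    return points


def read_sas_points(path: str) -> List[SasPoint]:
    """Read a gzip-compressed points file such as {struct_file}_points.pdb.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_sas_points(handle)
```

For example, `[p for p in read_sas_points(path) if p.pocket_rank == 1]` would select the points assigned to the top-ranked pocket, where `path` points at a `_points.pdb.gz` file in `visualizations/data/`.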
@@ -143,13 +142,6 @@ To see the complete commented list of all (including undocumented)
143
142
parameters see [Params.groovy](https://github.com/rdk/p2rank/blob/develop/src/main/groovy/cz/siret/prank/program/params/Params.groovy) in the source code.
prank rescore fpocket.ds -c rescore_2024 # use new experimental rescoring model (recommended for alphafold models)
177
-
178
-
prank eval-rescore fpocket.ds # evaluate rescoring model on a dataset with known ligands
179
169
~~~
180
170
181
171
For rescoring, the dataset file needs to have a specific 2-column format. See examples in `test_data/`: `fpocket.ds`, `concavity.ds`, `puresnet.ds`.
@@ -197,6 +187,18 @@ In this case, the dataset file can be a simple list of pdb/cif files since Fpock
197
187
`prank fpocket-rescore` will produce `predictions.csv` as well, so it can be used as an in-place replacement for `prank predict` in most scenarios.
198
188
Note: if you use `fpocket-rescore`, please cite Fpocket as well.
199
189
190
+
### Evaluate prediction and rescoring models
191
+
192
+
Use the following commands to calculate prediction metrics (prediction success rates using DCA, DCC, ...) on structure files where the ligands are present.
193
+
194
+
~~~ruby
195
+
prank eval-predict -f test_data/1fbl.pdb # evaluate default prediction model on a single file
196
+
prank eval-predict test.ds # evaluate default prediction model on a dataset with known ligands
197
+
prank eval-predict -c alphafold test.ds # evaluate specific prediction model on a dataset with known ligands
198
+
199
+
prank eval-rescore fpocket.ds # evaluate default rescoring model on a dataset with known ligands
200
+
prank eval-rescore -c rescore_2024 fpocket.ds # evaluate specific rescoring model on a dataset with known ligands