~ This page contains common PanDDA strategies for getting the best out of your experimental data ~

 Having problems with an analysis? Try the tips & strategies below 




PanDDA Processing Strategies/Help:

 data preparation 

XChem Explorer is fragment-screening software that uses pandda to identify bound ligands.

It manages the auto-processing, import, analysis, and refinement of datasets for small-, medium-, or large-scale crystallographic studies.


A common cause of errors during pandda analysis is a lack of 100% completeness in the input MTZ files.

Of course, the completeness of observed reflections (Fo) is not always 100%, but the maps that pandda analyses (2mFo-DFc maps) need to be 100% complete.

Missing reflections in the 2mFo-DFc maps increase the uncertainty of all of the datasets in an analysis, and appear as ripples in the Z-maps and event maps. Both of these features decrease our ability to identify and model bound ligands and other interesting features.


We can circumvent the problem of missing Fo reflections by replacing the 2mFo-DFc reflections where Fo is missing with "DFc" reflections (our best estimate of Fo).

You can do this in two steps:

  1. Populate the missing Fo reflections with n/a values
  2. Recalculate the 2mFo-DFc columns, filling in missing 2mFo-DFc with DFc values.

Step 1:

Populate the missing reflections with CAD:

cad hklin1 INPUT.mtz hklout OUTPUT.mtz <<eof
 monitor BRIEF
 labin file 1 -
  ALL
 resolution file 1 999.0 HIGHRES
eof

...where INPUT.mtz is the input file, OUTPUT.mtz is the output file, and HIGHRES is the high-resolution limit of the data (e.g. 1.61 Å). If your unit cell is larger than 999.0 Å, you'll need to increase the number 999.0 accordingly, but 999.0 should be fine for almost all crystallographic studies.


Step 2:

Populate the missing 2mFo-DFc reflections with their estimated values:

This step is easy; there are several options:

  • Recalculate the 2mFo-DFc coefficients with phenix.maps

    phenix.maps MODEL.pdb OUTPUT.mtz

    ...which will create a file called OUTPUT_map_coeffs.mtz containing the columns 2FOFCWT_fill and PH2FOFCWT_fill, which are the 2mFo-DFc amplitudes and phases with missing values filled in with their estimates -- use this file as input to pandda.analyse. If these columns aren't in the output file, you may need to supply additional parameters to phenix.maps -- see the phenix documentation.

  • Or re-run refinement with phenix.refine or refmac against the new MTZ file (OUTPUT.mtz). Both refinement programs output filled 2mFo-DFc structure factors that will be picked up by pandda.analyse (a minimal refmac sketch is shown after this list).

    phenix.refine MODEL.pdb OUTPUT.mtz
  • Or re-run refinement with dimple against the new MTZ file (OUTPUT.mtz).

    dimple MODEL.pdb OUTPUT.mtz OUTPUT_FOLDER
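
For refmac, a minimal sketch of such a refinement run is shown below. The LABIN column names, the output file names, and the number of cycles are illustrative and should be adapted to your data; by default refmac writes the filled 2mFo-DFc coefficients to the FWT/PHWT columns of the output MTZ file.

refmac5 hklin OUTPUT.mtz xyzin MODEL.pdb hklout REFINED.mtz xyzout REFINED.pdb <<eof
 labin FP=F SIGFP=SIGF FREE=FreeR_flag
 ncyc 10
 end
eof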

If these datasets still cause errors in PanDDA...

Sometimes, for reasons that are not entirely clear, it is also necessary to run uniqueify (from CCP4) prior to running CAD in step 1.

uniqueify -f FREE_R_FLAG_NAME INPUT.mtz OUTPUT.mtz

...where FREE_R_FLAG_NAME is the name of the free-R flag column in your input MTZ file (e.g. FreeR_flag). You then provide OUTPUT.mtz to CAD in step 1, as above.
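
Putting the two tools together, the order of operations is sketched below; UNIQUE.mtz is just an illustrative name for the intermediate file, and FreeR_flag is an example free-R column name.

uniqueify -f FreeR_flag INPUT.mtz UNIQUE.mtz
cad hklin1 UNIQUE.mtz hklout OUTPUT.mtz <<eof
 monitor BRIEF
 labin file 1 -
  ALL
 resolution file 1 999.0 HIGHRES
eof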


 pandda.analyse 

These strategies are for the situation where you have collected a series of ground-state (unbound) datasets as well as a series of ligand-added datasets.

Define a set of ground-state datasets (pandda v0.2.12 and higher only)

If you have a set of known ground-state datasets (e.g. where no ligand was added), then you can provide these dataset IDs to pandda.

This will mean that pandda only uses these datasets for characterising the ground-state density.

ground_state_datasets="id_010,id_011,id_012,..."

(This is a shortcut for setting exclude_from_characterisation for all non-ground-state datasets.)

If using pandda v0.2, you will need to use exclude_from_characterisation instead (as described below).
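
For pandda v0.2.12 and higher, a minimal sketch of how this option might be added to an otherwise normal pandda.analyse command (the data_dirs path and dataset ids are illustrative; keep whatever other arguments you normally use):

pandda.analyse data_dirs="/path/to/processing/*" ground_state_datasets="id_010,id_011,id_012"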

Excluding ligand-added datasets from map characterisation

If you have a set of known ground-state datasets (e.g. where no ligand was added), then you should exclude all ligand-added datasets from characterisation with:

exclude_from_characterisation="id_001,id_002,id_003,..."

where all "id_XXX" are dataset ids for ligand-added datasets.

Excluding ground-state datasets from z-map analysis

Excluding all of the known ground-state ("ligand-not-added") datasets from Z-map analysis will save time and will not hurt the results:

exclude_from_z_map_analysis="id_010,id_011,id_012,..."
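
In practice the two exclusion options are often combined in a single run; a sketch, with illustrative dataset ids and path, and your other usual arguments left unchanged:

pandda.analyse data_dirs="/path/to/processing/*" \
    exclude_from_characterisation="id_001,id_002,id_003" \
    exclude_from_z_map_analysis="id_010,id_011,id_012"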

For a fragment screen with a high hit-rate, the inclusion of the bound datasets in the map characterisation will begin to negatively affect the analysis.

A modified pandda protocol for identifying bound datasets

Where this is the case, the pandda analysis should be run using an iterative outlier-rejection approach.

  1. Run pandda with z_map_type="uncertainty" (pandda v0.3) or z_map.map_type="uncertainty" (pandda v0.2).
    This prevents the pandda variation analysis from suppressing signal in regions with a high hit-rate; it will produce more false positives, but it will also reject fewer true positives (bound ligands).
  2. Analyse the results of this pandda and record a list of identified datasets with bound ligands, or "significant features" (such as bound metal or a sidechain movement).
    Note: The event maps may not be perfect at this stage, so record any dataset whose density looks at all like a ligand.
  3. Re-run the pandda in a new output directory with exclude_from_characterisation="id_001,id_002,id_003,..." for the identified datasets.
    (Keep using z_map.map_type="uncertainty" from step 1.)
  4. Add the newly identified bound/interesting datasets to the list of dataset ids to exclude from characterisation.
  5. Repeat steps 3-4 until no new datasets are identified.
  6. Run another new pandda without the z_map.map_type flag, but still with the full exclude_from_characterisation="<list-of-bound-dataset-ids>"
  7. Hopefully, beautiful event maps!
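
As a sketch, the sequence of runs for this protocol might look like the following (output directory names, dataset ids, and the data_dirs path are illustrative, and the z-map flag spelling depends on your pandda version, as noted in step 1):

pandda.analyse data_dirs="/path/to/processing/*" out_dir="./pandda_round_1" \
    z_map.map_type="uncertainty"

pandda.analyse data_dirs="/path/to/processing/*" out_dir="./pandda_round_2" \
    z_map.map_type="uncertainty" \
    exclude_from_characterisation="id_001,id_002,id_003"

pandda.analyse data_dirs="/path/to/processing/*" out_dir="./pandda_final" \
    exclude_from_characterisation="id_001,id_002,id_003,id_004"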

You can often re-run pandda.analyse without needing to repeat the computationally intensive parts of the analysis.

To reuse the output from a previous pandda analysis, set out_dir to that analysis's output folder.

Reprocessing old datasets (pandda v0.2 and higher)

pandda.analyse keeps a record of which datasets have been previously loaded by the program. How these "old" datasets are treated by pandda is controlled by the flag existing_datasets=....

  • If existing_datasets=reload then the datasets will be loaded but not re-analysed; any previously identified events in reloaded datasets will be put into the output CSV.
  • If existing_datasets=reprocess then old datasets are considered as new datasets, and will be re-analysed fully.
  • If you only want to re-analyse a selection of datasets, these can be specified with reprocess_datasets="id_001,id_002..."
  • If existing_datasets=ignore then old datasets are ... ignored.
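
For example, pointing a new run at a previous output directory (paths and dataset ids are illustrative, and exactly how reprocess_datasets interacts with existing_datasets may depend on your pandda version):

pandda.analyse data_dirs="/path/to/processing/*" out_dir="/path/to/previous/pandda" \
    existing_datasets=reprocess

pandda.analyse data_dirs="/path/to/processing/*" out_dir="/path/to/previous/pandda" \
    reprocess_datasets="id_001,id_002"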

Re-ordering the results for pandda.inspect (pandda v0.2 and higher)

Re-running pandda.analyse with existing_datasets=reload but changing the parameter given to events.sort_by allows you to re-order the results for pandda.inspect without re-processing anything.
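
For example (SORT_KEY is a placeholder; see the pandda documentation for the values accepted by events.sort_by):

pandda.analyse data_dirs="/path/to/processing/*" out_dir="/path/to/previous/pandda" \
    existing_datasets=reload events.sort_by=SORT_KEY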

Adding new datasets to an existing analysis

You can add new datasets to an analysis by adding them to the data_dirs input folder, and running pandda.analyse with the same commands as the initial run.
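
For example, if each dataset lives in its own sub-folder of the data_dirs location, adding a new dataset is just a matter of creating another sub-folder whose file names match the existing ones, then re-running pandda.analyse as before (the paths and the dimple.pdb/dimple.mtz names below are purely illustrative and should match your own layout):

mkdir /path/to/processing/id_100
cp new_model.pdb /path/to/processing/id_100/dimple.pdb
cp new_data.mtz /path/to/processing/id_100/dimple.mtz
pandda.analyse data_dirs="/path/to/processing/*" out_dir="/path/to/previous/pandda"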

Reusing/recalculating statistical maps from a previous analysis

How statistical maps are used/(re-)calculated is controlled by recalculate_statistical_maps=....

  • If recalculate_statistical_maps=yes, then all previously-calculated statistical maps are deleted and new maps are calculated.
  • If recalculate_statistical_maps=no, then only existing statistical maps are used for analysis.
  • If recalculate_statistical_maps=extend, then existing statistical maps are used for analysis, but new maps are calculated at higher and lower resolutions as required.
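
As a sketch, re-analysing existing datasets against previously calculated statistical maps might look like this (paths illustrative):

pandda.analyse data_dirs="/path/to/processing/*" out_dir="/path/to/previous/pandda" \
    existing_datasets=reprocess recalculate_statistical_maps=no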

 modelling and refinement 

Re-running pandda.export

pandda.export can be run multiple times on the same directory. If the models from pandda.inspect have been changed, the output files are updated; you can therefore model ligands in pandda.inspect in stages, and simply re-run pandda.export.

However, if pandda.export is set to generate restraints, this may overwrite existing restraints in the output folder.