~ This page contains common PanDDA strategies for getting the best out of your experimental data ~
Having problems with an analysis? Try the tips and strategies below.
This software manages the auto-processing, import, analysis, and refinement of small, medium or large-scale crystallographic analyses.
A common error during pandda analysis is a lack of 100% completeness in the input MTZ files.
Of course, the completeness of observed reflections (Fo) is not always 100%, but the maps that pandda analyses (2mFo-DFc maps) need to be 100% complete.
Missing reflections in the 2mFo-DFc maps increase the uncertainty of all of the datasets in an analysis, and appear as ripples in the Z-maps and event maps. Both of these features decrease our ability to identify and model bound ligands and other interesting features.
We can circumvent the problem of missing Fo reflections by replacing the 2mFo-DFc reflections where Fo is missing with "DFc" reflections (our best estimate of Fo).
You can do this in two steps:
Populate the missing reflections with CAD:
cad hklin1 INPUT.mtz hklout OUTPUT.mtz <<eof
labin file 1 ALLIN
resolution file 1 999.0 HIGHRES
eof
...where INPUT.mtz is the input file, OUTPUT.mtz is the output file, and HIGHRES is the high resolution limit of the data (e.g. 1.61Å). If your unit cell is larger than 999.0Å, you'll need to increase the number 999.0 accordingly, but it should be fine for almost all crystallographic studies.
Populate the missing 2mFo-DFc reflections with their estimated values:
This one is easy. You can either:
Recalculate the 2mFo-DFc coefficients with phenix.maps:
phenix.maps MODEL.pdb OUTPUT.mtz
...which will create a file called OUTPUT_map_coeffs.mtz containing columns 2FOFCWT_fill and PH2FOFCWT_fill, which are the 2FOFC amplitudes and phases with missing values filled with their estimates. Use this file as input to pandda.analyse. If these columns aren't in the output file, you may need to supply additional parameters to phenix.maps; see the phenix documentation.
Or re-run refinement with phenix.refine or refmac against the new mtz file (OUTPUT.mtz). These two refinement programs will output filled 2mFo-DFc structure factors that will be picked up by pandda.analyse.
phenix.refine MODEL.pdb OUTPUT.mtz
Or re-run refinement with dimple against the new mtz file (OUTPUT.mtz).
dimple MODEL.pdb OUTPUT.mtz OUTPUT_FOLDER
Sometimes, for reasons that are not fully understood, it is also necessary to run uniqueify (from ccp4) before running CAD in step 1.
uniqueify -f FREE_R_FLAG_NAME INPUT.mtz OUTPUT.mtz
You then provide the OUTPUT.mtz written by uniqueify as the input file to CAD in step 1, as above.
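The steps above can be chained for each dataset. The following is a minimal dry-run sketch, not a tested pipeline: it only prints the commands for one dataset so they can be checked before being run by hand. The function name print_fill_commands, the intermediate file names (*_unique.mtz, *_filled.mtz) and the example arguments (including the column name FreeR_flag) are assumptions made for illustration, and `labin file 1 ALLIN` is assumed here as the CAD keyword for copying all columns.

```shell
# Dry run: print (do not execute) the commands for one dataset.
# Usage: print_fill_commands INPUT.mtz MODEL.pdb HIGHRES FREE_R_FLAG_NAME
print_fill_commands() {
    in_mtz=$1; model=$2; highres=$3; free_flag=$4
    base=${in_mtz%.mtz}
    unique_mtz=${base}_unique.mtz    # hypothetical intermediate file name
    filled_mtz=${base}_filled.mtz    # hypothetical intermediate file name

    # Optional pre-step: uniqueify (only needed for some datasets)
    echo "uniqueify -f $free_flag $in_mtz $unique_mtz"

    # Step 1: populate the missing reflections with CAD
    echo "cad hklin1 $unique_mtz hklout $filled_mtz <<eof"
    echo "labin file 1 ALLIN"
    echo "resolution file 1 999.0 $highres"
    echo "eof"

    # Step 2: recalculate the filled map coefficients
    echo "phenix.maps $model $filled_mtz"
}

# Example with made-up file names and a 1.61 Angstrom high resolution limit
print_fill_commands x0001.mtz x0001.pdb 1.61 FreeR_flag
```

Printing rather than executing keeps the sketch safe to try; once the printed commands look right for one dataset, they can be pasted into a terminal or adapted into a loop.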
This strategy is for dealing with a situation where you've collected a series of ground-state (unbound) datasets and a series of ligand-added datasets.
If you have a set of known ground-state datasets (e.g. where no ligand was added), then you can provide these dataset IDs to pandda.
This means that pandda uses only these datasets to characterise the ground-state density.
(This is a shortcut to running exclude_from_characterisation for all non-ground-state datasets.)
If using pandda v0.2, you will need to use exclude_from_characterisation instead (as described below).
If you have a set of known ground-state datasets (e.g. where no ligand was added), then you should exclude all ligand-added datasets from characterisation with:
exclude_from_characterisation="id_XXX,id_XXX,id_XXX,..."
where each "id_XXX" is the dataset ID of a ligand-added dataset.
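For more than a handful of datasets, the comma-separated list is tedious to type by hand. A minimal sketch, assuming the ligand-added dataset IDs are passed as arguments; the helper name join_ids and the IDs id_001 to id_003 are illustrative only:

```shell
# Join dataset IDs into the comma-separated form expected by
# exclude_from_characterisation.
join_ids() {
    list=""
    for id in "$@"; do
        # Prefix with the existing list and a comma, except for the first ID
        list="${list:+$list,}$id"
    done
    printf '%s\n' "$list"
}

ligand_ids=$(join_ids id_001 id_002 id_003)
# prints: exclude_from_characterisation="id_001,id_002,id_003"
echo "exclude_from_characterisation=\"$ligand_ids\""
```

The printed argument can then be appended to the pandda.analyse command line.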
It will save time in the analysis, but will not hurt the results, to exclude all of the known "ground-state"/"ligand-not-added" datasets from analysis:
For a fragment screen with a high hit-rate, the inclusion of the bound datasets in the map characterisation will begin to negatively affect the analysis.
Where this is the case, the pandda analysis should be run using an iterative outlier-rejection approach.
z_map_type="uncertainty" (pandda v0.3) or
exclude_from_characterisation="id_001,id_002,id_003,..." for the identified datasets.
z_map.map_type flag, but still with the full
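Because each round of outlier rejection adds to the list passed to exclude_from_characterisation, it can help to keep that list cumulative between runs. A sketch, assuming the hits from each round are noted down by hand as comma-separated dataset IDs; the helper name merge_id_lists and the IDs shown are hypothetical:

```shell
# Merge two comma-separated ID lists, dropping duplicates and empty entries.
merge_id_lists() {
    printf '%s\n' "$1,$2" | tr ',' '\n' | sed '/^$/d' | sort -u | paste -sd, -
}

previous="id_001,id_002"    # excluded in the last run
new_hits="id_002,id_007"    # identified in the last run
full_list=$(merge_id_lists "$previous" "$new_hits")
# prints: exclude_from_characterisation="id_001,id_002,id_007"
echo "exclude_from_characterisation=\"$full_list\""
```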
You can often re-run pandda.analyse without needing to repeat the computationally intensive parts of the analysis.
You use the output from a previous pandda analysis by setting out_dir to the output pandda folder.
pandda.analyse keeps a record of which datasets have been previously loaded by the program. How these "old" datasets are treated by pandda is controlled by the flag existing_datasets:
If existing_datasets=reload, then the datasets will be loaded but not re-analysed; any previously identified events in reloaded datasets will be put into the output csv.
If existing_datasets=reprocess, then old datasets are considered as new datasets, and will be re-analysed fully.
If existing_datasets=ignore, then old datasets are ... ignored.
Re-running pandda.analyse with existing_datasets=reload but changing the parameter given to events.sort_by allows you to re-order the results for pandda.inspect without re-processing anything.
You can add new datasets to an analysis by adding them to the data_dirs input folder, and running pandda.analyse with the same commands as the initial run.
How statistical maps are used/(re-)calculated is controlled by recalculate_statistical_maps:
If recalculate_statistical_maps=yes, then all previously-calculated statistical maps are deleted and new maps are calculated.
If recalculate_statistical_maps=no, then only existing statistical maps are used for analysis.
If recalculate_statistical_maps=extend, then existing statistical maps are used for analysis, but new maps are calculated at higher and lower resolutions as required.
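A mistyped value for either re-run flag is easy to miss on a long command line. The sketch below checks both values against the options listed above before printing the extra arguments; the function name rerun_flags and the placeholder INITIAL_ARGS (standing in for whatever arguments the initial run used) are assumptions for illustration, and nothing is executed.

```shell
# Print the extra arguments for a re-run; refuse values not listed above.
# Usage: rerun_flags EXISTING_DATASETS_VALUE RECALCULATE_MAPS_VALUE
rerun_flags() {
    case $1 in
        reload|reprocess|ignore) ;;
        *) echo "unknown existing_datasets value: $1" >&2; return 1 ;;
    esac
    case $2 in
        yes|no|extend) ;;
        *) echo "unknown recalculate_statistical_maps value: $2" >&2; return 1 ;;
    esac
    echo "existing_datasets=$1 recalculate_statistical_maps=$2"
}

# Example: keep old results and only extend the statistical maps as needed.
echo "pandda.analyse INITIAL_ARGS $(rerun_flags reload extend)"
```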
pandda.export can be run multiple times on the same directory. If the models from pandda.inspect have been changed, the output files are updated; you can therefore model ligands in pandda.inspect in stages, and simply re-run pandda.export.
However, if pandda.export is set to generate restraints, this may overwrite existing restraints in the output folder.