EDIA Documentation
Ediascorer takes a PDB or mmCIF structure file and a CCP4 electron density grid file as input and computes the EDIA of all atoms and the EDIAm of all residues and molecules.
Ediascorer -t 1g9v.pdb -d 1g9v.ccp4 -o out/
The reports are given in one annotated PDB and two CSV files:
• PDB file annotated with EDIA and error analysis in the B factor and occupancy columns
• CSV files with
– Atom scores (*atomscores.csv):
Substructure identifier | Atom name | Substructure name | PDB infile ID | Structure ID | Chain or structure ID | EDIA | EDIA error analysis | B factor | Occupancy | RSCC (if requested)
– Structure scores (*structuresscores.csv):
Substructure identifier | Name | Protein position or molecule ID | Chain | EDIAm | Min EDIA | Median EDIA | Max EDIA | % degree of neighborhood conformity | # good substructures | % overall percentage of well-resolved interconnected atoms (OPIA) | # atoms with no problem | # atoms with clash | # atoms with not enough density | # atoms with too much density | Mean B factor | # atoms with occupancy below 1 | RSCC (if requested)
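As an illustration of automated processing, the structure score file could be filtered for low-scoring substructures roughly as follows. This is a minimal sketch under assumptions: the delimiter, the exact spelling of the header fields, and the cutoff of 0.8 applied to the EDIAm are not confirmed by this documentation, and the sample rows are made up for the example. The function name low_scoring_substructures is hypothetical.

```python
import csv
import io


def low_scoring_substructures(csv_text, score_column="EDIAm",
                              threshold=0.8, delimiter=","):
    """Return all rows whose score lies below the threshold.

    score_column, threshold and delimiter are assumptions; adjust them
    to the header and format of the file that was actually written.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    flagged = []
    for row in reader:
        try:
            score = float(row[score_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a parsable score
        if score < threshold:
            flagged.append(row)
    return flagged


# Made-up sample rows with a reduced, assumed header.
sample = (
    "Substructure identifier,Name,EDIAm\n"
    "r,ALA,0.95\n"
    "l,LIG,0.42\n"
)
flagged = low_scoring_substructures(sample)
for row in flagged:
    print(row["Substructure identifier"], row["Name"], row["EDIAm"])
```

For a file on disk, the text can be read with open(path).read() and passed to the same function.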
The structure score file helps to identify problematic substructures by listing the EDIAm of every residue and molecule together with additional error analysis, marked with the substructure identifiers r and m. Because water molecules and metal ions contain only one heavy atom, both are listed only in the atom score CSV file, marked with w and t, together with all other heavy atoms of the given complex. The CSV or PDB file with the atomic EDIAs can subsequently be used to visualize and analyze the substructure with, e.g., Chimera.
To allow rapid conformer validation, an additional SDF or MOL2 file can be specified. The PDB complex is then evaluated only once, and all ligand scores are subsequently added to the structure score CSV file, marked with l. If a specific ligand position k in the ligand file is additionally supplied, the EDIAscorer creates the folder k in the specified output directory. In addition to the full evaluation of the complex in the output directory, the binding pocket of the kth ligand is evaluated together with the kth ligand. An additional structure score CSV file, an atom score CSV file, the annotated PDB file and a file that assists in visualizing the ligand in Chimera are therefore written to the created subdirectory. If an in-house PDB file without a properly defined resolution in the PDB header is evaluated, the resolution can be supplied via -u.
The structure score CSV file aims to provide an automatically processable file for analyzing the quality of, e.g., a ligand without visual inspection. The EDIAm is therefore complemented with the minimum, median and maximum atomic EDIA of the structure. The fluctuation between neighboring atomic EDIA scores is given in the entry "degree of neighborhood conformity" c (Equation 3). It sums the absolute EDIA score differences over the set B of bonds that connect two heavy atoms and normalizes the sum by the number of bonds:

$$c = \frac{1}{\lVert B \rVert} \sum_{b \in B} \left| \mathrm{EDIA}(b.\mathrm{atom1}) - \mathrm{EDIA}(b.\mathrm{atom2}) \right| \qquad (3)$$
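Equation 3 can be sketched in a few lines of Python. This is an illustration, not the EDIAscorer implementation: the function name, the dictionary of atomic EDIA values and the bond list are hypothetical inputs, the EDIA values are made up, and whether the tool rescales c to a percentage for the CSV column is not covered here.

```python
def neighborhood_conformity(edia, bonds):
    """Mean absolute EDIA difference over all heavy-atom bonds (Equation 3).

    edia  -- dict mapping an atom label to its EDIA score (assumed input)
    bonds -- list of (atom1, atom2) pairs between heavy atoms (assumed input)
    """
    if not bonds:
        raise ValueError("at least one bond is required")
    total = sum(abs(edia[a] - edia[b]) for a, b in bonds)
    return total / len(bonds)


# Made-up EDIA values for a three-atom chain C1-C2-O1.
edia = {"C1": 0.9, "C2": 0.7, "O1": 0.4}
bonds = [("C1", "C2"), ("C2", "O1")]
c = neighborhood_conformity(edia, bonds)
print(round(c, 3))
```

A small c means that bonded atoms are supported about equally well by the density; a large c points to strong fluctuation along the bonds.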
Since some ligands fit the electron density very well only in parts, it is of interest to see whether the ligand is partitioned into substructures of at least two atoms that are well supported (EDIA of at least 0.8). Following c, the next entry lists the number of such substructures together with the percentage of atoms in these substructures relative to the number of heavy atoms in the ligand (overall percentage of well-resolved interconnected atoms: OPIA). Next, the number of atoms without any flaws according to the EDIA error analysis is given, followed by the number of atoms with electron density sphere clashes, with not enough density in the inner electron density sphere s(a), and with too much density in the outer electron density sphere d(a). This makes it possible to discriminate between ligands with a low EDIAm due to poor support throughout the substructure and ligands that are partially well supported. The entry for the ligand closes with its mean B factor, the number of atoms with an occupancy below 1 and, if requested, the RSCC.
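The number of well-supported substructures and the OPIA described above can be sketched as a connected-component search. The sketch assumes that a substructure is a set of at least two atoms with EDIA of at least 0.8 that are connected through bonds between such atoms, and that the dictionary contains heavy atoms only. The function name and the input values are hypothetical, and the actual EDIAscorer implementation may differ in detail.

```python
def well_resolved_substructures(edia, bonds, threshold=0.8, min_size=2):
    """Return (number of good substructures, OPIA in percent).

    edia  -- dict mapping heavy-atom labels to EDIA scores (assumed input)
    bonds -- list of (atom1, atom2) heavy-atom bonds (assumed input)
    """
    good = {atom for atom, score in edia.items() if score >= threshold}
    # Keep only bonds whose two atoms are both well supported.
    adjacency = {atom: set() for atom in good}
    for a, b in bonds:
        if a in good and b in good:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    n_substructures = 0
    n_atoms = 0
    for start in good:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            atom = stack.pop()
            size += 1
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if size >= min_size:
            n_substructures += 1
            n_atoms += size

    opia = 100.0 * n_atoms / len(edia) if edia else 0.0
    return n_substructures, opia


# Made-up ligand: C3 is poorly supported, N1 is good but isolated.
edia = {"C1": 0.90, "C2": 0.85, "C3": 0.50, "C4": 0.95,
        "C5": 0.90, "O1": 0.82, "N1": 0.90}
bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"),
         ("C4", "C5"), ("C5", "O1"), ("C3", "N1")]
n_sub, opia = well_resolved_substructures(edia, bonds)
print(n_sub, round(opia, 1))
```

In this made-up case the poorly supported atom C3 splits the ligand into two well-supported fragments, while N1 does not count because it has no well-supported bonded neighbor.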