Python: module psfGen

psfGen

Generate PSF information from sequence or coordinates. Particularly useful are the pdbToPSF and seqToPSF routines.

PSF information is primarily generated from information in topology files, documented here. Accompanying the topology definition of residues are patches for linking the residues (e.g. across the peptide bond), and for terminating the ends of a residue chain. The default linking and terminating patches are specified via LINK, FIRSt, and LAST XPLOR statements, documented here. Since the linking and terminating patches are specific to a particular topology, they can be specified in the topology file via lines beginning with !LINK, !FIRSt, and !LAST.

Protein and nucleic acid monomers are connected into a full polymer using PRESidue patch definitions in the topology definition. !LINK comments in the topology file(s) override the default values of

!LINK PEPP HEAD - * TAIL + PRO END
!LINK PEPT HEAD - * TAIL + * END

If any !LINK statement is specified, then the defaults are erased. Additionally, topology-specific options can be given via one or more lines beginning with !option.
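The way !LINK comments replace the defaults can be illustrated with a small stand-alone sketch. It is not psfGen's parser: the function name scan_topology_directives, the returned dictionary layout, and the rule of matching keywords on their first four characters are all assumptions made for illustration.

```python
DEFAULT_LINKS = [
    "PEPP HEAD - * TAIL + PRO END",
    "PEPT HEAD - * TAIL + * END",
]

# Assumption: keywords are recognized by their first four characters.
KEYWORDS = {"LINK": "LINK", "FIRS": "FIRST", "LAST": "LAST", "OPTI": "OPTION"}


def scan_topology_directives(text):
    """Collect !LINK, !FIRSt, !LAST and !option comment lines (hypothetical helper)."""
    found = {name: [] for name in KEYWORDS.values()}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("!"):
            continue
        # An ordinary comment such as "! some text" is not a directive.
        if line[1:2].isspace():
            continue
        parts = line[1:].split(None, 1)
        if not parts:
            continue
        name = KEYWORDS.get(parts[0].upper()[:4])
        if name is None:
            continue
        found[name].append(parts[1].strip() if len(parts) > 1 else "")
    # Any !LINK line erases the defaults; otherwise the defaults apply.
    if not found["LINK"]:
        found["LINK"] = list(DEFAULT_LINKS)
    return found
```

Passing the text of a topology file that contains a single !LINK line returns only that link definition, while a file with no !LINK lines yields the two defaults listed above.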

Classes

builtins.Exception(builtins.BaseException)

InsertionException
pdbRecordError

builtins.object

Biomt
VariantResidue

class Biomt(builtins.object)

Biomt(chains)

Class containing info from a BIOMT record for generating new chains related
by rotation + translation to specified ATOM entries.

Members are chains, rot, trans.

Methods defined here:

__init__(s, chains): Initialize self. See help(type(self)) for accurate signature.

addEntry(s, key, rot, trans)

entry(s, label)

labels(s)

Data descriptors defined here:

__dict__: dictionary for instance variables (if defined)

__weakref__: list of weak references to the object (if defined)

Data and other attributes defined here:

BiomtEntry = <class 'psfGen.Biomt.BiomtEntry'>
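The rotation + translation stored in a Biomt entry describes how a copy of a chain is positioned. The following is a minimal sketch of that transformation, assuming rot is a 3x3 nested list and trans a length-3 list; how Biomt actually stores these members is not documented here, and apply_biomt is a hypothetical helper.

```python
def apply_biomt(coords, rot, trans):
    """Return coords transformed as x' = rot . x + trans (illustrative only)."""
    out = []
    for x, y, z in coords:
        out.append(tuple(
            rot[i][0] * x + rot[i][1] * y + rot[i][2] * z + trans[i]
            for i in range(3)
        ))
    return out


# A 90 degree rotation about z followed by a shift of 1 along x.
rot = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
trans = [1, 0, 0]
moved = apply_biomt([(1.0, 0.0, 0.0)], rot, trans)
```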

class InsertionException(builtins.Exception)

Exception thrown by pdbToSeq if an insertion code is detected

Method resolution order: InsertionException; builtins.Exception; builtins.BaseException; builtins.object

Methods defined here:

__init__(s): Initialize self. See help(type(self)) for accurate signature.

Data descriptors defined here:

__weakref__: list of weak references to the object (if defined)

Static methods inherited from builtins.Exception:

__new__(*args, **kwargs) from builtins.type: Create and return a new object. See help(type) for accurate signature.

Methods inherited from builtins.BaseException:

__delattr__(self, name, /): Implement delattr(self, name).

__getattribute__(self, name, /): Return getattr(self, name).

__reduce__(...): Helper for pickle.

__repr__(self, /): Return repr(self).

__setattr__(self, name, value, /): Implement setattr(self, name, value).

__setstate__(...)

__str__(self, /): Return str(self).

add_note(...): Exception.add_note(note) --
add a note to the exception

with_traceback(...): Exception.with_traceback(tb) --
set self.__traceback__ to tb and return self.

Data descriptors inherited from builtins.BaseException:

__cause__: exception cause

__context__: exception context

__dict__

__suppress_context__

__traceback__

args

class VariantResidue(builtins.object)

VariantResidue(name, baseName, deletedAtoms=None, patch=None)

Methods defined here:

__init__(s, name, baseName, deletedAtoms=None, patch=None): Initialize self. See help(type(self)) for accurate signature.

Data descriptors defined here:

__dict__: dictionary for instance variables (if defined)

__weakref__: list of weak references to the object (if defined)

class pdbRecordError(builtins.Exception)

Method resolution order: pdbRecordError; builtins.Exception; builtins.BaseException; builtins.object

Methods defined here:

__init__(self): Initialize self. See help(type(self)) for accurate signature.

Data descriptors defined here:

__weakref__: list of weak references to the object (if defined)

Static methods inherited from builtins.Exception:

__new__(*args, **kwargs) from builtins.type: Create and return a new object. See help(type) for accurate signature.

Methods inherited from builtins.BaseException:

__delattr__(self, name, /): Implement delattr(self, name).

__getattribute__(self, name, /): Return getattr(self, name).

__reduce__(...): Helper for pickle.

__repr__(self, /): Return repr(self).

__setattr__(self, name, value, /): Implement setattr(self, name, value).

__setstate__(...)

__str__(self, /): Return str(self).

add_note(...): Exception.add_note(note) --
add a note to the exception

with_traceback(...): Exception.with_traceback(tb) --
set self.__traceback__ to tb and return self.

Data descriptors inherited from builtins.BaseException:

__cause__: exception cause

__context__: exception context

__dict__

__suppress_context__

__traceback__

args

Functions

addDisulfideBond(sel1, sel2, bond=True): Add a disulfide bond between residues in the two atom selections.

This function should be called after PSF information is generated.  sel1 and
sel2 can be either selection strings or atomSel.AtomSel instances.  For
example, adding a disulfide bond between residues 8 and 50 can be done by:

psfGen.addDisulfideBond('resid 8', 'resid 50')  # psfGen already imported

The gamma protons are removed from both residues.  Additionally, if the bond
argument is set to True (default), a covalent bond is added between the
sulfur atoms.  Otherwise, if bond is False, the bonding step is omitted, and
the disulfide bond has to be externally enforced as a distance restraint
(for an example see http://nmr.cit.nih.gov/xplor-nih/xplorMan/node391.html).

If atoms required for the given disulfide bond are not present, a
UserWarning exception is thrown.
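When several disulfide bonds are needed, the call shown above can be wrapped in a loop. The sketch below is an assumption-laden convenience wrapper, not part of psfGen: add_disulfides is a hypothetical name, and the function that does the work is passed in so the example runs without Xplor-NIH.

```python
def add_disulfides(pairs, add_bond, bond=True):
    """Call add_bond('resid i', 'resid j', bond=bond) for each resid pair.

    Returns a list of (resid1, resid2, message) for pairs that raised
    UserWarning, e.g. because required atoms were missing.
    """
    failed = []
    for resid1, resid2 in pairs:
        try:
            add_bond('resid %d' % resid1, 'resid %d' % resid2, bond=bond)
        except UserWarning as exc:
            failed.append((resid1, resid2, str(exc)))
    return failed
```

In an Xplor-NIH session, after PSF generation, this would be called as add_disulfides([(8, 50)], psfGen.addDisulfideBond).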

addResidueName(residueName, systemType='protein', simulation=None, isVariant=False): Add the specified residue name to the set of valid residue names
for the specified systemType. Valid system types are protein (the
default), nucleic, water and metal.

The isVariant argument should be set to True for variant residue
names which are not present in the topology file.

A side effect of this function is that atomSelLang.abbreviations()
are updated such that the abbreviation for the specified systemType
includes residueName (in addition to all previously specified names).

autoProcessPdbSSBonds(pdbRecord): Scan a PDB file for PDB SSBOND header fields and
call addDisulfideBond as appropriate.

Returns a list of lines which contain SSBOND records.

If an insertion code is found for a particular residue number, that
SSBOND entry is skipped.
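The scan for SSBOND records, including the skipping of entries with insertion codes, can be sketched as follows. This is not the code used by autoProcessPdbSSBonds; parse_ssbond_lines is a hypothetical helper, and the column positions follow the fixed-column SSBOND layout of the PDB file format.

```python
def parse_ssbond_lines(pdb_text):
    """Return (bonds, skipped) from SSBOND records in pdb_text.

    bonds is a list of ((chainID1, resid1), (chainID2, resid2)) pairs;
    skipped holds SSBOND lines that carry an insertion code.
    """
    bonds, skipped = [], []
    for line in pdb_text.splitlines():
        if not line.startswith("SSBOND"):
            continue
        rec = line.ljust(80)  # pad so that column slicing never runs short
        icode1, icode2 = rec[21], rec[35]  # columns 22 and 36
        if icode1 != " " or icode2 != " ":
            skipped.append(line)
            continue
        first = (rec[15], int(rec[17:21]))   # column 16, columns 18-21
        second = (rec[29], int(rec[31:35]))  # column 30, columns 32-35
        bonds.append((first, second))
    return bonds, skipped
```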

cisPeptide(startResid, segName=None): Given a startResid, set up topology to make a cis peptide bond between
the C atom on residue startResid and the N atom on the subsequent residue.

Call this routine after seqToPSF.

If the optional segName argument is not specified, peptide bonds
in all segments with the matching startResid will be made cis.

This function correctly treats bonds involving proline and non-proline
residues by using patches named CIPP and CISP, respectively.
These patches currently follow one of two possible conventions,
either specifying the two residues on either side of the peptide
bond, or specifying only the residue on the N-terminal side of
this bond, the former being used if the line
!option numCisPeptideResids 1
is present in the topology file, and the latter used otherwise.

dAmino(resid, segName=' '): Change given residue to a D-amino acid.

Call this routine after seqToPSF.

The optional segName argument should be specified if there is
more than one segment in the structure.

deduceSeqType(seq, simulation=None): Given a three-character residue name, determine whether it is protein or
nucleic acid. DNA/RNA disambiguation is not always possible. The
default is DNA. If the 'URI' or 'URA' residue is present, RNA is recognized.

To make this determination, residue names are obtained from the current
values of the protein and nucleic topology files (set in protocol).

duplicateSegment(segid, newSegid, resid=None, nameSel='all', fast=False): Generate PSF info for a new segment which is identical to an existing
segment, differing only in segid.

findResidueName(name, simulation=None): Given a residue name, check if it has been entered into residueType
entry. If so, return the associated system name, else return None.

grabResidueNames(file): find all residue names in the given file.

initResidueNames(system='all', simulation=None, forceReread=False): Initialize residue names from the appropriate topology file names.
These names come from the protocol.topology dictionary.

Residue names are initialized only on the first call, unless forceReread
is set to True, in which case all current settings are erased, and residue
names are reread.

The simulation argument must be specified.

pdbToPSF(pdbRecord='', psfFilename='', useChainID=True, customRename=False, processSSBonds=True, processBiomt=False, suppressExceptions=False, deleteMissingResidues=False, deleteMissingAtoms=False, simulation=None, **kwargs): Generate XPLOR PSF structure/topology information from a PDB file.

pdbRecord is a string with either the name of the PDB file or a string
containing PDB contents. If it is blank (default), no new PSF information
is generated.

If psfFilename is specified, the PSF is written to the specified file
after pdbRecord is processed.

If useChainID=True (default), and the chainID field is specified in
pdbRecord, chainID overrides the segid field for naming segments; otherwise
segid is used.

processSSBonds specifies whether the SSBOND PDB record is used in PSF
generation.

See seqToPSF for documentation on customRename and processBiomt.

Setting suppressExceptions to True will cause exceptions due to failures in
seqToPSF to be caught, allowing partial processing of malformed PDB files.

If deleteMissingResidues=True, the PDB REMARK 465 record is read, and
the indicated residues are deleted.

If deleteMissingAtoms=True, the PDB REMARK 470 record is read, and
the indicated atoms are deleted.

If pdbRecord contains a BIOMT record, and processBiomt is True, then
for each BIOMT entry (starting with 2), a new segid is created by
concatenating the chainID with the BIOMT label (e.g. A2).
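The segid naming rule for BIOMT copies can be written out as a short sketch. It assumes that BIOMT labels are integers and that entry 1 corresponds to the original chains; biomt_segids is a hypothetical helper, and the four-character check mirrors the segName limit described for seqToPSF.

```python
def biomt_segids(chain_ids, labels):
    """Map (chainID, label) to a new segid for BIOMT labels 2 and up."""
    names = {}
    for label in labels:
        if int(label) < 2:
            continue  # entry 1 is taken to be the chains already present
        for chain in chain_ids:
            segid = "%s%s" % (chain, label)
            if len(segid) > 4:
                raise ValueError("segid %r is longer than 4 characters" % segid)
            names[(chain, label)] = segid
    return names
```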

pdbToSeq(pdbRecord, useSeqres=False, useChainID=True, processBiomt=False, **kwargs): Return sequence information generated by seqFromPDB or seqFromCIF.

pdbRecord is a string with the contents of a PDB file, from which the
sequence(s) is (are) obtained.

If useChainID=True (default), and the chainID field is specified in
pdbRecord, chainID overrides the segid field for naming segments; otherwise
segid is used.

processBiomt specifies whether the REMARK 350 BIOMT record is used. By
default, Biomolecule 1 is read; specifying a different integer for the
processBiomt argument will read that entry instead.

kwargs specifies the optional extra argument failIfInsertion.
If failIfInsertion=False, a warning message will be printed and
the residue will be skipped if the insertion code field of the
ATOM record is not blank.  If failIfInsertion=True, an
InsertionException will be raised if an iCode (insertion code) is
encountered.  If the argument is omitted, the module-local
variable of the same name will be used, and its default value is
True.

The return value is an object with three members: seqs, a list of
tuples containing (beginResid,segid,seq,seqType), where seq is a list of
residue names; biomt, which is populated if processBiomt=True; and
altRecs, an array which is populated for structures which have multiple
sidechain conformations.
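A returned object of this shape can be walked as in the sketch below. The SimpleNamespace instance is only a stand-in with the three documented members so that the example runs without Xplor-NIH; describe_sequences is a hypothetical helper.

```python
from types import SimpleNamespace


def describe_sequences(result):
    """Return one summary string per (beginResid, segid, seq, seqType) tuple."""
    lines = []
    for begin_resid, segid, seq, seq_type in result.seqs:
        lines.append("segid %r: %d residues starting at %d (%s)"
                     % (segid, len(seq), begin_resid, seq_type))
    return lines


# Stand-in for the object returned by pdbToSeq (values are made up).
fake = SimpleNamespace(
    seqs=[(1, "A", ["MET", "ALA", "GLY"], "auto")],
    biomt=None,
    altRecs=[],
)
summary = describe_sequences(fake)
```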

processBiomtEntries(remark350, biomol=1): Read info for symmetric subunits generated by BIOMT entries.

This will read the entry for the biomolecule labelled biomol.

Non-trivial (rotation/translation) transformations are read and stored
along with the specified chain ids, and returned in a Biomt object

remark465(pdbRecord): Given a PDB record, read a REMARK 465 record and return a
dictionary with chains as keys, and values are lists of residue
numbers.
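The shape of the returned dictionary can be illustrated with a whitespace-based sketch. It is not the remark465 implementation: missing_residues is a hypothetical helper, and it assumes each data line ends with a one-character chain ID followed by an integer residue number, so residues with insertion codes are ignored.

```python
def missing_residues(pdb_text):
    """Return {chainID: [resid, ...]} from REMARK 465 lines (illustrative)."""
    missing = {}
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        tokens = line[10:].split()
        # Data lines read "[model] resName chainID resid".
        if len(tokens) not in (3, 4):
            continue
        chain, number = tokens[-2], tokens[-1]
        if len(chain) != 1:
            continue
        try:
            resid = int(number)
        except ValueError:
            continue  # header text, or a residue number with an insertion code
        missing.setdefault(chain, []).append(resid)
    return missing
```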

remark470(pdbRecord): Given a PDB record, read a REMARK 470 record and return a
dictionary with chains as keys, and values are dictionaries with resid
keys and list of atom names values.

renameAtoms(sel='all')

renameResidues(seq): single letter to three letter names- only for nucleic acids.

seqFromCIF(pdbRecord=None, cif=None, useSeqres=False, processBiomt=False, failIfInsertion=-1): Return sequences for an input record in mmCIF format,
given either as a string using the pdbRecord argument or as a
cif.Cif object using the cif argument. The return value is an
object with a seqs attribute. This attribute is a list of
(startResid, segid, seq, seqType) tuples, where seq is a list of residue names,
and seqType is 'auto' unless an O2' atom is found, in which case the type becomes
'likely_rna'.

pdbRecord is a string with the contents of an mmCIF file, from which the
sequence(s) is (are) obtained.

processBiomt specifies whether the REMARK 350 BIOMT record is used. By
default, Biomolecule 1 is read; specifying a different integer for the
processBiomt argument will read that entry instead.

If failIfInsertion=False, a warning message will be printed and
the residue will be skipped if the insertion code field of the
ATOM record is not blank. If failIfInsertion=True, an
InsertionException will be raised if an iCode (insertion code) is
encountered. If the argument is omitted, the module-local
variable of the same name will be used, and its default value is
True.

seqFromPDB(pdbRecord, useSeqres=False, useChainID=True, processBiomt=False, failIfInsertion=-1, includeHETATM=False): Return an object with a seqs attribute. This attribute is a list of
(startResid, segid, seq, seqType) tuples, where seq is a list of
residue names, and seqType is 'auto' unless an O2' atom is found
in which case the type becomes 'likely_rna'. If useSeqres=True, the type of
nucleic acid is determined by the length of the shortest residue name:
if it is 1 character, "rna" is returned, while if it is 2 characters, "dna"
is returned.

pdbRecord is a string with the contents of a PDB file, from which the
sequence(s) is (are) obtained.

If useChainID=True (default), and the chainID field is specified in
pdbRecord, chainID overrides the segid field for naming segments; otherwise
segid is used.

processBiomt specifies whether the REMARK 350 BIOMT record is used. By
default, Biomolecule 1 is read; specifying a different integer for the
processBiomt argument will read that entry instead. In this case, the return value
will contain a biomt attribute with this information.

If failIfInsertion=False, a warning message will be printed and
the residue will be skipped if the insertion code field of the
ATOM record is not blank.  If failIfInsertion=True, an
InsertionException will be raised if an iCode (insertion code) is
encountered.  If the argument is omitted, the module-local
variable of the same name will be used, and its default value is
True.
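The three-way behavior of failIfInsertion (omitted, False, True) can be sketched as below. Reading the -1 in the signature as the "argument omitted" marker is an inference, not something stated in the text; the module-level variable and exception class here are local stand-ins, and both helper names are hypothetical.

```python
failIfInsertion = True  # stand-in for the module-level default


class InsertionException(Exception):
    """Local stand-in for psfGen.InsertionException."""


def resolve_fail_if_insertion(value=-1):
    """Return the effective setting; anything but True/False means 'omitted'."""
    if value is True or value is False:
        return value
    return failIfInsertion


def keep_residue(icode, fail_if_insertion=-1):
    """Return True to keep a residue, False to skip it, or raise on an iCode."""
    if icode.strip() == "":
        return True
    if resolve_fail_if_insertion(fail_if_insertion):
        raise InsertionException("insertion code %r encountered" % icode)
    print("warning: skipping residue with insertion code %r" % icode)
    return False
```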

seqToPSF(seq, seqType='auto', startResid=1, deprotonateHIS=False, segName=' ', disulfide_bonds=[], disulfide_bridges=[], amidate_cterm=False, beginPatch=None, endPatch=None, customRename=False, sync=True, singleChar=False, psfFilename='', useVariantResnames=False, convertToGly=False, simulation=None, ntermPatch=None, ctermPatch=None, useDIHE=False): Given a protein or nucleic acid sequence, generate PSF info and load
the appropriate parameters.

The seq argument can be a string containing the sequence or the name of a
file containing the sequence.  Lines which contain a '#' as the first
non-whitespace character are treated as comments and ignored.
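The comment rule for sequence text can be sketched as follows. The assumption that residue names are separated by whitespace is made for illustration, file-name handling is left out, and read_sequence is a hypothetical helper rather than the parser used by seqToPSF.

```python
def read_sequence(text):
    """Return residue names from text, ignoring '#' comment lines."""
    residues = []
    for line in text.splitlines():
        # A line is a comment if '#' is its first non-whitespace character.
        if line.lstrip().startswith("#"):
            continue
        residues.extend(line.split())
    return residues


seq = read_sequence("# N-terminal fragment\nMET ALA\n   # a note\nGLY\n")
```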

If seqType is auto, the system type (i.e., 'protein', 'dna', 'rna', 'water',
'metal') is determined from the sequence.  Type 'rna' must be explicitly
specified if a URA or URI residue is not present in the sequence.  However,
the seqType value 'likely_rna' indicates that seqType should be switched to
'rna' if the sequence is consistent with 'dna' - this value might be passed
if the O2' atom has been detected.  A value 'custom' can be specified for
stand-alone residue molecules; for this seqType topology and parameters are
not loaded by this function, and must be loaded via an alternate mechanism.
Normally, topology and parameters corresponding to seqType are initialized
using protocol.initTopology and protocol.initParams.  Protein
residue names not present in the input protein topology file are converted
to 'GLY' if convertToGly=True, otherwise an error will result.

By default, histidines are fully protonated, but if deprotonateHIS
is True, the HD1 atoms are deleted.  Alternatively, deprotonateHIS
can be specified as a sequence of residue numbers specifying which
residues the HD1 atom should be deleted from.  Histidine
protonation state can also be specified using residue name
variants HIS (fully protonated), HSD (HD1 present, no HE2), and
HSE (HE2 present, no HD1).  If a histidine variant residue name is
detected, the deprotonateHIS argument is ignored.

Variant residue names handled directly by seqToPSF are described
by the variantResidues dictionary.  By default, the residue is set
to the base residue name, but the variant name can instead be used
if the useVariantResnames argument is set to True.  If a residue with
the variant's name is present in the topology database it is used
instead of the entry in variantResidues.

If segName is shorter than four characters, leading characters are
space-padded.  It is an error for it to be longer than 4 characters.

Disulfide bonds are specified by a list of resid pairs (numbers) in either
disulfide_bonds or disulfide_bridges.  Use disulfide_bonds for actual bonds
and disulfide_bridges to remove the cysteine HG proton - for representing
disulfide bonds by NOE restraints.

Customization of terminal protein residues is handled by the
amidate_cterm, beginPatch, and endPatch arguments (ntermPatch and
ctermPatch can be used for the latter two, but are deprecated).  If
amidate_cterm is set to True, the C-terminus is amidated, otherwise
endPatch is used for the terminal residue (by default, this adds
a second oxygen to the C-terminal carbon).  A custom patch can
also be specified for the N-terminus using the beginPatch
argument (by default the terminal nitrogen atom will have three
protons, or two if it's a proline).  Patch names must be specified
in toppar/protein.top.  Blank values of beginPatch or endPatch
may be specified to suppress patching termini altogether.  The
default values for proteins are beginPatch='NTER' and
endPatch='CTER', while for nucleic acids they are '5TER' and '3TER',
respectively.
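How the default and blank patch values interact can be sketched as below. It assumes that None (the signature default) means "use the default for the system type" and that an empty string means "apply no patch"; resolve_patches and the "nucleic" key are hypothetical, and amidate_cterm is not modeled.

```python
DEFAULT_PATCHES = {
    "protein": ("NTER", "CTER"),
    "nucleic": ("5TER", "3TER"),
}


def resolve_patches(system, beginPatch=None, endPatch=None):
    """Return the (begin, end) patch names to apply; '' means no patch."""
    default_begin, default_end = DEFAULT_PATCHES[system]
    begin = default_begin if beginPatch is None else beginPatch
    end = default_end if endPatch is None else endPatch
    return begin, end
```

For example, resolve_patches("protein", endPatch="") keeps the default N-terminal patch while leaving the C-terminus unpatched.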

If customRename is True, the following atom renamings are made for nucleic
acids:

        ADE H61 --> HN'
        ADE H62 --> HN''
        GUA H21 --> HN'
        GUA H22 --> HN''
        CYT H41 --> HN'
        CYT H42 --> HN''
        THY C5A --> CM

The sync argument specifies whether XPLOR arrays are copied back to
the C++ interface.  This should almost always be the default value (True).

If singleChar is True, the input sequence string is understood to be single
character residue codes separated by zero or more spaces and newlines.

If psfFilename (a string) is specified, the topology information is written
to a file (i.e. the PSF file) named psfFilename.

If useDIHE is True, enable dihedral statements defined in the topology
file.

seqres(lines): Read the pdb DBREF and SEQRES fields and return a list containing
tuples of (startResid,ChainID,sequence,seqType), where startResid
is deduced from the DBREF (or DBREF1) record, and seqType is "rna" if the
shortest residue name has a single character, "dna" if the shortest
residue name is two characters, and "protein" otherwise.
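The seqType rule described for seqres depends only on the length of the shortest residue name. A minimal sketch, with seqres_type as a hypothetical helper name:

```python
def seqres_type(residue_names):
    """Classify a SEQRES chain by the length of its shortest residue name."""
    shortest = min(len(name) for name in residue_names)
    if shortest == 1:
        return "rna"
    if shortest == 2:
        return "dna"
    return "protein"
```

An empty list raises ValueError from min(), so callers would need to handle chains without residues separately.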

Data
default_beginPatch = 'NTER'
default_endPatch = 'CTER'
failIfInsertion = True
residueMap = {'A': 'ADE', 'C': 'CYT', 'DA': 'ADE', 'DC': 'CYT', 'DG': 'GUA', 'DT': 'THY', 'G': 'GUA', 'HOH': 'WAT', 'I': 'INO', 'T': 'THY', ...}
tagResids = {'CYSP': 'OS1'}
terminalAtoms = ['H5T', 'H3T']
variantResidues = {'protein': [<psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>, <psfGen.VariantResidue object>]}