Supplement to Analysis of Information Content Present in Protein-DNA Interactions

This page contains data supplementary to the article submitted to PSB 2008. Below is a description of the files.
joint-mi.txt
The full list of mutual information (MI) values between joint features and DNA-contacting classes. The file is arranged in three space-separated columns: the joint feature, the maximum MI achieved, and the cutoff distance at which that maximum was achieved. The abbreviations used for the joint features are given in the table below.
Abbreviation     Num. Values  Feature
sasa2            2            SASA
sasa3            3            SASA
sasa4            4            SASA
sasa20           20           SASA
ipp2             2            IPP
ipp3             3            IPP
ipp4             4            IPP
aa               20           Amino acids
pn               3            Pos/Neg/Neut amino acids
ss               3            Secondary structure
prof5            5            Full profiles
prof10           10           Full profiles
prof20           20           Full profiles
pssmcon5w5       5            Concatenated PSSMs, sliding window size 5
pssmcon10w5      10           Concatenated PSSMs, sliding window size 5
pssmcon20w5      20           Concatenated PSSMs, sliding window size 5
strn5d14s3       5            Structural neighbor counts, within 14 angstroms, sequence distance > 3
strn10d14s3      10           Structural neighbor counts, within 14 angstroms, sequence distance > 3
strn20d14s3      20           Structural neighbor counts, within 14 angstroms, sequence distance > 3
strpssm5d14s3    5            Structural neighbor sums of PSSMs, within 14 angstroms, sequence distance > 3
strpssm10d14s3   10           Structural neighbor sums of PSSMs, within 14 angstroms, sequence distance > 3
strpssm20d14s3   20           Structural neighbor sums of PSSMs, within 14 angstroms, sequence distance > 3
wang5w11         5            pKa, hydropathy, molecular mass used by Wang and Brown (2006)
wang10w11        10           pKa, hydropathy, molecular mass used by Wang and Brown (2006)
wang20w11        20           pKa, hydropathy, molecular mass used by Wang and Brown (2006)
profcon5w5       5            Concatenated profiles, sliding window size 5
profcon10w5      10           Concatenated profiles, sliding window size 5
profcon20w5      20           Concatenated profiles, sliding window size 5
pssm5            5            Position Specific Scoring Matrix only
pssm10           10           Position Specific Scoring Matrix only
pssm20           20           Position Specific Scoring Matrix only
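As a minimal sketch, the three-column layout described for joint-mi.txt could be read in Python roughly as follows. Several things here are assumptions rather than facts about the file: that the MI and cutoff columns are plain decimal numbers, that any header or malformed line can simply be skipped, and the names JointMI and parse_joint_mi, which are hypothetical. The sample rows are invented for illustration, including the form of the joint-feature labels, and are not taken from the real data.

```python
from typing import Iterable, List, NamedTuple


class JointMI(NamedTuple):
    """One row of the file (hypothetical record type)."""
    feature: str   # joint-feature label, first column
    max_mi: float  # maximum mutual information achieved
    cutoff: float  # cutoff distance at which the maximum occurred


def parse_joint_mi(lines: Iterable[str]) -> List[JointMI]:
    """Parse whitespace-separated rows: feature, max MI, cutoff distance."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, mi, cutoff = fields
        try:
            records.append(JointMI(name, float(mi), float(cutoff)))
        except ValueError:
            continue  # skip a non-numeric row, e.g. a possible header
    return records


# Invented sample rows; the label format is an assumption.
sample = ["sasa2-pn 0.0123 4.5", "", "aa-ss 0.0456 5.0"]
records = parse_joint_mi(sample)

# Joint feature with the largest maximum MI in the sample.
best = max(records, key=lambda r: r.max_mi)
```

To apply the same function to the real file, pass an open file handle instead of the sample list, for example with open("joint-mi.txt") as fh: records = parse_joint_mi(fh).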
culledids.txt
The full list of PDB files and chains used for the dataset. The sequence of each chain was extracted directly from its PDB file, and these sequences were used to ensure that the set has less than 30% sequence identity.