Psi-Blast man page

NAME

     psiblast - Position-Specific Iterated BLAST, or PSI-BLAST


SYNOPSIS

The blastpgp program can do an iterative search in which
sequences found in one round of searching are used to build
a score model for the next round of searching. In this
usage, the program is called Position-Specific Iterated
BLAST, or PSI-BLAST. As explained in the accompanying
paper, the BLAST algorithm is not tied to a specific score
matrix. Traditionally, it has been implemented using an AxA
substitution matrix where A is the alphabet size. PSI-BLAST
instead uses a QxA matrix, where Q is the length of the
query sequence; at each position the cost of a letter
depends on the position w.r.t. the query and the letter in
the subject sequence.
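
The difference between the two matrix shapes can be illustrated with a toy
sketch. It is not taken from the BLAST sources: the two-letter alphabet, the
scores, and the function names are all invented for illustration, and only
ungapped scoring is shown.

```python
# Toy comparison of an A x A substitution matrix with a Q x A
# position-specific matrix. All scores below are invented.

# A x A: one score per (query letter, subject letter) pair.
SUBSTITUTION = {
    ("A", "A"): 2, ("A", "B"): -1,
    ("B", "A"): -1, ("B", "B"): 2,
}

def score_with_substitution(query, subject):
    """Ungapped score; depends only on the letter pair in each column."""
    return sum(SUBSTITUTION[(q, s)] for q, s in zip(query, subject))

def score_with_pssm(pssm, subject):
    """Ungapped score; row i of pssm holds the scores for query position i."""
    return sum(pssm[i][s] for i, s in enumerate(subject))

QUERY = "ABA"  # Q = 3

# Q x A: one row per query position, one score per subject letter.
PSSM = [
    {"A": 3, "B": -2},   # query position 1
    {"A": 0, "B": 1},    # query position 2
    {"A": 2, "B": -1},   # query position 3
]

subject = "AAA"
print(score_with_substitution(QUERY, subject))  # letter pairs only
print(score_with_pssm(PSSM, subject))           # position and subject letter
```

Query positions 1 and 3 both hold an A, yet the position-specific matrix
scores a subject A differently at each of them, which an A x A matrix
cannot do.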

The position-specific matrix for round i+1 is built from a
constrained multiple alignment among the query and the
sequences found with sufficiently low e-value in round i.
The top part of the output for each round separates the
sequences into two groups: those found previously and used in
the score model, and those not used in the score model. The
output currently includes lots of diagnostics requested by
users at NCBI. To skip quickly from the output of one round
to the next, search for the string "producing", which is
part of the header for each round and likely does not appear
elsewhere in the output. PSI-BLAST "converges" and stops if
all sequences found at round i+1 below the e-value threshold
were already in the model at the beginning of the round.
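
The round structure and the convergence test described above can be sketched
as follows. This is a simplified illustration, not the blastpgp
implementation: search_round is a hypothetical stand-in for one round of
searching, and treating "below the threshold" as a strict comparison is an
assumption.

```python
def run_rounds(search_round, max_rounds, threshold=0.001):
    """Iterate until convergence or until max_rounds rounds have run.

    search_round(included) is a hypothetical callable that runs one round
    with a model built from `included` and returns {sequence_id: e_value}.
    Returns (rounds_run, included, converged).
    """
    included = set()  # sequences in the model at the start of the round
    for round_no in range(1, max_rounds + 1):
        hits = search_round(included)
        passing = {seq_id for seq_id, e_value in hits.items()
                   if e_value < threshold}
        # Converged: every passing hit was already in the model.
        if passing <= included:
            return round_no, included, True
        included = passing  # model for the next round
    return max_rounds, included, False


def demo_search(included):
    """Invented data: s2 only becomes significant once s1 is in the model."""
    if not included:
        return {"s1": 1e-10, "s2": 0.5}
    return {"s1": 1e-20, "s2": 1e-4}


print(run_rounds(demo_search, max_rounds=5))
print(run_rounds(demo_search, max_rounds=2))
```

With the invented data, the third round finds nothing new and the loop
reports convergence; capping the run at two rounds stops it before that
test can succeed.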

There are several blastpgp parameters specifically for
PSI-BLAST:

-j is the maximum number of rounds (default 1; i.e., regular
BLAST)

-h is the e-value threshold for including sequences in the
score matrix model (default 0.001)

-c is the "constant" used in the pseudocount formula
specified in the paper (default 10)

The -C and -R flags provide a "checkpointing" facility
whereby a score model can be stored and later reused.

-C stores the query and frequency count ratio matrix in a
file

-R restarts from a file stored previously.

When using -R, it is required that the query specified on
the command line match exactly the query in the restart
file. The checkpoint files are stored in a byte-encoded
(not human readable) format, so as to prevent roundoff error
between writing and reading the checkpoint. Users who also
develop their own sequence analysis software may wish to
develop their own scoring systems. For this purpose the code
in posit.c that writes out the checkpoint can be easily
adapted to write out scoring systems derived by other
algorithms in such a way that PSI-BLAST can read the files in
later. The checkpoint structure is general in the sense
that it can handle any position-specific matrix that fits in
the Karlin-Altschul statistical framework for BLAST scoring.
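
As an illustration of the checkpointing workflow, the sketch below assembles
two blastpgp argument lists, one that saves a checkpoint and one that
restarts from it. It only builds the lists and does not run blastpgp. The
flags are the ones described in this page; the helper name blastpgp_args and
the checkpoint file name seq1.chk are hypothetical, and whether a given
blastpgp version needs further options is not covered here.

```python
def blastpgp_args(query, database, rounds=1, inclusion_evalue=0.001,
                  save_checkpoint=None, restart_from=None):
    """Build an argument list using only flags described in this page."""
    args = ["blastpgp", "-i", query, "-d", database,
            "-j", str(rounds), "-h", str(inclusion_evalue)]
    if save_checkpoint is not None:
        args += ["-C", save_checkpoint]   # store the score model
    if restart_from is not None:
        args += ["-R", restart_from]      # reuse a stored score model
    return args


# The same query file is used in both runs, because -R requires the query
# to match the one stored in the checkpoint.
first = blastpgp_args("seq1", "nr", rounds=3, save_checkpoint="seq1.chk")
later = blastpgp_args("seq1", "nr", rounds=2, restart_from="seq1.chk")
print(" ".join(first))
print(" ".join(later))
```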

The -B flag provides a way to jump start PSI-BLAST from a
master-slave multiple alignment computed outside PSI-BLAST.
The multiple alignment must include the query sequence as
one of the sequences, but it need not be the first sequence.
The multiple alignment must be specified in a format that is
derived from Clustal, but without some headers and trailers.
See the example below. The rules are as follows. Suppose the
multiple alignment has N sequences. It may be presented in
1 or more blocks, where
each block presents a range of columns from the multiple
alignment. E.g., the first block might have columns 1-60,
the second block might have columns 61-95, the third block
might have columns 96-128. Each block should have N rows, 1
row per sequence. The sequences should be in the same order
in every block. Blocks are separated by 1 or more blank
lines. Within a block there are no blank lines, and each
line consists of 1 sequence identifier followed by some
white space followed by characters (and gaps) for that
sequence in the multiple alignment. In each column, all
letters must be in upper case, or all letters must be in
lower case. Upper case means that this column is to be
given position-specific scores. Lower-case means to use the
underlying matrix (specified by -M) for this column; e.g.,
if the query sequence has an 'l' residue in the column, then
the standard scores for matching an L are used in the
column.
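
The format rules above can be checked before running a search. The sketch
below is an independent illustration, not the parser used by blastpgp: the
function names are invented, and requiring every row in a block to have the
same length is an assumption about well-formed input.

```python
def parse_alignment(text):
    """Return (ids, rows): identifiers and their full aligned strings."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.split())
        elif current:              # a blank line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    if not blocks:
        raise ValueError("no alignment blocks found")

    ids = [fields[0] for fields in blocks[0]]
    rows = ["" for _ in ids]
    for number, block in enumerate(blocks, start=1):
        if any(len(fields) != 2 for fields in block):
            raise ValueError(f"block {number}: expected identifier and characters")
        if [fields[0] for fields in block] != ids:
            raise ValueError(f"block {number}: sequences differ from first block")
        if len({len(fields[1]) for fields in block}) != 1:
            raise ValueError(f"block {number}: rows have different lengths")
        for k, fields in enumerate(block):
            rows[k] += fields[1]
    return ids, rows


def column_modes(rows):
    """One character per column: 'P' position-specific, 'M' underlying matrix."""
    modes = []
    for number, column in enumerate(zip(*rows), start=1):
        letters = [ch for ch in column if ch.isalpha()]  # skip gap characters
        if not letters:
            raise ValueError(f"column {number}: no residues")
        if all(ch.isupper() for ch in letters):
            modes.append("P")
        elif all(ch.islower() for ch in letters):
            modes.append("M")
        else:
            raise ValueError(f"column {number}: mixed upper and lower case")
    return "".join(modes)


SAMPLE = """\
query  ACde
hit1   AC-e

query  FG
hit1   FG
"""
ids, rows = parse_alignment(SAMPLE)
print(ids, rows, column_modes(rows))
```

In the invented two-sequence sample, columns 3 and 4 are lower case and
would use the underlying matrix, while the other four columns would receive
position-specific scores.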

A sample usage would be:

blastpgp -i seq1 -B align1 -j 2 -d nr

where
     seq1 is the query,
     align1 is the alignment file,
     -j 2 indicates to do 2 rounds, and
     -d nr indicates to use the nr database.

The example files seq1 and align1, copied below, were kindly
supplied by L. Aravind from a paper he and Chris Ponting
published in Protein Science:

Aravind L, Ponting CP, Homologues of 26S proteasome subunits
are regulators of transcription and translation, Protein
Science 7(1998) 1250-1254.

seq1
----
> 26SPS9_Hs
IHAAEEKDWKTAYSYFYEAFEGYDSIDSPKAITSLKYMLLCKIMLNTPEDVQALVSGKLALRYAGRQTEA
LKCVAQASKNRSLADFEKALTDYRAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQIEHISSLIKL
SKADVERKLSQMILDKKFHGILDQGEGVLIIFDEPP

align1
------
26SPS9_Hs IHAAEEKDWKTAYSYFYEAFEGYdsidspkaitslkymllckimlntpedvqalvsgklalryagrqtealkcvaqasknr
F57B9_Ce LHAADEKDFKTAFSYFYEAFEGYdsvdekvsaltalkymllckvmldlpdevnsllsaklalkyngsdldamkaiaaaaqk
YDL097c_Sc ILHCEDKDYKTAFSYFFESFESYhnltthnsyekacqvlkymllskimlnliddvknilnakytketyqsrgidamkavae
YMJ5_Ce LYSAEERDYKTSFSYFYEAFEGFasigdkinatsalkymilckimlneteqlagllaakeivayqkspriiairsmadafr
FUS6_ARATH KNYIRTRDYCTTTKHIIHMCMNAilvsiemgqfthvtsyvnkaeqnpetlepmvnaklrcasglahlelkkyklaarkfld
COS41.8_Ci SLDYKLKTYLTIARLYLEDEDPVqaemyinrasllqnetadeqlqihykvcyarvldyrrkfleaaqrynelsyksaihet
644879 KCYSRARDYCTSAKHVINMCLNVikvsvylqnwshvlsyvskaestpeiaeqrgerdsqtqailtklkcaaglaelaarky
YPR108w_Sc IHCLAVRNFKEAAKLLVDSLATFtsieltsyesiatyasvtglftlertdlkskvidspellslisttaalqsissltisl
eif-3p110_Hs SKAMKMGDWKTCHSFIINEKMNGkvw-------------------------------------------------------
T23D8.4_Ce SKAMLNGDWKKCQDYIVNDKMNQkvw-------------------------------------------------------
YD95_Sp IYLMSIRNFSGAADLLLDCMSTFsstellpyydvvryavisgaisldrvdvktkivdspevlavlpqnesmssleacinsl
KIAA0107_Hs LYCVAIRDFKQAAELFLDTVSTFtsyelmdyktfvtytvyvsmialerpdlrekvikgaeilevlhslpavrqylfslyec
F49C12.8_Hs LYRMSVRDFAGAADLFLEAVPTFgsyelmtyenlilytvitttfaldrpdlrtkvircnevqeqltggglngtlipvreyl
Int-6_Mm KFQYECGNYSGAAEYLYFFRVLVpatdrnalsslwgklaseilmqnwdaamedltrlketidnnsvssplqslqqrtwlih

26SPS9_Hs sladfekaltdy-----------------------------------------------------------------------------------
F57B9_Ce rslkdfqvafgsf----------------------------------------------------------------------------------
YDL097c_Sc aynnrslldfntalkqy------------------------------------------------------------------------------
YMJ5_Ce krslkdfvkalaeh---------------------------------------------------------------------------------
FUS6_ARATH vnpelgnsyneviapqdiatygglcalasfdrselkqkvidninfrnflelvpdvrelindfyssryascleylasl------------------
COS41.8_Ci eqtkalekalncailapagqqrsrmlatlfkdercqllpsfgilekmfldriiksdemeefar--------------------------------
644879 kqaakclllasfdhcdfpellspsnvaiygglcalatfdrqelqrnvissssfklflelepqvrdiifkfyeskyasclkmldem----------
YPR108w_Sc yasdyasyfpyllety-------------------------------------------------------------------------------
eif-3p110_Hs -----------------------------------------------------------------------------------------------
T23D8.4_Ce -----------------------------------------------------------------------------------------------
YD95_Sp ylcdysgffrtladve-------------------------------------------------------------------------------
KIAA0107_Hs rysvffqslavv-----------------------------------------------------------------------------------
F49C12.8_Hs esyydchydrffiqlaale----------------------------------------------------------------------------
Int-6_Mm wslfvffnhpkgrdniidlflyqpqylnaiqtmcphilrylttavitnkdvrkrrqvlkdlvkviqqesytykdpitefveclyvnfdfdgaqkk

26SPS9_Hs ----RAELRDDPIISTHLAKLYDNLLEQNLIRVIEPFSRVQIEHISSLIKLSKADVERKLSQMILDKKFHGILDQGEGVLIIFDEPP
F57B9_Ce ----PQELQMDPVVRKHFHSLSERMLEKDLCRIIEPYSFVQIEHVAQQIGIDRSKVEKKLSQMILDQKLSGSLDQGEGMLIVFEIAV
YDL097c_Sc ----EKELMGDELTRSHFNALYDTLLESNLCKIIEPFECVEISHISKIIGLDTQQVEGKLSQMILDKIFYGVLDQGNGWLYVYETPN
YMJ5_Ce ----KIELVEDKVVAVHSQNLERNMLEKEISRVIEPYSEIELSYIARVIGMTVPPVERAIARMILDKKLMGSIDQHGDTVVVYPKAD
FUS6_ARATH ----KSNLLLDIHLHDHVDTLYDQIRKKALIQYTLPFVSVDLSRMADAFKTSVSGLEKELEALITDNQIQARIDSHNKILYARHADQ
COS41.8_Ci ----QLMPHQKAITADGSNILHRAVTEHNLLSASKLYNNIRFTELGALLEIPHQMAEKVASQMICESRMKGHIDQIDGIVFFERRET
644879 ----KDNLLLDMYLAPHVRTLYTQIRNRALIQYFSPYVSADMHRMAAAFNTTVAALEDELTQLILEGLISARVDSHSKILYARDVDQ
YPR108w_Sc ----ANVLIPCKYLNRHADFFVREMRRKVYAQLLESYKTLSLKSMASAFGVSVAFLDNDLGKFIPNKQLNCVIDRVNGIVETNRPDN
eif-3p110_Hs ----DLFPEADKVRTMLVRKIQEESLRTYLFTYSSVYDSISMETLSDMFELDLPTVHSIISKMIINEELMASLDQPTQTVVMHRTEP
T23D8.4_Ce ----NLFHNAETVKGMVVRRIQEESLRTYLLTYSTVYATVSLKKLADLFELSKKDVHSIISKMIIQEELSATLDEPTDCLIMHRVEP
YD95_Sp ----VNHLKCDQFLVAHYRYYVREMRRRAYAQLLESYRALSIDSMAASFGVSVDYIDRDLASFIPDNKLNCVIDRVNGVVFTNRPDE
KIAA0107_Hs ----EQEMKKDWLFAPHYRYYVREMRIHAYSQLLESYRSLTLGYMAEAFGVGVEFIDQELSRFIAAGRLHCKIDKVNEIVETNRPDS
F49C12.8_Hs ----SERFKFDRYLSPHFNYYSRGMRHRAYEQFLTPYKTVRIDMMAKDFGVSRAFIDRELHRLIATGQLQCRIDAVNGVIEVNHRDS
Int-6_Mm lrecESVLVNDFFLVACLEDFIENARLFIFETFCRIHQCISINMLADKLNMTPEEAERWIVNLIRNARLDAKIDSKLGHVVMGNNAV


