Blast on the web
Help on our implementation of blast
Installed Blast
The currently installed version of Blast is BLASTP 2.2.15 [Oct-31-2006].
The reference for this tool is:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
The official Blast man pages are available on this link.
Help for the Bioinformatica Blast
Explanations for the form options:
- DB Type
- You can select one of three types of database: protein, nucleic acid, or vector. Select this option first; it sets the correct values in the other selection boxes.
- Search Title
- Here you can select which databases are listed: all of them or a specific subset.
- DB name
- Select the database you want to run the search against. Only one choice is possible.
- Program
- Depending on the DB type chosen, the following programs are listed:
- Blastp will compare a protein sequence against the protein database of your choice (see the DB name option).
- Blastx will translate a nucleic acid sequence in all six reading frames and compare all these against the protein database of your choice.
- Blastn will search a DNA sequence against a DNA databank.
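To make the Blastx description concrete, here is a minimal Python sketch of a six-frame translation. It assumes the standard genetic code and marks any codon containing an ambiguous base as "X". The function names are our own for illustration, and this is not the code Blast itself uses.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, six_frame_translations("ATGGCCTAA") returns six protein strings, one per frame, each of which would then be compared against the protein database.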
Default word sizes (the size of the initial word that must match between the query sequence and the database):
- blastp: 3
- blastx: 3
- blastn: 11
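As an illustration of what a word is, the following Python sketch lists the overlapping words of a query for the default word sizes above. It only enumerates the words; how Blast looks them up in the database is not shown, and the function name is hypothetical.

```python
# Default word sizes as listed above.
DEFAULT_WORD_SIZE = {"blastp": 3, "blastx": 3, "blastn": 11}


def query_words(sequence, program):
    """Return every overlapping word of the program's default size."""
    w = DEFAULT_WORD_SIZE[program]
    # A sequence shorter than the word size yields no words at all.
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]
```

A protein query of 5 residues gives 3 words of size 3, while a DNA query shorter than 11 bases gives no word at all with the blastn default.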
- Align Views
- pairwise
Aligns your query sequence and database matches in pairs. Matches are connected with a "|" symbol. Mismatches are marked with a space. Gaps are introduced with a "-" symbol.
- M/S with identities
The database alignments are anchored to (shown in relation to) your query sequence.
Identities are displayed as dots (.).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
- M/S without identities
The database alignments are anchored to (shown in relation to) your query sequence.
Identities are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
- Flat Query-anchored with identities
The 'flat' display shows inserts as deletions on the query.
Identities are displayed as dots (.).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
- Flat Query-anchored without identities
The 'flat' display shows inserts as deletions on the query.
Identities are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Mismatches are displayed as single-letter nucleotide abbreviations (c, t, a or g).
Gaps are introduced with a "-" symbol.
- Matrix
- Use this option to set which comparison matrix should be used when searching the database. The default matrix for a protein blast is blosum62. You may choose from a complete list of matrices, which should cover various evolutionary constraints.
Default matrices:
- blastp: blosum62
- blastx: blosum62
- blastn: DNA Identity Matrix
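To show how a comparison matrix is used, here is a small Python sketch that scores an ungapped alignment with an identity-style DNA matrix. The match and mismatch values (+1 and -3) are illustrative assumptions, not necessarily the values used by this server, and the function names are our own.

```python
def identity_matrix(alphabet="ACGT", match=1, mismatch=-3):
    """Build a matrix that rewards identical symbols and penalises all others.

    The default reward and penalty are assumed values for illustration.
    """
    return {
        (a, b): (match if a == b else mismatch)
        for a in alphabet
        for b in alphabet
    }


def ungapped_score(query, subject, matrix):
    """Sum the matrix score of each aligned pair of symbols (no gaps)."""
    if len(query) != len(subject):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(matrix[(q, s)] for q, s in zip(query.upper(), subject.upper()))
```

A protein matrix such as blosum62 plays the same role, except that each pair of residues has its own score instead of a single match or mismatch value.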
- Expected Threshold
- The expected threshold sets the statistical significance threshold for reporting database sequence matches. The default value is 10, meaning that 10 matches are expected to be found merely by chance. Lower thresholds are more stringent, so fewer chance matches are reported. Raising the threshold admits less stringent matches and is recommended for searches with short sequences: a short query is more likely to occur by chance in the database than a longer one, so even a perfect match (no gaps) can have low statistical significance and may not be reported. A higher threshold lets you look farther down the hit list and see matches that would normally be discarded for low statistical significance. A value of up to 1000 is generally enough to see results.
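The effect of the threshold can be sketched with the Karlin-Altschul formula commonly given for Blast statistics, E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the raw score. The Python below is a rough illustration only: the lambda and K defaults are assumed, typical ungapped blosum62 values, and the function names are hypothetical. The server computes its own statistics.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are assumed illustrative parameters, not server settings.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def is_reported(raw_score, query_len, db_len, threshold=10.0):
    """A match is reported only if its E-value does not exceed the threshold."""
    return expect_value(raw_score, query_len, db_len) <= threshold
```

Under these assumptions, a short query can only reach a modest score even with a perfect match, so its E-value against a large database may exceed the default of 10. Raising the threshold is what brings such a match back into the report.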
- Filter
- The filter option, if set to true, masks segments of the query sequence that are non-specific for sequence similarity searches. Filtering can eliminate statistically significant but biologically uninteresting reports from the output, for example hits against common acidic-, basic- or proline-rich regions. This leaves the more biologically interesting regions of the query sequence available for specific matching against database sequences. Filtering is only applied to the query sequence, not to database sequences.
- Drop off
- This is the amount a score must drop before extension of word hits is halted.
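The drop-off rule can be sketched as follows. This Python example extends a hit to the right without gaps and stops once the running score has fallen more than the drop-off value below the best score seen so far. The match and mismatch scores and the function name are assumptions for illustration; this is not Blast's actual implementation.

```python
def extend_right(query, subject, q_start, s_start, x_drop, match=1, mismatch=-3):
    """Extend an ungapped hit to the right until the score drops too far.

    Returns (best_score, length_of_best_extension).
    """
    score = best = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else mismatch
        i += 1
        if score > best:
            # New best: remember how far the extension reached.
            best, best_len = score, i
        elif best - score > x_drop:
            # Score fell more than x_drop below the best: halt extension.
            break
    return best, best_len
```

With a small drop-off value the extension halts at the first mismatch; with a larger one it can pass over the mismatch and pick up the matching region beyond it.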
- Open Gap
- The gap open penalty is the score taken away for the initiation of a gap in sequence or in structure. To make a match more significant, you can try a larger gap open penalty. This decreases the number of gaps, and if you have a good alignment without many gaps, its Z-score will be higher.
- Extended Gap
- The gap extension penalty is added to the standard gap open penalty for each base or residue in the gap. This is how long gaps are penalised. If you don't like long gaps, just increase the gap extension penalty. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap open penalty. An exception is where one or both sequences are single reads with possible sequencing errors, in which case you would expect many single-base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring.
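The interplay of the two penalties can be checked with a little arithmetic. The Python sketch below computes the cost of a single gap as described above: the open penalty once, plus the extension penalty for each position in the gap. The values 11 and 1 used in the example are illustrative assumptions, and the function name is our own.

```python
def gap_cost(length, gap_open, gap_extend):
    """Penalty for one gap: open penalty once, extension penalty per position."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


# One long gap versus five single-position gaps, open=11, extend=1 (assumed).
one_long_gap = gap_cost(5, 11, 1)
five_short_gaps = 5 * gap_cost(1, 11, 1)

# With the open penalty at zero, only the total gapped length matters.
one_long_gap_no_open = gap_cost(5, 0, 2)
five_short_gaps_no_open = 5 * gap_cost(1, 0, 2)
```

With a high open penalty, the single long gap is much cheaper than five short ones. With the open penalty at zero, both cost the same, which suits reads with many single-base errors.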
- Gap Align
- This is a true/false option that tells the program whether to perform optimised alignments within regions involving gaps. If set to true, the program will perform an alignment using gaps. If set to false, it will report only individual HSPs (high-scoring segment pairs) where two sequences match each other, and thus will not produce alignments with gaps.
- Scores
- Setting this option to any number available in the menu sets the maximum number of reported scores in the output file. This is the -v option of the Blast command line.
- Alignments
- Select the number of alignments you want to see displayed in the output file. This is the -b option of the Blast command line.
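For readers who also use the command line, this Python sketch assembles an argument list showing where the -v and -b options fit. It only builds the list and does not run anything. Apart from -v and -b, the flags shown (-p, -d, -i, -e) are those of the legacy blastall program as we understand them, so check the official man pages before relying on them. The database and file names are hypothetical.

```python
def blastall_command(program, database, query_file,
                     expect=10, scores=500, alignments=250):
    """Build (but do not run) a legacy blastall argument list.

    The default numbers are illustrative assumptions.
    """
    return [
        "blastall",
        "-p", program,          # blastp, blastx or blastn
        "-d", database,         # database name (hypothetical in the example)
        "-i", query_file,       # FASTA query file (hypothetical in the example)
        "-e", str(expect),      # expected threshold
        "-v", str(scores),      # maximum number of reported scores
        "-b", str(alignments),  # number of alignments displayed
    ]


command = blastall_command("blastp", "mydb", "query.fasta",
                           scores=100, alignments=50)
```

Joining the list with spaces gives the command line you would type in a shell.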
- Sequence
- You can cut and paste or type a nucleotide or protein sequence into the text window. The only accepted format is FASTA. This format consists of a one-line header followed by lines of sequence data. The first line starts with a ">" symbol, followed by the name of the sequence. The rest of the line is an optional description of the sequence. The sequence itself is written on the remaining lines. Blank lines and spaces are ignored. You can input more than one sequence.
- All sequences must follow the IUB/IUPAC standard codes.
- All sequences will be checked before running Blast, and errors will be pointed out. Please correct them before resubmitting the sequence.
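If you want to check your input before pasting it, the rules above can be approximated with this Python sketch. It parses one or more FASTA records, ignoring blank lines and spaces, and reports symbols outside an assumed IUPAC alphabet. The alphabets and function names are our own assumptions, and the checks performed by the form itself may differ.

```python
# Assumed alphabets for illustration; the form's own check may differ.
IUPAC_NUCLEOTIDE = set("ACGTURYKMSWBDHVN")
IUPAC_PROTEIN = set("ABCDEFGHIKLMNPQRSTVWXYZ*")


def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples."""
    records = []
    name, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if name is not None:
                records.append((name, desc, "".join(chunks)))
            # First word is the name, the rest is the optional description.
            name, _, desc = line[1:].strip().partition(" ")
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before a '>' header")
            chunks.append("".join(line.split()))  # spaces are ignored
    if name is not None:
        records.append((name, desc, "".join(chunks)))
    return records


def invalid_symbols(sequence, alphabet):
    """Return (1-based position, symbol) for each symbol not in the alphabet."""
    return [
        (i + 1, ch) for i, ch in enumerate(sequence) if ch.upper() not in alphabet
    ]
```

An empty list from invalid_symbols means the sequence uses only symbols from the chosen alphabet.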
Then run Blast! The next page will show a small alert window, which will provide you with a JobId. If you expect the job to take a long time to execute, you can save this JobId (copy it), click on the exit button, and come back later to view the Blast Results on the Web.