Describe the question or problem
Hi there, I would like to run an MSGF+ search in which the algorithm only considers EXACT matches to the peptides provided in the FASTA database (with one static and one dynamic modification).
For example, there are two peptide entries in the fasta file:
>peptide_1
MDFYAMIHAFWLIAVLYRR
>peptide_2
MDFYAMIHAFWLIAVLYR
My samples were digested with trypsin, so my database contains only tryptic peptides (including some missed cleavages that I have already added).
I am using the following settings; these are the only ones I can think of that are relevant:
#Enzyme ID
# 0 means No enzyme used
# 1 means Trypsin (Default); use this along with NTT=0 for a no-enzyme search of a tryptically digested sample
# 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: Glu-C, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: No Enzyme (for peptidomics)
EnzymeID=9
#Number of tolerable termini
# The number of peptide termini that must have been cleaved by the enzyme (default 1)
# For trypsin, 2 means fully tryptic only, 1 means partially tryptic, and 0 means no-enzyme search
NTT=2
MSGF+ would return this result:
sample.mzML controllerType=0 controllerNumber=1 scan=65059 65059 HCD 537.5329 2 1.9908882 4 DFYAM+15.995IHAFWLIAVLYR peptide_1(pre=M,post=R);peptide_2(pre=M,post=-) 101 42 2.5559657E-9 0.006229213
My problem with this result is twofold:
- DFYAMIHAFWLIAVLYR is not a peptide in the database, and I do not see an option in the config to TURN OFF N-terminal methionine cleavage. I appreciate that MSGF+ probably just tries both possibilities (with and without the leading M), but I would still like to turn it off so that it does not interfere with my FDR calculations.
- The "protein" column of the PSM lists the names of all database entries that contain the peptide. In my understanding of a "no enzyme" search, only exact matches to the peptides given in the database should be made. Even if other entries also contain the peptide (and fit the trypsin digestion pattern), they should not be listed under "protein", because they are not the FASTA entry where the match was made. This is not really a problem, but it is a nuisance when parsing which exact entry the peptide match came from.
Do you have any suggestions on how I could modify the params file to get cleaner results? Thanks.
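In case it helps to show what I mean by an "exact match", here is a minimal sketch of the post-filter I could apply to the results myself. It is only a workaround, not an MSGF+ feature. The function names (`read_fasta`, `strip_mods`, `exact_entries`) are made up for illustration, and it assumes modifications always appear in the peptide column as signed numeric mass shifts such as `+15.995`.

```python
import re

def read_fasta(text):
    """Parse FASTA text into a dict of {entry_name: sequence}."""
    entries = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            entries[name] = ""
        elif name is not None:
            entries[name] += line
    return entries

def strip_mods(peptide):
    """Remove signed numeric mass shifts (assumed format, e.g. +15.995)."""
    return re.sub(r"[+-]\d+(?:\.\d+)?", "", peptide)

def exact_entries(peptide, entries):
    """Return names of entries whose full sequence equals the bare peptide."""
    bare = strip_mods(peptide)
    return [name for name, seq in entries.items() if seq == bare]

fasta_text = """>peptide_1
MDFYAMIHAFWLIAVLYRR
>peptide_2
MDFYAMIHAFWLIAVLYR
"""
entries = read_fasta(fasta_text)

# The reported PSM lacks the leading M, so it matches no entry exactly.
print(exact_entries("DFYAM+15.995IHAFWLIAVLYR", entries))
```

A PSM whose stripped sequence returns an empty list would be discarded, and otherwise the returned name is the single entry I would want in the "protein" column. The drawback is that filtering after the search does not remove these candidates from the scoring, so I would still prefer a setting in the params file.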