Utilization of the paralog database #33

Mia1349 · 2022-08-24T07:40:22Z

Hi there,

Thank you for building such a great tool! It and the many useful scripts that come with it have been a huge help to me.

I have a question about the paralog database. Can I use the paralog database within PhyloFisher to filter out possible paralogs in my dataset through BLAST? I am just not sure whether this is the right way to go. I used OrthoFinder to select the orthogroups and a tree-based pruning strategy to filter out the single-copy orthologs, then ran BLAST with the results against the PhyloFisher paralog database, and there were a lot of hits. Does that mean they are probably paralogs and should be removed from the final matrix?

Mia

atice · 2022-08-24T15:54:16Z

atice
Aug 24, 2022
Maintainer

Hi @Mia1349,

We are glad to hear you are finding PhyloFisher useful in your work!

The short answer is I don't think the strategy you suggest above will be informative in determining whether or not you have paralogs remaining in your dataset.

Here is the much longer explanation as to why.

1) The sequences in the paralog dataset provided with PhyloFisher will produce significant hits to closely related sequences, which in your case may be the desired orthologs. This is the reason we maintain them and manually inspect homolog trees for ortholog selection.

2) With the strategy you have taken, some of our "paralogs" may be fine to keep in your dataset because they have an orthologous relationship to one another. For example, the gene CDK5 (used in the PhyloFisher dataset) had a duplication event early in the history of eukaryotes. Some extant taxa have maintained "copy 1" and some have "copy 2." Some may even have both, but I cannot remember at the moment. For the sake of conversation, we will say "copy 1" is maintained as the ortholog in the PhyloFisher database and "copy 2" is maintained as the paralog. However, if we had split this tree to produce two sequence files (as your algorithm likely would have), one containing only sequences in the "copy 1" clade and the other containing only sequences in the "copy 2" clade, then the sequences within each file would have an orthologous relationship to one another and would therefore be fine to use in the final analysis. Again, for the sake of conversation, you might well have a file in your dataset that contains only sequences of CDK5 "copy 2," which would be fine for inclusion in your final matrix but would produce highly significant hits to the paralogs database of PhyloFisher. However, you might also have a mixture of "copy 1" and "copy 2," and from the BLAST hits alone you would not be able to tell.
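To make the CDK5 point concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical: the sequence names, the clade labels, and the `has_paralog_db_hit` stand-in, which only mimics a BLAST search and does not call BLAST or PhyloFisher. It shows that a "has a hit to the paralog database" flag comes out the same for a file holding only "copy 2" (fine to keep) and for a file mixing "copy 1" and "copy 2" (not fine).

```python
from typing import Dict, List

# Hypothetical clade assignments, as one might read them off a CDK5 gene tree.
CLADE_OF: Dict[str, str] = {
    "taxonA_cdk5": "copy1",
    "taxonB_cdk5": "copy1",
    "taxonC_cdk5": "copy2",
    "taxonD_cdk5": "copy2",
}

# Assumption for this example: the paralog database holds "copy 2" sequences.
PARALOG_DB_CLADE = "copy2"

# Three hypothetical ortholog files produced by some pruning strategy.
FILES: Dict[str, List[str]] = {
    "copy1_only": ["taxonA_cdk5", "taxonB_cdk5"],
    "copy2_only": ["taxonC_cdk5", "taxonD_cdk5"],
    "mixed": ["taxonA_cdk5", "taxonC_cdk5"],
}


def has_paralog_db_hit(seq_ids: List[str]) -> bool:
    """Stand-in for a BLAST search against the paralog database.

    Simplification: only "copy 2" sequences are treated as hits. In practice
    closely related "copy 1" sequences may hit as well (point 1 above), which
    makes the flag even less informative.
    """
    return any(CLADE_OF[s] == PARALOG_DB_CLADE for s in seq_ids)


def is_orthologous_set(seq_ids: List[str]) -> bool:
    """True if every sequence in the file comes from the same copy."""
    return len({CLADE_OF[s] for s in seq_ids}) == 1


for name, seqs in FILES.items():
    print(name, "hit:", has_paralog_db_hit(seqs), "usable:", is_orthologous_set(seqs))
```

The `copy2_only` and `mixed` files both report a hit, yet only one of them is unusable, and the clade labels needed to tell them apart come from a gene tree rather than from the BLAST result.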

One strategy is to make a custom PhyloFisher database from your ortholog files (some or all, depending on how thorough you want to be) using the script build_database.py, and then move through the PhyloFisher workflow by recollecting sequences from all or even just a few taxa in your dataset. You would then build and manually inspect the resulting gene trees to evaluate ortholog selection in your dataset.
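As a rough aid when looking through many gene trees, a small check like the following could flag trees that deserve a closer look. This is only a sketch under stated assumptions and is not part of PhyloFisher: it assumes a rooted tree written as nested Python tuples, and the tip names (`db_ortholog_1`, `my_seq_1`, `db_paralog_1`, and so on) are made up for illustration. Real gene trees are usually unrooted and carry support values, so a check like this does not replace manual inspection.

```python
from typing import FrozenSet, Iterable, Union

# A tree is either a tip name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def tips(tree: Tree) -> FrozenSet[str]:
    """Collect all tip names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result: FrozenSet[str] = frozenset()
    for child in tree:
        result |= tips(child)
    return result


def is_clade(tree: Tree, group: Iterable[str]) -> bool:
    """True if some subtree contains exactly the tips in `group`."""
    wanted = frozenset(group)
    if tips(tree) == wanted:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, wanted) for child in tree)


# Hypothetical tree in which orthologs and paralogs separate cleanly.
clean_tree = ((("db_ortholog_1", "my_seq_1"), "my_seq_2"),
              ("db_paralog_1", "db_paralog_2"))

# Hypothetical tree in which one of "my" sequences groups with the paralogs.
suspect_tree = (("db_ortholog_1", "my_seq_1"),
                (("db_paralog_1", "my_seq_2"), "db_paralog_2"))

ortholog_side = {"db_ortholog_1", "my_seq_1", "my_seq_2"}

print("clean tree ok:", is_clade(clean_tree, ortholog_side))
print("suspect tree ok:", is_clade(suspect_tree, ortholog_side))
```

In the second tree `my_seq_2` sits inside the paralog clade, so the ortholog-side sequences no longer form a single clade, which is the kind of pattern manual inspection is meant to catch.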

I hope this is helpful. Please let me know if I can further clarify anything for you or if you have additional questions I can answer. Thank you for your interest in PhyloFisher.

Alex

1 reply

Mia1349 Aug 26, 2022
Author

Dear Alex,

Thank you for your explanation! That clears things up so much! I will try to follow the PhyloFisher workflow next.

Thank you again for your help!

Mia
