README.md: 40 additions & 19 deletions
@@ -1,8 +1,10 @@
 # clan_check
-Check trees for compatibility with defined monophyletic [edit - not right terminology ] groups - "The incontrovertible clan test"
+Check trees for compatibility with defined monophyletic [edit - not right terminology ] groups - "The incontestable clan test"

 ## Background
-###What does it do?
+
+### What does it do?
+
 Clan_check analyses single-copy phylogenetic trees to assess if they violate clans* defined by the user.

 >*see the following paper for a definition of a "clan"
@@ -15,29 +17,30 @@ The output is a list for all the trees of each clan using a scoring of 1 or 0 wh

 The software will also return a 1 if none of the taxa from the clan are found in the tree, or if only 1 of the taxa is found.

-A "0" means that two or more of the taxa from that clan were found and they were not monophyletic.
+A "0" means that two or more of the taxa from that clan were found and they were not in a clan (i.e. they were not together to the exclusion of all other taxa on the tree).

 ### But... why?
+
 This is designed for large-scale phylogenomic analyses where the user may have thousands of phylogenetic trees. While every effort may have been taken to ensure that the best orthologs have been chosen, sometimes due to hidden paralogy it is not easy to get the choice right.

-In these cases, the only evidence that the gene family may be problematic is when the resulting phylogenetic tree is "incorrect".
+In these cases, the only evidence that the gene family may be problematic is when the resulting phylogenetic tree is incorrect for known or "incontestable" groups.

-One way to test for "problematic" gene families is to look for "incontrovertible relationships" that are not part of the question being asked in the study, but without doubt should exist if the taxa are in the tree.
+This involves looking for "incontestable relationships" that are not part of the question being asked in the study, but without doubt should exist if the taxa are in the tree.

-An example of this is, if I was carrying out a phylogenomic study of the fishes and used several mammals as an outgroup, then I should never expect the mammal clan to be paraphyletic [edit - whats the equivalent of paraphyly for a clan?].
+An example of this is, if a phylogenomic study involved the analysis of the relationships of the birds and used several mammals as an outgroup, then mammals would always be expected to group together.

-In this case the mammals are an incontrovertible clan. If the mammals are paraphyletic with the fishes, then it is very likely that one of the internal branches of the tree represents a duplication and not a speciation event, and so they are not all orthologs.
+In this case the mammals are an incontestable clan. If the mammals do not group together, then it is very likely that one of the internal branches of the tree represents a duplication and not a speciation event, and so some of the genes in the family may not be orthologs.

-Clan_check searches for these instances.
+`Clan_check` searches for these instances.

-If given many such clans to check, researchers can assess the number of these clans that are violated and decide on the weight of evidence necessary to remove or re-visit the analysis of that gene family.
+If given many such clans to check, researchers can assess the number of these clans that are violated and decide on the weight of evidence necessary to remove or re-visit the analysis of any gene families.

-Care must be taken choosing the clans to be tested and in the designing of the study, to include taxa that allow this test to be made.
+Care must be taken choosing the clans to be tested and in the design of the study to include taxa that allow this test to be made.

 You can provide trees and clans of any size and `clan_check` will search for the appropriate sub-set of the clans defined.

 For example:
->if you have a tree with `(A,B,(C,D));` and a clan definition of `C D E`, clan_check will search for monophylies of `C` and `D` only.
+>if you have a tree with `(A,B,(C,D));` and a clan definition of `C D E`, clan_check will search for clans containing `C` and `D` only.

 If only 1 of the taxa from a clan is in the tree, clan_check will assume that the clan is not violated, and return a "1" for that test (see output files detail below).
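
The sub-set behaviour in the example above can be sketched in a few lines of Python. This is an illustrative sketch only, not `clan_check`'s own code: the names `leaf_sets` and `check_clan` are hypothetical, the tree is written as nested tuples instead of being parsed from Newick, and the tree is treated as unrooted (a group counts as a clan if some branch separates it from every other taxon). When fewer than two taxa of the clan are present the sketch returns `None` and leaves it to the caller to decide how to report that case.

```python
def leaf_sets(tree):
    """Return (all leaves under `tree`, leaf set of every subtree including leaves)."""
    if isinstance(tree, str):              # a single taxon label
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    collected = []
    for child in tree:
        child_leaves, child_sets = leaf_sets(child)
        leaves |= child_leaves
        collected.extend(child_sets)
    collected.append(leaves)
    return leaves, collected


def check_clan(tree, clan):
    """Return 1 (not violated), 0 (violated) or None (fewer than 2 clan taxa present)."""
    all_leaves, subtrees = leaf_sets(tree)
    present = frozenset(clan) & all_leaves  # keep only the clan taxa found in this tree
    if len(present) < 2:
        return None                         # not enough taxa to carry out the test
    outside = all_leaves - present
    if not outside:
        return 1                            # no other taxa in the tree to exclude
    # On an unrooted tree, a branch separates `present` from `outside` exactly when
    # one side of that branch is a subtree of this nested-tuple representation.
    for leaves in subtrees:
        if leaves == present or leaves == outside:
            return 1
    return 0


tree = ("A", "B", ("C", "D"))               # stands in for (A,B,(C,D));
print(check_clan(tree, ["C", "D", "E"]))    # 1: E is absent, C and D group together
print(check_clan(tree, ["A", "C"]))         # 0: no branch separates A and C from B and D
print(check_clan(tree, ["C", "E"]))         # None: only one clan taxon is in the tree
```

Comparing leaf sets keeps the sketch short; for thousands of large trees a dedicated tree library would be a more practical choice.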
 Where `tree number` is in the same order as the input trees, `size` = the number of taxa in the tree, `Clan x` is the clan defined by the xth line of the clan file.

-In this example Clan 3 defined as having the monophyly of "c d a" was violated in both tree 1 and tree 2.
+### Interpreting the results
+
+A "1" in the table means that this tree did not violate this clan.
+
+A "0" in the table means that this tree violated this clan.
+
+A "?" in the table means that there were not enough taxa from the Clan in this tree to carry out the test (minimum required is 2 taxa).
+
+So in the test data:
+
+* None of the three trees contained Clan 3 (c d a), despite all three trees containing all three taxa
+
+* Neither tree 2 nor tree 3 contained clan 1 (c d b), despite both trees containing all three taxa
+
+* We could not test Clan 6 (g d) against Tree 1 or Tree 3 as neither of those trees had taxon "g".
+
+For each tree, you can express the number of Clans violated as a sum, percentage, or treat any violation as a reason to exclude the tree from further analyses. It all depends on what question you are asking and the level of stringency you wish to apply.

-In this result Tree 2 violated 2 of the clans and tree 1 violated 1.
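
The per-tree summaries mentioned under "Interpreting the results" (sum, percentage, or excluding any tree with a violation) could be computed along these lines. This is a minimal sketch under stated assumptions: the function name `summarise` is hypothetical, each tree's scores are assumed to be already available as a list of the strings "1", "0" and "?", and the example row is made up for illustration and is not taken from the test data.

```python
def summarise(scores):
    """Summarise one tree's clan scores ("1", "0" or "?")."""
    tested = [s for s in scores if s in ("0", "1")]   # "?" means the clan was not testable
    violated = tested.count("0")
    percent = 100.0 * violated / len(tested) if tested else None
    return {
        "tested": len(tested),
        "violated": violated,             # the sum of violated clans
        "percent_violated": percent,      # None when no clan could be tested
        "keep": violated == 0,            # strictest rule: any violation excludes the tree
    }


# Hypothetical scores for a single tree, one entry per clan.
example = ["1", "0", "?", "0"]
print(summarise(example))
```

Basing the percentage on tested clans only keeps untestable clans from diluting the result; whether that is the right denominator depends on the question being asked.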