Cross-Species Gene Symbol Mapping
The majority of prior knowledge in OmniPath [TureiKSR16] (and similar databases) is based on human data and thus uses human gene symbols.
However, decoupler can use gene homology to convert these symbols to those
of other organisms.
To achieve this, it relies on orthology tables extracted from the HCOP
database [YGJB21].
The following organisms are currently supported.
import decoupler as dc
dc.op.show_organisms()
['anole_lizard',
'c.elegans',
'cat',
'cattle',
'chicken',
'chimpanzee',
'dog',
'fruitfly',
'horse',
'macaque',
'mouse',
'opossum',
'pig',
'platypus',
'rat',
's.cerevisiae',
's.pombe',
'xenopus',
'zebrafish']
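The idea behind an orthology-based conversion can be sketched in plain Python. This is a minimal illustration only, not decoupler's implementation: the helper `translate_rows` and the `ORTHOLOGS` dictionary are hypothetical, and the human-to-fruit-fly pairs are copied from the example outputs on this page rather than queried from HCOP.

```python
# Toy orthology table: human symbol -> fruit fly orthologs.
# Hypothetical stand-in for an HCOP-derived table; the pairs are taken
# from the example outputs shown on this page.
ORTHOLOGS = {
    "ABL1": ["Abl"],
    "ACE": ["Ance", "Acer", "Ance-2", "Ance-3"],
    "YY1": ["pho", "phol"],
}


def translate_rows(rows, orthologs):
    """Replace each human symbol by its orthologs.

    Rows whose symbol is missing from the table are dropped, and rows
    whose symbol has several orthologs are duplicated once per ortholog.
    """
    out = []
    for symbol, pathway in rows:
        for target in orthologs.get(symbol, []):
            out.append((target, pathway))
    return out


human_net = [
    ("ABL1", "Cell cycle: G2/M phase transition"),
    ("ACE", "Focal segmental glomerulosclerosis"),
    ("ZAP70", "T cell activation"),  # absent from the toy table, so dropped
]

fly_net = translate_rows(human_net, ORTHOLOGS)
for row in fly_net:
    print(row)
```

In this sketch one input row can expand into several output rows (ACE) or vanish entirely (ZAP70), which is why the number of rows generally changes after translation.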
To demonstrate this functionality, the SIGNOR database,
which uses human gene symbols, is loaded first.
net = dc.op.resource("SIGNOR")
net
| | genesymbol | pathway |
|---|---|---|
| 0 | ABL1 | Cell cycle: G2/M phase transition |
| 1 | ACE | Focal segmental glomerulosclerosis |
| 2 | ACTB | Axon guidance |
| 3 | ACTN1 | Axon guidance |
| 4 | ACTN1 | Glutamatergic synapse |
| ... | ... | ... |
| 2943 | YY1 | NOTCH Signaling |
| 2944 | ZAP70 | T cell activation |
| 2945 | ZAP70 | P38 Signaling |
| 2946 | ZBTB16 | Acute Myeloid Leukemia |
| 2947 | ZFYVE9 | TGF-beta Signaling |
2948 rows × 2 columns
The obtained resource can easily be converted to mouse gene symbols.
m_net = dc.op.translate(
    net,
    target_organism="mouse",
)
m_net
| | genesymbol | pathway |
|---|---|---|
| 0 | Abl1 | Cell cycle: G2/M phase transition |
| 1 | Ace | Focal segmental glomerulosclerosis |
| 2 | Actb | Axon guidance |
| 3 | Actg1 | Axon guidance |
| 4 | Actn1 | Axon guidance |
| ... | ... | ... |
| 1436 | Yy1 | NOTCH Signaling |
| 1437 | Zap70 | T cell activation |
| 1438 | Zap70 | P38 Signaling |
| 1439 | Zbtb16 | Acute Myeloid Leukemia |
| 1440 | Zfyve9 | TGF-beta Signaling |
1441 rows × 2 columns
Note
Homology conversion may result in the gain or loss
of certain genes when mapping between organisms.
Adjust the one_to_many parameter to make the behavior more or less strict.
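The effect of a one-to-many threshold can be sketched as follows. This assumes the threshold acts as a cap on the number of orthologs a single human gene may map to, with genes above the cap being dropped; that reading is an assumption made for illustration, not a statement about decoupler's internals. The helper `translate_with_cap` and its `max_orthologs` argument are hypothetical, and the ortholog pairs are copied from the fruit fly output on this page.

```python
# Toy orthology table (hypothetical stand-in for an HCOP-derived table).
ORTHOLOGS = {
    "ABL1": ["Abl"],
    "ACE": ["Ance", "Acer", "Ance-2", "Ance-3"],
    "YY1": ["pho", "phol"],
}


def translate_with_cap(rows, orthologs, max_orthologs):
    """Translate symbols, skipping genes with too many orthologs.

    Assumption: a gene is dropped when it has no ortholog or when it
    maps to more than `max_orthologs` target genes.
    """
    out = []
    for symbol, pathway in rows:
        targets = orthologs.get(symbol, [])
        if not targets or len(targets) > max_orthologs:
            continue
        out.extend((target, pathway) for target in targets)
    return out


human_net = [
    ("ABL1", "Cell cycle: G2/M phase transition"),
    ("ACE", "Focal segmental glomerulosclerosis"),
    ("YY1", "NOTCH Signaling"),
]

strict = translate_with_cap(human_net, ORTHOLOGS, max_orthologs=1)
lenient = translate_with_cap(human_net, ORTHOLOGS, max_orthologs=4)
print(len(strict), len(lenient))  # prints: 1 7
```

Under this assumption, a strict cap keeps only unambiguous one-to-one mappings, while a lenient cap keeps more genes at the cost of duplicated rows.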
Next, the conversion is performed for the fruit fly.
f_net = dc.op.translate(
    net,
    target_organism="fruitfly",
)
f_net
| | genesymbol | pathway |
|---|---|---|
| 0 | Abl | Cell cycle: G2/M phase transition |
| 1 | Ance | Focal segmental glomerulosclerosis |
| 2 | Acer | Focal segmental glomerulosclerosis |
| 3 | Ance-2 | Focal segmental glomerulosclerosis |
| 4 | Ance-3 | Focal segmental glomerulosclerosis |
| ... | ... | ... |
| 1024 | yki | Hippo Signaling |
| 1025 | 14-3-3zeta | SAPK/JNK Signaling |
| 1026 | pho | NOTCH Signaling |
| 1027 | phol | NOTCH Signaling |
| 1028 | Sara | TGF-beta Signaling |
1029 rows × 2 columns
Additionally, all database functions in decoupler directly accept the organism parameter,
which runs decoupler.op.translate under the hood.
dc.op.resource("SIGNOR", organism="zebrafish")
| | genesymbol | pathway |
|---|---|---|
| 0 | abl1 | Cell cycle: G2/M phase transition |
| 1 | ace | Focal segmental glomerulosclerosis |
| 2 | actb2 | Axon guidance |
| 3 | actb1 | Axon guidance |
| 4 | actn1 | Axon guidance |
| ... | ... | ... |
| 1981 | zap70 | T cell activation |
| 1982 | zap70 | P38 Signaling |
| 1983 | zbtb16a | Acute Myeloid Leukemia |
| 1984 | zbtb16b | Acute Myeloid Leukemia |
| 1985 | zfyve9a | TGF-beta Signaling |
1986 rows × 2 columns
dc.op.progeny(organism="anole_lizard")
| | source | target | weight | padj |
|---|---|---|---|---|
| 0 | Androgen | tmprss2 | 11.490631 | 2.384806e-47 |
| 1 | Androgen | nkx3-1 | 10.622551 | 2.205102e-44 |
| 2 | Androgen | mboat2 | 10.472733 | 4.632376e-44 |
| 3 | Androgen | SLC38A4 | 7.363805 | 1.253071e-39 |
| 4 | Androgen | mtmr9 | 6.130646 | 2.534403e-38 |
| ... | ... | ... | ... | ... |
| 43409 | p53 | enpp2 | 2.771405 | 4.993215e-02 |
| 43410 | p53 | arrdc4 | 3.494328 | 4.996747e-02 |
| 43411 | p53 | myo1b | -1.148057 | 4.997905e-02 |
| 43412 | p53 | ctsc | -1.784693 | 4.998864e-02 |
| 43413 | p53 | naa50 | -1.435013 | 4.998884e-02 |
43414 rows × 4 columns
dc.op.collectri(organism="s.cerevisiae")
| | source | target | weight | resources | references | sign_decision |
|---|---|---|---|---|---|---|
| 0 | SPT15 | TOA1 | 1.0 | ExTRI | 10078202;10523649;10581267;10617594;10675336;1... | default activation |
| 1 | TOA1 | SPT15 | 1.0 | TRRUST | 12818428 | default activation |
| 2 | MOT1 | SPT15 | -1.0 | TRRUST | 14988402;15509807;16858867;20627952 | PMID |
| 3 | SPT15 | MOT1 | 1.0 | ExTRI | 10082549 | default activation |
| 4 | MCM1 | MCM1 | 1.0 | ExTRI;NTNU.Curated | 10330138;10602487;15531578;17629633;8663310;92... | PMID |
| ... | ... | ... | ... | ... | ... | ... |
| 362 | COY1 | RDI1 | 1.0 | Pavlidis2021 | 19635798 | PMID |
| 363 | COY1 | POL1 | 1.0 | Pavlidis2021 | 12438259;12665598;18347061 | PMID |
| 364 | HAP2 | SSA4 | 1.0 | Pavlidis2021 | 24041570 | PMID |
| 365 | HAP3 | SSA4 | 1.0 | Pavlidis2021 | 24041570 | PMID |
| 366 | HAP5 | SSA4 | 1.0 | Pavlidis2021 | 24041570 | PMID |
367 rows × 6 columns