Cross-Species Gene Symbol Mapping

The majority of prior knowledge in OmniPath [TureiKSR16] (and similar databases) is based on human data and thus uses human gene symbols.

However, decoupler can use gene homology to convert these symbols to those of other organisms. To do so, it relies on orthology tables extracted from the HCOP database [YGJB21].

The following organisms are currently supported.

import decoupler as dc

dc.op.show_organisms()
['anole_lizard',
 'c.elegans',
 'cat',
 'cattle',
 'chicken',
 'chimpanzee',
 'dog',
 'fruitfly',
 'horse',
 'macaque',
 'mouse',
 'opossum',
 'pig',
 'platypus',
 'rat',
 's.cerevisiae',
 's.pombe',
 'xenopus',
 'zebrafish']
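Organism names are short identifiers such as `fruitfly` or `c.elegans`, so it can help to validate user input before requesting a translation. The following is a small standard-library sketch, assuming names must be given exactly as listed; `SUPPORTED_ORGANISMS` is copied from the output above, while `check_organism` is a hypothetical helper and not part of decoupler.

```python
import difflib

# Copied from the output of dc.op.show_organisms() shown above.
SUPPORTED_ORGANISMS = [
    "anole_lizard", "c.elegans", "cat", "cattle", "chicken",
    "chimpanzee", "dog", "fruitfly", "horse", "macaque", "mouse",
    "opossum", "pig", "platypus", "rat", "s.cerevisiae", "s.pombe",
    "xenopus", "zebrafish",
]


def check_organism(name):
    """Return `name` if it is a supported identifier, else raise ValueError.

    Hypothetical helper for illustration (not part of decoupler).
    """
    if name in SUPPORTED_ORGANISMS:
        return name
    # Suggest the closest supported identifiers, if there are any.
    hints = difflib.get_close_matches(name, SUPPORTED_ORGANISMS)
    suffix = f" Did you mean one of {hints}?" if hints else ""
    raise ValueError(f"Unsupported organism {name!r}.{suffix}")


print(check_organism("mouse"))

try:
    check_organism("fruit_fly")
except ValueError as err:
    print(err)
```

A misspelled name then fails early with a readable message instead of failing somewhere inside the download or translation step.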

To demonstrate this functionality, the SIGNOR database, which uses human gene symbols, is loaded first.

net = dc.op.resource("SIGNOR")
net
      genesymbol  pathway
0     ABL1        Cell cycle: G2/M phase transition
1     ACE         Focal segmental glomerulosclerosis
2     ACTB        Axon guidance
3     ACTN1       Axon guidance
4     ACTN1       Glutamatergic synapse
...   ...         ...
2943  YY1         NOTCH Signaling
2944  ZAP70       T cell activation
2945  ZAP70       P38 Signaling
2946  ZBTB16      Acute Myeloid Leukemia
2947  ZFYVE9      TGF-beta Signaling

2948 rows × 2 columns

The obtained resource can easily be converted to mouse gene symbols.

m_net = dc.op.translate(
    net,
    target_organism="mouse",
)
m_net
      genesymbol  pathway
0     Abl1        Cell cycle: G2/M phase transition
1     Ace         Focal segmental glomerulosclerosis
2     Actb        Axon guidance
3     Actg1       Axon guidance
4     Actn1       Axon guidance
...   ...         ...
1436  Yy1         NOTCH Signaling
1437  Zap70       T cell activation
1438  Zap70       P38 Signaling
1439  Zbtb16      Acute Myeloid Leukemia
1440  Zfyve9      TGF-beta Signaling

1441 rows × 2 columns

Note

Homology conversion may result in the gain or loss of certain genes when mapping between organisms. Adjust the one_to_many parameter to make the behavior more or less strict.
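To make the gain and loss of genes concrete, the following is a minimal pure-Python sketch, not decoupler's actual implementation. It assumes that one_to_many acts as an upper bound on how many orthologs a human gene may map to before it is dropped. The orthology table and the helper `translate_rows` are hypothetical toy stand-ins, not real HCOP data.

```python
# Toy orthology table (hypothetical, NOT real HCOP data):
# human symbol -> orthologous symbols in the target organism.
TOY_ORTHOLOGS = {
    "ABL1": ["Abl"],                              # one-to-one
    "ACE": ["Ance", "Acer", "Ance-2", "Ance-3"],  # one-to-many
    # "ZAP70" has no entry: no ortholog in this toy table.
}


def translate_rows(rows, orthologs, one_to_many):
    """Translate (genesymbol, pathway) rows with a toy orthology table.

    Assumption: a human gene is kept only if it maps to at most
    `one_to_many` target genes; genes without orthologs are dropped.
    """
    translated = []
    for symbol, pathway in rows:
        targets = orthologs.get(symbol, [])
        if not targets or len(targets) > one_to_many:
            continue  # this gene is lost in the conversion
        # One output row per ortholog: this is where rows are gained.
        translated.extend((target, pathway) for target in targets)
    return translated


human_rows = [
    ("ABL1", "Cell cycle: G2/M phase transition"),
    ("ACE", "Focal segmental glomerulosclerosis"),
    ("ZAP70", "T cell activation"),
]

strict = translate_rows(human_rows, TOY_ORTHOLOGS, one_to_many=1)
lenient = translate_rows(human_rows, TOY_ORTHOLOGS, one_to_many=5)

print(len(human_rows), len(strict), len(lenient))  # prints: 3 1 5
```

With the strict setting only the one-to-one gene survives, whereas the lenient setting expands the single ACE row into four rows, so the translated table can be both shorter and longer than the human one depending on the threshold.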

Next, the conversion is performed for the fruit fly.

f_net = dc.op.translate(
    net,
    target_organism="fruitfly",
)
f_net
      genesymbol  pathway
0     Abl         Cell cycle: G2/M phase transition
1     Ance        Focal segmental glomerulosclerosis
2     Acer        Focal segmental glomerulosclerosis
3     Ance-2      Focal segmental glomerulosclerosis
4     Ance-3      Focal segmental glomerulosclerosis
...   ...         ...
1024  yki         Hippo Signaling
1025  14-3-3zeta  SAPK/JNK Signaling
1026  pho         NOTCH Signaling
1027  phol        NOTCH Signaling
1028  Sara        TGF-beta Signaling

1029 rows × 2 columns

Additionally, all database functions in decoupler directly accept the parameter organism, which runs decoupler.op.translate under the hood.

dc.op.resource("SIGNOR", organism="zebrafish")
      genesymbol  pathway
0     abl1        Cell cycle: G2/M phase transition
1     ace         Focal segmental glomerulosclerosis
2     actb2       Axon guidance
3     actb1       Axon guidance
4     actn1       Axon guidance
...   ...         ...
1981  zap70       T cell activation
1982  zap70       P38 Signaling
1983  zbtb16a     Acute Myeloid Leukemia
1984  zbtb16b     Acute Myeloid Leukemia
1985  zfyve9a     TGF-beta Signaling

1986 rows × 2 columns

dc.op.progeny(organism="anole_lizard")
       source    target   weight     padj
0      Androgen  tmprss2  11.490631  2.384806e-47
1      Androgen  nkx3-1   10.622551  2.205102e-44
2      Androgen  mboat2   10.472733  4.632376e-44
3      Androgen  SLC38A4  7.363805   1.253071e-39
4      Androgen  mtmr9    6.130646   2.534403e-38
...    ...       ...      ...        ...
43409  p53       enpp2    2.771405   4.993215e-02
43410  p53       arrdc4   3.494328   4.996747e-02
43411  p53       myo1b    -1.148057  4.997905e-02
43412  p53       ctsc     -1.784693  4.998864e-02
43413  p53       naa50    -1.435013  4.998884e-02

43414 rows × 4 columns

dc.op.collectri(organism="s.cerevisiae")
     source  target  weight  resources           references                                         sign_decision
0    SPT15   TOA1    1.0     ExTRI               10078202;10523649;10581267;10617594;10675336;1...  default activation
1    TOA1    SPT15   1.0     TRRUST              12818428                                           default activation
2    MOT1    SPT15   -1.0    TRRUST              14988402;15509807;16858867;20627952                PMID
3    SPT15   MOT1    1.0     ExTRI               10082549                                           default activation
4    MCM1    MCM1    1.0     ExTRI;NTNU.Curated  10330138;10602487;15531578;17629633;8663310;92...  PMID
...  ...     ...     ...     ...                 ...                                                ...
362  COY1    RDI1    1.0     Pavlidis2021        19635798                                           PMID
363  COY1    POL1    1.0     Pavlidis2021        12438259;12665598;18347061                         PMID
364  HAP2    SSA4    1.0     Pavlidis2021        24041570                                           PMID
365  HAP3    SSA4    1.0     Pavlidis2021        24041570                                           PMID
366  HAP5    SSA4    1.0     Pavlidis2021        24041570                                           PMID

367 rows × 6 columns
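The relationship between the organism parameter and decoupler.op.translate can be pictured as a thin wrapper: load the human resource, then translate it unless the requested organism is human. The sketch below illustrates only this pattern with stand-in functions. `load_human_resource`, `translate`, `resource`, the `"human"` default, and the toy data are all hypothetical and do not reproduce decoupler's internals.

```python
# Stand-ins for illustration only; none of these are decoupler functions.
TOY_HUMAN_NET = [("ABL1", "Cell cycle: G2/M phase transition")]
TOY_ORTHOLOGS = {"zebrafish": {"ABL1": ["abl1"]}}


def load_human_resource(name):
    """Pretend to fetch a resource that uses human gene symbols."""
    return list(TOY_HUMAN_NET)


def translate(net, target_organism):
    """Map human symbols to the target organism using the toy table."""
    table = TOY_ORTHOLOGS[target_organism]
    return [
        (ortholog, pathway)
        for symbol, pathway in net
        for ortholog in table.get(symbol, [])
    ]


def resource(name, organism="human"):
    """Load a resource and translate it only when needed.

    Assumption: "human" means that no translation is required.
    """
    net = load_human_resource(name)
    if organism == "human":
        return net
    return translate(net, target_organism=organism)


# Both routes give the same result in this sketch.
direct = resource("SIGNOR", organism="zebrafish")
two_step = translate(resource("SIGNOR"), target_organism="zebrafish")
print(direct == two_step)  # prints: True
```

Under this reading, passing organism directly is a convenience, while calling the translation step explicitly remains useful when its parameters need to be adjusted.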