@hammer, I put together a notebook to start exploring how well embeddings might work to infer dimensions of T cell typing, beyond protein expression and general phenotypic qualifiers (exhausted, activated, antigen-specific, etc.).
To get a basic sense of how varied those descriptions are, I followed the NormCo approach: each noun phrase is represented by the sum of its token embedding vectors, taken from a word2vec model trained on PMC/PubMed. The resulting embedding projection shows some interesting clustering:
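The phrase-embedding step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `TOY_VECTORS` is a hypothetical stand-in for the PMC/PubMed word2vec vectors, whitespace tokenization is an assumption, and `most_similar` is a hypothetical helper showing how cosine similarity between summed vectors groups related phrases.

```python
import math

# Hypothetical toy vectors standing in for word2vec embeddings trained on PMC/PubMed.
TOY_VECTORS = {
    "cd4+": [0.9, 0.1, 0.0],
    "t": [0.2, 0.2, 0.2],
    "cells": [0.1, 0.3, 0.1],
    "memory": [0.0, 0.8, 0.1],
    "effector": [0.1, 0.7, 0.2],
    "tumor-infiltrating": [0.0, 0.1, 0.9],
}


def tokenize(phrase):
    # Assumption: lowercase + whitespace split; a real pipeline should match
    # the tokenization used when the word2vec model was trained.
    return phrase.lower().split()


def phrase_vector(phrase, vectors):
    """Sum the token vectors of a noun phrase, skipping out-of-vocabulary tokens."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokenize(phrase):
        vec = vectors.get(token)
        if vec is None:
            continue  # OOV token contributes nothing to the sum
        total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, candidates, vectors):
    """Rank candidate phrases by cosine similarity to the query phrase."""
    q = phrase_vector(query, vectors)
    scored = [(c, cosine(q, phrase_vector(c, vectors))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = most_similar(
        "memory T cells",
        ["tumor-infiltrating T cells", "effector T cells"],
        TOY_VECTORS,
    )
    for phrase, score in ranked:
        print(f"{phrase}: {score:.3f}")
```

With these toy vectors, "effector T cells" ranks closer to "memory T cells" than "tumor-infiltrating T cells" does, which is the kind of neighborhood structure the projection makes visible. Summation keeps phrases of different lengths in the same space, though longer phrases get larger norms; cosine similarity ignores that magnitude difference.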
[Figure: PMC/PubMed T Cell Embedding Projection]
Zooming in on the region I annotated shows fairly broad categorizations.
My take, after hovering over a bunch of those groups, is that these seem to be the common dimensions along which T cells are described:
- Expression Markers (CD4+CD8-)
- Primary Phenotype (effector, memory, helper, regulatory)
- Engineered Modifications (KO, WT, CART, transduced, induced, transgenic)
- Immunogenic Functions (tumor-infiltrating, neoplastic, anti-viral, anti-fungal, anti-inflammatory, vaccine)
- Physical Characteristics (adherent, migrating, small, large)
- Specificity (LCMV, HIV, HPV, Helicobacter, leishmaniasis, melanoma, self-reactive)
- Treatment (exposed, control, treated, untransformed, co-cultured, normal, responder, mock)
- Transplantation (allogeneic, autologous, GVHD, umbilical cord blood, bone marrow)
- Status (activated, quiescent, bystander, resting, proliferating, dead, dying, surviving, apoptotic, anergic)
- Tissue (parenchymal, tissue-resident, tonsil, spleen, skin, circulating, enteric, peripheral blood)
- Species (human, mouse, strain)
- Protocol (fresh, isolated, purified, bulk, plated, harvested)
- Age (young, aged, virgin, adult, early)
Do you think any of those make for useful characterizations we should keep in mind before trying to map the types to Cell Ontology or something like it?
