
Using medspaCy with target rules from Metathesaurus


Before I start this discussion, here are some useful links that could provide some context:

https://github.com/medspacy/medspacy

https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html

I plan to use medspaCy (a spaCy-based library for clinical text) for concept extraction. medspaCy can extract concepts with labels such as:

  • "PROBLEM"
  • "TREATMENT"
  • "TEST"

Normally, the target rules have to be written by hand, but I think this is where the UMLS Metathesaurus could help: it groups the different names for the same medical concept, so I wouldn't have to hard-code every name. In some cases it may still be useful to fall back on spaCy's rule-based matching. Extracted concepts can be visualized with displaCy.

The issue is that I'm not sure how to implement this in Python. The Metathesaurus is distributed as files in Rich Release Format (RRF), and turning those into something I can use from Python is a task in itself. I've explored using this Python UMLS source parser:

https://github.com/DATEXIS/UMLSParser

However, it doesn't seem to work. Does anyone have experience with Metathesaurus, UMLS, or MetaMap and can give me some pointers? It would be greatly appreciated.
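
For reference, here is a rough sketch of the kind of thing I'm after, using only the standard library. It assumes the pipe-delimited column layout of MRCONSO.RRF and MRSTY.RRF as described in the UMLS documentation. The semantic-type-to-label mapping, the function names (`read_rrf`, `build_rules`), and the sample rows with placeholder identifiers are made up for illustration and are not taken from a real release.

```python
import csv
import io
from collections import defaultdict

# Column order assumed from the UMLS documentation for these two files.
MRCONSO_COLS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
                "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF"]
MRSTY_COLS = ["CUI", "TUI", "STN", "STY", "ATUI", "CVF"]

# ASSUMPTION: my own guess at which semantic types belong to which label.
STY_TO_LABEL = {
    "Disease or Syndrome": "PROBLEM",
    "Sign or Symptom": "PROBLEM",
    "Pharmacologic Substance": "TREATMENT",
    "Therapeutic or Preventive Procedure": "TREATMENT",
    "Laboratory Procedure": "TEST",
    "Diagnostic Procedure": "TEST",
}


def read_rrf(handle, columns):
    """Yield one dict per RRF row; each row ends with a trailing '|'."""
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        # zip() drops the empty field produced by the trailing delimiter.
        yield dict(zip(columns, row))


def build_rules(mrconso, mrsty, language="ENG"):
    """Return sorted (literal, label, cui) tuples for concepts with a mapped label."""
    cui_labels = defaultdict(set)
    for row in read_rrf(mrsty, MRSTY_COLS):
        label = STY_TO_LABEL.get(row["STY"])
        if label:
            cui_labels[row["CUI"]].add(label)

    rules = set()
    for row in read_rrf(mrconso, MRCONSO_COLS):
        # Keep only names in the requested language that are not suppressed.
        if row["LAT"] != language or row["SUPPRESS"] != "N":
            continue
        for label in cui_labels.get(row["CUI"], ()):
            rules.add((row["STR"].lower(), label, row["CUI"]))
    return sorted(rules)


# Made-up rows shaped like the real files (placeholder CUIs and atom IDs).
SAMPLE_MRCONSO = (
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||MSH|MH|D000001|Atrial Fibrillation|0|N||\n"
    "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002||||MSH|ET|D000001|Auricular Fibrillation|0|N||\n"
    "C0000001|SPA|P|L0000003|PF|S0000003|Y|A0000003||||MSHSPA|MH|D000001|Fibrilación Atrial|0|N||\n"
    "C0000001|ENG|S|L0000004|PF|S0000004|N|A0000004||||MSH|ET|D000001|AF obsolete name|0|O||\n"
    "C0000002|ENG|P|L0000005|PF|S0000005|Y|A0000005||||MSH|MH|D000002|Hospital|0|N||\n"
)
SAMPLE_MRSTY = (
    "C0000001|T047|B2.2.1.2.1|Disease or Syndrome|AT00000001||\n"
    "C0000002|T073|A1.3|Manufactured Object|AT00000002||\n"
)

rules = build_rules(io.StringIO(SAMPLE_MRCONSO), io.StringIO(SAMPLE_MRSTY))
for literal, label, cui in rules:
    print(literal, label, cui)
```

With the real files I would pass `open("MRCONSO.RRF", encoding="utf-8")` and `open("MRSTY.RRF", encoding="utf-8")` instead of the `StringIO` objects, and probably also filter on the `SAB` column, since the full MRCONSO file is very large. As far as I understand the medspaCy README, each `(literal, label)` pair could then become a `TargetRule(literal, label)` from `medspacy.ner` and be added to the `medspacy_target_matcher` pipe, but I haven't verified that this scales to a vocabulary of this size.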