Medical Concept Annotation Tool (v2)
There are a number of breaking changes in MedCAT v2 compared to v1. Details are outlined here.
MedCAT(v2) can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on arXiv.
Official Docs here
Discussion Forum here
News
- MedCAT v2 beta [1. April 2025]: MedCAT v2 beta 0.1.5 was released on 1. April 2025.
- Paper: A New Public Corpus for Clinical Section Identification: MedSecId
- New Release [5. October 2022]: Logging changes and various small updates. Full changelog
- New Downloader [15. March 2022]: You can now download the latest SNOMED-CT and UMLS model packs via UMLS user authentication.
- New Feature and Tutorial [7. December 2021]: Exploring Electronic Health Records with MedCAT and Neo4j
- New Minor Release [20. October 2021]: Introducing model packs, new faster multiprocessing for large datasets (100M+ documents) and improved MetaCAT.
- New Release [1. August 2021]: Upgraded MedCAT to use spaCy v3. New scispaCy models have to be downloaded; all old CDBs (compatible with MedCAT v1) will work without any changes.
- New Feature and Tutorial [8. July 2021]: Integrating 🤗 Transformers with MedCAT for biomedical NER+L
- General [1. April 2021]: MedCAT is upgraded to v1. Unfortunately, this introduces breaking changes with older models (MedCAT v0.4), as well as potential problems with all code that used the MedCAT package. MedCAT v0.4 is available on the legacy branch and will still be supported until 1. July 2021 (with respect to potential bug fixes); after that it will remain available but will no longer be updated.
Demo
A demo application is available here.
Tutorials
Some guides on how to use MedCAT v2 are available at MedCAT Tutorials.
Related Projects
- MedCAT - the original version of MedCAT that this v2 is based on.
- MedCATtrainer - an interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model (MedCAT) for biomedical domain text.
- MedCATservice - implements the MedCAT NLP application as a service behind a REST API.
Install using PIP (Requires Python 3.10+)
Installation instructions will follow once this version is released on PyPI, though installation will likely be simply pip install "medcat>=2.0" at that time.
Currently the installation for the 2.0 release is simply:
pip install medcat
Note that this base installation does not include the optional features (spacy, meta-cat, rel-cat, deid).
If you need them, specify them in brackets, e.g.:
pip install "medcat[spacy,meta-cat,rel-cat,deid]"
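Because v2 introduces breaking changes relative to v1, it can be useful to confirm which major version is installed before loading a model pack. The sketch below uses only the standard library's importlib.metadata and assumes the installed distribution is named medcat, as in the pip commands above; the helper names major_version and installed_medcat_major are hypothetical and not part of MedCAT.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_str: str) -> int:
    """Return the leading integer of a version string such as '2.0.1' or '2.0b3'."""
    head = version_str.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"unrecognised version string: {version_str!r}")
    return int(digits)


def installed_medcat_major() -> Optional[int]:
    """Major version of the installed 'medcat' distribution (assumed name), or None if absent."""
    try:
        return major_version(version("medcat"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_medcat_major()
    if major is None:
        print("MedCAT is not installed")
    elif major < 2:
        print("MedCAT v1 (or older) is installed; v2 has breaking changes")
    else:
        print(f"MedCAT v{major} is installed")
```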
- Quickstart (MedCAT v2+):
from medcat.cat import CAT

# Download the model_pack from the models section in the github repo.
cat = CAT.load_model_pack('<path to downloaded zip file>')

# Test it
text = "My simple document with kidney failure"
entities = cat.get_entities(text)
print(entities)

# To run unsupervised training over documents
data_iterator = [text]  # replace with your own iterable of document texts
cat.train(data_iterator)
# Once done, save the whole model_pack
cat.create_model_pack('<save path>')
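The quickstart passes data_iterator to cat.train. For a large corpus, a lazily read collection of document texts keeps memory use flat. The sketch below assumes, as in MedCAT v1, that training accepts any iterable of plain document strings, and that the corpus is stored as a CSV file with a text column; the class name CsvDocuments, the file name notes.csv and the column name text are all hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterator


class CsvDocuments:
    """Re-iterable collection of document texts read lazily from a CSV file.

    Assumption: MedCAT's training call accepts any iterable of plain strings.
    The default column name "text" is hypothetical.
    """

    def __init__(self, csv_path: Path, text_column: str = "text") -> None:
        self.csv_path = Path(csv_path)
        self.text_column = text_column

    def __iter__(self) -> Iterator[str]:
        # Re-open the file on every pass so the object can be iterated repeatedly.
        with self.csv_path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Missing or blank cells are skipped rather than yielded as "".
                text = (row.get(self.text_column) or "").strip()
                if text:
                    yield text


# Hypothetical usage with the quickstart's `cat` object:
# data_iterator = CsvDocuments(Path("notes.csv"))
# cat.train(data_iterator)
```

Re-opening the file in __iter__, rather than using a one-shot generator, means the same object can be iterated more than once, which matters if a training routine makes several passes over the data.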
Models
SNOMED-CT and UMLS
Access to v2 models is upcoming. Initially, they will probably be models converted from v1.
Acknowledgements
Entity extraction was trained on MedMentions, which contains ~35K entities from UMLS in total.
The vocabulary was compiled from Wiktionary, with ~800K unique words in total.
Powered By
A big thank you goes to spaCy and Hugging Face - who made life a million times easier.