medcat.components.ner.trf.deid
De-identification model.
This describes a wrapper around the regular CAT model, intended to simplify the use of a DeId-specific model.
It tackles two use cases:
1) Creation of a DeId model
2) Loading and use of a DeId model
For use case 1:
Instead of:
    cat = CAT(cdb=ner.cdb, addl_ner=ner)
You can use:
    deid = DeIdModel.create(ner)
For use case 2:
Instead of:
    cat = CAT.load_model_pack(model_pack_path)
    anon_text = deid_text(cat, text)
You can use:
    deid = DeIdModel.load_model_pack(model_pack_path)
    anon_text = deid.deid_text(text)
Or, if/when structured output is desired:
    deid = DeIdModel.load_model_pack(model_pack_path)
    anon_doc = deid(text)  # the spacy document
The wrapper also exposes some CAT parts directly:
- config
- cdb
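The wrapper idea described above (forward calls to an inner CAT, expose its config and cdb) can be sketched as plain delegation. This is a minimal illustration, not medcat code: `_StubCAT` and `DeIdWrapperSketch` are hypothetical names, and the stub's config/cdb are placeholder dicts rather than medcat's real `Config` and `CDB` objects.

```python
from typing import Any


class _StubCAT:
    """Hypothetical stand-in for a loaded CAT instance (not the medcat class)."""

    def __init__(self) -> None:
        self.config = {"general": {"language": "en"}}  # placeholder config
        self.cdb = {"134": "Phone Number"}  # placeholder CUI -> name map

    def __call__(self, text: str) -> dict[str, Any]:
        # A real CAT returns an annotated document; the stub just echoes the text.
        return {"text": text, "entities": []}


class DeIdWrapperSketch:
    """Illustrative wrapper that delegates to an inner CAT-like object."""

    def __init__(self, cat: _StubCAT) -> None:
        self.cat = cat

    @property
    def config(self) -> dict[str, Any]:
        # Expose the wrapped object's config directly.
        return self.cat.config

    @property
    def cdb(self) -> dict[str, str]:
        # Expose the wrapped object's concept database directly.
        return self.cat.cdb

    def __call__(self, text: str) -> dict[str, Any]:
        # Structured output: forward the call to the wrapped object.
        return self.cat(text)


deid = DeIdWrapperSketch(_StubCAT())
doc = deid("Call me on 123.456.7890")
```

Properties keep the wrapper read-through: anything changed on the inner object is immediately visible via the wrapper.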
Classes:
- DeIdModel – The DeID model.
Functions:
- match_rules – Match a set of rules (pattern/CUI pairs) as post-processing labels.
- merge_all_preds – Convenience method to merge predictions from the rule-based and DeID models.
- merge_preds – Merge predictions from the rule-based and DeID models.
Attributes:
- logger
DeIdModel
DeIdModel(cat: CAT)
Bases: NerModel
The DeID model.
This wraps a CAT instance and simplifies its use as a de-identification model.
It provides methods for creating one from a TransformersNER as well as loading from a model pack (along with some validation).
It also exposes some useful parts of the CAT it wraps such as the config and the concept database.
Methods:
- create
- deid_multi_text
- deid_multi_texts
- deid_text – Deidentify text and potentially redact information.
- deid_text_with_entities – Deidentify text and potentially redact information.
- load_model_pack – Load DeId model from model pack.
- train
Attributes:
- cat
Source code in medcat-v2/medcat/components/ner/trf/deid.py
cat
instance-attribute
cat = cat
create
classmethod
create(cdb: CDB, cnf: ConfigTransformersNER)
deid_multi_text
deid_multi_text(texts: Iterable[str], redact: bool = False, n_process: Optional[int] = None) -> list[str]
deid_multi_texts
deid_multi_texts(texts: Iterable[str], redact: bool = False, n_process: Optional[int] = None) -> list[str]
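Neither batch method is described beyond its signature. As a rough sketch, assuming each output string corresponds to the input text at the same position, batch de-identification amounts to an order-preserving map of a single-text function. `deid_many` and `fake_deid` are hypothetical; threads are used only to keep the sketch self-contained, whereas `n_process` in the real methods presumably refers to worker processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def deid_many(texts: Iterable[str], deid_one: Callable[[str], str],
              n_workers: Optional[int] = None) -> list[str]:
    """Apply a single-text de-identification function to many texts.

    Output order matches input order.
    """
    items = list(texts)  # materialise so generators also work
    if not n_workers or n_workers <= 1:
        return [deid_one(t) for t in items]
    # Executor.map yields results in input order, even when run concurrently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(deid_one, items))


def fake_deid(text: str) -> str:
    # Hypothetical stand-in for DeIdModel.deid_text.
    return text.replace("John", "[PATIENT]")
```

For example, `deid_many(["John", "Hi John"], fake_deid, n_workers=2)` returns the two de-identified strings in the same order as the inputs.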
deid_text
deid_text(text: str, redact: bool = False) -> str
Deidentify text and potentially redact information.
Returns the de-identified text.
If redaction is enabled, identifiable entities will be
replaced with stars (e.g. *****).
Otherwise, the replacement will be the CUI, in other words
the type of information that was hidden (e.g. [PATIENT]).
Parameters:
- text (str) – The text to deidentify.
- redact (bool, default: False) – Whether to redact the information.
Returns:
- str – The deidentified text.
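The two replacement modes described above can be illustrated with a small span-replacement sketch. It assumes each entity is a dict with character offsets `start`/`end` into the original text and a `cui` used as the label, and that redaction writes one star per hidden character; `replace_entities` is a hypothetical helper, not the method's actual implementation.

```python
def replace_entities(text: str, entities: list[dict], redact: bool = False) -> str:
    """Replace each entity span with stars (redact=True) or a [LABEL] tag.

    Assumes each entity has 'start', 'end' (offsets into the original text)
    and 'cui' (used as the label).
    """
    out = text
    # Work right-to-left so earlier offsets stay valid after each splice.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = ent["start"], ent["end"]
        if redact:
            replacement = "*" * (end - start)  # one star per hidden character
        else:
            replacement = f"[{ent['cui']}]"
        out = out[:start] + replacement + out[end:]
    return out


text = "Patient John Smith, phone 123.456.7890"
ents = [
    {"start": 8, "end": 18, "cui": "PATIENT"},
    {"start": 26, "end": 38, "cui": "PHONE"},
]
```

With these inputs, `replace_entities(text, ents)` gives `"Patient [PATIENT], phone [PHONE]"`. Replacing from the end of the string backwards avoids having to shift offsets when a replacement is shorter or longer than the span it hides.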
deid_text_with_entities
deid_text_with_entities(text: str, redact: bool = False) -> tuple[str, Entities]
Deidentify text and potentially redact information.
Returns the de-identified text together with the entities found.
If redaction is enabled, identifiable entities will be
replaced with stars (e.g. *****).
Otherwise, the replacement will be the CUI, in other words
the type of information that was hidden (e.g. [PATIENT]).
Parameters:
- text (str) – The text to deidentify.
- redact (bool, default: False) – Whether to redact the information.
Returns:
- tuple[str, Entities] – A tuple containing:
  - the deidentified text as a string;
  - the entities found and linked within the text.
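Returning the entities alongside the text makes it possible to audit what was hidden. A minimal sketch, assuming the `Entities` part has the shape implied by the `merge_all_preds` parameter description, i.e. a dict whose `'entities'` key maps ids to entity dicts carrying a `'cui'`; `audit_summary` and the example result are hypothetical.

```python
from collections import Counter


def audit_summary(result: tuple[str, dict]) -> dict[str, int]:
    """Count hidden entities per label from a (text, entities) result."""
    _anon_text, entities = result
    # Assumed shape: {'entities': {id: {'cui': ..., 'start': ..., 'end': ...}}}
    return dict(Counter(ent["cui"] for ent in entities["entities"].values()))


# Made-up result in the assumed shape, for illustration only.
example_result = (
    "Patient [PATIENT], phone [PHONE], alt phone [PHONE]",
    {"entities": {
        0: {"cui": "PATIENT", "start": 8, "end": 18},
        1: {"cui": "PHONE", "start": 26, "end": 38},
        2: {"cui": "PHONE", "start": 50, "end": 60},
    }},
)
summary = audit_summary(example_result)
```

Here `summary` counts one `PATIENT` and two `PHONE` entities.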
load_model_pack
classmethod
Load DeId model from model pack.
The method first loads the CAT instance.
It then makes sure that the model pack corresponds to a valid DeId model.
Parameters:
- config (Optional[dict], default: None) – Config for the DeId model pack (primarily for the stride of the overlap window).
- model_pack_path (str) – The model pack path.
Raises:
- ValueError – If the model pack does not correspond to a DeId model.
Returns:
- DeIdModel – The resulting DeId model.
train
match_rules
match_rules(rules: list[tuple[str, str]], texts: list[str], cui2preferred_name: dict[str, str]) -> list[list[Entity]]
Match a set of rules (pattern/CUI pairs) as post-processing labels.
Pretty names are looked up in cui2preferred_name, typically taken from a CAT DeID model.
Args:
    rules (list[tuple[str, str]]): List of (pattern, CUI) tuples.
    texts (list[str]): List of texts to match the rules on.
    cui2preferred_name (dict[str, str]): Dictionary mapping CUI to
        preferred name, likely cat.cdb.cui2preferred_name.
Examples:
    >>> cat = CAT.load_model_pack(model_pack_path)
    ...
    >>> rules = [
    ...     ('(123) 456-7890', '134'),
    ...     ('1234567890', '134'),
    ...     ('123.456.7890', '134'),
    ...     ('1234567890', '134'),
    ...     ('1234567890', '134'),
    ... ]
    >>> texts = [
    ...     'My phone number is (123) 456-7890',
    ...     'My phone number is 1234567890',
    ...     'My phone number is 123.456.7890',
    ...     'My phone number is 1234567890',
    ... ]
    >>> matches = match_rules(rules, texts, cat.cdb.cui2preferred_name)
Returns:
    list[list[Entity]]: List of lists of predictions from match_rules, one list per input text.
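A minimal sketch of rule matching, assuming each rule's pattern is a regular expression applied to every text and each match becomes an entity-like dict labelled with the rule's CUI. `match_rules_sketch` is hypothetical and the dict keys are illustrative. If patterns are treated as regexes, literal patterns such as `'(123) 456-7890'` would need escaping (for example with `re.escape`), because the parentheses otherwise form a group.

```python
import re


def match_rules_sketch(rules: list[tuple[str, str]], texts: list[str],
                       cui2preferred_name: dict[str, str]) -> list[list[dict]]:
    """Return one list of rule matches per input text."""
    per_text = []
    for text in texts:
        matches = []
        for pattern, cui in rules:
            # Patterns are treated as regular expressions (an assumption).
            for m in re.finditer(pattern, text):
                matches.append({
                    "cui": cui,
                    "start": m.start(),
                    "end": m.end(),
                    "source_value": m.group(),
                    "acc": 1.0,
                    # Fall back to the CUI itself if no preferred name is known.
                    "pretty_name": cui2preferred_name.get(cui, cui),
                })
        per_text.append(matches)
    return per_text


rules = [(r"\(\d{3}\) \d{3}-\d{4}", "134"), (r"\d{3}\.\d{3}\.\d{4}", "134")]
texts = ["My phone number is (123) 456-7890", "My phone number is 123.456.7890"]
out = match_rules_sketch(rules, texts, {"134": "Phone Number"})
```

Each text yields exactly one match here, starting at offset 19 and labelled `Phone Number`.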
merge_all_preds
merge_all_preds(model_preds_by_text: list[list[Entity]], rule_matches_per_text: list[list[Entity]], accept_preds: bool = True) -> list[list[Entity]]
Convenience method to merge predictions from the rule-based matches and the DeID model.
Parameters:
- model_preds_by_text (list[list[Entity]]) – List of predictions from
  cat.get_entities(), converted with [list(m['entities'].values()) for m in model_preds].
- rule_matches_per_text (list[list[Entity]]) – List of predictions from the output of running
  match_rules.
- accept_preds (bool, default: True) – Use the label predicted by the model (model_preds_by_text) over the rule matches where they overlap. Defaults to preferring model predictions over rules.
Returns:
    list[list[Entity]]: List of lists of merged predictions, one list per text.
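The per-text structure can be sketched as: convert the `get_entities()` outputs with the list comprehension quoted in the parameter description, then pair the i-th model predictions with the i-th rule matches and merge each pair. The per-text merge is passed in as a function here; `merge_all_sketch`, `to_preds_by_text` and the placeholder `naive_merge` are hypothetical, and the length check is a design choice of this sketch, not documented behaviour.

```python
from typing import Callable

Pred = dict  # hypothetical alias for an entity prediction dict


def to_preds_by_text(model_outputs: list[dict]) -> list[list[Pred]]:
    # The conversion quoted in the parameter description: one list per text.
    return [list(m["entities"].values()) for m in model_outputs]


def merge_all_sketch(
    model_preds_by_text: list[list[Pred]],
    rule_matches_per_text: list[list[Pred]],
    merge_one: Callable[[list[Pred], list[Pred], bool], list[Pred]],
    accept_preds: bool = True,
) -> list[list[Pred]]:
    # Pair the i-th model predictions with the i-th rule matches.
    if len(model_preds_by_text) != len(rule_matches_per_text):
        raise ValueError("Expected one list of predictions per text on both sides")
    return [
        merge_one(model, rules, accept_preds)
        for model, rules in zip(model_preds_by_text, rule_matches_per_text)
    ]


def naive_merge(model: list[Pred], rules: list[Pred], accept_preds: bool) -> list[Pred]:
    # Placeholder per-text merge: concatenate and sort by start offset.
    return sorted(model + rules, key=lambda e: e["start"])


model_outputs = [
    {"entities": {0: {"cui": "PATIENT", "start": 0, "end": 4}}},
    {"entities": {}},
]
rule_matches = [
    [{"cui": "134", "start": 10, "end": 22}],
    [{"cui": "134", "start": 5, "end": 17}],
]
merged = merge_all_sketch(to_preds_by_text(model_outputs), rule_matches, naive_merge)
```

Checking the lengths up front guards against silently dropping texts, since `zip` stops at the shorter input.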
merge_preds
merge_preds(model_preds: list[Entity], rule_matches: list[Entity], accept_preds: bool = True) -> list[Entity]
Merge predictions from the rule-based matches and the DeID model.
Args:
    model_preds (list[Entity]): Predictions from cat.get_entities().
    rule_matches (list[Entity]): Predictions from the output of running
        match_rules on the same text.
    accept_preds (bool): Use the label predicted by the model
        (model_preds) over the rule matches where they overlap.
        Defaults to preferring model predictions over rules.
Examples:
    >>> # a list of predictions for one text, from cat.get_entities()
    >>> model_preds = [
    ...     {'cui': '134', 'start': 10, 'end': 20, 'acc': 1.0,
    ...      'pretty_name': 'Phone Number'},
    ...     {'cui': '134', 'start': 25, 'end': 35, 'acc': 1.0,
    ...      'pretty_name': 'Phone Number'},
    ... ]
    >>> # a list of predictions for the same text, from match_rules
    >>> rule_matches = [
    ...     {'cui': '134', 'start': 10, 'end': 20, 'acc': 1.0,
    ...      'pretty_name': 'Phone Number'},
    ...     {'cui': '134', 'start': 25, 'end': 35, 'acc': 1.0,
    ...      'pretty_name': 'Phone Number'},
    ... ]
    >>> merged_preds = merge_preds(model_preds, rule_matches)
Returns:
    list[Entity]: List of merged predictions.
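One plausible reading of the overlap rule above is sketched here. Whichever side `accept_preds` favours is kept in full, and a prediction from the other side survives only if it overlaps none of the favoured spans. This is not necessarily the library's exact algorithm (for instance, it does not combine spans or relabel them), and `merge_preds_sketch` and `overlaps` are hypothetical helpers.

```python
def overlaps(a: dict, b: dict) -> bool:
    # Half-open character spans [start, end) overlap if each starts before the other ends.
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_preds_sketch(model_preds: list[dict], rule_matches: list[dict],
                       accept_preds: bool = True) -> list[dict]:
    # Decide which side wins on overlap.
    if accept_preds:
        preferred, other = model_preds, rule_matches
    else:
        preferred, other = rule_matches, model_preds
    merged = list(preferred)
    for cand in other:
        # Keep a lower-priority prediction only if it clashes with nothing preferred.
        if not any(overlaps(cand, kept) for kept in preferred):
            merged.append(cand)
    return sorted(merged, key=lambda e: (e["start"], e["end"]))


model = [{"cui": "PATIENT", "start": 0, "end": 10}]
rules = [{"cui": "134", "start": 5, "end": 15}, {"cui": "134", "start": 20, "end": 30}]
```

With the default `accept_preds=True`, the model's `PATIENT` span wins over the overlapping rule match at 5 to 15, while the non-overlapping rule match at 20 to 30 is kept.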