medcat.preprocessors.cleaners
Classes:
Functions:
- prepare_name – Generates different forms of a name. Will edit the provided names dictionary and add information generated from the name.
LCDBMaker
LPreprocessing
NameDescriptor (dataclass)
UnknownTokenVersion
UnknownTokenVersion(version: str)
Bases: ValueError
Source code in medcat-v2/medcat/preprocessors/cleaners.py
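Because UnknownTokenVersion derives from ValueError, callers that already handle ValueError will also catch it. The sketch below illustrates this with a local stand-in class so that it runs without medcat installed. The stand-in's name, its error message, the `version` attribute, and the `pick_token_form` helper with its "raw" and "clean" versions are all assumptions made for illustration, not the library's actual code.

```python
class UnknownTokenVersionSketch(ValueError):
    """Hypothetical stand-in mirroring UnknownTokenVersion(version: str)."""

    def __init__(self, version: str):
        # The message format is an assumption, not the library's wording.
        super().__init__(f"Unknown token version: {version!r}")
        self.version = version


def pick_token_form(version: str) -> str:
    """Hypothetical helper that rejects version names it does not know."""
    known = {"raw": "text", "clean": "lower"}
    if version not in known:
        raise UnknownTokenVersionSketch(version)
    return known[version]


caught = None
try:
    pick_token_form("stemmed")
except ValueError as err:
    # A plain ValueError handler is enough to catch the subclass.
    caught = err
```

Catching the specific subclass instead of ValueError keeps unrelated value errors from being swallowed by the same handler.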
prepare_name
prepare_name(raw_name: str, nlp: BaseTokenizer, names: dict[str, NameDescriptor], configs: tuple[LGeneral, LPreprocessing, LCDBMaker]) -> dict[str, NameDescriptor]
Generates different forms of a name. Will edit the provided names
dictionary and add information generated from the name.
Parameters:
- nlp (BaseTokenizer) – The tokenizer.
- names (dict[str, NameDescriptor]) – Dictionary of existing names for this concept in this row of a CSV. The newly generated name versions and other required information will be added here.
- configs (tuple[LGeneral, LPreprocessing, LCDBMaker]) – Applicable configs for medcat.
Returns:
- names (dict) – The updated dictionary of prepared names.
Source code in medcat-v2/medcat/preprocessors/cleaners.py
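The contract described above, where the function edits the dictionary it is given and returns that same dictionary, can be sketched without medcat. Everything in this example is an assumption for illustration: `NameInfo` is a simplified stand-in for NameDescriptor (the real fields may differ), whitespace splitting replaces the BaseTokenizer, and the `"~"` separator and lowercasing stand in for whatever the configs actually specify.

```python
from dataclasses import dataclass


@dataclass
class NameInfo:
    """Hypothetical stand-in for NameDescriptor; real fields may differ."""
    tokens: list[str]
    raw_name: str
    is_upper: bool


def prepare_name_sketch(raw_name: str, names: dict[str, NameInfo],
                        separator: str = "~") -> dict[str, NameInfo]:
    """Add a normalised form of raw_name to names; return the same dict."""
    # Whitespace splitting stands in for the real tokenizer.
    tokens = [tok.lower() for tok in raw_name.split()]
    if not tokens:
        # Nothing to add for an empty or whitespace-only name.
        return names
    key = separator.join(tokens)
    # Keep an existing entry rather than overwriting it.
    if key not in names:
        names[key] = NameInfo(tokens=tokens, raw_name=raw_name,
                              is_upper=raw_name.isupper())
    return names


names: dict[str, NameInfo] = {}
result = prepare_name_sketch("Kidney Failure", names)
```

Since the input dictionary is mutated in place, `result` and `names` refer to the same object. A caller processing several names for one concept can pass the same dictionary to each call and collect all generated forms in it.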