medcat.components.normalizing.normalizer
Classes:
- BasicSpellChecker
- TokenNormalizer – Normalizes all tokens in a spaCy document.
BasicSpellChecker
Methods:
- P – Probability of word.
- candidates – Generate possible spelling corrections for word.
- edits1 – All edits that are one edit away from word.
- edits2 – All edits that are two edits away from word.
- edits3 – All edits that are three edits away from word.
- fix – Most probable spelling correction for word.
- known – The subset of words that appear in the dictionary of WORDS.
- raw_edits1
- raw_edits2
Attributes:
- config
- data_vocab
- vocab
Source code in medcat-v2/medcat/components/normalizing/normalizer.py
config
instance-attribute
config = config
data_vocab
instance-attribute
data_vocab = data_vocab
vocab
instance-attribute
vocab = cdb_vocab
P
Probability of word.
Parameters:
- word (str) – The word in question.
Returns:
- float – The probability.
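For intuition, the sketch below shows the relative-frequency estimate used in Peter Norvig's well-known spelling corrector, whose method names (P, candidates, known, edits1, edits2) this class shares. It assumes a plain word-to-count dict, such as the cdb_vocab: dict[str, int] accepted by TokenNormalizer. Whether BasicSpellChecker.P uses exactly this formula is not stated here, so treat `word_probability` as a hypothetical illustration.

```python
def word_probability(word: str, counts: dict[str, int]) -> float:
    """Relative frequency of `word` in `counts` (0.0 if unseen).

    Assumption: `counts` maps each word to how often it occurs.
    This is the Norvig-style estimate, not necessarily MedCAT's scoring.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty vocabulary: every word is unseen
    return counts.get(word, 0) / total


counts = {"fever": 3, "fevers": 1}
print(word_probability("fever", counts))  # 0.75
print(word_probability("cough", counts))  # 0.0
```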
candidates
Generate possible spelling corrections for word.
Parameters:
- word (str) – The word.
Returns:
edits1
All edits that are one edit away from word.
Parameters:
- word (str) – The word.
Returns:
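A one-edit neighborhood is usually built from four operations: deleting a character, swapping two adjacent characters, replacing a character, and inserting a character. The sketch below follows Norvig's formulation and assumes a lowercase ASCII alphabet. It is not taken from the MedCAT source.

```python
import string


def edits1(word: str, letters: str = string.ascii_lowercase) -> set[str]:
    """Return the strings reachable from `word` by a single edit (Norvig-style).

    An edit is one deletion, one swap of adjacent characters, one
    replacement, or one insertion, drawing new characters from `letters`.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


print("fever" in edits1("fevr"))   # True: one insertion
print("fever" in edits1("fevre"))  # True: one transposition
```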
edits2
All edits that are two edits away from word.
Parameters:
- word (str) – The word to start from.
Returns:
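Two-edit neighbors can be produced by applying the one-edit step to every one-edit result. The sketch below assumes a Norvig-style `edits1` (redefined so the block runs on its own). Returning a lazy generator instead of a set is a choice made for this sketch, because the two-edit space is roughly the square of the one-edit space.

```python
import string
from collections.abc import Iterator


def edits1(word: str, letters: str = string.ascii_lowercase) -> set[str]:
    """Strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return set(
        [left + right[1:] for left, right in splits if right]
        + [left + right[1] + right[0] + right[2:]
           for left, right in splits if len(right) > 1]
        + [left + c + right[1:] for left, right in splits if right for c in letters]
        + [left + c + right for left, right in splits for c in letters]
    )


def edits2(word: str) -> Iterator[str]:
    """Lazily yield strings reachable with two edits (edits1 applied twice).

    Duplicates are not removed; a generator avoids materializing
    the whole (roughly quadratic) result at once.
    """
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


print("fever" in set(edits2("fvr")))  # True: two insertions
```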
edits3
edits3(word)
All edits that are three edits away from word.
fix
Most probable spelling correction for word.
Parameters:
- word (str) – The word.
Returns:
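In a Norvig-style corrector, fix, candidates and known compose. fix takes the best-scoring result of candidates, and candidates tries known on progressively larger edit neighborhoods. The self-contained sketch below illustrates this under several stated assumptions. vocab is a plain word-to-count dict. The hypothetical `deep` flag toggles the two-edit tier. "Most probable" is approximated by the highest count. It mirrors Norvig's approach and is not necessarily MedCAT's exact logic.

```python
import string
from collections.abc import Iterable


def edits1(word: str, letters: str = string.ascii_lowercase) -> set[str]:
    """Strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return set(
        [left + right[1:] for left, right in splits if right]
        + [left + right[1] + right[0] + right[2:]
           for left, right in splits if len(right) > 1]
        + [left + c + right[1:] for left, right in splits if right for c in letters]
        + [left + c + right for left, right in splits for c in letters]
    )


def known(words: Iterable[str], vocab: dict[str, int]) -> set[str]:
    """The subset of `words` present in `vocab`."""
    return {w for w in words if w in vocab}


def candidates(word: str, vocab: dict[str, int], deep: bool = False) -> set[str]:
    """Try the word itself, then known 1-edit words, then (if deep) 2-edit words."""
    if found := known([word], vocab):
        return found
    one_away = edits1(word)
    if found := known(one_away, vocab):
        return found
    if deep:  # hypothetical flag: the 2-edit search is much more expensive
        if found := known((e2 for e1 in one_away for e2 in edits1(e1)), vocab):
            return found
    return {word}  # nothing known nearby: leave the word unchanged


def fix(word: str, vocab: dict[str, int], deep: bool = False) -> str:
    """Pick the candidate with the highest count (assumed proxy for probability)."""
    return max(candidates(word, vocab, deep), key=lambda w: vocab.get(w, 0))


vocab = {"fever": 10, "fewer": 2, "liver": 5}
print(fix("feer", vocab))            # fever (beats "fewer" on count)
print(fix("fvr", vocab))             # fvr (no known word one edit away)
print(fix("fvr", vocab, deep=True))  # fever (two insertions)
```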
known
The subset of words that appear in the dictionary of WORDS.
Parameters:
Returns:
raw_edits1
classmethod
raw_edits1(word: str, use_diacritics: bool = False, return_ordered: Literal[False] = False) -> set[str]
raw_edits1(word: str, use_diacritics: bool = False, return_ordered: bool = False) -> Union[set[str], list[str]]
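The two raw_edits1 signatures above appear to be `typing.overload` variants. When return_ordered is left as the literal False, a type checker can infer set[str]. Otherwise the result may be a set or a list. The sketch below shows that typing pattern as a plain module-level function, not the classmethod itself. The diacritic alphabet (`DIACRITICS`) and the meaning of return_ordered (deduplicated, kept in generation order) are assumptions made for illustration.

```python
import string
from typing import Literal, Union, overload

# Hypothetical extra alphabet for use_diacritics=True; not MedCAT's own list.
DIACRITICS = "àáâãäåæçèéêëìíîïñòóôõöøùúûüý"


@overload
def raw_edits1(word: str, use_diacritics: bool = False,
               return_ordered: Literal[False] = False) -> set[str]: ...


@overload
def raw_edits1(word: str, use_diacritics: bool = False,
               return_ordered: bool = False) -> Union[set[str], list[str]]: ...


def raw_edits1(word: str, use_diacritics: bool = False,
               return_ordered: bool = False) -> Union[set[str], list[str]]:
    """One-edit neighbors of `word`, as a set or as an ordered list."""
    letters = string.ascii_lowercase + (DIACRITICS if use_diacritics else "")
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    edits = (
        [left + right[1:] for left, right in splits if right]
        + [left + right[1] + right[0] + right[2:]
           for left, right in splits if len(right) > 1]
        + [left + c + right[1:] for left, right in splits if right for c in letters]
        + [left + c + right for left, right in splits for c in letters]
    )
    if return_ordered:
        # Assumed meaning: deduplicate but keep generation order
        # (deletes, transposes, replaces, inserts).
        return list(dict.fromkeys(edits))
    return set(edits)


print(raw_edits1("ab", return_ordered=True)[:3])      # ['b', 'a', 'ba']
print("àb" in raw_edits1("ab", use_diacritics=True))  # True
```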
raw_edits2
classmethod
TokenNormalizer
TokenNormalizer(nlp: BaseTokenizer, config: Config, cdb_vocab: dict[str, int], data_vocab: Optional[Vocab] = None)
Bases: AbstractCoreComponent
Normalizes all tokens in a spaCy document.
Methods:
Attributes:
- config
- name
- nlp
- spell_checker
config
instance-attribute
config = config
name
class-attribute
instance-attribute
name = 'token_normalizer'
nlp
instance-attribute
nlp = nlp
create_new_component
classmethod
create_new_component(cnf: ComponentConfig, tokenizer: BaseTokenizer, cdb: CDB, vocab: Vocab, model_load_path: Optional[str]) -> TokenNormalizer
get_type
get_type() -> CoreComponentType