medcat.tokenizing.regex_impl.tokenizer
Classes:
- Document
- Entity
- RegexTokenizer
- Token
Document
Document(text: str, tokens: Optional[list[MutableToken]] = None)
Methods:
- get_addon_data
- get_available_addon_paths
- get_tokens
- has_addon_data
- isupper
- register_addon_path
- set_addon_data
Attributes:
- base (BaseDocument)
- linked_ents (list[MutableEntity])
- ner_ents (list[MutableEntity])
- text
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py
text
instance-attribute
text = text
get_addon_data
get_available_addon_paths
get_tokens
get_tokens(start_index: int, end_index: int) -> list[MutableToken]
has_addon_data
isupper
isupper() -> bool
register_addon_path
classmethod
set_addon_data
Entity
Entity(document: Document, text: str, start_index: int, end_index: int, start_char_index: int, end_char_index: int)
Methods:
- get_addon_data
- get_available_addon_paths
- has_addon_data
- register_addon_path
- set_addon_data
Attributes:
- ENTITY_INFO_PREFIX
- base (BaseEntity)
- confidence (float)
- context_similarity (float)
- cui
- detected_name
- end_char_index (int)
- end_index (int)
- id
- label (int)
- link_candidates (list[str])
- start_char_index (int)
- start_index (int)
- text (str)
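The constructor takes both token indices (start_index, end_index) and character offsets (start_char_index, end_char_index). The following sketch shows one way the two kinds of span can relate, using only the standard library. SketchEntity and sketch_entity are hypothetical names, not part of MedCAT, and this reference does not say whether end_index is inclusive or exclusive; the sketch treats it as inclusive purely as an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class SketchEntity:
    """Hypothetical stand-in mirroring the Entity constructor arguments."""
    text: str
    start_index: int       # index of the first token
    end_index: int         # index of the last token (inclusive, by assumption)
    start_char_index: int  # offset of the first character
    end_char_index: int    # offset just past the last character


def sketch_entity(text: str, token_spans: list[tuple[int, int]],
                  first: int, last: int) -> SketchEntity:
    """Build an entity covering tokens first..last.

    token_spans holds (start_char, end_char) per token, end exclusive.
    """
    start_char = token_spans[first][0]
    end_char = token_spans[last][1]
    return SketchEntity(text[start_char:end_char], first, last,
                        start_char, end_char)


text = "history of chronic kidney disease"
# Whitespace-separated token spans, used here only to keep the sketch short.
spans = [m.span() for m in re.finditer(r"\S+", text)]
entity = sketch_entity(text, spans, 2, 4)
print(entity)
```

The entity text is recovered by slicing the document text with the character offsets, so the token indices and character offsets must describe the same region.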
ENTITY_INFO_PREFIX
class-attribute
instance-attribute
ENTITY_INFO_PREFIX = 'Entity:'
cui
instance-attribute
cui = ''
detected_name
instance-attribute
detected_name = ''
id
instance-attribute
id = -1
get_addon_data
get_available_addon_paths
has_addon_data
register_addon_path
classmethod
set_addon_data
RegexTokenizer
Bases: BaseTokenizer
Methods:
- create_entity
- create_new_tokenizer
- entity_from_tokens
- get_doc_class
- get_entity_class
Attributes:
- REGEX
REGEX
class-attribute
instance-attribute
REGEX = compile('(([^a-zA-Z0-9\\s]+|\\b\\w+\\b|\\S+)\\s?)')
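To see what this pattern matches, the following minimal sketch applies the same expression with Python's standard re module, outside of MedCAT. The helper name split_with_regex is hypothetical and not part of the library. It only illustrates that group 1 keeps an optional trailing whitespace character while group 2 holds the bare token text.

```python
import re

# Same pattern as RegexTokenizer.REGEX, written as a raw string.
REGEX = re.compile(r'(([^a-zA-Z0-9\s]+|\b\w+\b|\S+)\s?)')


def split_with_regex(text: str) -> list[tuple[str, str]]:
    """Return (token_text, token_text_with_whitespace) pairs.

    Hypothetical helper for illustration; not part of MedCAT.
    """
    return [(m.group(2), m.group(1)) for m in REGEX.finditer(text)]


pairs = split_with_regex("Hello, world!")
for token, with_ws in pairs:
    print(repr(token), repr(with_ws))
```

The three alternatives are tried in order: a run of characters that are neither ASCII alphanumerics nor whitespace, then a whole word, then any remaining run of non-whitespace characters. This is why punctuation such as the comma is split from the word before it.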
create_entity
create_entity(doc: MutableDocument, token_start_index: int, token_end_index: int, label: str) -> MutableEntity
create_new_tokenizer
classmethod
create_new_tokenizer(config: Config) -> RegexTokenizer
entity_from_tokens
entity_from_tokens(tokens: list[MutableToken]) -> MutableEntity
get_doc_class
get_doc_class() -> Type[MutableDocument]
get_entity_class
get_entity_class() -> Type[MutableEntity]
Token
Token(document: Document, text: str, _text_with_ws: str, start_index: int, token_index: int, is_punct: bool, to_skip: bool)
Attributes:
- base (BaseToken)
- char_index (int)
- index (int)
- is_digit (bool)
- is_punctuation (bool)
- is_stop (bool)
- is_upper (bool)
- lemma (str)
- lower (str)
- norm (str)
- tag (Optional[str])
- text (str)
- text_with_ws (str)
- to_skip (bool)
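As a rough illustration of how several of these attributes could be derived from a regex match, the sketch below builds token-like records with the standard library only. SketchToken and sketch_tokenize are hypothetical names. The mapping from match groups to fields, and the use of str methods for lower, is_digit and is_upper, are assumptions and are not confirmed by this reference.

```python
import re
from dataclasses import dataclass

# Pattern copied from RegexTokenizer.REGEX.
REGEX = re.compile(r'(([^a-zA-Z0-9\s]+|\b\w+\b|\S+)\s?)')


@dataclass
class SketchToken:
    """Hypothetical stand-in for Token; field names follow the attribute list."""
    text: str
    text_with_ws: str
    char_index: int  # offset of the first character in the source text
    index: int       # position of the token in the document

    @property
    def lower(self) -> str:
        return self.text.lower()

    @property
    def is_digit(self) -> bool:
        return self.text.isdigit()

    @property
    def is_upper(self) -> bool:
        return self.text.isupper()


def sketch_tokenize(text: str) -> list[SketchToken]:
    # Assumption: one token per regex match, numbered in match order.
    return [
        SketchToken(m.group(2), m.group(1), m.start(), i)
        for i, m in enumerate(REGEX.finditer(text))
    ]


tokens = sketch_tokenize("BP 120 today")
for tok in tokens:
    print(tok.index, tok.char_index, repr(tok.text_with_ws), tok.is_digit)
```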