medcat.pipeline
Modules:
- pipeline
Classes:
- Pipeline – The pipeline for the NLP process.
Pipeline
Pipeline(cdb: CDB, vocab: Optional[Vocab], model_load_path: Optional[str], old_pipe: Optional[Pipeline] = None, addon_config_dict: Optional[dict[str, dict]] = None)
The pipeline for the NLP process.
This class is responsible for the initial creation of the NLP document, as well as for running all of the components and addons over it.
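As a rough illustration of that flow, the sketch below uses stand-in classes rather than MedCAT's own. A tokenizer creates the document, then the core components and the addons are applied in order. `ToyDocument`, `ToyPipeline`, `run`, `mark_long_tokens` and `count_tokens_addon` are hypothetical names invented for this example; the real `Pipeline` is constructed from a `CDB` and a `Vocab`, and its internals may well differ.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a MutableDocument."""
    text: str
    tokens: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


# In this sketch a component is any callable that takes and returns a document.
ToyComponent = Callable[[ToyDocument], ToyDocument]


class ToyPipeline:
    """Hypothetical stand-in that only mirrors the overall flow."""

    def __init__(self, components: list[ToyComponent],
                 addons: list[ToyComponent]) -> None:
        self._components = components
        self._addons = addons

    def get_doc(self, text: str) -> ToyDocument:
        # Step 1: initial creation of the document (whitespace tokenizer here).
        return ToyDocument(text=text, tokens=text.split())

    def run(self, text: str) -> ToyDocument:
        doc = self.get_doc(text)
        # Step 2: run the core components first, then the addons.
        for step in [*self._components, *self._addons]:
            doc = step(doc)
        return doc


def mark_long_tokens(doc: ToyDocument) -> ToyDocument:
    # Toy "core component": note every token longer than five characters.
    doc.notes.extend(t for t in doc.tokens if len(t) > 5)
    return doc


def count_tokens_addon(doc: ToyDocument) -> ToyDocument:
    # Toy "addon": record how many tokens the document has.
    doc.notes.append(f"n_tokens={len(doc.tokens)}")
    return doc


pipe = ToyPipeline([mark_long_tokens], [count_tokens_addon])
doc = pipe.run("patient has diabetes")
print(doc.notes)  # ['patient', 'diabetes', 'n_tokens=3']
```

The ordering (core components before addons) is an assumption of the sketch, not something this page states.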
Methods:
- add_addon
- entity_from_tokens – Get the entity from the list of tokens.
- get_component – Get the core component by the component type.
- get_doc – Get the document for this text.
- iter_addons
- iter_all_components
- save_components
Attributes:
- cdb
- config
- tokenizer (BaseTokenizer) – The raw tokenizer (with no components).
- tokenizer_with_tag (BaseTokenizer) – The tokenizer with the tagging component.
- vocab (Vocab)
Source code in medcat-v2/medcat/pipeline/pipeline.py
cdb
instance-attribute
cdb = cdb
config
instance-attribute
config = config
tokenizer_with_tag
property
tokenizer_with_tag: BaseTokenizer
The tokenizer with the tagging component.
add_addon
add_addon(addon: AddonComponent) -> None
Source code in medcat-v2/medcat/pipeline/pipeline.py
entity_from_tokens
entity_from_tokens(tokens: list[MutableToken]) -> MutableEntity
Get the entity from the list of tokens.
This effectively turns a list of (consecutive) tokens into an entity.
Parameters:
- tokens (list[MutableToken]) – The tokens to use.
Returns:
- MutableEntity – The resulting entity.
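To make the idea of turning consecutive tokens into an entity concrete, here is a minimal sketch with stand-in types. `ToyToken`, `ToyEntity`, `toy_tokenize` and `toy_entity_from_tokens` are hypothetical and not part of MedCAT. The consecutiveness check and the extra `text` argument are choices made for the sketch; how the real method treats non-consecutive tokens is not documented here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyToken:
    index: int  # position of the token within the document
    start: int  # character offset where the token starts
    end: int    # character offset just past the token
    text: str


@dataclass(frozen=True)
class ToyEntity:
    start_index: int  # index of the first token
    end_index: int    # index of the last token (inclusive)
    start_char: int
    end_char: int
    text: str


def toy_tokenize(text: str) -> list[ToyToken]:
    # Whitespace tokenizer that keeps character offsets.
    return [ToyToken(i, m.start(), m.end(), m.group())
            for i, m in enumerate(re.finditer(r"\S+", text))]


def toy_entity_from_tokens(tokens: list[ToyToken], text: str) -> ToyEntity:
    if not tokens:
        raise ValueError("Need at least one token to build an entity")
    indices = [t.index for t in tokens]
    # Sketch-only assumption: reject token lists with gaps or reordering.
    if indices != list(range(indices[0], indices[0] + len(tokens))):
        raise ValueError("Tokens must be consecutive")
    first, last = tokens[0], tokens[-1]
    return ToyEntity(first.index, last.index, first.start, last.end,
                     text[first.start:last.end])


text = "chronic kidney disease stage 3"
tokens = toy_tokenize(text)
entity = toy_entity_from_tokens(tokens[:3], text)
print(entity.text)  # chronic kidney disease
```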
Source code in medcat-v2/medcat/pipeline/pipeline.py
get_component
get_component(ctype: CoreComponentType) -> CoreComponent
Get the core component by the component type.
Parameters:
- ctype (CoreComponentType) – The core component type.
Raises:
- ValueError – If no component by that type is found.
Returns:
- CoreComponent – The corresponding core component.
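The lookup-or-raise contract described above can be sketched as follows. This is a stand-in and not MedCAT's implementation: `ToyComponentType` (including its member names), `ToyCoreComponent` and `toy_get_component` are hypothetical, and only the behaviour of raising `ValueError` when no component matches comes from this page.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ToyComponentType(Enum):
    # Hypothetical member names, chosen for illustration only.
    TAGGING = auto()
    NER = auto()
    LINKING = auto()


@dataclass
class ToyCoreComponent:
    ctype: ToyComponentType
    name: str


def toy_get_component(components: list[ToyCoreComponent],
                      ctype: ToyComponentType) -> ToyCoreComponent:
    # Return the first component registered under the requested type.
    for component in components:
        if component.ctype is ctype:
            return component
    raise ValueError(f"No component found of type {ctype.name}")


components = [
    ToyCoreComponent(ToyComponentType.TAGGING, "tagger"),
    ToyCoreComponent(ToyComponentType.NER, "ner"),
]
print(toy_get_component(components, ToyComponentType.NER).name)  # ner

try:
    toy_get_component(components, ToyComponentType.LINKING)
except ValueError as err:
    print(err)  # No component found of type LINKING
```

Callers that cannot be sure a component type is configured would typically wrap the call in `try`/`except ValueError`, as in the last lines of the sketch.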
Source code in medcat-v2/medcat/pipeline/pipeline.py
get_doc
get_doc(text: str) -> MutableDocument
Get the document for this text.
This essentially runs the tokenizer over the text.
Parameters:
- text (str) – The input text.
Returns:
- MutableDocument – The resulting document.
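Calling code that only needs tokenized documents can depend on the `get_doc(text: str)` signature alone. The sketch below shows one way to do that with a structural type, so it runs without MedCAT installed. `SupportsGetDoc`, `docs_for_texts` and `WhitespaceStubPipeline` are hypothetical helpers written for this example, and the stub returns a plain list of strings where the real method returns a `MutableDocument`.

```python
from typing import Any, Protocol


class SupportsGetDoc(Protocol):
    """Anything exposing get_doc(text) as documented above."""

    def get_doc(self, text: str) -> Any: ...


def docs_for_texts(pipe: SupportsGetDoc, texts: list[str]) -> list[Any]:
    # Tokenize each text independently; no components are run explicitly here.
    return [pipe.get_doc(text) for text in texts]


class WhitespaceStubPipeline:
    """Hypothetical stub used in place of a real Pipeline instance."""

    def get_doc(self, text: str) -> list[str]:
        return text.split()


stub_docs = docs_for_texts(WhitespaceStubPipeline(), ["renal failure", "fever"])
print(stub_docs)  # [['renal', 'failure'], ['fever']]
```

With MedCAT available, a `Pipeline` instance could presumably be passed as `pipe` instead of the stub, since it exposes the same method name and argument.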
Source code in medcat-v2/medcat/pipeline/pipeline.py
iter_addons
iter_addons() -> Iterable[AddonComponent]
Source code in medcat-v2/medcat/pipeline/pipeline.py
iter_all_components
iter_all_components() -> Iterable[BaseComponent]
Source code in medcat-v2/medcat/pipeline/pipeline.py
save_components
save_components(serialiser_type: Union[AvailableSerialisers, str], components_folder: str) -> None
Source code in medcat-v2/medcat/pipeline/pipeline.py