medcat.cdb.cdb
Classes:
- CDB
Attributes:
- logger
CDB
CDB(config: Config)
Bases: AbstractSerialisable
Methods:
- add_names – Adds a name to an existing concept.
- add_types – Add type info to CDB.
- filter_by_cui – Subset the core CDB fields (dictionaries/maps).
- get_basic_info
- get_cui2count_train
- get_hash
- get_init_attrs
- get_name – Returns the preferred name if it exists, otherwise the longest name assigned to the concept.
- get_name2count_train
- has_subname – Whether the CDB has the specified subname.
- load – Load the CDB off disk.
- remove_cui – Removes a CUI from the CDB.
- remove_cuis_bulk
- reset_training – Removes all training efforts, in other words all embeddings learnt for concepts in the current CDB.
- save – Save CDB at path.
- weighted_average_function – Get the weighted average for a step.
Attributes:
- addl_info (dict[str, Any])
- config
- cui2info (dict[str, CUIInfo])
- has_changed_names
- is_dirty
- name2info (dict[str, NameInfo])
- token_counts (dict[str, int])
- type_id2info (dict[str, TypeInfo])
Source code in medcat-v2/medcat/cdb/cdb.py
config
instance-attribute
config = config
has_changed_names
instance-attribute
has_changed_names = False
is_dirty
instance-attribute
is_dirty = False
add_names
add_names(cui: str, names: dict[str, NameDescriptor], name_status: str = AUTOMATIC, full_build: bool = False) -> None
Adds a name to an existing concept.
Parameters:
- cui (str) – Concept ID or unique identifier in this database. All concepts that have the same CUI will be merged internally.
- names (dict[str, NameDescriptor]) – Names for this concept, i.e. the values that, if found in free text, can be linked to this concept. Names is a dict like: {name: {'tokens': tokens, 'snames': snames, 'raw_name': raw_name}, ...}. Names should be generated by the helper function 'medcat.preprocessing.cleaners.prepare_name'.
- name_status (str, default: AUTOMATIC) – One of P, N, A. Defaults to 'A'.
- full_build (bool, default: False) – If True, the dictionary self.addl_info will also be populated. It contains a lot of extra information about concepts, but can be very memory consuming. This is not necessary for normal functioning of MedCAT. Defaults to False.
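To make the layout of the names argument concrete, the sketch below builds one such dict by hand. The helper toy_prepare_name is hypothetical and is not medcat's prepare_name: the lowercasing, whitespace tokenisation, the '~' separator, and the use of token prefixes as snames are all assumptions made purely for illustration, and the real values are NameDescriptor objects rather than plain dicts.

```python
def toy_prepare_name(raw_name: str, separator: str = "~") -> dict[str, dict]:
    """Hypothetical stand-in for a name-preparation helper (NOT medcat's)."""
    # Assumption: tokens are the lowercased, whitespace-separated words.
    tokens = raw_name.lower().split()
    # Assumption: subnames are the progressively longer token prefixes.
    snames = {separator.join(tokens[:i]) for i in range(1, len(tokens) + 1)}
    # The dict key is the cleaned name; the value mirrors the documented layout.
    name = separator.join(tokens)
    return {name: {"tokens": tokens, "snames": snames, "raw_name": raw_name}}


names = toy_prepare_name("Chronic Kidney Disease")
print(names)
```

With medcat installed, a dict of this general shape, produced by the library's own helper, is what would be passed as the names argument of add_names.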
Source code in medcat-v2/medcat/cdb/cdb.py
add_types
Add type info to CDB.
Parameters:
Source code in medcat-v2/medcat/cdb/cdb.py
filter_by_cui
filter_by_cui(cuis_to_keep: Collection[str]) -> None
Subset the core CDB fields (dictionaries/maps).
Note that this will potentially keep a few more CUIs than those in cuis_to_keep. It will first find all names that link to the cuis_to_keep and then find all CUIs that link to those names and keep all of them.
This also will not remove any data from cdb.addl_info, as this field can contain data of unknown structure.
Parameters:
- cuis_to_keep (Collection[str]) – CUIs that will be kept; the rest will be removed (though not completely, see the note above).
Raises:
- Exception – If there are no snames and subsetting is not possible.
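The reason extra CUIs can survive the filter is easier to see on toy data. The following is a minimal sketch of the two-step expansion described above, not the library's implementation: the plain cui2names and name2cuis dicts and the function expand_cuis_to_keep are hypothetical stand-ins for the CDB's internal maps.

```python
from collections.abc import Collection


def expand_cuis_to_keep(
    cuis_to_keep: Collection[str],
    cui2names: dict[str, set[str]],
    name2cuis: dict[str, set[str]],
) -> set[str]:
    """Toy model of the two-step expansion (hypothetical helper)."""
    # Step 1: collect every name linked to a requested CUI.
    names_to_keep: set[str] = set()
    for cui in cuis_to_keep:
        names_to_keep |= cui2names.get(cui, set())
    # Step 2: collect every CUI linked to any of those names.
    kept: set[str] = set()
    for name in names_to_keep:
        kept |= name2cuis.get(name, set())
    return kept


cui2names = {"C1": {"cold"}, "C2": {"cold", "chill"}, "C3": {"fever"}}
name2cuis = {"cold": {"C1", "C2"}, "chill": {"C2"}, "fever": {"C3"}}
print(sorted(expand_cuis_to_keep(["C1"], cui2names, name2cuis)))  # ['C1', 'C2']
```

Here only C1 was requested, but C2 is kept too because it shares the name "cold" with C1.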
Source code in medcat-v2/medcat/cdb/cdb.py
get_basic_info
get_basic_info() -> CDBInfo
Source code in medcat-v2/medcat/cdb/cdb.py
get_cui2count_train
Source code in medcat-v2/medcat/cdb/cdb.py
get_hash
get_hash() -> str
Source code in medcat-v2/medcat/cdb/cdb.py
get_init_attrs
classmethod
Source code in medcat-v2/medcat/cdb/cdb.py
get_name
Returns preferred name if it exists, otherwise it will return the longest name assigned to the concept.
Parameters:
- cui (str) – Concept ID or unique identifier in this database.
Returns:
- str – The name of the concept.
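The selection rule (preferred name first, longest name as the fallback) can be sketched in a few lines. This is an illustration only: pick_display_name and its arguments are hypothetical, and how the real method handles ties or a concept without any names is not covered here.

```python
from typing import Optional


def pick_display_name(preferred_name: Optional[str], names: set[str]) -> str:
    """Toy version of the 'preferred, else longest' rule (hypothetical helper)."""
    # Use the preferred name whenever one is set.
    if preferred_name:
        return preferred_name
    # Otherwise fall back to the longest name; sorting first makes ties deterministic.
    return max(sorted(names), key=len)


print(pick_display_name(None, {"ckd", "chronic kidney disease"}))
print(pick_display_name("CKD", {"ckd", "chronic kidney disease"}))
```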
Source code in medcat-v2/medcat/cdb/cdb.py
get_name2count_train
Source code in medcat-v2/medcat/cdb/cdb.py
has_subname
Whether the CDB has the specified subname.
Parameters:
- name (str) – The subname to check.
Returns:
- bool – Whether the subname is present in this CDB.
Source code in medcat-v2/medcat/cdb/cdb.py
load
classmethod
load(path: str, perform_fixes: bool = True) -> CDB
Load the CDB off disk.
This can load a legacy (v1) CDB (.dat) or a v2 CDB either in its folder format or the .zip format. The distinction is made automatically.
Parameters:
- path (str) – The path to the CDB.
- perform_fixes (bool, default: True) – Whether to perform fixes, such as the original names issue. Defaults to True.
Raises:
- LegacyConversionDisabledError – If a legacy model is found and conversion is not allowed.
- ValueError – If the loaded object isn't a CDB.
Returns:
- CDB – The loaded CDB.
Source code in medcat-v2/medcat/cdb/cdb.py
remove_cui
Removes a CUI from the CDB.
It also removes the CUI from the name-specific per_cui_status maps and removes all names that no longer correspond to any CUI once this one is gone.
Parameters:
- cui (str) – The CUI to remove.
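The clean-up of orphaned names can be illustrated on plain dictionaries. The sketch below is a simplified model, not the library's code: toy_remove_cui, cui2names, and name2cuis are hypothetical, and the per_cui_status bookkeeping is left out.

```python
def toy_remove_cui(
    cui: str,
    cui2names: dict[str, set[str]],
    name2cuis: dict[str, set[str]],
) -> None:
    """Remove a CUI and any names left without a CUI (hypothetical helper)."""
    # Drop the concept itself and remember which names pointed at it.
    names = cui2names.pop(cui, set())
    for name in names:
        cuis = name2cuis.get(name)
        if cuis is None:
            continue
        # Unlink the removed CUI from this name.
        cuis.discard(cui)
        # A name with no remaining CUIs is removed as well.
        if not cuis:
            del name2cuis[name]


cui2names = {"C1": {"cold"}, "C2": {"cold", "chill"}}
name2cuis = {"cold": {"C1", "C2"}, "chill": {"C2"}}
toy_remove_cui("C2", cui2names, name2cuis)
print(cui2names, name2cuis)
```

After the call, "cold" survives because C1 still uses it, while "chill" is dropped because C2 was its only concept.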
Source code in medcat-v2/medcat/cdb/cdb.py
remove_cuis_bulk
Source code in medcat-v2/medcat/cdb/cdb.py
reset_training
reset_training() -> None
Removes all training efforts, in other words all embeddings that are learnt for concepts in the current CDB. Note that this does not remove synonyms (names) that were potentially added during supervised/online learning.
Source code in medcat-v2/medcat/cdb/cdb.py
save
save(save_path: str, serialiser: Union[str, AvailableSerialisers] = dill, overwrite: bool = False, as_zip: Union[bool, Literal['auto']] = 'auto') -> None
Save CDB at path.
Parameters:
- save_path (str) – The path to save at.
- serialiser (Union[str, AvailableSerialisers], default: dill) – The serialiser. Defaults to AvailableSerialisers.dill.
- overwrite (bool, default: False) – Whether to allow overwriting existing files. Defaults to False.
- as_zip (Union[bool, Literal['auto']], default: 'auto') – Whether to serialise the CDB as a zip.
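A save followed by a load might be combined as in the sketch below. It relies only on the save and load signatures documented on this page; the helper save_and_reload is hypothetical, and it assumes that the path passed to save can be passed unchanged to load, which may depend on the as_zip setting.

```python
from typing import Any


def save_and_reload(cdb: Any, path: str) -> Any:
    """Save a CDB-like object and load it back (hypothetical helper)."""
    # overwrite=True allows replacing an earlier save at the same path.
    cdb.save(path, overwrite=True)
    # load is a classmethod, so it is called on the object's class.
    return type(cdb).load(path)


# Usage with medcat installed (not executed here):
# reloaded = save_and_reload(cdb, "path/to/cdb")
```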
Source code in medcat-v2/medcat/cdb/cdb.py
weighted_average_function
Get the weighted average for a step.
Parameters:
- step (int) – The step.
Returns:
- float – The weighted average.
Source code in medcat-v2/medcat/cdb/cdb.py