medcat.config.config_transformers_ner

Classes:

ConfigTransformersNER

Bases: SerialisableBaseModel

The transformer NER config

Methods:

Attributes:

general class-attribute instance-attribute

general: General = General()

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='allow', validate_assignment=True)

get_hash

get_hash(hasher: Optional[Hasher] = None) -> str
Source code in medcat-v2/medcat/config/config_transformers_ner.py
def get_hash(self, hasher: Optional[Hasher] = None) -> str:
    if hasher is None:
        hasher = Hasher()
    for v in self.model_dump().values():
        hasher.update(v)
    return hasher.hexdigest()
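
The loop above feeds every top-level value of the dumped config into a hasher and returns the digest. A minimal standalone sketch of that pattern follows, assuming a hypothetical `SketchHasher` built on `hashlib` in place of MedCAT's own `Hasher` (whose internals are not shown here) and a plain dict in place of `model_dump()`:

```python
import hashlib
from typing import Any, Dict, Optional


class SketchHasher:
    """Hypothetical stand-in for MedCAT's Hasher; not the real class."""

    def __init__(self) -> None:
        self._m = hashlib.sha256()

    def update(self, obj: Any) -> None:
        # repr() is a simplification that works for plain ints, strings and dicts.
        self._m.update(repr(obj).encode("utf-8"))

    def hexdigest(self) -> str:
        return self._m.hexdigest()


def config_hash(dumped: Dict[str, Any], hasher: Optional[SketchHasher] = None) -> str:
    """Mirror the get_hash loop: hash each top-level value of the dumped config."""
    if hasher is None:
        hasher = SketchHasher()
    for v in dumped.values():
        hasher.update(v)
    return hasher.hexdigest()


base = {"general": {"name": "deid", "seed": 13}}
changed = {"general": {"name": "deid", "seed": 14}}

# Equal configs give equal hashes; changing any value changes the hash.
same_hash = config_hash(base) == config_hash({"general": {"name": "deid", "seed": 13}})
differs = config_hash(base) != config_hash(changed)
```

Passing an existing hasher lets a caller fold the config into a larger running hash. Note that only the values are iterated, so top-level field names themselves do not contribute to the digest.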

General

Bases: SerialisableBaseModel

The general part of the Transformers NER config

Attributes:

chunking_overlap_window class-attribute instance-attribute

chunking_overlap_window: Optional[int] = 5

Size of the overlap window used for chunking
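
To illustrate what an overlap window means when chunking, here is a minimal sketch. The function `chunk_with_overlap`, its `chunk_size` parameter, and the choice of tokens as the unit are assumptions made for illustration; the library's actual chunking routine may differ.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into chunks where neighbouring chunks share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [f"t{i}" for i in range(20)]
# Overlap of 5 matches the default value of chunking_overlap_window.
chunks = chunk_with_overlap(tokens, chunk_size=10, overlap=5)
```

With an overlap, an entity that straddles a chunk boundary still appears whole in at least one chunk, as long as it is no longer than the overlap.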

description class-attribute instance-attribute

description: str = 'No description'

Should provide a basic description of this Transformers NER model

last_train_on class-attribute instance-attribute

last_train_on: Optional[float] = None

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='allow', validate_assignment=True, protected_namespaces=())

model_name class-attribute instance-attribute

model_name: str = 'roberta-base'

Name of the underlying transformer model; can also be a path

name class-attribute instance-attribute

name: str = 'deid'

ner_aggregation_strategy class-attribute instance-attribute

ner_aggregation_strategy: str = 'simple'

Aggregation strategy for the Hugging Face NER pipeline

pipe_batch_size_in_chars class-attribute instance-attribute

pipe_batch_size_in_chars: int = 20000000

How many characters are piped into the model at once
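
A character-based batch size can be pictured as grouping documents until a character budget is reached. The sketch below is only an illustration under that assumption; `batch_by_chars` and `max_chars` are hypothetical names, not part of the library.

```python
from typing import Iterable, Iterator, List


def batch_by_chars(texts: Iterable[str], max_chars: int) -> Iterator[List[str]]:
    """Yield lists of texts whose combined length stays within max_chars.

    A single text longer than max_chars is yielded alone rather than split.
    """
    batch: List[str] = []
    size = 0
    for text in texts:
        # Close the current batch if adding this text would exceed the budget.
        if batch and size + len(text) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        yield batch


docs = ["a" * 6, "b" * 5, "c" * 3, "d" * 12]
batches = list(batch_by_chars(docs, max_chars=10))
```

Here the four documents end up in three batches: the first alone, the second and third together, and the oversized fourth alone.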

seed class-attribute instance-attribute

seed: int = 13

test_size class-attribute instance-attribute

test_size: float = 0.2
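
As a rough illustration of how a seed and a test fraction typically interact, the sketch below performs a reproducible shuffle and split using only the standard library. The helper `split_train_test` is hypothetical, and the library may use a different splitting routine.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], test_size: float = 0.2,
                     seed: int = 13) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then hold out a test_size fraction."""
    shuffled = list(items)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(list(range(10)))
train_again, test_again = split_train_test(list(range(10)))
```

Because the seed is fixed, repeated calls on the same data produce the same split, which keeps evaluation numbers comparable between runs.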

verbose_metrics class-attribute instance-attribute

verbose_metrics: bool = False