
medcat.config

Modules:

Classes:

AnnotationOutput

Bases: SerialisableBaseModel

The annotation output part of the config

Attributes:

context_left class-attribute instance-attribute

context_left: int = -1

context_right class-attribute instance-attribute

context_right: int = -1

include_text_in_output class-attribute instance-attribute

include_text_in_output: bool = False

lowercase_context class-attribute instance-attribute

lowercase_context: bool = True
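Together these fields control how much surrounding text is attached to each annotation in the output. A minimal sketch of the windowing semantics (illustrative only: the function name is made up, and the counts are simplified to characters here), assuming -1 disables that side of the context:

```python
def extract_context(text: str, start: int, end: int,
                    context_left: int, context_right: int,
                    lowercase_context: bool = True) -> tuple:
    """Illustrative sketch: take up to `context_left`/`context_right`
    characters around a detected span; -1 disables that side."""
    left = text[max(0, start - context_left):start] if context_left >= 0 else ""
    right = text[end:end + context_right] if context_right >= 0 else ""
    if lowercase_context:
        left, right = left.lower(), right.lower()
    return left, right

# "HEADACHE" occupies characters 7..14 of this text:
left, right = extract_context("Severe HEADACHE reported", 7, 15, 7, 9)
```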

CDBMaker

Bases: SerialisableBaseModel

The Concept Database (CDB) making part of the config

Attributes:

min_letters_required class-attribute instance-attribute

min_letters_required: int = 2

Minimum number of letters required in a name to be accepted for a concept

multi_separator class-attribute instance-attribute

multi_separator: str = '|'

If multiple names or type_ids for a concept are present in one row of a CSV, they are separated by the specified character.
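For example, a CSV cell holding several names can be split on the separator (the row content below is made up):

```python
multi_separator = '|'  # the configured separator

# One CSV cell may carry several names (or type_ids) for a concept:
row_names = "headache|cephalalgia|head pain"
names = row_names.split(multi_separator)
```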

name_versions class-attribute instance-attribute

name_versions: list = ['LOWER', 'CLEAN']

Name versions to be generated.

remove_parenthesis class-attribute instance-attribute

remove_parenthesis: int = 5

Should preferred names containing a parenthetical be cleaned? 0 means no; any other value means clean if the name's length is greater than or equal to it, e.g. Head (Body part) -> Head
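A sketch of what such cleaning could look like (the regular expression and the exact length rule are assumptions, not the MedCAT implementation):

```python
import re

def clean_parenthesis(name: str, remove_parenthesis: int = 5) -> str:
    """Illustrative sketch: strip a trailing '(...)' qualifier from a
    preferred name, but only when the name is long enough (0 disables)."""
    if remove_parenthesis == 0 or len(name) < remove_parenthesis:
        return name
    return re.sub(r'\s*\([^)]*\)\s*$', '', name)

cleaned = clean_parenthesis("Head (Body part)")
unchanged = clean_parenthesis("Head (Body part)", remove_parenthesis=0)
```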

Components

Bases: SerialisableBaseModel

Attributes:

addons class-attribute instance-attribute

addons: list[ComponentConfig] = []

comp_order class-attribute instance-attribute

comp_order: list[str] = ['tagging', 'token_normalizing', 'ner', 'linking']

linking class-attribute instance-attribute

linking: Linking = Linking()

ner class-attribute instance-attribute

ner: Ner = Ner()

tagging class-attribute instance-attribute

tagging: ComponentConfig = ComponentConfig()

token_normalizing class-attribute instance-attribute

token_normalizing: ComponentConfig = ComponentConfig()

Config

Bases: SerialisableBaseModel

Attributes:

annotation_output class-attribute instance-attribute

annotation_output: AnnotationOutput = AnnotationOutput()

cdb_maker class-attribute instance-attribute

cdb_maker: CDBMaker = CDBMaker()

components class-attribute instance-attribute

components: Components = Components()

general class-attribute instance-attribute

general: General = General()

meta class-attribute instance-attribute

meta: ModelMeta = Field(default_factory=ModelMeta)

preprocessing class-attribute instance-attribute

preprocessing: Preprocessing = Preprocessing()

Linking

Bases: ComponentConfig

The linking part of the config

Attributes:

additional class-attribute instance-attribute

additional: Optional[Any] = None

Some additional config for non-default linkers. E.g. the 2-step linker uses this for alpha calculations and the learning rate for type contexts.

always_calculate_similarity class-attribute instance-attribute

always_calculate_similarity: bool = False

Whether to calculate context similarity even for concepts that are not ambiguous.

calculate_dynamic_threshold class-attribute instance-attribute

calculate_dynamic_threshold: bool = False

Concepts below this similarity will be ignored. The type can be static or dynamic: if dynamic, each CUI has a different threshold, calculated as the average confidence for that CUI * similarity_threshold. Take care that dynamic only works if the CDB was trained with calculate_dynamic_threshold = True.
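The dynamic variant reduces to a simple per-CUI calculation (a sketch of the arithmetic described above; `avg_cui_confidence` is a hypothetical input):

```python
similarity_threshold = 0.25  # the default value above

def dynamic_threshold(avg_cui_confidence: float) -> float:
    # With similarity_threshold_type = 'dynamic', each CUI gets its own
    # cutoff: the average training confidence for that CUI multiplied
    # by similarity_threshold.
    return avg_cui_confidence * similarity_threshold

th = dynamic_threshold(0.8)  # a hypothetical average confidence
```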

context_ignore_center_tokens class-attribute instance-attribute

context_ignore_center_tokens: bool = False

If true, when the context of a concept is calculated (embedding), the words making up that concept are not taken into account

context_vector_sizes class-attribute instance-attribute

context_vector_sizes: dict = {'xlong': 27, 'long': 18, 'medium': 9, 'short': 3}

Context vector sizes that will be calculated and used for linking

context_vector_weights class-attribute instance-attribute

context_vector_weights: dict = {'xlong': 0.1, 'long': 0.4, 'medium': 0.4, 'short': 0.1}

Weight of each vector in the similarity score - make trainable at some point. Should add up to 1.
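A sketch of the weighted combination described above (the per-size similarities are made-up numbers; the real implementation may combine the vectors differently):

```python
context_vector_sizes = {'xlong': 27, 'long': 18, 'medium': 9, 'short': 3}
context_vector_weights = {'xlong': 0.1, 'long': 0.4, 'medium': 0.4, 'short': 0.1}

# The weights should add up to 1 so the combined score stays in the
# same range as the individual similarities.
assert abs(sum(context_vector_weights.values()) - 1.0) < 1e-9

# Hypothetical per-context-size similarities for one candidate concept:
sims = {'xlong': 0.2, 'long': 0.5, 'medium': 0.6, 'short': 0.9}
combined = sum(context_vector_weights[size] * sims[size] for size in sims)
```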

devalue_linked_concepts class-attribute instance-attribute

devalue_linked_concepts: bool = False

When adding a positive example, should it also be treated as negative for concepts that link to the positive one via names (ambiguous names).

disamb_length_limit class-attribute instance-attribute

disamb_length_limit: int = 3

All concepts with names below this length will always be disambiguated

filter_before_disamb class-attribute instance-attribute

filter_before_disamb: bool = False

If True, filtering is applied before disambiguation. Useful for the trainer.

filters class-attribute instance-attribute

filters: LinkingFilters = LinkingFilters()

Filters

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='allow')

negative_ignore_punct_and_num class-attribute instance-attribute

negative_ignore_punct_and_num: bool = True

Whether to ignore punctuation/numbers when negative sampling

negative_probability class-attribute instance-attribute

negative_probability: float = 0.5

Probability for the negative context to be added for each positive addition

optim class-attribute instance-attribute

optim: dict = {'type': 'linear', 'base_lr': 1, 'min_lr': 5e-05}

Linear anneal

prefer_frequent_concepts class-attribute instance-attribute

prefer_frequent_concepts: float = 0.35

If >0, concepts that are more frequent will be preferred by a multiple of this amount

prefer_primary_name class-attribute instance-attribute

prefer_primary_name: float = 0.35

If >0, concepts for which the detected name is their primary name will be preferred by that amount (0 to 1)

random_replacement_unsupervised class-attribute instance-attribute

random_replacement_unsupervised: float = 0.8

If <1, during unsupervised training the detected term will be randomly replaced with a probability of 1 - random_replacement_unsupervised. It is replaced with a synonym used for that term.
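A sketch of that replacement rule (the helper below is illustrative, not MedCAT's implementation):

```python
import random
from typing import List, Optional

def maybe_replace(term: str, synonyms: List[str],
                  random_replacement_unsupervised: float = 0.8,
                  rng: Optional[random.Random] = None) -> str:
    # With probability 1 - random_replacement_unsupervised (0.2 here),
    # swap the detected term for one of its synonyms.
    rng = rng or random.Random()
    if synonyms and rng.random() >= random_replacement_unsupervised:
        return rng.choice(synonyms)
    return term

# Deterministic RNG so the sketch is reproducible:
rng = random.Random(0)
results = [maybe_replace("MI", ["myocardial infarction"], rng=rng)
           for _ in range(1000)]
replaced_frac = results.count("myocardial infarction") / 1000
```

Over many draws the replacement fraction approaches 1 - 0.8 = 0.2.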

similarity_threshold class-attribute instance-attribute

similarity_threshold: float = 0.25

similarity_threshold_type class-attribute instance-attribute

similarity_threshold_type: str = 'static'

subsample_after class-attribute instance-attribute

subsample_after: int = 30000

DISABLED in code permanently: subsample during unsupervised training if a concept has received more than this many training examples

train class-attribute instance-attribute

train: bool = True

Should it train or not. This is set automatically; ignore in 99% of cases and do not set manually.

train_count_threshold class-attribute instance-attribute

train_count_threshold: int = 1

Concepts that have seen less training examples than this will not be used for similarity calculation and will have a similarity of -1.

LinkingFilters

LinkingFilters(**data)

Bases: SerialisableBaseModel

These describe the linking filters used alongside the model.

When neither CUIs nor excluded CUIs are specified (the sets are empty), all CUIs are accepted. If there are CUIs specified, then only those will be accepted. If there are excluded CUIs specified, they are excluded.

In some cases, there are extra filters as well as MedCATtrainer (MCT) export filters. These are expected to obey: extra_cui_filter ⊆ MCT filter ⊆ Model/config filter

While any other CUIs can be included in the extra CUI filter or the MCT filter, they would not have any real effect.
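The acceptance rules above can be condensed into a standalone function mirroring the documented semantics of check_filters:

```python
def check_filters(cui: str, cuis: set, cuis_exclude: set) -> bool:
    # An empty `cuis` set accepts everything; otherwise only listed
    # CUIs pass. Exclusions always apply.
    if cui in cuis or not cuis:
        return cui not in cuis_exclude
    return False

# Empty filters: everything is allowed.
empty_ok = check_filters("C0018681", set(), set())
# With an allow-list, only listed CUIs pass:
ok = check_filters("C0018681", {"C0018681"}, set())
blocked = check_filters("C0027051", {"C0018681"}, set())
# Exclusion wins even over the allow-list:
excluded = check_filters("C0018681", {"C0018681"}, {"C0018681"})
```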

Methods:

Attributes:

Source code in medcat-v2/medcat/config/config.py
def __init__(self, **data):
    if 'cuis' in data:
        cuis = data['cuis']
        if isinstance(cuis, dict) and len(cuis) == 0:
            logger.warning("Loading an old model where "
                           "config.linking.filters.cuis has been "
                           "dict to an empty dict instead of an empty "
                           "set. Converting the dict to a set in memory "
                           "as that is what is expected. Please consider "
                           "saving the model again.")
            data['cuis'] = set(cuis.keys())
    super().__init__(**data)

cuis class-attribute instance-attribute

cuis: set[str] = set()

cuis_exclude class-attribute instance-attribute

cuis_exclude: set[str] = set()

check_filters

check_filters(cui: str) -> bool

Checks whether a CUI is in the filters

Parameters:

  • cui

    (str) –

    The CUI in question

Returns:

  • bool ( bool ) –

    True if the CUI is allowed

Source code in medcat-v2/medcat/config/config.py
def check_filters(self, cui: str) -> bool:
    """Checks is a CUI in the filters

    Args:
        cui (str): The CUI in question

    Returns:
        bool: True if the CUI is allowed
    """
    if cui in self.cuis or not self.cuis:
        return cui not in self.cuis_exclude
    else:
        return False

ModelMeta

Bases: SerialisableBaseModel

Methods:

Attributes:

description class-attribute instance-attribute

description: str = 'N/A'

hash class-attribute instance-attribute

hash: str = ''

history class-attribute instance-attribute

history: list[str] = Field(default_factory=list)

last_saved class-attribute instance-attribute

last_saved: datetime = Field(default_factory=now)

location class-attribute instance-attribute

location: str = 'N/A'

medcat_version class-attribute instance-attribute

medcat_version: str = ''

ontology class-attribute instance-attribute

ontology: list[str] = []

saved_environ class-attribute instance-attribute

saved_environ: Environment = Field(default_factory=get_environment_info)

sup_trained class-attribute instance-attribute

sup_trained: list[TrainingDescriptor] = []

unsup_trained class-attribute instance-attribute

unsup_trained: list[TrainingDescriptor] = []

add_sup_training

add_sup_training(start_time: datetime, num_docs: int, project_name: str) -> None

Add supervised training information based on data.

This will mark down the time taken for training by comparing the start time to the current time.

This will be called for every project being trained separately. So if there's an MCT export being trained with multiple projects, multiple different training instances will be recorded.

Parameters:

  • start_time

    (datetime) –

    The time at which the training was started.

  • num_docs

    (int) –

    The number of documents that were trained.

  • project_name

    (str) –

    The project name.

Source code in medcat-v2/medcat/config/config.py
def add_sup_training(self, start_time: datetime, num_docs: int,
                     project_name: str) -> None:
    """Add supervised training information based on data.

    NOTE: This will mark down the time taken for training by comparing
          the start time to the current time.

    NOTE: This will be called for every project being trained separately.
          So if there's a MCT export being trained with multiple projects,
          multiple different training instances will be recorded.

    Args:
        start_time (datetime): The time at which the training was started.
        num_docs (int): The number of documents that were trained.
        project_name (str): The project name.
    """
    self.sup_trained.append(TrainingDescriptor(
        train_time_start=start_time, train_time_end=datetime.now(),
        project_name=project_name, num_docs=num_docs, num_epochs=1
    ))

add_unsup_training

add_unsup_training(start_time: datetime, num_docs: int, num_epochs: int = 1, project_name: str = 'N/A')

Add unsupervised training information based on data.

This will mark down the time taken for training by comparing the start time to the current time.

Parameters:

  • start_time

    (datetime) –

    The time at which the training was started.

  • num_docs

    (int) –

    The number of documents trained.

  • num_epochs

    (int, default: 1 ) –

    The number of epochs. Defaults to 1.

  • project_name

    (str, default: 'N/A' ) –

    The project name. Defaults to 'N/A'.

Source code in medcat-v2/medcat/config/config.py
def add_unsup_training(self, start_time: datetime, num_docs: int,
                       num_epochs: int = 1, project_name: str = 'N/A'):
    """Add unsupervised training information based on data.

    NOTE: This will mark down the time taken for training by comparing
          the start time to the current time.

    Args:
        start_time (datetime): The time at which the training was started.
        num_docs (int): The number of documents trained.
        num_epochs (int, optional): The number of epochs. Defaults to 1.
        project_name (str, optional): The project name. Defaults to 'N/A'.
    """
    self.unsup_trained.append(TrainingDescriptor(
        train_time_start=start_time, train_time_end=datetime.now(),
        project_name=project_name, num_docs=num_docs,
        num_epochs=num_epochs))

mark_saved_now

mark_saved_now()
Source code in medcat-v2/medcat/config/config.py
def mark_saved_now(self):
    self.last_saved = datetime.now()
    self.saved_environ = get_environment_info()
    self.medcat_version = medcat_version

prepare_and_report_training

prepare_and_report_training(data_iterator: C, num_epochs: int, supervised: bool = False, project_name: str = 'N/A') -> Iterator[C]

Context manager for preparing training.

This is used so that we can get the number of items in the data during training.

Parameters:

  • data_iterator

    (C) –

    The data to be trained.

  • num_epochs

    (int) –

    The number of epochs to be used.

  • supervised

    (bool, default: False ) –

    Whether training is supervised. Defaults to False.

  • project_name

    (str, default: 'N/A' ) –

    The project name. Defaults to 'N/A'.

Yields:

  • C

    Iterator[C]: The same data that was input.

Source code in medcat-v2/medcat/config/config.py
@contextmanager
def prepare_and_report_training(self,
                                data_iterator: C,
                                num_epochs: int,
                                supervised: bool = False,
                                project_name: str = 'N/A'
                                ) -> Iterator[C]:
    """Context manager for preparing training.

    This is used so that we can get the number of items in the data
    during training.

    Args:
        data_iterator (C): The data to be trained.
        num_epochs (int): The number of epochs to be used.
        supervised (bool, optional): Whether training is supervised.
            Defaults to False.
        project_name (str, optional): The project name. Defaults to 'N/A'.

    Yields:
        Iterator[C]: The same data that was input.
    """
    _names, _counts = [], [0]  # NOTE: 0 count for fallback

    def callback(name: str, count: int) -> None:
        _names.append(name)
        _counts.append(count)
    wrapped = callback_iterator(f"TRAIN-{id(data_iterator)}",
                                data_iterator, callback)
    start_time = datetime.now()
    try:
        yield cast(C, wrapped)
    finally:
        # even if something fails, log the count
        num_docs = _counts[1]
        if supervised:
            self.add_sup_training(start_time=start_time,
                                  num_docs=num_docs,
                                  project_name=project_name)
        else:
            self.add_unsup_training(start_time=start_time,
                                    num_docs=num_docs,
                                    num_epochs=num_epochs,
                                    project_name=project_name)
        if len(_names) != 1:
            logger.warning(
                "Something went wrong during %ssupervised training. "
                "The number of documents trained was unable to be "
                "clearly obtained. Counted %d names (%s) at %s",
                'un' if not supervised else '', len(_names), _names,
                _counts)

NLPConfig

Bases: SerialisableBaseModel

Attributes:

disabled_components class-attribute instance-attribute

disabled_components: list = ['ner', 'parser', 'vectors', 'textcat', 'entity_linker', 'sentencizer', 'entity_ruler', 'merge_noun_chunks', 'merge_entities', 'merge_subtokens']

The list of components that will be disabled for the NLP.

NB! For these changes to take effect, the pipe would need to be recreated.

faster_spacy_tokenization class-attribute instance-attribute

faster_spacy_tokenization: bool = False

Allow skipping the spacy pipeline.

If True, uses basic tokenization only (spacy.make_doc) for ~3-4x overall speedup. If False, uses full linguistic pipeline including POS tagging, lemmatization, and stopword detection.

Impact of faster_spacy_tokenization=True:

  • No part-of-speech tags: all tokens are treated uniformly during normalization
  • No lemmatization: words are used in surface form (e.g., "running" vs "run")
  • No stopword detection: all tokens in multi-token spans are considered; all tokens are used in context vector calculation
  • Real-world performance (in terms of precision and recall) is likely to be lower

When to use fast mode:

  • Processing very large datasets where speed is critical
  • Text is already clean/normalized
  • Minor drops in precision/recall (typically 1-3%) are acceptable

When to use full mode (default):

  • Maximum accuracy is required
  • Working with noisy or varied text
  • Proper linguistic analysis improves your specific use case

Benchmark on your data to determine if the speedup justifies the accuracy tradeoff.

PS: Only applicable for spacy based tokenizer.

NB! For these changes to take effect, the pipe would need to be recreated.

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='allow', validate_assignment=True)

modelname class-attribute instance-attribute

modelname: str = 'en_core_web_md'

What model will be used for tokenization.

NB! For these changes to take effect, the pipe would need to be recreated.

provider class-attribute instance-attribute

provider: str = 'regex'

The NLP provider.

Currently only regex and spacy are natively supported.

NB! For these changes to take effect, the pipe would need to be recreated.

Ner

Bases: ComponentConfig

The NER part of the config

Attributes:

check_upper_case_names class-attribute instance-attribute

check_upper_case_names: bool = False

Check uppercase to distinguish uppercase and lowercase words that have a different meaning.

custom_cnf class-attribute instance-attribute

custom_cnf: Optional[Any] = None

The custom config for the component.

max_skip_tokens class-attribute instance-attribute

max_skip_tokens: int = 2

When checking tokens for concepts, you can have skipped tokens between used ones (usually spaces, new lines, etc.). This number tells you how many skipped tokens you can have.

min_name_len class-attribute instance-attribute

min_name_len: int = 3

Do not detect names below this limit, skip them

model_config class-attribute instance-attribute

model_config = ConfigDict(extra='allow')

try_reverse_word_order class-attribute instance-attribute

try_reverse_word_order: bool = False

Try reverse word order for short concepts (2 words max), e.g. heart disease -> disease heart

upper_case_limit_len class-attribute instance-attribute

upper_case_limit_len: int = 4

Any name shorter than this must be uppercase in the text to be considered. If it is not uppercase it will be skipped.

Preprocessing

Bases: SerialisableBaseModel

The preprocessing part of the config

Attributes:

do_not_normalize class-attribute instance-attribute

do_not_normalize: set[str] = {'VBD', 'VBG', 'VBN', 'VBP', 'JJS', 'JJR'}

Should specific word types be normalized, e.g. running -> run. Values are detailed part-of-speech tags. See:

  • https://spacy.io/usage/linguistic-features#pos-tagging
  • the Label scheme section per model at https://spacy.io/models/en

keep_punct class-attribute instance-attribute

keep_punct: set = {'.', ':'}

All punctuation will be skipped by default; here you can set what will be kept

max_document_length class-attribute instance-attribute

max_document_length: int = 1000000

Documents longer than this will be trimmed.

NB! For these changes to take effect, the pipe would need to be recreated.

min_len_normalize class-attribute instance-attribute

min_len_normalize: int = 5

Nothing below this length will ever be normalized (input tokens or concept names); normalized means lemmatized in this case

skip_stopwords class-attribute instance-attribute

skip_stopwords: bool = False

Should stopwords be skipped/ignored when processing input

stopwords class-attribute instance-attribute

stopwords: Optional[set] = None

If None, the default set of stopwords from spacy will be used. This must be a set.

NB! For these changes to take effect, the pipe would need to be recreated.

words_to_skip class-attribute instance-attribute

words_to_skip: set = {'nos'}

These words will be completely ignored in concepts and in the text (must be a set)

TrainingDescriptor

Bases: SerialisableBaseModel

Attributes:

num_docs instance-attribute

num_docs: int

num_epochs class-attribute instance-attribute

num_epochs: int = 1

project_name instance-attribute

project_name: Optional[str]

train_time_end instance-attribute

train_time_end: datetime

train_time_start instance-attribute

train_time_start: datetime

UsageMonitor

Bases: SerialisableBaseModel

Attributes:

  • batch_size (int) –

    Number of logged events to write at once.

  • enabled (Literal[True, False, 'auto']) –

    Whether usage monitoring is enabled (True), disabled (False),

  • file_prefix (str) –

    The prefix for logged files. The suffix will be the model hash.

  • log_folder (str) –

    The folder which contains the usage logs. In certain situations,

batch_size class-attribute instance-attribute

batch_size: int = 100

Number of logged events to write at once.

enabled class-attribute instance-attribute

enabled: Literal[True, False, 'auto'] = False

Whether usage monitoring is enabled (True), disabled (False), or automatic ('auto'). If set to False, no logging is performed. If set to True, logs are saved in the location specified by log_folder. If set to 'auto', logging is automatically enabled or disabled based on an environment variable (MEDCAT_LOGS - setting it to False or 0 disables logging), and logs are written to the OS-preferred logs location (MEDCAT_LOGS_LOCATION). The defaults for the location are:

  • For Linux: ~/.local/share/medcat/logs/
  • For Windows: C:\Users\%USERNAME%\.cache\medcat\logs\
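A sketch of how the 'auto' resolution described above could work (the exact environment-variable parsing MedCAT uses is an assumption here):

```python
import os

def logging_enabled(enabled, environ=None) -> bool:
    # Sketch of the 'auto' rule: MEDCAT_LOGS set to 'False' or '0'
    # disables logging; anything else (or unset) enables it.
    environ = os.environ if environ is None else environ
    if enabled is True or enabled is False:
        return enabled
    # enabled == 'auto'
    return environ.get("MEDCAT_LOGS", "1") not in ("False", "0")

auto_on = logging_enabled("auto", {})
auto_off = logging_enabled("auto", {"MEDCAT_LOGS": "0"})
```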

file_prefix class-attribute instance-attribute

file_prefix: str = 'usage_'

The prefix for logged files. The suffix will be the model hash.

log_folder class-attribute instance-attribute

log_folder: str = '.'

The folder which contains the usage logs. In certain situations, it may make sense to keep this separate from the overall logs. NOTE: Does not take effect if enabled is set to 'auto'