medcat.utils.regression.targeting

Classes:

FinalTarget –

The final target.
OptionSet –

The targeting option set.
PhraseChanger –

The phrase changer.
ProblematicOptionSetException –
TargetPlaceholder –

A class describing the options for a specific placeholder.
TargetedPhraseChanger –

The target phrase changer.
TranslationLayer –

The translation layer for translating between CUIs, names, type_ids and child CUIs.

Attributes:

logger –

logger `module-attribute`

logger = getLogger(__name__)

FinalTarget

Bases: BaseModel

The final target.

This involves the final phrase (which potentially has other placeholders already replaced in it), the placeholder to be replaced, and the CUI and specific name being used.

Attributes:

cui (str) –
final_phrase (str) –
name (str) –
placeholder (str) –

cui `instance-attribute`

cui: str

final_phrase `instance-attribute`

final_phrase: str

name `instance-attribute`

name: str

placeholder `instance-attribute`

placeholder: str

OptionSet

Bases: BaseModel

The targeting option set.

This describes all the target placeholders and concepts needed.

Methods:

estimate_num_of_subcases –

Get the number of distinct subcases.
from_dict –

Construct an OptionSet instance from a dict.
get_preprocessors_and_targets –

Get the targeted phrase changers.
to_dict –

Convert the OptionSet to a dict.

Attributes:

allow_any_combinations (bool) –
options (list[TargetPlaceholder]) –

allow_any_combinations `class-attribute` `instance-attribute`

allow_any_combinations: bool = False

options `instance-attribute`

options: list[TargetPlaceholder]

estimate_num_of_subcases

estimate_num_of_subcases() -> int

Get the number of distinct subcases.

This counts only what can be calculated without knowledge of the underlying CDB. That is, it ignores the number of names per CUI and only takes into account what the option set itself describes.

If any combination is allowed, the per-target count is the product of the number of target concepts across all options. Otherwise it is simply the number of target concepts for one option (they should all have the same number). In both cases the returned value is that count multiplied by the number of options, since each placeholder is targeted in turn.

Returns:

int ( int ) –

The number of subcases.

Source code in medcat-v2/medcat/utils/regression/targeting.py

def estimate_num_of_subcases(self) -> int:
    """Get the number of distinct subcases.

    This includes ones that can be calculated without the knowledge of the
    underlying CDB. I.e it doesn't care for the number of names involved
    per CUI but only takes into account what is described in the option
    set itself.

    If any combination is allowed, then the answer is the combination of
    the number of target concepts per option. If any combination is not
    allowed, then the answer is simply the number of target concepts for
    an option (they should all have the same number).

    Returns:
        int: The number of subcases.
    """
    num_of_opts = len(self.options)
    if self.allow_any_combinations:
        total_cases = 1
        for cur_opt in self.options:
            total_cases *= len(cur_opt.target_cuis)
    else:
        total_cases = len(self.options[0].target_cuis)
    return num_of_opts * total_cases
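
The arithmetic in the listing above can be checked on plain lists. The following is a minimal sketch, not the library's API: `estimate_subcases` is a hypothetical stand-alone helper and the CUI strings are made up for illustration.

```python
from math import prod


def estimate_subcases(cuis_per_placeholder: list[list[str]],
                      allow_any_combinations: bool) -> int:
    """Hypothetical helper mirroring the arithmetic of the listing above."""
    num_of_opts = len(cuis_per_placeholder)
    if allow_any_combinations:
        # Every CUI of one placeholder may be paired with every CUI of another.
        total_cases = prod(len(cuis) for cuis in cuis_per_placeholder)
    else:
        # All placeholders are expected to list the same number of CUIs.
        total_cases = len(cuis_per_placeholder[0])
    # Each placeholder is targeted in turn, hence the extra factor.
    return num_of_opts * total_cases


# Made-up CUIs: two placeholders with 2 and 3 target concepts.
two_by_three = [["C1", "C2"], ["C3", "C4", "C5"]]
# Made-up CUIs: two placeholders with 2 target concepts each.
paired = [["C1", "C2"], ["C3", "C4"]]

any_combo = estimate_subcases(two_by_three, allow_any_combinations=True)
lockstep = estimate_subcases(paired, allow_any_combinations=False)
```

Here `any_combo` is 2 × (2 × 3) = 12 and `lockstep` is 2 × 2 = 4.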

from_dict `classmethod`

from_dict(section: dict[str, Any]) -> OptionSet

Construct an OptionSet instance from a dict.

The assumed structure is:

    {
        'placeholders': [
            {
            'placeholder': <e.g. '{DIAGNOSIS}'>,
            'cuis': <the list of CUIs>,
            'prefname-only': 'true'
            }, <potentially more>],
        'any-combination': <True or False>
    }

The prefname-only key is optional.

Parameters:

section
(dict[str, Any]) –

The dict to parse

Raises:

ProblematicOptionSetException –

If incorrect number of CUIs when not allowing any combination
ProblematicOptionSetException –

If placeholders not a list
ProblematicOptionSetException –

If multiple placeholders use the same placeholder string

Returns:

OptionSet ( OptionSet ) –

The resulting OptionSet

Source code in medcat-v2/medcat/utils/regression/targeting.py

@classmethod
def from_dict(cls, section: dict[str, Any]) -> 'OptionSet':
    """Construct a OptionSet instance from a dict.

    The assumed structure is:
    {
        'placeholders': [
            {
            'placeholder': <e.g. '{DIAGNOSIS}'>,
            'cuis': <the list of CUIs>,
            'prefname-only': 'true'
            }, <potentially more>],
        'any-combination': <True or False>
    }

    The prefname-only key is optional.

    Args:
        section (dict[str, Any]): The dict to parse

    Raises:
        ProblematicOptionSetException:
            If incorrect number of CUIs when not allowing any combination
        ProblematicOptionSetException:
            If placeholders not a list
        ProblematicOptionSetException:
            If multiple placeholders use the same placeholder string

    Returns:
        OptionSet: The resulting OptionSet
    """
    options: list['TargetPlaceholder'] = []
    allow_any_in = section.get('any-combination', 'false')
    if isinstance(allow_any_in, str):
        allow_any_combinations = allow_any_in.lower() == 'true'
    elif isinstance(allow_any_in, bool):
        allow_any_combinations = allow_any_in
    else:
        raise ProblematicOptionSetException(
            f"Unknown 'any-combination' value: {allow_any_in}")
    if 'placeholders' not in section:
        raise ProblematicOptionSetException(
            "Misconfigured - no placeholders")
    section_placeholders = section['placeholders']
    if not isinstance(section_placeholders, list):
        raise ProblematicOptionSetException(
            "Misconfigured - placehodlers not a list "
            f"({section_placeholders})")
    used_ph = set()
    for part in section_placeholders:
        placeholder = part['placeholder']
        if not isinstance(placeholder, str):
            raise ProblematicOptionSetException(
                f"Unknown placeholder of type {type(placeholder)}. "
                "Expected a string. Perhaps you need to surrong the "
                "placeholder with single quotes (') in the yaml? "
                f"Received: {placeholder}")
        if placeholder in used_ph:
            raise ProblematicOptionSetException(
                "Misconfigured - multiple identical placeholders")
        used_ph.add(placeholder)
        target_cuis: list[str] = part['cuis']
        if not isinstance(target_cuis, list):
            raise ProblematicOptionSetException(
                f"Target CUIs not a list ({type(target_cuis)}): "
                f"{repr(target_cuis)}")
        if 'prefname-only' in part:
            opn = part['prefname-only']
            if isinstance(opn, bool):
                onlyprefnames = opn
            else:
                onlyprefnames = str(opn).lower() == 'true'
        else:
            onlyprefnames = False
        option = TargetPlaceholder(
            placeholder=placeholder, target_cuis=target_cuis,
            onlyprefnames=onlyprefnames)
        options.append(option)
    if not options:
        raise ProblematicOptionSetException(
            "Misconfigured - 0 placeholders found (empty list)")
    if not allow_any_combinations:
        # NOTE: need to have same number of target_cuis
        #       for each placeholder
        # NOTE: there needs to be at least one option / placeholder anyway
        nr_of_cuis = [len(opt.target_cuis) for opt in options]
        if not all(nr == nr_of_cuis[0] for nr in nr_of_cuis):
            raise ProblematicOptionSetException(
                "Unequal number of cuis when any-combination: false: "
                f"{nr_of_cuis}. When any-combination: false the number of "
                "CUIs for each placeholder should be equal.")
    return OptionSet(options=options,
                     allow_any_combinations=allow_any_combinations)
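
To make the expected input shape concrete, here is a minimal sketch of a section dict together with two of the checks the listing performs. `parse_flag` and `cui_counts_match` are hypothetical helpers written for this illustration, and the CUIs are invented.

```python
from typing import Any


def parse_flag(value: Any) -> bool:
    """Accept a bool or a 'true'/'false' string (how 'prefname-only' is read)."""
    if isinstance(value, bool):
        return value
    return str(value).lower() == "true"


def cui_counts_match(section: dict[str, Any]) -> bool:
    """Check that every placeholder lists the same number of CUIs."""
    counts = [len(part["cuis"]) for part in section["placeholders"]]
    return all(count == counts[0] for count in counts)


# Invented CUIs; 'prefname-only' is optional and omitted for the first entry.
section = {
    "placeholders": [
        {"placeholder": "{DIAGNOSIS}", "cuis": ["C0000001", "C0000002"]},
        {"placeholder": "{SYMPTOM}", "cuis": ["C0000003", "C0000004"],
         "prefname-only": "true"},
    ],
    "any-combination": False,
}

counts_ok = cui_counts_match(section)
prefname_only = parse_flag(section["placeholders"][1]["prefname-only"])
```

With `'any-combination'` set to false, a section whose placeholders list different numbers of CUIs would be rejected by the real method with a `ProblematicOptionSetException`.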

get_preprocessors_and_targets

get_preprocessors_and_targets(translation: TranslationLayer) -> Iterator[TargetedPhraseChanger]

Get the targeted phrase changers.

Parameters:

translation
(TranslationLayer) –

The translation layer.

Yields:

TargetedPhraseChanger –

Iterator[TargetedPhraseChanger]: The targeted phrase changers.

Source code in medcat-v2/medcat/utils/regression/targeting.py

def get_preprocessors_and_targets(self, translation: TranslationLayer
                                  ) -> Iterator[TargetedPhraseChanger]:
    """Get the targeted phrase changers.

    Args:
        translation (TranslationLayer): The translation layer.

    Yields:
        Iterator[TargetedPhraseChanger]: The targeted phrase changers.
    """
    num_of_opts = len(self.options)
    if num_of_opts == 1:
        # NOTE: when there's only 1 option, the other option doesn't work
        #       since it has nothing to iterate over regarding 'other'
        #       options
        opt = self.options[0]
        for target_cui in opt.target_cuis:
            yield TargetedPhraseChanger(changer=PhraseChanger.empty(),
                                        placeholder=opt.placeholder,
                                        cui=target_cui,
                                        onlyprefnames=opt.onlyprefnames)
        return
    for opt_nr in range(num_of_opts):
        other_opts = list(self.options)
        cur_opt = other_opts.pop(opt_nr)
        for changer, target_cui in self._get_all_combinations(
                cur_opt, other_opts, translation):
            yield TargetedPhraseChanger(
                changer=changer,
                placeholder=cur_opt.placeholder,
                cui=target_cui,
                onlyprefnames=cur_opt.onlyprefnames)
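
The loop at the end of the listing treats each option in turn as the target and the remaining options as context. A minimal sketch of just that selection step, using plain strings in place of `TargetPlaceholder` objects (`split_target_and_others` is a hypothetical name):

```python
def split_target_and_others(placeholders: list[str]
                            ) -> list[tuple[str, list[str]]]:
    """Pair each placeholder with the list of all the other placeholders."""
    pairs = []
    for index in range(len(placeholders)):
        others = list(placeholders)   # copy, so the input list is untouched
        target = others.pop(index)    # the placeholder being tested
        pairs.append((target, others))
    return pairs


pairs = split_target_and_others(["{A}", "{B}", "{C}"])
```

Each placeholder appears exactly once as the target, which is why the subcase estimate is multiplied by the number of options.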

to_dict

to_dict() -> dict

Convert the OptionSet to a dict.

Returns:

dict ( dict ) –

The dict representation

Source code in medcat-v2/medcat/utils/regression/targeting.py

def to_dict(self) -> dict:
    """Convert the OptionSet to a dict.

    Returns:
        dict: The dict representation
    """
    placeholders = [
        {
            'placeholder': opt.placeholder,
            'cuis': opt.target_cuis,
            'prefname-only': str(opt.onlyprefnames),
        }
        for opt in self.options
    ]
    return {
        'placeholders': placeholders,
        'any-combination': str(self.allow_any_combinations)
    }
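
Note that the booleans are written out as strings (`'True'` / `'False'`). The sketch below, which uses two hypothetical helpers rather than the real classes, shows why that still round-trips through the case-insensitive comparison applied by `from_dict`.

```python
def flag_to_text(flag: bool) -> str:
    """Serialise the way the listing above does: str(True) -> 'True'."""
    return str(flag)


def text_to_flag(text: str) -> bool:
    """Parse the way from_dict treats string values."""
    return text.lower() == "true"


round_trip = [text_to_flag(flag_to_text(flag)) for flag in (True, False)]
```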

PhraseChanger

Bases: BaseModel

The phrase changer.

This class is used as a preprocessor for phrases with multiple placeholders. It allows swapping in the rest of the placeholders while leaving in the one that's being tested for.

Methods:

empty –

Gets the empty phrase changer.

Attributes:

preprocess_placeholders (list[tuple[str, str]]) –

preprocess_placeholders `instance-attribute`

preprocess_placeholders: list[tuple[str, str]]

empty `classmethod`

empty() -> PhraseChanger

Gets the empty phrase changer.

That is a phrase changer that makes no changes to the phrase.

Returns:

PhraseChanger ( PhraseChanger ) –

The empty phrase changer.

Source code in medcat-v2/medcat/utils/regression/targeting.py

@classmethod
def empty(cls) -> 'PhraseChanger':
    """Gets the empty phrase changer.

    That is a phrase changer that makes no changes to the phrase.

    Returns:
        PhraseChanger: The empty phrase changer.
    """
    return cls(preprocess_placeholders=[])
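
This page does not show how the `(placeholder, replacement)` pairs are applied to a phrase. The sketch below assumes plain `str.replace` substitution in list order; `apply_replacements` is a hypothetical function, not a method of the class.

```python
def apply_replacements(phrase: str,
                       replacements: list[tuple[str, str]]) -> str:
    """Substitute each (placeholder, replacement) pair into the phrase."""
    for placeholder, replacement in replacements:
        phrase = phrase.replace(placeholder, replacement)
    return phrase


template = "Patient with {DIAGNOSIS} reports {SYMPTOM}."
# Fill in the non-target placeholder, leaving {DIAGNOSIS} to be tested.
prepared = apply_replacements(template, [("{SYMPTOM}", "headache")])
# An empty list of pairs corresponds to the 'empty' phrase changer.
unchanged = apply_replacements(template, [])
```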

ProblematicOptionSetException

ProblematicOptionSetException(*args: object)

Bases: ValueError

Source code in medcat-v2/medcat/utils/regression/targeting.py

def __init__(self, *args: object) -> None:
    super().__init__(*args)

TargetPlaceholder

Bases: BaseModel

A class describing the options for a specific placeholder.

Attributes:

onlyprefnames (bool) –
placeholder (str) –
target_cuis (list[str]) –

onlyprefnames `class-attribute` `instance-attribute`

onlyprefnames: bool = False

placeholder `instance-attribute`

placeholder: str

target_cuis `instance-attribute`

target_cuis: list[str]

TargetedPhraseChanger

Bases: BaseModel

The target phrase changer.

It includes the phrase changer (for preprocessing) along with the relevant concept and the placeholder it will replace.

Attributes:

changer (PhraseChanger) –
cui (str) –
onlyprefnames (bool) –
placeholder (str) –

changer `instance-attribute`

changer: PhraseChanger

cui `instance-attribute`

cui: str

onlyprefnames `instance-attribute`

onlyprefnames: bool

placeholder `instance-attribute`

placeholder: str

TranslationLayer

TranslationLayer(cui2info: dict[str, CUIInfo], name2info: dict[str, NameInfo], cui2children: dict[str, set[str]], separator: str, whitespace: str = ' ')

The translation layer for translating:

- CUIs to names
- names to CUIs
- type_ids to CUIs
- CUIs to child CUIs

The idea is to decouple these translations from the CDB instance in case something changes there.

Parameters:

cui2info
(dict[str, CUIInfo]) –

The map from CUI to concept information (names, type_ids and preferred name); the type_id to CUIs map is derived from it
name2info
(dict[str, NameInfo]) –

The map from name to CUIs
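cui2children
(dict[str, set[str]]) –

The map from CUI to child CUIs
separator
(str) –

The separator used within names in the CDB (generally ~)
whitespace
(str, default: ' ' ) –

The whitespace that replaces the separator in returned names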

Methods:

from_CDB –

Construct a TranslationLayer object from a context database (CDB).
get_children_of –

Get the children of the specified CUI in the listed CUIs (if they exist).
get_direct_children –

Get the direct children of a concept.
get_direct_parents –

Get the direct parent(s) of a concept.
get_first_name –

Get the preprocessed (potentially) arbitrarily first name of the given concept.
get_names_of –

Get the preprocessed names of a CUI.
get_preferred_name –

Get the preferred name of a concept.

Attributes:

cui2children –
cui2info –
name2info –
separator –
type_id2cuis (dict[str, set[str]]) –
whitespace –

Source code in medcat-v2/medcat/utils/regression/targeting.py

def __init__(self, cui2info: dict[str, CUIInfo],
             name2info: dict[str, NameInfo],
             cui2children: dict[str, set[str]],
             separator: str, whitespace: str = ' ') -> None:
    self.cui2info = cui2info
    self.name2info = name2info
    self.separator = separator
    self.whitespace = whitespace
    self.type_id2cuis: dict[str, set[str]] = {}
    for cui, ci in self.cui2info.items():
        type_ids = ci["type_ids"]
        for type_id in type_ids:
            if type_id not in self.type_id2cuis:
                self.type_id2cuis[type_id] = set()
            self.type_id2cuis[type_id].add(cui)
    self.cui2children = cui2children
    for cui in self.cui2info:
        if cui not in cui2children:
            self.cui2children[cui] = set()
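
The first loop in the constructor inverts the CUI to type_ids relation into a type_id to CUIs map. A minimal sketch of that inversion on plain dicts (`invert_type_ids` is a hypothetical name and the identifiers are invented):

```python
from collections import defaultdict


def invert_type_ids(cui2type_ids: dict[str, set[str]]
                    ) -> dict[str, set[str]]:
    """Build type_id -> CUIs from CUI -> type_ids."""
    type_id2cuis: dict[str, set[str]] = defaultdict(set)
    for cui, type_ids in cui2type_ids.items():
        for type_id in type_ids:
            type_id2cuis[type_id].add(cui)
    return dict(type_id2cuis)


inverted = invert_type_ids({"C1": {"T1"}, "C2": {"T1", "T2"}})
```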

cui2children `instance-attribute`

cui2children = cui2children

cui2info `instance-attribute`

cui2info = cui2info

name2info `instance-attribute`

name2info = name2info

separator `instance-attribute`

separator = separator

type_id2cuis `instance-attribute`

type_id2cuis: dict[str, set[str]] = {}

whitespace `instance-attribute`

whitespace = whitespace

from_CDB `classmethod`

from_CDB(cdb: CDB) -> TranslationLayer

Construct a TranslationLayer object from a context database (CDB).

This translation layer will refer to the same dicts that the CDB refers to. While there is no obvious reason these should be modified, it's something to keep in mind.

Parameters:

cdb
(CDB) –

The CDB

Returns:

TranslationLayer ( TranslationLayer ) –

The resulting TranslationLayer

Source code in medcat-v2/medcat/utils/regression/targeting.py

@classmethod
def from_CDB(cls, cdb: CDB) -> 'TranslationLayer':
    """Construct a TranslationLayer object from a context database (CDB).

    This translation layer will refer to the same dicts that the CDB
    refers to. While there is no obvious reason these should be modified,
    it's something to keep in mind.

    Args:
        cdb (CDB): The CDB

    Returns:
        TranslationLayer: The subsequent TranslationLayer
    """
    if 'pt2ch' not in cdb.addl_info:
        logger.warning(
            "No parent to child information presented so "
            "they cannot be used")
        parent2child = {}
    else:
        parent2child = cdb.addl_info['pt2ch']
    return TranslationLayer(
        cui2info=cdb.cui2info,
        name2info=cdb.name2info,
        cui2children=parent2child,
        separator=cdb.config.general.separator)
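
Because the dicts are shared rather than copied, the constructor's step of filling in empty child sets also changes the dict that was passed in. The sketch below reproduces only that step on a plain dict; `fill_missing_children` is a hypothetical stand-in for the constructor loop.

```python
def fill_missing_children(cuis: list[str],
                          cui2children: dict[str, set[str]]
                          ) -> dict[str, set[str]]:
    """Add an empty child set, in place, for every CUI that lacks one."""
    for cui in cuis:
        if cui not in cui2children:
            cui2children[cui] = set()
    return cui2children


# Stand-in for the parent-to-child dict held by the CDB.
pt2ch = {"C1": {"C2"}}
layer_view = fill_missing_children(["C1", "C2"], pt2ch)
```

After the call, `layer_view` and `pt2ch` are the same object, and `pt2ch` has gained an empty entry for `"C2"`. Passing in a copy would avoid this if the original mapping must stay untouched.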

get_children_of

get_children_of(found_cuis: Iterable[str], cui: str, depth: int = 1) -> list[str]

Get the children of the specified CUI in the listed CUIs (if they exist).

Parameters:

found_cuis
(Iterable[str]) –

The list of CUIs to look in
cui
(str) –

The target parent CUI
depth
(int, default: 1 ) –

The depth to carry out the search for

Returns:

list[str] –

list[str]: The list of children found

Source code in medcat-v2/medcat/utils/regression/targeting.py

def get_children_of(self, found_cuis: Iterable[str],
                    cui: str, depth: int = 1) -> list[str]:
    """Get the children of the specifeid CUI in the
    listed CUIs (if they exist).

    Args:
        found_cuis (Iterable[str]): The list of CUIs to look in
        cui (str): The target parent CUI
        depth (int): The depth to carry out the search for

    Returns:
        list[str]: The list of children found
    """
    if cui not in self.cui2children:
        return []  # no children
    children = self.cui2children[cui]
    found_children = []
    for child in children:
        if child in found_cuis:
            found_children.append(child)
    if depth > 1:
        for child in children:
            found_children.extend(self.get_children_of(
                found_cuis, child, depth - 1))
    return found_children
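
The effect of `depth` is easiest to see on a small hierarchy. This is a stand-alone sketch of the same recursion on plain dicts; `children_in` is a hypothetical function, and it iterates children in sorted order (unlike the listing) so that the result is deterministic.

```python
def children_in(found: set[str], cui: str,
                cui2children: dict[str, set[str]],
                depth: int = 1) -> list[str]:
    """Collect descendants of `cui`, up to `depth` levels, that are in `found`."""
    children = cui2children.get(cui, set())
    hits = [child for child in sorted(children) if child in found]
    if depth > 1:
        for child in sorted(children):
            hits.extend(children_in(found, child, cui2children, depth - 1))
    return hits


# Invented hierarchy: C1 -> C2 -> C3
tree = {"C1": {"C2"}, "C2": {"C3"}, "C3": set()}
found = {"C2", "C3"}

shallow = children_in(found, "C1", tree)          # direct children only
deep = children_in(found, "C1", tree, depth=2)    # children and grandchildren
```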

get_direct_children

get_direct_children(cui: str) -> list[str]

Get the direct children of a concept.

This means only the children, but not grandchildren.

If the underlying CDB doesn't list children for this CUI, an empty list is returned.

Parameters:

cui
(str) –

The concept in question.

Returns:

list[str] –

list[str]: The (potentially empty) list of direct children.

Source code in medcat-v2/medcat/utils/regression/targeting.py

def get_direct_children(self, cui: str) -> list[str]:
    """Get the direct children of a concept.

    This means only the children, but not grandchildren.

    If the underlying CDB doesn't list children for this CUI,
    an empty list is returned.

    Args:
        cui (str): The concept in question.

    Returns:
        list[str]: The (potentially empty) list of direct children.
    """
    return list(self.cui2children.get(cui, []))

get_direct_parents `cached`

get_direct_parents(cui: str) -> list[str]

Get the direct parent(s) of a concept.

This method can be quite CPU-heavy, since it has to run through all the parent-children relationships: the child->parent(s) relationship isn't normally kept track of.

Parameters:

cui
(str) –

The concept in question.

Returns:

list[str] –

list[str]: The (potentially empty) list of direct parents.

Source code in medcat-v2/medcat/utils/regression/targeting.py

@lru_cache(maxsize=10_000)
def get_direct_parents(self, cui: str) -> list[str]:
    """Get the direct parent(s) of a concept.

    PS: This method can be quite a CPU heavy one since it relies
        on running through all the parent-children relationships
        since the child->parent(s) relationship isn't normally
        kept track of.

    Args:
        cui (str): The concept in question.

    Returns:
        list[str]: The (potentially empty) list of direct parents.
    """
    parents = []
    for pot_parent, children in self.cui2children.items():
        if cui in children:
            parents.append(pot_parent)
    return parents
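
The listing scans every parent on each uncached call. If many lookups are needed, one alternative is to precompute the inverse map once. This is a different technique from what the library does and is shown only as a sketch; `build_child2parents` is a hypothetical helper.

```python
def build_child2parents(cui2children: dict[str, set[str]]
                        ) -> dict[str, list[str]]:
    """Invert parent -> children into child -> parents in a single pass."""
    child2parents: dict[str, list[str]] = {}
    for parent, children in cui2children.items():
        for child in children:
            child2parents.setdefault(child, []).append(parent)
    return child2parents


# Invented hierarchy: C3 has two parents, C1 and C2.
hierarchy = {"C1": {"C3"}, "C2": {"C3"}, "C3": set()}
child2parents = build_child2parents(hierarchy)
parents_of_c3 = child2parents.get("C3", [])
```

The trade-off is extra memory for the inverse map, and the map goes stale if the hierarchy is modified afterwards.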

get_first_name

get_first_name(cui: str) -> str

Get the preprocessed (potentially) arbitrarily first name of the given concept.

If the concept does not exist, the CUI itself is returned.

PS: The "first" name may not be consistent across runs since it relies on set order.

Parameters:

cui
(str) –

The concept ID.

Returns:

str ( str ) –

The first name.

Source code in medcat-v2/medcat/utils/regression/targeting.py

def get_first_name(self, cui: str) -> str:
    """Get the preprocessed (potentially) arbitrarily first name
    of the given concept.

    If the concept does not exist, the CUI itself is returned.

    PS: The "first" name may not be consistent across runs since it
    relies on set order.

    Args:
        cui (str): The concept ID.

    Returns:
        str: The first name.
    """
    for name in self.cui2info[cui]["names"]:
        return name.replace(self.separator, self.whitespace)
    return cui

get_names_of

get_names_of(cui: str, only_prefnames: bool) -> list[str]

Get the preprocessed names of a CUI.

This method preprocesses the names by replacing the separator (generally `~`) with the appropriate whitespace (` `).

If the concept is not in the underlying CDB, an empty list is returned.

Parameters:

cui
(str) –

The concept in question.
only_prefnames
(bool) –

Whether to only return a preferred name.

Returns:

list[str] –

list[str]: The list of names.

Source code in medcat-v2/medcat/utils/regression/targeting.py

def get_names_of(self, cui: str, only_prefnames: bool) -> list[str]:
    """Get the preprocessed names of a CUI.

    This method preprocesses the names by replacing the separator
    (generally `~`) with the appropriate whitespace (` `).

    If the concept is not in the underlying CDB, an empty list is returned.

    Args:
        cui (str): The concept in question.
        only_prefnames (bool): Whether to only return a preferred name.

    Returns:
        list[str]: The list of names.
    """
    if cui not in self.cui2info:
        logger.warning(
            "CUI %s Is not defined in CDB / translation layer", cui)
        return []
    if only_prefnames:
        return [self.get_preferred_name(cui).replace(
            self.separator, self.whitespace)]
    return [name.replace(self.separator, self.whitespace)
            # NOTE: sorting the order here in case we're using
            #       edits in which case the order of the names
            #       needs to be the same, otherwise different
            #       edits will be used across runs
            for name in sorted(self.cui2info[cui]["names"])]
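
The preprocessing step amounts to sorting the raw names and swapping the separator for whitespace. A minimal sketch on a plain set of names follows (`preprocess_names` is a hypothetical helper, and the `~` default is taken from the description above).

```python
def preprocess_names(names: set[str], separator: str = "~",
                     whitespace: str = " ") -> list[str]:
    """Sort the raw names, then replace the separator with whitespace."""
    return [name.replace(separator, whitespace) for name in sorted(names)]


names = preprocess_names({"myocardial~infarction", "heart~attack"})
```

Sorting before the replacement keeps the order stable across runs, which matters when later steps depend on the position of a name in the list.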

get_preferred_name

get_preferred_name(cui: str) -> str

Get the preferred name of a concept.

If no preferred name is found, the random 'first' name is selected.

Parameters:

cui
(str) –

The concept ID.

Returns:

str ( str ) –

The preferred name.

Source code in medcat-v2/medcat/utils/regression/targeting.py

def get_preferred_name(self, cui: str) -> str:
    """Get the preferred name of a concept.

    If no preferred name is found, the random 'first' name is selected.

    Args:
        cui (str): The concept ID.

    Returns:
        str: The preferred name.
    """
    if cui not in self.cui2info:
        logger.warning(
            "CUI %s Is not defined in CDB / translation layer", cui)
        return cui
    pref_name = self.cui2info[cui]["preferred_name"]
    if pref_name is None:
        logger.warning("CUI %s does not have a preferred name. "
                       "Using a random 'first' name of all the names", cui)
        return self.get_first_name(cui)
    return pref_name
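
The fallback order is: preferred name, then an arbitrary "first" name, then the CUI itself. The sketch below reproduces that order on plain dicts. `pick_name` is a hypothetical function, the concept data is invented, and the separator replacement done by `get_first_name` is left out for brevity.

```python
from typing import Optional


def pick_name(cui: str, cui2info: dict[str, dict]) -> str:
    """Return the preferred name, else any name, else the CUI itself."""
    if cui not in cui2info:
        return cui
    preferred: Optional[str] = cui2info[cui]["preferred_name"]
    if preferred is not None:
        return preferred
    for name in cui2info[cui]["names"]:
        return name  # arbitrary: set order is not guaranteed
    return cui


# Invented concept data for illustration.
info = {
    "C1": {"preferred_name": "Fever", "names": {"fever", "pyrexia"}},
    "C2": {"preferred_name": None, "names": {"cough"}},
    "C3": {"preferred_name": None, "names": set()},
}

with_preferred = pick_name("C1", info)
without_preferred = pick_name("C2", info)
unknown = pick_name("C9", info)
```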
