medcat.utils.regression.regression_checker

Functions:

Attributes:

DEFAULT_TEST_SUITE_PATH module-attribute

DEFAULT_TEST_SUITE_PATH = Path('configs', 'default_regression_tests.yml')

args module-attribute

args = parse_args()

logger module-attribute

logger = getLogger(__name__)

parser module-attribute

parser = ArgumentParser()

main

Check test suite against the specified model pack.

Parameters:

  • model_pack_dir

    (Path) –

    The path to the model pack

  • test_suite_file

    (Path) –

    The path to the test suite YAML

  • phrases

    (bool, default: False ) –

    Whether to show per-phrase information in a report

  • hide_empty

    (bool, default: False ) –

    Whether to hide empty cases in a report

  • examples_strictness_str

    (str, default: 'STRICTEST' ) –

    The example strictness string. Defaults to STRICTEST. NOTE: If you set this to 'None', examples will be omitted.

  • jsonpath

    (Optional[Path], default: None ) –

    The json path to save the report to (if specified)

  • overwrite

    (bool, default: False ) –

    Whether to overwrite the file if it exists. Defaults to False

  • jsonindent

    (int, default: None ) –

    The indentation for JSON objects. Defaults to None (compact output).

  • strictness_str

    (str, default: 'NORMAL' ) –

    The strictness name. Defaults to NORMAL.

  • max_phrase_length

    (int, default: 80 ) –

    The maximum phrase length in examples. Defaults to 80.

  • use_mct_export

    (bool, default: False ) –

    Whether to use a MedCATtrainer export as input. Defaults to False.

  • mct_export_yaml_path

    (str, default: None ) –

    The (optional) path at which the converted MCT export should be saved as YAML. If not set (or None), the MCT export is not saved in YAML format. Defaults to None.

  • only_mct_export_conversion

    (bool, default: False ) –

    Whether to only deal with the MCT export conversion, i.e., exit when the MCT export conversion is done. Defaults to False.

  • only_describe

    (bool, default: False ) –

    Whether to only describe the finding options and exit. Defaults to False.

  • require_fully_correct

    (bool, default: False ) –

    Whether all cases are required to be correct. If set to True, an exit-status of 1 is returned unless all (sub)cases are correct. Defaults to False.

  • edit_distance

    (tuple[int, int, int], default: (0, 0, 0) ) –

    The edit distance, the random seed, and the number of edited names to pick for each of the names. If set to non-0, the specified number of splits, deletes, transposes, replaces, or inserts are done to each name. This can be useful for looking at the capability of identifying typos in text. However, this can make the process a lot slower as a result. Defaults to (0, 0, 0).

Raises:

  • ValueError

    If unable to overwrite file or folder does not exist.
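
The ValueError condition above comes from two checks on jsonpath that run before any model is loaded. The following is a minimal sketch of those checks in isolation; check_report_path is a hypothetical helper written for illustration and is not part of medcat.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_report_path(jsonpath: Optional[Path], overwrite: bool = False) -> None:
    """Hypothetical helper mirroring the two up-front checks in main."""
    # An existing file is only acceptable when overwrite is set.
    if jsonpath and jsonpath.exists() and not overwrite:
        raise ValueError(f'Unable to write to existing file {jsonpath}')
    # The parent directory must already exist; it is not created.
    if jsonpath and not jsonpath.parent.exists():
        raise ValueError(f'Folder not found: {jsonpath}')


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / 'report.json'
    existing.write_text('{}')
    cases = [
        (existing, False),                               # file exists, no overwrite
        (existing, True),                                # file exists, overwrite allowed
        (Path(tmp) / 'missing' / 'report.json', False),  # parent folder missing
        (None, False),                                   # no JSON output requested
    ]
    outcomes = []
    for path, overwrite in cases:
        try:
            check_report_path(path, overwrite)
            outcomes.append('ok')
        except ValueError:
            outcomes.append('ValueError')

print(outcomes)
```

Checking the output path first means a bad path fails fast, before time is spent loading the model pack and running the suite.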

Source code in medcat-v2/medcat/utils/regression/regression_checker.py
def main(model_pack_dir: Path, test_suite_file: Path,
         phrases: bool = False, hide_empty: bool = False,
         examples_strictness_str: str = 'STRICTEST',
         jsonpath: Optional[Path] = None, overwrite: bool = False,
         jsonindent: Optional[int] = None,
         strictness_str: str = 'NORMAL',
         max_phrase_length: int = 80,
         use_mct_export: bool = False,
         mct_export_yaml_path: Optional[str] = None,
         only_mct_export_conversion: bool = False,
         only_describe: bool = False,
         require_fully_correct: bool = False,
         edit_distance: tuple[int, int, int] = (0, 0, 0)) -> None:
    """Check test suite against the specified model pack.

    Args:
        model_pack_dir (Path): The path to the model pack
        test_suite_file (Path): The path to the test suite YAML
        phrases (bool): Whether to show per-phrase information in a report
        hide_empty (bool): Whether to hide empty cases in a report
        examples_strictness_str (str): The example strictness string.
            Defaults to STRICTEST.
            NOTE: If you set this to 'None', examples will be omitted.
        jsonpath (Optional[Path]): The json path to save the report to
            (if specified)
        overwrite (bool): Whether to overwrite the file if it exists.
            Defaults to False
        jsonindent (Optional[int]): The indentation for json objects.
            Defaults to None.
        strictness_str (str): The strictness name. Defaults to NORMAL.
        max_phrase_length (int): The maximum phrase length in examples.
            Defaults to 80.
        use_mct_export (bool): Whether to use a MedCATtrainer export as input.
            Defaults to False.
        mct_export_yaml_path (str): The (optional) path the converted
            MCT export should be saved as YAML at. If not set (or None),
            the MCT export is not saved in YAML format. Defaults to None.
        only_mct_export_conversion (bool): Whether to only deal with the
            MCT export conversion, i.e., exit when the MCT export conversion
            is done. Defaults to False.
        only_describe (bool): Whether to only describe the finding options
            and exit. Defaults to False.
        require_fully_correct (bool): Whether all cases are required to
            be correct. If set to True, an exit-status of 1 is returned
            unless all (sub)cases are correct. Defaults to False.
        edit_distance (tuple[int, int, int]): The edit distance, the random
            seed, and the number of edited names to pick for each of the
            names. If set to non-0, the specified number of splits, deletes,
            transposes, replaces, or inserts are done to each name. This
            can be useful for looking at the capability of identifying typos
            in text. However, this can make the process a lot slower as a
            result. Defaults to (0, 0, 0).

    Raises:
        ValueError: If unable to overwrite file or folder does not exist.
    """
    if only_describe:
        show_description()
        return
    if jsonpath and jsonpath.exists() and not overwrite:
        # check before doing anything so as to not waste time on the tests
        raise ValueError(
            f'Unable to write to existing file {str(jsonpath)} pass '
            '--overwrite to overwrite the file')
    if jsonpath and not jsonpath.parent.exists():
        raise ValueError(
            'Need to specify a file in an existing directory, folder not '
            f'found: {str(jsonpath)}')
    logger.info('Loading RegressionChecker from yaml: %s', test_suite_file)
    if not use_mct_export:
        rc = RegressionSuite.from_yaml(str(test_suite_file))
    else:
        rc = RegressionSuite.from_mct_export(str(test_suite_file))
        if mct_export_yaml_path:
            logger.info('Writing MCT export in YAML to %s',
                        str(mct_export_yaml_path))
            with open(mct_export_yaml_path, 'w') as f:
                f.write(rc.to_yaml())
            if only_mct_export_conversion:
                logger.info("Done with conversion - exiting")
                return
    logger.info('Loading model pack from file: %s', model_pack_dir)
    cat: CAT = CAT.load_model_pack(str(model_pack_dir))
    logger.info('Checking the current status')
    res = rc.check_model(cat, TranslationLayer.from_CDB(cat.cdb),
                         edit_distance=edit_distance,
                         use_diacritics=cat.config.general.diacritics)
    strictness = Strictness[strictness_str]
    if examples_strictness_str in ("None", "N/A"):
        examples_strictness = None
    else:
        examples_strictness = Strictness[examples_strictness_str]
    if jsonpath:
        logger.info('Writing to %s', str(jsonpath))
        dumped = res.model_dump(strictness=examples_strictness)
        jsonpath.write_text(json.dumps(dumped, indent=jsonindent))
    else:
        logger.info(res.get_report(phrases_separately=phrases,
                    hide_empty=hide_empty,
                    examples_strictness=examples_strictness,
                    strictness=strictness, phrase_max_len=max_phrase_length))
    if require_fully_correct:
        total, success = res.calculate_report(
            phrases_separately=phrases, hide_empty=hide_empty,
            examples_strictness=examples_strictness, strictness=strictness,
            phrase_max_len=max_phrase_length)[:2]
        if total != success:
            exit(1)
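
The listing above turns the two strictness strings into enum members by name, treating "None" and "N/A" as a request to omit examples. A minimal sketch of that lookup follows. The Strictness class here is a stand-in that contains only the two names used as defaults on this page (the real medcat enum may have more members), and resolve_examples_strictness is a hypothetical helper name.

```python
from enum import Enum, auto
from typing import Optional


class Strictness(Enum):
    # Stand-in enum for illustration; not the real medcat Strictness.
    STRICTEST = auto()
    NORMAL = auto()


def resolve_examples_strictness(name: str) -> Optional[Strictness]:
    """Hypothetical helper: map the CLI string to a member, or None."""
    # Both spellings mean "do not include examples in the report".
    if name in ("None", "N/A"):
        return None
    # Lookup by member name; an unknown name raises KeyError.
    return Strictness[name]


print(resolve_examples_strictness("STRICTEST"))
print(resolve_examples_strictness("None"))
```

Because the lookup is by member name, a misspelt strictness string fails with a KeyError rather than silently falling back to a default.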

show_description

show_description()
Source code in medcat-v2/medcat/utils/regression/regression_checker.py
def show_description():
    logger.info('The various findings and their descriptions:')
    logger.info('')
    logger.info('Class description:')
    logger.info('')
    logger.info(Finding.__doc__.replace("\n    ", "\n"))
    logger.info('')
    for f in Finding:
        logger.info('%s :', f.name)
        logger.info(f.__doc__.replace("\n    ", "\n"))
        logger.info('')
    logger.info('The strictnesses we have available:')
    logger.info('')
    for strictness in Strictness:
        allows = [s.name for s in STRICTNESS_MATRIX[strictness]]
        logger.info('%s: allows %s', strictness.name, allows)
        logger.info('')
    logger.info(
        'NOTE: When using --example-strictness, anything described above '
        'will be omitted from examples (since they are considered correct)')

tuple3_parser

tuple3_parser(arg: str) -> tuple[int, int, int]
Source code in medcat-v2/medcat/utils/regression/regression_checker.py
def tuple3_parser(arg: str) -> tuple[int, int, int]:
    parts = arg.strip("()").split(',')
    if len(parts) != 3:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")
    try:
        return (int(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")
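
A function like tuple3_parser is typically passed to argparse as a type= callable, so that a malformed value becomes a normal usage error. The sketch below repeats the function so that it runs on its own. The demo_parser object and the --edit-distance flag name are assumptions made for illustration; the actual argument definitions are not shown on this page.

```python
import argparse


def tuple3_parser(arg: str) -> tuple[int, int, int]:
    # Same logic as the function documented above.
    parts = arg.strip("()").split(',')
    if len(parts) != 3:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")
    try:
        return (int(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")


# Hypothetical parser: the real flag name and default may differ.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument('--edit-distance', type=tuple3_parser,
                         default=(0, 0, 0))

# Surrounding parentheses are optional; int() tolerates the inner spaces.
parsed = demo_parser.parse_args(['--edit-distance', '(1, 42, 5)'])
print(parsed.edit_distance)
```

On a shell command line the parenthesised form needs quoting, whereas the bare form 1,42,5 does not.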