medcat.utils.regression.regression_checker

Functions:

main –

Check test suite against the specified model pack.
show_description –
tuple3_parser –

Attributes:

DEFAULT_TEST_SUITE_PATH –
args –
logger –
parser –

DEFAULT_TEST_SUITE_PATH `module-attribute`

DEFAULT_TEST_SUITE_PATH = Path('configs', 'default_regression_tests.yml')

args `module-attribute`

args = parse_args()

logger `module-attribute`

logger = getLogger(__name__)

parser `module-attribute`

parser = ArgumentParser()

main

main(model_pack_dir: Path, test_suite_file: Path, phrases: bool = False, hide_empty: bool = False, examples_strictness_str: str = 'STRICTEST', jsonpath: Optional[Path] = None, overwrite: bool = False, jsonindent: Optional[int] = None, strictness_str: str = 'NORMAL', max_phrase_length: int = 80, use_mct_export: bool = False, mct_export_yaml_path: Optional[str] = None, only_mct_export_conversion: bool = False, only_describe: bool = False, require_fully_correct: bool = False, edit_distance: tuple[int, int, int] = (0, 0, 0)) -> None

Check test suite against the specified model pack.

Parameters:

model_pack_dir
(Path) –

The path to the model pack
test_suite_file
(Path) –

The path to the test suite YAML
phrases
(bool, default: False ) –

Whether to show per-phrase information in a report
hide_empty
(bool, default: False ) –

Whether to hide empty cases in a report
examples_strictness_str
(str, default: 'STRICTEST' ) –

The example strictness string. Defaults to STRICTEST. NOTE: If you set this to 'None', examples will be omitted.
jsonpath
(Optional[Path], default: None ) –

The json path to save the report to (if specified)
overwrite
(bool, default: False ) –

Whether to overwrite the file if it exists. Defaults to False
jsonindent
(int, default: None ) –

The indentation for JSON objects, passed to json.dumps when writing the report. Defaults to None.
strictness_str
(str, default: 'NORMAL' ) –

The strictness name. Defaults to NORMAL.
max_phrase_length
(int, default: 80 ) –

The maximum phrase length in examples. Defaults to 80.
use_mct_export
(bool, default: False ) –

Whether to use a MedCATtrainer export as input. Defaults to False.
mct_export_yaml_path
(str, default: None ) –

The (optional) path at which to save the converted MCT export as YAML. If not set (or None), the MCT export is not saved in YAML format. Defaults to None.
only_mct_export_conversion
(bool, default: False ) –

Whether to only deal with the MCT export conversion, i.e., exit when the MCT export conversion is done. Defaults to False.
only_describe
(bool, default: False ) –

Whether to only describe the finding options and exit. Defaults to False.
require_fully_correct
(bool, default: False ) –

Whether all cases are required to be correct. If set to True, an exit-status of 1 is returned unless all (sub)cases are correct. Defaults to False.
edit_distance
(tuple[int, int, int], default: (0, 0, 0) ) –

The edit distance, the random seed, and the number of edited names to pick for each of the names. If set to non-0, the specified number of splits, deletes, transposes, replaces, or inserts are applied to each name. This can be useful for looking at the capability of identifying typos in text. However, this can make the process a lot slower as a result. Defaults to (0, 0, 0).
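To make the three components of edit_distance concrete, here is a minimal sketch of how an (edit distance, random seed, number of picks) triple could drive typo generation. It is not MedCAT's implementation: the helper names single_edits and pick_edited_names are hypothetical, the alphabet is assumed to be lowercase ASCII, and "splits" are treated here simply as the cut points used to build the other edits.

```python
import random
import string


def single_edits(name: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Return every string exactly one edit away from `name`.

    Hypothetical helper for illustration only, not part of MedCAT.
    """
    # Every way of cutting the name into a left and a right part.
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + ch + right[1:]
                for left, right in splits if right for ch in alphabet}
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    return (deletes | transposes | replaces | inserts) - {name}


def pick_edited_names(name: str,
                      edit_distance: tuple[int, int, int]) -> list[str]:
    """Pick a seeded random sample of edited variants of `name`."""
    distance, seed, num_picks = edit_distance
    if distance == 0:
        # (0, 0, 0) means: leave the name untouched.
        return [name]
    candidates = {name}
    for _ in range(distance):
        candidates = {edited for cand in candidates
                      for edited in single_edits(cand)}
    candidates.discard(name)
    # Sort before sampling so the same seed always gives the same picks.
    ordered = sorted(candidates)
    rng = random.Random(seed)
    return rng.sample(ordered, min(num_picks, len(ordered)))


picks = pick_edited_names("kidney", (1, 42, 3))
print(picks)
```

The candidate set grows quickly with the distance, which is consistent with the warning above that non-zero values can slow the run considerably.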

Raises:

ValueError –

If the report file already exists and overwrite is False, or if the report file's parent directory does not exist.
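The two conditions that raise ValueError are both checked before any model or test suite is loaded. The sketch below mirrors that logic in a standalone helper so it can be tried without a model pack; check_report_path is a hypothetical name and the error messages are paraphrased, not the ones main emits.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_report_path(jsonpath: Optional[Path], overwrite: bool = False) -> None:
    """Hypothetical helper mirroring the jsonpath checks described above."""
    if jsonpath is None:
        # No JSON report requested, nothing to validate.
        return
    if jsonpath.exists() and not overwrite:
        raise ValueError(f"Refusing to overwrite existing file: {jsonpath}")
    if not jsonpath.parent.exists():
        raise ValueError(f"Parent directory not found: {jsonpath.parent}")


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.json"
    check_report_path(report)                  # passes: directory exists, file absent
    report.write_text("{}")
    try:
        check_report_path(report)              # file exists, overwrite is False
        refused_existing = False
    except ValueError:
        refused_existing = True
    check_report_path(report, overwrite=True)  # passes: overwrite allowed
    try:
        check_report_path(Path(tmp) / "missing_dir" / "report.json")
        refused_missing_dir = False
    except ValueError:
        refused_missing_dir = True
```

Failing early like this avoids spending time on the regression run only to discover that the report cannot be written.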

Source code in medcat-v2/medcat/utils/regression/regression_checker.py

def main(model_pack_dir: Path, test_suite_file: Path,
         phrases: bool = False, hide_empty: bool = False,
         examples_strictness_str: str = 'STRICTEST',
         jsonpath: Optional[Path] = None, overwrite: bool = False,
         jsonindent: Optional[int] = None,
         strictness_str: str = 'NORMAL',
         max_phrase_length: int = 80,
         use_mct_export: bool = False,
         mct_export_yaml_path: Optional[str] = None,
         only_mct_export_conversion: bool = False,
         only_describe: bool = False,
         require_fully_correct: bool = False,
         edit_distance: tuple[int, int, int] = (0, 0, 0)) -> None:
    """Check test suite against the specified model pack.

    Args:
        model_pack_dir (Path): The path to the model pack
        test_suite_file (Path): The path to the test suite YAML
        phrases (bool): Whether to show per-phrase information in a report
        hide_empty (bool): Whether to hide empty cases in a report
        examples_strictness_str (str): The example strictness string.
            Defaults to STRICTEST.
            NOTE: If you set this to 'None', examples will be omitted.
        jsonpath (Optional[Path]): The json path to save the report to
            (if specified)
        overwrite (bool): Whether to overwrite the file if it exists.
            Defaults to False
        jsonindent (int): The indentation for JSON objects. Defaults to None
        strictness_str (str): The strictness name. Defaults to NORMAL.
        max_phrase_length (int): The maximum phrase length in examples.
            Defaults to 80.
        use_mct_export (bool): Whether to use a MedCATtrainer export as input.
            Defaults to False.
        mct_export_yaml_path (str): The (optional) path at which to save
            the converted MCT export as YAML. If not set (or None),
            the MCT export is not saved in YAML format. Defaults to None.
        only_mct_export_conversion (bool): Whether to only deal with the
            MCT export conversion, i.e., exit when the MCT export
            conversion is done. Defaults to False.
        only_describe (bool): Whether to only describe the finding options
            and exit. Defaults to False.
        require_fully_correct (bool): Whether all cases are required to
            be correct. If set to True, an exit-status of 1 is returned
            unless all (sub)cases are correct. Defaults to False.
        edit_distance (tuple[int, int, int]): The edit distance, the random
            seed, and the number of edited names to pick for each of the
            names. If set to non-0, the specified number of splits, deletes,
            transposes, replaces, or inserts are applied to each name. This
            can be useful for looking at the capability of identifying typos
            in text. However, this can make the process a lot slower as a
            result. Defaults to (0, 0, 0).

    Raises:
        ValueError: If the report file exists and overwrite is False, or
            if its parent directory does not exist.
    """
    if only_describe:
        show_description()
        return
    if jsonpath and jsonpath.exists() and not overwrite:
        # check before doing anything so as to not waste time on the tests
        raise ValueError(
            f'Unable to write to existing file {str(jsonpath)}; pass '
            '--overwrite to overwrite the file')
    if jsonpath and not jsonpath.parent.exists():
        raise ValueError(
            'Need to specify a file in an existing directory, folder not '
            f'found: {str(jsonpath)}')
    logger.info('Loading RegressionChecker from yaml: %s', test_suite_file)
    if not use_mct_export:
        rc = RegressionSuite.from_yaml(str(test_suite_file))
    else:
        rc = RegressionSuite.from_mct_export(str(test_suite_file))
        if mct_export_yaml_path:
            logger.info('Writing MCT export in YAML to %s',
                        str(mct_export_yaml_path))
            with open(mct_export_yaml_path, 'w') as f:
                f.write(rc.to_yaml())
            if only_mct_export_conversion:
                logger.info("Done with conversion - exiting")
                return
    logger.info('Loading model pack from file: %s', model_pack_dir)
    cat: CAT = CAT.load_model_pack(str(model_pack_dir))
    logger.info('Checking the current status')
    res = rc.check_model(cat, TranslationLayer.from_CDB(cat.cdb),
                         edit_distance=edit_distance,
                         use_diacritics=cat.config.general.diacritics)
    strictness = Strictness[strictness_str]
    if examples_strictness_str in ("None", "N/A"):
        examples_strictness = None
    else:
        examples_strictness = Strictness[examples_strictness_str]
    if jsonpath:
        logger.info('Writing to %s', str(jsonpath))
        dumped = res.model_dump(strictness=examples_strictness)
        jsonpath.write_text(json.dumps(dumped, indent=jsonindent))
    else:
        logger.info(res.get_report(phrases_separately=phrases,
                    hide_empty=hide_empty,
                    examples_strictness=examples_strictness,
                    strictness=strictness, phrase_max_len=max_phrase_length))
    if require_fully_correct:
        total, success = res.calculate_report(
            phrases_separately=phrases, hide_empty=hide_empty,
            examples_strictness=examples_strictness, strictness=strictness,
            phrase_max_len=max_phrase_length)[:2]
        if total != success:
            exit(1)
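The listing above passes jsonindent straight to json.dumps, so its effect is that of the standard library's indent argument. The short sketch below shows the three typical cases; the report dictionary is made-up placeholder data, not the structure MedCAT actually produces.

```python
import json

# Placeholder data standing in for a dumped regression report.
report = {"suite": "demo", "cases": [{"name": "c1", "passed": True}]}

compact = json.dumps(report, indent=None)   # single line, the default
newline_only = json.dumps(report, indent=0) # one item per line, no leading spaces
pretty = json.dumps(report, indent=2)       # one item per line, 2-space nesting

print(compact)
print(newline_only)
print(pretty)
```

All three strings parse back to the same object; only the whitespace differs.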

show_description

show_description()

Source code in medcat-v2/medcat/utils/regression/regression_checker.py

def show_description():
    logger.info('The various findings and their descriptions:')
    logger.info('')
    logger.info('Class description:')
    logger.info('')
    logger.info(Finding.__doc__.replace("\n    ", "\n"))
    logger.info('')
    for f in Finding:
        logger.info('%s :', f.name)
        logger.info(f.__doc__.replace("\n    ", "\n"))
        logger.info('')
    logger.info('The strictnesses we have available:')
    logger.info('')
    for strictness in Strictness:
        allows = [s.name for s in STRICTNESS_MATRIX[strictness]]
        logger.info('%s: allows %s', strictness.name, allows)
        logger.info('')
    logger.info(
        'NOTE: When using --example-strictness, anything described above '
        'will be omitted from examples (since they are considered correct)')
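The replace("\n    ", "\n") calls in the listing above strip one level of four-space indentation from docstring continuation lines before logging them. A small sketch of that effect follows, using a made-up string in place of a real docstring, alongside the standard library's inspect.cleandoc for comparison.

```python
import inspect

# Made-up text shaped like an indented class docstring.
doc = ("Describes a finding.\n\n"
       "    Second paragraph, indented.\n"
       "    More detail.\n"
       "    ")

# Same transformation as in the listing: drop four spaces after each newline.
flattened = doc.replace("\n    ", "\n")

# cleandoc removes the common indentation and trailing blank lines.
cleaned = inspect.cleandoc(doc)

print(flattened)
print(cleaned)
```

The manual replace only handles exactly four spaces of indentation, whereas inspect.cleandoc works for any uniform indent.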

tuple3_parser

tuple3_parser(arg: str) -> tuple[int, int, int]

Source code in medcat-v2/medcat/utils/regression/regression_checker.py

def tuple3_parser(arg: str) -> tuple[int, int, int]:
    parts = arg.strip("()").split(',')
    if len(parts) != 3:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")
    try:
        return (int(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")
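As a usage sketch, tuple3_parser can serve as the type callback of an argparse argument. The function body below is copied from the listing so the example runs on its own; the parser object and the --edit-distance flag name are assumptions made for illustration and may not match the module's actual command-line options.

```python
import argparse


def tuple3_parser(arg: str) -> tuple[int, int, int]:
    # Copied from the listing above so this example is self-contained.
    parts = arg.strip("()").split(',')
    if len(parts) != 3:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")
    try:
        return (int(parts[0]), int(parts[1]), int(parts[2]))
    except ValueError:
        raise argparse.ArgumentTypeError("Tuple must be in the form (x, y, z)")


# Hypothetical parser and flag name, for illustration only.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument("--edit-distance", type=tuple3_parser,
                         default=(0, 0, 0))

ns = demo_parser.parse_args(["--edit-distance", "(1, 42, 5)"])
print(ns.edit_distance)

# Parentheses are optional and whitespace around numbers is tolerated,
# because int() ignores surrounding spaces.
print(tuple3_parser("1,2,3"))
```

When the value is malformed, argparse catches the ArgumentTypeError, prints a usage message, and exits, so callers of the command line never see a traceback.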
