medcat.utils.cdb_state

CDBState module-attribute

CDBState = TypedDict('CDBState', {'name2info': dict[str, NameInfo], 'cui2info': dict[str, CUIInfo], 'token_counts': dict[str, int], '_subnames': set[str], 'config.meta': ModelMeta})

CDB State.

This is a dictionary of the parts of the CDB that change during (supervised) training. It can be used to store and restore the state of a CDB after modifying it.

Currently, the following fields are saved:

  • name2info
  • cui2info
  • token_counts
  • _subnames
  • config.meta
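
The keys of the TypedDict double as the list of fields to capture, and one of them (config.meta) is a dotted path rather than a plain attribute name. The following is a minimal sketch of how such keys could be enumerated and resolved. ToyState, ToyCDB, ToyConfig and get_by_path are hypothetical stand-ins invented for illustration; the module's own private helper (_get_attr) is not shown on this page and may work differently.

```python
from typing import Any, TypedDict

# Hypothetical, simplified stand-in for CDBState (value types reduced to builtins).
ToyState = TypedDict('ToyState', {
    'name2info': dict[str, dict],
    'token_counts': dict[str, int],
    'config.meta': dict[str, Any],
})


def get_by_path(obj: Any, path: str) -> Any:
    """Resolve a dotted path such as 'config.meta' one attribute at a time."""
    for part in path.split('.'):
        obj = getattr(obj, part)
    return obj


class ToyConfig:
    def __init__(self) -> None:
        self.meta = {'description': 'toy model'}


class ToyCDB:
    def __init__(self) -> None:
        self.name2info = {'kidney~failure': {'count': 3}}
        self.token_counts = {'kidney': 5}
        self.config = ToyConfig()


cdb = ToyCDB()
# The functional TypedDict syntax allows keys that are not valid identifiers,
# and __annotations__ lists them in declaration order.
state_keys = list(ToyState.__annotations__)
snapshot = {key: get_by_path(cdb, key) for key in state_keys}
```

Driving the snapshot from the TypedDict keys means a field only has to be declared once to be captured.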

logger module-attribute

logger = getLogger(__name__)

apply_cdb_state

apply_cdb_state(cdb, state: CDBState) -> None

Apply the specified state to the specified CDB.

This overwrites the current state of the CDB with the one provided.

Parameters:

  • cdb

    The CDB to apply the state to.

  • state

    (CDBState) –

    The state to use.

Source code in medcat-v2/medcat/utils/cdb_state.py
def apply_cdb_state(cdb, state: CDBState) -> None:
    """Apply the specified state to the specified CDB.

    This overwrites the current state of the CDB with the one provided.

    Args:
        cdb: The CDB to apply the state to.
        state (CDBState): The state to use.
    """
    _clear_state(cdb)
    _reapply_state(cdb, state)

captured_state_cdb

captured_state_cdb(cdb, save_state_to_disk: bool = False)

A context manager that captures and re-applies the initial CDB state.

The context manager captures (copies) the initial state of the CDB on entry. The user can then modify the state (e.g. by training). On exit, it re-applies the initial CDB state.

If RAM is an issue, it is recommended to use save_state_to_disk. Otherwise the copy of the original state will be held in memory. If saved on disk, a temporary file is used and removed afterwards.

Parameters:

  • cdb

    The CDB to use.

  • save_state_to_disk

    (bool, default: False ) –

    Whether to save state on disk or hold in memory. Defaults to False.

Yields:

  • None

Source code in medcat-v2/medcat/utils/cdb_state.py
@contextlib.contextmanager
def captured_state_cdb(cdb, save_state_to_disk: bool = False):
    """A context manager that captures and re-applies the initial CDB state.

    The context manager captures/copies the initial state of the CDB when
    entering. It then allows the user to modify the state (e.g. training).
    Upon exit, it re-applies the initial CDB state.

    If RAM is an issue, it is recommended to use `save_state_to_disk`.
    Otherwise the copy of the original state will be held in memory.
    If saved on disk, a temporary file is used and removed afterwards.

    Args:
        cdb: The CDB to use.
        save_state_to_disk (bool): Whether to save state on disk or hold
            in memory. Defaults to False.

    Yields:
        None
    """
    if save_state_to_disk:
        with on_disk_memory_capture(cdb):
            yield
    else:
        with in_memory_state_capture(cdb):
            yield
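
The capture-and-restore behaviour described above can be illustrated without a real CDB. This is a minimal sketch of the pattern only: ToyCDB and toy_captured_state are hypothetical stand-ins that track a single field, whereas the real context manager captures every field listed in CDBState.

```python
import contextlib
from copy import deepcopy


class ToyCDB:
    """Hypothetical stand-in with a single piece of trainable state."""

    def __init__(self) -> None:
        self.token_counts = {'kidney': 5}


@contextlib.contextmanager
def toy_captured_state(cdb: ToyCDB):
    # Snapshot on entry, hand control to the caller, restore on normal exit.
    saved = deepcopy(cdb.token_counts)
    yield
    cdb.token_counts = saved


cdb = ToyCDB()
with toy_captured_state(cdb):
    cdb.token_counts['kidney'] += 10   # stands in for a training step
    cdb.token_counts['renal'] = 1
    counts_during = dict(cdb.token_counts)

counts_after = cdb.token_counts
```

Inside the block the modified counts are visible; after it, the object holds the values it had on entry.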

copy_cdb_state

copy_cdb_state(cdb) -> CDBState

Creates a (deep) copy of the CDB state.

Grabs the fields that correspond to the state, creates deep copies, and returns the copies.

Parameters:

  • cdb

    The CDB from which to grab the state.

Returns:

  • CDBState ( CDBState ) –

    The copied state.

Source code in medcat-v2/medcat/utils/cdb_state.py
def copy_cdb_state(cdb) -> CDBState:
    """Creates a (deep) copy of the CDB state.

    Grabs the fields that correspond to the state,
    creates deep copies, and returns the copies.

    Args:
        cdb: The CDB from which to grab the state.

    Returns:
        CDBState: The copied state.
    """
    return cast(CDBState, {
        k: deepcopy(_get_attr(cdb, k)) for k in CDBState.__annotations__
    })
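
The reason for a deep copy, rather than a dictionary of references, is that training mutates nested containers in place. A small sketch with made-up data (the field contents here are illustrative, not real NameInfo objects):

```python
from copy import deepcopy

name2info = {'kidney~failure': {'count_train': 3}}

# A snapshot that only stores references still points at the live objects.
by_reference = {'name2info': name2info}
# A deep copy duplicates the nested structure and is independent of it.
by_deepcopy = {'name2info': deepcopy(name2info)}

# Simulate training mutating a nested value in place.
name2info['kidney~failure']['count_train'] += 1

seen_by_reference = by_reference['name2info']['kidney~failure']['count_train']
seen_by_deepcopy = by_deepcopy['name2info']['kidney~failure']['count_train']
```

Only the deep copy still holds the pre-training value, which is what a later restore needs.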

in_memory_state_capture

in_memory_state_capture(cdb)

Capture the CDB state in memory.

Parameters:

  • cdb

    The CDB to use.

Yields:

  • None

Source code in medcat-v2/medcat/utils/cdb_state.py
@contextlib.contextmanager
def in_memory_state_capture(cdb):
    """Capture the CDB state in memory.

    Args:
        cdb: The CDB to use.

    Yields:
        None
    """
    state = copy_cdb_state(cdb)
    yield
    apply_cdb_state(cdb, state)
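
In the listing, the restore step follows a bare yield. With contextlib.contextmanager, an exception raised in the with body is re-raised at the yield, so code after it does not run. Read that way, the state would appear to stay modified if training fails. The sketch below contrasts that shape with a try/finally variant; both functions are hypothetical illustrations operating on a plain dict, and the variant is one possible hardening, not the library's behaviour.

```python
import contextlib
from copy import deepcopy


@contextlib.contextmanager
def restore_on_success(store: dict):
    # Same shape as the listing: snapshot, yield, restore.
    saved = deepcopy(store)
    yield
    store.clear()
    store.update(saved)


@contextlib.contextmanager
def restore_always(store: dict):
    # Hypothetical variant: try/finally restores even when the body raises.
    saved = deepcopy(store)
    try:
        yield
    finally:
        store.clear()
        store.update(saved)


def mutate_then_fail(manager, store: dict) -> dict:
    """Modify the store inside the context manager, then raise."""
    try:
        with manager(store):
            store['renal'] = 1
            raise RuntimeError('training failed')
    except RuntimeError:
        pass
    return store


left_modified = mutate_then_fail(restore_on_success, {'kidney': 5})
rolled_back = mutate_then_fail(restore_always, {'kidney': 5})
```

Callers who need a rollback on failure may want to wrap the training call accordingly or handle the exception themselves.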

load_and_apply_cdb_state

load_and_apply_cdb_state(cdb, file_path: str) -> None

Delete current CDB state and apply CDB state from file.

This first deletes the current state of the CDB in order to save memory. The point of saving the state on disk is to reduce RAM usage, and that benefit would be lost if both the current state and the loaded state were held in memory during loading.

Parameters:

  • cdb

    The CDB to apply the state to.

  • file_path

    (str) –

    The file the state has been saved to.

Source code in medcat-v2/medcat/utils/cdb_state.py
def load_and_apply_cdb_state(cdb, file_path: str) -> None:
    """Delete current CDB state and apply CDB state from file.

    This first deletes the current state of the CDB.
    This is to save memory. The idea is that saving the state
    on disk will save on RAM usage. But it wouldn't really
    work too well if upon load, two instances were still in
    memory.

    Args:
        cdb: The CDB to apply the state to.
        file_path (str): The file where the state has been saved to.
    """
    # clear existing data on CDB
    # this is so that we don't occupy the memory for both the loaded
    # and the on-CDB data
    logger.debug("Clearing CDB state in memory")
    _clear_state(cdb)
    logger.debug("Loading CDB state from disk from '%s'", file_path)
    with open(file_path, 'rb') as f:
        state: CDBState = dill.load(f)
    _reapply_state(cdb, state)
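
The ordering matters: the live state is dropped before the file is read, so the old and the loaded data are never resident at the same time. Below is a minimal sketch of that save / clear / load sequence. It uses the standard-library pickle module in place of dill so that it runs without extra dependencies, and ToyCDB, STATE_KEYS, save_state and load_and_apply are hypothetical names. How the real _clear_state empties each field is not shown on this page; setting fields to None is an assumption.

```python
import os
import pickle
import tempfile


class ToyCDB:
    """Hypothetical stand-in holding two state fields."""

    def __init__(self) -> None:
        self.token_counts = {'kidney': 5}
        self.subnames = {'kidney'}


STATE_KEYS = ('token_counts', 'subnames')  # hypothetical subset of fields


def save_state(cdb: ToyCDB, file_path: str) -> None:
    # Serialise the live objects directly; no intermediate deep copy.
    with open(file_path, 'wb') as f:
        pickle.dump({k: getattr(cdb, k) for k in STATE_KEYS}, f)


def load_and_apply(cdb: ToyCDB, file_path: str) -> None:
    # Drop the live state first so old and loaded data never coexist.
    for key in STATE_KEYS:
        setattr(cdb, key, None)
    with open(file_path, 'rb') as f:
        state = pickle.load(f)
    for key, value in state.items():
        setattr(cdb, key, value)


cdb = ToyCDB()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'state.pkl')
    save_state(cdb, path)
    cdb.token_counts['renal'] = 1      # simulated training change
    load_and_apply(cdb, path)

restored_counts = cdb.token_counts
restored_subnames = cdb.subnames
```

The trade-off is that between clearing and loading the object is briefly empty, so it should not be used by anything else during that window.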

on_disk_memory_capture

on_disk_memory_capture(cdb)

Capture the CDB state in a temporary file.

Parameters:

  • cdb

    The CDB to use.

Yields:

  • None

Source code in medcat-v2/medcat/utils/cdb_state.py
@contextlib.contextmanager
def on_disk_memory_capture(cdb):
    """Capture the CDB state in a temporary file.

    Args:
        cdb: The CDB to use

    Yields:
        None
    """
    # NOTE: using temporary directory so that it also works on Windows
    #       otherwise you can't reopen a temporary file in Windows (apparently)
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_file_name = os.path.join(temp_dir, "cdb_state.dat")
        save_cdb_state(cdb, temp_file_name)
        yield
        load_and_apply_cdb_state(cdb, temp_file_name)
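
The temporary-directory approach noted in the listing's comment can be sketched on its own: a regular file is created inside a TemporaryDirectory, closed, reopened by name, and removed together with the directory on exit. The state dict is made-up data, and pickle stands in for dill here.

```python
import os
import pickle
import tempfile

state = {'token_counts': {'kidney': 5}}

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, 'cdb_state.dat')
    with open(path, 'wb') as f:
        pickle.dump(state, f)
    existed_inside = os.path.exists(path)
    # The file was closed above, so it can be reopened by name.
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# Leaving the with block deletes the directory and everything in it.
exists_after = os.path.exists(path)
```

Because the directory is the managed resource, no explicit file cleanup is needed.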

save_cdb_state

save_cdb_state(cdb, file_path: str) -> None

Saves CDB state in a file.

Currently uses dill.dump to save the relevant fields/values.

Parameters:

  • cdb

    The CDB from which to grab the state.

  • file_path

    (str) –

    The file to dump the state to.

Source code in medcat-v2/medcat/utils/cdb_state.py
def save_cdb_state(cdb, file_path: str) -> None:
    """Saves CDB state in a file.

    Currently uses `dill.dump` to save the relevant fields/values.

    Args:
        cdb: The CDB from which to grab the state.
        file_path (str): The file to dump the state.
    """
    # NOTE: The difference is that we don't create a copy here.
    #       That is so that we don't have to occupy the memory for
    #       both copies
    the_dict = {
        k: _get_attr(cdb, k) for k in CDBState.__annotations__
    }
    logger.debug("Saving CDB state on disk at: '%s'", file_path)
    with open(file_path, 'wb') as f:
        dill.dump(the_dict, f)