medcat.utils.cdb_state
Functions:
- apply_cdb_state: Apply the specified state to the specified CDB.
- captured_state_cdb: A context manager that captures and re-applies the initial CDB state.
- copy_cdb_state: Creates a (deep) copy of the CDB state.
- in_memory_state_capture: Capture the CDB state in memory.
- load_and_apply_cdb_state: Delete current CDB state and apply CDB state from file.
- on_disk_memory_capture: Capture the CDB state in a temporary file.
- save_cdb_state: Saves CDB state in a file.
Attributes:
CDBState (module attribute)
CDBState = TypedDict('CDBState', {'name2info': dict[str, NameInfo], 'cui2info': dict[str, CUIInfo], 'token_counts': dict[str, int], '_subnames': set[str], 'config.meta': ModelMeta})
CDB State.
This is a dictionary of the parts of the CDB that change during (supervised) training. It can be used to capture the state of a CDB before modifying it and to restore that state afterwards.
Currently, the following fields are saved:
- name2info
- cui2info
- token_counts
- _subnames
- config.meta
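Because the key 'config.meta' is not a valid Python identifier, CDBState has to use the functional TypedDict syntax rather than the class syntax. The sketch below mirrors that shape under a stated simplification: the medcat types NameInfo, CUIInfo and ModelMeta are replaced with Any, and the name DemoCDBState is hypothetical.

```python
from typing import Any, TypedDict

# Simplified stand-in for CDBState: NameInfo, CUIInfo and ModelMeta are
# replaced with Any (an assumption made for illustration only).
# The functional syntax is required because 'config.meta' contains a dot.
DemoCDBState = TypedDict('DemoCDBState', {
    'name2info': dict[str, Any],
    'cui2info': dict[str, Any],
    'token_counts': dict[str, int],
    '_subnames': set[str],
    'config.meta': Any,
})

# At runtime a TypedDict value is a plain dict; the type only guides static checkers.
empty_state: DemoCDBState = {
    'name2info': {},
    'cui2info': {},
    'token_counts': {},
    '_subnames': set(),
    'config.meta': None,
}

# The declared keys can be inspected at runtime, e.g. to iterate over the state fields.
state_keys = list(DemoCDBState.__annotations__)
print(state_keys)
```

Iterating over the declared keys like this is one way a helper can stay in sync with the list of saved fields.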
apply_cdb_state
Apply the specified state to the specified CDB.
This overwrites the current state of the CDB with the one provided.
Parameters:
- cdb: The CDB to apply the state to.
- state (CDBState): The state to use.
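Applying a state amounts to writing each field back onto the CDB. The sketch below is not medcat's code: it assumes the dotted key 'config.meta' refers to the meta attribute of cdb.config, and the helper names and the SimpleNamespace stand-in CDB are hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def set_dotted_attr(obj: Any, path: str, value: Any) -> None:
    """Set a possibly dotted attribute path such as 'config.meta' on obj."""
    *parents, last = path.split('.')
    for name in parents:
        obj = getattr(obj, name)  # walk down to the object that owns the last attribute
    setattr(obj, last, value)


def apply_state_sketch(cdb: Any, state: dict[str, Any]) -> None:
    """Overwrite the CDB's fields with the values in state (hypothetical helper)."""
    for key, value in state.items():
        set_dotted_attr(cdb, key, value)  # no copy: the CDB now shares objects with state


# Hypothetical stand-in CDB holding only the fields listed in CDBState.
cdb = SimpleNamespace(
    name2info={}, cui2info={}, token_counts={}, _subnames=set(),
    config=SimpleNamespace(meta='before'),
)
new_state = {
    'name2info': {'kidney failure': 'C01'},
    'cui2info': {'C01': 'info'},
    'token_counts': {'kidney': 3},
    '_subnames': {'kidney'},
    'config.meta': 'after',
}
apply_state_sketch(cdb, new_state)
print(cdb.token_counts, cdb.config.meta)
```

Because this sketch does not copy, later changes to the CDB also show up in new_state; pass a copy if the state must stay untouched.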
Source code in medcat-v2/medcat/utils/cdb_state.py
captured_state_cdb
captured_state_cdb(cdb, save_state_to_disk: bool = False)
A context manager that captures and re-applies the initial CDB state.
The context manager captures (copies) the initial state of the CDB on entry. It then allows the user to modify the state (e.g. through training). On exit, it re-applies the initial CDB state.
If RAM is a concern, it is recommended to set save_state_to_disk=True.
Otherwise, the copy of the original state is held in memory.
If the state is saved on disk, a temporary file is used and removed afterwards.
Parameters:
- cdb: The CDB to use.
- save_state_to_disk (bool, default: False): Whether to save the state on disk or hold it in memory.
Yields:
- None
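The capture, modify, restore pattern can be sketched with contextlib.contextmanager. This is a simplified stand-in, not medcat's implementation: it uses pickle instead of dill, a SimpleNamespace in place of a real CDB, leaves out the 'config.meta' field, and restores the state in a finally clause so an exception inside the block also triggers the restore (the text above does not say whether medcat does this). The names captured_state_sketch and _restore are hypothetical.

```python
import contextlib
import copy
import os
import pickle
import tempfile
from types import SimpleNamespace
from typing import Any, Iterator

# Hypothetical subset of the CDBState fields ('config.meta' omitted for brevity).
STATE_KEYS = ('name2info', 'cui2info', 'token_counts', '_subnames')


def _restore(cdb: Any, saved: dict) -> None:
    for key in STATE_KEYS:
        setattr(cdb, key, saved[key])


@contextlib.contextmanager
def captured_state_sketch(cdb: Any, save_state_to_disk: bool = False) -> Iterator[None]:
    """Capture the CDB state on entry and re-apply it on exit (hypothetical sketch)."""
    if save_state_to_disk:
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, 'cdb_state.pkl')
            with open(path, 'wb') as f:
                # Serialising writes out the current values; no second copy stays in RAM.
                pickle.dump({k: getattr(cdb, k) for k in STATE_KEYS}, f)
            try:
                yield
            finally:
                with open(path, 'rb') as f:
                    _restore(cdb, pickle.load(f))
        # The temporary directory and file are deleted when the with-block ends.
    else:
        saved = {k: copy.deepcopy(getattr(cdb, k)) for k in STATE_KEYS}  # copy held in RAM
        try:
            yield
        finally:
            _restore(cdb, saved)


# Usage with a hypothetical stand-in CDB: simulated training is undone on exit.
cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={'kidney': 1}, _subnames=set())
for to_disk in (False, True):
    with captured_state_sketch(cdb, save_state_to_disk=to_disk):
        cdb.token_counts['kidney'] += 10  # in-place update, as training might do
        cdb._subnames.add('kid')
    print(to_disk, cdb.token_counts, cdb._subnames)
```

The two branches trade RAM for disk I/O: the in-memory branch keeps a full deep copy alive during the block, while the on-disk branch keeps only a file.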
Source code in medcat-v2/medcat/utils/cdb_state.py
copy_cdb_state
Creates a (deep) copy of the CDB state.
Grabs the fields that correspond to the state, creates deep copies, and returns the copies.
Parameters:
- cdb: The CDB from which to grab the state.
Returns:
- CDBState: The copied state.
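A deep copy matters because training can mutate the nested dictionaries and sets in place. A snapshot that merely references them would change along with the CDB. The sketch below illustrates the difference with copy.deepcopy; the stand-in CDB, the helper names and the dotted-path handling for 'config.meta' are assumptions, not medcat's code.

```python
import copy
from functools import reduce
from types import SimpleNamespace
from typing import Any

# Field names taken from the CDBState description.
STATE_KEYS = ('name2info', 'cui2info', 'token_counts', '_subnames', 'config.meta')


def get_dotted_attr(obj: Any, path: str) -> Any:
    """Resolve a possibly dotted path such as 'config.meta'."""
    return reduce(getattr, path.split('.'), obj)


def copy_state_sketch(cdb: Any) -> dict[str, Any]:
    """Return deep copies of the state fields (hypothetical helper)."""
    return {k: copy.deepcopy(get_dotted_attr(cdb, k)) for k in STATE_KEYS}


cdb = SimpleNamespace(
    name2info={'kidney failure': {'C01'}}, cui2info={'C01': {'count_train': 0}},
    token_counts={'kidney': 1}, _subnames={'kidney'},
    config=SimpleNamespace(meta={'trained': False}),
)
deep = copy_state_sketch(cdb)
referenced = {k: get_dotted_attr(cdb, k) for k in STATE_KEYS}  # references only, no copying

cdb.cui2info['C01']['count_train'] += 1  # simulated training step, mutating in place
cdb.config.meta['trained'] = True

print(deep['cui2info']['C01']['count_train'], referenced['cui2info']['C01']['count_train'])
```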
Source code in medcat-v2/medcat/utils/cdb_state.py
in_memory_state_capture
in_memory_state_capture(cdb)
Capture the CDB state in memory.
Parameters:
- cdb: The CDB to use.
Yields:
- None
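An in-memory capture can be sketched as a generator-based context manager that takes a deep copy before yielding and re-applies it afterwards. In this hypothetical sketch the restore sits in a finally block, so the state also comes back when the block raises. This is a design choice of the sketch, not documented medcat behaviour. The stand-in CDB and the omission of 'config.meta' are further assumptions.

```python
import contextlib
import copy
from types import SimpleNamespace
from typing import Any, Iterator

STATE_KEYS = ('name2info', 'cui2info', 'token_counts', '_subnames')  # 'config.meta' omitted here


@contextlib.contextmanager
def in_memory_capture_sketch(cdb: Any) -> Iterator[None]:
    """Hold a deep copy of the state in RAM and re-apply it on exit (hypothetical)."""
    saved = {k: copy.deepcopy(getattr(cdb, k)) for k in STATE_KEYS}
    try:
        yield
    finally:
        for key, value in saved.items():
            setattr(cdb, key, value)


# Usage: the state is restored even though the block fails halfway through.
cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={'kidney': 1}, _subnames=set())
try:
    with in_memory_capture_sketch(cdb):
        cdb.token_counts['kidney'] = 99
        raise RuntimeError('training failed midway')
except RuntimeError:
    pass
print(cdb.token_counts)
```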
Source code in medcat-v2/medcat/utils/cdb_state.py
load_and_apply_cdb_state
Delete current CDB state and apply CDB state from file.
This first deletes the current state of the CDB in order to save memory. The idea is that saving the state on disk reduces RAM usage, but that benefit would be lost if, upon loading, two instances of the state were held in memory at the same time.
Parameters:
- cdb: The CDB to apply the state to.
- file_path (str): The file the state was saved to.
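The delete-then-load order described above can be sketched as follows. The sketch uses the standard-library pickle in place of dill (dill extends pickle and offers the same dump/load calls). The stand-in CDB and helper names are hypothetical, and 'config.meta' is assumed to refer to cdb.config.meta.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace
from typing import Any

STATE_KEYS = ('name2info', 'cui2info', 'token_counts', '_subnames', 'config.meta')


def _owner_and_name(obj: Any, path: str) -> tuple[Any, str]:
    """Split 'config.meta' into (cdb.config, 'meta'); plain names map to (cdb, name)."""
    *parents, last = path.split('.')
    for name in parents:
        obj = getattr(obj, name)
    return obj, last


def load_and_apply_sketch(cdb: Any, file_path: str) -> None:
    # 1. Drop the CDB's references to the current state first, so the old and the
    #    loaded state need not coexist (objects are freed only if nothing else refers to them).
    for key in STATE_KEYS:
        owner, name = _owner_and_name(cdb, key)
        setattr(owner, name, None)
    # 2. Load the saved state and attach it to the CDB.
    with open(file_path, 'rb') as f:
        state = pickle.load(f)
    for key in STATE_KEYS:
        owner, name = _owner_and_name(cdb, key)
        setattr(owner, name, state[key])


cdb = SimpleNamespace(name2info={'a': 1}, cui2info={}, token_counts={'kidney': 5},
                      _subnames=set(), config=SimpleNamespace(meta='trained'))
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'state.pkl')
    with open(path, 'wb') as f:
        pickle.dump({'name2info': {}, 'cui2info': {}, 'token_counts': {'kidney': 1},
                     '_subnames': set(), 'config.meta': 'original'}, f)
    load_and_apply_sketch(cdb, path)
print(cdb.token_counts, cdb.config.meta)
```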
Source code in medcat-v2/medcat/utils/cdb_state.py
on_disk_memory_capture
on_disk_memory_capture(cdb)
Capture the CDB state in a temporary file.
Parameters:
- cdb: The CDB to use.
Yields:
- None
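On-disk capture combines saving and reloading with a temporary location that is cleaned up on exit. This sketch uses tempfile.TemporaryDirectory and pickle (in place of dill). The stand-in CDB, the file name and the helper name are hypothetical, and 'config.meta' is left out for brevity. Unlike the documented function, which yields None, the sketch yields the temporary file path so the cleanup can be observed.

```python
import contextlib
import os
import pickle
import tempfile
from types import SimpleNamespace
from typing import Any, Iterator

STATE_KEYS = ('name2info', 'cui2info', 'token_counts', '_subnames')  # 'config.meta' omitted


@contextlib.contextmanager
def on_disk_capture_sketch(cdb: Any) -> Iterator[str]:
    """Save the state to a temporary file, yield, then reload it (hypothetical)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'cdb_state.pkl')
        with open(path, 'wb') as f:
            pickle.dump({k: getattr(cdb, k) for k in STATE_KEYS}, f)
        try:
            yield path  # exposed only for this demo; the documented function yields None
        finally:
            with open(path, 'rb') as f:
                saved = pickle.load(f)
            for key in STATE_KEYS:
                setattr(cdb, key, saved[key])
    # Leaving TemporaryDirectory deletes the file and its directory.


cdb = SimpleNamespace(name2info={}, cui2info={}, token_counts={'kidney': 1}, _subnames=set())
with on_disk_capture_sketch(cdb) as state_path:
    existed_during = os.path.exists(state_path)
    cdb.token_counts['kidney'] = 42
print(existed_during, os.path.exists(state_path), cdb.token_counts)
```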
Source code in medcat-v2/medcat/utils/cdb_state.py
save_cdb_state
Saves CDB state in a file.
Currently uses dill.dump to save the relevant fields/values.
Parameters:
- cdb: The CDB from which to grab the state.
- file_path (str): The file to dump the state to.
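Saving amounts to collecting the state fields into a dictionary and serialising it. The documented implementation uses dill.dump. This sketch instead uses the standard-library pickle.dump, which takes the same (obj, file) arguments but cannot serialise some objects that dill handles, such as lambdas. The stand-in CDB, the helper name and the dotted-path handling for 'config.meta' are assumptions.

```python
import os
import pickle
import tempfile
from functools import reduce
from types import SimpleNamespace
from typing import Any

STATE_KEYS = ('name2info', 'cui2info', 'token_counts', '_subnames', 'config.meta')


def save_state_sketch(cdb: Any, file_path: str) -> None:
    """Write the state fields of cdb to file_path (hypothetical helper)."""
    # No deep copy is needed: serialising already writes out the current values.
    state = {k: reduce(getattr, k.split('.'), cdb) for k in STATE_KEYS}
    with open(file_path, 'wb') as f:
        pickle.dump(state, f)


cdb = SimpleNamespace(name2info={'kidney failure': 'C01'}, cui2info={},
                      token_counts={'kidney': 2}, _subnames={'kidney'},
                      config=SimpleNamespace(meta={'version': 'demo'}))
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cdb_state.pkl')
    save_state_sketch(cdb, path)
    with open(path, 'rb') as f:
        restored = pickle.load(f)
print(sorted(restored))
```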
Source code in medcat-v2/medcat/utils/cdb_state.py