medcat.storage.serialisers
Classes:
- AvailableSerialisers – Describes the available serialisers.
- DillSerialiser – The dill based serialiser.
- Serialiser – The abstract serialiser base class.
Functions:
- deserialise – Deserialise the contents of a folder.
- get_serialiser – Get the serialiser based on the type specified.
- get_serialiser_from_folder – Get the serialiser that was used to serialise the data in the folder.
- get_serialiser_type_from_folder – Get the serialiser type that was used to serialise data in the folder.
- serialise – Serialise an object based on the specified serialiser type.
Attributes:
MANUAL_SERIALISED_RE
module-attribute
MANUAL_SERIALISED_RE = compile(escape(MANUAL_SERIALISED_TAG) + '(.*)')
MANUAL_SERIALISED_TAG
module-attribute
MANUAL_SERIALISED_TAG = 'MANUALLY_SERIALISED:'
SER_TYPE_FILE
module-attribute
SER_TYPE_FILE = '.serialised_by'
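The two tag constants work together: the regex matches the escaped tag at the start of a string and captures whatever follows it. The sketch below reproduces both constants verbatim and adds a hypothetical helper, `extract_manual_payload`, to show the match. That the captured text is a class path is an assumption, suggested by the `man_cls_path` parameter of `Serialiser.deserialise_manually`.

```python
import re
from typing import Optional

# Constants reproduced from the module attributes listed above.
MANUAL_SERIALISED_TAG = 'MANUALLY_SERIALISED:'
MANUAL_SERIALISED_RE = re.compile(re.escape(MANUAL_SERIALISED_TAG) + '(.*)')


def extract_manual_payload(text: str) -> Optional[str]:
    """Hypothetical helper: return what follows the tag, or None if untagged."""
    match = MANUAL_SERIALISED_RE.match(text)
    return match.group(1) if match else None


# The class path below is a made-up value for illustration only.
print(extract_manual_payload('MANUALLY_SERIALISED:my_pkg.module.MyClass'))
print(extract_manual_payload('raw_dict.dat'))
```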
AvailableSerialisers
Bases: Enum
Describes the available serialisers.
Methods:
- from_file
- write_to
from_file
classmethod
from_file(file_path: str) -> AvailableSerialisers
Source code in medcat-v2/medcat/storage/serialisers.py
write_to
write_to(file_path: str) -> None
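`write_to` and `from_file` let an `AvailableSerialisers` member be stored in, and recovered from, a file such as `SER_TYPE_FILE`. The following is a minimal sketch of that round trip using a hypothetical stand-in enum, `SerType`; the on-disk format (the member name as plain text) and the single `dill` member are assumptions, not taken from the module's source.

```python
import os
import tempfile
from enum import Enum


class SerType(Enum):
    """Hypothetical stand-in for AvailableSerialisers."""
    dill = 'dill'

    def write_to(self, file_path: str) -> None:
        # Assumed format: the member name as plain text.
        with open(file_path, 'w') as f:
            f.write(self.name)

    @classmethod
    def from_file(cls, file_path: str) -> 'SerType':
        # Look the member up by the name stored in the file.
        with open(file_path) as f:
            return cls[f.read().strip()]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, '.serialised_by')  # same name as SER_TYPE_FILE
    SerType.dill.write_to(path)
    restored = SerType.from_file(path)

print(restored is SerType.dill)
```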
DillSerialiser
Bases: Serialiser
The dill based serialiser.
Methods:
- deserialise
- serialise
Attributes:
- ser_type
deserialise
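`DillSerialiser` fills in the two abstract methods of `Serialiser`: `serialise(raw_parts, target_file)` and `deserialise(target_file)`. The sketch below mirrors that contract with hypothetical classes `SketchSerialiser` and `PickleSerialiser`. It uses the standard-library `pickle` module in place of `dill`, which exposes `dump`/`load` functions with the same calling convention; how the real class uses `dill` internally is an assumption here.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Any


class SketchSerialiser(ABC):
    """Hypothetical mini version of the Serialiser contract."""

    @abstractmethod
    def serialise(self, raw_parts: dict[str, Any], target_file: str) -> None: ...

    @abstractmethod
    def deserialise(self, target_file: str) -> dict[str, Any]: ...


class PickleSerialiser(SketchSerialiser):
    """Stand-in for DillSerialiser; pickle replaces dill in this sketch."""

    def serialise(self, raw_parts: dict[str, Any], target_file: str) -> None:
        with open(target_file, 'wb') as f:
            pickle.dump(raw_parts, f)

    def deserialise(self, target_file: str) -> dict[str, Any]:
        with open(target_file, 'rb') as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, 'raw_dict.dat')  # same name as RAW_FILE
    ser = PickleSerialiser()
    ser.serialise({'name': 'example', 'weights': [0.1, 0.2]}, target)
    loaded = ser.deserialise(target)

print(loaded)
```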
Serialiser
Bases: ABC
The abstract serialiser base class.
This class is responsible for both serialising and deserialising.
Methods:
- check_ser_type – Check that the folder contains data serialised by this serialiser.
- deserialise – Deserialise data written to the specified file.
- deserialise_all – Deserialise the contents of a folder.
- deserialise_manually
- get_manually_serialised_path
- get_ser_type_file
- save_ser_type_file – Save the serialiser type into the specified folder.
- serialise – Serialise the raw attributes / objects.
- serialise_all – Serialise the entire object into the target folder.
Attributes:
- RAW_FILE
- ser_type (AvailableSerialisers) – The serialiser type.
RAW_FILE
class-attribute
instance-attribute
RAW_FILE = 'raw_dict.dat'
check_ser_type
Check that the folder contains data serialised by this serialiser.
Parameters:
- folder (str) – Target folder.
Raises:
- TypeError – If the folder was not serialised by this serialiser.
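`save_ser_type_file` and `check_ser_type` together guard against loading a folder with the wrong serialiser: one records the type in the folder, the other compares it and raises `TypeError` on a mismatch. A minimal sketch, assuming the type is stored as plain text in `SER_TYPE_FILE`; the helpers `save_ser_type` and `check_ser_type` here are hypothetical free functions, not the library's methods.

```python
import os
import tempfile

SER_TYPE_FILE = '.serialised_by'  # module attribute documented on this page


def save_ser_type(folder: str, ser_type: str) -> None:
    """Hypothetical helper: record which serialiser wrote the folder."""
    with open(os.path.join(folder, SER_TYPE_FILE), 'w') as f:
        f.write(ser_type)


def check_ser_type(folder: str, expected: str) -> None:
    """Hypothetical helper: raise TypeError if another serialiser wrote the folder."""
    with open(os.path.join(folder, SER_TYPE_FILE)) as f:
        found = f.read().strip()
    if found != expected:
        raise TypeError(f'{folder!r} was serialised by {found!r}, not {expected!r}')


with tempfile.TemporaryDirectory() as folder:
    save_ser_type(folder, 'dill')
    check_ser_type(folder, 'dill')  # matching type: returns silently
    try:
        check_ser_type(folder, 'other')
        mismatch_raised = False
    except TypeError as err:
        mismatch_raised = True
        print(err)
```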
deserialise
abstractmethod
deserialise(target_file: str) -> dict[str, Any]
Deserialise data written to the specified file.
Parameters:
- target_file (str) – The file to read from.
Returns:
- dict[str, Any] – The deserialised data.
deserialise_all
deserialise_all(folder_path: str, ignore_folders_prefix: set[str] = set(), ignore_folders_suffix: set[str] = set(), **kwargs) -> Serialisable
Deserialise the contents of a folder.
Additional initialisation keyword arguments can be provided if needed.
This loads both the raw attributes of this object and, recursively, its serialisable parts (attributes).
Parameters:
- folder_path (str) – The folder path.
- ignore_folders_prefix (set[str], default: set()) – The prefixes of folders to ignore.
- ignore_folders_suffix (set[str], default: set()) – The suffixes of folders to ignore.
Returns:
- Serialisable – The resulting object.
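`ignore_folders_prefix` and `ignore_folders_suffix` let `deserialise_all` skip sub-folders by name while recursing. The sketch below shows one plausible filtering rule with a hypothetical helper, `subfolders_to_load`; treating any prefix or suffix match as a reason to skip is an assumption. It uses immutable tuple defaults rather than `set()`, which sidesteps Python's shared-mutable-default pitfall.

```python
import os
import tempfile
from typing import Iterable


def subfolders_to_load(folder_path: str,
                       ignore_folders_prefix: Iterable[str] = (),
                       ignore_folders_suffix: Iterable[str] = ()) -> list[str]:
    """Hypothetical helper: names of sub-folders that should be loaded."""
    prefixes = tuple(ignore_folders_prefix)
    suffixes = tuple(ignore_folders_suffix)
    kept = []
    for name in sorted(os.listdir(folder_path)):
        if not os.path.isdir(os.path.join(folder_path, name)):
            continue  # plain files (e.g. raw_dict.dat) are not parts
        # str.startswith/endswith accept a tuple; an empty tuple never matches.
        if name.startswith(prefixes) or name.endswith(suffixes):
            continue
        kept.append(name)
    return kept


with tempfile.TemporaryDirectory() as root:
    for name in ('alpha', 'beta', 'tmp_scratch', 'gamma_old'):
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, 'raw_dict.dat'), 'w').close()
    loaded = subfolders_to_load(root, {'tmp_'}, {'_old'})

print(loaded)  # ['alpha', 'beta']
```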
deserialise_manually
classmethod
deserialise_manually(folder_path: str, man_cls_path: str, **init_kwargs) -> Serialisable
get_manually_serialised_path
classmethod
get_ser_type_file
classmethod
save_ser_type_file
Save the serialiser type into the specified folder.
Parameters:
- folder (str) – The folder to use.
serialise
abstractmethod
Serialise the raw attributes / objects.
Parameters:
- raw_parts (dict[str, Any]) – The raw objects to serialise.
- target_file (str) – The file name to write to.
serialise_all
serialise_all(obj: Serialisable, target_folder: str, overwrite: bool = False) -> None
Serialise the entire object into the target folder.
This finds the serialisable parts (attributes) of the object and calls the same method on them recursively. It also finds the raw attributes (if any) and serialises them.
Parameters:
- obj (Serialisable) – The object to serialise.
- target_folder (str) – The target folder.
- overwrite (bool, default: False) – Whether to allow overwriting. Defaults to False.
Raises:
- IllegalSchemaException – If there are multiple parts with the same name or a file already exists.
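The recursion described for `serialise_all` (raw attributes written to one file, each serialisable part written to its own sub-folder) can be illustrated with a toy object model. In this sketch, `Node` is a hypothetical stand-in for `Serialisable` whose `Node`-valued attributes count as parts, `pickle` replaces the real serialiser, and `FileExistsError` stands in for `IllegalSchemaException`. The duplicate-name check is omitted because dictionary keys are already unique.

```python
import os
import pickle
import tempfile

RAW_FILE = 'raw_dict.dat'  # Serialiser.RAW_FILE documented on this page


class Node:
    """Hypothetical serialisable object: Node-valued attributes are parts."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def serialise_all(obj: Node, target_folder: str, overwrite: bool = False) -> None:
    os.makedirs(target_folder, exist_ok=True)
    raw = {k: v for k, v in vars(obj).items() if not isinstance(v, Node)}
    parts = {k: v for k, v in vars(obj).items() if isinstance(v, Node)}
    raw_path = os.path.join(target_folder, RAW_FILE)
    if os.path.exists(raw_path) and not overwrite:
        raise FileExistsError(raw_path)  # stand-in for IllegalSchemaException
    with open(raw_path, 'wb') as f:
        pickle.dump(raw, f)
    for name, part in parts.items():
        # Recurse: each part gets a sub-folder named after its attribute.
        serialise_all(part, os.path.join(target_folder, name), overwrite)


def deserialise_all(folder_path: str) -> Node:
    with open(os.path.join(folder_path, RAW_FILE), 'rb') as f:
        attrs = pickle.load(f)
    for name in os.listdir(folder_path):
        sub = os.path.join(folder_path, name)
        if os.path.isdir(sub):
            attrs[name] = deserialise_all(sub)  # rebuild each part recursively
    return Node(**attrs)


model = Node(version=2, vocab=Node(words=['a', 'b']))
with tempfile.TemporaryDirectory() as folder:
    serialise_all(model, folder)
    restored = deserialise_all(folder)
    try:
        serialise_all(model, folder)  # second write without overwrite=True
        refused = False
    except FileExistsError:
        refused = True

print(restored.version, restored.vocab.words, refused)
```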
deserialise
deserialise(folder_path: str, ignore_folders_prefix: set[str] = set(), ignore_folders_suffix: set[str] = set(), **init_kwargs) -> Serialisable
Deserialise contents of a folder.
Extra init keyword arguments can be provided if needed. These are generally:
- cnf: The config relevant to the components
- tokenizer (BaseTokenizer): The base tokenizer for the model
- cdb (CDB): The CDB for the model
- vocab (Vocab): The Vocab for the model
- model_load_path (Optional[str]): The model load path, but not the component load path
This method finds the serialiser to be used based on the files on disk.
Parameters:
- folder_path (str) – The folder to deserialise.
- ignore_folders_prefix (set[str], default: set()) – The prefixes of folders to ignore.
- ignore_folders_suffix (set[str], default: set()) – The suffixes of folders to ignore.
Returns:
- Serialisable – The deserialised object.
get_serialiser
get_serialiser(serialiser_type: Union[str, AvailableSerialisers] = _DEF_SER) -> Serialiser
Get the serialiser based on the type specified.
Parameters:
- serialiser_type (Union[str, AvailableSerialisers], default: _DEF_SER) – The required type. Defaults to 'dill'.
Raises:
- ValueError – If no serialiser is found.
Returns:
- Serialiser – The appropriate serialiser.
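`get_serialiser` accepts either a string or an enum member and raises `ValueError` when nothing matches. The following is a minimal sketch of such a lookup; the enum `SerType`, the placeholder class `DillLike`, the `_REGISTRY` mapping and the local `_DEF_SER` are all hypothetical, and resolving strings by member name is an assumption.

```python
from enum import Enum
from typing import Union


class SerType(Enum):
    """Hypothetical stand-in for AvailableSerialisers."""
    dill = 'dill'


class DillLike:
    """Hypothetical placeholder for a concrete serialiser class."""


_REGISTRY = {SerType.dill: DillLike}  # hypothetical type -> class mapping
_DEF_SER = SerType.dill               # the documented default type is 'dill'


def get_serialiser(serialiser_type: Union[str, SerType] = _DEF_SER) -> object:
    if isinstance(serialiser_type, str):
        try:
            serialiser_type = SerType[serialiser_type]  # look up by member name
        except KeyError:
            raise ValueError(f'Unknown serialiser type: {serialiser_type!r}') from None
    if serialiser_type not in _REGISTRY:
        raise ValueError(f'No serialiser registered for {serialiser_type}')
    return _REGISTRY[serialiser_type]()


default_ser = get_serialiser()
named_ser = get_serialiser('dill')
try:
    get_serialiser('json')
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```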
get_serialiser_from_folder
get_serialiser_from_folder(folder_path: str) -> Serialiser
Get the serialiser that was used to serialise the data in the folder.
Parameters:
- folder_path (str) – The folder in question.
Returns:
- Serialiser – The appropriate serialiser.
get_serialiser_type_from_folder
get_serialiser_type_from_folder(folder_path: str) -> AvailableSerialisers
Get the serialiser type that was used to serialise data in the folder.
Parameters:
- folder_path (str) – The folder in question.
Returns:
- AvailableSerialisers – The serialiser type.
serialise
serialise(serialiser_type: Union[str, AvailableSerialisers], obj: Serialisable, target_folder: str, overwrite: bool = False) -> None
Serialise an object based on the specified serialiser type.
Parameters:
- serialiser_type (Union[str, AvailableSerialisers]) – The serialiser type.
- obj (Serialisable) – The object to serialise.
- target_folder (str) – The folder to serialise into.
- overwrite (bool, default: False) – Whether to allow overwriting. Defaults to False.
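Taken together, `serialise`, `get_serialiser_type_from_folder` and `deserialise` form a round trip in which the folder itself records which serialiser to use. The sketch below compresses that flow into three hypothetical functions with matching names. It uses flat dictionaries instead of `Serialisable` objects and `pickle` in place of `dill`, and it merges `init_kwargs` into the result as a simplification of passing them to a constructor.

```python
import os
import pickle
import tempfile
from enum import Enum
from typing import Any, Union

SER_TYPE_FILE = '.serialised_by'  # module attribute documented on this page
RAW_FILE = 'raw_dict.dat'         # Serialiser.RAW_FILE documented on this page


class SerType(Enum):
    """Hypothetical stand-in for AvailableSerialisers."""
    dill = 'dill'


# Hypothetical dispatch tables; pickle stands in for dill.
_DUMPERS = {SerType.dill: pickle.dump}
_LOADERS = {SerType.dill: pickle.load}


def serialise(ser_type: Union[str, SerType], obj: dict, target_folder: str,
              overwrite: bool = False) -> None:
    if isinstance(ser_type, str):
        ser_type = SerType[ser_type]
    os.makedirs(target_folder, exist_ok=True)
    raw_path = os.path.join(target_folder, RAW_FILE)
    if os.path.exists(raw_path) and not overwrite:
        raise FileExistsError(raw_path)
    # Record which serialiser wrote the folder, then write the data itself.
    with open(os.path.join(target_folder, SER_TYPE_FILE), 'w') as f:
        f.write(ser_type.name)
    with open(raw_path, 'wb') as f:
        _DUMPERS[ser_type](obj, f)


def get_serialiser_type_from_folder(folder_path: str) -> SerType:
    with open(os.path.join(folder_path, SER_TYPE_FILE)) as f:
        return SerType[f.read().strip()]


def deserialise(folder_path: str, **init_kwargs: Any) -> dict:
    # The caller never names a serialiser: the folder says which one to use.
    loader = _LOADERS[get_serialiser_type_from_folder(folder_path)]
    with open(os.path.join(folder_path, RAW_FILE), 'rb') as f:
        obj = loader(f)
    obj.update(init_kwargs)  # simplification: merge extra init kwargs into the result
    return obj


with tempfile.TemporaryDirectory() as folder:
    serialise('dill', {'version': 2}, folder)
    detected = get_serialiser_type_from_folder(folder)
    loaded = deserialise(folder, model_load_path='/models/example')

print(detected, loaded)
```

Because the type is read back from disk, adding another serialiser in this sketch only means extending the enum and the two lookup tables; callers of `deserialise` stay unchanged.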