medcat.utils.data_utils
Classes:
- TestTrainSplitter
Functions:
- get_false_positives – Get the false positives within a trainer export.
- make_mc_train_test – Make train set.
TestTrainSplitter
TestTrainSplitter(data: MedCATTrainerExport, cdb: CDB, test_size: float = 0.2)
Methods:
- split
Attributes:
- MAX_TEST_FRACTION
- MIN_CNT_FOR_TEST
- cdb
- data
- test_size
Source code in medcat-v2/medcat/utils/data_utils.py
MAX_TEST_FRACTION
class-attribute
instance-attribute
MAX_TEST_FRACTION = 0.3
MIN_CNT_FOR_TEST
class-attribute
instance-attribute
MIN_CNT_FOR_TEST = 10
cdb
instance-attribute
cdb = cdb
data
instance-attribute
data = data
test_size
instance-attribute
test_size = test_size
split
split() -> tuple[MedCATTrainerExport, MedCATTrainerExport, int, int]
Source code in medcat-v2/medcat/utils/data_utils.py
get_false_positives
get_false_positives(doc: MedCATTrainerExportDocument, spacy_doc: MutableDocument) -> list[MutableEntity]
Get the false positives within a trainer export.
Parameters:
- doc (MedCATTrainerExportDocument) – The trainer export.
- spacy_doc (MutableDocument) – The annotated document.
Returns:
- list[MutableEntity] – The list of false positive entities.
Source code in medcat-v2/medcat/utils/data_utils.py
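To make the idea of a false positive concrete, here is a minimal, self-contained sketch that does not import MedCAT. The `Span` class and the `false_positives` helper are hypothetical stand-ins, and the matching rule (exact `start`, `end` and `cui`) is an assumption. This page does not show which criterion `get_false_positives` actually applies, so the real function may match differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a gold annotation or a detected entity."""
    start: int
    end: int
    cui: str


def false_positives(gold: list[Span], predicted: list[Span]) -> list[Span]:
    """Return predicted spans that have no exact match among the gold spans.

    Assumption: two spans match only if start, end and cui are all equal.
    """
    gold_keys = {(s.start, s.end, s.cui) for s in gold}
    return [p for p in predicted if (p.start, p.end, p.cui) not in gold_keys]


# Made-up spans and concept identifiers, for illustration only.
gold = [Span(0, 8, "C0000001"), Span(20, 32, "C0000002")]
predicted = [Span(0, 8, "C0000001"), Span(40, 46, "C0000003")]
fps = false_positives(gold, predicted)
```

In this sketch the first predicted span agrees with a gold annotation, so only the second one is reported. With MedCAT itself, the equivalent call would pass one document from a trainer export and the annotated document to `get_false_positives(doc, spacy_doc)`.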
make_mc_train_test
make_mc_train_test(data: MedCATTrainerExport, cdb: CDB, test_size: float = 0.2) -> tuple
Make train set.
This is a disaster.
Parameters:
- data (MedCATTrainerExport) – The data.
- cdb (CDB) – The concept database.
- test_size (float, default: 0.2) – The test size. Defaults to 0.2.
Returns:
- tuple – The train set, the test set, the test annotations, and the total annotations.
Source code in medcat-v2/medcat/utils/data_utils.py
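A hedged usage sketch for the function documented above. It assumes the last two elements of the returned tuple are integer annotation counts, by analogy with the `int, int` in the `split()` signature. `summarise_split` and `split_and_report` are hypothetical helpers, not part of MedCAT. The import is deferred so the block loads even where MedCAT is not installed.

```python
from typing import Any


def summarise_split(result: tuple[Any, Any, int, int]) -> dict[str, float]:
    """Summarise a (train, test, test annotations, total annotations) tuple.

    Assumption: the last two elements are annotation counts.
    """
    _train, _test, n_test, n_total = result
    # Guard against an empty export to avoid dividing by zero.
    fraction = n_test / n_total if n_total else 0.0
    return {
        "test_annotations": n_test,
        "total_annotations": n_total,
        "test_fraction": fraction,
    }


def split_and_report(data, cdb, test_size: float = 0.2):
    """Call make_mc_train_test and attach a summary of the result."""
    # Imported lazily so this sketch can be loaded without MedCAT installed.
    from medcat.utils.data_utils import make_mc_train_test

    result = make_mc_train_test(data, cdb, test_size=test_size)
    return result, summarise_split(result)


# Placeholder values shaped like the documented return tuple (no real data).
example = summarise_split((None, None, 12, 80))
```

Comparing `test_fraction` with the requested `test_size` is a quick sanity check on a split. The two need not be equal, since this page does not describe how documents are assigned to the test set.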