medcat.model_creation.preprocess_umls
Classes:
-
UMLS–Pre-process UMLS release files:
Attributes:
-
all_vals– -
df– -
medcat_csv_mapper(dict) – -
pt2ch– -
random_4_keys– -
save_file– -
to_ICD10– -
to_ICD10_man– -
to_snomed– -
umls–
df
module-attribute
df = to_concept_df()
medcat_csv_mapper
module-attribute
medcat_csv_mapper: dict = {'CUI': 'cui', 'STR': 'name', 'SAB': 'ontologies', 'ISPREF': 'name_status', 'TUI': 'type_ids'}
pt2ch
module-attribute
pt2ch = get_pt2ch()
save_file
module-attribute
save_file = 'preprocessed_umls.csv'
to_ICD10
module-attribute
to_ICD10 = map_umls2icd10()
to_ICD10_man
module-attribute
to_ICD10_man = map_umls2source(sources=['ICD10'])
to_snomed
module-attribute
to_snomed = map_umls2snomed()
UMLS
Pre-process UMLS release files: Args: main_file_name (str): Path to the main file name (probably MRCONSO.RRF) sem_types_file (str): Path to the semantic types file name (probably MRSTY.RRF) allow_langugages (list): Languages to filter out. Defaults to just English (['ENG']). sep (str): The separator used within the files. Defaults to '|'.
Methods:
-
get_pt2ch–Generates a parent to children dict.
-
map_umls2icd10–Map to ICD-10.
-
map_umls2snomed–Map to SNOMED-CT.
-
map_umls2source–Allows mapping to an arbitrary
-
to_concept_df–Create a concept DataFrame.
Attributes:
-
allow_langugages– -
main_columns– -
main_file_name– -
mrhier_columns– -
sem_types_columns– -
sem_types_file– -
sep–
Source code in medcat-v2/medcat/model_creation/preprocess_umls.py
72 73 74 75 76 77 78 79 80 81 82 | |
allow_langugages
instance-attribute
allow_langugages = list(allow_languages) if allow_languages else allow_languages
main_file_name
instance-attribute
main_file_name = main_file_name
sem_types_file
instance-attribute
sem_types_file = sem_types_file
sep
instance-attribute
sep = sep
get_pt2ch
get_pt2ch() -> dict
Generates a parent to children dict.
It goes through all the < # TODO
The resulting dictionary maps a CUI to a list of CUIs that consider that CUI as their parent.
PS: This expects the MRHIER.RRF file to also exist in the same folder as the MRCONSO.RRF file.
Raises:
-
ValueError–If the MRHIER.RRF file wasn't found
Returns:
-
dict(dict) –The dictionary of parent CUI and their children.
Source code in medcat-v2/medcat/model_creation/preprocess_umls.py
199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 | |
map_umls2icd10
map_umls2icd10() -> DataFrame
Map to ICD-10.
Available SAB's that contain 'ICD10': - CCSR_ICD10CM - CCSR_ICD10CM (Clinical Classifications Software Refined for ICD-10-CM) - Synopsis - CCSR_ICD10PCS - CCSR_ICD10PCS (Clinical Classifications Software Refined for ICD-10-PCS) - Synopsis - DMDICD10 - DMDICD10 (ICD-10 German) - Statistics - ICD10AE - ICD10AE (ICD-10, American English Equivalents) - Synopsis - ICD10AMAE - ICD10AMAE (ICD-10, Australian Modification, Americanized English Equivalents) - Synopsis - ICD10AM - ICD10AM (ICD-10, Australian Modification) - Synopsis - ICD10DUT - ICD10DUT (ICD10, Dutch Translation) - Synopsis - ICD10PCS - ICD10PCS (ICD-10 Procedure Coding System) - Synopsis - ICD10 - ICD10 (International Classification of Diseases and Related Health Problems, Tenth Revision) - Synopsis - ICPC2ICD10DUT - ICPC2ICD10DUT (ICPC2-ICD10 Thesaurus, Dutch Translation) - Synopsis - ICPC2ICD10ENG - ICPC2ICD10ENG (ICPC2-ICD10 Thesaurus) - Synopsis - MTHICPC2ICD10AE - MTHICPC2ICD10AE (ICPC2E-ICD10 Thesaurus, American English Equivalents) - Synopsis
Currently only using 'ICD10'. But others may be relevant as well.
If one wants to use one of the other sources listed above, they would need to use the map_umls2source method.
Returns:
-
DataFrame–pd.DataFrame: DataFrame that has the ICD-10 codes
Source code in medcat-v2/medcat/model_creation/preprocess_umls.py
149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 | |
map_umls2snomed
map_umls2snomed() -> DataFrame
Map to SNOMED-CT.
Currently, uses the SCUI column. At the time of writing, this is equal to the CODE column. But this may not be the case in the future.
Returns:
-
DataFrame–pd.DataFrame: Dataframe that contains the SCUI (source CUI) as well as the UMLS CUI for each applicable concept
Source code in medcat-v2/medcat/model_creation/preprocess_umls.py
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 | |
map_umls2source
Allows mapping to an arbitrary
Parameters:
Returns:
-
DataFrame–pd.DataFrame: DataFrame that has the target source codes
Source code in medcat-v2/medcat/model_creation/preprocess_umls.py
176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 | |
to_concept_df
to_concept_df() -> DataFrame
Create a concept DataFrame. The default column names are expected.
Returns:
-
DataFrame–pd.DataFrame: The resulting DataFrame
Source code in medcat-v2/medcat/model_creation/preprocess_umls.py
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 | |