medcat.tokenizing.spacy_impl.tokenizers
Classes:
- SpacyTokenizer
Functions:
- spacy_split_all
Attributes:
- logger
SpacyTokenizer
SpacyTokenizer(spacy_model_name: str, spacy_disabled_components: list[str], use_diacritics: bool, max_document_length: int, tokenizer_getter: Callable[[Language, bool], Tokenizer] = spacy_split_all, stopwords: Optional[set[str]] = None, avoid_pipe: bool = False)
Bases: BaseTokenizer
Methods:
- create_entity
- create_new_tokenizer
- entity_from_tokens
- get_doc_class
- get_entity_class
- load_internals_from
- save_internals_to
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
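The constructor signature above can be exercised roughly as follows. This is a minimal sketch, not the library's recommended setup: the model name, the disabled component names, and the document length limit are illustrative assumptions, and `make_tokenizer` is a hypothetical helper that only works when `medcat` and the named spaCy model are installed. `tokenizer_getter` is left out so that the documented default, `spacy_split_all`, applies.

```python
from typing import Any

# Keyword arguments named after the parameters in the documented signature.
# All values below are illustrative assumptions, not defaults taken from MedCAT.
tokenizer_kwargs: dict[str, Any] = {
    "spacy_model_name": "en_core_web_md",           # assumed: any installed spaCy pipeline
    "spacy_disabled_components": ["ner", "parser"],  # assumed component names
    "use_diacritics": False,
    "max_document_length": 1_000_000,                # assumed limit
    "stopwords": None,                               # documented default
    "avoid_pipe": False,                             # documented default
}


def make_tokenizer(kwargs: dict[str, Any]):
    """Hypothetical helper: build a SpacyTokenizer from keyword arguments.

    The import is deferred so that this module can be loaded even when
    medcat is not installed; calling the function still requires it.
    """
    from medcat.tokenizing.spacy_impl.tokenizers import SpacyTokenizer

    return SpacyTokenizer(**kwargs)
```

Calling `make_tokenizer(tokenizer_kwargs)` would then construct the tokenizer. Passing the arguments by keyword keeps the call readable given that the first four parameters have no defaults.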
create_entity
create_entity(doc: MutableDocument, token_start_index: int, token_end_index: int, label: str) -> MutableEntity
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
create_new_tokenizer
classmethod
create_new_tokenizer(config: Config) -> SpacyTokenizer
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
entity_from_tokens
entity_from_tokens(tokens: list[MutableToken]) -> MutableEntity
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
get_doc_class
get_doc_class() -> Type[MutableDocument]
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
get_entity_class
get_entity_class() -> Type[MutableEntity]
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
load_internals_from
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
save_internals_to
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py
spacy_split_all
spacy_split_all(nlp: Language, use_diacritics: bool) -> Tokenizer
Source code in medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py