medcat.utils.hasher

Classes:

  • Hasher

    A consistent hasher.

Functions:

  • dumps

    Dump the content of an object to bytes.

Hasher

Hasher(dumper: Callable[[Any, bool], bytes] = dumps)

A consistent hasher.

This class hashes the same object(s) to the same value every time. This is in contrast to Python's built-in hash(), which does not guarantee identical results across runs (for example, string hashes are randomized per process by default).
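The idea can be illustrated with a minimal standard-library sketch. It is not the MedCAT implementation: `StableHasher` and `digest_of` are hypothetical names, and the sketch swaps xxhash for `hashlib.blake2b` and dill for `pickle` only so that it runs without extra dependencies.

```python
import hashlib
import pickle
from typing import Any


class StableHasher:
    """Hypothetical stand-in for Hasher (hashlib + pickle, not xxhash + dill)."""

    def __init__(self) -> None:
        self.m = hashlib.blake2b(digest_size=8)

    def update(self, obj: Any) -> None:
        # Serialize the object to bytes, then feed the bytes to the hash state.
        # The pickle protocol is pinned so the bytes do not depend on defaults.
        self.m.update(pickle.dumps(obj, protocol=4))

    def hexdigest(self) -> str:
        return self.m.hexdigest()


def digest_of(*objs: Any) -> str:
    """Hash the given objects, in order, with a fresh hasher."""
    hasher = StableHasher()
    for obj in objs:
        hasher.update(obj)
    return hasher.hexdigest()


first = digest_of("concept", [1, 2, 3])
second = digest_of("concept", [1, 2, 3])
different = digest_of("concept", [1, 2, 4])

print(first == second)     # independent hashers agree on equal input
print(first == different)  # changed input gives a different digest
```

Two independent hashers that receive the same objects in the same order agree, because the digest depends only on the serialized bytes and not on per-process state.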

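Parameters:

  • dumper

    (Callable[[Any, bool], bytes], default: dumps ) –

    The function used to convert an object to bytes before it is hashed. Defaults to dumps.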

Methods:

  • hexdigest

    Get the hex for the current hash state.

  • update

    Update the hasher with the object in question.

  • update_bytes

    Update the hasher with a byte array.

Attributes:

  • m

Source code in medcat-v2/medcat/utils/hasher.py
def __init__(self, dumper: Callable[[Any, bool], bytes] = dumps):
    self.m = xxhash.xxh64()
    self._dumper = dumper

m instance-attribute

m = xxh64()

hexdigest

hexdigest() -> str

Get the hex for the current hash state.

Returns:

  • str ( str ) –

    The hex representation of the hashed objects.

Source code in medcat-v2/medcat/utils/hasher.py
def hexdigest(self) -> str:
    """Get the hex for the current hash state.

    Returns:
        str: The hex representation of the hashed objects.
    """
    return self.m.hexdigest()

update

update(obj: Any, length: bool = False) -> None

Update the hasher with the object in question.

If length = True is passed, only the length of the byte array corresponding to the data is considered. Otherwise the entire byte array is used.

Parameters:

  • obj

    (Any) –

    The object to be added / hashed.

  • length

    (bool, default: False ) –

    Whether to hash only the length of the serialized byte array. Defaults to False.
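A short sketch of what the length flag implies for the resulting digest. This is an assumption-laden illustration, not the library code: `to_bytes` and `digest` are hypothetical helpers, `pickle` stands in for dill, and `hashlib.blake2b` stands in for xxhash.

```python
import hashlib
import pickle
from typing import Any


def to_bytes(obj: Any, length: bool = False) -> bytes:
    """Hypothetical dumper: pickle stands in for dill."""
    data = pickle.dumps(obj, protocol=4)
    if length:
        # Only the size of the serialized form is kept.
        return pickle.dumps(len(data), protocol=4)
    return data


def digest(obj: Any, length: bool = False) -> str:
    m = hashlib.blake2b(digest_size=8)  # stand-in for xxhash.xxh64()
    m.update(to_bytes(obj, length))
    return m.hexdigest()


full_a = digest("abc")
full_b = digest("xyz")
len_a = digest("abc", length=True)
len_b = digest("xyz", length=True)

print(full_a == full_b)  # content differs, so the full digests differ
print(len_a == len_b)    # same serialized size, so the length digests match
```

The trade-off is visible here: length mode hashes very little data, but distinct objects whose serialized forms have the same size produce the same digest.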

Source code in medcat-v2/medcat/utils/hasher.py
def update(self, obj: Any, length: bool = False) -> None:
    """Update the hasher with the object in question.

    If `length = True` is passed, only the length of
    the byte array corresponding to the data is considered.
    Otherwise the entire byte array is used.

    Args:
        obj (Any): The object to be added / hashed.
        length (bool, optional):
            Whether to hash only the length of the serialized
            byte array. Defaults to False.
    """
    self.m.update(self._dumper(obj, length))

update_bytes

update_bytes(b: bytes) -> None

Update the hasher with a byte array.

Parameters:

  • b

    (bytes) –

    The byte array to update with.
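As the source below shows, update_bytes passes the bytes straight to the underlying hash state without calling the dumper. The following sketch illustrates two consequences using `hashlib.blake2b` as a stand-in for xxhash (an assumption made so the example needs no third-party packages; xxhash objects expose the same hashlib-style `update`): feeding data in chunks is equivalent to feeding it at once, and raw bytes are not the same as the serialized form of those bytes.

```python
import hashlib
import pickle

raw = b"model-pack-v1"  # arbitrary illustrative payload

# Hash the payload in one call.
whole = hashlib.blake2b(digest_size=8)
whole.update(raw)

# Hash the same payload in three chunks.
chunked = hashlib.blake2b(digest_size=8)
for chunk in (b"model-", b"pack-", b"v1"):
    chunked.update(chunk)

same_digest = whole.hexdigest() == chunked.hexdigest()

# Serializing a bytes object adds framing, so hashing the serialized
# form (what an object-based update would see) differs from hashing raw bytes.
serialized = pickle.dumps(raw, protocol=4)
via_serialization = hashlib.blake2b(serialized, digest_size=8).hexdigest()

print(same_digest)
print(via_serialization == whole.hexdigest())
```

In practice this suggests using update_bytes for data that is already in byte form, such as file contents read in chunks, and update for arbitrary Python objects.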

Source code in medcat-v2/medcat/utils/hasher.py
def update_bytes(self, b: bytes) -> None:
    """Update the hasher with a byte array.

    Args:
        b (bytes): The byte array to update with.
    """
    self.m.update(b)

dumps

dumps(obj: Any, length: bool = False) -> bytes

Dump the content of an object to bytes.

This function uses dill to dump the contents of an object into a BytesIO buffer. It then either returns the buffer's bytes or, if length == True, reruns the process on the length of the byte array.

Parameters:

  • obj

    (Any) –

    The object to dump.

  • length

    (bool, default: False ) –

    Whether to dump only the length of the serialized byte array. Defaults to False.

Returns:

  • bytes ( bytes ) –

    The resulting byte array.
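The control flow described above can be sketched with the standard library alone. `dumps_sketch` is a hypothetical analogue that uses `pickle` in place of dill, so the exact bytes will differ from what the real function produces; only the structure is meant to match.

```python
import pickle
from io import BytesIO
from typing import Any


def dumps_sketch(obj: Any, length: bool = False) -> bytes:
    """Hypothetical pickle-based analogue of dumps (the original uses dill)."""
    with BytesIO() as file:
        pickle.dump(obj, file, protocol=4)
        data = file.getvalue()
    if length:
        # Re-run the dump on the integer length instead of the content.
        return dumps_sketch(len(data), length=False)
    return data


payload = {"name": "example", "values": list(range(1000))}
full = dumps_sketch(payload)
short = dumps_sketch(payload, length=True)

print(len(full), len(short))
print(pickle.loads(short) == len(full))
```

With length=True the returned bytes are the serialization of an integer, so they stay small regardless of how large the original object is.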

Source code in medcat-v2/medcat/utils/hasher.py
def dumps(obj: Any, length: bool = False) -> bytes:
    """Dump the content of an object to bytes.

    This function uses dill to dump the contents of an
    object into a BytesIO buffer. It then either returns
    the buffer's bytes or, if length == True, reruns
    the process on the length of the byte array.

    Args:
        obj (Any): The object to dump.
        length (bool, optional):
            Whether to dump only the length of the serialized
            byte array. Defaults to False.

    Returns:
        bytes: The resulting byte array.
    """
    with BytesIO() as file:
        dill.dump(obj, file, recurse=True)
        if length:
            return dumps(len(file.getvalue()), length=False)
        else:
            return file.getvalue()