deduplicationdict

Package Contents

Classes

DeDuplicationDict

A dictionary that de-duplicates values.

Attributes

deduplicationdict.__package__ = 'deduplicationdict'
deduplicationdict.__author__ = 'Vivswan Shah (vivswanshah@pitt.edu)'
deduplicationdict.__version__
class deduplicationdict.DeDuplicationDict(*args, _value_dict: dict = None, **kwargs)

Bases: collections.abc.MutableMapping

A dictionary that de-duplicates values.

A dictionary-like class that deduplicates values by storing them in a separate dictionary and replacing them with their corresponding hash values. This class is particularly useful for large dictionaries with repetitive entries, as it can save memory by storing values only once and substituting recurring values with their hash representations.

This class supports nested structures by automatically converting nested dictionaries into DeDuplicationDict instances. It also provides various conversion methods to convert between regular dictionaries and DeDuplicationDict instances.
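The storage scheme described above can be illustrated with two plain dictionaries. This is a minimal sketch and not the library's implementation: the helper names (`value_hash`, `store`), the variable names, the use of SHA-256 over a pickled value, and the prefix length of 8 are all assumptions made for the example.

```python
import hashlib
import pickle

HASH_LENGTH = 8  # assumed digest prefix length, for illustration only


def value_hash(value) -> str:
    """Return a short, deterministic identifier for a picklable value."""
    return hashlib.sha256(pickle.dumps(value)).hexdigest()[:HASH_LENGTH]


def store(keys_to_hashes: dict, hashes_to_values: dict, key, value) -> None:
    """Record key -> hash and keep one copy of the value under that hash."""
    if isinstance(value, dict):
        # A nested dictionary gets its own key map but shares the value store.
        nested = {}
        for k, v in value.items():
            store(nested, hashes_to_values, k, v)
        keys_to_hashes[key] = nested
        return
    h = value_hash(value)
    keys_to_hashes[key] = h
    hashes_to_values.setdefault(h, value)  # only the first copy is kept


keys_to_hashes, hashes_to_values = {}, {}
store(keys_to_hashes, hashes_to_values, "a", [1, 2, 3])
store(keys_to_hashes, hashes_to_values, "b", [1, 2, 3])
store(keys_to_hashes, hashes_to_values, "c", {"d": [1, 2, 3], "e": "text"})

# Four stored entries, but only two distinct values are retained.
print(len(hashes_to_values))
```

The memory saving comes from the last line of `store`: a value that hashes to an identifier already present is not stored a second time.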

Variables
  • hash_length (int) – The length of the hash value used for deduplication.

  • auto_clean_up (bool) – Whether to automatically clean up unused hash values when deleting items.

  • skip_update_on_setitem (bool) – Whether to skip updating the value dictionary when setting an item.

  • key_dict (dict) – A dictionary that maps keys to the hash values of their corresponding values.

  • value_dict (dict) – A dictionary that maps hash values to the values they represent.

_set_value_dict(value_dict: dict, skip_update: bool = False) -> DeDuplicationDict

Update the value dictionary and propagate the changes to nested DeDuplicationDict instances.

Parameters
  • value_dict (dict) – The new value dictionary to use for deduplication.

  • skip_update (bool) – Whether to skip updating the value dictionary of nested DeDuplicationDict instances.

Returns

self

Return type

DeDuplicationDict

__setitem__(key: KT, value: VT) -> None

Set the value for the given key, deduplicating the value if necessary.

Parameters
  • key (KT) – The key to set the value for.

  • value (VT) – The value to set for the given key.

__getitem__(key: KT) -> VT_co

Get the value for the given key.

Parameters

key (KT) – The key to get the value for.

Returns

The value for the given key.

Return type

VT_co

Raises
  • KeyError – If the key is not found in the dictionary.

  • TypeError – If the value type is not supported.
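A lookup in a hash-deduplicated mapping is a two-step resolution: key to hash, then hash to value. The sketch below shows that idea with plain dictionaries. The function name `lookup`, the variable names, and the placeholder hash strings are hypothetical, and returning the raw nested map is a simplification made for this example.

```python
# Hypothetical contents: "h1" and "h2" stand in for real hash strings.
hashes_to_values = {"h1": [1, 2, 3], "h2": "text"}
keys_to_hashes = {"a": "h1", "b": "h1", "c": {"d": "h2"}}


def lookup(keys_to_hashes: dict, hashes_to_values: dict, key):
    """Resolve key -> hash -> value; unknown keys raise KeyError."""
    entry = keys_to_hashes[key]  # KeyError propagates for a missing key
    if isinstance(entry, dict):
        # Nested level: in this sketch the nested key map is returned as-is.
        return entry
    return hashes_to_values[entry]


first = lookup(keys_to_hashes, hashes_to_values, "a")
second = lookup(keys_to_hashes, hashes_to_values, "b")
print(first is second)  # both keys resolve to the same stored object
```

Because both keys resolve to one stored object, mutating a returned value in place would be visible through every key that shares it in this sketch.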

all_hashes_in_use() -> set

Get all hash values currently in use.

Returns

A set of all hash values in use.

Return type

set

clean_up() -> DeDuplicationDict

Remove unused hash values from the value dictionary.

Returns

self

Return type

DeDuplicationDict
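Collecting the hashes in use and then pruning the rest can be sketched on plain dictionaries as follows. This only illustrates the idea; the standalone functions, the variable names, and the placeholder hash strings are assumptions and do not reproduce the class's methods.

```python
def hashes_in_use(keys_to_hashes: dict) -> set:
    """Collect every hash referenced at this level and in nested levels."""
    used = set()
    for entry in keys_to_hashes.values():
        if isinstance(entry, dict):
            used |= hashes_in_use(entry)
        else:
            used.add(entry)
    return used


def clean_up(keys_to_hashes: dict, hashes_to_values: dict) -> None:
    """Delete stored values that no key refers to any more."""
    used = hashes_in_use(keys_to_hashes)
    for h in list(hashes_to_values):  # copy the keys: we delete while looping
        if h not in used:
            del hashes_to_values[h]


keys_to_hashes = {"a": "h1", "nested": {"b": "h2"}}
hashes_to_values = {"h1": 1, "h2": 2, "h3": 3}

del keys_to_hashes["a"]  # "h1" is now unreferenced; "h3" never was
clean_up(keys_to_hashes, hashes_to_values)
print(sorted(hashes_to_values))  # ['h2']
```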

detach() -> DeDuplicationDict

Detach the DeDuplicationDict instance from its value dictionary, creating a standalone instance.

Returns

A new DeDuplicationDict instance with its own value dictionary.

Return type

DeDuplicationDict
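One way to picture detaching is to copy out only the stored values that a given level still references, so that later changes to the shared store no longer affect it. The sketch below makes that assumption explicit. The function name `detach_maps` and the plain-dictionary representation are hypothetical, and the real method may behave differently.

```python
import copy


def detach_maps(keys_to_hashes: dict, shared_values: dict):
    """Return an independent (key map, value store) pair for one level."""
    own_keys = copy.deepcopy(keys_to_hashes)
    own_values = {}
    pending = [own_keys]
    while pending:
        for entry in pending.pop().values():
            if isinstance(entry, dict):
                pending.append(entry)  # walk nested levels too
            else:
                own_values[entry] = shared_values[entry]
    return own_keys, own_values


shared_values = {"h1": "x", "h2": "y"}
parent_keys = {"child": {"a": "h1"}, "b": "h2"}

child_keys, child_values = detach_maps(parent_keys["child"], shared_values)
shared_values.clear()  # the detached copy is unaffected
print(child_values)    # {'h1': 'x'}
```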

__deepcopy__(memo: dict) -> DeDuplicationDict

Create a deep copy of the DeDuplicationDict instance.

Parameters

memo (dict) – A dictionary of memoized values.

Returns

A new DeDuplicationDict instance with its own value dictionary.

Return type

DeDuplicationDict

_del_detach() -> DeDuplicationDict

Detach the DeDuplicationDict instance from its value dictionary and clean up unused hash values.

Returns

self

Return type

DeDuplicationDict

__delitem__(key: KT) -> None

Delete the item with the given key.

Parameters

key (KT) – The key of the item to delete.

Raises

KeyError – If the key is not found in the dictionary.

__len__() -> int

Get the number of items in the dictionary.

Returns

The number of items in the dictionary.

Return type

int

__iter__() -> Iterator[T_co]

Get an iterator over the keys in the dictionary.

Returns

An iterator over the keys in the dictionary.

Return type

Iterator[T_co]

__repr__() -> str

Get a string representation of the DeDuplicationDict instance.

Returns

A string representation of the DeDuplicationDict instance.

Return type

str

to_dict() -> dict

Convert the DeDuplicationDict instance to a regular dictionary.

Returns

A regular dictionary with the same key-value pairs as the DeDuplicationDict instance.

Return type

dict
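Converting back to a regular dictionary amounts to replacing every hash with the value it stands for, level by level. A minimal sketch on plain dictionaries, with a hypothetical helper name (`to_plain`) and placeholder hash strings:

```python
def to_plain(keys_to_hashes: dict, hashes_to_values: dict) -> dict:
    """Expand a key map into an ordinary nested dictionary of values."""
    plain = {}
    for key, entry in keys_to_hashes.items():
        if isinstance(entry, dict):
            plain[key] = to_plain(entry, hashes_to_values)  # nested level
        else:
            plain[key] = hashes_to_values[entry]
    return plain


hashes_to_values = {"h1": [1, 2, 3], "h2": "text"}
keys_to_hashes = {"a": "h1", "b": "h1", "c": {"d": "h2"}}

print(to_plain(keys_to_hashes, hashes_to_values))
# {'a': [1, 2, 3], 'b': [1, 2, 3], 'c': {'d': 'text'}}
```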

classmethod from_dict(d: dict) -> DeDuplicationDict

Create a DeDuplicationDict instance from a regular dictionary.

Parameters

d (dict) – The dictionary to create the DeDuplicationDict instance from.

Returns

A new DeDuplicationDict instance with the same key-value pairs as the given dictionary.

Return type

DeDuplicationDict

_get_key_dict() -> dict

Get the key dictionary of the DeDuplicationDict instance in a normal dictionary format.

Returns

The key dictionary of the DeDuplicationDict instance.

Return type

dict

to_json_save_dict() -> dict

Convert the DeDuplicationDict instance to a dictionary that can be saved to a JSON file.

Returns

A dictionary that can be saved to a JSON file.

Return type

dict

_set_key_dict(key_dict: dict) -> DeDuplicationDict

Set the key dictionary of the DeDuplicationDict instance from a normal dictionary format.

Parameters

key_dict (dict) – The key dictionary to set.

Returns

self

Return type

DeDuplicationDict

classmethod from_json_save_dict(d: dict, _v: dict = None) -> DeDuplicationDict

Create a DeDuplicationDict instance from a dictionary that was saved to a JSON file.

Parameters
  • d (dict) – The dictionary that was saved to a JSON file.

  • _v (dict, optional) – The value dictionary to use. Defaults to None.

Returns

A new DeDuplicationDict instance with the same key-value pairs as the given dictionary.

Return type

DeDuplicationDict
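The benefit of a JSON save format for deduplicated data is that each repeated value is written once. The round trip below sketches this with the standard `json` module only. The field names `"keys"` and `"values"` and the placeholder hash strings are assumptions for illustration; the dictionary layout actually produced by `to_json_save_dict` is not specified here.

```python
import json

# Hypothetical deduplicated state: two keys share the value stored under "h1".
keys_to_hashes = {"a": "h1", "b": "h1", "c": {"d": "h2"}}
hashes_to_values = {"h1": [1, 2, 3], "h2": "text"}

# Assumed layout: the key map and the value store saved side by side.
payload = {"keys": keys_to_hashes, "values": hashes_to_values}

text = json.dumps(payload)
restored = json.loads(text)

print(restored == payload)      # True: the structure survives the round trip
print(text.count("[1, 2, 3]"))  # 1: the shared list is serialized once
```

Note that JSON object keys are always strings, so a layout like this one round-trips cleanly only when the dictionary keys are strings to begin with.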