Utils#
Various functions and classes.
Common#
All-purpose functions.
- class gismap.utils.common.Data(data)
Easy-going converter of dict to dataclass. Useful when you want to use attribute access and do not care about giving a full description.
Examples
>>> data = Data({
...     'name': 'Alice',
...     'age': 30,
...     'address': {'street': '123 Main', 'city': 'Paris'},
...     'hobbies': [{'name': 'jazz', 'level': 5}, {'name': 'code'}]})
>>> data
Data(name='Alice', age=30, address=Data(street='123 Main', city='Paris'), hobbies=[Data(name='jazz', level=5), Data(name='code')])
>>> data.hobbies[0].name
'jazz'
>>> data.todict()
{'name': 'Alice', 'age': 30, 'address': {'street': '123 Main', 'city': 'Paris'}, 'hobbies': [{'name': 'jazz', 'level': 5}, {'name': 'code'}]}
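To illustrate the idea of recursive attribute access over nested dictionaries and lists, here is a minimal sketch. The class name `AttrData` and its helper methods are hypothetical and are not taken from gismap; the actual `Data` class may be implemented quite differently.

```python
class AttrData:
    """Hypothetical stand-in for Data: recursive attribute access over a dict."""

    def __init__(self, data):
        for key, value in data.items():
            setattr(self, key, self._wrap(value))

    @classmethod
    def _wrap(cls, value):
        # Nested dicts become AttrData, lists are wrapped element by element.
        if isinstance(value, dict):
            return cls(value)
        if isinstance(value, list):
            return [cls._wrap(v) for v in value]
        return value

    @classmethod
    def _unwrap(cls, value):
        # Inverse of _wrap, used by todict().
        if isinstance(value, cls):
            return value.todict()
        if isinstance(value, list):
            return [cls._unwrap(v) for v in value]
        return value

    def todict(self):
        return {k: self._unwrap(v) for k, v in vars(self).items()}

    def __repr__(self):
        fields = ", ".join(f"{k}={v!r}" for k, v in vars(self).items())
        return f"{type(self).__name__}({fields})"


source = {
    "name": "Alice",
    "address": {"street": "123 Main", "city": "Paris"},
    "hobbies": [{"name": "jazz", "level": 5}, {"name": "code"}],
}
record = AttrData(source)
```

Converting back with `record.todict()` should give a dictionary equal to `source`.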
- class gismap.utils.common.LazyRepr
MixIn that provides a clean repr for dataclasses.
Hides empty fields and fields in HIDDEN_KEYS from the repr string. Private attributes (starting with ‘_’) are also hidden.
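The hiding rules described above can be sketched as follows. This is only an illustration under assumptions: the name `LazyReprSketch`, the `Author` dataclass, and the definition of "empty" (`None` or a zero-length value) are hypothetical and may not match gismap's behavior.

```python
from dataclasses import dataclass, field, fields


class LazyReprSketch:
    """Hypothetical mixin: repr that skips hidden, private, and empty fields."""

    HIDDEN_KEYS = set()

    def __repr__(self):
        parts = []
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name.startswith("_") or f.name in self.HIDDEN_KEYS:
                continue
            # Assumption: "empty" means None or a zero-length container/string.
            if value is None or (hasattr(value, "__len__") and len(value) == 0):
                continue
            parts.append(f"{f.name}={value!r}")
        return f"{type(self).__name__}({', '.join(parts)})"


# repr=False keeps the mixin's __repr__ instead of the generated one.
@dataclass(repr=False)
class Author(LazyReprSketch):
    HIDDEN_KEYS = {"key"}
    name: str
    key: str = ""
    aliases: list = field(default_factory=list)
    _cache: dict = field(default_factory=dict)


author_repr = repr(Author(name="Ada", key="k1"))  # "Author(name='Ada')"
```

Note that the dataclass is declared with `repr=False`; otherwise the decorator would generate its own `__repr__` and override the mixin.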
- gismap.utils.common.get_classes(root, key='name', recurse=False)
- Parameters:
- Returns:
Dictionary of all subclasses of root that have the class attribute named by key, indexed by the value of that attribute.
- Return type:
Examples
>>> from gismap.sources.models import DB
>>> subclasses = get_classes(DB, key='db_name')
>>> dict(sorted(subclasses.items()))
{'dblp': <class 'gismap.sources.dblp.DBLP'>, 'hal': <class 'gismap.sources.hal.HAL'>, 'ldb': <class 'gismap.sources.ldb.LDB'>}
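A subclass registry of this kind can be built on Python's `__subclasses__()`. The sketch below is a guess at the general approach, not gismap's code: the function name `get_classes_sketch`, the toy classes, and the interpretation of `recurse` (descend into subclasses of subclasses) are all assumptions.

```python
def get_classes_sketch(root, key="name", recurse=False):
    """Hypothetical: map the value of class attribute `key` to each subclass."""
    found = {}
    for cls in root.__subclasses__():
        value = getattr(cls, key, None)
        if value is not None:
            found[value] = cls
        if recurse:
            # Assumption: recurse means "also visit indirect subclasses".
            found.update(get_classes_sketch(cls, key=key, recurse=True))
    return found


# Toy hierarchy, unrelated to gismap's DB classes.
class Source:
    pass


class Alpha(Source):
    db_name = "alpha"


class Beta(Source):
    db_name = "beta"


class Unnamed(Source):
    pass  # no db_name, so it is skipped


class AlphaMirror(Alpha):
    db_name = "alpha_mirror"


direct = get_classes_sketch(Source, key="db_name")
everything = get_classes_sketch(Source, key="db_name", recurse=True)
```

Here `direct` only contains `Alpha` and `Beta`, while `everything` also includes `AlphaMirror`.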
- gismap.utils.common.list_of_objects(clss, dico, default=None)
Versatile way to specify a list of objects: each entry is given either as the object itself or as a key of the dictionary dico that references it.
- Parameters:
- Returns:
Proper list of objects.
- Return type:
Examples
>>> from gismap.sources.models import DB
>>> subclasses = get_classes(DB, key='db_name')
>>> from gismap import HAL, DBLP, LDB
>>> list_of_objects([HAL, 'ldb'], subclasses)
[<class 'gismap.sources.hal.HAL'>, <class 'gismap.sources.ldb.LDB'>]
>>> list_of_objects(None, subclasses, [DBLP])
[<class 'gismap.sources.dblp.DBLP'>]
>>> list_of_objects(LDB, subclasses)
[<class 'gismap.sources.ldb.LDB'>]
>>> list_of_objects('hal', subclasses)
[<class 'gismap.sources.hal.HAL'>]
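The four cases shown in the examples (a list, `None` with a default, a single object, a single key) suggest a simple dispatch on the input type. The following is a minimal sketch consistent with those examples; `list_of_objects_sketch` and the toy classes are hypothetical, and the real function may handle more cases.

```python
def list_of_objects_sketch(clss, dico, default=None):
    """Hypothetical: normalize None / key / object / list into a list of objects."""
    if clss is None:
        # Fall back to the default (empty list if no default is given).
        return list(default) if default is not None else []
    if isinstance(clss, (list, tuple)):
        return [obj for item in clss for obj in list_of_objects_sketch(item, dico)]
    if isinstance(clss, str):
        return [dico[clss]]  # string entries are looked up in the dictionary
    return [clss]  # anything else is taken as the object itself


# Toy registry, unrelated to gismap's DB classes.
class Hal:
    pass


class Ldb:
    pass


registry = {"hal": Hal, "ldb": Ldb}
mixed = list_of_objects_sketch([Hal, "ldb"], registry)
fallback = list_of_objects_sketch(None, registry, [Ldb])
```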
Requests#
Functions related to HTTP requests.
- gismap.utils.requests.get(url, params=None, n_trials=10, verify=True, encoding=None)
- Parameters:
url (str) – Entry point to fetch.
params (dict, optional) – Get arguments (appended to URL).
n_trials (int, default=10) – Number of attempts to fetch URL.
verify (bool, default=True) – Verify certificates.
encoding (str, optional) – Force response encoding (e.g. "utf-8"). Useful when the server does not declare the charset and requests falls back to ISO-8859-1.
- Returns:
Result.
- Return type:
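The combination of repeated attempts and a forced encoding can be sketched without any network access. Everything below is hypothetical: `get_with_retries` takes a `fetch` callable that returns raw bytes (standing in for the HTTP call), and raising after the last failed attempt is an assumption, since the behavior of `get` on repeated failure is not documented here.

```python
import time


def get_with_retries(fetch, n_trials=10, delay=0.0, encoding=None):
    """Hypothetical retry loop; `fetch` is any callable returning bytes."""
    last_error = None
    for _ in range(n_trials):
        try:
            raw = fetch()
        except OSError as error:  # ConnectionError is a subclass of OSError
            last_error = error
            time.sleep(delay)
            continue
        # Without an explicit encoding, mimic the ISO-8859-1 fallback.
        return raw.decode(encoding or "iso-8859-1")
    # Assumption: give up with an exception once all attempts have failed.
    raise RuntimeError(f"failed after {n_trials} attempts") from last_error


calls = {"n": 0}


def flaky():
    # Fake server: fails twice, then returns UTF-8 bytes with no declared charset.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "Ana Bušić".encode("utf-8")


text = get_with_retries(flaky, n_trials=5, encoding="utf-8")
garbled = "Ana Bušić".encode("utf-8").decode("iso-8859-1")
```

With `encoding="utf-8"` the name is recovered intact, whereas `garbled` shows what the ISO-8859-1 fallback would produce from the same bytes.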
Logger#
Keep track of things.
- gismap.utils.logger.logger = <Logger GisMap (INFO)>
Default logging interface.
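Since the object is a standard `logging.Logger`, its verbosity can be adjusted with the usual standard-library calls. The sketch below assumes the logger is registered under the name `GisMap`, as its repr suggests; with gismap installed, importing `logger` from `gismap.utils.logger` should be the more direct route.

```python
import logging

# Assumption: the logger is registered under the name "GisMap".
logger = logging.getLogger("GisMap")

# Only show warnings and errors.
logger.setLevel(logging.WARNING)

logger.info("This message is filtered out.")
logger.warning("This message is emitted.")
```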
Zlist#
Convert a list into a succession of compressed frames. Reduces memory footprint at the price of slower random access (sequential access is unaffected).
- class gismap.utils.zlist.ZList(frame_size=1000)
Compressed list with frame-based storage.
Stores elements in compressed frames, allowing efficient memory usage while maintaining random access. Uses zstandard compression.
Use as a context manager for building:
with ZList(frame_size=1000) as z:
    for item in data:
        z.append(item)
- Parameters:
frame_size (int, default=1000) – Number of elements per compressed frame.
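To make the frame-based idea concrete, here is a minimal sketch of a compressed list. It is not the ZList implementation: the class name `FrameList` is hypothetical, and it uses `zlib` and `pickle` from the standard library in place of zstandard so that it runs without extra dependencies.

```python
import pickle
import zlib


class FrameList:
    """Hypothetical frame-based compressed list (zlib instead of zstandard)."""

    def __init__(self, frame_size=1000):
        self.frame_size = frame_size
        self.frames = []  # compressed frames
        self.buffer = []  # uncompressed frame currently being filled
        self.size = 0

    def append(self, item):
        self.buffer.append(item)
        self.size += 1
        if len(self.buffer) == self.frame_size:
            self._flush()

    def _flush(self):
        if self.buffer:
            self.frames.append(zlib.compress(pickle.dumps(self.buffer)))
            self.buffer = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Building is assumed finished on exit: compress the last partial frame.
        self._flush()
        return False

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        frame_id, offset = divmod(index, self.frame_size)
        if frame_id < len(self.frames):
            # Random access pays for decompressing a whole frame.
            return pickle.loads(zlib.decompress(self.frames[frame_id]))[offset]
        return self.buffer[offset]


with FrameList(frame_size=3) as z:
    for i in range(10):
        z.append(i * i)
```

Each random access decompresses one full frame, which is the cost mentioned above. A real implementation would likely cache the most recently decompressed frame so that sequential reads decompress each frame only once.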
Text#
Text manipulation tools.
- class gismap.utils.text.Corrector(voc, score_cutoff=20, min_length=3)
A simple word corrector based on an input vocabulary. Short words are discarded.
- Parameters:
Examples
>>> vocabulary = ['My Taylor Swift is Rich']
>>> phrase = "How riche ise Tailor Swyft"
>>> cor = Corrector(vocabulary, min_length=4)
>>> cor(phrase)
'How rich ise taylor swift'
>>> cor = Corrector(vocabulary, min_length=2)
>>> cor(phrase)
'How rich is taylor swift'
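Vocabulary-based correction can be approximated with fuzzy matching from the standard library. The sketch below swaps in `difflib.get_close_matches` for whatever matcher Corrector actually uses, so its `cutoff` (a similarity ratio between 0 and 1) is not comparable to `score_cutoff`. The class name `CorrectorSketch` is hypothetical, and applying `min_length` to the input words is an assumption.

```python
from difflib import get_close_matches


class CorrectorSketch:
    """Hypothetical corrector: replace each word by its closest vocabulary word."""

    def __init__(self, voc, cutoff=0.75, min_length=3):
        # The vocabulary is a list of phrases; keep the lowercased unique words.
        self.words = sorted({w.lower() for entry in voc for w in entry.split()})
        self.cutoff = cutoff
        self.min_length = min_length

    def __call__(self, text):
        out = []
        for word in text.split():
            # Assumption: words shorter than min_length are left untouched.
            if len(word) >= self.min_length:
                match = get_close_matches(
                    word.lower(), self.words, n=1, cutoff=self.cutoff
                )
                if match:
                    word = match[0]
            out.append(word)
        return " ".join(out)


cor = CorrectorSketch(["My Taylor Swift is Rich"], min_length=4)
fixed = cor("How riche ise Tailor Swyft")
```

Words with no sufficiently close vocabulary entry, such as "How" here, are returned unchanged.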
- gismap.utils.text.asciify(text)
- Parameters:
text (str) – Some text (typically names) with annoying accents.
- Returns:
Same text simplified into ascii.
- Return type:
Examples
>>> asciify('Ana Bušić')
'Ana Busic'
>>> asciify("Thomas Deiß")
'Thomas Deiss'
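A common way to strip accents with the standard library is Unicode decomposition followed by dropping non-ASCII characters. The sketch below takes that route; it is not necessarily how asciify works. The name `asciify_sketch` and the small `SPECIAL` table are hypothetical. The table is needed because characters such as "ß" have no Unicode decomposition and would otherwise be dropped.

```python
import unicodedata

# Hypothetical table for characters that do not decompose into ASCII + accent.
SPECIAL = {"ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"}


def asciify_sketch(text):
    """Hypothetical ASCII simplification via NFKD decomposition."""
    text = "".join(SPECIAL.get(ch, ch) for ch in text)
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII characters removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


busic = asciify_sketch("Ana Bušić")
deiss = asciify_sketch("Thomas Deiß")
```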
- gismap.utils.text.normalized_name(txt)
Tries to normalize names to facilitate comparisons. The name is lowercased, split into words, asciified, sorted, and filtered.
Examples
>>> normalized_name("Thomas Deiß")
'deiss thomas'
>>> normalized_name("Dario Rossi 001")
'dario rossi'
>>> normalized_name("James W. Roberts")
'james roberts'
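The pipeline described above (lowercase, split, asciify, sort, filter) can be sketched in a few lines. The function `normalized_name_sketch` is hypothetical, and its filtering rule is an assumption inferred from the examples: tokens are kept only if they are purely alphabetic and longer than one letter, which removes numeric suffixes and initials.

```python
import unicodedata


def normalized_name_sketch(txt):
    """Hypothetical name normalization: lower, asciify, split, filter, sort."""
    lowered = txt.lower().replace("ß", "ss")  # "ß" has no Unicode decomposition
    ascii_txt = (
        unicodedata.normalize("NFKD", lowered)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    tokens = [token.strip(".") for token in ascii_txt.split()]
    # Assumption: drop initials ("w") and non-alphabetic tokens ("001").
    kept = [token for token in tokens if token.isalpha() and len(token) > 1]
    return " ".join(sorted(kept))


names = [
    normalized_name_sketch("Thomas Deiß"),
    normalized_name_sketch("Dario Rossi 001"),
    normalized_name_sketch("James W. Roberts"),
]
```

One limitation of this simple filter is that hyphenated tokens fail the `isalpha()` check and are dropped, so a real implementation would need to treat them separately.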