Utils

Various functions and classes.

Common

All-purpose functions.

class gismap.utils.common.Data(data)

Lenient converter from dict to dataclass. Useful when you want attribute access without writing a full class definition.

Examples

>>> data = Data({
... 'name': 'Alice',
... 'age': 30,
... 'address': {'street': '123 Main', 'city': 'Paris'},
... 'hobbies': [{'name': 'jazz', 'level': 5}, {'name': 'code'}]})
>>> data 
Data(name='Alice', age=30, address=Data(street='123 Main', city='Paris'),
hobbies=[Data(name='jazz', level=5), Data(name='code')])
>>> data.hobbies[0].name
'jazz'
>>> data.todict()  
{'name': 'Alice', 'age': 30, 'address': {'street': '123 Main', 'city': 'Paris'},
'hobbies': [{'name': 'jazz', 'level': 5}, {'name': 'code'}]}
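
The recursive conversion shown above can be approximated with the standard library. The sketch below is an illustration only, not the Data implementation: to_namespace and to_dict are hypothetical helpers built on types.SimpleNamespace.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively turn dicts into attribute-style objects (hypothetical helper)."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj


def to_dict(obj):
    """Inverse of to_namespace: back to plain dicts and lists."""
    if isinstance(obj, SimpleNamespace):
        return {k: to_dict(v) for k, v in vars(obj).items()}
    if isinstance(obj, list):
        return [to_dict(v) for v in obj]
    return obj


raw = {
    "name": "Alice",
    "address": {"street": "123 Main", "city": "Paris"},
    "hobbies": [{"name": "jazz", "level": 5}, {"name": "code"}],
}
ns = to_namespace(raw)
print(ns.hobbies[0].name, ns.address.city)
```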

class gismap.utils.common.LazyRepr

MixIn that provides a clean repr for dataclasses.

Hides empty fields and fields in HIDDEN_KEYS from the repr string. Private attributes (starting with ‘_’) are also hidden.
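
A minimal sketch of such a mixin, under stated assumptions: CleanRepr and Author are hypothetical names, HIDDEN_KEYS is taken to be a class-level set of field names, and "empty" is taken to mean None or an empty string or container. The actual LazyRepr may differ on all three points.

```python
from dataclasses import dataclass, field, fields


class CleanRepr:
    """Hypothetical mixin: repr without empty, hidden, or private fields."""

    HIDDEN_KEYS = set()  # assumption: names of fields that are always hidden

    def __repr__(self):
        parts = []
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name.startswith("_") or f.name in self.HIDDEN_KEYS:
                continue
            # Assumption: "empty" means None or an empty str/list/dict/set/tuple.
            if value is None or (
                isinstance(value, (str, list, dict, set, tuple)) and not value
            ):
                continue
            parts.append(f"{f.name}={value!r}")
        return f"{type(self).__name__}({', '.join(parts)})"


@dataclass(repr=False)  # keep the mixin's __repr__ instead of the generated one
class Author(CleanRepr):
    HIDDEN_KEYS = {"key"}

    name: str
    key: str = ""
    aliases: list = field(default_factory=list)
    _cache: dict = field(default_factory=dict)


print(Author(name="Alice", key="a1"))
```

Note the repr=False argument: without it, the dataclass decorator generates its own __repr__ and the mixin's version is never used.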

gismap.utils.common.get_classes(root, key='name', recurse=False)
Parameters:
  • root (class) – Starting class (can be abstract).

  • key (str, default='name') – Name of the class attribute to look up.

  • recurse (bool, default=False) – Recursively traverse subclasses.

Returns:

Dictionary mapping each value of the key class attribute to the subclass that defines it. Subclasses without that attribute are left out.

Return type:

dict

Examples

>>> from gismap.sources.models import DB
>>> subclasses = get_classes(DB, key='db_name')
>>> dict(sorted(subclasses.items())) 
{'dblp': <class 'gismap.sources.dblp.DBLP'>,
'hal': <class 'gismap.sources.hal.HAL'>,
'ldb': <class 'gismap.sources.ldb.LDB'>}
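
Subclass discovery of this kind can be sketched with the built-in __subclasses__() method. The function below is a hypothetical re-implementation, not the library's code: collect_subclasses, Source, and the toy subclasses are invented for illustration, and the choice to ignore inherited attributes is an assumption.

```python
def collect_subclasses(root, key="name", recurse=False):
    """Hypothetical sketch: map each `key` value to the subclass defining it."""
    result = {}
    for cls in root.__subclasses__():
        # Assumption: only attributes defined on the class itself count,
        # not attributes inherited from a parent.
        if key in vars(cls):
            result[vars(cls)[key]] = cls
        if recurse:
            result.update(collect_subclasses(cls, key=key, recurse=True))
    return result


class Source:  # stand-in for an abstract root class
    pass


class Alpha(Source):
    db_name = "alpha"


class Beta(Source):
    db_name = "beta"


class BetaChild(Beta):
    db_name = "beta_child"


class Unnamed(Source):  # no db_name: skipped
    pass


print(sorted(collect_subclasses(Source, key="db_name")))
print(sorted(collect_subclasses(Source, key="db_name", recurse=True)))
```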

gismap.utils.common.list_of_objects(clss, dico, default=None)

Flexible way to specify a list of objects, each given either directly or as a key of dico.

Parameters:
  • clss (object) – An object, a reference to an object (a key of dico), or a list of objects and/or references.

  • dico (dict) – Dictionary of references to objects.

  • default (list, optional) – Default list to return if clss is None.

Returns:

Proper list of objects.

Return type:

list

Examples

>>> from gismap.sources.models import DB
>>> subclasses = get_classes(DB, key='db_name')
>>> from gismap import HAL, DBLP, LDB
>>> list_of_objects([HAL, 'ldb'], subclasses)
[<class 'gismap.sources.hal.HAL'>, <class 'gismap.sources.ldb.LDB'>]
>>> list_of_objects(None, subclasses, [DBLP])
[<class 'gismap.sources.dblp.DBLP'>]
>>> list_of_objects(LDB, subclasses)
[<class 'gismap.sources.ldb.LDB'>]
>>> list_of_objects('hal', subclasses)
[<class 'gismap.sources.hal.HAL'>]
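
The normalization illustrated above can be sketched as follows. This is a hypothetical re-implementation: resolve_objects and registry are invented names, references are assumed to be strings, and returning an empty list when both clss and default are None is an assumption.

```python
def resolve_objects(clss, dico, default=None):
    """Hypothetical sketch: normalize objects and keys into a list of objects."""
    if clss is None:
        # Assumption: fall back to an empty list when no default is given.
        return [] if default is None else default
    if not isinstance(clss, list):
        clss = [clss]
    # Strings are treated as keys of `dico`; anything else is kept as-is.
    return [dico[c] if isinstance(c, str) else c for c in clss]


registry = {"int": int, "str": str}
print(resolve_objects([int, "str"], registry))
print(resolve_objects(None, registry, [float]))
```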

gismap.utils.common.unlist(x)
Parameters:

x (str or list or int) – Something.

Returns:

x – If it’s a list, make it flat.

Return type:

str or int

Requests

Functions for sending HTTP requests.

gismap.utils.requests.get(url, params=None, n_trials=10, verify=True, encoding=None)
Parameters:
  • url (str) – Entry point to fetch.

  • params (dict, optional) – GET parameters (appended to the URL as a query string).

  • n_trials (int, default=10) – Number of attempts to fetch URL.

  • verify (bool, default=True) – Verify certificates.

  • encoding (str, optional) – Force response encoding (e.g. "utf-8"). Useful when the server does not declare the charset and requests falls back to ISO-8859-1.

Returns:

Content of the response, as text.

Return type:

str
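
The n_trials and encoding parameters suggest a retry loop followed by an optional charset override. The sketch below illustrates that pattern under stated assumptions: get_with_retries, fetch, pause, FakeResponse, and flaky_fetch are all hypothetical, the HTTP call is injected as a callable so that the example runs without network access, and retrying on OSError is an assumption about which failures are retried.

```python
import time


def get_with_retries(fetch, url, params=None, n_trials=10, encoding=None, pause=0.0):
    """Hypothetical sketch of a retrying GET.

    `fetch(url, params)` is an injected callable returning a response-like
    object with `.text` and a settable `.encoding`.
    """
    last_error = None
    for _ in range(n_trials):
        try:
            response = fetch(url, params)
        except OSError as error:  # assumption: retry on network-level failures
            last_error = error
            time.sleep(pause)
            continue
        if encoding is not None:
            response.encoding = encoding  # override the guessed charset
        return response.text
    raise ConnectionError(f"{url}: no answer after {n_trials} attempts") from last_error


class FakeResponse:
    """Stand-in for an HTTP response."""

    def __init__(self, text):
        self.text, self.encoding = text, None


calls = []


def flaky_fetch(url, params):
    """Fails twice, then succeeds."""
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return FakeResponse("ok")


result = get_with_retries(flaky_fetch, "https://example.org", n_trials=5)
print(result, len(calls))
```

With the requests library installed, fetch could be something like lambda url, params: requests.get(url, params=params, verify=True), since requests exceptions derive from OSError and its responses expose text and a settable encoding.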

Logger

Keep track of things.

gismap.utils.logger.logger = <Logger GisMap (INFO)>

Default logging interface.
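
The repr above is that of a standard logging.Logger named "GisMap" at level INFO. Assuming the module-level logger is indeed a plain standard-library logger, its verbosity can be adjusted in the usual way. The sketch recreates an equivalent logger rather than importing the library's own.

```python
import logging

# Assumption: an equivalent of the module-level logger, rebuilt for illustration.
log = logging.getLogger("GisMap")
log.setLevel(logging.INFO)
info_repr = repr(log)

log.setLevel(logging.WARNING)  # silence informational messages
quiet = not log.isEnabledFor(logging.INFO)
print(info_repr, quiet)
```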

Zlist

Convert a list into a succession of compressed frames. Reduces memory footprint at the price of slower random access (sequential access is unaffected).

class gismap.utils.zlist.ZList(frame_size=1000)

Compressed list with frame-based storage.

Stores elements in compressed frames, allowing efficient memory usage while maintaining random access. Uses zstandard compression.

Use as a context manager for building:

with ZList(frame_size=1000) as z:
    for item in data:
        z.append(item)

Parameters:

frame_size (int, default=1000) – Number of elements per compressed frame.

append(entry)

Add an element to the list.

Parameters:

entry – Element to add.
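
To illustrate the frame-based idea, here is a minimal sketch that uses only the standard library. It swaps zstandard for zlib and serializes frames with pickle, so it is not the ZList implementation. FrameList and its attributes are hypothetical names, and the sketch assumes nothing is appended after the context manager exits.

```python
import pickle
import zlib


class FrameList:
    """Hypothetical frame-compressed list (zlib + pickle, not zstandard)."""

    def __init__(self, frame_size=1000):
        self.frame_size = frame_size
        self.frames = []   # compressed frames
        self.pending = []  # elements not yet compressed
        self.size = 0
        self._cache = (None, None)  # (frame index, decompressed frame)

    def _flush(self):
        if self.pending:
            self.frames.append(zlib.compress(pickle.dumps(self.pending)))
            self.pending = []

    def append(self, entry):
        self.pending.append(entry)
        self.size += 1
        if len(self.pending) == self.frame_size:
            self._flush()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._flush()  # compress the last, possibly partial, frame
        return False

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        f, offset = divmod(i, self.frame_size)
        if f == len(self.frames):  # element still in the uncompressed tail
            return self.pending[offset]
        if self._cache[0] != f:  # decompress only when moving to another frame
            self._cache = (f, pickle.loads(zlib.decompress(self.frames[f])))
        return self._cache[1][offset]


with FrameList(frame_size=4) as z:
    for item in range(10):
        z.append(item * item)

print(len(z), len(z.frames), z[9])
```

Caching the last decompressed frame is what keeps sequential access cheap in this sketch: consecutive indices hit the same frame, whereas random access may decompress a frame per lookup.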

Text

Text manipulation tools.

class gismap.utils.text.Corrector(voc, score_cutoff=20, min_length=3)

A simple word corrector based on an input vocabulary. Short words (fewer than min_length characters) are ignored.

Parameters:
  • voc (list) – Words (each entry may contain multiple words).

  • score_cutoff (int, default=20) – Threshold for correction.

  • min_length (int, default=3) – Minimal number of characters for correction to kick in.

Examples

>>> vocabulary = ['My Taylor Swift is Rich']
>>> phrase = "How riche ise Tailor Swyft"
>>> cor = Corrector(vocabulary, min_length=4)
>>> cor(phrase)
'How rich ise taylor swift'
>>> cor = Corrector(vocabulary, min_length=2)
>>> cor(phrase)
'How rich is taylor swift'
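
A comparable vocabulary-based corrector can be sketched with difflib from the standard library. This swaps in a different matcher than the one behind Corrector: SimpleCorrector is a hypothetical name, and its cutoff is a difflib similarity ratio between 0 and 1, which is not on the same scale as score_cutoff.

```python
from difflib import get_close_matches


class SimpleCorrector:
    """Hypothetical corrector built on difflib (not the library's matcher)."""

    def __init__(self, voc, cutoff=0.6, min_length=3):
        self.cutoff = cutoff  # difflib ratio in [0, 1], unrelated to score_cutoff
        self.min_length = min_length
        # Split entries into lowercase words and drop the short ones.
        words = {w.lower() for entry in voc for w in entry.split()}
        self.voc = sorted(w for w in words if len(w) >= min_length)

    def __call__(self, text):
        out = []
        for word in text.split():
            match = []
            if len(word) >= self.min_length:
                match = get_close_matches(
                    word.lower(), self.voc, n=1, cutoff=self.cutoff
                )
            # Keep the original word when it is too short or has no close match.
            out.append(match[0] if match else word)
        return " ".join(out)


vocabulary = ["My Taylor Swift is Rich"]
phrase = "How riche ise Tailor Swyft"
print(SimpleCorrector(vocabulary, min_length=4)(phrase))
print(SimpleCorrector(vocabulary, min_length=2)(phrase))
```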

gismap.utils.text.asciify(text)
Parameters:

text (str) – Some text (typically names) with annoying accents.

Returns:

Same text simplified into ascii.

Return type:

str

Examples

>>> asciify('Ana Bušić')
'Ana Busic'
>>> asciify("Thomas Deiß")
'Thomas Deiss'
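
For intuition, ASCII folding of this kind can be approximated with unicodedata. The sketch below is a standard-library stand-in, not the asciify implementation: to_ascii is a hypothetical name, and the explicit ß rule is needed because Unicode normalization does not decompose that character.

```python
import unicodedata


def to_ascii(text):
    """Hypothetical ASCII folding with the standard library only."""
    text = text.replace("ß", "ss")  # ß has no Unicode decomposition
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (accents), then anything still outside ASCII.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Ana Bušić"))
print(to_ascii("Thomas Deiß"))
```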

gismap.utils.text.clean_aliases(name, alias_list)
Parameters:
  • name (str) – Main name.

  • alias_list (list or set) – Aliases.

Returns:

Aliases deduped, sorted, and with main name removed.

Return type:

list
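
The described behavior (dedupe, sort, remove the main name) fits in one set operation. The sketch below is a hypothetical equivalent named dedupe_aliases, shown on invented sample data. Whether the library compares names exactly or after normalization is not stated, so exact comparison is an assumption here.

```python
def dedupe_aliases(name, alias_list):
    """Hypothetical sketch: unique aliases, sorted, without the main name."""
    # Assumption: names are compared exactly, with no normalization.
    return sorted(set(alias_list) - {name})


aliases = ["A. Busic", "Ana Busic", "Ana Bušić", "A. Busic"]
print(dedupe_aliases("Ana Busic", aliases))
```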

gismap.utils.text.normalized_name(txt)

Try to normalize names to make comparisons easier. The name is lowercased, split, asciified, sorted, and filtered.

Parameters:

txt (str) – Name to normalize.

Return type:

str

Examples

>>> normalized_name("Thomas Deiß")
'deiss thomas'
>>> normalized_name("Dario Rossi 001")
'dario rossi'
>>> normalized_name("James W. Roberts")
'james roberts'
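
The pipeline described above (lowercase, split, asciify, sort, filter) can be sketched as follows. This is a hypothetical reconstruction named normalize_name. The filter rule, which drops digits, punctuation, and one-letter tokens, is inferred from the examples and may not match the library's exact rule.

```python
import re
import unicodedata


def normalize_name(txt):
    """Hypothetical sketch of the lowercase/asciify/split/filter/sort pipeline."""
    txt = txt.lower().replace("ß", "ss")
    txt = unicodedata.normalize("NFKD", txt)
    txt = "".join(c for c in txt if not unicodedata.combining(c))
    # Assumption: split on anything that is not a letter (drops digits and dots).
    tokens = re.split(r"[^a-z]+", txt)
    # Assumption: one-letter tokens (initials) and empty strings are filtered out.
    return " ".join(sorted(t for t in tokens if len(t) > 1))


for name in ["Thomas Deiß", "Dario Rossi 001", "James W. Roberts"]:
    print(normalize_name(name))
```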

gismap.utils.text.reduce_keywords(kws)

Remove redundant keywords, i.e. those that are a subpart of another keyword in the list.

Parameters:

kws (list) – List of words / collocations.

Returns:

Reduced list.

Return type:

list

Examples

>>> reduce_keywords(['P2P', 'Millimeter Waves', 'Networks', 'P2P Networks', 'Waves'])
['Millimeter Waves', 'P2P Networks']
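
One way to obtain this reduction is to drop every keyword that appears inside another one. The sketch below is a hypothetical version named drop_contained. It assumes that "subpart" means plain substring containment and that input order is preserved, neither of which is confirmed by the documentation.

```python
def drop_contained(kws):
    """Hypothetical sketch: keep keywords not contained in another keyword."""
    unique = list(dict.fromkeys(kws))  # dedupe while preserving order
    # Assumption: "subpart" means plain substring containment.
    return [
        kw
        for kw in unique
        if not any(kw != other and kw in other for other in unique)
    ]


keywords = ["P2P", "Millimeter Waves", "Networks", "P2P Networks", "Waves"]
print(drop_contained(keywords))
```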