Database

Classes and functions to interact with databases of publications.

Blueprint

Abstract description of the GISMAP database interface.

class gismap.database.blueprint.DBAuthor(name: str, id: str = None, aliases: list = <factory>)

Blueprint for DB-specific author management.

aliases: list

Alternative names for the author.

db_name: ClassVar[str] = None

Name of the database.

id: str = None

Id of the author in the DB.

property is_set

bool Is the author identified in the DB?

iter_keys()
Yields:

str or int – Key of the author (typically name, alias, or internal key).
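As an illustration of the blueprint's fields and of iter_keys, here is a minimal stand-in dataclass. It is a sketch, not the gismap class: the name SketchAuthor is hypothetical, and both the rule used for is_set (an id is known) and the order in which keys are yielded are assumptions.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Iterator, Optional, Union


@dataclass
class SketchAuthor:
    """Hypothetical stand-in mirroring the DBAuthor fields documented above."""

    name: str
    id: Optional[str] = None
    aliases: list = field(default_factory=list)
    db_name: ClassVar[Optional[str]] = None  # class-level, not a dataclass field

    @property
    def is_set(self) -> bool:
        # Assumption: an author counts as identified once a DB id is known.
        return self.id is not None

    def iter_keys(self) -> Iterator[Union[str, int]]:
        # Assumption: yield the main name, then each alias, then the DB id if any.
        yield self.name
        yield from self.aliases
        if self.is_set:
            yield self.id
```

For instance, `SketchAuthor("Fabien Mathieu", id="fabien-mathieu", aliases=["F. Mathieu"])` yields the name, the alias, and the id, in that order.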

name: str

Author name.

populate_id(s=None)

Try to automatically fill in DB information. If exactly one match is found, its data is integrated.

Otherwise, a warning is issued with some suggestions/URLs.

Parameters:

s (Session, optional) – Session.

Returns:

Number of matches found (i.e. 1 means success).

Return type:

int
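The populate_id workflow described above can be sketched as follows. SketchAuthor and its candidates field are hypothetical stand-ins for a real database lookup, and the merge rule used in update_values (copy the id, union the aliases) is an assumption, not documented gismap behavior.

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchAuthor:
    """Hypothetical author whose query_id returns canned candidates."""

    name: str
    id: Optional[str] = None
    aliases: list = field(default_factory=list)
    candidates: list = field(default_factory=list, repr=False)  # stands in for a DB lookup

    def query_id(self, s=None):
        # A real implementation would query the database, possibly through session `s`.
        return list(self.candidates)

    def update_values(self, author):
        # Assumption: take the matched id and merge the aliases.
        self.id = author.id
        self.aliases = sorted(set(self.aliases) | set(author.aliases))

    def populate_id(self, s=None):
        matches = self.query_id(s)
        if len(matches) == 1:
            # Exactly one match: integrate its data into this instance.
            self.update_values(matches[0])
        else:
            # Zero or several matches: leave the instance unchanged and warn.
            warnings.warn(f"{len(matches)} matches found for {self.name!r}; set the id manually.")
        return len(matches)
```

The return value mirrors the documented contract: 1 means success, any other count means the author still needs manual identification.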

query_id(s=None)
Parameters:

s (Session, optional) – Session.

Returns:

Potential matches.

Return type:

list

query_id_backoff: ClassVar[float] = 0.0

Time to wait between two consecutive query_id calls.

query_publications(s=None)
Parameters:

s (Session, optional) – Session.

Returns:

Papers available in DB.

Return type:

list

query_publications_backoff: ClassVar[float] = 0.0

Time to wait between two consecutive query_publications calls.
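The backoff attributes only state a delay to respect between two calls; subclasses such as DBLPAuthor set non-zero values. A minimal way to honor such a delay is to remember when the last call happened and sleep for the remainder. The Backoff class below is hypothetical and assumes the delay is expressed in seconds.

```python
import time


class Backoff:
    """Hypothetical helper enforcing a minimum delay between consecutive calls."""

    def __init__(self, delay: float):
        self.delay = delay  # assumed to be in seconds
        self._last = None   # monotonic timestamp of the previous call, if any

    def wait(self) -> None:
        if self._last is not None:
            # Sleep only for the part of the delay that has not already elapsed.
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `wait()` right before each query keeps successive queries at least `delay` apart while adding no latency to the first call.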

update_values(author)
Parameters:

author (DBAuthor) – External author info to inject into the current instance.

Return type:

None

property url

str URL associated with the author in the DB.

gismap.database.blueprint.clean_aliases(name, alias_list)
Parameters:
  • name (str) – Main name.

  • alias_list (list) – Aliases.

Returns:

Aliases, deduplicated and sorted, with the main name removed.

Return type:

list
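The documented behavior of clean_aliases fits in a single expression. This sketch is named clean_aliases_sketch to stress that it is not the library's code; it assumes exact string comparison, whereas the real function may normalize names differently.

```python
def clean_aliases_sketch(name, alias_list):
    """Deduplicate (via a set), drop the main name, and sort the remaining aliases."""
    return sorted(set(alias_list) - {name})
```

For example, `clean_aliases_sketch("Fabien Mathieu", ["F. Mathieu", "Fabien Mathieu", "F. Mathieu"])` returns `["F. Mathieu"]`.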

DBLP

Interface for the dblp computer science bibliography (https://dblp.org/).

class gismap.database.dblp.DBLPAuthor(name: str, id: str = None, aliases: list = <factory>)
db_name: ClassVar[str] = 'dblp'

Name of the database.

query_id(s=None)
Parameters:

s (Session, optional) – Session.

Returns:

Potential matches.

Return type:

list

query_id_backoff: ClassVar[float] = 7.0

Time to wait between two consecutive query_id calls.

query_publications(s=None)
Parameters:

s (Session, optional) – Session.

Returns:

Papers available in DB.

Return type:

list

query_publications_backoff: ClassVar[float] = 2.0

Time to wait between two consecutive query_publications calls.

property url

str URL associated with the author in the DB.

HAL

Interface for HyperArticles en Ligne (https://hal.science/).

class gismap.database.hal.HALAuthor(name: str, id: str = None, aliases: list = <factory>, pid: int = None, alt_pids: list = <factory>)
alt_pids: list

An author has a single HAL id but possibly several personal ids (pids). Extra pids should be put here.

db_name: ClassVar[str] = 'hal'

Name of the database.

property is_set

bool Is the author identified in the DB?
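HALAuthor overrides is_set, presumably because a pid can identify an author when no HAL id exists (the ValueError in the query_publications examples mentions "id or pid"). The hypothetical HALAuthorSketch below encodes that assumption and shows one way to gather the main pid together with alt_pids; neither the class nor all_pids is part of gismap.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HALAuthorSketch:
    """Hypothetical mirror of the HALAuthor fields; not the gismap class."""

    name: str
    id: Optional[str] = None
    aliases: list = field(default_factory=list)
    pid: Optional[int] = None
    alt_pids: list = field(default_factory=list)

    @property
    def is_set(self) -> bool:
        # Assumption: either a HAL id or a personal id is enough to identify the author.
        return self.id is not None or self.pid is not None

    def all_pids(self) -> list:
        # Main pid first (when known), then the extra ones stored in alt_pids.
        return ([self.pid] if self.pid is not None else []) + list(self.alt_pids)
```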

static parse_entry(r)
Parameters:

r (dict) – Raw dict of a result (paper).

Returns:

The paper as a sanitized dictionary.

Return type:

dict

pid: int = None

Personal id, an integer that can be used when the HAL id is not available.

query_id(s=None)
Parameters:

s (Session, optional) – Session.

Returns:

Potential matches.

Return type:

list

Examples

>>> fabien = HALAuthor("Fabien Mathieu")
>>> fabien
HALAuthor(name='Fabien Mathieu')
>>> fabien.url
'https://hal.science/search/index?q=Fabien+Mathieu'
>>> fabien.populate_id()
1
>>> fabien
HALAuthor(name='Fabien Mathieu', id='fabien-mathieu')
>>> fabien.url
'https://hal.science/search/index/?q=*&authIdHal_s=fabien-mathieu'
>>> laurent = HALAuthor("Laurent Viennot")
>>> laurent.query_id()
[HALAuthor(name='Laurent Viennot', id='laurentviennot')]
>>> unknown = HALAuthor("NotaSearcherName")
>>> unknown
HALAuthor(name='NotaSearcherName')
>>> unknown.populate_id()
0
>>> unknown
HALAuthor(name='NotaSearcherName')
>>> ana = HALAuthor("Ana Busic")
>>> ana.populate_id()
1
>>> ana
HALAuthor(name='Ana Busic', id='anabusic')
>>> diego = HALAuthor("Diego Perino") 
>>> diego.query_id()
[HALAuthor(name='Diego Perino', pid=847558), HALAuthor(name='Diego Perino', pid=978810)]
>>> HALAuthor(name='Diego Perino', pid=978810).url
'https://hal.science/search/index/?q=*&authIdPerson_i=978810'

query_publications(s=None)
Parameters:

s (Session, optional) – Session.

Returns:

Papers available in HAL.

Return type:

list

Examples

>>> fabien = HALAuthor(name='Fabien', id='fabien-mathieu')
>>> publications = sorted(fabien.query_publications(),
...                 key=lambda p: p['title'])
>>> publications[2] 
{'title': 'Achievable Catalog Size in Peer-to-Peer Video-on-Demand Systems',
'abstract': 'We analyze a system where $n$ set-top boxes with same upload and storage capacities collaborate to
serve $r$ videos simultaneously (a typical value is $r=n$). We give upper and lower bounds on the catalog size
of the system, i.e. the maximal number of distinct videos that can be stored in such a system so that any demand
of at most $r$ videos can be served. Besides $r/n$, the catalog size is constrained by the storage capacity, the
upload capacity, and the maximum number of simultaneous connections a box can open. We show that the achievable
catalog size drastically increases when the upload capacity of the boxes becomes strictly greater than the
playback rate of videos.', 'key': '471724',
'conference': 'Proceedings of the 7th Internnational Workshop on Peer-to-Peer Systems (IPTPS)',
'type': 'conference', 'year': 2008, 'url': 'https://inria.hal.science/inria-00471724v1',
'authors': [HALAuthor(name='Yacine Boufkhad', id='yacine-boufkhad', pid=7352),
HALAuthor(name='Fabien Mathieu', id='fabien-mathieu', pid=446),
HALAuthor(name='Fabien de Montgolfier', pid=949013), HALAuthor(name='Diego Perino'),
HALAuthor(name='Laurent Viennot', id='laurentviennot', pid=1841)],
'venue': 'Proceedings of the 7th Internnational Workshop on Peer-to-Peer Systems (IPTPS)', 'origin': 'hal'}
>>> publications[-7] 
{'title': 'Upper bounds for stabilization in acyclic preference-based systems',
'abstract': 'Preference-based systems (p.b.s.) describe interactions between nodes of a system that can rank
their neighbors. Previous work has shown that p.b.s. converge to a unique locally stable matching if an
acyclicity property is verified. In the following we analyze acyclic p.b.s. with respect to the
self-stabilization theory. We prove that the round complexity is bounded by n/2 for the adversarial daemon.
The step complexity is equivalent to (n^2)/4 for the round robin daemon, and exponential for the general
adversarial daemon.', 'key': '668356',
'conference': "SSS'07 - 9th international conference on Stabilization, Safety,
and Security of Distributed Systems",
'type': 'conference', 'year': 2007, 'url': 'https://inria.hal.science/hal-00668356v1',
'authors': [HALAuthor(name='Fabien Mathieu', id='fabien-mathieu', pid=446)],
'venue': "SSS'07 - 9th international conference on Stabilization, Safety,
and Security of Distributed Systems", 'origin': 'hal'}

Case of someone with multiple ids that one wants to combine:

>>> emilios = HALAuthor('Emilio Calvanese').query_id()
>>> emilios 
[HALAuthor(name='Emilio Calvanese', pid=911234)]
>>> len(emilios[0].query_publications())
69
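When several pids belong to one person, their publication lists can overlap. A minimal sketch of cumulating them, deduplicating on the 'key' entry seen in the example outputs; cumulate_publications and its fetch argument are hypothetical, fetch standing in for a per-pid HAL query.

```python
from typing import Callable, Iterable


def cumulate_publications(pids: Iterable[int], fetch: Callable[[int], list]) -> list:
    """Merge the publication lists of several pids, keeping the first copy of each paper."""
    seen = set()
    merged = []
    for pid in pids:
        for pub in fetch(pid):
            # A paper listed under two pids of the same person is kept only once.
            if pub["key"] not in seen:
                seen.add(pub["key"])
                merged.append(pub)
    return merged
```

Deduplicating on a stable identifier rather than on the title avoids merging distinct papers that happen to share a title.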

Note: an error is raised if not enough data is provided (neither id nor pid):

>>> HALAuthor('Fabien Mathieu').query_publications()
Traceback (most recent call last):
...
ValueError: HALAuthor(name='Fabien Mathieu') must have id or pid for publications to be fetched.

update_values(author)
Parameters:

author (DBAuthor) – External author info to inject into the current instance.

Return type:

None

property url

str URL associated with the author in the DB.
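The query_id examples show three URL shapes, depending on what is known about the author. The helper below rebuilds them; hal_url is hypothetical, and giving the HAL id precedence over the pid when both are set is an assumption.

```python
from typing import Optional
from urllib.parse import quote_plus


def hal_url(name: str, hal_id: Optional[str] = None, pid: Optional[int] = None) -> str:
    """Rebuild the HAL search URLs shown in the query_id examples."""
    if hal_id is not None:
        # Identified by HAL id: filter on the authIdHal_s field.
        return f"https://hal.science/search/index/?q=*&authIdHal_s={hal_id}"
    if pid is not None:
        # Only a personal id is known: filter on the authIdPerson_i field.
        return f"https://hal.science/search/index/?q=*&authIdPerson_i={pid}"
    # Not identified: fall back to a free-text search on the name.
    return f"https://hal.science/search/index?q={quote_plus(name)}"
```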

gismap.database.hal.parse_facet_author(a)
Parameters:

a (str) – Formatted APH author string from the HAL API.

Returns:

Sanitized version.

Return type:

HALAuthor