Database#

Classes and functions to interact with databases of publications.

Blueprint#

Abstract description of GISMAP's DB interface.

class gismap.database.blueprint.DBAuthor(name: str, id: str = None, aliases: list = <factory>)[source]#

Blueprint for DB-specific author management.

aliases: list#: Alternative names for the author.

db_name: ClassVar[str] = None#: Name of the database.

id: str = None#: Id of author in DB.

property is_set#: bool Is the author identified in DB?

iter_keys()[source]#

Yields:: str or int – Key of author (typically name, alias, or internal key).

name: str#: Author name.

populate_id(s=None)[source]#

Try to automatically fill in DB information. If exactly one match is found, its data is integrated.

Otherwise, a warning is issued, with some suggestions/URLs.

Parameters:: s (Session, optional) – Session.
Returns:: Number of matches found (i.e. 1 means success).
Return type:: int
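
The single-match rule described above can be sketched as follows. This is only an illustration under stated assumptions, not GISMAP's implementation: `Author`, `fill_id_if_unique`, and the hard-coded candidate lists are hypothetical stand-ins for `DBAuthor`, `populate_id`, and whatever `query_id` would return.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Author:
    """Hypothetical stand-in for a DBAuthor-like record."""
    name: str
    id: str = None
    aliases: list = field(default_factory=list)


def fill_id_if_unique(author, matches):
    """Copy the id from `matches` into `author` only when exactly one match exists.

    Returns the number of matches, so 1 means success.
    """
    if len(matches) == 1:
        author.id = matches[0].id
    else:
        warnings.warn(f"{len(matches)} matches found for {author.name!r}; id left unset.")
    return len(matches)


# One candidate: the id is integrated.
solo = Author("Ada Example")
n_solo = fill_id_if_unique(solo, [Author("Ada Example", id="ada-example")])

# Two candidates: nothing is integrated and a warning is issued.
ambiguous = Author("Bob Example")
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the demo warning
    n_ambiguous = fill_id_if_unique(
        ambiguous,
        [Author("Bob Example", id="bob-1"), Author("Bob Example", id="bob-2")],
    )
```

Returning the match count rather than a boolean lets the caller tell "no match" (0) apart from "ambiguous" (2 or more).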

query_id(s=None)[source]#

Parameters:: s (Session, optional) – Session.
Returns:: Potential matches.
Return type:: list

query_id_backoff: ClassVar[float] = 0.0#: Time to wait between two consecutive query_id calls.

query_publications(s=None)[source]#

Parameters:: s (Session, optional) – Session.
Returns:: Papers available in DB.
Return type:: list

query_publications_backoff: ClassVar[float] = 0.0#: Time to wait between two consecutive query_publications calls.
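
One way a backoff value like the ones above could be enforced is sketched below. `Throttle` is a hypothetical helper that is not part of GISMAP, and the sketch assumes the backoff is expressed in seconds. The clock and sleep functions are injectable so the demo runs without actually waiting.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum delay between successive calls."""

    def __init__(self, backoff, clock=time.monotonic, sleep=time.sleep):
        self.backoff = backoff  # assumed to be in seconds
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough that calls are at least `backoff` apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.backoff - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(backoff=2.0, clock=lambda: fake_now[0], sleep=fake_sleep)
throttle.wait()      # first call: no delay
throttle.wait()      # immediate second call: waits the full backoff
fake_now[0] += 5.0   # pretend five seconds pass
throttle.wait()      # enough time has elapsed: no delay
```

With the default clock and sleep arguments, calling `wait()` before each query would space real requests by at least `backoff`.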

update_values(author)[source]#

Parameters:: author (DBAuthor) – External author info to inject in current instance.
Return type:: None

property url#: str URL associated with the author in the DB.

gismap.database.blueprint.clean_aliases(name, alias_list)[source]#

Parameters:

name (str) – Main name.
alias_list (list) – Aliases.

Returns:

Aliases deduped, sorted, and with main name removed.

Return type:

list
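
The documented behavior (dedupe, sort, drop the main name) can be reproduced with a set difference. The function below is a sketch of that contract, named `clean_aliases_sketch` to make clear it is not the library's own code.

```python
def clean_aliases_sketch(name, alias_list):
    """Return aliases deduplicated, sorted, and without the main name."""
    return sorted(set(alias_list) - {name})


cleaned = clean_aliases_sketch(
    "Fabien Mathieu",
    ["F. Mathieu", "Fabien Mathieu", "Mathieu Fabien", "F. Mathieu"],
)
# cleaned is ['F. Mathieu', 'Mathieu Fabien']
```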

DBLP#

Interface for the dblp computer science bibliography (https://dblp.org/).

class gismap.database.dblp.DBLPAuthor(name: str, id: str = None, aliases: list = <factory>)[source]#

db_name: ClassVar[str] = 'dblp'#: Name of the database.

query_id(s=None)[source]#

Parameters:: s (Session, optional) – Session.
Returns:: Potential matches.
Return type:: list

query_id_backoff: ClassVar[float] = 7.0#: Time to wait between two consecutive query_id calls.

query_publications(s=None)[source]#

Parameters:: s (Session, optional) – Session.
Returns:: Papers available in DB.
Return type:: list

query_publications_backoff: ClassVar[float] = 2.0#: Time to wait between two consecutive query_publications calls.

property url#: str URL associated with the author in the DB.

HAL#

Interface for HyperArticles en Ligne (https://hal.science/).

class gismap.database.hal.HALAuthor(name: str, id: str = None, aliases: list = <factory>, pid: int = None, alt_pids: list = <factory>)[source]#

alt_pids: list#: An author has a single hal-id but may have several Personal Ids (pids). Extra pids should be put here.

db_name: ClassVar[str] = 'hal'#: Name of the database.

property is_set#: bool Is the author identified in DB?

static parse_entry(r)[source]#

Parameters:: r (dict) – Raw dict of a result (paper).
Returns:: The paper as a sanitized dictionary.
Return type:: dict

pid: int = None#: Personal Id, an integer that can be used when hal-id is not available.

query_id(s=None)[source]#

Parameters:: s (Session, optional) – Session.
Returns:: Potential matches.
Return type:: list

Examples

>>> fabien = HALAuthor("Fabien Mathieu")
>>> fabien
HALAuthor(name='Fabien Mathieu')
>>> fabien.url
'https://hal.science/search/index?q=Fabien+Mathieu'
>>> fabien.populate_id()
1
>>> fabien
HALAuthor(name='Fabien Mathieu', id='fabien-mathieu')
>>> fabien.url
'https://hal.science/search/index/?q=*&authIdHal_s=fabien-mathieu'
>>> laurent = HALAuthor("Laurent Viennot")
>>> laurent.query_id()
[HALAuthor(name='Laurent Viennot', id='laurentviennot')]
>>> unknown = HALAuthor("NotaSearcherName")
>>> unknown
HALAuthor(name='NotaSearcherName')
>>> unknown.populate_id()
0
>>> unknown
HALAuthor(name='NotaSearcherName')
>>> ana = HALAuthor("Ana Busic")
>>> ana.populate_id()
1
>>> ana
HALAuthor(name='Ana Busic', id='anabusic')
>>> diego = HALAuthor("Diego Perino") 
>>> diego.query_id()
[HALAuthor(name='Diego Perino', pid=847558), HALAuthor(name='Diego Perino', pid=978810)]
>>> HALAuthor(name='Diego Perino', pid=978810).url
'https://hal.science/search/index/?q=*&authIdPerson_i=978810'

query_publications(s=None)[source]#

Parameters:: s (Session, optional) – Session.
Returns:: Papers available in HAL.
Return type:: list

Examples

>>> fabien = HALAuthor(name='Fabien', id='fabien-mathieu')
>>> publications = sorted(fabien.query_publications(),
...                 key=lambda p: p['title'])
>>> publications[2] 
{'title': 'Achievable Catalog Size in Peer-to-Peer Video-on-Demand Systems',
'abstract': 'We analyze a system where $n$ set-top boxes with same upload and storage capacities collaborate to
serve $r$ videos simultaneously (a typical value is $r=n$). We give upper and lower bounds on the catalog size
of the system, i.e. the maximal number of distinct videos that can be stored in such a system so that any demand
of at most $r$ videos can be served. Besides $r/n$, the catalog size is constrained by the storage capacity, the
upload capacity, and the maximum number of simultaneous connections a box can open. We show that the achievable
catalog size drastically increases when the upload capacity of the boxes becomes strictly greater than the
playback rate of videos.', 'key': '471724',
'conference': 'Proceedings of the 7th Internnational Workshop on Peer-to-Peer Systems (IPTPS)',
'type': 'conference', 'year': 2008, 'url': 'https://inria.hal.science/inria-00471724v1',
'authors': [HALAuthor(name='Yacine Boufkhad', id='yacine-boufkhad', pid=7352),
HALAuthor(name='Fabien Mathieu', id='fabien-mathieu', pid=446),
HALAuthor(name='Fabien de Montgolfier', pid=949013), HALAuthor(name='Diego Perino'),
HALAuthor(name='Laurent Viennot', id='laurentviennot', pid=1841)],
'venue': 'Proceedings of the 7th Internnational Workshop on Peer-to-Peer Systems (IPTPS)', 'origin': 'hal'}
>>> publications[-7] 
{'title': 'Upper bounds for stabilization in acyclic preference-based systems',
'abstract': 'Preference-based systems (p.b.s.) describe interactions between nodes of a system that can rank
their neighbors. Previous work has shown that p.b.s. converge to a unique locally stable matching if an
acyclicity property is verified. In the following we analyze acyclic p.b.s. with respect to the
self-stabilization theory. We prove that the round complexity is bounded by n/2 for the adversarial daemon.
The step complexity is equivalent to (n^2)/4 for the round robin daemon, and exponential for the general
adversarial daemon.', 'key': '668356',
'conference': "SSS'07 - 9th international conference on Stabilization, Safety,
and Security of Distributed Systems",
'type': 'conference', 'year': 2007, 'url': 'https://inria.hal.science/hal-00668356v1',
'authors': [HALAuthor(name='Fabien Mathieu', id='fabien-mathieu', pid=446)],
'venue': "SSS'07 - 9th international conference on Stabilization, Safety,
and Security of Distributed Systems", 'origin': 'hal'}

Case of an author with multiple ids whose publications one wants to combine:

>>> emilios = HALAuthor('Emilio Calvanese').query_id()
>>> emilios 
[HALAuthor(name='Emilio Calvanese', pid=911234)]
>>> len(emilios[0].query_publications())
69
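
When publications are gathered under several ids, the same paper may come back more than once. The sketch below shows one possible way to combine such lists, using the `key` field that appears in the paper dictionaries above. `merge_publications` and the sample data are hypothetical, and GISMAP may handle this differently.

```python
def merge_publications(*batches):
    """Combine several lists of paper dicts, keeping the first paper seen per 'key'."""
    merged = {}
    for batch in batches:
        for paper in batch:
            merged.setdefault(paper["key"], paper)
    return list(merged.values())


# Hypothetical results fetched for two pids of the same person.
from_pid_a = [{"key": "1", "title": "Paper A"}, {"key": "2", "title": "Paper B"}]
from_pid_b = [{"key": "2", "title": "Paper B"}, {"key": "3", "title": "Paper C"}]
combined = merge_publications(from_pid_a, from_pid_b)
```

Because Python dictionaries preserve insertion order, the merged list keeps papers in the order they were first seen.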

Note: an error is raised if neither id nor pid is provided:

>>> HALAuthor('Fabien Mathieu').query_publications()
Traceback (most recent call last):
...
ValueError: HALAuthor(name='Fabien Mathieu') must have id or pid for publications to be fetched.

update_values(author)[source]#

Parameters:: author (DBAuthor) – External author info to inject in current instance.
Return type:: None

property url#: str URL associated with the author in the DB.
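
The three URL shapes shown in the `query_id` examples (name search, hal-id, pid) can be reproduced with the sketch below. `hal_url_sketch` is a hypothetical helper, and giving the hal-id priority over the pid is an assumption, since the examples never show a URL for an author with both.

```python
from urllib.parse import quote_plus

HAL_SEARCH = "https://hal.science/search/index"


def hal_url_sketch(name, id=None, pid=None):
    """Build a HAL search URL: hal-id first (assumed), then pid, then plain name search."""
    if id is not None:
        return f"{HAL_SEARCH}/?q=*&authIdHal_s={id}"
    if pid is not None:
        return f"{HAL_SEARCH}/?q=*&authIdPerson_i={pid}"
    return f"{HAL_SEARCH}?q={quote_plus(name)}"


by_name = hal_url_sketch("Fabien Mathieu")
by_id = hal_url_sketch("Fabien Mathieu", id="fabien-mathieu")
by_pid = hal_url_sketch("Diego Perino", pid=978810)
```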

gismap.database.hal.parse_facet_author(a)[source]#

Parameters:: a (str) – Formatted APH author string from HAL API.
Returns:: Sanitized version.
Return type:: HALAuthor
