Database
Classes and functions to interact with databases of publications.
Models
Abstract description of GisMap DB interface.
LDB (Local DBLP)
Interface for the dblp computer science bibliography (https://dblp.org/) using a local copy of the database.
- class gismap.sources.ldb.LDB
Browse DBLP from a local copy of the database.
- dump(*args, **kwargs)
Save instance to file.
- Parameters:
Examples
>>> import tempfile
>>> v1 = ToyClass(42)
>>> v2 = ToyClass()
>>> v2.value
0
>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     v1.dump(filename='myfile', compress=True, path=tmpdirname)
...     dir_content = [file.name for file in Path(tmpdirname).glob('*')]
...     v2 = ToyClass.load(filename='myfile', path=Path(tmpdirname))
...     v1.dump(filename='myfile', compress=True, path=tmpdirname) # doctest.ELLIPSIS
File ...myfile.pkl.zst already exists! Use overwrite option to overwrite.
>>> dir_content
['myfile.pkl.zst']
>>> v2.value
42

>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     v1.dump(filename='myfile', compress=False, path=tmpdirname)
...     v1.dump(filename='myfile', compress=False, path=tmpdirname) # doctest.ELLIPSIS
File ...myfile.pkl already exists! Use overwrite option to overwrite.

>>> v1.value = 51
>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     v1.dump(filename='myfile', path=tmpdirname, compress=False)
...     v1.dump(filename='myfile', path=tmpdirname, overwrite=True, compress=False)
...     v2 = ToyClass.load(filename='myfile', path=tmpdirname)
...     dir_content = [file.name for file in Path(tmpdirname).glob('*')]
>>> dir_content
['myfile.pkl']
>>> v2.value
51

>>> with tempfile.TemporaryDirectory() as tmpdirname:
...     v2 = ToyClass.load(filename='thisfilenamedoesnotexist')
Traceback (most recent call last):
...
FileNotFoundError: [Errno 2] No such file or directory: ...
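The examples above use a ToyClass that is not defined in this section. The following is a minimal sketch of the overwrite guard they illustrate. The names `ToyIO` and `ToyClass`, the JSON storage, and the `.json` suffix are all illustrative assumptions, and compression is left out. This is not GisMap's implementation.

```python
import json
import tempfile
from pathlib import Path


class ToyIO:
    """Hypothetical stand-in for a dump/load mixin (not GisMap's code)."""

    def dump(self, filename, path=".", overwrite=False):
        # Assumed naming scheme: <path>/<filename>.json
        dest = Path(path) / f"{filename}.json"
        if dest.exists() and not overwrite:
            # Refuse to clobber an existing file unless explicitly asked.
            print(f"File {dest} already exists! Use overwrite option to overwrite.")
            return False
        dest.write_text(json.dumps(vars(self)))
        return True

    @classmethod
    def load(cls, filename, path="."):
        # Path.read_text raises FileNotFoundError if the file is missing.
        state = json.loads((Path(path) / f"{filename}.json").read_text())
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj


class ToyClass(ToyIO):
    def __init__(self, value=0):
        self.value = value


with tempfile.TemporaryDirectory() as tmp:
    v1 = ToyClass(42)
    first = v1.dump("myfile", path=tmp)
    second = v1.dump("myfile", path=tmp)  # blocked: file already exists
    v1.value = 51
    third = v1.dump("myfile", path=tmp, overwrite=True)
    restored = ToyClass.load("myfile", path=tmp)
```

In this sketch the second call leaves the file untouched, and only the call with `overwrite=True` replaces the stored value.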
- class gismap.sources.ldb.LDBPublication(title: str, authors: list, venue: str, type: str, year: int, key: str, metadata: dict = <factory>)
- gismap.sources.dblp_ttl.get_stream(source, chunk_size=65536)
- Parameters:
- Yields:
iterable – Chunk iterator that streams the content.
int – Source size (used later to compute ETA).
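As a rough illustration of the chunk-iterator-plus-size pattern described above, here is a minimal sketch for a local file. The helper `stream_file` is hypothetical and is not the library's `get_stream`, whose accepted sources and exact return form are not documented here.

```python
import os
import tempfile


def stream_file(path, chunk_size=65536):
    """Hypothetical helper: return (chunk iterator, total size in bytes)."""
    size = os.path.getsize(path)

    def chunks():
        with open(path, "rb") as f:
            # iter() with a sentinel stops at the first empty read (EOF).
            yield from iter(lambda: f.read(chunk_size), b"")

    return chunks(), size


# Build a throwaway file to stream.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 150_000)

chunk_iter, total = stream_file(tmp.name)
read = 0
n_chunks = 0
for chunk in chunk_iter:
    read += len(chunk)
    n_chunks += 1
    progress = read / total  # fraction done, the basis for an ETA estimate
os.remove(tmp.name)
```

Knowing the total size up front is what lets a caller turn the bytes read so far into a progress fraction and, from there, an ETA.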
- gismap.sources.dblp_ttl.parse_block(dblp_block)
- Parameters:
dblp_block (str) – A DBLP publication, turtle format.
- Returns:
key (str) – DBLP key.
title (str) – Publication title.
type (str) – Type of publication.
authors (dict) – Publication authors (key -> name).
url (str or NoneType) – Publication URL.
stream (list or NoneType) – Publication streams (normalized journal/conf).
pages (str or NoneType) – Publication pages.
venue (str or NoneType) – Publication venue (conf/journal).
year (int) – Year of publication.
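To make the idea of extracting fields from a turtle block concrete, here is a minimal sketch that uses regular expressions on a simplified sample. The sample block, the predicate names (`dblp:title`, `dblp:yearOfPublication`, `dblp:pagination`), and the helper `parse_toy_block` are assumptions for illustration only. The actual `parse_block` may work differently, and regex parsing like this does not handle escaped quotes.

```python
import re

# Simplified, hand-written sample; a real DBLP block carries many more fields.
SAMPLE = '''<https://dblp.org/rec/conf/sss/Mathieu07>
    rdf:type dblp:Inproceedings ;
    dblp:title "Upper Bounds for Stabilization in Acyclic Preference-Based Systems." ;
    dblp:yearOfPublication "2007"^^xsd:gYear .
'''


def first_match(pattern, text):
    """Return the first captured group, or None when the field is absent."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


def parse_toy_block(block):
    """Hypothetical parser: pull a few fields out of one turtle block."""
    year = first_match(r'dblp:yearOfPublication\s+"(\d{4})"', block)
    return {
        "key": first_match(r"<https://dblp\.org/rec/([^>]+)>", block),
        "title": first_match(r'dblp:title\s+"([^"]*)"', block),
        "pages": first_match(r'dblp:pagination\s+"([^"]*)"', block),
        "year": int(year) if year is not None else None,
    }


parsed = parse_toy_block(SAMPLE)
```

Optional fields such as pages come back as `None` when the block does not mention them, matching the `str or NoneType` entries in the list above.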
DBLP (online)
Interface for the dblp computer science bibliography (https://dblp.org/).
- class gismap.sources.dblp.DBLP
- classmethod search_author(name, wait=True)
- Parameters:
- Returns:
Potential matches.
- Return type:
Examples
>>> fabien = DBLP.search_author("Fabien Mathieu")
>>> fabien
[DBLPAuthor(name='Fabien Mathieu', key='66/2077')]
>>> fabien[0].url
'https://dblp.org/pid/66/2077.html'
>>> manu = DBLP.search_author("Manuel Barragan")
>>> manu
[DBLPAuthor(name='Manuel Barragan', key='07/10587'), DBLPAuthor(name='Manuel Barragan', key='83/3865'), DBLPAuthor(name='Manuel Barragan', key='188/0198')]
>>> DBLP.search_author("NotaSearcherName", wait=False)
[]
- class gismap.sources.dblp.DBLPAuthor(name: str, key: str, aliases: list = <factory>)
Examples
>>> fabien = DBLPAuthor('Fabien Mathieu', key='66/2077')
>>> publications = sorted(fabien.get_publications(),
...                 key=lambda p: p.title)
>>> publications[0].url
'https://dblp.org/rec/conf/iptps/BoufkhadMMPV08.html'
>>> publications[-1]
DBLPPublication(title='Upper Bounds for Stabilization in Acyclic Preference-Based Systems.', authors=[DBLPAuthor(name='Fabien Mathieu', key='66/2077')], venue='SSS', type='conference', year=2007, key='conf/sss/Mathieu07')
HAL
Interface for HyperArticles en Ligne (https://hal.science/).
- class gismap.sources.hal.HAL
- classmethod from_author(a)
Examples
>>> fabien = HAL.search_author("Fabien Mathieu")[0]
>>> publications = sorted(fabien.get_publications(), key=lambda p: p.title)
>>> publications[2]
HALPublication(title='Achievable Catalog Size in Peer-to-Peer Video-on-Demand Systems', authors=[HALAuthor(name='Yacine Boufkhad', key='yacine-boufkhad'), HALAuthor(name='Fabien Mathieu', key='fabien-mathieu'), HALAuthor(name='Fabien de Montgolfier', key='949013', key_type='pid'), HALAuthor(name='Diego Perino', key='Diego Perino', key_type='fullname'), HALAuthor(name='Laurent Viennot', key='laurentviennot')], venue='Proceedings of the 7th Internnational Workshop on Peer-to-Peer Systems (IPTPS)', type='conference', year=2008, key='471724')
>>> diego = publications[2].authors[3]
>>> diego
HALAuthor(name='Diego Perino', key='Diego Perino', key_type='fullname')
>>> len(diego.get_publications()) > 28
True
>>> publications[-7]
HALPublication(title='Upper bounds for stabilization in acyclic preference-based systems', authors=[HALAuthor(name='Fabien Mathieu', key='fabien-mathieu')], venue="SSS'07 - 9th international conference on Stabilization, Safety, and Security of Distributed Systems", type='conference', year=2007, key='668356')

Case of an author with multiple ids whose publications should be combined:

>>> maria = HAL.search_author('Maria Potop-Butucaru')
>>> maria
[HALAuthor(name='Maria Potop-Butucaru', key='841868', key_type='pid')]
>>> n_pubs = len(HAL.from_author(maria[0]))
>>> n_pubs > 200
True
>>> n_pubs == len(maria[0].get_publications())
True

Note: an error is raised if not enough data is provided:

>>> HAL.from_author(HALAuthor('Fabien Mathieu'))
Traceback (most recent call last):
...
ValueError: HALAuthor(name='Fabien Mathieu') must have a key for publications to be fetched.
- classmethod search_author(name)
Examples
>>> fabien = HAL.search_author("Fabien Mathieu")
>>> fabien
[HALAuthor(name='Fabien Mathieu', key='fabien-mathieu')]
>>> fabien = fabien[0]
>>> fabien.url
'https://hal.science/search/index/?q=*&authIdHal_s=fabien-mathieu'
>>> HAL.search_author("Laurent Viennot")[0]
HALAuthor(name='Laurent Viennot', key='laurentviennot')
>>> HAL.search_author("NotaSearcherName")
[]
>>> HAL.search_author("Ana Busic")
[HALAuthor(name='Ana Busic', key='anabusic')]
>>> HAL.search_author("Potop-Butucaru Maria")
[HALAuthor(name='Potop-Butucaru Maria', key='841868', key_type='pid')]
>>> diego = HAL.search_author("Diego Perino")
>>> diego
[HALAuthor(name='Diego Perino', key='847558', key_type='pid'), HALAuthor(name='Diego Perino', key='978810', key_type='pid')]
>>> diego[1].url
'https://hal.science/search/index/?q=*&authIdPerson_i=978810'
- class gismap.sources.hal.HALAuthor(name: str, key: str | int = None, key_type: str = None, aliases: list = <factory>, _url: str = None, _img: str = None, _cv: bool = None)
Multi-source
Interface for handling multiple sources at once.
- class gismap.sources.multi.SourcedPublication(title: str, authors: list, venue: str, type: str, year: int, sources: list = <factory>)
- gismap.sources.multi.regroup_authors(auth_dict, pub_dict)
Replace authors of publications with matching authors. Typical use: upgrade DB-specific authors to multisource authors.
Replacement is in place.
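The in-place replacement described above can be sketched as follows. This is a minimal sketch under stated assumptions: `auth_dict` is taken to map a DB-specific author key to its replacement author, `pub_dict` to map a publication key to a publication, and the classes `Author`, `MultiAuthor`, `Publication` and the function `regroup` are hypothetical stand-ins rather than GisMap's own types or code.

```python
from dataclasses import dataclass, field


@dataclass
class Author:  # hypothetical DB-specific author
    name: str
    key: str


@dataclass
class MultiAuthor:  # hypothetical multisource author
    name: str
    keys: list = field(default_factory=list)


@dataclass
class Publication:  # hypothetical publication record
    title: str
    authors: list


def regroup(auth_dict, pub_dict):
    """Swap each publication's authors for their match in auth_dict, in place."""
    for pub in pub_dict.values():
        # Authors without a match are kept unchanged (an assumption).
        pub.authors = [auth_dict.get(a.key, a) for a in pub.authors]


fabien = MultiAuthor("Fabien Mathieu", keys=["66/2077", "fabien-mathieu"])
auth_dict = {"66/2077": fabien}
pubs = {
    "p1": Publication(
        "A title",
        [Author("Fabien Mathieu", "66/2077"), Author("Someone Else", "00/0000")],
    )
}
result = regroup(auth_dict, pubs)  # mutates pubs, returns None
```

Because the publications are mutated directly, every holder of a reference to them sees the upgraded authors without any reassignment.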