LabMaps#

Note

By default, we query both HAL and LDB. Don’t hesitate to adapt depending on the use case.

Lab authors#

Lab authors are the main ingredient for analysing a single lab (i.e. a group of researchers). You can create one from just a name, then ask GisMap to retrieve that author’s DB endpoints automatically.

[1]:
from gismap.lab import LabAuthor

maria = LabAuthor("Maria Potop")
maria.auto_sources()

Let’s look at the sources:

[2]:
maria.sources
[2]:
[HALAuthor(name='Maria Potop', key='841868', key_type='pid'),
 LDBAuthor(name='Maria Gradinariu', key='p/MariaPotopButucaru')]

Note that an author can have many names.

[3]:
maria.aliases
[3]:
['Maria Gradinariu',
 'Maria Gradinariu Potop-Butucaru',
 'Maria Potop Butucaru',
 'Maria Potop-Butucaru']

When using auto_sources, you can specify which DBs should be used (LDB, HAL, or DBLP).

[4]:
celine = LabAuthor("Céline Comte")
celine.auto_sources(dbs=["hal"])
celine.sources
[4]:
[HALAuthor(name='Céline Comte', key='celine-comte')]
[5]:
celine = LabAuthor("Céline Comte")
celine.auto_sources(dbs="dblp")
celine.sources
[5]:
[DBLPAuthor(name='Céline Comte', key='179/2173')]
[6]:
celine = LabAuthor("Céline Comte")
celine.auto_sources(dbs="ldb")
celine.sources
[6]:
[LDBAuthor(name='Céline Comte', key='179/2173')]
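The calls above pass dbs either as a list (["hal"]) or as a single string ("dblp"). A minimal sketch of how such an argument could be normalized is given below. The helper normalize_dbs is hypothetical and is not part of GisMap; the default of HAL plus LDB is taken from the note at the top of this page.

```python
def normalize_dbs(dbs=None, default=("hal", "ldb")):
    """Return a list of lowercase DB names from None, a string, or an iterable.

    Hypothetical helper for illustration only; GisMap's own handling may differ.
    """
    if dbs is None:
        # No explicit choice: fall back to the default pair of DBs.
        return list(default)
    if isinstance(dbs, str):
        # A single DB name is wrapped into a one-element list.
        dbs = [dbs]
    return [db.lower() for db in dbs]


print(normalize_dbs("dblp"))
print(normalize_dbs(["hal"]))
print(normalize_dbs())
```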

Once the sources of an author are set, you can retrieve her publications.

[7]:
[p for p in celine.get_publications().values() if p.year == 2021]
[7]:
[SourcedPublication(title='Load Balancing in Heterogeneous Server Clusters: Insights From a Product-Form Queueing Model.', authors=[LDBAuthor(name='Mark van der Boor', key='202/1688'), LabAuthor(name='Céline Comte')], venue='IWQoS', type='conference', year=2021),
 SourcedPublication(title='Performance Evaluation of Stochastic Bipartite Matching Models.', authors=[LabAuthor(name='Céline Comte'), LDBAuthor(name='Jan-Pieter L. Dorsman', key='135/1486')], venue='EPEW', type='conference', year=2021),
 SourcedPublication(title='Pass-and-swap queues.', authors=[LabAuthor(name='Céline Comte'), LDBAuthor(name='Jan-Pieter L. Dorsman', key='135/1486')], venue='Queueing Syst. Theory Appl.', type='journal', year=2021),
 SourcedPublication(title='Stochastic dynamic matching: A mixed graph-theory and linear-algebra approach.', authors=[LabAuthor(name='Céline Comte'), LDBAuthor(name='Fabien Mathieu', key='66/2077'), LDBAuthor(name='Ana Busic', key='57/3580')], venue='CoRR', type='journal', year=2021)]

Note

Lab authors can have metadata that can be used for display and further analysis. This is not covered in this tutorial.

Your first lab#

In GisMap, a LabMap is a class whose instances have three main methods:

  • update_authors automatically refreshes the members of the lab. It is useful at creation or when a lab evolves.

  • update_publis makes a full refresh of the publications of a lab. All publications from lab members are considered (temporal filtering may be enabled later).

  • expand adds moons, i.e. additional researchers that gravitate around the lab.

The simplest usable subclass of LabMap is ListMap, which uses a list of names. For example, consider the former team Gangsta from my Bell Labs days.

[8]:
from gismap.lab import ListMap

lab = ListMap(
    author_list=[
        "Fabien Mathieu",
        "Philippe Jacquet",
        "Alonso Silva",
        "Anne Bouillard",
        "François Durand (hal: fradurand, ldb:38/11269)",
        "Amira Alloum",
        "Marc-Olivier Buob",
        "Mohamed Lamine Lamali (hal:mohamed-lamine-lamali, ldb: 43/11358)",
    ],
    name="Gangsta",
)
lab.update_authors()
lab.update_publis()
INFO:GisMap:Multiple entries for Philippe Jacquet in hal
INFO:GisMap:Multiple entries for Alonso Silva in hal
INFO:GisMap:Multiple entries for Anne Bouillard in hal
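Some entries in author_list carry explicit DB keys in parentheses, e.g. "François Durand (hal: fradurand, ldb:38/11269)", which helps when a plain name is ambiguous. The sketch below shows one way such a string could be split into a name and a {db: key} mapping. The function parse_author_spec and its regular expression are assumptions made for illustration; GisMap’s actual parsing may differ.

```python
import re

# Hypothetical pattern: a name, optionally followed by "(db: key, db: key)".
SPEC = re.compile(r"^(?P<name>[^()]+?)\s*(?:\((?P<hints>[^()]*)\))?\s*$")


def parse_author_spec(spec):
    """Split 'Name (db: key, ...)' into (name, {db: key})."""
    match = SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"Unrecognized author spec: {spec!r}")
    hints = {}
    for item in (match["hints"] or "").split(","):
        if ":" in item:
            # Split on the first colon only, so keys may contain slashes etc.
            db, key = item.split(":", 1)
            hints[db.strip().lower()] = key.strip()
    return match["name"].strip(), hints


print(parse_author_spec("François Durand (hal: fradurand, ldb:38/11269)"))
print(parse_author_spec("Fabien Mathieu"))
```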

Maps can be saved with the dump method so you don’t have to re-update them all the time.

When you have a populated lab, you can export the collaboration graph with save_html. The result is a standalone HTML file that can be displayed in a notebook or embedded in a web page (e.g. with an iframe).

You can also display it directly inside a notebook, and customize the rendering through options:

[9]:
groups = {"Gangsta": {"color": "rgb(255, 0, 255)"}}
lab.show_html(groups=groups)

Let’s add some context with a few moons.

[10]:
lab.expand(target=4)
[11]:
groups["moon"] = {"display": "Usual Suspects", "color": "rgb(0,255,0)"}
lab.show_html(groups=groups)

A few things about the generated graph:

  • Authors are represented with their initials unless a picture URL is provided (implicitly or explicitly).

  • Comets are singletons (authors with no co-publications with the other nodes). They are hidden by default. For example, if you only show the moons / usual suspects, Bernard becomes a comet and is hidden.

  • You can hover over an author to get her name. If you click, you get a modal with the list of publications.

  • The width and length of an edge depend on the number of co-publications. If you click on an edge, you get a modal with the list of co-publications.

  • The menu (top-left ☰) groups graph-level actions: Redraw, Full Screen, Show/Hide Legend, Download <lab>.bib (whole-lab BibTeX), Download PNG, Copy PNG to clipboard. The bottom-right expand/compress icon is a shortcut for Full Screen.

  • Inside any modal, each publication has a [.bib] toggle (and [abstract] when available — typically HAL entries) revealing inline content with a hover-to-copy button. The modal header carries a Download .bib button that exports just the listed publications (one author’s, or one author-pair’s joint output).
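To make the notions of edge weight and comet concrete, here is a small sketch on made-up data. It counts co-publications between pairs of visible authors and reports the visible authors left without any edge. The function names and the toy data are assumptions for illustration, not GisMap’s internals.

```python
from collections import Counter
from itertools import combinations


def copublication_edges(publications, visible):
    """Count co-publications between each pair of visible authors."""
    edges = Counter()
    for authors in publications:
        # Only authors currently shown in the graph can form an edge.
        shown = sorted(set(authors) & visible)
        for pair in combinations(shown, 2):
            edges[pair] += 1
    return edges


def comets(edges, visible):
    """Visible authors that share no edge with another visible author."""
    connected = {author for pair in edges for author in pair}
    return visible - connected


# Toy data (made up): each publication is the list of its authors.
publications = [["A", "B"], ["A", "B", "C"], ["C", "D"], ["E"]]
visible = {"A", "B", "D", "E"}  # C is hidden, e.g. its group is switched off

edges = copublication_edges(publications, visible)
print(dict(edges))
print(sorted(comets(edges, visible)))
```

Here D only ever published with C, so hiding C turns D into a comet, in the same way that hiding a group can isolate an author in the real graph.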

Exporting a lab#

Beyond the interactive HTML, a populated lab can be serialized for downstream use: BibTeX for citation managers, JSON for any structured pipeline, CSV for spreadsheets and light analytics.

Every export takes an optional name= argument. When omitted, the lab’s own name= (set at construction) is used as the filename stem, and the file lands in the current working directory — i.e. a real session would just call lab.to_bib() and find Gangsta.bib next to the notebook. To keep this tutorial’s workspace clean we redirect everything to a TemporaryDirectory instead:

[12]:
import tempfile
from pathlib import Path

# TemporaryDirectory cleans itself when the object is garbage-collected;
# we'll also call .cleanup() explicitly at the end of the section.
tmp = tempfile.TemporaryDirectory()
out = Path(tmp.name)
[13]:
# Whole-lab BibTeX, written to our tempdir as Gangsta.bib
lab.to_bib(name=out / "Gangsta")
len(lab.publications)
[13]:
835

Both to_bib and the Download <lab>.bib menu entry can be restricted via query=. A string is matched by exact key or fuzzy title similarity; a callable is used as a predicate f(pub) -> bool. The same logic powers LabMap.select_publications, which is handy to preview what will be exported:

[14]:
# Filtered BibTeX — recent publications, written as Gangsta_recent.bib
recent = lambda p: p.year is not None and p.year >= 2015
lab.to_bib(query=recent, name=out / "Gangsta_recent")
[p.short_str() for p in lab.select_publications(recent)][:10]
INFO:GisMap:354 publications found.
INFO:GisMap:354 publications found.
[14]:
['"torus packing for multisets." (2024, misc) - https://doi.org/10.4230/ARTIFACTS.22479',
 '"Tree Walks and the Spectrum of Random Graphs." (2024, conference) - https://doi.org/10.4230/LIPICS.AOFA.2024.11',
 '"Combinatorics of nondeterministic walks of the Dyck and Motzkin type" (2019, conference) - https://hal.science/hal-01910727v1',
 '"Graphs with degree constraints." (2016, conference) - https://doi.org/10.1137/1.9781611974324.4',
 '"Phase transition of random non-uniform hypergraphs." (2015, journal) - https://doi.org/10.1016/J.JDA.2015.01.009',
 '"2-Xor Revisited: Satisfiability and Probabilities of Functions." (2016, journal) - https://doi.org/10.1007/S00453-016-0119-X',
 '"Active clustering for labeling training data." (2021, conference) - https://proceedings.neurips.cc/paper/2021/hash/47841cc9e552bd5c40164db7073b817b-Abstract.html',
 '"Of Kernels and Queues: when network calculus meets analytic combinatorics" (2018, conference) - https://hal.science/hal-01889101v1',
 '"Robot Positioning Using Torus Packing for Multisets." (2024, conference) - https://doi.org/10.4230/LIPICS.ICALP.2024.43',
 '"Threshold functions for small subgraphs in simple graphs and multigraphs." (2020, journal) - https://doi.org/10.1016/J.EJC.2020.103113']
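The query semantics described above (a string matched by exact key or fuzzy title similarity, a callable used as a predicate) can be sketched in pure Python as follows. Everything here is an assumption made for illustration: the Pub dataclass stands in for a GisMap publication, the 0.8 threshold is arbitrary, and difflib is used as a stand-in for whatever similarity measure GisMap actually applies.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Pub:
    """Hypothetical stand-in for a publication object."""
    key: str
    title: str
    year: Optional[int]


def select(pubs, query=None, threshold=0.8):
    """Filter publications by predicate, exact key, or fuzzy title match."""
    if query is None:
        return list(pubs)
    if callable(query):
        # A callable is used directly as a predicate f(pub) -> bool.
        return [p for p in pubs if query(p)]
    exact = [p for p in pubs if p.key == query]
    if exact:
        return exact
    # Otherwise fall back to case-insensitive title similarity.
    q = query.lower()
    return [
        p for p in pubs
        if SequenceMatcher(None, q, p.title.lower()).ratio() >= threshold
    ]


pubs = [
    Pub("k1", "Pass-and-swap queues.", 2021),
    Pub("k2", "Phase transition of random non-uniform hypergraphs.", 2015),
    Pub("k3", "An undated note", None),
]
print([p.key for p in select(pubs, lambda p: p.year is not None and p.year >= 2020)])
print([p.key for p in select(pubs, "k2")])
print([p.key for p in select(pubs, "pass and swap queues")])
```

Note that the predicate guards against year being None, as the recent lambda in the cell above does; without that guard the comparison would raise a TypeError on undated entries.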

JSON serializes authors and publications in a structured form (see Author.to_dict() / Publication.to_dict()). CSV produces two files, <name>_authors.csv and <name>_publications.csv, with | as a separator for multi-valued cells:

[15]:
lab.to_json(name=out / "Gangsta")  # Gangsta.json
lab.to_csv(name=out / "Gangsta")   # Gangsta_authors.csv and Gangsta_publications.csv
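For readers who plan to consume the CSV files, the sketch below illustrates the convention of packing a multi-valued cell with | and splitting it again on load. The column names (title, year, authors) are assumptions chosen for the example and may not match the columns GisMap writes; the two rows reuse publications listed earlier on this page.

```python
import csv
import io

rows = [
    {"title": "Pass-and-swap queues.", "year": 2021,
     "authors": ["Céline Comte", "Jan-Pieter L. Dorsman"]},
    {"title": "Performance Evaluation of Stochastic Bipartite Matching Models.",
     "year": 2021, "authors": ["Céline Comte", "Jan-Pieter L. Dorsman"]},
]

# Write: join the author list into a single "|"-separated cell.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "year", "authors"])
writer.writeheader()
for row in rows:
    writer.writerow({**row, "authors": "|".join(row["authors"])})

# Read back: split the cell to recover the list (all values come back as str).
buffer.seek(0)
loaded = [
    {**record, "authors": record["authors"].split("|")}
    for record in csv.DictReader(buffer)
]
print(loaded[0]["authors"])
```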

And of course, the HTML export:

[16]:
lab.save_html(name=out / "Gangsta")  # Gangsta.html

Listing what we just produced, then wiping the tempdir. The explicit tmp.cleanup() keeps things tidy if the kernel stays alive for a while; even if you forget it, garbage collection of tmp would trigger the same cleanup.

[17]:
sorted(p.name for p in out.iterdir())
[17]:
['Gangsta.bib',
 'Gangsta.html',
 'Gangsta.json',
 'Gangsta_authors.csv',
 'Gangsta_publications.csv',
 'Gangsta_recent.bib']
[18]:
tmp.cleanup()
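An alternative to calling tmp.cleanup() by hand is the context-manager form of TemporaryDirectory, which removes the directory as soon as the with block exits. In this sketch the write_text call is only a placeholder standing in for an export such as lab.to_bib(name=out / "Gangsta").

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    out = Path(tmp_name)
    # Placeholder for a real export, e.g. lab.to_bib(name=out / "Gangsta").
    (out / "Gangsta.bib").write_text("% placeholder\n", encoding="utf-8")
    produced = sorted(p.name for p in out.iterdir())

print(produced)      # files seen while the directory existed
print(out.exists())  # the directory is gone once the block has exited
```

This form suits scripts where the exports are consumed right away; the explicit object used above is more convenient in a notebook, where the directory has to survive across several cells.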

How-to: Adding informal “publications”#

Since version 0.5.2, you can use the add_publication method to manually add publications to a lab. Author names are automatically resolved to known authors using fuzzy matching.

First we need to build a lab.

[19]:
from gismap.lab import ListMap

lab = ListMap(author_list=["Fabien Mathieu", "Céline Comte"], name="Dream Team")
lab.update_authors()
lab.update_publis()

Then we just call add_publication with a title and a list of author names. Known authors are matched automatically; unknown ones become Outsiders.

[20]:
lab.add_publication(
    title="Informal discussions on GisMap",
    authors=["Fabien Mathieu", "Céline Comte", "John Doe"],
    year=2026,
    venue="Zoom meetings",
)
lab.show_html()
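The resolution step can be pictured with the following sketch, which matches each given name against the known lab members and sets aside the names that match nobody. It relies on difflib.get_close_matches with an arbitrary 0.8 cutoff; both the function resolve_authors and the cutoff are assumptions for illustration, and GisMap’s own matching may behave differently.

```python
from difflib import get_close_matches


def resolve_authors(names, known, cutoff=0.8):
    """Map each name to its closest known author, or flag it as an outsider."""
    lookup = {k.lower(): k for k in known}
    resolved, outsiders = [], []
    for name in names:
        hits = get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
        if hits:
            resolved.append(lookup[hits[0]])
        else:
            # No known author is close enough: treat the name as an outsider.
            outsiders.append(name)
    return resolved, outsiders


known = ["Fabien Mathieu", "Céline Comte"]
resolved, outsiders = resolve_authors(
    ["fabien mathieu", "Celine Comte", "John Doe"], known
)
print(resolved)
print(outsiders)
```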

Make your own LabMap#

GisMap is intended to make it easy to create LabMaps in many contexts.

The easiest way to manage a lab, apart from using ListMaps as shown above, is to define an internal method _author_iterator that returns lab authors. Once that is done, you can create and refresh LabMaps as you see fit.

How the iterator works is 100% up to you. Most of the time, this is done by scraping some Web page(s) (see the gallery for examples), but many other options exist, e.g. reading authors from a file, from an LDAP directory…

For example, this is the entire code required for handling the Solace team.