FAQ

I don’t want to install GisMap on my computer

No worries, you can play with it using https://mybinder.org/.

For example:

LabMaps

DB, Lab, Author, Maps… What does that mean?

The internal structure of GisMap interleaves simple concepts, like author, DB, publication, and lab.

  • An Author is a researcher who has published papers. Internally, it’s just a name.

  • A DB is where the information comes from, like the HAL portal or The DBLP Computer Science Bibliography.

  • A DB should provide methods to search for authors and to retrieve an author’s publications. These methods return lists of DBAuthor and DBPublication, respectively.

  • A given author/publication can be present in multiple DBs, and quite frequently multiple times in the same DB. GisMap uses SourcedAuthor and SourcedPublication to group these entries under a single banner.

  • A LabMap is made of authors (LabAuthor, i.e. SourcedAuthor with additional metadata) and publications (SourcedPublication).

  • An EgoMap is a special LabMap dedicated to a single researcher, showing her co-authors and possibly their co-authors.
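To make this vocabulary concrete, here is a minimal, stdlib-only sketch of how several DB entries for one person can sit under a single sourced author. The class names mirror the concepts above, but DBAuthor, SourcedAuthor, and the helper group_by_name are hypothetical stand-ins written for illustration, not GisMap's actual classes. Grouping on the exact name is deliberately naive: homonyms make it insufficient in practice, which is why GisMap lets you specify sources manually.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the concepts above, not GisMap's real classes.
@dataclass(frozen=True)
class DBAuthor:
    db: str    # the DB the entry comes from, e.g. "hal" or "ldb"
    name: str
    key: str   # identity key inside that DB


@dataclass
class SourcedAuthor:
    name: str
    sources: list = field(default_factory=list)  # every DBAuthor entry for this person


def group_by_name(entries):
    """Naively regroup DB entries: one SourcedAuthor per exact name."""
    grouped = {}
    for entry in entries:
        author = grouped.setdefault(entry.name, SourcedAuthor(entry.name))
        author.sources.append(entry)
    return grouped


entries = [
    DBAuthor("hal", "Céline Comte", "celine-comte"),
    DBAuthor("ldb", "Céline Comte", "179/2173"),
]
authors = group_by_name(entries)
print(len(authors), len(authors["Céline Comte"].sources))  # 1 2
```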

Why are there errors in the authors / publications I get?

GisMap operates on real-life databases. The list of things that can go wrong is veeerrryyyy long. For example:

  • One DB may be temporarily down (DBLP can have issues sometimes);

  • Papers may be entered with bad author information;

  • A paper may exist in multiple versions (e.g. report, conference, journal), possibly with different spellings;

  • One given author may have multiple identity keys in the same DB;

  • Multiple authors may share the same name (and sometimes identity key) in the same DB.

We do what we can to smooth things out automatically without any manual intervention, but sometimes it’s not enough. However, 80% of the issues you may encounter can probably be solved by manual adjustment (see below).

How are publications grouped together?

We use Bag of Factors, a package based on Joint Complexity.

If the titles are close enough, then we consider that it is the same paper. In case of multiple sources, one main source is selected (based on year, type, and other things).

Note

We are aware that merging from title similarities has its flaws. But after comparing with other options (e.g. strict equality on title, authors, year, and possibly venue), we decided that it was a good compromise.
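The idea of merging on title proximity can be sketched with the standard library. This is not GisMap's method: difflib.SequenceMatcher stands in for Bag of Factors / Joint Complexity, the 0.9 threshold is arbitrary, and the main-source rule (venue type first, then most recent year) is a hypothetical simplification of "year, type, and other things". The publication entries are made up.

```python
from difflib import SequenceMatcher

# Assumptions: difflib's ratio stands in for Bag of Factors / Joint Complexity,
# and 0.9 is an arbitrary similarity threshold.
THRESHOLD = 0.9
# Hypothetical preference order when choosing the main source of a group.
TYPE_RANK = {"journal": 0, "conference": 1, "report": 2}


def close_titles(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= THRESHOLD


def group_publications(pubs):
    """Greedily add each publication to the first group whose first title is close enough."""
    groups = []
    for pub in pubs:
        for group in groups:
            if close_titles(pub["title"], group[0]["title"]):
                group.append(pub)
                break
        else:  # no close group found: start a new one
            groups.append([pub])
    return groups


def main_source(group):
    # Hypothetical rule: best venue type first, then the most recent year.
    return min(group, key=lambda p: (TYPE_RANK.get(p["type"], len(TYPE_RANK)), -p["year"]))


pubs = [  # made-up entries
    {"title": "A Study of Load Balancing in Server Farms", "type": "report", "year": 2021},
    {"title": "A study of load-balancing in server farms", "type": "conference", "year": 2022},
    {"title": "Queueing Networks with Reneging", "type": "journal", "year": 2020},
]
groups = group_publications(pubs)
print(len(groups))                     # 2
print(main_source(groups[0])["type"])  # conference
```

The greedy pass compares each title only with the first member of each group, which keeps the sketch short; a real implementation would also have to decide how to handle chains of titles that are each close to the next but not to the first.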

How to enter a researcher?

When a researcher is uniquely defined by her name, you can just enter that name, e.g.

```python
from gismap.lab import LabAuthor

cc = LabAuthor("Céline Comte")
cc.auto_sources()  # search the available DBs for this name
cc.sources
```

```text
[HALAuthor(name='Céline Comte', key='celine-comte'),
 LDBAuthor(name='Céline Comte', key='179/2173')]
```

However, sometimes the name is ambiguous, and you may get multiple keys that correspond to multiple authors.

```python
fd = LabAuthor("François Durand")
fd.auto_sources()
fd.sources
```

```text
INFO:GisMap:Multiple entries for François Durand in hal
[HALAuthor(name='François Durand', key='francois-durand'),
 HALAuthor(name='François Durand', key='francois-durand-dastes'),
 HALAuthor(name='François Durand', key='fradurand'),
 HALAuthor(name='François Durand', key='1254455', key_type='pid'),
 HALAuthor(name='François Durand', key='1370418', key_type='pid'),
 HALAuthor(name='François Durand', key='1296653', key_type='pid'),
 HALAuthor(name='François Durand', key='1551635', key_type='pid'),
 HALAuthor(name='François Durand', key='1524491', key_type='pid'),
 HALAuthor(name='François Durand', key='1276470', key_type='pid'),
 LDBAuthor(name='François Durand', key='38/11269')]
```

When that happens, you need to manually enter the desired keys. This can be done by specifying the sources:

```python
from gismap import HALAuthor, LDBAuthor

# Keep only the HAL and LDB keys that belong to the intended researcher
fd = LabAuthor.from_sources(
    [
        HALAuthor(name="François Durand", key="fradurand"),
        LDBAuthor(name="François Durand", key="38/11269"),
    ]
)
fd.sources
```

```text
[HALAuthor(name='François Durand', key='fradurand'),
 LDBAuthor(name='François Durand', key='38/11269')]
```

For convenience, you can also enter keys in parentheses. For example, an exact equivalent of the above is:

```python
# Same sources as above, written with the parenthesis notation
fd = LabAuthor("François Durand (hal: fradurand, ldb: 38/11269)")
fd.sources
```

```text
[HALAuthor(name='François Durand', key='fradurand'),
 LDBAuthor(name='François Durand', key='38/11269')]
```

Note

  • The parenthesis notation can also be used to specify other metadata (URL, image, group) with the same “key: value, …” format

  • For proper parsing, a name should contain no parentheses, a key should not contain “,” or “:”, and a value should not contain “,”

  • You can enter multiple keys for the same DB, e.g. “My name (hal: key1, hal: key2)”
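The parsing rules above can be illustrated with a small parser. This is a sketch written for this FAQ, not GisMap's own implementation, and parse_author is a hypothetical name. Splitting each entry on its first ":" is what lets a value such as a URL contain colons, while "," always separates entries; returning a list of pairs rather than a dict keeps repeated DB keys such as "hal: key1, hal: key2".

```python
import re

# Name without parentheses, optionally followed by "(key: value, ...)".
AUTHOR_PATTERN = re.compile(r"^\s*([^()]+?)\s*(?:\((.*)\))?\s*$")


def parse_author(text):
    """Hypothetical parser for "Name (key: value, ...)"; returns (name, [(key, value), ...])."""
    match = AUTHOR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    name, inner = match.group(1), match.group(2)
    pairs = []  # a list, not a dict, so that repeated keys such as "hal" are all kept
    if inner:
        for item in inner.split(","):            # values must not contain ","
            key, _, value = item.partition(":")  # split on the first ":" only
            pairs.append((key.strip(), value.strip()))
    return name, pairs


print(parse_author("François Durand (hal: fradurand, ldb: 38/11269)"))
# ('François Durand', [('hal', 'fradurand'), ('ldb', '38/11269')])
print(parse_author("My name (hal: key1, hal: key2)"))
# ('My name', [('hal', 'key1'), ('hal', 'key2')])
```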

There is a publication that should not be there in my LabMap

The first thing to check is whether the publication has an external author with the same name as one of the lab’s authors. If that is the case, try manually specifying the sources of the lab author to avoid this.

If the homonym pollution comes from inside a DB itself (e.g. a single DBLP pid shared by several people), the cleanest fix is to drop the offending entries when constructing the LabMap (see the next subsection). You can also preview which publications a single LabAuthor returns, e.g. to confirm homonym pollution before deciding how to filter it:

```python
fd = LabAuthor("François Durand (dblp: 38/11269)")
pubs = fd.get_publications()
print(len(pubs))
# List the suspicious titles
[p.title for p in pubs.values() if "Horse Locomotion" in p.title]
```

```text
17
['Detection of Horse Locomotion Modifications Due to Training with Inertial Measurement Units: A Proof-of-Concept.']
```

Left as is, this publication would bring wrong co-authors into the map. The solution is then to add a publication filter:

```python
# Reject the homonym's publication
no_horse = lambda p: "Horse Locomotion" not in p.title
pubs = fd.get_publications(selector=no_horse)
print(len(pubs))
[p.title for p in pubs.values() if "Horse Locomotion" in p.title]
```

```text
16
[]
```

There is a publication that should be there in my LabMap

If a publication exists when you check it with the get_publications method of a LabAuthor, but is not registered in the lab publications, that means it has been rejected by one of the default filters.

Default rules (all must pass for a publication to be kept):

  • at most 9 authors (papers with many co-authors are rare in my field, so a long author list most often signals a database error);

  • title with at least 2 words;

  • title without taboo words like Editorial, Foreword, Brief Announcement, Preface.

Simple way: tune the defaults via constructor kwargs. LabMap (and therefore EgoMap / ListMap / any subclass) accepts:

  • max_co_authors= (default 9)

  • min_title_words= (default 2)

  • taboo_words= (extends the default taboo list)

  • taboo_authors= (drops publications whose author list contains one of these names — useful against homonym pollution)

For example, to build the EgoMap of François Durand, requiring at least 4 title words and dropping the Horse Locomotion homonym:

```python
from gismap.lab import EgoMap

fd = EgoMap(
    "François Durand (hal: fradurand, dblp: 38/11269)",
    min_title_words=4,                 # stricter than the default of 2
    taboo_words=["Horse Locomotion"],  # added to the default taboo list
)
fd.build(target=30)
fd.show_html()
```

Advanced way: bypass or extend publication_selectors. Each LabMap stores its filter chain in publication_selectors (a list of f(pub) -> bool callables; a publication is kept iff every callable returns True). For predicates that depend on multiple fields (year, venue, exact author list…), append your own:

```python
no_horse = lambda p: "Horse Locomotion" not in p.title
recent_only = lambda p: p.year is None or p.year >= 2010  # undated publications are kept

fd = EgoMap("François Durand (hal: fradurand, dblp: 38/11269)")
fd.publication_selectors.extend([no_horse, recent_only])  # appended to the existing chain
fd.build(target=30)
fd.show_html()
```
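To see how the default rules, the constructor kwargs, and the "kept iff every callable returns True" chain fit together, here is a stdlib-only sketch of such a filter chain. Pub, make_selectors, and keep are hypothetical names; the exact matching GisMap performs (case sensitivity, how title words are counted, whether the limit counts all authors) is assumed here, so treat this as an illustration of the logic rather than a reimplementation.

```python
from dataclasses import dataclass


@dataclass
class Pub:  # hypothetical publication record
    title: str
    authors: list


# Default taboo words quoted in the FAQ; matching is assumed case-insensitive.
DEFAULT_TABOO = ["Editorial", "Foreword", "Brief Announcement", "Preface"]


def make_selectors(max_co_authors=9, min_title_words=2, taboo_words=(), taboo_authors=()):
    """Hypothetical factory mirroring the LabMap kwargs; returns f(pub) -> bool callables."""
    taboo = [w.lower() for w in DEFAULT_TABOO + list(taboo_words)]  # extends the defaults
    return [
        lambda p: len(p.authors) <= max_co_authors,
        lambda p: len(p.title.split()) >= min_title_words,
        lambda p: not any(w in p.title.lower() for w in taboo),
        lambda p: not any(a in taboo_authors for a in p.authors),
    ]


def keep(pub, selectors):
    # A publication is kept if and only if every selector returns True.
    return all(selector(pub) for selector in selectors)


selectors = make_selectors(min_title_words=4, taboo_words=["Horse Locomotion"])
pubs = [  # made-up records
    Pub("Preface", ["A. Author"]),
    Pub("Detection of Horse Locomotion Modifications", ["François Durand"]),
    Pub("Short Title Here", ["François Durand"]),
    Pub("Large Collaboration Paper on Physics", [f"Author {i}" for i in range(12)]),
    Pub("A Perfectly Ordinary Research Paper", ["François Durand", "B. Author"]),
]
print([p.title for p in pubs if keep(p, selectors)])
# ['A Perfectly Ordinary Research Paper']
```

Appending a custom predicate to the returned list plays the same role as extending publication_selectors in the example above.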

How to specify the DBs to use?

For LabMap (including EgoMap), you can specify a dbs parameter. It can be a string or a DB subclass, or a list of strings or DB subclasses.

For SourcedAuthor/LabAuthor, you can specify dbs for the auto_sources method.
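The accepted forms of dbs can be pictured as a normalization step that turns any of them into a list. A minimal sketch, assuming each DB subclass exposes a short name attribute; DBStub, HALStub, LDBStub, and normalize_dbs are hypothetical and do not reflect how GisMap actually resolves the parameter.

```python
# Hypothetical stand-ins for DB subclasses; assume each exposes a short name.
class DBStub:
    name = "db"


class HALStub(DBStub):
    name = "hal"


class LDBStub(DBStub):
    name = "ldb"


def normalize_dbs(dbs):
    """Turn a string, a DB subclass, or a list of either into a list of DB names."""
    if isinstance(dbs, (str, type)):  # a single item: wrap it in a list
        dbs = [dbs]
    return [db if isinstance(db, str) else db.name for db in dbs]


print(normalize_dbs("hal"))             # ['hal']
print(normalize_dbs(LDBStub))           # ['ldb']
print(normalize_dbs(["hal", LDBStub]))  # ['hal', 'ldb']
```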

Graphical representation

Stars, planets, moons, comets?

The Maps, i.e., the graphical representation of the collaboration graph, use vocabulary borrowed from astronomy.

  • Stars and Planets: For EgoMaps, you are the star, i.e., the center of your own universe. Planets are the people who revolve around you, i.e., your co-authors.

  • Moons: A moon is a researcher connected to a planet or a lab.

  • Comets: A comet has no direct link with other displayed entities.
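The vocabulary above can be expressed as a small classification over the collaboration graph of an EgoMap. A sketch under assumptions: edges are co-authorship pairs between displayed nodes, the "connected to a lab" case for moons is left out, and nodes that fit none of the definitions get the hypothetical label "other". classify is an illustrative name, not part of GisMap, and the researchers are made up.

```python
def classify(star, nodes, edges):
    """Label the displayed nodes of an EgoMap-like graph with the astronomy vocabulary.

    Hypothetical helper: edges are co-authorship pairs between displayed nodes.
    """
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    planets = neighbors[star]  # the star's direct co-authors
    roles = {}
    for node in nodes:
        if node == star:
            roles[node] = "star"
        elif node in planets:
            roles[node] = "planet"
        elif neighbors[node] & planets:  # linked to at least one planet
            roles[node] = "moon"
        elif not neighbors[node]:        # no link with any displayed entity
            roles[node] = "comet"
        else:
            roles[node] = "other"        # linked, but to neither the star nor a planet
    return roles


nodes = ["Ego", "Ann", "Bob", "Cy", "Dee"]  # made-up researchers
edges = [("Ego", "Ann"), ("Ego", "Bob"), ("Ann", "Cy")]
print(classify("Ego", nodes, edges))
# {'Ego': 'star', 'Ann': 'planet', 'Bob': 'planet', 'Cy': 'moon', 'Dee': 'comet'}
```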

I don’t want images!

By default, GisMap retrieves pictures from the DB when they are available. Some Lab classes also fetch image data.

If you don’t want any image in your map, the simplest option is probably to remove them from the authors’ metadata:

```python
# fd is the EgoMap built above
for author in fd.authors.values():
    author.metadata.img = None  # drop this author's picture
fd.show_html()
```

The graph is spinning forever

It’s a glitch in the physics engine that can happen from time to time. Open the menu (top-left ☰ icon) and pick Redraw; that usually does the trick.

It’s too small, I cannot see anything

Sadly, for very large graphs, there is just too much information to display. However, a few tricks can help a bit:

  • Full Screen: bottom-right expand icon, or via the menu (top-left ☰).

  • Zoom inside the graph (mouse wheel).

  • Move nodes around, or hit Redraw in the menu to settle the layout.

  • Hover over a node to see the full name, which is very useful when navigating a crowded zone you mostly recognize.

How to embed a Map on my website?

  • For a simple and static embedding, you can use Binder to manually generate the Map.

  • For something more advanced, it is recommended to write a Python script that generates the Map.

Databases

What databases are available, and what are their advantages?

| Database | Pros | Cons |
|---|---|---|
| HAL | Fast; Rich metadata (e.g. abstracts) | France-based research only; Errors and gaps exist |
| DBLP | Highly accurate; Unified venue names | Computer Science only; Very slow |
| LDB | Fast; Highly accurate; Unified venue names | Computer Science only; 3 GB RAM recommended |

Why doesn’t GisMap use database X?

I wanted to rely on publication databases that:

  • have a public API;

  • are relatively clean and up to date;

  • do not require an API key.

To date, only HAL and DBLP seem to meet these specifications.

That said, GisMap is designed to be multi-source, so if a contributor wants to add an interface to another database (Google Scholar, ORCID, …), they are encouraged to write it and make a PR!

How to manage LDB (Local DBLP)?

LDB is a local mirror of DBLP that provides fast, accurate access to Computer Science publications without rate limiting.

First use: LDB is automatically downloaded the first time you use it (e.g., ListMap(names, dbs="ldb")). You can also trigger the download manually:

```python
from gismap import LDB

LDB.retrieve()  # download the local DBLP mirror now
```

After upgrading gismap: Each gismap release may update the LDB format. If you see a warning about version incompatibility, simply run:

```python
LDB.retrieve()
```

Useful commands:

| Command | Purpose |
|---|---|
| LDB.retrieve() | Download the latest compatible LDB |
| LDB.retrieve(force=True) | Force re-download (e.g. to get fresh data) |
| LDB.db_info() | Show installed version, date, and size |
| LDB.check_update() | Check if a newer compatible version is available |
| LDB.delete_db() | Remove the local LDB file |

Note

LDB requires approximately 3 GB of RAM. On memory-constrained environments (e.g. free-tier Binder), consider using HAL instead.
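If the same code must run both on a well-equipped machine and on a constrained host, you could pick the DB from the memory actually available. A minimal sketch, assuming Linux or macOS (os.sysconf) and, inside containers such as Binder, a cgroup memory limit file at one of the usual paths; the 3 GiB threshold comes from the note above, and physical_ram_bytes, container_limit_bytes, usable_ram_bytes, and pick_db are hypothetical helpers. The physical-memory figure alone can overstate what a container may use, hence the cgroup check.

```python
import os

LDB_RAM_BYTES = 3 * 1024**3  # "approximately 3 GB", per the note above


def physical_ram_bytes():
    """Total physical memory via os.sysconf, or None where it is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def container_limit_bytes():
    """Memory limit from cgroup v2 or v1 files, if present (assumed paths)."""
    for path in ("/sys/fs/cgroup/memory.max", "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue
        if raw.isdigit():  # cgroup v2 writes "max" when there is no limit
            return int(raw)
    return None


def usable_ram_bytes():
    known = [v for v in (physical_ram_bytes(), container_limit_bytes()) if v and v > 0]
    return min(known) if known else None


def pick_db(ram_bytes):
    # Fall back to HAL when memory is unknown or below the recommended amount.
    if ram_bytes is None or ram_bytes < LDB_RAM_BYTES:
        return "hal"
    return "ldb"


print(pick_db(usable_ram_bytes()))
```

The result can then be passed as the dbs parameter, e.g. ListMap(names, dbs=pick_db(usable_ram_bytes())).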