FAQ#
I don’t want to install GisMap on my computer#
No worries, you can play with it using https://mybinder.org/.
For example:
LabMaps#
DB, Lab, Author, Maps… What does that mean?#
The internal structure of GisMap interleaves simple concepts, like author, DB, publication, and lab.
An Author is a researcher that has published papers. Internally, it’s just a name.
A DB is where the information comes from, like the HAL portal or the DBLP Computer Science Bibliography. A DB should provide a method to search authors and retrieve an author’s publications. The results of these methods are lists of DBAuthor and DBPublication.
A given author/publication can be present in multiple DBs, and quite frequently multiple times in the same DB. GisMap uses SourcedAuthor and SourcedPublication to regroup multiple entries under a single banner.
A LabMap is made of authors (LabAuthor, i.e. SourcedAuthor with additional metadata) and publications (SourcedPublication).
An EgoMap is a special LabMap dedicated to one single researcher, showing her co-authors and possibly their co-authors.
How are publications grouped together?#
We use Bag of Factors, a package based on Joint Complexity.
If the titles are close enough, then we consider that it is the same paper. In case of multiple sources, one main source is selected (based on year, type, and other things).
Note
We are aware that merging from title similarities has its flaws. But after comparing with other options (e.g. strict equality on title, authors, year, and possibly venue), we decided that it was a good compromise.
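To make the idea of grouping by title similarity concrete, here is a minimal sketch. It does not use Bag of Factors or Joint Complexity: the standard-library `difflib.SequenceMatcher` is swapped in as a stand-in similarity measure, and the function names, the threshold of 0.9, and the sample titles are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles, ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def group_titles(titles, threshold=0.9):
    """Greedy grouping: a title joins the first group whose first title is close enough."""
    groups = []
    for title in titles:
        for group in groups:
            if similarity(title, group[0]) >= threshold:
                group.append(title)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([title])
    return groups


# Hypothetical titles: the first two differ only by case and a trailing period.
titles = [
    "Analysis of a Stochastic Model",
    "Analysis of a stochastic model.",
    "Voting Rules in Practice",
]
groups = group_titles(titles)
print(groups)
```

The greedy strategy keeps the sketch short. It also shows why similarity-based merging can go wrong: two distinct papers with nearly identical titles would land in the same group.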
How to enter a researcher?#
When a researcher is uniquely defined by her name, you can just enter that name, e.g.
[1]:
from gismap.lab import LabAuthor
cc = LabAuthor("Céline Comte")
cc.auto_sources()
cc.sources
[1]:
[HALAuthor(name='Céline Comte', key='celine-comte'),
LDBAuthor(name='Céline Comte', key='179/2173')]
However, sometimes the name is ambiguous, and you may get multiple keys that correspond to multiple authors.
[2]:
fd = LabAuthor("François Durand")
fd.auto_sources()
fd.sources
INFO:GisMap:Multiple entries for François Durand in hal
[2]:
[HALAuthor(name='François Durand', key='francois-durand'),
HALAuthor(name='François Durand', key='francois-durand-dastes'),
HALAuthor(name='François Durand', key='fradurand'),
HALAuthor(name='François Durand', key='1254455', key_type='pid'),
HALAuthor(name='François Durand', key='1370418', key_type='pid'),
HALAuthor(name='François Durand', key='1296653', key_type='pid'),
HALAuthor(name='François Durand', key='1551635', key_type='pid'),
HALAuthor(name='François Durand', key='1524491', key_type='pid'),
HALAuthor(name='François Durand', key='1276470', key_type='pid'),
LDBAuthor(name='François Durand', key='38/11269')]
When that happens, you need to manually enter the desired keys. This can be done by specifying the sources:
[3]:
from gismap import HALAuthor, LDBAuthor
fd = LabAuthor.from_sources(
[
HALAuthor(name="François Durand", key="fradurand"),
LDBAuthor(name="François Durand", key="38/11269"),
]
)
fd.sources
[3]:
[HALAuthor(name='François Durand', key='fradurand'),
LDBAuthor(name='François Durand', key='38/11269')]
For convenience, you can also enter keys in parentheses. For example, an exact equivalent of the above is:
[4]:
fd = LabAuthor("François Durand (hal: fradurand, ldb: 38/11269)")
fd.sources
[4]:
[HALAuthor(name='François Durand', key='fradurand'),
LDBAuthor(name='François Durand', key='38/11269')]
Note
The parenthesis notation can also be used to specify other metadata (URL, image, group) with the same “key: value, …” format.
For proper parsing, a name should contain no parentheses, a key should not contain “,” or “:”, and a value should not contain “,”.
You can enter multiple keys for the same DB, e.g. “My name (hal: key1, hal: key2)”.
Forcing a HAL fullname search#
Some researchers have a HAL pid that only covers part of their publications (e.g. no unified HAL-ID). Use hal:fullname to force a name-based search instead of relying on the pid:
LabAuthor("Élie de Panafieu (hal:fullname)")
By default, specifying a source for one DB does not prevent the others from being searched automatically. For example, with dbs=["hal", "ldb"], the above will search HAL by fullname and LDB automatically.
If you want to restrict to only the specified sources (e.g. to avoid homonym pollution from other DBs), add the no_auto flag:
LabAuthor("John Smith (hal:fullname, no_auto)")
[5]:
dummy = LabAuthor(
"My Name(img: https://my.url.img, group:me,url:https://mysite.org,hal:key1,hal:123456,dblp:toto,badkey:hello,no_colon_separator)"
)
print(dummy.metadata)
print(dummy.sources)
WARNING:GisMap:I don't know what to do with badkey:hello.
WARNING:GisMap:I don't know what to do with no_colon_separator.
AuthorMetadata(url='https://mysite.org', img='https://my.url.img', group='me')
[HALAuthor(name='My Name', key='key1'), HALAuthor(name='My Name', key='123456', key_type='pid'), DBLPAuthor(name='My Name', key='toto')]
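The parsing constraints stated in the note (no parentheses in the name, no “,” or “:” in a key, no “,” in a value) can be illustrated with a small standalone parser. This is a sketch under those constraints only, not GisMap's actual implementation; `parse_author_string` is a hypothetical helper.

```python
def parse_author_string(text: str):
    """Split 'Name (key: value, flag, ...)' into (name, key/value pairs, flags)."""
    name, sep, rest = text.partition("(")
    name = name.strip()
    if not sep:
        # No parenthesis: the whole string is the name.
        return name, [], []
    inside = rest.rsplit(")", 1)[0]
    pairs, flags = [], []
    for item in inside.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the FIRST colon only, so values such as URLs may contain ':'.
        key, colon, value = item.partition(":")
        if colon:
            pairs.append((key.strip(), value.strip()))
        else:
            flags.append(item)
    return name, pairs, flags


name, pairs, flags = parse_author_string(
    "My Name (url: https://mysite.org, hal: key1, hal: key2, no_auto)"
)
print(name, pairs, flags)
```

Splitting on commas first and then on the first colon is what makes the stated constraints sufficient: a URL value keeps its own “:”, whereas a comma inside a value would be cut in two.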
There is a publication that should not be there in my LabMap#
The first thing to check is if the publication has one external author with the same name as one of the lab’s authors. If that is the case, try to manually specify the sources of the author to avoid this.
If the homonym pollution comes from inside a DB itself (e.g. a single DBLP pid shared by several people), the cleanest fix is to drop the offending entries when constructing the LabMap — see the next subsection. You can also preview which publications a single LabAuthor returns, e.g. to confirm a homonym pollution before deciding how to filter it:
[6]:
fd = LabAuthor("François Durand (dblp: 38/11269)")
pubs = fd.get_publications()
print(len(pubs))
[p.title for p in pubs.values() if "Horse Locomotion" in p.title]
17
[6]:
['Detection of Horse Locomotion Modifications Due to Training with Inertial Measurement Units: A Proof-of-Concept.']
Such a stray publication associates wrong co-authors with the author. The solution is then to add a publication filter.
[7]:
no_horse = lambda p: "Horse Locomotion" not in p.title
pubs = fd.get_publications(selector=no_horse)
print(len(pubs))
[p.title for p in pubs.values() if "Horse Locomotion" in p.title]
16
[7]:
[]
There is a publication that should be there in my LabMap#
If a publication exists when you check it with the get_publications method of a LabAuthor, but is not registered in the lab publications, that means it has been rejected by one of the default filters.
Default rules (all must pass for a publication to be kept):
at most 9 authors (collegial papers in my field are rare; large author lists most often signal a database error);
title with at least 2 words;
title without taboo words like Editorial, Foreword, Brief Announcement, Preface.
Simple way: tune the defaults via constructor kwargs. LabMap (and therefore EgoMap / ListMap / any subclass) accepts:
max_co_authors= (default 9)
min_title_words= (default 2)
taboo_words= (extends the default taboo list)
taboo_authors= (drops publications whose author list contains one of these names; useful against homonym pollution)
For example, to build the EgoMap of François Durand, requiring at least 4 title words and dropping the Horse Locomotion homonym:
[8]:
from gismap.lab import EgoMap
fd = EgoMap(
"François Durand (hal: fradurand, dblp: 38/11269)",
min_title_words=4,
taboo_words=["Horse Locomotion"],
)
fd.build(target=30)
fd.show_html()
Advanced way: bypass or extend publication_selectors. Each LabMap stores its filter chain in publication_selectors (a list of f(pub) -> bool callables; a publication is kept iff every callable returns True). For predicates that depend on multiple fields (year, venue, exact author list…), append your own:
[9]:
no_horse = lambda p: "Horse Locomotion" not in p.title
recent_only = lambda p: p.year is None or p.year >= 2010
fd = EgoMap("François Durand (hal: fradurand, dblp: 38/11269)")
fd.publication_selectors.extend([no_horse, recent_only])
fd.build(target=30)
fd.show_html()
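The "kept iff every callable returns True" rule can be sketched without GisMap. In the sketch below, `Pub` is a hypothetical stand-in for a publication record, the three predicates mirror the default rules listed earlier, and the case-insensitive substring match for taboo words is an assumption, not confirmed behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Pub:
    """Hypothetical stand-in for a publication record."""
    title: str
    authors: list = field(default_factory=list)


TABOO = ["Editorial", "Foreword", "Brief Announcement", "Preface"]

# One predicate per default rule; each takes a publication and returns a bool.
selectors = [
    lambda p: len(p.authors) <= 9,
    lambda p: len(p.title.split()) >= 2,
    lambda p: not any(w.lower() in p.title.lower() for w in TABOO),
]


def keep(pub, selectors):
    """A publication is kept only if every selector accepts it."""
    return all(sel(pub) for sel in selectors)


pubs = [
    Pub("A Study of Ranking Rules", ["A", "B"]),
    Pub("Editorial", ["A"]),
    Pub("Preface to the Special Issue", ["A", "B"]),
    Pub("Large Collaboration Results", [f"Author {i}" for i in range(12)]),
]
kept = [p.title for p in pubs if keep(p, selectors)]
print(kept)
```

Because the chain is a plain list, adding a rule is an append and removing one is a list operation, which is the flexibility the advanced way relies on.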
How to specify the DBs to use?#
For LabMap (including EgoMap), you can specify a dbs parameter. It can be a string or a DB subclass, or a list of strings or DB subclasses.
For SourcedAuthor/LabAuthor, you can specify dbs for the auto_sources method.
Graphical representation#
Stars, planets, moons, comets?#
The Maps, i.e., the graphical representation of the collaboration graph, use vocabulary borrowed from astronomy.
Stars and Planets: For EgoMaps, you are the star, i.e., the center of your own universe. Planets are people that revolve around you, i.e., your co-authors.
Moons: A moon is a researcher connected to a planet or a lab.
Comets: A comet has no direct link with other displayed entities.
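As an illustration of this vocabulary in the EgoMap case, the sketch below assigns a role to each node of a small undirected co-authorship graph. The function `classify`, the adjacency-dictionary representation, and the "unclassified" fallback (for nodes the definitions above do not cover) are assumptions made for this example; GisMap's own logic may differ.

```python
def classify(star, links):
    """Assign a role to each node of an undirected graph given as {node: set of neighbors}."""
    planets = set(links.get(star, ()))
    roles = {}
    for node, neighbors in links.items():
        if node == star:
            roles[node] = "star"
        elif node in planets:
            roles[node] = "planet"  # direct co-author of the star
        elif neighbors & planets:
            roles[node] = "moon"  # linked to at least one planet
        elif not neighbors:
            roles[node] = "comet"  # no link to any displayed entity
        else:
            roles[node] = "unclassified"  # case not covered by the definitions
    return roles


# Hypothetical graph: Ego is the star of the map.
links = {
    "Ego": {"Alice", "Bob"},
    "Alice": {"Ego", "Carol"},
    "Bob": {"Ego"},
    "Carol": {"Alice"},
    "Dave": set(),
}
roles = classify("Ego", links)
print(roles)
```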
I don’t want images!#
By default, GisMap retrieves pictures from the DB when they are available. Some Labs also fetch image data.
If you don’t want any images in your map, the simplest option is probably to remove them from the authors’ metadata:
[10]:
for author in fd.authors.values():
author.metadata.img = None
fd.show_html()
The graph is spinning forever#
It’s a glitch in the physics engine that can happen from time to time. Open the menu (top-left ☰ icon) and pick Redraw; that usually does the trick.
It’s too small, I cannot see anything#
Sadly, for very large graphs, there is just too much information to display. However, a few tricks can help a bit:
Full Screen — bottom-right expand icon, or via the menu (top-left ☰).
Zoom inside the graph (mouse wheel).
Move nodes around, or hit Redraw in the menu to settle the layout.
Hover a node to see the full name — very useful when navigating a crowded zone you mostly recognize.
How to embed a Map on my website?#
For a simple and static embedding, you can use Binder to manually generate the Map.
For something more advanced, it is recommended to write a Python script that generates the Map.
Databases#
What databases are available, and what are their advantages?#
| Database | Pros | Cons |
|---|---|---|
| HAL | Fast; Rich metadata (e.g. abstracts) | France-based research only; Errors and gaps exist |
| DBLP | Highly accurate; Unified venue names | Computer Science only; Very slow |
| LDB | Fast; Highly accurate; Unified venue names | Computer Science only; 3 GB RAM recommended |
Why doesn’t GisMap use database X?#
I wanted to rely on publication databases that:
have a public API;
are relatively clean and up to date;
do not require an API key.
To date, only HAL and DBLP seem to meet these specifications.
That said, GisMap is designed to be multi-source, so if a contributor wants to add an interface to another database (Google Scholar, ORCID, …), they are encouraged to write it and make a PR!
How to manage LDB (Local DBLP)?#
LDB is a local mirror of DBLP that provides fast, accurate access to Computer Science publications without rate limiting.
First use: LDB is automatically downloaded the first time you use it (e.g., ListMap(names, dbs="ldb")). You can also trigger the download manually:
from gismap import LDB
LDB.retrieve()
After upgrading gismap: Each gismap release may update the LDB format. If you see a warning about version incompatibility, simply run:
LDB.retrieve()
Useful commands:
| Command | Purpose |
|---|---|
| | Download the latest compatible LDB |
| | Force re-download (e.g. to get fresh data) |
| | Show installed version, date, and size |
| | Check if a newer compatible version is available |
| | Remove the local LDB file |
Note
LDB requires approximately 3 GB of RAM. On memory-constrained environments (e.g. free-tier Binder), consider using HAL instead.