Laboratory

A group of people and their publications is managed through the LabMap abstract class.

LabMaps

class gismap.lab.labmap.LabMap(name=None, dbs=None)

Abstract class for labs.

Concrete lab classes are created by implementing the _author_iterator method.

Labs can be saved with the dump method and loaded with the load method.

Parameters:
  • name (str) – Name of the lab. Can be set as a class or instance attribute.

  • dbs (list, default=[HAL, LDB]) – List of DB sources to use.

author_selectors

Author filters. Default: minimal filtering.

Type:

list

publication_selectors

Publication filters. Default: fewer than 10 authors, not an editorial, at least two words in the title.

Type:

list
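
Both selector lists hold predicates. As a minimal sketch, assuming a publication is kept only when every selector accepts it (the combination rule, the Pub stand-in, and the title-based editorial check are assumptions, not gismap internals), the documented defaults behave roughly like this:

```python
from dataclasses import dataclass, field


@dataclass
class Pub:
    """Hypothetical stand-in for a publication."""
    title: str
    authors: list = field(default_factory=list)


def keep(pub, selectors):
    # Assumed combination rule: every selector must accept the publication.
    return all(selector(pub) for selector in selectors)


# Rough equivalents of the documented default publication_selectors.
selectors = [
    lambda p: len(p.authors) < 10,                 # fewer than 10 authors
    lambda p: "editorial" not in p.title.lower(),  # not an editorial (title-based guess)
    lambda p: len(p.title.split()) >= 2,           # at least two words in the title
]

pubs = [
    Pub("Distributed PageRank computation", ["A", "B"]),
    Pub("Editorial", ["A"]),
    Pub("Big collaboration paper", [f"Author {i}" for i in range(12)]),
]
print([p.title for p in pubs if keep(p, selectors)])
# ['Distributed PageRank computation']
```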

add_publication(title, authors, **kwargs)

Add a manual publication to the lab.

Author names given as strings are resolved to known authors from the lab’s publications using fuzzy matching. Unmatched names become Outsider instances.

Parameters:
  • title (str) – Publication title.

  • authors (list) – Author names (str) or author objects.

  • **kwargs – Passed to Informal (venue, type, year, key, metadata) and to fit_names() (threshold, n_range, length_impact).
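
The name-resolution step can be pictured with the sketch below. It is illustrative only: it swaps in difflib.SequenceMatcher (scaled to 0-100) for gismap's fit_names() matching, and Outsider and resolve_authors are simplified hypothetical stand-ins.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Outsider:
    """Simplified stand-in for an author not found among known authors."""
    name: str


def score(a, b):
    # Similarity on a 0-100 scale (difflib, not gismap's n-gram matcher).
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_authors(authors, known, threshold=80):
    """Map name strings to the best known author, or to an Outsider."""
    resolved = []
    for author in authors:
        if not isinstance(author, str):
            resolved.append(author)  # already an author object: keep as-is
            continue
        best = max(known, key=lambda k: score(author, k), default=None)
        if best is not None and score(author, best) >= threshold:
            resolved.append(best)
        else:
            resolved.append(Outsider(author))
    return resolved


known = ["Fabien Mathieu", "The-Dang Huynh", "Diego Perino"]
print(resolve_authors(["Fabien Matthieu", "Alice Smith"], known))
# ['Fabien Mathieu', Outsider(name='Alice Smith')]
```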

del_publication(query, confirm=True, **kwargs)

Remove publications matching a query from the lab.

Parameters:

expand(target=None, group='moon', desc='Moon information', **kwargs)

Expand the lab with external collaborators found in publications.

Discovers authors who co-published with lab members, ranks them by collaboration strength, and adds the top candidates.

Parameters:
  • target (int, optional) – Number of new authors to add. Defaults to len(self.authors) // 3.

  • group (str, default='moon') – Group label assigned to new authors.

  • desc (str, default='Moon information') – Progress bar description.

  • **kwargs – Passed to proper_prospects().
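
A minimal sketch of the default target and the top-k selection, assuming prospects arrive as (name, strength) pairs whose strengths compare like (co-authors, publications) tuples; default_target and pick_new_authors are hypothetical helpers, not gismap functions.

```python
def default_target(n_authors, target=None):
    # Documented default: one new author for every three existing ones.
    return n_authors // 3 if target is None else target


def pick_new_authors(prospects, n_authors, target=None):
    """Return the names of the strongest prospects, strongest first."""
    k = default_target(n_authors, target)
    ranked = sorted(prospects, key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Strengths as (coauthors, publications) tuples, compared lexicographically (assumption).
prospects = [("Ana", (1, 4)), ("Bo", (3, 2)), ("Cy", (2, 7)), ("Di", (3, 5))]
print(pick_new_authors(prospects, n_authors=7))  # ['Di', 'Bo']
```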

html(**kwargs)

Generate an HTML representation of the collaboration graph.

Parameters:

**kwargs – Passed to make_vis().

Returns:

HTML content as a string.

Return type:

str

save_html(name=None, **kwargs)

Save the collaboration graph as an HTML file.

Parameters:
  • name (str, optional) – Output filename. Defaults to lab name.

  • **kwargs – Passed to html().

Return type:

None

select_publications(query, n_range=4, length_impact=0.001, threshold=80)

Search for publications matching a query.

Parameters:
  • query (str or callable) – If a string, matches by exact key or fuzzy title similarity. If a callable, used as a filter f(pub) -> bool on each publication.

  • n_range (int, default=4) – Passed to similarity_matrix().

  • length_impact (float, default=0.001) – Passed to similarity_matrix().

  • threshold (int, default=80) – Minimum similarity score (0-100) for fuzzy title matching.

Returns:

Matching publications.

Return type:

list
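
The three query modes can be sketched as follows. Pub and select_pubs are hypothetical; difflib replaces similarity_matrix() (so n_range and length_impact have no equivalent here), and giving an exact key match precedence over fuzzy title matches is an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Pub:
    """Hypothetical stand-in for a publication."""
    key: str
    title: str
    venue: str = ""


def title_score(query, title):
    # 0-100 similarity via difflib (swapped in for similarity_matrix()).
    return 100 * SequenceMatcher(None, query.lower(), title.lower()).ratio()


def select_pubs(pubs, query, threshold=80):
    if callable(query):
        return [p for p in pubs if query(p)]  # callable: used as f(pub) -> bool
    exact = [p for p in pubs if p.key == query]
    if exact:
        return exact  # assumption: an exact key match takes precedence
    return [p for p in pubs if title_score(query, p.title) >= threshold]


pubs = [Pub("k1", "A new paper", "Journal of Testing"), Pub("k2", "Another topic entirely")]
print([p.key for p in select_pubs(pubs, "k2")])  # ['k2']
print([p.key for p in select_pubs(pubs, "A new papers")])  # ['k1']
print([p.key for p in select_pubs(pubs, lambda p: "Testing" in p.venue)])  # ['k1']
```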

show_html(**kwargs)

Display the collaboration graph in a Jupyter notebook.

Parameters:

**kwargs – Passed to html().

Return type:

None

update_authors(desc='Author information')

Populate the authors attribute (dict[str, LabAuthor]).

Return type:

None

update_publis(desc='Publications information')

Populate the publications attribute (dict[str, SourcedPublication]).

Return type:

None

class gismap.lab.labmap.ListMap(author_list, *args, **kwargs)

The simplest way to create a lab: from a list of names.

Parameters:
  • author_list (list of str) – List of author names.

  • args (list) – Arguments to pass to the LabMap constructor.

  • kwargs (dict) – Keyword arguments to pass to the LabMap constructor.
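
The pattern behind LabMap and ListMap (an abstract base class whose subclasses only supply _author_iterator) can be sketched with the standard abc module. MiniLab and MiniListLab are simplified hypothetical classes; in particular, the real update_authors() builds LabAuthor objects rather than the plain strings used here.

```python
from abc import ABC, abstractmethod


class MiniLab(ABC):
    """Simplified stand-in for LabMap: subclasses only say where authors come from."""
    name = None  # may be set as a class attribute...

    def __init__(self, name=None):
        if name is not None:
            self.name = name  # ...or as an instance attribute
        self.authors = {}

    @abstractmethod
    def _author_iterator(self):
        """Yield author names (the only method a concrete lab must provide)."""

    def update_authors(self):
        # Build the authors dict from whatever the subclass yields.
        self.authors = {name: name for name in self._author_iterator()}


class MiniListLab(MiniLab):
    """Analogue of ListMap: the lab is just a list of names."""

    def __init__(self, author_list, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.author_list = author_list

    def _author_iterator(self):
        yield from self.author_list


lab = MiniListLab(["Fabien Mathieu", "Diego Perino"], name="demo")
lab.update_authors()
print(lab.name, sorted(lab.authors))  # demo ['Diego Perino', 'Fabien Mathieu']
```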

EgoMaps

class gismap.lab.egomap.EgoMap(star, *args, **kwargs)

Egocentric view of a researcher’s collaboration network.

Displays the star (central researcher), their planets (direct co-authors), and optionally moons (co-authors of co-authors).

Parameters:
  • star (str or LabAuthor) – The central researcher. Can be a name string or LabAuthor object.

  • *args – Passed to LabMap.

  • **kwargs – Passed to LabMap.
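
Planets and moons correspond to distances 1 and 2 from the star in the co-authorship graph. A breadth-first sketch on a hypothetical adjacency dict (EgoMap itself works from fetched publications and selects moons through expand(), so this only illustrates the vocabulary):

```python
from collections import deque


def ego_levels(graph, star, max_depth=2):
    """Return {author: distance from star} for distances up to max_depth."""
    dist = {star: 0}
    queue = deque([star])
    while queue:
        current = queue.popleft()
        if dist[current] == max_depth:
            continue  # do not explore beyond moons
        for neighbor in graph.get(current, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


graph = {
    "Star": ["P1", "P2"],
    "P1": ["Star", "M1"],
    "P2": ["Star", "P1", "M2"],
    "M1": ["P1", "X"],
}
levels = ego_levels(graph, "Star")
planets = sorted(a for a, d in levels.items() if d == 1)
moons = sorted(a for a, d in levels.items() if d == 2)
print(planets, moons)  # ['P1', 'P2'] ['M1', 'M2']
```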

Examples

>>> dang = EgoMap("The-Dang Huynh")
>>> dang.build(target=20)
>>> sorted(  
...     a.name for a in dang.authors.values() if len(a.name.split()) < 3
... )
['Bruno Kauffmann', 'Diego Perino', 'Dohy Hong', 'Fabien Mathieu', 'François Baccelli',...]

To add publications, one can use the add_publication() method:

>>> dang.add_publication(
...     title="A new paper",
...     authors=[dang.star, "Fabien Mathieu", "Alice Smith"],
...     venue="Journal of Testing",
... )
>>> str(dang.select_publications(lambda p: "Testing" in p.venue)[0])
'A new paper, by The-Dang Huynh, Fabien Mathieu, and Alice Smith. In Journal of Testing [unpublished], 2026.'

To remove publications, one can use the del_publication() method:

>>> dang.del_publication("A new paper", confirm=False)
>>> dang.select_publications(lambda p: "Testing" in p.venue)
[]

build(**kwargs)

Build the ego network by fetching publications and adding planets/moons.

Parameters:
  • target (int, default=50) – Target number of authors in the final map.

  • **kwargs – Passed to expand().

Return type:

None

Utilities

Lab author

class gismap.lab.lab_author.AuthorMetadata(url: str = None, img: str = None, group: str = None, position: tuple = None)

Optional information used to enhance an author’s presentation.

url

Homepage of the author.

Type:

str

img

URL of a picture.

Type:

str

group

Group of the author.

Type:

str

position

Coordinates of the author.

Type:

tuple

class gismap.lab.lab_author.LabAuthor(name: str, sources: list = <factory>, metadata: AuthorMetadata = <factory>, no_auto: bool = False)

Examples

The metadata and DB key(s) of an author can be given in parentheses as key:value pairs.

Invalid entries are ignored (with a warning).

>>> dummy = LabAuthor("My Name(img: https://my.url.img, group:me,url:https://mysite.org,hal:key1,ldb:toto,badkey:hello,no_colon_separator)")
>>> dummy.metadata
AuthorMetadata(url='https://mysite.org', img='https://my.url.img', group='me')
>>> dummy.sources
[HALAuthor(name='My Name', key='key1'), LDBAuthor(name='My Name', key='toto')]

You can enter multiple keys for the same DB. HAL key types are automatically detected.

>>> dummy2 = LabAuthor("My Name (hal:key1,hal:123456,hal: My Other Name )")
>>> dummy2.sources  
[HALAuthor(name='My Name', key='key1'),
HALAuthor(name='My Name', key='123456', key_type='pid'),
HALAuthor(name='My Name', key='My Other Name', key_type='fullname')]

For HAL, hal:fullname is a shorthand to force a fullname search using the author’s name (useful when the pid is too restrictive).

>>> dummy3 = LabAuthor("Élie de Panafieu (hal:fullname)")
>>> dummy3.sources
[HALAuthor(name='Élie de Panafieu', key='Élie de Panafieu', key_type='fullname')]

By default, auto_sources() automatically fills in the missing DBs. Use the no_auto flag to disable this and keep only the explicit sources (e.g. to avoid polluting results with homonyms from other databases).

>>> dummy4 = LabAuthor("John Smith (hal:fullname, no_auto)")
>>> dummy4.no_auto
True
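
A rough sketch of how the parenthesized notation could be parsed, reproducing the behavior shown in the examples above. The parse_author function, its tuple output, and the HAL key-type heuristic (digits give a pid, spaces give a fullname) are assumptions inferred from the examples, not gismap's code.

```python
import re
import warnings

METADATA_KEYS = {"url", "img", "group"}
DB_KEYS = {"hal", "ldb"}


def hal_key_type(key):
    # Guessed heuristic matching the examples: digits -> pid, spaces -> fullname.
    if key.isdigit():
        return "pid"
    if " " in key:
        return "fullname"
    return None


def parse_author(text):
    """Split 'Name (k: v, ...)' into (name, metadata, sources, no_auto)."""
    match = re.fullmatch(r"\s*([^(]*?)\s*(?:\((.*)\))?\s*", text)
    if match is None:
        raise ValueError(f"Cannot parse {text!r}")
    name, options = match.group(1), match.group(2) or ""
    metadata, sources, no_auto = {}, [], False
    for item in (part.strip() for part in options.split(",")):
        if not item:
            continue
        if item == "no_auto":
            no_auto = True
            continue
        key, sep, value = (s.strip() for s in item.partition(":"))
        if not sep:
            warnings.warn(f"Ignoring {item!r}: no colon separator")
        elif key in METADATA_KEYS:
            metadata[key] = value
        elif key == "hal" and value == "fullname":
            sources.append(("hal", name, "fullname"))  # shorthand: own name
        elif key in DB_KEYS:
            kind = hal_key_type(value) if key == "hal" else None
            sources.append((key, value, kind))
        else:
            warnings.warn(f"Ignoring unknown key {key!r}")
    return name, metadata, sources, no_auto


print(parse_author("John Smith (hal:fullname, no_auto)"))
# ('John Smith', {}, [('hal', 'John Smith', 'fullname')], True)
```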

auto_sources(dbs=None)

Automatically search for the author in databases not already represented in sources.

If the author already has explicit sources (e.g. from parentheses notation), only the missing databases are queried. Does nothing if no_auto is True.

Parameters:

dbs (list, default=[HAL, DBLP]) – List of DB sources to use.

Return type:

None
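
Querying only the missing databases amounts to a set difference over the DBs already present. A sketch with DB names as plain strings and a hypothetical search callback standing in for the real database lookups; like auto_sources(), it works in place and returns None.

```python
def complete_sources(sources, dbs, search, no_auto=False):
    """Extend sources in place with search results for databases not yet present.

    sources: list of (db, key) pairs; search: hypothetical callback db -> keys.
    """
    if no_auto:
        return  # keep only the explicit sources
    present = {db for db, _ in sources}
    for db in dbs:
        if db not in present:  # only the missing databases are queried
            sources.extend((db, key) for key in search(db))


queried = []


def fake_search(db):
    queried.append(db)  # record which databases were actually queried
    return [f"{db.lower()}-key"]


sources = [("HAL", "key1")]
complete_sources(sources, ["HAL", "DBLP"], fake_search)
print(sources, queried)  # [('HAL', 'key1'), ('DBLP', 'dblp-key')] ['DBLP']
```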

property fingerprint

A normalized version of the author’s name for matching purposes.

Returns:

The fingerprint of the author’s name.

Return type:

str
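
The exact normalization behind fingerprint is not documented here. One common choice, shown as an assumption: remove accents through Unicode NFKD decomposition, lowercase, and keep only letters and digits.

```python
import unicodedata


def name_fingerprint(name):
    """Hypothetical normalization: accent-free, lowercase, letters and digits only."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (the accents split off by NFKD).
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())


print(name_fingerprint("Élie de Panafieu"))  # eliedepanafieu
print(name_fingerprint("The-Dang Huynh") == name_fingerprint("the dang  huynh"))  # True
```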

gismap.lab.lab_author.db_dict()

Lazy lookup of DB subclasses (avoids import-order dependency).

Forces import of all known backends so that get_classes sees them even when some are lazily imported at package level.

Expansion

class gismap.lab.expansion.Prospect(author, strengths)

Candidate for integration into the lab.

Parameters:
  • author (Author) – Reference author. Must have a key.

  • strengths (dict) – Dictionary of ProspectStrength.

class gismap.lab.expansion.ProspectStrength(coauthors: int, publications: int)

Measures the interaction between an external author and a lab by counting co-authors and publications.

Addition follows a (max, +) rule so that strengths obtained through multiple keys can be combined: co-author counts are combined with max, publication counts are summed.

Examples

>>> a1 = ProspectStrength(3, 5)
>>> a2 = ProspectStrength(2, 10)
>>> a1 > a2
True
>>> a1 + a2
ProspectStrength(coauthors=3, publications=15)
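
The example is consistent with a (max, +) rule: co-author counts combine with max, publication counts add up. A sketch with a frozen dataclass; Strength is a hypothetical stand-in, and the lexicographic ordering (co-authors first, then publications) is an assumption that agrees with the example but is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Strength:
    """Hypothetical stand-in for ProspectStrength.

    order=True compares (coauthors, publications) lexicographically (assumption).
    """
    coauthors: int
    publications: int

    def __add__(self, other):
        # (max, +) addition, as in the example: max of co-author counts,
        # sum of publication counts.
        return Strength(max(self.coauthors, other.coauthors),
                        self.publications + other.publications)


a1, a2 = Strength(3, 5), Strength(2, 10)
print(a1 > a2, a1 + a2)  # True Strength(coauthors=3, publications=15)
```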

gismap.lab.expansion.count_prospect_entries(lab)

Associate each external co-author (prospect) with their strength relative to the lab.

Parameters:

lab (LabMap) – Reference lab.

Returns:

Lab strengths.

Return type:

dict of str to ProspectStrength
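
One plausible reading of the counting step, sketched on publications given as plain lists of author names: for each external author, count the distinct lab members they published with and the publications they share with the lab. The data layout and count_prospects are hypothetical.

```python
from collections import defaultdict


def count_prospects(publications, members):
    """Return {outsider: (n_lab_coauthors, n_shared_publications)}."""
    lab_coauthors = defaultdict(set)
    shared = defaultdict(int)
    for authors in publications:
        lab_side = {a for a in authors if a in members}
        if not lab_side:
            continue  # no lab member on this publication
        for outsider in set(authors) - members:
            lab_coauthors[outsider] |= lab_side
            shared[outsider] += 1
    return {a: (len(lab_coauthors[a]), shared[a]) for a in shared}


members = {"Alice", "Bob"}
pubs = [["Alice", "Xavier"], ["Alice", "Bob", "Xavier", "Yu"], ["Zoe", "Yu"]]
print(count_prospects(pubs, members))
# {'Xavier': (2, 2), 'Yu': (2, 1)}
```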

gismap.lab.expansion.get_member_names(lab)
Parameters:

lab (LabMap) – Reference lab.

Returns:

List of (simplified name, key) tuples.

Return type:

list

gismap.lab.expansion.get_prospects(lab)
Parameters:

lab (LabMap) – Reference lab.

Returns:

List of prospects.

Return type:

list of Prospect

gismap.lab.expansion.proper_prospects(lab, length_impact=0.05, threshold=80, n_range=4, max_new=None, trim=True)

Find and rank external collaborators for potential lab expansion.

Identifies authors from publications who are not already lab members, groups them by name similarity, and ranks by collaboration strength.

Parameters:
  • lab (LabMap) – Reference lab.

  • length_impact (float, default=0.05) – Length impact for name similarity matching.

  • threshold (int, default=80) – Similarity threshold for grouping authors.

  • n_range (int, default=4) – N-gram range for name comparison.

  • max_new (int, optional) – Maximum number of new authors to return.

  • trim (bool, default=True) – If True, keep only one source per database for each author.

Returns:

New authors ranked by collaboration strength (descending).

Return type:

list of LabAuthor
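
The grouping step (merging name variants of the same external author before ranking) can be sketched as a greedy pass: each name joins the first group whose representative is similar enough, or starts a new group. This is an illustration under assumptions: difflib replaces the n-gram similarity (n_range and length_impact are not modeled), group strengths are merged with the (max, +) rule of ProspectStrength, and group_and_rank is hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=80):
    # 0-100 score via difflib, standing in for gismap's n-gram similarity.
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_and_rank(strengths, threshold=80, max_new=None):
    """strengths: {name variant: (coauthors, publications)}.

    Returns one representative name per group, strongest group first.
    """
    groups = []  # each entry: [representative name, (coauthors, publications)]
    for name, (c, p) in strengths.items():
        for group in groups:
            if similar(name, group[0], threshold):
                gc, gp = group[1]
                group[1] = (max(gc, c), gp + p)  # (max, +) merge of variants
                break
        else:  # no similar group found
            groups.append([name, (c, p)])
    groups.sort(key=lambda g: g[1], reverse=True)
    ranked = [g[0] for g in groups]
    return ranked if max_new is None else ranked[:max_new]


strengths = {"Jane Doe": (2, 3), "Jane Do": (1, 4), "Max Mustermann": (2, 6)}
print(group_and_rank(strengths))  # ['Jane Doe', 'Max Mustermann']
```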

gismap.lab.expansion.trim_sources(author)

In-place reduction of sources, keeping a single source per DB.

Parameters:

author (SourcedAuthor) – An author.

Return type:

None
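
Which source survives the trimming is not specified. This sketch, with hypothetical Source and Author stand-ins, keeps the first source seen for each database and, like trim_sources(), modifies the author in place.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical stand-in for a DB-specific author record."""
    db: str
    key: str


@dataclass
class Author:
    """Hypothetical stand-in for a sourced author."""
    name: str
    sources: list = field(default_factory=list)


def keep_one_source_per_db(author):
    """In place: keep the first source seen for each database."""
    seen = set()
    kept = []
    for source in author.sources:
        if source.db not in seen:
            seen.add(source.db)
            kept.append(source)
    author.sources[:] = kept  # mutate the existing list; nothing is returned


a = Author("Jane Doe", [Source("hal", "k1"), Source("hal", "k2"), Source("dblp", "d1")])
keep_one_source_per_db(a)
print([s.key for s in a.sources])  # ['k1', 'd1']
```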

Filters

gismap.lab.filters.author_taboo_filter(w=None)
Parameters:

w (list, optional) – List of words to filter.

Returns:

Filter function on authors.

Return type:

callable

gismap.lab.filters.publication_oneword_filter(n_min=2)
Parameters:

n_min (int, default=2) – Minimum number of words required in the title.

Returns:

Filter on the number of words in the title.

Return type:

callable

gismap.lab.filters.publication_size_filter(n_max=9)
Parameters:

n_max (int, default=9) – Maximum number of co-authors allowed.

Returns:

Filter on the number of co-authors.

Return type:

callable
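
Both functions are filter factories: they return a predicate suitable for publication_selectors. A sketch assuming publications expose title and authors attributes; the factory names and the Pub namedtuple are hypothetical. With n_max=9, the size filter keeps publications with fewer than 10 authors, matching the documented default.

```python
from collections import namedtuple

Pub = namedtuple("Pub", ["title", "authors"])  # hypothetical publication stand-in


def min_words_filter(n_min=2):
    """Keep publications whose title has at least n_min words."""
    return lambda pub: len(pub.title.split()) >= n_min


def max_authors_filter(n_max=9):
    """Keep publications with at most n_max authors."""
    return lambda pub: len(pub.authors) <= n_max


selectors = [min_words_filter(), max_authors_filter()]
pub = Pub("Epidemic broadcast", ["A", "B"])
print(all(f(pub) for f in selectors))  # True
print(min_words_filter()(Pub("Editorial", ["A"])))  # False
```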

gismap.lab.filters.publication_taboo_filter(w=None)
Parameters:

w (list, optional) – List of words to filter.

Returns:

Filter function on publications.

Return type:

callable

gismap.lab.filters.re_filter(words)
Parameters:

words (list or str) – Word or list of words to filter.

Returns:

Filter function.

Return type:

callable

Examples

>>> f = re_filter("foo")
>>> f("foobar")
False
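
A sketch consistent with the example: the returned function is False as soon as one of the words occurs, here as a plain substring, which is why "foo" rejects "foobar". The function name, the case-insensitive matching, and the empty-list behavior are assumptions. A taboo filter would presumably apply the same test to a publication title or an author name.

```python
import re


def taboo_regex_filter(words):
    """Return f(text) -> bool, False when any of the words occurs in text."""
    if isinstance(words, str):
        words = [words]  # a single word is accepted, as in re_filter("foo")
    if not words:
        return lambda text: True  # nothing to filter out
    # Escape each word so characters like "." or "+" are matched literally.
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return lambda text: pattern.search(text) is None


f = taboo_regex_filter("foo")
print(f("foobar"), f("bar"))  # False True

no_editorial = taboo_regex_filter(["editorial", "erratum"])
titles = ["Editorial: special issue", "Scaling laws for peer-to-peer systems"]
print([t for t in titles if no_editorial(t)])
# ['Scaling laws for peer-to-peer systems']
```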