{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# FAQ" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "## I don't want to install GisMap on my computer" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "No worries, you can play with it using https://mybinder.org/.\n", "\n", "For example:\n", "\n", "- [A simple interface to display and save collaboration graphs](https://mybinder.org/v2/gh/balouf/gismap/HEAD?urlpath=%2Fdoc%2Ftree%2Fbinder%2Finteractive.ipynb)\n", "- [Tutorial: Making LabMaps](https://mybinder.org/v2/gh/balouf/gismap/HEAD?urlpath=%2Fdoc%2Ftree%2Fdocs%2Ftutorials%2Flab_tutorial.ipynb)\n", "- [Tutorial: Making EgoMaps](https://mybinder.org/v2/gh/balouf/gismap/HEAD?urlpath=%2Fdoc%2Ftree%2Fdocs%2Ftutorials%2Fegomap.ipynb)\n", "- [Jupyter Lab instance with GisMap installed](https://mybinder.org/v2/gh/balouf/gismap/HEAD)" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "## LabMaps" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "### DB, Lab, Author, Maps... What does that mean?" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "The internal structure of GisMap interleaves simple concepts, like author, DB, publication, and lab.\n", "\n", "- An `Author` is a researcher who has published papers. 
Internally, it's just a name.\n", "- A `DB` is where the information comes from, like the [HAL portal](https://hal.science/) or\n", "the [DBLP Computer Science Bibliography](https://dblp.org/).\n", "- A DB should provide methods to search for authors and retrieve an author's publications.\n", "The results of these methods are lists of `DBAuthor` and `DBPublication`.\n", "- A given author/publication can be present in multiple DBs, and quite frequently multiple times in the same DB.\n", "GisMap uses `SourcedAuthor` and `SourcedPublication` to group multiple entries under a single banner.\n", "- A `LabMap` is made of authors (`LabAuthor`, i.e. `SourcedAuthor` with additional metadata)\n", "and publications (`SourcedPublication`).\n", "- An `EgoMap` is a special `LabMap` dedicated to a single researcher,\n", "showing her co-authors and possibly their co-authors." ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "### Why are there errors in the authors / publications I get?" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "GisMap operates on real-life databases. The list of things that can go wrong is veeerrryyyy long. For example:\n", "\n", "- One DB may be temporarily down (DBLP can have issues sometimes);\n", "- Papers may be entered with bad author information;\n", "- A paper may exist in multiple versions (e.g. report, conference, journal), possibly with different spellings;\n", "- One given author may have multiple identity keys in the same DB;\n", "- Multiple authors may share the same name (and sometimes identity key) in the same DB.\n", "\n", "We do what we can to smooth things out automatically without any manual intervention, but sometimes it's not enough. However, 80% of the issues you may encounter can probably be solved by manual adjustment (see below)." ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "### How are publications grouped together?" 
] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "We use [Bag of Factors](https://balouf.github.io/bof/), a package based on [Joint Complexity](https://hal.science/hal-03129901).\n", "\n", "If the titles of two entries are *close enough*, we consider them to be the same paper. In case of multiple sources, one main source is selected (based on year, type, and other things).\n", "\n", ":::note\n", "\n", "We are aware that merging based on title similarity has its flaws. But after comparing with other options (e.g. strict equality on title, authors, year, and possibly venue), we decided that it was a good compromise.\n", "\n", ":::" ] }, { "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "### How to enter a researcher?" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "When a researcher is uniquely defined by her name, you can just enter that name, e.g." ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "from gismap.lab import LabAuthor\n", "\n", "cc = LabAuthor(\"Céline Comte\")\n", "cc.auto_sources()\n", "cc.sources" ] }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "However, sometimes the name is ambiguous, and you may get multiple keys that correspond to multiple authors." 
] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "fd = LabAuthor(\"François Durand\")\n", "fd.auto_sources()\n", "fd.sources" ] }, { "cell_type": "markdown", "id": "15", "metadata": {}, "source": [ "When that happens, you need to manually enter the desired keys.\n", "This can be done by specifying the sources:" ] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "from gismap import HALAuthor, DBLPAuthor\n", "\n", "fd = LabAuthor.from_sources(\n", "    [\n", "        HALAuthor(name=\"François Durand\", key=\"fradurand\"),\n", "        DBLPAuthor(name=\"François Durand\", key=\"38/11269\"),\n", "    ]\n", ")\n", "fd.sources" ] }, { "cell_type": "markdown", "id": "17", "metadata": {}, "source": [ "For convenience, you can also enter keys in parentheses. For example, an exact equivalent of the above is:" ] }, { "cell_type": "code", "execution_count": null, "id": "18", "metadata": {}, "outputs": [], "source": [ "fd = LabAuthor(\"François Durand (hal: fradurand, dblp: 38/11269)\")\n", "fd.sources" ] }, { "cell_type": "markdown", "id": "19", "metadata": {}, "source": [ ":::note\n", "\n", "- The parenthesis notation can also be used to specify other metadata (URL, image, group) with the same \"key: value, ...\" format\n", "- For proper parsing, a name should contain no parentheses, a key should not contain \",\" or \":\", and a value should not contain \",\"\n", "- You can enter multiple keys for the same DB, e.g. 
\"My name (hal: key1, hal: key2)\"\n", "\n", ":::" ] }, { "cell_type": "code", "execution_count": null, "id": "20", "metadata": {}, "outputs": [], "source": [ "# Deliberately messy input (uneven spacing, unknown key, entry without a colon) to exercise the parser.\n", "dummy = LabAuthor(\n", "    \"My Name(img: https://my.url.img, group:me,url:https://mysite.org,hal:key1,hal:123456,dblp:toto,badkey:hello,no_colon_separator)\"\n", ")\n", "print(dummy.metadata)\n", "print(dummy.sources)" ] }, { "cell_type": "markdown", "id": "21", "metadata": {}, "source": [ "### There is a publication that should not be there in my LabMap" ] }, { "cell_type": "markdown", "id": "22", "metadata": {}, "source": [ "The first thing to check is whether the publication has an external author with the same name as one of the lab's authors.\n", "If that is the case, try to manually specify the sources of the author to avoid this.\n", "\n", "Another possibility (e.g. when the DB does not distinguish between the internal and external authors) is to add a manual filter to discard the outsider. For example, the following publication is a DBLP false positive." ] }, { "cell_type": "code", "execution_count": null, "id": "23", "metadata": {}, "outputs": [], "source": [ "fd = LabAuthor(\"François Durand (dblp: 38/11269)\")\n", "pubs = fd.get_publications()\n", "print(len(pubs))\n", "[p.title for p in pubs.values() if \"Horse Locomotion\" in p.title]" ] }, { "cell_type": "markdown", "id": "24", "metadata": {}, "source": [ "Keeping it would associate the wrong co-authors with the author. The solution is to add a *publication filter*." 
] }, { "cell_type": "code", "execution_count": null, "id": "25", "metadata": {}, "outputs": [], "source": [ "def no_horse(p):\n", "    # Keep a publication only if its title does not mention the false positive.\n", "    return \"Horse Locomotion\" not in p.title\n", "\n", "\n", "pubs = fd.get_publications(selector=no_horse)\n", "print(len(pubs))\n", "[p.title for p in pubs.values() if \"Horse Locomotion\" in p.title]" ] }, { "cell_type": "markdown", "id": "26", "metadata": {}, "source": [ "### There is a publication that should be there in my LabMap" ] }, { "cell_type": "markdown", "id": "27", "metadata": {}, "source": [ "If a publication exists when you check it with the `get_publications` method of a `LabAuthor`, but is not registered in the lab publications, that probably means that the publication has been rejected by one of the default filters.\n", "\n", "- `LabMap` objects store a list of filters in the attribute `publication_selectors`. For a publication to be stored, all filters must return `True`.\n", "- Default filters:\n", "  - There should be no more than 9 authors. 
In my research field, papers with that many authors are relatively rare, and most of the time they are pollution from other fields where 100+ authors are entered at once, with a large risk of error.\n", "  - The title should have at least 2 words.\n", "  - The title should not contain *taboo* words like *Editorial*, *Foreword*, *Brief Announcement*, *Preface*.\n", "\n", "Feel free to change the existing filters or add new ones to suit your needs.\n", "\n", "For example, to build the EgoMap of François Durand, taking into account some specificities:" ] }, { "cell_type": "code", "execution_count": null, "id": "28", "metadata": {}, "outputs": [], "source": [ "from gismap.lab import EgoMap\n", "from gismap.lab.filters import publication_taboo_filter, publication_oneword_filter\n", "\n", "fd = EgoMap(\"François Durand (hal: fradurand, dblp: 38/11269)\")\n", "fd.publication_selectors = [\n", "    publication_taboo_filter(),\n", "    publication_oneword_filter(n_min=4),\n", "    no_horse,\n", "]\n", "fd.build(target=30)\n", 
"fd.show_html()" ] }, { "cell_type": "markdown", "id": "29", "metadata": {}, "source": [ "### How to specify the DBs to use?" ] }, { "cell_type": "markdown", "id": "30", "metadata": {}, "source": [ "For `LabMap` (including `EgoMap`), you can specify a `dbs` parameter. It can be a string or a DB subclass,\n", "or a list of strings or DB subclasses.\n", "\n", "For `SourcedAuthor`/`LabAuthor`, you can pass `dbs` to the `auto_sources` method." ] }, { "cell_type": "markdown", "id": "31", "metadata": {}, "source": [ "## Graphical representation" ] }, { "cell_type": "markdown", "id": "32", "metadata": {}, "source": [ "### Stars, planets, moons, comets?" ] }, { "cell_type": "markdown", "id": "33", "metadata": {}, "source": [ "The *Maps*, i.e., the graphical representations of the collaboration graph, use vocabulary borrowed from astronomy.\n", "\n", "- **Stars** and **Planets**: For EgoMaps, you are the *star*, i.e., the center of your own universe. *Planets* are people who revolve around you, i.e., your co-authors.\n", "- **Moons**: A *moon* is a researcher connected to a planet or a lab.\n", "- **Comets**: A *comet* has no direct link with other displayed entities." ] }, { "cell_type": "markdown", "id": "34", "metadata": {}, "source": [ "### I don't want images!" ] }, { "cell_type": "markdown", "id": "35", "metadata": {}, "source": [ "By default, GisMap will retrieve pictures from the DB if they are available. 
Some `Lab` classes also fetch image data.\n", "\n", "If you don't want any images in your map, the best option is probably to remove them from the authors' metadata:" ] }, { "cell_type": "code", "execution_count": null, "id": "36", "metadata": {}, "outputs": [], "source": [ "for author in fd.authors.values():\n", "    author.metadata.img = None\n", "fd.show_html()" ] }, { "cell_type": "markdown", "id": "37", "metadata": {}, "source": [ "### The graph is spinning forever" ] }, { "cell_type": "markdown", "id": "38", "metadata": {}, "source": [ "It's a glitch in the physics engine that can happen from time to time. Usually, hitting the `redraw()` button (upper left corner) does the trick." ] }, { "cell_type": "markdown", "id": "39", "metadata": {}, "source": [ "### It's too small, I cannot see anything" ] }, { "cell_type": "markdown", "id": "40", "metadata": {}, "source": [ "Sadly, for very large graphs, there is just too much information to display. However, a few tricks can help a bit:\n", "\n", "- Use `Full Screen` (lower right corner);\n", "- Zoom inside the graph, e.g. using the mouse wheel;\n", "- Move nodes around / use the `Redraw()` button;\n", "- Remember that full names appear when you hover over nodes; this can help you navigate when you know the people." ] }, { "cell_type": "markdown", "id": "41", "metadata": {}, "source": [ "### How to embed a Map on my website?" ] }, { "cell_type": "markdown", "id": "42", "metadata": {}, "source": [ "- For a simple and static embedding, you can use [Binder to manually generate the Map](https://mybinder.org/v2/gh/balouf/gismap/HEAD?urlpath=%2Fdoc%2Ftree%2Fbinder%2Finteractive.ipynb).\n", "- For something more advanced, it is recommended to write a Python script that generates the Map." ] }, { "cell_type": "markdown", "id": "43", "metadata": {}, "source": [ "## Databases" ] }, { "cell_type": "markdown", "id": "44", "metadata": {}, "source": [ "### What databases are available, and what are their advantages?" 
] }, { "cell_type": "markdown", "id": "45", "metadata": {}, "source": [ "| Database | Pros | Cons |\n", "|------------|--------------------------------------|-----------------------------------------------------------------|\n", "| HAL | Fast; Rich metadata (e.g. abstracts) | France-based research only; Errors and gaps exist |\n", "| DBLP | Highly accurate; Unified venue names | Computer Science only; Very slow |\n", "| Local DBLP | Ultra-fast | Not integrated yet! All DBLP cons + No unique search keys |" ] }, { "cell_type": "markdown", "id": "46", "metadata": {}, "source": [ "### Why doesn’t GisMap use database X?" ] }, { "cell_type": "markdown", "id": "47", "metadata": {}, "source": [ "I wanted to rely on publication databases that:\n", "\n", "- have a public API;\n", "- are relatively clean and up to date;\n", "- do not require an API key.\n", "\n", "To date, only HAL and DBLP seem to meet these specifications.\n", "\n", "That said, GisMap is designed to be multi-source, so if a contributor wants to add an interface to another database (Google Scholar, ORCID, ...), they are encouraged to write it and make a PR!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 5 }