{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# HALTools\n" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "GisMap exposes two helpers to compare what different databases know about\n", "a researcher and to spot duplicates inside a single database:\n", "\n", "- `diff_sources(a, b)` lists publications present in `a` but not in `b`,\n", " and vice-versa. Useful when you want to manually update one database\n", " using inputs from another one.\n", "- `find_duplicates(a)` groups publications that look like the same paper\n", " inside `a` (e.g. an entry created by you and a near-duplicate created\n", " by a co-author).\n", "\n", "A frequent use is to audit the database you can actually edit, and\n", "compare it with one you cannot edit. For researchers based in France, the\n", "editable database is usually [HAL](https://hal.science/) — hence the\n", "nickname *HALTools*. The methods themselves are not HAL-specific.\n", "\n", "Typical use cases:\n", "\n", "- Spot duplicates of the same publication that you and a co-author\n", " registered independently.\n", "- Verify that all your DBLP-indexed publications have a counterpart in\n", " HAL (mandatory for most French academic labs, easy to forget).\n", "- For researchers with a partial HAL footprint (e.g. several `pid`s and\n", " no unified `HAL-ID`), HALTools helps reconcile what is found by which\n", " identifier.\n", "\n", "The rest of this notebook walks through illustrated examples.\n" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ ":::note\n", "You do not need a full `LabMap` to use HALTools — a `LabAuthor` is enough\n", "(technically `SourcedAuthor` would do, but `LabAuthor` is more convenient).\n", ":::\n" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ ":::important\n", "HALTools surfaces entries to **check**, not entries to **change**. 
As we\n", "will see below, many flagged items are perfectly fine and need no action.\n", ":::\n" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## Céline — a clean researcher\n", "\n", "We start with [Céline Comte](https://homepages.laas.fr/ccomte/index.html), whose HAL and\n", "DBLP profiles are well-maintained. The differences we find should be\n", "explainable, not actionable.\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "5", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:08:57.945162Z", "iopub.status.busy": "2026-05-19T16:08:57.944990Z", "iopub.status.idle": "2026-05-19T16:09:18.510999Z", "shell.execute_reply": "2026-05-19T16:09:18.509999Z" } }, "outputs": [], "source": [ "from gismap.lab.lab_author import LabAuthor\n", "\n", "celine = LabAuthor(\"Céline Comte\")\n", "celine.auto_sources()" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "### Comparing HAL and LDB\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "7", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:18.512995Z", "iopub.status.busy": "2026-05-19T16:09:18.512726Z", "iopub.status.idle": "2026-05-19T16:09:19.462223Z", "shell.execute_reply": "2026-05-19T16:09:19.461218Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Only in hal (8) ===\n", "\"Modèle de couplage stochastique non-biparti\" (2021, conference) - https://hal.science/hal-03219422v1\n", "\"Online Stochastic Matching: A Polytope Perspective\" (2025, report) - https://hal.science/hal-03502084v6\n", "\"Un seul serveur vous manque, et tout est découplé !\" (2018, conference) - https://hal.science/hal-01773674v1\n", "\"À la racine du parallélisme\" (2017, conference) - https://hal.science/hal-01517150v1\n", "\"0 = 0, c'est le truc du noyau ! 
Application aux files d'attente\" (2019, conference) - https://hal.science/hal-02118156v1\n", "\"Performance of a Server Cluster with Parallel Processing and Randomized Load Balancing\" (2016, report) - https://hal.science/hal-01306343v1\n", "\"Rien ne sert de prédire ; il faut servir ancien.\" (2019, conference) - https://hal.science/hal-02118170v1\n", "\"La Grille de Kleinberg, l’Univers et le Reste\" (2017, conference) - https://hal.science/hal-01517123v1\n", "=== Only in ldb (3) ===\n", "\"Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems.\" (2023, journal) - https://doi.org/10.48550/ARXIV.2312.02804\n", "\"Networks of multi-server queues with parallel processing.\" (2016, journal) - http://arxiv.org/abs/1604.06763\n", "\"Stochastic dynamic matching: A mixed graph-theory and linear-algebra approach.\" (2021, journal) - https://arxiv.org/abs/2112.14457\n" ] } ], "source": [ "print(celine.diff_sources(\"hal\", \"ldb\"))" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "Reading the output:\n", "\n", "- Most HAL-only papers are written in French, typically Algotel\n", " conference papers aimed at the French-speaking community. DBLP rarely\n", " indexes French-only papers (with a few exceptions like\n", " [JIAF 2023](https://dblp.org/db/conf/jiaf/jiaf2023.html#conf/jiaf/CaizerguesDM23)).\n", " Nothing wrong here.\n", "- *Networks of multi-server queues with parallel processing* (LDB-only):\n", " likely an early research report later absorbed into a journal paper\n", " with a different title.\n", "- *Online Stochastic Matching: A Polytope Perspective* (HAL) and\n", " *Stochastic dynamic matching: A mixed graph-theory and linear-algebra\n", " approach* (LDB) are actually the same article. HAL stores the latest\n", " version (newer date and title), DBLP/LDB keeps the original. 
Merging\n", " these would be a hard feature for a small payoff — don't expect it\n", " soon.\n", "- *Score-Aware Policy-Gradient Methods…* (LDB-only): same pattern, a\n", " report later published in a journal under a slightly different title.\n", " Could be merged with a more lenient title-similarity threshold, but\n", " that knob is not exposed in the public API yet.\n", "\n", "Bottom line: everything is fine.\n" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "### Duplicates inside HAL\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "10", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:19.464759Z", "iopub.status.busy": "2026-05-19T16:09:19.464568Z", "iopub.status.idle": "2026-05-19T16:09:20.425144Z", "shell.execute_reply": "2026-05-19T16:09:20.423705Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Duplicates in hal (2 groups) ===\n", " Group 1:\n", " \"À la racine du parallélisme\" (2017, conference) - https://hal.science/hal-01517150v1\n", " \"À la racine du parallélisme\" (2017, report) - https://inria.hal.science/hal-01476889v3\n", " Group 2:\n", " \"Dynamic load balancing with tokens\" (2019, journal) - https://hal.science/hal-02340255v1\n", " \"Dynamic Load Balancing with Tokens\" (2018, conference) - https://hal.science/hal-01758912v2\n" ] } ], "source": [ "print(celine.find_duplicates(\"hal\"))" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "Both groups are report-to-conference or conference-to-journal\n", "lifecycles — no actual issue. GisMap merges these duplicates\n", "automatically when building a map, so no action is needed.\n" ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "## François — a messier real-world case\n", "\n", "[François Durand](https://cv.hal.science/fradurand) has a different history,\n", "with French papers, posters, and a few real gaps. 
We pin the\n", "HAL `pid` explicitly to avoid homonyms.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "13", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:20.427108Z", "iopub.status.busy": "2026-05-19T16:09:20.426795Z", "iopub.status.idle": "2026-05-19T16:09:20.697278Z", "shell.execute_reply": "2026-05-19T16:09:20.696482Z" } }, "outputs": [], "source": [ "francois = LabAuthor(\"François Durand (hal:fradurand)\")\n", "francois.auto_sources()" ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "### Spotting actual gaps in HAL" ] }, { "cell_type": "code", "execution_count": 5, "id": "15", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:20.699600Z", "iopub.status.busy": "2026-05-19T16:09:20.699409Z", "iopub.status.idle": "2026-05-19T16:09:21.639709Z", "shell.execute_reply": "2026-05-19T16:09:21.638581Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Only in hal (12) ===\n", "\"Coupure enharmonique, complétude et applications\" (2012, report) - https://hal.science/hal-00694023v1\n", "\"Voter Autrement 2017 for the French Presidential Election — The data of the In Situ Experiments\" (2020, report) - https://hal.science/hal-03223803v1\n", "\"Reducing Manipulability\" (2014, poster) - https://hal.science/hal-01095992v1\n", "\"Trier des cochons sauvages\" (2023, conference) - https://hal.science/hal-04077471v1\n", "\"Élection du Best Paper AlgoTel 2012 : étude de la manipulabilité\" (2014, conference) - https://inria.hal.science/hal-00986060v1\n", "\"Démocratie à géométrie variable (à l'usage des algorithmes)\" (2021, conference) - https://hal.science/hal-03213987v1\n", "\"Making a voting system depend only on orders of preference reduces its manipulability rate\" (2014, report) - https://inria.hal.science/hal-01009136v1\n", "\"Shannon, Turing and Hats: Information Theory Incompleteness\" (2017, conference) - https://inria.hal.science/hal-01675019v1\n", "\"Vers des 
modes de scrutin moins manipulables\" (2015, thesis) - https://inria.hal.science/tel-01242440v2\n", "\"Sortus Interruptus : Trier des phoques en O(n log(n))\" (2025, conference) - https://hal.science/hal-05003794v1\n", "\"Coalitional manipulation of voting rules: simulations on empirical data\" (2023, journal) - https://hal.science/hal-05573536v1\n", "\"Élection d'un chemin dans un réseau : étude de la manipulabilité\" (2014, conference) - https://inria.hal.science/hal-00986050v1\n", "=== Only in ldb (6) ===\n", "\"Learning-Based Multiuser Scheduling in Mimo-Ofdm Systems With Hybrid Beamforming.\" (2025, conference) - https://doi.org/10.1109/EUCNC/6GSUMMIT63408.2025.11037174\n", "\"Post-Computing Analog Beams after User Selection in a Hybrid Beamforming System.\" (2024, conference) - https://doi.org/10.1109/EUCNC/6GSUMMIT60053.2024.10597109\n", "\"L'abus de comparaisons est mauvais pour la santé.\" (2023, conference) - https://hal.science/hal-04209856v1/document\n", "\"Detection of Horse Locomotion Modifications Due to Training with Inertial Measurement Units: A Proof-of-Concept.\" (2022, journal) - https://doi.org/10.3390/S22134981\n", "\"Probability of a Condorcet Winner for Large Electorates: An Analytic Combinatorics Approach.\" (2025, journal) - https://doi.org/10.48550/ARXIV.2505.06028\n", "\"Sorting wild pigs.\" (2023, journal) - https://doi.org/10.48550/ARXIV.2304.11952\n" ] } ], "source": [ "print(francois.diff_sources(\"hal\", \"ldb\"))" ] }, { "cell_type": "markdown", "id": "16", "metadata": {}, "source": [ "**Only in HAL** — all explainable:\n", "\n", "- French papers (DBLP rarely indexes them), including the PhD thesis (written in French).\n", "- Items without proper proceedings (posters, *Voter Autrement* dataset\n", " reports).\n", "- *Coalitional manipulation of voting rules…* — published in an\n", " economics venue outside DBLP's scope.\n", "\n", "**Only in LDB** — mixed bag:\n", "\n", "- The two beamforming papers and the Condorcet paper *should* 
be in HAL\n", " but were forgotten. These are real action items.\n", "- *L'abus de comparaisons est mauvais pour la santé*: the HAL entry is\n", " the **whole proceedings** of the conference, with the PC chairs as\n", " authors. DBLP somehow extracted the individual papers from HAL even\n", " though HAL itself does not split them.\n", "- *Detection of Horse Locomotion…*: a rare DBLP confusion. The `pid`\n", " `38/11269` is supposed to point to a single person, but this\n", " particular paper was authored by a homonym. The only fix is to email\n", " DBLP directly (`dblp@dagstuhl.de`); response time can be slow.\n", "- *Sorting wild pigs* / *Trier des cochons sauvages*: a French paper\n", " that was OK to file in French because it carries an English title and\n", " abstract. Nothing to patch.\n" ] }, { "cell_type": "markdown", "id": "17", "metadata": {}, "source": [ "### Duplicates: same paper, different lifecycle stages\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "18", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:21.642270Z", "iopub.status.busy": "2026-05-19T16:09:21.642055Z", "iopub.status.idle": "2026-05-19T16:09:22.571079Z", "shell.execute_reply": "2026-05-19T16:09:22.569942Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Duplicates in hal (5 groups) ===\n", " Group 1:\n", " \"Voter Autrement 2017 for the French Presidential Election — The data of the In Situ Experiments\" (2020, report) - https://hal.science/hal-03223803v1\n", " \"Voter Autrement 2017 for the French Presidential Election\" (2019, report) - https://shs.hal.science/halshs-02379941v1\n", " \"Voter Autrement 2017 - Online Experiment\" (2018, report) - https://hal.science/hal-03223762v1\n", " Group 2:\n", " \"Reducing Manipulability\" (2014, poster) - https://hal.science/hal-01095992v1\n", " \"Making most voting systems meet the Condorcet criterion reduces their manipulability\" (2014, report) - 
https://inria.hal.science/hal-01009134v1\n", " Group 3:\n", " \"Geometry on the Utility Space\" (2015, conference) - https://inria.hal.science/hal-01222871v1\n", " \"Geometry on the Utility Space\" (2014, conference) - https://hal.science/hal-01096018v1\n", " Group 4:\n", " \"SVVAMP: Simulator of Various Voting Algorithms in Manipulating Populations\" (2016, conference) - https://hal.science/hal-01369835v1\n", " \"SVVAMP: Simulator of Various Voting Algorithms in Manipulating Populations\" (2015, report) - https://hal.science/hal-01135109v1\n", " Group 5:\n", " \"On the Manipulability of Voting Systems: Application to Multi-Operator Networks\" (2013, conference) - https://hal.science/hal-00874096v1\n", " \"On the Manipulability of Voting Systems: Application to Multi-Carrier Networks\" (2012, report) - https://inria.hal.science/hal-00692096v1\n" ] } ], "source": [ "print(francois.find_duplicates(\"hal\"))" ] }, { "cell_type": "markdown", "id": "19", "metadata": {}, "source": [ "All five groups are benign:\n", "\n", "- **Group 1** — three distinct documents around the same *Voter\n", " Autrement 2017* experiment.\n", "- **Group 2** — a poster summarizing a longer report.\n", "- **Group 3** — two evolutions of the same paper.\n", "- **Group 4** — short report vs. short announcement of the same tool.\n", "- **Group 5** — report-to-conference lifecycle.\n" ] }, { "cell_type": "markdown", "id": "20", "metadata": {}, "source": [ "## Fabien — many duplicates, one real issue\n", "\n", "On older HAL profiles, such as that of [Fabien Mathieu](https://balouf.github.io/), report → conference → journal lifecycles have\n", "had time to accumulate. 
The interesting question is whether a real\n", "duplicate hides among the noise.\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "21", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:22.573131Z", "iopub.status.busy": "2026-05-19T16:09:22.572937Z", "iopub.status.idle": "2026-05-19T16:09:23.351068Z", "shell.execute_reply": "2026-05-19T16:09:23.350138Z" } }, "outputs": [], "source": [ "fabien = LabAuthor(\"Fabien Mathieu\")\n", "fabien.auto_sources()" ] }, { "cell_type": "markdown", "id": "22", "metadata": {}, "source": [ "Focus on duplicates:\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "23", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:23.352983Z", "iopub.status.busy": "2026-05-19T16:09:23.352784Z", "iopub.status.idle": "2026-05-19T16:09:24.515435Z", "shell.execute_reply": "2026-05-19T16:09:24.514029Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Duplicates in hal (14 groups) ===\n", " Group 1:\n", " \"LiveRank: How to Refresh Old Datasets\" (2015, journal) - https://inria.hal.science/hal-01251552v1\n", " \"LiveRank: How to Refresh Old Crawls\" (2014, conference) - https://inria.hal.science/hal-01093188v1\n", " Group 2:\n", " \"Geometry on the Utility Space\" (2015, conference) - https://inria.hal.science/hal-01222871v1\n", " \"Geometry on the Utility Space\" (2014, conference) - https://hal.science/hal-01096018v1\n", " Group 3:\n", " \"Stratification in P2P networks, Application to BitTorrent\" (2007, conference) - https://hal.science/hal-00159663v1\n", " \"Stratification in P2P Networks - Application to BitTorrent\" (2006, report) - https://inria.hal.science/inria-00121974v2\n", " Group 4:\n", " \"SVVAMP: Simulator of Various Voting Algorithms in Manipulating Populations\" (2016, conference) - https://hal.science/hal-01369835v1\n", " \"SVVAMP: Simulator of Various Voting Algorithms in Manipulating Populations\" (2015, report) - 
https://hal.science/hal-01135109v1\n", " Group 5:\n", " \"Upper bounds for stabilization in acyclic preference-based systems\" (2007, conference) - https://inria.hal.science/hal-00668356v1\n", " \"Acyclic Preference-Based Systems\" (2010, chapter) - https://inria.hal.science/hal-00667351v1\n", " Group 6:\n", " \"On Using Matching Theory to Understand P2P Network Design\" (2007, conference) - https://hal.science/hal-00159678v1\n", " \"On Using Matching Theory to Understand P2P Network Design\" (2006, report) - https://inria.hal.science/inria-00121604v2\n", " Group 7:\n", " \"Acyclic Preference Systems in P2P Networks\" (2007, conference) - https://inria.hal.science/inria-00471720v1\n", " \"Acyclic Preference Systems in P2P Networks\" (2007, report) - https://inria.hal.science/inria-00143790v2\n", " Group 8:\n", " \"Local Aspects of the Global Ranking of Web Pages\" (2006, conference) - https://inria.hal.science/inria-00160799v1\n", " \"Local Aspects of the Global Ranking of Web Pages\" (2004, report) - https://inria.hal.science/inria-00070800v1\n", " Group 9:\n", " \"On resource aware algorithms in epidemic live streaming\" (2010, conference) - https://inria.hal.science/hal-00668321v1\n", " \"On Resource Aware Algorithms in Epidemic Live Streaming\" (2009, report) - https://inria.hal.science/inria-00414706v1\n", " Group 10:\n", " \"The Stable Configuration of Acyclic Preference-Based Systems\" (2009, conference) - https://inria.hal.science/hal-00668292v1\n", " \"The stable configuration in acyclic preference-based systems\" (2008, report) - https://inria.hal.science/inria-00318621v1\n", " Group 11:\n", " \"Reducing Manipulability\" (2014, poster) - https://hal.science/hal-01095992v1\n", " \"Making most voting systems meet the Condorcet criterion reduces their manipulability\" (2014, report) - https://inria.hal.science/hal-01009134v1\n", " Group 12:\n", " \"Deciding and verifying network properties locally with few output bits\" (2020, journal) - 
https://inria.hal.science/hal-03100213v1\n", " \"Deciding and verifying network properties locally with few output bits\" (2020, journal) - https://hal.science/hal-02285014v1\n", " Group 13:\n", " \"À la racine du parallélisme\" (2017, conference) - https://hal.science/hal-01517150v1\n", " \"À la racine du parallélisme\" (2017, report) - https://inria.hal.science/hal-01476889v3\n", " Group 14:\n", " \"On the Manipulability of Voting Systems: Application to Multi-Operator Networks\" (2013, conference) - https://hal.science/hal-00874096v1\n", " \"On the Manipulability of Voting Systems: Application to Multi-Carrier Networks\" (2012, report) - https://inria.hal.science/hal-00692096v1\n" ] } ], "source": [ "print(fabien.find_duplicates(\"hal\"))" ] }, { "cell_type": "markdown", "id": "24", "metadata": {}, "source": [ "Every group except Group 12 follows the usual lifecycle pattern (report →\n", "conference → journal/chapter) and needs no action.\n", "\n", "**Group 12 is the real issue**: *Deciding and verifying network\n", "properties locally with few output bits* exists twice in HAL, both times\n", "as a 2020 journal entry, because two co-authors entered it independently.\n", "The fix is to ask HAL support to merge the two entries.\n" ] }, { "cell_type": "markdown", "id": "25", "metadata": {}, "source": [ "## Élie — when the HAL `pid` is too strict\n", "\n", "Élie de Panafieu has a low HAL footprint and a more idiosyncratic\n", "profile. 
Default settings will only catch part of his work — HALTools\n", "is exactly the tool to diagnose what is missing and why.\n" ] }, { "cell_type": "markdown", "id": "26", "metadata": {}, "source": [ "By default, GisMap picks the most specific HAL identifier it can find:\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "27", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:24.517437Z", "iopub.status.busy": "2026-05-19T16:09:24.517254Z", "iopub.status.idle": "2026-05-19T16:09:25.218953Z", "shell.execute_reply": "2026-05-19T16:09:25.218072Z" } }, "outputs": [ { "data": { "text/plain": [ "[HALAuthor(name='Élie de Panafieu', key='1319887', key_type='pid')]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "elie = LabAuthor(\"Élie de Panafieu\")\n", "elie.auto_sources(\"hal\")\n", "elie.sources" ] }, { "cell_type": "code", "execution_count": 10, "id": "28", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:25.220644Z", "iopub.status.busy": "2026-05-19T16:09:25.220481Z", "iopub.status.idle": "2026-05-19T16:09:25.929843Z", "shell.execute_reply": "2026-05-19T16:09:25.928622Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The birth of the strong components, by Sergey Dovgal, Élie de Panafieu, Dimbinaina Ralaivaosaona, Vonjy Rasendrahasina, and Stephan Wagner. In Random Structures and Algorithms [journal], 2023.\n" ] } ], "source": [ "print(\"\\n\".join(str(p) for p in elie.get_publications().values()))\n" ] }, { "cell_type": "markdown", "id": "29", "metadata": {}, "source": [ "Only one publication. The HAL `pid` is unambiguous but too narrow:\n", "many of Élie's HAL entries are not attached to that `pid`. 
We can\n", "confirm this by comparing `pid`-based and fullname-based searches:\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "30", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:25.931945Z", "iopub.status.busy": "2026-05-19T16:09:25.931744Z", "iopub.status.idle": "2026-05-19T16:09:27.348217Z", "shell.execute_reply": "2026-05-19T16:09:27.347072Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Only in hal (0) (0) ===\n", "=== Only in hal (1) (5) ===\n", "\"Enumeration and structure of inhomogeneous graphs\" (2015, conference) - https://hal.science/hal-01337762v1\n", "\"Counting connected graphs with large excess\" (2016, conference) - https://hal.science/hal-02166341v1\n", "\"Complexity Estimates for Two Uncoupling Algorithms\" (2013, conference) - https://inria.hal.science/hal-00780010v2\n", "\"Of Kernels and Queues: when network calculus meets analytic combinatorics\" (2018, conference) - https://hal.science/hal-01889101v1\n", "\"0 = 0, c'est le truc du noyau ! 
Application aux files d'attente\" (2019, conference) - https://hal.science/hal-02118156v1\n" ] } ], "source": [ "elie = LabAuthor(\"Élie de Panafieu (hal:1319887, hal:fullname)\")\n", "print(elie.diff_sources(0, 1))" ] }, { "cell_type": "markdown", "id": "31", "metadata": {}, "source": [ "The fullname search picks up five more HAL entries that the `pid` misses.\n", "The fix is to switch to fullname (the `hal:fullname` flag is documented\n", "in the [FAQ](../faq.ipynb)):\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "32", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:27.350113Z", "iopub.status.busy": "2026-05-19T16:09:27.349919Z", "iopub.status.idle": "2026-05-19T16:09:27.403394Z", "shell.execute_reply": "2026-05-19T16:09:27.402621Z" } }, "outputs": [], "source": [ "elie = LabAuthor(\"Élie de Panafieu (hal:fullname)\")\n", "elie.auto_sources()" ] }, { "cell_type": "markdown", "id": "33", "metadata": {}, "source": [ "Comparing HAL (now in fullname mode) against LDB:\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "34", "metadata": { "execution": { "iopub.execute_input": "2026-05-19T16:09:27.405561Z", "iopub.status.busy": "2026-05-19T16:09:27.405379Z", "iopub.status.idle": "2026-05-19T16:09:28.150719Z", "shell.execute_reply": "2026-05-19T16:09:28.149539Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== Only in hal (3) ===\n", "\"Counting connected graphs with large excess\" (2016, conference) - https://hal.science/hal-02166341v1\n", "\"Enumeration and structure of inhomogeneous graphs\" (2015, conference) - https://hal.science/hal-01337762v1\n", "\"0 = 0, c'est le truc du noyau ! 
Application aux files d'attente\" (2019, conference) - https://hal.science/hal-02118156v1\n", "=== Only in ldb (17) ===\n", "\"torus packing for multisets.\" (2024, misc) - https://doi.org/10.4230/ARTIFACTS.22479\n", "\"Tree Walks and the Spectrum of Random Graphs.\" (2024, conference) - https://doi.org/10.4230/LIPICS.AOFA.2024.11\n", "\"Combinatorics of nondeterministic walks of the Dyck and Motzkin type.\" (2019, conference) - https://doi.org/10.1137/1.9781611975505.1\n", "\"Graphs with degree constraints.\" (2016, conference) - https://doi.org/10.1137/1.9781611974324.4\n", "\"Expressive Key-Policy Attribute-Based Encryption with Constant-Size Ciphertexts.\" (2011, conference) - https://doi.org/10.1007/978-3-642-19379-8_6\n", "\"Phase transition of random non-uniform hypergraphs.\" (2015, journal) - https://doi.org/10.1016/J.JDA.2015.01.009\n", "\"2-Xor Revisited: Satisfiability and Probabilities of Functions.\" (2016, journal) - https://doi.org/10.1007/S00453-016-0119-X\n", "\"Active clustering for labeling training data.\" (2021, conference) - https://proceedings.neurips.cc/paper/2021/hash/47841cc9e552bd5c40164db7073b817b-Abstract.html\n", "\"Robot Positioning Using Torus Packing for Multisets.\" (2024, conference) - https://doi.org/10.4230/LIPICS.ICALP.2024.43\n", "\"Threshold functions for small subgraphs in simple graphs and multigraphs.\" (2020, journal) - https://doi.org/10.1016/J.EJC.2020.103113\n", "\"Analytic description of the phase transition of inhomogeneous multigraphs.\" (2015, journal) - https://doi.org/10.1016/J.EJC.2015.02.020\n", "\"Attribute-based encryption schemes with constant-size ciphertexts.\" (2012, journal) - https://doi.org/10.1016/J.TCS.2011.12.004\n", "\"Exact enumeration of satisfiable 2-SAT formulae.\" (2023, journal) - https://doi.org/10.5070/C63261985\n", "\"Threshold functions for small subgraphs: an analytic approach.\" (2017, journal) - https://doi.org/10.1016/J.ENDM.2017.06.048\n", "\"Analytic combinatorics of connected 
graphs.\" (2019, journal) - https://doi.org/10.1002/RSA.20836\n", "\"Probability of a Condorcet Winner for Large Electorates: An Analytic Combinatorics Approach.\" (2025, journal) - https://doi.org/10.48550/ARXIV.2505.06028\n", "\"Counting directed acyclic and elementary digraphs.\" (2020, journal) - https://arxiv.org/abs/2001.08659\n" ] } ], "source": [ "print(elie.diff_sources(\"hal\", \"ldb\"))" ] }, { "cell_type": "markdown", "id": "35", "metadata": {}, "source": [ "Élie's exposure is mostly on DBLP/LDB — and that is fine. HAL deposit\n", "is only mandatory for researchers affiliated with a French academic\n", "institution; the LDB-only entries here are not action items.\n" ] }, { "cell_type": "markdown", "id": "36", "metadata": {}, "source": [ "## When to act\n", "\n", "Across the four examples, only a handful of items called for action:\n", "\n", "- **Missing HAL deposits** (François: beamforming and Condorcet papers).\n", "- **Genuine HAL duplicates** entered by two co-authors (Fabien: *Deciding\n", " and verifying network properties…*).\n", "- **Wrong DBLP attribution to a homonym** (François: *Horse Locomotion*).\n", "- **Identifier strategy** (Élie: switch from `pid` to fullname).\n", "\n", "Everything else — lifecycle duplicates, French-only papers, reports\n", "absorbed into journals — is normal database life that GisMap already\n", "handles when building a map.\n" ] } ], "metadata": { "keep_output": true, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 5 }