History
X.X.X (TODO-List)
Rethink distortion with respect to both vector normalization and the IDTF/query trade-off.
Accelerate similarity computation (currently sklearn-based) in clustering.
0.4.X (2023-0X-XX) (tentative)
Context manager for FileSource, e.g.:
    with FileSource(...) as source:
Python 3.9 compatibility issues rechecked.
Wheels
Minor change in test_dblp.py
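A context manager lets a file-backed source release its file handle automatically at the end of a `with` block. The internals of Gismo's FileSource are not described here, so the sketch below uses a hypothetical `ToyFileSource` (one JSON entry per line, read lazily by byte offset) purely to illustrate the `__enter__`/`__exit__` protocol; it is an assumption-based illustration, not Gismo's implementation.

```python
import json
import os
import tempfile


class ToyFileSource:
    """Hypothetical file-backed source (NOT Gismo's FileSource).

    Each line of the file holds one JSON-encoded entry; entries are
    read on demand instead of being loaded into memory at once.
    """

    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename, "rb")
        # Record the byte offset where each line starts.
        self.offsets = []
        offset = 0
        for line in self.file:
            self.offsets.append(offset)
            offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        # Jump to the start of entry i and decode that single line.
        self.file.seek(self.offsets[i])
        return json.loads(self.file.readline().decode("utf-8"))

    def close(self):
        self.file.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the file handle, even if the block raised.
        self.close()
        return False  # do not swallow exceptions


# Usage: write a tiny JSON-lines file, then read it inside a with block.
path = os.path.join(tempfile.mkdtemp(), "toy.jsonl")
with open(path, "w", encoding="utf-8") as out:
    for entry in [{"title": "Gizmo"}, {"title": "Gremlin"}]:
        out.write(json.dumps(entry) + "\n")

with ToyFileSource(path) as source:
    print(len(source), source[1]["title"])  # 2 Gremlin
print(source.file.closed)  # True
```

Without the context manager, the caller would have to remember to call `close()` explicitly, including on error paths.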
0.4.3 (2022-12-26)
Refresh dependencies, compatibilities, and such.
Gismo is tested up to Python 3.10.
Patch for sklearn API changes (ngram_range must be a tuple; get_feature_names has been renamed get_feature_names_out).
Updated MixInIO logic: you now save with the dump method and load with the load class method.
Package management now uses GitHub Actions.
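The new MixInIO convention pairs an instance method for saving with a class method for loading. The sketch below is a hypothetical, simplified `ToyMixInIO` built on `pickle` to show that shape; the `overwrite` flag and the type check are illustrative assumptions, and Gismo's actual MixInIO may differ (for instance in compression or path handling).

```python
import os
import pickle
import tempfile


class ToyMixInIO:
    """Hypothetical mixin: save via obj.dump(...), load via Class.load(...)."""

    def dump(self, filename, overwrite=False):
        # Refuse to clobber an existing file unless explicitly asked to.
        if os.path.exists(filename) and not overwrite:
            raise FileExistsError(filename)
        with open(filename, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def load(cls, filename):
        with open(filename, "rb") as f:
            obj = pickle.load(f)
        # Make sure the file really holds an instance of the calling class.
        if not isinstance(obj, cls):
            raise TypeError(f"{filename} does not contain a {cls.__name__}")
        return obj


class WordCounts(ToyMixInIO):
    """Hypothetical class that gains dump/load by inheriting the mixin."""

    def __init__(self, counts):
        self.counts = counts


path = os.path.join(tempfile.mkdtemp(), "counts.pkl")
original = WordCounts({"gismo": 3})
original.dump(path)
restored = WordCounts.load(path)
print(restored.counts)  # {'gismo': 3}
```

Making `load` a class method means callers do not need an existing instance to restore one, which is the main ergonomic point of this convention.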
0.4.2 (2021-05-05)
Minor patch
The signature of the Gismo rank method has changed so that you can pass a query vector directly instead of a string query (useful if you want to craft a custom query vector).
Original source of the Reuters 50/50 dataset was discontinued; changed to an alternate source.
Fix for a change in the spaCy API.
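Accepting either a string or a ready-made vector amounts to a small type dispatch at the start of ranking. The sketch below illustrates that idea only; `VOCABULARY`, `query_to_vector`, and `resolve_query` are hypothetical names, and the normalized bag-of-words embedding is a toy stand-in, not Gismo's real embedding or rank code.

```python
from typing import List, Union

# Hypothetical toy vocabulary; a real embedding would come from a vectorizer.
VOCABULARY = {"gizmo": 0, "gremlin": 1, "mogwai": 2}


def query_to_vector(query: str) -> List[float]:
    """Turn a string into an L1-normalized bag-of-words vector (toy version)."""
    vector = [0.0] * len(VOCABULARY)
    for word in query.lower().split():
        if word in VOCABULARY:
            vector[VOCABULARY[word]] += 1.0
    total = sum(vector)
    return [x / total for x in vector] if total else vector


def resolve_query(query: Union[str, List[float]]) -> List[float]:
    """Return the vector a ranking step would start from.

    Strings are embedded; anything else is taken as a crafted query vector.
    """
    if isinstance(query, str):
        return query_to_vector(query)
    return list(query)


print(resolve_query("Gizmo gremlin"))   # [0.5, 0.5, 0.0]
print(resolve_query([0.0, 0.0, 1.0]))   # [0.0, 0.0, 1.0]
```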
0.4.1 (2020-11-25)
Minor update.
DBLP API modified so you can specify the set of fields you want to retrieve.
Minor update in doctests.
Python 3.9 compatibility added.
0.4.0 (2020-07-21)
0.4 is a big update. Lots of things were added, lots of things changed.
- New API for Gismo runtime parameters (see the new parameters module for details). Short version:
    gismo = Gismo(corpus, embedding, alpha=0.85)
        creates a gismo with the damping factor set to 0.85 instead of the default value.
    gismo.parameters.alpha = 0.85
        sets the damping factor of the gismo to 0.85.
    gismo.rank(query, alpha=0.85)
        makes a query with the damping factor temporarily set to 0.85.
- Landmarks! Half Corpus, half Gismo, the Landmarks class can simplify many analysis tasks.
Landmarks are (small) corpora where each entry is augmented with the computation of an associated gismo query;
Landmarks can be used to refine the analysis around a part of your data;
They can be used as soft and fast classifiers.
Landmarks’ runtime parameters follow the same approach as Gismo instances (cf. above).
See the dedicated tutorial to learn more!
Documentation summer cleaning.
The query_distortion parameter (reshape subspace for clustering) is renamed distortion and is now a float instead of a bool (i.e. you can apply distortion in a non-binary way).
- Full refactoring of get_*** and post_*** methods and objects.
The good news is that they are now more natural, self-describing, and unified.
The bad news is that there is no backward compatibility with previous Gismo versions. Hopefully this refactoring will last for some time!
Gismo logo added!
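The 0.4 runtime-parameter API describes three levels of control: constructor keywords, the stored `parameters` object, and per-call keywords that apply only to that call. The sketch below reproduces that layering with hypothetical `ToyParameters` and `ToyGismo` classes built on `dataclasses.replace`; only `alpha` is modeled, its default of 0.5 is arbitrary, and this is not Gismo's parameters module.

```python
from dataclasses import dataclass, replace


@dataclass
class ToyParameters:
    """Hypothetical runtime parameters; only the damping factor is modeled."""
    alpha: float = 0.5  # arbitrary default chosen for this sketch


class ToyGismo:
    """Hypothetical stand-in showing three levels of parameter control."""

    def __init__(self, **kwargs):
        # Level 1: constructor keywords override the defaults.
        self.parameters = ToyParameters(**kwargs)

    def rank(self, query, **kwargs):
        # Level 3: call keywords override stored values for this call only.
        # replace() builds a new ToyParameters, so self.parameters is untouched,
        # and it raises TypeError on unknown parameter names.
        params = replace(self.parameters, **kwargs)
        return f"ranking {query!r} with alpha={params.alpha}"


gismo = ToyGismo(alpha=0.85)
print(gismo.rank("mogwai"))             # ranking 'mogwai' with alpha=0.85
gismo.parameters.alpha = 0.7            # Level 2: change the stored value
print(gismo.rank("mogwai", alpha=0.9))  # ranking 'mogwai' with alpha=0.9
print(gismo.parameters.alpha)           # 0.7
```

Copying the parameters per call, rather than mutating and restoring them, keeps the temporary override from leaking if ranking raises midway.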
0.3.1 (2020-06-12)
New dataset: Reuters C50
New module: sentencizer
0.3.0 (2020-05-13)
dblp module: url2source function added to directly load a small dblp source in memory instead of using a FileSource approach.
Possibility to disable query distortion in gismo.
XGismo class to cross-analyze embeddings.
Tutorials updated
0.2.5 (2020-05-11)
auto_k feature: if not specified, a reasonable, query-dependent number of results k is estimated.
covering methods added to gismo. It is now possible to use get_covering_* instead of get_ranked_* to maximize coverage and/or eliminate redundancy.
0.2.4 (2020-05-07)
- Tutorials for ACM and DBLP added. After cleaning, there are currently 3 tutorials:
Toy model, to get the hang of Gismo on a tiny example,
ACM, to play with Gismo on a small example,
DBLP, to play with a large dataset.
0.2.3 (2020-05-04)
ACM and DBLP dataset creation added.
0.2.2 (2020-05-04)
Notebook tutorials added (early version)
0.2.1 (2020-05-03)
Actual code
Coverage badge
0.1.0 (2020-04-30)
First release on PyPI.