History

0.4.1 (2025-06-25): Maintenance

  • Dependencies updated.

  • Switched project management to uv and documentation to MyST Markdown.

  • Ran Ruff over the code base.

0.4 (2021-04-08): Back to Numba

  • Cython proved too difficult to maintain, and Numba's dict handling has become reasonably good since the last attempt. Time to switch back!

0.3.5 (2021-04-08): ARM64

  • Attempt to update PyPI with Mac M1-compatible wheels.

0.3.4 (2021-01-05): Cleaning

  • Renamed process.py to fuzz.py to emphasize that the module aims to be an alternative to the fuzzywuzzy package.

  • Removed the FactorTree and JC modules. Their functionality is now essentially covered by the feature_extraction and fuzz modules.

  • General cleaning / rewriting of the documentation.

0.3.3 (2021-01-01): Cython/Numba balanced

  • All core CountVectorizer methods ported to Cython. Roughly 2.5x faster than the sklearn counterpart (mainly because some features, such as min_df/max_df, are not implemented).

  • The Numba methods of the process module were NOT converted to Cython, as Numba seems to be 20% faster for CSR manipulation.

  • Numba functions are cached to avoid compilation lag.

0.3.2 (2020-12-30): Going Cython

  • First attempt to use Cython

  • Right now only the fit_transform method of CountVectorizer has been cythonized, for testing wheels.

  • If all goes well, Numba will probably be abandoned and all the heavy lifting will be done in Cython.

0.3.1 (2020-12-28): Simplification of core algorithm

  • Attributes of the CountVectorizer have been reduced to the minimum: one dict!

  • Now faster than the sklearn counterpart! (The reason being that only one case is considered here, so a lot of checks and attributes can be ditched.)

0.3.0 (2020-12-15): CountVectorizer and Process

  • The core is now the CountVectorizer class. Lighter and faster. Only features are kept inside.

  • New process module inspired by fuzzywuzzy!

0.2.0 (2020-12-15): Fit/Transform

  • Full refactoring to make the package fit/transform compliant.

  • Added a fit_sampling method that allows fitting only a (random) subset of factors.

0.1.1 (2020-12-12): Upgrades

  • Docstrings added

  • Common module (feat. save/load capabilities)

  • Joint Complexity module

0.1.0 (2020-12-12): First release

  • First release on PyPI.

  • Core FactorTree class added.