D-Iteration

This module transforms queries into relevance vectors that can be used to rank and organize documents and features.

class gismo.diteration.DIteration(n, m)

This class is in charge of performing the DIteration algorithm.

Parameters:
  • n (int) – Number of documents.

  • m (int) – Number of features.

x_relevance

Relevance of documents.

Type:

ndarray

y_relevance

Relevance of features.

Type:

ndarray

x_order

Indices of documents sorted by relevance.

Type:

ndarray

y_order

Indices of features sorted by relevance.

Type:

ndarray
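
As a rough illustration of how the order attributes relate to the relevance attributes, the sketch below ranks a made-up relevance vector with NumPy. It does not call gismo, and it assumes that the order lists the most relevant item first; the values and the name top_documents are hypothetical.

```python
import numpy as np

# Hypothetical relevance scores for four documents (made-up values).
x_relevance = np.array([0.10, 0.45, 0.05, 0.40])

# Assumption: most relevant first, so sort on the negated scores.
x_order = np.argsort(-x_relevance)

# Indices of the three best-ranked documents under that assumption.
top_documents = x_order[:3]
```

The same pattern would apply to y_relevance and y_order on the feature side.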

gismo.diteration.jit_diffusion(x_pointers, x_indices, x_data, y_pointers, y_indices, y_data, z_indices, z_data, x_relevance, y_relevance, alpha, n_iter, offset: float, x_fluid, y_fluid)

Core diffusion engine written to be compatible with Numba. This is where the DIteration algorithm is applied inline.

Parameters:
  • x_pointers (ndarray) – Pointers of the csr_matrix embedding of documents.

  • x_indices (ndarray) – Indices of the csr_matrix embedding of documents.

  • x_data (ndarray) – Data of the csr_matrix embedding of documents.

  • y_pointers (ndarray) – Pointers of the csr_matrix embedding of features.

  • y_indices (ndarray) – Indices of the csr_matrix embedding of features.

  • y_data (ndarray) – Data of the csr_matrix embedding of features.

  • z_indices (ndarray) – Indices of the csr_matrix embedding of the query projection.

  • z_data (ndarray) – Data of the csr_matrix embedding of the query projection.

  • x_relevance (ndarray) – Placeholder for relevance of documents.

  • y_relevance (ndarray) – Placeholder for relevance of features.

  • alpha (float in range [0.0, 1.0]) – Damping factor. Controls the trade-off between closeness and centrality.

  • n_iter (int) – Number of round-trip diffusions to perform. A higher value gives better precision at the cost of a longer execution time.

  • offset (float in range [0.0, 1.0]) – Controls how much of the initial fluid should be deducted from the relevance.

  • x_fluid (ndarray) – Placeholder for fluid on the side of documents.

  • y_fluid (ndarray) – Placeholder for fluid on the side of features.
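
The pointers, indices and data triplets above follow the usual CSR layout, which SciPy's csr_matrix exposes as its indptr, indices and data attributes. The sketch below is only an illustration of that layout: it builds the three arrays by hand for a made-up embedding and reads one row back. The helpers dense_to_csr and row_entries are hypothetical and are not part of gismo.

```python
import numpy as np

def dense_to_csr(dense):
    """Return (pointers, indices, data) describing a 2-D array in CSR form."""
    pointers, indices, data = [0], [], []
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)   # column of the non-zero entry
                data.append(value)    # value of the non-zero entry
        pointers.append(len(indices)) # where the next row starts
    return np.array(pointers), np.array(indices), np.array(data, dtype=float)

def row_entries(pointers, indices, data, i):
    """List the (column, value) pairs stored for row i."""
    start, end = pointers[i], pointers[i + 1]
    return list(zip(indices[start:end].tolist(), data[start:end].tolist()))

# Hypothetical embedding of 3 documents over 4 features (made-up weights).
x_dense = np.array([
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.8],
])
x_pointers, x_indices, x_data = dense_to_csr(x_dense)

# Features attached to document 2, with their weights.
doc2 = row_entries(x_pointers, x_indices, x_data, 2)
```

Passing flat arrays like these, rather than a matrix object, is a common way to keep a kernel compatible with Numba, since a row can be traversed with plain integer slicing.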