
[FEA] IVF-Flat optimize loading cluster data for large batch search #2191

Open
tfeher opened this issue Feb 19, 2024 · 1 comment
Labels
feature request New feature or request



tfeher (Contributor) commented Feb 19, 2024

Is your feature request related to a problem? Please describe.
During IVF-Flat search, each query vector is compared to all the vectors in n_probes clusters, giving n_queries * n_probes query-probe pairs in total. For large batch search, when n_queries * n_probes > n_clusters, by the pigeonhole principle some clusters are compared to more than one query vector.

The execution time of IVF-Flat search is dominated by the time needed to load the cluster data from memory. Currently the query-probe pairs are ordered by query index. To improve cache reuse, we can instead sort the query-probe pairs by probe id (cluster label), so that all pairs touching the same cluster are processed together.

Describe the solution you'd like

Sort the query-probe pairs during the fine search for better cache reuse. This is already implemented for IVF-PQ, and the same approach can be applied to IVF-Flat:

```cpp
auto coresidency = expected_probe_coresidency(index.n_lists(), n_probes, n_queries);
if (coresidency > 1) {
  // Sorting index by cluster number (label).
  // The goal is to increase the L2 cache hit rate to read the vectors
  // of a cluster by processing the cluster at the same time as much as
  // possible.
  index_list_sorted_buf.resize(n_queries_probes, stream);
  auto index_list_buf =
    make_device_mdarray<uint32_t>(handle, mr, make_extents<uint32_t>(n_queries_probes));
  rmm::device_uvector<uint32_t> cluster_labels_out(n_queries_probes, stream, mr);
  auto index_list   = index_list_buf.data_handle();
  index_list_sorted = index_list_sorted_buf.data();
  linalg::map_offset(handle, index_list_buf.view(), identity_op{});

  int begin_bit             = 0;
  int end_bit               = sizeof(uint32_t) * 8;
  size_t cub_workspace_size = 0;
  // The first call with a nullptr workspace only queries the required
  // temporary storage size; the second call performs the sort.
  cub::DeviceRadixSort::SortPairs(nullptr,
                                  cub_workspace_size,
                                  clusters_to_probe,
                                  cluster_labels_out.data(),
                                  index_list,
                                  index_list_sorted,
                                  n_queries_probes,
                                  begin_bit,
                                  end_bit,
                                  stream);
  rmm::device_buffer cub_workspace(cub_workspace_size, stream, mr);
  cub::DeviceRadixSort::SortPairs(cub_workspace.data(),
                                  cub_workspace_size,
                                  clusters_to_probe,
                                  cluster_labels_out.data(),
                                  index_list,
                                  index_list_sorted,
                                  n_queries_probes,
                                  begin_bit,
                                  end_bit,
                                  stream);
}
```

Additional context
In IVF-Flat search we typically probe 0.1-1% of the clusters, therefore this optimization is expected to help when the batch size is correspondingly large (hundreds or thousands of query vectors). We have a helper utility to calculate the expected number of times a cluster is loaded. This can be used to decide whether to sort the input data or not.

```cpp
constexpr inline auto expected_probe_coresidency(uint32_t n_clusters,
```

tfeher added the feature request label on Feb 19, 2024

tfeher (Contributor, Author) commented Feb 19, 2024

This issue should be implemented as a follow-up to #2169, because that PR changes a few details of the IVF-Flat fine search.
