Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FAISS with RAFT enabled Benchmarking to raft-ann-bench #2026

Open
wants to merge 157 commits into
base: branch-24.06
Choose a base branch
from

Conversation

tarang-jain
Copy link
Contributor

@tarang-jain tarang-jain commented Nov 29, 2023

  • RMM pooled resource for RAFT enabled FAISS
  • Update yaml config for faiss_gpu_ivf_flat, faiss_gpu_ivf_pq
  • Small fix FAISS GPU IVFPQ params
  • Update get_faiss.cmake
  • Fix SIGSEGV

Notes: The StandardGpuResources object is part of FAISS' index classes. As a result, there is no way of creating a separate Raft handle for each thread without creating a new instance of the whole index object for a new thread. As such, multi-threaded benchmarking for GPU indices will not work.
For CPU indices, @divyegala had seen some other issues. Though this PR is mainly about raft-enabled FAISS GPU indices.

tarang-jain and others added 30 commits June 30, 2023 17:03
@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tarang for the updates, please find a few smaller comments below.

Regarding multi-threaded benchmarks, I am not sure how far we want to go to fix that:

  • The current PR works in single thread mode.
  • Multi threaded version is serialized on the same GPU stream because GpuResource is shared.
  • A proper solution might need a custom GpuResource object which shares all the resources that we want to share, but allows different stream for each resource. This is specific for our benchmark, so we would need to add such object to the benchmark codebase.
  • Having separate GpuResource objects for each thread could also work, but we need to prevent excessive TempMem allocations.

Our multi-threaded benchmark mode is not the best fit FAISS indeces, and I am fine with either of these options.

@@ -208,7 +234,7 @@ void FaissGpu<T>::build(const T* dataset, size_t nrow, cudaStream_t stream)
nlist_,
index_ivf->cp.min_points_per_centroid);
}
index_ivf->cp.max_points_per_centroid = max_ppc;
index_ivf->cp.max_points_per_centroid = 300;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this back to max_ppc?

useFloat16: [False]
useRaft: [True]
search:
nprobe: [2048]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nprobe: [2048]
nprobe: [50]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments. The StandardGpuResources object is indeed not thread safe according to the docs here.
FAISS multi-threaded docs say that a separate GPU resource must be created for each thread. As such, sharing the same resource among separate threads, as we do in raft-ann-bench would not work. And creating a separate resource for each thread has the memory constraints as you mentioned. I can come to the conclusion that at this time we cannot support multi-threaded benchmarks for FAISS indices. I'll revert my comments referencing the FAISS issues and add a note explaining why we cannot support multi-threaded search.
I have confirmed that single-threaded benchmarks work in both -- latency and throughput mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tarang-jain and @tfeher for investigating this capability. Given this limitation is in FAISS and not RAFT/cuVS, and I suspect this might not be unique to just FAISS, I wonder if it would be worth adding an accessor on the indivudal raft-ann-bench index types to specify whether or not the index type can be used with the throughput mode (specifically with threads>1). This way, instead of a user encountering unexpected behavior, they can at least get an error that states the index type is not supported.

rmm::mr::cuda_memory_resource cuda_mr;
// Construct a resource that uses a coalescing best-fit pool allocator
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{&cuda_mr};
rmm::mr::set_current_device_resource(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes setDefaultStream will need to update RAFT handle's default stream. But you would need to make sure that the gpu_resource_ is not a shared object anymore, because then everyone is trying to update the stream for the same object.

In practice we use the GpuResource object as a wrapper around stream. But GpuResource contains other resources like the Temporary memory allocator, which might grab 1.5 GiB per GpuResource object. That can be circumvented by setting TempMem size to 0.

tarang-jain and others added 3 commits May 14, 2024 11:41
@tarang-jain
Copy link
Contributor Author

/ok to test

@tarang-jain
Copy link
Contributor Author

/ok to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Benchmarks CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants