Add FAISS with RAFT enabled Benchmarking to raft-ann-bench #2026
base: branch-24.06
Conversation
/ok to test
Thanks Tarang for the updates, please find a few smaller comments below.
Regarding multi-threaded benchmarks, I am not sure how far we want to go to fix that:
- The current PR works in single-thread mode.
- The multi-threaded version is serialized on the same GPU stream because the `GpuResource` object is shared.
- A proper solution might need a custom `GpuResource` object which shares all the resources that we want to share, but allows a different stream for each resource. This is specific to our benchmark, so we would need to add such an object to the benchmark codebase.
- Having separate `GpuResource` objects for each thread could also work, but we need to prevent excessive TempMem allocations.

Our multi-threaded benchmark mode is not the best fit for FAISS indices, and I am fine with either of these options.
@@ -208,7 +234,7 @@ void FaissGpu<T>::build(const T* dataset, size_t nrow, cudaStream_t stream)
       nlist_,
       index_ivf->cp.min_points_per_centroid);
   }
-  index_ivf->cp.max_points_per_centroid = max_ppc;
+  index_ivf->cp.max_points_per_centroid = 300;
Could you change this back to `max_ppc`?
python/raft-ann-bench/src/raft-ann-bench/run/conf/algos/faiss_cpu_ivf_flat.yaml
useFloat16: [False]
useRaft: [True]
search:
  nprobe: [2048]
Suggested change:
- nprobe: [2048]
+ nprobe: [50]
Thanks for the comments. The `StandardGpuResources` object is indeed not thread safe according to the docs here.
The FAISS multi-threading docs say that a separate GPU resource must be created for each thread. As such, sharing the same resource among separate threads, as we do in raft-ann-bench, would not work. And creating a separate resource for each thread has the memory constraints you mentioned. I have come to the conclusion that at this time we cannot support multi-threaded benchmarks for FAISS indices. I'll revert my comments referencing the FAISS issues and add a note explaining why we cannot support multi-threaded search.
I have confirmed that single-threaded benchmarks work in both latency and throughput mode.
Thanks @tarang-jain and @tfeher for investigating this capability. Given that this limitation is in FAISS and not RAFT/cuVS, and I suspect it might not be unique to FAISS, I wonder if it would be worth adding an accessor on the individual raft-ann-bench index types to specify whether or not the index type can be used in throughput mode (specifically with threads > 1). This way, instead of encountering unexpected behavior, a user would at least get an error stating that the index type is not supported.
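A minimal sketch of what such an accessor could look like. All names here (`AnnBase`, `FaissGpuIvfFlat`, `supports_concurrent_search`, `check_mode`) are hypothetical illustrations, not the actual raft-ann-bench interface:

```cpp
#include <cassert>
#include <string>

// Hypothetical benchmark index interface with a capability accessor.
struct AnnBase {
  virtual ~AnnBase() = default;
  // Default: the index supports multi-threaded throughput mode.
  virtual bool supports_concurrent_search() const { return true; }
};

// FAISS GPU indices share one StandardGpuResources object, so
// concurrent search from multiple threads is unsafe; opt out here.
struct FaissGpuIvfFlat : AnnBase {
  bool supports_concurrent_search() const override { return false; }
};

// The harness can then fail fast instead of silently serializing.
std::string check_mode(const AnnBase& index, int threads) {
  if (threads > 1 && !index.supports_concurrent_search()) {
    return "error: index type does not support throughput mode with threads>1";
  }
  return "ok";
}
```

With this in place, launching the throughput benchmark with `threads > 1` on a FAISS GPU index would produce a clear error up front rather than unexpectedly serialized results.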
rmm::mr::cuda_memory_resource cuda_mr;
// Construct a resource that uses a coalescing best-fit pool allocator
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{&cuda_mr};
rmm::mr::set_current_device_resource(&pool_mr);
Yes, `setDefaultStream` will need to update the RAFT handle's default stream. But you would need to make sure that `gpu_resource_` is not a shared object anymore, because otherwise every thread is trying to update the stream of the same object.
In practice we use the `GpuResource` object as a wrapper around a stream. But `GpuResource` contains other resources like the temporary memory allocator, which might grab 1.5 GiB per `GpuResource` object. That can be circumvented by setting the TempMem size to 0.
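A sketch of the per-thread setup described above, using the public FAISS GPU API (`StandardGpuResources::setTempMemory` and `setDefaultStream`). This is an illustrative outline of the idea under discussion, not code from the PR; it assumes FAISS GPU headers and a CUDA device are available, so it is not self-contained:

```cpp
#include <faiss/gpu/StandardGpuResources.h>
#include <cuda_runtime.h>

// Hypothetical helper: give each benchmark thread its own
// StandardGpuResources so setDefaultStream() does not race on a
// shared object, and cap temporary memory at 0 so each resource
// does not grab its own ~1.5 GiB TempMem pool.
void configure_thread_resources(faiss::gpu::StandardGpuResources& res,
                                int device,
                                cudaStream_t thread_stream) {
  res.setTempMemory(0);                       // disable per-resource TempMem
  res.setDefaultStream(device, thread_stream); // per-thread stream
}
```

The trade-off is that with TempMem disabled, FAISS falls back to on-demand allocations for scratch space, so a pooled device allocator (such as the RMM pool shown earlier in this thread) matters more for performance.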
…cpu_ivf_flat.yaml Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>
`faiss_gpu_ivf_flat`, `faiss_gpu_ivf_pq`
`get_faiss.cmake`
Notes: The `StandardGpuResources` object is part of FAISS' index classes. As a result, there is no way of creating a separate RAFT handle for each thread without creating a new instance of the whole index object for each thread. As such, multi-threaded benchmarking for GPU indices will not work. For CPU indices, @divyegala had seen some other issues, though this PR is mainly about RAFT-enabled FAISS GPU indices.
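To illustrate why the resources object cannot be swapped per thread: FAISS GPU indices take their resources provider at construction time. The snippet below is a hedged sketch assuming FAISS GPU headers and a CUDA device (parameter values are arbitrary examples), not code from this PR:

```cpp
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/gpu/GpuIndexIVFFlat.h>

int main() {
  faiss::gpu::StandardGpuResources res;

  // The resources object (and its stream) is bound to the index at
  // construction; example parameters: 128-dim vectors, 1024 IVF lists.
  faiss::gpu::GpuIndexIVFFlat index(&res, /*dims=*/128, /*nlist=*/1024,
                                    faiss::METRIC_L2);

  // There is no public API to replace `res` with a different (e.g.
  // thread-local) resources object after this point, which is why a
  // per-thread RAFT handle would require constructing a new index.
  return 0;
}
```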