Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary search for IList<T> and IReadOnlyList<T> #15804

Open
hickford opened this issue Nov 27, 2015 · 23 comments
Open

Binary search for IList<T> and IReadOnlyList<T> #15804

hickford opened this issue Nov 27, 2015 · 23 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@hickford
Copy link
Contributor

arguably the most important algorithm of all: binary search

We have List.BinarySearch and Array.Search for searching sorted lists and arrays. But what about other collections that implement IList<T> or IReadOnlyList<T>? It would be useful to have similar functions (extension methods) to search them. Binary search is a fundamental algorithm, so it's helpful to have a fast reliable well-documented implementation in the standard library. Otherwise developers waste time reinventing the wheel.

A workaround is calling .ToList() or .ToArray() and then searching the new object, but that's not ideal because it copies the collection.

@benaadams
Copy link
Member

Suggestion, inherit to a new interface and implement that.

ISortedList<T> : IList<T>
{
    int BinarySearch(T item)
    ...
}

@svick
Copy link
Contributor

svick commented Nov 28, 2015

@benaadams What's the point? Shouldn't binary search be the same for each implementation of IList<T>?

@benaadams
Copy link
Member

Not if someone has implemented the IList<T> interface on their class, but not the BinarySearch function then you've just broken their code.

Adding a new interface that derives from IList<T> and adding it to, for example List<T> and Array means its a compatible addition that doesn't break any upstream code.

@svick
Copy link
Contributor

svick commented Nov 29, 2015

@benaadams I was assuming it would be an extension method. That way, it's easy to use and you don't need to reimplement it for every collection.

@benaadams
Copy link
Member

@svick ah, yeah, that could work :D

@benaadams
Copy link
Member

Probably would want a matching Sort function to get it in right state?

@CodesInChaos
Copy link

Perhaps it could be useful to add a collection independent helper methods:

Perhaps

int BinarySearch(int start, int length, Func<int, int> compare)

where compare maps an index to a comparison result.

Or perhaps

int BinarySearch<T>(Func<int, T> getElementAt, int index, int length, T value, IComparison<T> comparison)

@gsfreema
Copy link
Contributor

gsfreema commented Dec 2, 2015

I do like the idea of having extension methods for sorting and doing a binary search through the IList interface. All you really need for this generic algorithm is random access to the elements, which is what you get through the IList interface. One interesting challenge I see from just briefly looking at the code is the need for duplicate code.

Ideally, you would only need a single implementation of an algorithm like binary search for collections that provide random access. The problem is that System.Array and System.Collections.Generic.List live in mscorlib so their binary search implementation would also need to exist within that assembly since I don't believe mscorlib can reference another assembly. With that in mind, in order to provide extension methods that use the same binary search implementation as Array and List, those extension methods would need to live in mscorlib or some other generic method would need to be publicly exposed in mscorlib. This would be very different from the extension methods for IEnumerable that live in the System.Linq assembly.

If someone has a good idea so that Array and List can share the same generic implementation as an extension method for IList, I'd be interested in hearing it. Otherwise, I wonder if people would just want to bite the bullet and duplicate the code in some other assembly.

@ianhays
Copy link
Contributor

ianhays commented Dec 17, 2015

If someone has a good idea so that Array and List can share the same generic implementation as an extension method for IList, I'd be interested in hearing it. Otherwise, I wonder if people would just want to bite the bullet and duplicate the code in some other assembly.

List< T > uses the Array.BinarySearch implementation on it's internal array. There isn't a way to have the IList extension method do the same since we don't have access to any underlying array, just the item getters. Therefore, it would need to be distinct (i.e. duplicated) from the implementation in Array.BinarySearch regardless of where it lives.

Otherwise, I wonder if people would just want to bite the bullet and duplicate the code in some other assembly.

I don't think it should go in mscorlib, so the next logical place would be in the System.Collections partial facade in CoreFX.

Probably would want a matching Sort function to get it in right state?

Imo this is an overdue addition. There are a ton of people who want to perform sorting on an IList but their options are limited to:

  • duplicate the structure e.g. by using Linq's OrderBy.
  • write their own extension method to IList

@ankitbko
Copy link
Member

I did the implementation of BinarySearch for IList<T> which basically mimics binary search in Array.cs. Points to note here are

  1. TrySZBinarySearch is not available so we are missing the faster C++ search. Any thoughts on this?
  2. I assume we would also want same capability on IList. IList<T> does not implement IList and I was not able to get a good way to convert between them. The only option I could get is to create a wrapper class to convert between them. For now I used Func as in below snippet. Will the Func work or should we go with some kind of wrapper? I hope using Func does not have any performance impact.
  3. Also where would this go? I was planning to keep this in IListExtensions.cs in System.Collections in corefx.
private static int BinarySearch(int index, int count, Func<int, int> compare)
{
   ...
   while (low <= hi)
   {
    int i = GetMedian(low, hi);
    int c;
    try
    {
        c = compare(i);
   ...  //Full Binary Search Logic
}

public static int BinarySearch<T>(this IList<T> list, int index, int count, T item, IComparer<T> comparer)
{
  ... //Validation
  return BinarySearch(index, count, i => comparer.Compare(list[i], item));
}

public static int BinarySearch(this IList list, int index, int count, object item, IComparer comparer)
{
  ... //Validation
  return BinarySearch(index, count, i => comparer.Compare(list[i], item));
}

To maintain consistency I was planning to mimic/reuse same test cases which are used to test List BinarySearch. I see there are two versions List_List_BinarySearchOv1 and List_List_BinarySearchOv2. What are the differences between them?

@ianhays
Copy link
Contributor

ianhays commented Jan 20, 2016

  1. TrySZBinarySearch is not available so we are missing the faster C++ search. Any thoughts on this?

This is unavoidable. TrySZBinarySearch requires an array to access but from the IList interface all we have is getters - we do not have any underlying array to pass a reference to.

  1. I assume we would also want same capability on IList. IList does not implement IList and I was not able to get a good way to convert between them. The only option I could get is to create a wrapper class to convert between them.

Why do you want to convert between them? Since there's no guarantee that a non-generic IList will contain all elements of type T, the two may not be used interchangeably.

For now I used Func as in below snippet. Will the Func work or should we go with some kind of wrapper? I hope using Func does not have any performance impact.

I'm not too well-versed in how delegates get compiled, but I'd imagine there is some overhead since they're subject to change during execution and thus cannot be inlined. BinarySearch only takes like 10 lines of code so imo a little bit of duplication is preferable to adding unnecessary overhead/complication via the use of delegates to wrap the comparers.

Also where would this go? I was planning to keep this in IListExtensions.cs in System.Collections in corefx.

Seems reasonable.

  • IList{T}: corefx\src\System.Collections\src\System\Collections\Generic\IListExtensions.cs
  • IList: corefx\src\System.Collections.NonGeneric\src\System\Collections\IListExtensions.cs

To maintain consistency I was planning to mimic/reuse same test cases which are used to test List BinarySearch. I see there are two versions List_List_BinarySearchOv1 and List_List_BinarySearchOv2. What are the differences between them?

Those tests are ported to xunit from an old framework that was last used years ago. I do not know the difference, nor would I suggest you spend time figuring it out and replicating them as they will be removed in #14117.

@ankitbko
Copy link
Member

I did the implementation for binary search however I am unable to write test cases. The extension method just doesn't show up in System.Collections.Tests project. Digging further I found that the definition of IList<T> in System.Collections is at mscorlib

#region Assembly mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e
// D:\net\corefx-master\packages\Microsoft.TargetingPack.Private.CoreCLR\1.0.0-rc2-23520\ref\dnxcore50\mscorlib.dll
#endregion

and in System.Collections.Tests it is at System.Runtime

#region Assembly System.Runtime, Version=4.0.20.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
// D:\net\corefx-master\packages\System.Runtime\4.0.20\ref\dotnet\System.Runtime.dll
#endregion

Since both the definitions are not same, the implemented extension methods are not available in test projects. What is the workaround?

@ankitbko
Copy link
Member

I have intentionally not added sort logic. It seems there are multiple open issues regarding sort such as #15803 and #14800 . Isn't it better we resolve them first before duplicating the code again?
I really like idea of abstracting the sort functionality out and make the underlying algorithm interchangeable.

@weitzhandler
Copy link
Contributor

Ditto.

@jnm2
Copy link
Contributor

jnm2 commented Jul 25, 2017

I'm partial to what I've been using:

public enum BinarySearchType
{
    /// <summary>
    /// The standard BCL binary search. Returns the index of the first equal item found,
    /// or if there are no equal items, the two's complement of index where it would be inserted.
    /// </summary>
    Fast = 0,
    /// <summary>
    /// Return the index of the first equal item, or if there are no equal items, the index where it would be inserted.
    /// </summary>
    PrependIndex,
    /// <summary>
    /// Return the index following the last equal item, or if there are no equal items, the index where it would be inserted.
    /// </summary>
    AppendIndex
}

public static class BinarySearchExtensions
{
    private interface IInlineComparer<T>
    {
        int Compare(T value);
    }

    private static int BinarySearchInline<T, TCalculator>(this IReadOnlyList<T> sortedList, TCalculator calculator, BinarySearchType searchType)
        where TCalculator : IInlineComparer<T>
    {
        if (calculator == null) throw new ArgumentNullException(nameof(calculator));

        var high = sortedList.Count - 1;
        var low = 0;

        switch (searchType)
        {
            case BinarySearchType.Fast:
                while (low <= high)
                {
                    var mid = (high + low) >> 1;
                    var comparison = calculator.Compare(sortedList[mid]);
                    if (comparison == 0) return mid;
                    if (comparison < 0)
                        low = mid + 1;
                    else
                        high = mid - 1;
                }

                return ~low;

            case BinarySearchType.AppendIndex:
                while (low <= high)
                {
                    var mid = (high + low) >> 1;
                    var comparison = calculator.Compare(sortedList[mid]);
                    if (comparison <= 0)
                        low = mid + 1;
                    else
                        high = mid - 1;
                }

                return low;

            case BinarySearchType.PrependIndex:
                while (low <= high)
                {
                    var mid = (high + low) >> 1;
                    var comparison = calculator.Compare(sortedList[mid]);
                    if (comparison < 0)
                        low = mid + 1;
                    else
                        high = mid - 1;
                }

                return low;

            default:
                throw new InvalidEnumValueException(searchType);
        }
    }

And then I've got:

    public static int BinarySearch<T, TComparable>(this IReadOnlyList<T> sortedList, TComparable comparableValue, BinarySearchType searchType = BinarySearchType.Fast)
        where T : IComparable<TComparable>
    { /* ... */ }

    public static int BinarySearch<T, TOrder>(this IReadOnlyList<T> sortedList, Func<T, TOrder> orderSelector, TOrder value, IComparer<TOrder> comparer = null, BinarySearchType searchType = BinarySearchType.Fast)
    { /* ... */ }

    public static int BinarySearch<T>(this IReadOnlyList<T> sortedList, T value, IComparer<T> comparer = null, BinarySearchType searchType = BinarySearchType.Fast)
    { /* ... */ }
}

Entire thing: https://gist.github.com/jnm2/9bbdc74c5f2ec835a2af2a88ca8c100d

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@eiriktsarpalis
Copy link
Member

Triage: I would assume that efficient binary search on an IList<T> instance implies that the indexing operation is O(1), however we have classes such as ImmutableSortedSet<T> that do implement the interface, however indexing in that case is O(log n). In that sense I would personally not be in favour of implementing this as an extension method.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label May 6, 2020
@gsfreema
Copy link
Contributor

gsfreema commented May 6, 2020

With the new C#8 feature of providing default implementations of methods on interfaces, would it be worth considering adding a "efficient" default implementation on IList<T> which would allow classes implementing this interface to have the option to override the default implementation if it can be done more efficiently?

@eiriktsarpalis eiriktsarpalis modified the milestones: 5.0.0, Future Jun 24, 2020
@GrabYourPitchforks
Copy link
Member

@eiriktsarpalis This means that a binary search would be O(log n * log n), which doesn't seem like the worst thing in the world? Even so, I think most people would reasonably assume that list indexing is an O(1) operation, which means that ImmutableSortedSet<T> is the odd man out here. I don't think we'd want to punish all consumers (the majority of which are using T[] or List<T>) just because a small fraction of consumers might be using ImmutableSortedSet<T>.

@eiriktsarpalis
Copy link
Member

To me running binary search over a list implies that we care about performance (otherwise one might as well use a dedicated data structure like SortedList). Having an extension method over IList<T> seems to somewhat negate that motivation, personally I would just use Array.BinarySearch. Exposing it as a DIM might be better but it's unlikely we'd ever add DIMs to existing interfaces due to the potential for breaking derived types in user apps.

@theodorzoulias
Copy link
Contributor

theodorzoulias commented Apr 25, 2022

It seems that there is some demand for this API. The following StackOverflow question has 45 upvotes:
How to perform a binary search on IList<T>?

@theodorzoulias
Copy link
Contributor

theodorzoulias commented Apr 26, 2022

API proposal, targeting the IReadOnlyList<T> interface:

namespace System.Collections.Generic
{
    public static class CollectionExtensions
    {
        public static int BinarySearch<T>(this IReadOnlyList<T> list, T item);
        public static int BinarySearch<T>(this IReadOnlyList<T> list, T item, IComparer<T>? comparer);
        public static int BinarySearch<T>(this IReadOnlyList<T> list, int index, int count, T item, IComparer<T>? comparer);
    }
}

The overloads and the arguments are the same with the List<T>.BinarySearch method, as well as the Array.BinarySearch<T> method.

Adding the same extension method to the IList<T> interface results in CS0121 compiler errors for collections that implement both interfaces.

@theodorzoulias
Copy link
Contributor

Actually I am not sure if the IList<T> or the IReadOnlyList<T> is the correct interface to target. My incentive for this proposal was to use it for searching in the Keys of the SortedList<TKey,TValue> collection, which has a type of IList<T>. Currently this collection lacks the binary search functionality (#24022), and adding it to the IList<T> interface would fill this gap.

SortedList<string, string> list = new(StringComparer.OrdinalIgnoreCase);
int index = list.Keys.BinarySearch(key, list.Comparer);

@Ilchert
Copy link

Ilchert commented Mar 20, 2024

I faced with the same issue - binary search over SortedList.Keys.

In .NET 9 IList<T> will implement IReadOnlyList<T> #31001, so maybe it is time to implement binary search for IReadonlyList<T>?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests