Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add randomElement #124

Open
Lasering opened this issue Jul 5, 2022 · 9 comments
Open

Add randomElement #124

Lasering opened this issue Jul 5, 2022 · 9 comments

Comments

@Lasering
Copy link

Lasering commented Jul 5, 2022

Hey,
I propose the method randomElement to be added to Iterable. This method returns a random element from the collection.

On Iterable it can be (probably naively) implemented with:

def randomElement: A = {
  val n = util.Random.nextInt(size)
  iterator.drop(n).next
}

On Seq:

def randomElement: A = {
  val n = util.Random.nextInt(size)
  apply(n)
}

A version receiving the Random/seed could also be added:

def randomElement(random: Random): A = ???
def randomElement(seed: Long): A = ???

Or maybe the Random/seed could be an Option (this way there would no overloads):

def randomElement(random: Option[Random] = None): A = ???

I'm certainly not the right person to know which approach is better.

@Ichoran
Copy link

Ichoran commented Jul 5, 2022

This seems like the kind of thing that should go into a random number library, not a collections library. Seems awfully specialized. What if you want to choose k of n things? What if you want k things with replacement?

This seems to me just one of a variety of random sampling tasks that are useful. (Note that "shuffle" is also in this class, and that is provided in the random number library.)

@Lasering
Copy link
Author

Lasering commented Jul 5, 2022

Given my comments about a version receiving the random/seed I understand your comments. I shouldn't have suggested that. However the simple version is very handy to have, at least these languages think so:

  • Ruby - the method is called sample.
  • Swift - is called randomElement and is available for collections
  • Rust - its called choose

@Lasering
Copy link
Author

Lasering commented Jul 6, 2022

Adding the method to Random would already be an improvement, although the ergonomics wouldn't be as good as adding it to Iterable.

@julian-a-avar-c
Copy link

Adding Python to languages that have this feature in their standard library.

@BalmungSan
Copy link
Contributor

@Ichoran while I agree this should belong to Random rather than the collections framework, especially considering the need for a seed.
While shuffle is general enough to represent this behavior, it feels unnecessarily expensive if you either just want a single element or want to keep getting multiple random elements when repetition is allowed.
Having said that, I guess the signature must either be Option[A] or throw if the collection is empty.

@Ichoran
Copy link

Ichoran commented Mar 31, 2024

@BalmungSan - I didn't mention shuffle because I thought you should implement selection with shuffle! I just mentioned it because it indicates that Random is a natural home for methods like these.

It's fine to add Random.sample. I was just suggesting it not be quite so specific. For instance, r sample xs might pick one element of xs, but r.sample(xs, 5) might pick five, and r.sample(xs, 5, replace = true) might pick five that could repeat.

@BalmungSan
Copy link
Contributor

@Ichoran ah fair and apologies for the misunderstanding.
Yeah, that API feels great, the only missing thing would be the semantics of what happens if the n (the number of requested elements) is bigger than m (the size of the collection). Also, if the 1 case should be an overload with a different return type or just the default for m and get a collection of size 1.

BTW, my point was also that AFAIK this repo was not limited to only the collections framework, but the stdlib in general. And with plans for unfreezing the stdlib for future versions of Scala 3 I guess is great to re-activate the discussions here.

@Ichoran
Copy link

Ichoran commented Apr 1, 2024

Yeah, rather than using a default argument, I'd recommend sample(coll): A but sample(coll, n): Coll[A] would make sense. In the spirit of safe operations by default, if you ask for m > n items, I'd say you would write m = h + k * n where k >= 1 and h < n, and return k shuffles of the original collection plus h randomly selected items. That is, you use everything at random; if you run out, everything is available again and you keep going.

I'll add something like this to my kse3 library, and hopefully will report back if I run into any problems.

@Ichoran
Copy link

Ichoran commented Apr 15, 2024

@BalmungSan - I implemented something for arrays which generalizes fine to IndexedSeq. Then for things that aren't arrays, I implemented Algorithm L which doesn't even need to know how big the target is, but has a somewhat high constant cost (due to exponents/logs being used every step).

It is live in kse3 maths 0.2.13, and I decorated the collections themselves with sample (in addition to putting it in the random number generator I wrote). No major hitches save for the usual ones when you want extension methods on collections (extension methods don't always understand the inheritance hierarchy).

Here's a runnable example (using the variants that take a given random number source): https://scastie.scala-lang.org/dS0La51UQxm9Gu3nRzWjAA

Anyway, no significant problems!

I ended up using rng.sample(n)(coll) for argument order for the method on the random number generator, but coll.sample(n)(rng) for the order on collection extension method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants