Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement: Add atomic update on TrieMap #5938

Merged
merged 1 commit into from
Dec 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
41 changes: 41 additions & 0 deletions mtags/src/main/scala/scala/meta/internal/mtags/AtomicTrieMap.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
package scala.meta.internal.mtags

import java.util.concurrent.ConcurrentHashMap

import scala.collection.concurrent.TrieMap

/**
* This class is a wrapper around TrieMap that provides atomic updateWith
*/
final class AtomicTrieMap[K, V] {
private val trieMap = new TrieMap[K, V]()
private val concurrentMap = new ConcurrentHashMap[K, V]

def get(key: K): Option[V] = trieMap.get(key)

def contains(key: K): Boolean = trieMap.contains(key)

def updateWith(key: K)(remappingFunc: Option[V] => Option[V]): Unit = {
val computeFunction = new java.util.function.BiFunction[K, V, V] {
override def apply(k: K, v: V): V = {
trieMap.get(key) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but maybe if inside this function we referenced k instead of key (so input of computeFunction rather than the updateWith method) we could then extract it into a constant defined on the AtomicTrieMap class level so that there's no need to allocate a new one each time an updateWith is called? 🤔

Copy link
Member Author

@jkciesluk jkciesluk Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see its possible, because we need to pass it remappingFunc.
We could change trieMap type to TrieMap[K, Set[V]], since we only use it like this and add a append function, that would look the way you suggested 🤔

Edit: I think it won't work either, since we need to pass the value to append

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're absolutely right, somehow I missed the remappingFunc. 👍

case Some(value) =>
remappingFunc(Some(value)) match {
case Some(newValue) =>
trieMap.update(key, newValue)
case None =>
trieMap.remove(key)
}
case None =>
remappingFunc(None).foreach(trieMap.update(key, _))
}
null.asInstanceOf[V]
}
}
concurrentMap.compute(key, computeFunction)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this duplicate the amount of data we hold? Should we just use ConcurrentMap instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Size of concurrentMap is always 0, its only used to provide atomic updateWith for trieMap.
I don't think we should simply replace TrieMap with ConcurrentMap, because get on TrieMap should be faster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Size of concurrentMap is always 0, its only used to provide atomic updateWith for trieMap.

Can you ELI5 this? :D

get on TrieMap should be faster

Can you share some source about that, benchmark maybe? I did some reading, found https://medium.com/@igabaydulin/java-concurrenthashmap-vs-scala-concurrent-map-e185e8a0b798 which then led me to scala/scala#10027 and I need to read everything once again because I'm already lost. Best part - those are not even about TrieMap 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Size of concurrentMap is always 0, its only used to provide atomic updateWith for trieMap.

Can you ELI5 this? :D

Sure :D
We need an atomic way to update a TrieMap with a remapping function, but its not provided in its API.
ConcurrentHashMap has the ability to do that, with compute function.
We don't to store any data in ConcurrentHashMap and only use it as a sort of lock on the updated value.
So our compute function atomically calculates new value to be stored, but updates it only in TrieMap - it inserts null to ConcurrentHashMap, which means no mapping.

Can you share some source about that, benchmark maybe? I did some reading, found https://medium.com/@igabaydulin/java-concurrenthashmap-vs-scala-concurrent-map-e185e8a0b798 which then led me to scala/scala#10027 and I need to read everything once again because I'm already lost. Best part - those are not even about TrieMap 😅

To be honest I'm not sure anymore 😅 . I thought using a trie would result in a better performance, but from what I've read it could be similar or slightly worse than ConcurrentHashMap (TrieMap biggest advantage are snapshots, but we are not using them).

}
}

object AtomicTrieMap {
def empty[K, V]: AtomicTrieMap[K, V] = new AtomicTrieMap[K, V]
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ import java.nio.CharBuffer
import java.util.logging.Level
import java.util.logging.Logger

import scala.collection.concurrent.TrieMap
import scala.util.Properties
import scala.util.control.NonFatal

Expand Down Expand Up @@ -35,8 +34,8 @@ final case class SymbolLocation(
* while definitions contains only symbols generated by ScalaMtags.
*/
class SymbolIndexBucket(
toplevels: TrieMap[String, Set[AbsolutePath]],
definitions: TrieMap[String, Set[SymbolLocation]],
toplevels: AtomicTrieMap[String, Set[AbsolutePath]],
definitions: AtomicTrieMap[String, Set[SymbolLocation]],
sourceJars: OpenClassLoader,
toIndexSource: AbsolutePath => AbsolutePath = identity,
mtags: Mtags,
Expand Down Expand Up @@ -84,8 +83,10 @@ class SymbolIndexBucket(
): Unit = {
if (sourceJars.addEntry(jar.toNIO)) {
symbols.foreach { case (sym, path) =>
val acc = toplevels.getOrElse(sym, Set.empty)
toplevels(sym) = acc + path
toplevels.updateWith(sym) {
case Some(acc) => Some(acc + path)
case None => Some(Set(path))
}
}
}
}
Expand All @@ -96,8 +97,10 @@ class SymbolIndexBucket(
): List[String] = {
val symbols = indexSource(source, dialect, sourceDirectory)
symbols.foreach { symbol =>
val acc = toplevels.getOrElse(symbol, Set.empty)
toplevels(symbol) = acc + source
toplevels.updateWith(symbol) {
case Some(acc) => Some(acc + source)
case None => Some(Set(source))
}
}
symbols
}
Expand Down Expand Up @@ -132,8 +135,10 @@ class SymbolIndexBucket(
toplevel: String
): Unit = {
if (source.isAmmoniteScript || !isTrivialToplevelSymbol(path, toplevel)) {
val acc = toplevels.getOrElse(toplevel, Set.empty)
toplevels(toplevel) = acc + source
toplevels.updateWith(toplevel) {
case Some(acc) => Some(acc + source)
case None => Some(Set(source))
}
}
}

Expand Down Expand Up @@ -220,20 +225,20 @@ class SymbolIndexBucket(
.map(_.map(_.path))
.getOrElse(Set.empty)).filter(_.exists)

toplevels.get(symbol.value) match {
case None => ()
toplevels.updateWith(symbol.value) {
case None => None
case Some(acc) =>
val updated = acc.filter(exists(_))
if (updated.isEmpty) toplevels.remove(symbol.value)
else toplevels(symbol.value) = updated
if (updated.isEmpty) None
else Some(updated)
}

definitions.get(symbol.value) match {
case None => ()
definitions.updateWith(symbol.value) {
case None => None
case Some(acc) =>
val updated = acc.filter(loc => exists(loc.path))
if (updated.isEmpty) definitions.remove(symbol.value)
else definitions(symbol.value) = updated
if (updated.isEmpty) None
else Some(updated)
}
}

Expand Down Expand Up @@ -272,8 +277,10 @@ class SymbolIndexBucket(
docs.documents.foreach { document =>
document.occurrences.foreach { occ =>
if (occ.symbol.isGlobal && occ.role.isDefinition) {
val acc = definitions.getOrElse(occ.symbol, Set.empty)
definitions.put(occ.symbol, acc + SymbolLocation(file, occ.range))
definitions.updateWith(occ.symbol) {
case Some(acc) => Some(acc + SymbolLocation(file, occ.range))
case None => Some(Set(SymbolLocation(file, occ.range)))
}
} else {
// do nothing, we only care about global symbol definitions.
}
Expand Down Expand Up @@ -339,8 +346,8 @@ object SymbolIndexBucket {
toIndexSource: AbsolutePath => AbsolutePath
): SymbolIndexBucket =
new SymbolIndexBucket(
TrieMap.empty,
TrieMap.empty,
AtomicTrieMap.empty,
AtomicTrieMap.empty,
new OpenClassLoader,
toIndexSource,
mtags,
Expand Down