Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve router performance #2651

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

rorp
Copy link
Contributor

@rorp rorp commented May 8, 2023

This PR parallelizes route calculations. It creates a bunch of worker actors that perform route calculation in parallel.
As a side effect it prevents the Router's mailbox from clogging with route requests, so it can process network updates on time.

See #2644 for the context.

@rorp rorp marked this pull request as draft May 8, 2023 16:46
@codecov-commenter
Copy link

codecov-commenter commented May 9, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (c37df26) 85.85% compared to head (86ffd1a) 85.92%.
Report is 1 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2651      +/-   ##
==========================================
+ Coverage   85.85%   85.92%   +0.07%     
==========================================
  Files         216      216              
  Lines       18201    18221      +20     
  Branches      749      789      +40     
==========================================
+ Hits        15626    15656      +30     
+ Misses       2575     2565      -10     
Files Coverage Δ
...re/src/main/scala/fr/acinq/eclair/NodeParams.scala 93.41% <100.00%> (+0.02%) ⬆️
...src/main/scala/fr/acinq/eclair/crypto/Sphinx.scala 100.00% <100.00%> (ø)
.../scala/fr/acinq/eclair/payment/PaymentPacket.scala 90.90% <100.00%> (+0.08%) ⬆️
...scala/fr/acinq/eclair/payment/send/Recipient.scala 98.52% <100.00%> (ø)
...cinq/eclair/remote/EclairInternalsSerializer.scala 97.82% <100.00%> (ø)
...a/fr/acinq/eclair/wire/protocol/OnionRouting.scala 80.00% <100.00%> (+13.33%) ⬆️
...a/fr/acinq/eclair/wire/protocol/PaymentOnion.scala 99.27% <100.00%> (-0.02%) ⬇️
...src/main/scala/fr/acinq/eclair/router/Router.scala 94.42% <95.34%> (+0.02%) ⬆️
...cala/fr/acinq/eclair/router/RouteCalculation.scala 94.50% <70.00%> (-0.06%) ⬇️

... and 8 files with indirect coverage changes

@rorp rorp marked this pull request as ready for review June 27, 2023 18:12
Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a useful change, but I dislike the volatile wrapper. We should also probably motivate this change with benchmarks, like any non-trivial change made solely for performance gains.

@@ -71,7 +71,7 @@ class Router(val nodeParams: NodeParams, watcher: typed.ActorRef[ZmqWatcher.Comm

val db: NetworkDb = nodeParams.db.network

{
private val dataHolder: VolatileRouterDataHolder = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overkill and will very likely create hard-to-debug performance issues. We only need two things to be accessible from the worker actors:

  • the excludedChannels: this should be a small list, it's probably fine to copy it
  • the graphWithBalances: this is a large data structure that we definitely don't want to copy, but we can probably make it thread-safe by simply using standard thread-safe collections? It would be useful to benchmark the cost of those changes

@rorp
Copy link
Contributor Author

rorp commented Jan 21, 2024

I think this is a useful change, but I dislike the volatile wrapper.

That’s a simple way to avoid Memory Consistency Errors in JVM. And it does the job, since all updates are made in one thread.

IMHO the real overkill is that Router unnecessarily implements FSM, whereas it’s not a finite state machine at all, there’s only one possible state NORMAL, so there are no state transitions, naturally.

My goal was to make the least invasive change, so I introduced stayUsing() method, which is in fact dangerous, because someone could unknowingly use stay() using ??? instead, and screw up the workers state.

I’d prefer to drop FSM support for Router altogether, and to make it a plain vanilla actor, drop any kind of stay() method calls, and stick to persistent data structures. In this case I would do something like this:

class Router(val nodeParams: NodeParams, watcher: typed.ActorRef[ZmqWatcher.Command], initialized: Option[Promise[Done]] = None) extends Actor {

  private val data = VolatileRouterData()
 
  def receive =case PeerRoutingMessage(peerConnection, remoteNodeId, c: ChannelAnnouncement) =>
      data.update(
 = Validation.handleChannelAnnouncement(data, watcher, RemoteGossip(peerConnection, remoteNodeId), c))

    …

    case r: RouteRequest =>
      worker ! Worker.FindRoutes(data, r, sender())

    …
  }
}

It’s not really Haskell style, but easy to understand, easy to maintain, and there’s absolutely no Java concurrent programming fun.

We should also probably motivate this change with benchmarks, like any non-trivial change made solely for performance gains.

You can find some benchmarks here #2644

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants