
Poor performance due to single thread and mutexes #1803

Open
dzvancuks opened this issue May 3, 2021 · 8 comments


dzvancuks commented May 3, 2021

After some profiling it seems that these two functions impact performance when pushing many operations:

  1. func (p *dispatcher) PushData blocks on a mutex
  2. func (s *Scheduler) consumeTransactions() waits a long time on the queue and then spends a long time in processTransaction.

The problem is that the VPP agent processes operations in a single thread, blocked by a mutex and a queue. Instead it should utilize cheap goroutines and pooled memory via sync.Pool.

To collect a trace I enabled the pprof tool in the main() function and pushed 30000 Update commands via a gRPC connection.

curl --output trace.prof http://localhost:8080/debug/pprof/trace?seconds=30
go tool trace trace.prof
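
For context, the trace endpoint queried above is the standard one from Go's net/http/pprof package; a minimal sketch of exposing it (the port and wiring are illustrative, not the agent's actual setup):

    // Minimal sketch (not the agent's actual wiring): the /debug/pprof endpoints,
    // including /debug/pprof/trace, come from the standard net/http/pprof package.
    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on the default mux
    )

    func main() {
        go func() {
            log.Println(http.ListenAndServe("localhost:8080", nil))
        }()
        select {} // block forever; the real agent runs its own main loop here
    }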

The trace shows an underutilized CPU and long wait times in the functions mentioned above.



dzvancuks commented May 3, 2021

A few other findings:

  • You can increase performance a bit by buffering the gRPC input, but the gain is limited by the transaction channel:
    s.txnQueue = make(chan *transaction, 100)
  • Memory is allocated for each transaction object. Increasing throughput increases the garbage-collection rate, which halts execution. To avoid this, kvscheduler should reuse memory, e.g. via sync.Pool as mentioned before (see the sketch below).
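
To illustrate the sync.Pool idea with a generic sketch (the transaction type here is a stand-in, not the kvscheduler's real struct):

    // Sketch: recycle transaction objects via sync.Pool to reduce GC pressure.
    package main

    import "sync"

    type transaction struct {
        values []string
    }

    var txnPool = sync.Pool{
        New: func() interface{} { return new(transaction) },
    }

    func acquireTxn() *transaction {
        return txnPool.Get().(*transaction)
    }

    func releaseTxn(t *transaction) {
        t.values = t.values[:0] // reset state before returning to the pool
        txnPool.Put(t)
    }

    func main() {
        t := acquireTxn()
        t.values = append(t.values, "vpp/interface/loop0")
        releaseTxn(t)
    }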

Here is the profiling for the buffered gRPC call. The CPU is still underutilized and GC is triggered very often.


ondrej-fabry (Member) commented

Thank you for your descriptive write-up of the issue. You are on point about the scheduler's inefficient processing of transactions. Personally, I am migrating to a new project that plans to use Ligato more extensively, so I might actually do some optimizations in this area (I had planned to optimize the allocation of strings in the kvscheduler).
However, if you think you could open some PRs, at least for the low-hanging fruit that could improve performance here, you are welcome to do so.


dzvancuks commented Aug 13, 2021

I've started a code analysis to improve performance in a separate fork: https://github.com/1NCE-GmbH/vpp-agent. I'll explain some preliminary investigation and would like to discuss VPP-agent implementation details to figure out how best to optimize its workflow.

First of all, here are the traces and profiler results: vpp-agent-profile.zip
You can open them by executing the following commands:

go tool trace trace.prof
go tool pprof -http=localhost:8080 cpu.prof
go tool pprof -http=localhost:8080 mutex.prof

During the analysis it became clear that the KV Scheduler is the single throttling point. When launching 30000 operations over the gRPC channel, this one entity blocks all other threads while performing consumeTransactions() -> processTransaction() calls.

Multi-threaded execution effectively degrades into single-threaded operation of the VPP agent.

All other gRPC calls are waiting on one single lock. But removing this particular mutex will not solve the issue, as those goroutines would then park on the channel.

So the main target for optimization is the KV Scheduler. I'll try to refactor this module to utilize as much CPU time as possible, but I'd like to consult with the project owners and maintainers to make this process more robust.

My questions to the maintainers of VPP-agent:

  1. Can VPP-agent operate with several Scheduler instances? What are the downsides of this approach? (Graph synchronization? Counters?)
  2. Can the Scheduler operate without txnLock? Would the internal Graph mutex be enough to synchronize data when I call graph.Write()/Read()?
  3. Can the sub-steps of the processTransaction() function be executed in goroutines?
  4. Will any of these approaches violate NB/SB functionality?
  5. Could you recommend what to keep an eye on? Which Scheduler-related module might break on changes? (My guess is that it is only the Graph.)
  6. Can I get rid of the TransactionBarrier in dispatcher.PushData() and completely remove Scheduler.txnLock? txn.Commit already has synchronization capabilities via txnData.nb.resultChan (see the sketch after this list).
  7. Is it OK to set defaultPrintTxnSummary = false? The printing lock also affects performance.
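
To illustrate the reasoning behind question 6 with a generic Go pattern (this is not the dispatcher's actual code, and the names are hypothetical): a per-transaction result channel already gives each caller its own synchronization point, so a global barrier or lock is not needed just for waiting on completion.

    // Generic sketch: each transaction carries its own result channel, so the
    // caller synchronizes on the reply instead of a global lock or barrier.
    package main

    import "fmt"

    type txn struct {
        data     string
        resultCh chan error
    }

    func runScheduler(queue <-chan *txn) {
        for t := range queue {
            // ... process the transaction against the graph ...
            t.resultCh <- nil // signal completion to the caller
        }
    }

    func main() {
        queue := make(chan *txn, 100)
        go runScheduler(queue)

        t := &txn{data: "update", resultCh: make(chan error, 1)}
        queue <- t
        fmt.Println("done:", <-t.resultCh) // caller blocks here, no global mutex needed
    }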

CC: @ondrej-fabry, @milanlenco

@dzvancuks
Copy link
Author

After solving the KV Scheduler issue I'll create follow-up issues regarding GC and memory usage, mutexes, and the txnQueue size. Some of them may be solved together with the Scheduler problems, so let's postpone creating new issues.

dzvancuks pushed a commit to 1NCE-GmbH/vpp-agent that referenced this issue Aug 19, 2021

dzvancuks commented Aug 20, 2021

Started experimenting with a multithreaded approach.

  1. It is possible to get rid of txnLock completely, by introducing an RWLock in the Registry and by adding a Mutex for valStateWatchers.
  2. It is possible to use goroutines for processTransaction(); the Graph's RWLock coordinates the multiple routines. However ...
  3. ... the Graph's write lock reduces all the multithreading back down to the linear execution we had before. In step 5 (Execution), graphW := s.graph.Write(true, record) locks out the other threads, so multithreading processTransaction() gives no benefit (illustrated by the sketch below).
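
The effect described in point 3 follows from how sync.RWMutex behaves: a write lock is exclusive against all readers and other writers. A small self-contained sketch (generic, not the Graph code):

    // Sketch: goroutines contending on a write lock run one at a time, which is
    // why parallel processTransaction() calls still serialize on graph.Write(...).
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        var mu sync.RWMutex
        var wg sync.WaitGroup

        start := time.Now()
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                mu.Lock()                         // exclusive: readers and writers wait
                time.Sleep(10 * time.Millisecond) // simulated graph update
                mu.Unlock()
            }()
        }
        wg.Wait()
        fmt.Println(time.Since(start)) // ~40ms: the four "updates" ran sequentially
    }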


To overcome this lock, the Graph must be refactored. This can be done by moving mutex locking inside the API methods. It would also require decoupling Node from Graph: currently, Nodes access the Graph directly during Target re-configuration. The Target struct might hold a direct pointer to the other nodes, or the Graph itself should be responsible for Targets. Investigation is ongoing.
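
As a rough illustration of the "direct pointer" idea (all names here are hypothetical, not the current vpp-agent types):

    // Sketch: a Target holds a direct pointer to its neighboring node, so a node
    // no longer has to reach back into the Graph to resolve its edges.
    package main

    import "fmt"

    type Node struct {
        Key     string
        Targets []Target
    }

    type Target struct {
        Relation string
        Neighbor *Node // direct reference instead of a key lookup through the Graph
    }

    func main() {
        iface := &Node{Key: "vpp/interface/loop0"}
        route := &Node{Key: "vpp/route/10.0.0.0/24"}
        route.Targets = append(route.Targets, Target{Relation: "interface", Neighbor: iface})
        fmt.Println(route.Targets[0].Neighbor.Key)
    }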

Any comment from the maintainers would help.
CC: @ondrej-fabry, @milanlenco

ondrej-fabry (Member) commented

Just want to note that Milan is no longer working with me, as he started working for another company and most likely will not reply here. However, before he left he was in the process of preparing a refactor of the KVScheduler. You could look into it to get some idea of the planned changes. There are no plans to work on this at the moment, but possibly with your help we could get these changes ready.

Here is the most recent one, which was supposed to introduce the new descriptor API v2 that replaces the key parameter with a more flexible context:
master...milanlenco:kvDescriptorV2

And here is a slightly older planned change, which is more related to your discoveries and was supposed to actually add support for parallel processing of transactions:
master...milanlenco:exec_engine


dzvancuks commented Aug 24, 2021

@ondrej-fabry, thank you for the response. I studied the provided branches.

master...milanlenco:kvDescriptorV2

This one optimizes GC. I'm thinking of splitting the current issue into two steps: one for parallel txn processing, the other for GC and memory management. Within the scope of this issue I'll focus on the Scheduler's txn handling, and after that I'll create a follow-up for memory.

master...milanlenco:exec_engine

This one does relate to the performance issue. However, it doesn't solve the root cause, which is the exclusive Graph write locking. I'll continue investigating a refactoring of the Graph handling.

To refactor the Graph, the following steps must be performed (a locking sketch follows the list):

  1. Decouple Node from Graph. A Node currently accesses the Graph directly during target edge reconstruction.
  2. Refactor Target and edge construction. As an idea, just use a direct pointer to the neighboring node and remove everything related to RelationTargetDef.
  3. Take short-lived locks only while an entity's method is executing. E.g., graphW.SetNode() locks and immediately releases the graph lock when the function ends; node.GetValue() locks and defers the unlock of its internal mutex.
  4. Remove all exclusive locks. Functions like processTransaction() and executeTransaction() should not hold exclusive access while doing their operations.
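
A minimal sketch of the locking style described in step 3, using hypothetical names rather than the real Graph API:

    // Sketch: each graph method takes the lock only for its own duration instead
    // of the whole transaction holding an exclusive lock.
    package main

    import "sync"

    type graph struct {
        mu    sync.RWMutex
        nodes map[string]string
    }

    // SetNode takes the write lock and releases it as soon as the update is done.
    func (g *graph) SetNode(key, value string) {
        g.mu.Lock()
        defer g.mu.Unlock()
        g.nodes[key] = value
    }

    // GetValue takes only a read lock, so concurrent readers do not block each other.
    func (g *graph) GetValue(key string) (string, bool) {
        g.mu.RLock()
        defer g.mu.RUnlock()
        v, ok := g.nodes[key]
        return v, ok
    }

    func main() {
        g := &graph{nodes: make(map[string]string)}
        g.SetNode("vpp/interface/loop0", "up")
        _, _ = g.GetValue("vpp/interface/loop0")
    }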

dzvancuks commented

A small update.
Our team has decided to implement our own VPP transaction handler using GoVPP. In our project we need to handle many small transactions rather than a single complex txn with many dependencies. We had no need for the recursive graph features that VPP-agent provides, and refactoring them would have required much more effort.

@ondrej-fabry, thanks for the feedback on my attempts to improve the Scheduler. I hope the next revisions of VPP-agent will become more performant. If you would like to continue this work, I encourage you to start with refactoring the Graph module.
