
Poor performance due to single thread and mutexes #1803

Open
dzvancuks opened this issue May 3, 2021 · 8 comments


dzvancuks commented May 3, 2021

After some profiling it seems that these two functions impact performance when pushing many operations:

  1. func (p *dispatcher) PushData blocks on a mutex
  2. func (s *Scheduler) consumeTransactions() waits a long time on the queue and then spends a long time in processTransaction.

The problem is that the VPP agent processes operations in a single thread, blocked by a mutex and a queue. Instead it should utilize cheap goroutines and pooled memory via sync.Pool.

To collect a trace I enabled the pprof tool in the main() function and pushed 30000 Update commands via a gRPC connection.

curl --output trace.prof http://localhost:8080/debug/pprof/trace?seconds=30
go tool trace trace.prof
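
For context, the trace endpoint queried above is the standard one from Go's net/http/pprof package; a minimal sketch of exposing it (the port and wiring are illustrative, not the agent's actual setup):

    // Minimal sketch (not the agent's actual wiring): the /debug/pprof endpoints,
    // including /debug/pprof/trace, come from the standard net/http/pprof package.
    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on the default mux
    )

    func main() {
        go func() {
            log.Println(http.ListenAndServe("localhost:8080", nil))
        }()
        select {} // block forever; the real agent runs its own main loop here
    }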

The trace shows an underutilized CPU and long wait times in the functions mentioned above.



dzvancuks commented May 3, 2021

A few other findings:

  • You can increase performance a bit by buffering the gRPC input, but the gain is limited by the transaction channel:
    s.txnQueue = make(chan *transaction, 100)
  • Memory is allocated for each transaction object. Increasing throughput increases the garbage-collection rate, which halts execution. To avoid this, kvscheduler should reuse memory, e.g. via sync.Pool as mentioned before (see the sketch below).
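
To illustrate the sync.Pool idea with a generic sketch (the transaction type here is a stand-in, not the kvscheduler's real struct):

    // Sketch: recycle transaction objects via sync.Pool to reduce GC pressure.
    package main

    import "sync"

    type transaction struct {
        values []string
    }

    var txnPool = sync.Pool{
        New: func() interface{} { return new(transaction) },
    }

    func acquireTxn() *transaction {
        return txnPool.Get().(*transaction)
    }

    func releaseTxn(t *transaction) {
        t.values = t.values[:0] // reset state before returning to the pool
        txnPool.Put(t)
    }

    func main() {
        t := acquireTxn()
        t.values = append(t.values, "vpp/interface/loop0")
        releaseTxn(t)
    }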

Here is the profiling for the buffered gRPC call. The CPU is still underutilized and GC is triggered very often.


ondrej-fabry (Member) commented

Thank you for your descriptive write-up of the issue. You are on point about the scheduler's inefficient processing of transactions. Personally, I am migrating to a new project that plans to use Ligato more extensively, so I might actually do some optimizations in this area (I had planned to optimize the allocation of strings in the kvscheduler).
However, if you think you could open some PRs, at least for the low-hanging fruit that could improve performance here, you are welcome to do so.


dzvancuks commented Aug 13, 2021

I've started a code analysis to improve performance in a separate fork: https://github.com/1NCE-GmbH/vpp-agent. I'll explain some preliminary investigation and would like to discuss VPP-agent implementation details to figure out how best to optimize its workflow.

First of all, here are the traces and profiler results: vpp-agent-profile.zip
You can open them by executing the following commands:

go tool trace trace.prof
go tool pprof -http=localhost:8080 cpu.prof
go tool pprof -http=localhost:8080 mutex.prof

During the analysis it became clear that the KV Scheduler is the single throttling point. When launching 30000 operations over the gRPC channel, this one entity blocks all other threads while performing consumeTransactions() -> processTransaction() calls.

Multi-threaded execution effectively degrades into single-threaded operation of the VPP agent.

All other gRPC calls are waiting on one single lock. But removing this particular mutex will not solve the issue, as those goroutines would then park on the channel.

So the main target for optimization is the KV Scheduler. I'll try to refactor this module to utilize as much CPU time as possible, but I'd like to consult with the project owners and maintainers to make this process more robust.

My questions to the maintainers of VPP-agent:

  1. Can VPP-agent operate with several Scheduler instances? What are the downsides of this approach? (Graph synchronization? Counters?)
  2. Can the Scheduler operate without txnLock? Would the internal Graph mutex be enough to synchronize data when I call graph.Write()/Read()?
  3. Can the sub-steps of the processTransaction() function be executed in goroutines?
  4. Will any of these approaches violate NB/SB functionality?
  5. Could you recommend what to keep an eye on? Which Scheduler-related module might break on changes? (My guess is that it is only the Graph.)
  6. Can I get rid of the TransactionBarrier in dispatcher.PushData() and completely remove Scheduler.txnLock? txn.Commit already has synchronization capabilities via txnData.nb.resultChan (see the sketch after this list).
  7. Is it OK to set defaultPrintTxnSummary = false? The printing lock also affects performance.
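
To illustrate the reasoning behind question 6 with a generic Go pattern (this is not the dispatcher's actual code, and the names are hypothetical): a per-transaction result channel already gives each caller its own synchronization point, so a global barrier or lock is not needed just for waiting on completion.

    // Generic sketch: each transaction carries its own result channel, so the
    // caller synchronizes on the reply instead of a global lock or barrier.
    package main

    import "fmt"

    type txn struct {
        data     string
        resultCh chan error
    }

    func runScheduler(queue <-chan *txn) {
        for t := range queue {
            // ... process the transaction against the graph ...
            t.resultCh <- nil // signal completion to the caller
        }
    }

    func main() {
        queue := make(chan *txn, 100)
        go runScheduler(queue)

        t := &txn{data: "update", resultCh: make(chan error, 1)}
        queue <- t
        fmt.Println("done:", <-t.resultCh) // caller blocks here, no global mutex needed
    }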

CC: @ondrej-fabry, @milanlenco

@dzvancuks
Copy link
Author

After solving the KV Scheduler issue I'll create follow-up issues regarding GC and memory usage, mutexes, and the txnQueue size. Some of them may be solved together with the Scheduler problems, so let's postpone creating new issues.

dzvancuks pushed a commit to 1NCE-GmbH/vpp-agent that referenced this issue Aug 19, 2021

dzvancuks commented Aug 20, 2021

Started experimenting with a multithreaded approach.

  1. It is possible to get rid of txnLock completely, by introducing an RWLock in the Registry and by adding a Mutex for valStateWatchers.
  2. It is possible to use goroutines for processTransaction(); the Graph's RWLock coordinates the multiple routines. However ...
  3. ... the Graph's write lock reduces all the multithreading back down to the linear execution we had before. In step 5 (Execution), graphW := s.graph.Write(true, record) locks out the other threads, so multithreading processTransaction() gives no benefit (illustrated by the sketch below).
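
The effect described in point 3 follows from how sync.RWMutex behaves: a write lock is exclusive against all readers and other writers. A small self-contained sketch (generic, not the Graph code):

    // Sketch: goroutines contending on a write lock run one at a time, which is
    // why parallel processTransaction() calls still serialize on graph.Write(...).
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        var mu sync.RWMutex
        var wg sync.WaitGroup

        start := time.Now()
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                mu.Lock()                         // exclusive: readers and writers wait
                time.Sleep(10 * time.Millisecond) // simulated graph update
                mu.Unlock()
            }()
        }
        wg.Wait()
        fmt.Println(time.Since(start)) // ~40ms: the four "updates" ran sequentially
    }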


To overcome this lock, the Graph must be refactored. This can be done by moving mutex locking inside the API methods. It would also require decoupling Node from Graph: currently, Nodes access the Graph directly during Target re-configuration. The Target struct might hold a direct pointer to the other nodes, or the Graph itself should be responsible for Targets. Investigation is ongoing.
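
As a rough illustration of the "direct pointer" idea (all names here are hypothetical, not the current vpp-agent types):

    // Sketch: a Target holds a direct pointer to its neighboring node, so a node
    // no longer has to reach back into the Graph to resolve its edges.
    package main

    import "fmt"

    type Node struct {
        Key     string
        Targets []Target
    }

    type Target struct {
        Relation string
        Neighbor *Node // direct reference instead of a key lookup through the Graph
    }

    func main() {
        iface := &Node{Key: "vpp/interface/loop0"}
        route := &Node{Key: "vpp/route/10.0.0.0/24"}
        route.Targets = append(route.Targets, Target{Relation: "interface", Neighbor: iface})
        fmt.Println(route.Targets[0].Neighbor.Key)
    }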

Any comment from the maintainers would help.
CC: @ondrej-fabry, @milanlenco

ondrej-fabry (Member) commented

Just want to note that Milan is no longer working with me, as he started working for another company and most likely will not reply here. However, before he left he was in the process of preparing a refactor of the KVScheduler. You could look into it to get some idea of the planned changes. There are no plans to work on this at the moment, but possibly with your help we could get these changes ready.

Here is the most recent one, which was supposed to introduce the new descriptor API v2 that replaces the key parameter with a more flexible context:
master...milanlenco:kvDescriptorV2

And here is a slightly older planned change, which is more related to your discoveries and was supposed to actually add support for parallel processing of transactions:
master...milanlenco:exec_engine


dzvancuks commented Aug 24, 2021

@ondrej-fabry, thank you for the response. I studied the provided branches.

master...milanlenco:kvDescriptorV2

This one optimizes GC. I'm thinking of splitting the current issue into two steps: one for parallel txn processing, the other for GC and memory management. Within the scope of this issue I'll focus on the Scheduler's txn handling, and after that I'll create a follow-up for memory.

master...milanlenco:exec_engine

This one does relate to the performance issue. However, it doesn't solve the root cause, which is the exclusive Graph write locking. I'll continue investigating a refactoring of the Graph handling.

To refactor the Graph, the following steps must be performed (a locking sketch follows the list):

  1. Decouple Node from Graph. A Node currently accesses the Graph directly during target edge reconstruction.
  2. Refactor Target and edge construction. As an idea, just use a direct pointer to the neighboring node and remove everything related to RelationTargetDef.
  3. Take short-lived locks only while an entity's method is executing. E.g., graphW.SetNode() locks and immediately releases the graph lock when the function ends; node.GetValue() locks and defers the unlock of its internal mutex.
  4. Remove all exclusive locks. Functions like processTransaction() and executeTransaction() should not hold exclusive access while doing their operations.
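
A minimal sketch of the locking style described in step 3, using hypothetical names rather than the real Graph API:

    // Sketch: each graph method takes the lock only for its own duration instead
    // of the whole transaction holding an exclusive lock.
    package main

    import "sync"

    type graph struct {
        mu    sync.RWMutex
        nodes map[string]string
    }

    // SetNode takes the write lock and releases it as soon as the update is done.
    func (g *graph) SetNode(key, value string) {
        g.mu.Lock()
        defer g.mu.Unlock()
        g.nodes[key] = value
    }

    // GetValue takes only a read lock, so concurrent readers do not block each other.
    func (g *graph) GetValue(key string) (string, bool) {
        g.mu.RLock()
        defer g.mu.RUnlock()
        v, ok := g.nodes[key]
        return v, ok
    }

    func main() {
        g := &graph{nodes: make(map[string]string)}
        g.SetNode("vpp/interface/loop0", "up")
        _, _ = g.GetValue("vpp/interface/loop0")
    }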

dzvancuks commented

A small update.
Our team has decided to implement our own VPP transaction handler using GoVPP. In our project we need to handle many small transactions rather than a single complex txn with many dependencies. We had no need for the recursive graph features that VPP-agent provides, and refactoring them would have required much more effort.

@ondrej-fabry, thanks for the feedback on my attempts to improve the Scheduler. I hope the next revisions of VPP-agent will become more performant. If you would like to continue this work, I encourage you to start with refactoring the Graph module.
