
Bulk indexer is inefficient and slow #528

Closed
schorsch opened this issue Sep 10, 2022 · 4 comments

Comments

@schorsch

While it may be a nice idea for some codebases to use structs with JSON conversion, it is not the best idea when adding a couple of hundred thousand entries.
In my case, to give you a rough idea: importing ~350,000 entries (from CSV files) with ~10-20 short fields and no extensive analyzers, into 5 different indices, takes >7 minutes on a recent Linux Dell Precision 5550. I'll spare you the bulk size, flush settings and other details because they don't change much about the basic problem.

I used olivere's elastic client, which is structured internally to also accept plain strings for bulk requests (see the Source method call here). With this in place I was able to build the bulk request body rows with a few fmt.Sprintf() calls, avoiding struct initialization and the CPU and memory allocations that come with it. With that handling the import took ~6 seconds; before, with olivere's struct handling, it took ~12 seconds.
Yes, building a string that is valid JSON may be inconvenient in some situations, but a decent programmer should be able to achieve this. And we are talking about bulk inserts, which are typically used in situations where you need speed. When I look at the code in your bulk_indexer worker run and dive into the meta line, I am pretty stunned by how difficult it can be to create something wrapped in braces and joined with commas.
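To make this concrete, here is a rough sketch of the plain-string approach. The index name, fields and the low-level es.Bulk call from go-elasticsearch are illustrative stand-ins, not the actual import code:

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
)

// Hypothetical record type standing in for one parsed CSV row.
type entry struct {
	ID    string
	Name  string
	Price int
}

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	rows := []entry{{"1", "foo", 10}, {"2", "bar", 20}} // stand-in for CSV data

	// Build the NDJSON bulk body directly as a string: one meta line and
	// one document line per entry, each terminated by a newline.
	var b strings.Builder
	for _, e := range rows {
		b.WriteString(fmt.Sprintf(`{"index":{"_index":"entries","_id":%q}}`+"\n", e.ID))
		b.WriteString(fmt.Sprintf(`{"name":%q,"price":%d}`+"\n", e.Name, e.Price))
	}

	// Send the whole batch in one low-level Bulk request.
	res, err := es.Bulk(strings.NewReader(b.String()))
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	log.Println(res.Status())
}
```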

To wrap it up, your bulk indexer is pretty much useless unless you are inserting only 100 entries or you opt to throw hardware at the problem (which will probably NOT work). I am not an ultra-experienced Go dev, so I may have missed some low-level methods in here. My advice would be to look at olivere's code structures and adopt some of his bulk ideas, also because your open bulk-related tickets don't look promising. Unfortunately he decided not to continue his lib, so one may need to fork it and make it work with ES 8+.

@Anaethelion
Contributor

Hi @schorsch

The performance you describe is definitely not normal.
Could you share a bit of code showing how you used the bulk indexer?

You mention building the body rows with fmt.Sprintf; you can definitely do that with the bulk indexer.
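A minimal sketch of what that could look like — the index name, worker count and documents below are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esutil"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	bi, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:     es,
		Index:      "entries", // placeholder index name
		NumWorkers: 4,         // parallel flush workers
	})
	if err != nil {
		log.Fatal(err)
	}

	// The item body is a plain reader, so a document built with fmt.Sprintf
	// (no intermediate struct, no json.Marshal) works just as well.
	for i := 0; i < 1000; i++ {
		doc := fmt.Sprintf(`{"name":"entry-%d","price":%d}`, i, i*10)
		err := bi.Add(context.Background(), esutil.BulkIndexerItem{
			Action:     "index",
			DocumentID: fmt.Sprintf("%d", i),
			Body:       strings.NewReader(doc),
		})
		if err != nil {
			log.Fatal(err)
		}
	}

	if err := bi.Close(context.Background()); err != nil {
		log.Fatal(err)
	}
	stats := bi.Stats()
	log.Printf("indexed=%d failed=%d", stats.NumIndexed, stats.NumFailed)
}
```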

@schorsch
Author

We are about to implement some benchmarks and dive into profiling, which is always a good thing to learn. I'll share some examples, or maybe don'ts, as soon as we have identified possible bottlenecks.
Thanks for the hint to the example; I looked it up, and we already used it that way.

@Anaethelion
Contributor

Closing this issue.

I'll be happy to reopen if you come back with code and/or examples!

@schorsch
Author

We did evaluate our code, and it officially makes me an idiot now!

While reviewing the code that led to this ticket, I found that I had overlooked the FlushInterval, which was set to 1 second and made the loop take forever.
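For reference, flush behaviour is configured on esutil.BulkIndexerConfig; a minimal sketch with illustrative values (not our production settings):

```go
package main

import (
	"context"
	"log"
	"runtime"
	"time"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esutil"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatal(err)
	}

	// Flushing is driven by FlushBytes (size threshold) and FlushInterval
	// (time-based flush). A very low FlushInterval makes the indexer send
	// many small bulk requests instead of a few large ones.
	bi, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:        es,
		Index:         "entries",        // placeholder index name
		NumWorkers:    runtime.NumCPU(), // parallel flush workers
		FlushBytes:    5 * 1024 * 1024,  // flush once ~5 MB are buffered
		FlushInterval: 30 * time.Second, // periodic flush as a safety net
	})
	if err != nil {
		log.Fatal(err)
	}

	// ... Add() items here ...

	if err := bi.Close(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```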

We benchmarked your lib against olivere's, and guess what: it outperformed it. Profiling showed ~28% fewer memory allocations, and it ran **~25% faster**.

Nothing more to say than: sorry for wasting your time!!!
