Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization for SpriteBatch when running non VertexArray VertexDataModes. (GL30 default) #7346

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Tom-Ski
Copy link
Member

@Tom-Ski Tom-Ski commented Feb 21, 2024

In flush of SpriteBatch, the index buffer gets a limit for the current size of the batch. It gets the indices buffer reference whilst marking the index buffer as dirty, so on bind of the mesh, the indices are uploaded again. Since SpriteBatch indices are static and never change, this change pre-uploads all the indices to the IndexBuffer at creation time, and skips any further updates at render.

We don't have a nice way to do this without making a bunch of new methods specifically for this case, so I'm just binding and unbinding the mesh to trigger the upload once before a render occurs.

Testers welcome! Seems to be no regressions for me in the test suite, but would be good if otherr can take a look on their systems and devices

Spritebatch defaults timings from SpriteBatchPerformanceTest

before change
gl20 0.225ms
gl30 0.280ms


after change
gl20 0.225ms
gl30 0.201ms 

Related issue
#7345

Tom-Ski and others added 2 commits February 21, 2024 20:53
… of the batch to prevent doing this each frame. Increase the performance of default SpriteBatch in gl30 where this performs worse than gl2 vertex array
@obigu obigu added this to the 1.12.2 milestone Feb 22, 2024
spriteBatch.begin();
stringBuilder.setLength(0);
stringBuilder.append("Mean Time ms: ");
stringBuilder.append(counter.getMean() / 1e6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a conditional that shows "Please Wait..." on the text dependign if counter.hasEnoughData() returns true or not as it may take a while (over 5s on a Samsung S7) to populate depending on the device you run the test and I thought the test was not working.


// fill the batch
for (int i = 0; i < 8190; i++) {
spriteBatch.draw(texture, 0, 0, 1, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing but maybe instead of drawing a single point we can change it to something like 20px * 20px just for the user to see something is happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obigu i think it would bias the test since it will overload fragment shader stage, considering there are about 819.000 drawings, it would have to render 20x20 pixels each, so ~327 Mega pixels which could be a lot compared to the actual amount (less than 1 Mega pixels). In this end it would defeat isolation of this test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to isolate a bit more, we could disable blending and use depth test (never pass) in order to fully bypass fragment stage.

Copy link
Contributor

@obigu obigu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this running the libGDX test suite on Android both GL20 and GL30 and on iOS. I've found no issues or differences between previous behaviour and with this change and for this reason I approve (the suggested changes would be nice to have but not required). I can't review from a functionality/correctness point of view even if they make sense to me.

EDIT: Sharing the results of the performance tests just for reference. On GL20 no changes observed.
On Android Samsung 7 device GL20:

  • Mean Without changes: 1.7s
  • Mean After changes: 1.7s

On Android Samsung 7 device GL30:

  • Mean Without changes: 1.2s
  • Mean After changes: 0.6s

@NathanSweet
Copy link
Member

I ran Spine with GL30 with and without these changes, all seems fine. I didn't measure performance differences.

I'm good with merging but I didn't press the button in case obigu's proposed changes will be added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants