Significant performance regression while using AWS S3 file storage #19412

Closed
Vovcharaa opened this issue Jan 25, 2022 · 30 comments · Fixed by #20006
Labels
Bug Report/Open Bug report/issue

Comments

@Vovcharaa
Contributor

Summary

File downloads, file uploads, and server startup time increased significantly after updating the Mattermost server from 6.0.4 to 6.1+.

Steps to reproduce

How can we reproduce the issue (what version are you using?)
Currently using 6.0.4 with no issues.
Any attempt to upgrade to 6.1+ (tried 6.1.0 and 6.3.1) leaves the server in a completely unusable state.
The same behavior occurs on a fresh 6.3.1 install with S3 file storage configured.
Deployed on Ubuntu 20.04 using the Docker image https://hub.docker.com/r/mattermost/mattermost-team-edition.
Database: AWS RDS Postgres 12.7

Expected behavior

No performance regression while using AWS S3 file storage.

Observed behavior (that appears unintentional)

Data below for version 6.3.1:
Server startup time = 8 min (server startup after downgrading to 6.0.4 = ~10 seconds)

{"timestamp":"2022-01-22 21:18:20.621 Z","level":"info","msg":"Server is initializing...","caller":"app/server.go:237","go_version":"go1.16.7"}
...
{"timestamp":"2022-01-22 21:26:39.060 Z","level":"debug","msg":"Connection to S3 or minio is good. Bucket exists.","caller":"filestore/s3store.go:178"}
{"timestamp":"2022-01-22 21:26:39.063 Z","level":"info","msg":"Starting Server...","caller":"app/server.go:1163"}
{"timestamp":"2022-01-22 21:26:39.067 Z","level":"info","msg":"Server is listening on [::]:8065","caller":"app/server.go:1236","address":"[::]:8065"}

Downloading a file = 1-2 min before the download even starts.
Uploading a file = 2-3 min in the "processing" status.
Test Connection button in the File Storage settings = 2 min before returning a success status.

@amyblais
Member

@Willyfrog Do you know which team could take a look at this?

(It doesn't have to be SET; I'm just looking for guidance on who would have the best knowledge of this. Also, I'm wondering if this is related to our ongoing internal performance investigations, or if this is a different issue.)

@Willyfrog
Contributor

@Vovcharaa Can you share the specs of what you are using, and any special config you might have?

@Vovcharaa
Contributor Author

Vovcharaa commented Jan 26, 2022

@Willyfrog
Server deployed on AWS EC2
OS: Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-1027-aws x86_64)
Docker:

sudo docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:42 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker-compose:

docker-compose version 1.27.4, build unknown
docker-py version: 4.4.0
CPython version: 3.8.10
OpenSSL version: OpenSSL 1.1.1f  31 Mar 2020

docker-compose.yml
(For upgrades, only the image tag is changed.)

version: '3.8'
services:
    mattermost:
        image: 'mattermost/mattermost-team-edition:6.0.4'
        restart: unless-stopped
        ports:
         - 8065:8065
        extra_hosts:
         - 'dockerhost:127.0.0.1'
        volumes:
         - ./mattermost/config:/mattermost/config:rw
         - ./mattermost/data:/mattermost/data:rw
         - ./mattermost/logs:/mattermost/logs:rw
         #- ./mattermost/plugins:/mattermost/plugins:rw
         #- ./mattermost/client-plugins:/mattermost/client/plugins:rw
         - /etc/localtime:/etc/localtime:ro
        environment:
         # set same as db credentials and dbname
         # use the credentials you've set above, in the format:
         #- MM_SQLSETTINGS_DATASOURCE=postgres://${MM_USERNAME}:${MM_PASSWORD}@db:5432/${MM_DBNAME}?sslmode=disable&connect_timeout=10
         - MM_SQLSETTINGS_DATASOURCE=postgres://LOGIN:PASSWORD@mattermost.XXX.eu-north-1.rds.amazonaws.com:5432/mattermost?connect_timeout=10
         #sslmode=disable&

Database: AWS RDS PostgreSQL 12.7
Config: config.zip

@Willyfrog
Contributor

@amyblais Sorry, I forgot to answer your question. I'd say Cloud, as they are probably more used to working with AWS; a second option would be SRE or Server Platform.

@amyblais
Member

@agnivade @streamer45 Let me know if you have thoughts on this report.

@agnivade
Member

agnivade commented Feb 1, 2022

@Vovcharaa - I see that you posted some logs in your root post. Could you post the complete logs from a 6.3.1 server up to the "Server is listening on [::]:8065" line?

@Vovcharaa
Contributor Author

@agnivade
startup.log

@agnivade
Member

agnivade commented Feb 1, 2022

@isacikgoz - Since you are on SET, would you be able to take a look?

@Vovcharaa
Contributor Author

Any updates on this issue?

@amyblais
Member

@agnivade @isacikgoz Would you like me to create a ticket for SET?

@isacikgoz
Member

@amyblais makes sense. I couldn't find enough time to prioritize this one on my rotation.


@amyblais added the Bug Report/Open label on Feb 16, 2022
@mkraft
Contributor

mkraft commented Feb 16, 2022

@Vovcharaa Were you using s3 file storage prior to the upgrade?

@Vovcharaa
Contributor Author

@mkraft Yes. Currently there is around 100 GB of data and everything works fine with version 6.0.4.

@mkraft
Contributor

mkraft commented Feb 16, 2022

@Vovcharaa Would you please share your startup logs like you did before, but with System Console > Environment > File Storage > Enable Amazon S3 Debugging set to true (i.e. FileSettings.AmazonS3Trace set to true in the config)?
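
For anyone else reproducing this, here is a minimal sketch of one way to flip that flag from the host, assuming jq is installed and the bind-mounted config path from the compose file above:

# Hedged sketch: enable FileSettings.AmazonS3Trace in the bind-mounted config.json,
# then restart the container so the server reloads it.
jq '.FileSettings.AmazonS3Trace = true' ./mattermost/config/config.json > /tmp/config.json \
  && mv /tmp/config.json ./mattermost/config/config.json
docker-compose restart mattermost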

@Vovcharaa
Contributor Author

@mkraft S3 bucket name masked.
startupWithS3Debug.txt

@mkraft
Contributor

mkraft commented Feb 17, 2022

@Vovcharaa You said that downloading, uploading, and server restart are slow, but also that the server is completely unusable. Are reading and writing posts, loading channels, changing teams, etc. slow too? Or is it just file uploading, downloading, and restart? Are plugins enabled?

I added a note to the Jira ticket that my tests comparing those two release versions showed no notable difference in uploading and downloading files via S3 (downloading was actually slightly faster on the newer version).

@Vovcharaa
Contributor Author

@mkraft
Attachments, photos, and user profile pictures either load very slowly or do not load at all, which means the server can't be used for communication by our teams. Load performance for data stored in the database does not seem to be affected.

Plugins are enabled, but I tried deleting all of them or disabling them completely and got the same result.
Note that all the logs I presented here are from a fresh installation.

Perhaps the problem is related to the AWS region? We use eu-north-1 for the S3 bucket and EC2 (t3.small).

@mkraft
Contributor

mkraft commented Feb 17, 2022

@Vovcharaa eu-north-1 is 50% slower for me to download the same test file compared to ca-central-1, but I'm in Canada. Also, that would only make sense in your case if the upgrade were coincidental (rather than the trigger). If you roll back to v6.0.4 do you regain your previous better performance?

@Vovcharaa
Contributor Author

@mkraft Yes.

I set up a test server where this bug can be observed. Would it be useful for you to take a look at it yourself? I could provide you with test account credentials through some private channel.

@mkraft
Contributor

mkraft commented Feb 17, 2022

@Vovcharaa Yes please. I'm available either on our community instance or by email.

@Vovcharaa
Contributor Author

@mkraft Sent in a PM on the community instance.

@mkraft
Contributor

mkraft commented Feb 18, 2022

@Vovcharaa I tried switching to a new test S3 bucket created under our Mattermost corporate AWS account, and the performance issues were immediately solved, as far as I could tell. We recommend experimenting with increased resources.

@mkraft closed this as completed on Feb 18, 2022
@Vovcharaa
Contributor Author

Vovcharaa commented Feb 18, 2022

@mkraft Did you use AWS keys directly for S3? I assume the problem is with the EC2 instance metadata URL; something changed with the upgrade of the minio-go dependency.
By default this setting is set to 1. I created a new instance where I changed the value to 2 (I haven't found how to change it on an already-created instance), and the issue is gone. When IAM keys are used directly, this setting is completely ignored.
(screenshot of the instance metadata setting attached)
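
For reference, a hedged sketch (not from this thread): assuming the setting in the screenshot is the IMDSv2 hop limit (HttpPutResponseHopLimit), the AWS CLI can raise it on an existing instance; containers generally need a value of at least 2 because Docker's bridge adds an extra network hop. The instance ID below is a placeholder.

aws ec2 modify-instance-metadata-options \
    --instance-id i-0123456789abcdef0 \
    --http-put-response-hop-limit 2 \
    --http-endpoint enabled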

@mkraft
Contributor

mkraft commented Feb 18, 2022

@Vovcharaa Yes, I used explicit keys. We did upgrade the minio dependency from v7.0.11 to v7.0.14.

@Vovcharaa
Contributor Author

@mkraft Could this be related to this issue?

@Vovcharaa
Contributor Author

Vovcharaa commented Feb 18, 2022

The instance is definitely not under-resourced. The issue is in the changed S3 communication logic when using an EC2 instance role, which was introduced with the minio dependency upgrade.
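
To illustrate what the instance-role credential path involves (a hedged sketch, not Mattermost or minio-go code): with IMDSv2 the client first PUTs for a session token and then GETs the role credentials, and a hop limit of 1 stops the PUT response from reaching a process inside a container.

# Hedged illustration of the IMDSv2 handshake, run from inside the container.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"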

@mkraft reopened this on Feb 18, 2022
@Vovcharaa
Contributor Author

This issue was mitigated in minio-go v7.0.24 by PR minio/minio-go#1626.
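
A sketch of the corresponding dependency bump in a mattermost-server checkout (the actual change is the PR linked from this issue):

go get github.com/minio/minio-go/v7@v7.0.24
go mod tidy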

@agnivade
Member

That sounds great, @Vovcharaa! Thank you for digging into this. Would you be open to sending a PR to upgrade the minio dependency?

@Vovcharaa
Contributor Author

@agnivade #20006
