Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Optimize finding newer active deployments for device using MongoDB Aggregation #922

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

BenMeehan
Copy link

Optimize finding newer active deployments for device using MongoDB aggregation

Noticed a lot of data egress when a device checks for an update, especially if the deployment size is large.

This commit optimizes a database query using the MongoDB aggregation framework to improve performance and reduce resource usage and network data transfer. The changes include a more efficient aggregation pipeline to fetch data, resulting in faster query execution and less data transferred overall.

Changelog: All
Ticket: None

…oDB aggregation

This commit optimizes a database query using the MongoDB aggregation framework
to improve performance and reduce resource usage and network data transfer. The changes include a more efficient aggregation pipeline to fetch data, resulting in faster query execution and less data transferred overall.

Changelog: All
Ticket: None
@mender-test-bot
Copy link

Merging these commits will result in the following changelog entries:

Changelogs

deployments (optimize-find-active-deployments-using-aggregation)

New changes in deployments since master:

Features
  • Optimize finding newer active deployments for device using MongoDB aggregation

    This commit optimizes a database query using the MongoDB aggregation framework
    to improve performance and reduce resource usage and network data transfer. The changes include a more efficient aggregation pipeline to fetch data, resulting in faster query execution and less data transferred overall.

Copy link
Contributor

@alfrunes alfrunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @BenMeehan! 👋

Thank you for the contribution, and apologies for the long delay.
This is certainly an overlooked performance optimization, so thanks for taking your time to fix it.
There are some minor errors in the way the empty documents results are handled, I added some suggestions for fixing and also simplifying the implementation a little bit.
To make the CI happy, you also need to sign off your commit, simply run git commit --amend --sign-off and (force-) push the updated commit message.
Once you've amended the commits I will take care of adjusting the tests.

Cheers! 🍻

@@ -42,6 +42,7 @@ import (
"github.com/mendersoftware/deployments/store"
"github.com/mendersoftware/deployments/store/mongo"
"github.com/mendersoftware/deployments/utils"
mdriver "go.mongodb.org/mongo-driver/mongo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an (internal) developer guideline, the application layer should not import the database driver of the database layer (store).

Suggested change
mdriver "go.mongodb.org/mongo-driver/mongo"

if len(deployments) == 0 {
deployment, err := d.db.FindOldestActiveDeploymentForDevice(ctx, deviceID, lastDeployment)
if err != nil {
if err == mdriver.ErrNoDocuments {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error should be checked for inside the mongo package. Also, mdriver.Collection.Aggregate will never return this error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggestion below.

Comment on lines +2953 to +2986
// use an aggregation pipeline to fetch only the oldest active deployment
pipeline := []bson.M{
{
"$match": bson.M{
StorageKeyDeploymentCreated: bson.M{"$gt": createdAfter},
StorageKeyDeploymentActive: true,
StorageKeyDeploymentDeviceList: bson.M{"$in": []string{deviceID}},
},
},
{
"$sort": bson.M{StorageKeyDeploymentCreated: 1},
},
{
"$limit": 1,
},
}

var deployment *model.Deployment

cursor, err := c.Aggregate(ctx, pipeline)
if err != nil {
return nil, errors.Wrap(err, "failed to get deployments")
}
defer cursor.Close(ctx)

for cursor.Next(ctx) {
if err := cursor.Decode(&deployment); err != nil {
return nil, errors.Wrap(err, "failed to get deployments")
}
}

if err := cursor.Err(); err != nil {
return nil, errors.Wrap(err, "failed to get deployments")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested simplification (requires import "fmt"):

Suggested change
// use an aggregation pipeline to fetch only the oldest active deployment
pipeline := []bson.M{
{
"$match": bson.M{
StorageKeyDeploymentCreated: bson.M{"$gt": createdAfter},
StorageKeyDeploymentActive: true,
StorageKeyDeploymentDeviceList: bson.M{"$in": []string{deviceID}},
},
},
{
"$sort": bson.M{StorageKeyDeploymentCreated: 1},
},
{
"$limit": 1,
},
}
var deployment *model.Deployment
cursor, err := c.Aggregate(ctx, pipeline)
if err != nil {
return nil, errors.Wrap(err, "failed to get deployments")
}
defer cursor.Close(ctx)
for cursor.Next(ctx) {
if err := cursor.Decode(&deployment); err != nil {
return nil, errors.Wrap(err, "failed to get deployments")
}
}
if err := cursor.Err(); err != nil {
return nil, errors.Wrap(err, "failed to get deployments")
}
var deployment *model.Deployment
err := c.FindOne(ctx, bson.M{
StorageKeyDeploymentCreated: bson.M{"$gt": createdAfter},
StorageKeyDeploymentActive: true,
StorageKeyDeploymentDeviceList: bson.M{"$in": []string{deviceID}},
}, mopts.FindOne().SetSort(bson.M{StorageKeyDeploymentCreated: 1})).
Decode(&deployment)
if errors.Is(err, mongo.ErrNoDocuments) {
return nil, nil
} else if err != nil {
return nil, fmt.Errorf(
"failed to lookup latest deployment for device %q: %w",
deviceID, err,
)
}

It's important that the mongo.ErrNoDocuments is caught correctly.

return nil, nil, nil
}
return nil, nil, errors.Wrap(err,
"Failed to search for newer active deployments")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following from suggestion above.

Suggested change
}
} else if deployment == nil {
return nil, nil, nil
}

@alfrunes
Copy link
Contributor

alfrunes commented Oct 4, 2023

Ohh, and one more thing! We use semantic commits to track and group changelogs, so could you please also update the commit message to use the perf scope, and maybe adjust the message a little?

Suggestion
perf: Optimize query for finding next deployment for device
This commit optimizes a database query for getting the next deployment for a device to improve performance and reduce resource usage and network traffic. The change replaces scanning for the next deployment with a single document query that gets the next deployment directly.
Changelog: Title Ticket: None Signed-off-by: Ben Meehan <benmeehan111@gmail.com>

Cheers!

Copy link

stale bot commented Dec 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants