perf(document): avoid creating unnecessary empty objects when creating a state machine #11988

Merged
vkarpov15 merged 7 commits into 6.5 from vkarpov15/gh-11541 on Jul 18, 2022

vkarpov15
Collaborator

Re: #11541

Summary

Every document gets a state machine instance in doc.$__.activePaths to track whether a path is modified, required, init, ignored, etc. However, eagerly creating an empty object for every state, even when no paths are in that state, makes this a memory hog. This is especially true with 10k+ subdocuments, since each subdocument is a document with its own state machine.
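To make the idea concrete, here is a minimal sketch of lazily allocated per-state objects. It is a hypothetical, simplified model and not Mongoose's internal StateMachine class. The class name LazyStateMachine, the STATES list, and the activate/pathsIn methods are assumptions made for illustration.

```javascript
'use strict';

// Simplified, hypothetical model of a per-document path state machine.
// NOT Mongoose's internal StateMachine; names and the state list are assumptions.
const STATES = ['require', 'modify', 'init', 'default', 'ignore'];

class LazyStateMachine {
  constructor() {
    this.paths = {};  // path -> current state name
    this.states = {}; // state name -> { [path]: true } or null
    for (const state of STATES) {
      // Lazy: no empty object is allocated until a path enters this state.
      this.states[state] = null;
    }
  }

  activate(state, path) {
    const prev = this.paths[path];
    if (prev !== undefined && this.states[prev] != null) {
      delete this.states[prev][path]; // a path is in one state at a time
    }
    if (this.states[state] == null) {
      this.states[state] = {}; // allocate on first use only
    }
    this.paths[path] = state;
    this.states[state][path] = true;
  }

  // Enumerate paths in a state without assuming its object exists,
  // since Object.keys(null) would throw a TypeError.
  pathsIn(state) {
    const bucket = this.states[state];
    return bucket == null ? [] : Object.keys(bucket);
  }
}

const sm = new LazyStateMachine();
sm.activate('init', 'name');
sm.activate('modify', 'name');
console.log(sm.states.ignore);     // null: never allocated
console.log(sm.pathsIn('modify')); // [ 'name' ]
console.log(sm.pathsIn('init'));   // []
```

With thousands of subdocuments, each untouched state costs a null slot instead of a separate empty object.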

Here's the benchmark script I've been using for #11541:

'use strict';

const mongoose = require('mongoose');
const { Schema } = mongoose;

mongoose.Schema.Types.DocumentArray.set('_id', false);
mongoose.Schema.Types.Subdocument.set('_id', false);

const geoPointSchema = new Schema({
    type: {
        type: String,
        enum: ["Point"],
        default: 'Point',
        required: true,
    },
    coordinates: {
        type: [Number],
        required: true,
    },
});

const journeySchema = new Schema({
    status: {
        type: String,
        enum: ["available", "completed", "cancelled"],
    },
    start_point_text: String,
    start_point_coordinates: {
        type: geoPointSchema,
        index: "2dsphere",
    },
    end_point_text: String,
    end_point_coordinates: {
        type: geoPointSchema,
        index: "2dsphere",
    },
    start_time: Date,
    end_time: Date,
});

const journeySummarySchema = new Schema({
    id: Schema.Types.ObjectId,
    private: Boolean,
    groups: [{ id: Schema.Types.ObjectId }],
    created: Date,
    start_time: Date,
    end_time: Date,
    start_point_coordinates: geoPointSchema,
    start_point_text: String,
    end_point_coordinates: geoPointSchema,
    end_point_text: String
});

const userSummarySchema = new Schema({
    role: { type: String, enum: ["driver", "requested", "passenger", "not-part-of-journey"] },
});

const journeyEventSchema = new Schema({
    time: Date,
    journey: journeySummarySchema,
    user: userSummarySchema,
    action: String, /*[
        "create",
        "cancel",
        "join"
    ],*/
    reconstructed: { type: Boolean, default: false },
});

const journeyStateUserSummarySchema = new Schema({
    ...userSummarySchema.obj,
    status: String //"confirmed" | "refunded",
});

const journeyStateJourneySummarySchema = new Schema(
    {
        ...journeySummarySchema.obj,
        request_status: {
            type: String,
            enum: ["available", "accepted"],
        },
        status: {
            type: String,
            enum: ["available", "completed", "cancelled"],
        },
    }
);

const journeyStateSchema = new Schema({
    _id: Schema.Types.ObjectId,
    journey: journeyStateJourneySummarySchema,
    complete_journey: journeySchema,
    user: journeyStateUserSummarySchema,
});

const searchEventSchema = new Schema({
    time: Date,
    start_point_coordinates: geoPointSchema,
    end_point_coordinates: geoPointSchema,
});

const userEventLogSchema = new Schema({
    schema_version: Number,
    /*user: {
        id: Schema.Types.ObjectId,
    },*/
    events: {
        journeys: [journeyEventSchema],
        searches: [searchEventSchema],
    },
    states: {
        journeys: [journeyStateSchema],
    },
});

const UserEventLog = mongoose.model('UserEventLog', userEventLogSchema);

run().catch(err => console.log(err));

async function run() {
  const doc = new UserEventLog({ events: { journeys: [], searches: [] }, states: { journeys: [] } });
  //const doc = { events: { journeys: [], searches: [] }, states: { journeys: [] } };

  setInterval(() => {
    console.log('[Timer] Memory usage:', process.memoryUsage().heapUsed / (1024 ** 2));
  }, 2_000);

  const start = Date.now();
  for (let i = 0; i < 10000; ++i) {
    doc.events.journeys.push({
      journey: {
        created: new Date(),
        start_point_coordinates: { type: 'Point', coordinates: [0, 0] }
      },
      user: {
        role: 'driver'
      }
    });
    doc.events.searches.push({
        start_point_coordinates: { coordinates: [0, 0] }
    });
    doc.states.journeys.push({
        journey: {
            created: new Date(),
            start_point_coordinates: { type: 'Point', coordinates: [0, 0] },
            request_status: 'available'
        },
        complete_journey: {
            status: 'available',
            start_point_coordinates: { type: 'Point', coordinates: [0, 0] }
        },
        user: {
            role: 'driver',
            status: 'confirmed'
        }
    });
  }

  const time = Date.now() - start;
  console.log('Events', doc.events.searches[0], !!doc.events.searches[0].start_point_coordinates.$__);
  console.log('Done', time);
}

Before this change:

Events {
  start_point_coordinates: { type: 'Point', coordinates: [ 0, 0 ] },
  _id: new ObjectId("62b751868892cca5ab60b758")
} true
Done 2159
[Timer] Memory usage: 138.9599838256836

After:

Events {
  start_point_coordinates: { type: 'Point', coordinates: [ 0, 0 ] },
  _id: new ObjectId("62b7518f54f423480550597d")
} true
Done 2227
[Timer] Memory usage: 122.81986999511719

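So this represents a decent improvement in memory usage (heap usage drops from about 139 MB to about 123 MB in this benchmark, roughly 12%), which I'd like to merge in 6.5. There's some risk, because Object.keys(null) throws an error, and we've historically used Object.keys(this.$__.activePaths.states.modify) to get all modified paths. After this change, a state's object may not exist until some path enters that state, so plugins that copied our code may need to be updated.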
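For plugin code that reads these internals, a defensive pattern avoids the Object.keys(null) failure. This is a hypothetical helper and not a Mongoose API. The name getPathsInState is an assumption, and activePaths stands in for the internal doc.$__.activePaths. Plain mock objects are used instead of real documents.

```javascript
'use strict';

// Hypothetical defensive helper for plugin code. `activePaths` mimics the
// internal doc.$__.activePaths, which is not a public Mongoose API.
function getPathsInState(activePaths, state) {
  const bucket = activePaths && activePaths.states ? activePaths.states[state] : null;
  // The state's object may be null/undefined when no path is in that state,
  // and Object.keys(null) throws, so fall back to an empty list.
  return bucket == null ? [] : Object.keys(bucket);
}

// Mock objects standing in for a document's state machine.
const noModifiedPaths = { states: { modify: null } };
const someModifiedPaths = { states: { modify: { name: true, 'address.city': true } } };

// The historical pattern fails once empty state objects are no longer created:
try {
  Object.keys(noModifiedPaths.states.modify);
} catch (err) {
  console.log(err instanceof TypeError); // true
}

console.log(getPathsInState(noModifiedPaths, 'modify'));   // []
console.log(getPathsInState(someModifiedPaths, 'modify')); // [ 'name', 'address.city' ]
```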

Examples

@vkarpov15 vkarpov15 requested a review from Uzlopak June 25, 2022 18:26
@vkarpov15 vkarpov15 self-assigned this Jun 25, 2022
Collaborator

@AbdelrahmanHafez AbdelrahmanHafez left a comment

LGTM, thanks. 👍

Collaborator

@Uzlopak Uzlopak left a comment

LGTM

Comment on lines 496 to 498
if (doc.$__.activePaths.states.modify == null) {
  doc.$__.activePaths.states.modify = {};
}
Collaborator

@hasezoey hasezoey Jun 27, 2022

If I understand this check correctly, it could be done outside of the loop (for (const path of state.modifiedPaths)), because it does not need to be repeated. Only a condition would need to be added to check that state.modifiedPaths is not empty, so that modify isn't set unnecessarily when there are no modified paths.

I did a small performance test at 3 different points, with 3 runs each:

// first I ran on branch 6.5, where
memavg = 147.31509908040366
timeavg = 1944.6666666666667

// then I ran on branch vkarpov15/gh-11541, where
memavg = 120.52366638183594
timeavg = 1887.3333333333333

// then I moved the if outside of the mentioned loop, where
memavg = 121.06150309244792
timeavg = 1834.3333333333333
Run Numbers
run1mem=[145.24718475341797,151.04356384277344,145.65454864501953]
run1time=[1938,1952,1944]
run2mem=[119.71839141845703,120.27562713623047,121.57698059082031]
run2time=[1929,1861,1872]
run3mem=[120.03515625,122.43016052246094,120.71919250488281]
run3time=[1812,1849,1842]

sum = (arr) => {let sum=0;for(const a of arr) {sum+=a} return sum;}
avg = (arr) => {let asum = sum(arr); return (asum / arr.length) || 0;}

avg(run1mem);
avg(run1time);
avg(run2mem);
avg(run2time);
avg(run3mem);
avg(run3time);
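These averages can be turned into relative changes against the 6.5 baseline. The following is a small arithmetic sketch that uses only the averages reported above. The variable names are illustrative.

```javascript
'use strict';

// Averages from the runs above: heap used in MB and elapsed time in ms.
const results = [
  { name: 'branch 6.5', mem: 147.31509908040366, time: 1944.6666666666667 },
  { name: 'vkarpov15/gh-11541', mem: 120.52366638183594, time: 1887.3333333333333 },
  { name: 'gh-11541 + if moved out of loop', mem: 121.06150309244792, time: 1834.3333333333333 },
];

// Percentage reduction relative to a baseline (positive = smaller/faster).
const pctReduction = (base, value) => ((base - value) / base) * 100;

const baseline = results[0];
for (const { name, mem, time } of results.slice(1)) {
  console.log(
    `${name}: memory -${pctReduction(baseline.mem, mem).toFixed(1)}%,`,
    `time -${pctReduction(baseline.time, time).toFixed(1)}%`
  );
}
```

This works out to about 18.2% less memory and 2.9% less time for the branch, and about 17.8% less memory and 5.7% less time with the check moved out of the loop. With only three runs per configuration, treat these as rough figures.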

Note: the if I used is:

if (state.modifiedPaths.length > 0 && doc.$__.activePaths.states.modify == null) {
  doc.$__.activePaths.states.modify = {};
}

I also modified the test script a bit: https://gist.github.com/hasezoey/6bc112e760b05e9a059550aa8e3dbbfa
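This sketch compares the reviewed pattern with the suggested one. It uses plain mock objects that only mimic the shapes named in the snippet under review (doc.$__.activePaths.states.modify and state.modifiedPaths). The function names and the loop body are assumptions, not the actual Mongoose source.

```javascript
'use strict';

// Hypothetical reconstruction with mock objects; the loop body is assumed.

// As reviewed: the null check runs once per modified path.
function markModifiedCheckInLoop(doc, state) {
  for (const path of state.modifiedPaths) {
    if (doc.$__.activePaths.states.modify == null) {
      doc.$__.activePaths.states.modify = {};
    }
    doc.$__.activePaths.states.modify[path] = true;
  }
}

// Suggested: check once before the loop. The length check keeps the lazy
// behavior; without it the object would be allocated even with no paths.
function markModifiedCheckHoisted(doc, state) {
  if (state.modifiedPaths.length > 0 && doc.$__.activePaths.states.modify == null) {
    doc.$__.activePaths.states.modify = {};
  }
  for (const path of state.modifiedPaths) {
    doc.$__.activePaths.states.modify[path] = true;
  }
}

const makeDoc = () => ({ $__: { activePaths: { states: { modify: null } } } });

const a = makeDoc();
markModifiedCheckHoisted(a, { modifiedPaths: [] });
console.log(a.$__.activePaths.states.modify); // null: nothing allocated

const b = makeDoc();
markModifiedCheckHoisted(b, { modifiedPaths: ['name', 'age'] });
console.log(b.$__.activePaths.states.modify); // { name: true, age: true }
```

Both versions leave the document in the same state. The hoisted one evaluates the null check once per call instead of once per path, which is consistent with the lower time in the third set of runs.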

Collaborator Author

Good suggestion 👍

@vkarpov15 vkarpov15 added this to the 6.5 milestone Jul 18, 2022
@vkarpov15 vkarpov15 merged commit 871a121 into 6.5 Jul 18, 2022
@vkarpov15 vkarpov15 deleted the vkarpov15/gh-11541 branch July 18, 2022 00:29

4 participants