Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add purchase date dynamic filter [MAILPOET-4986] #4760

Merged
merged 15 commits into from Apr 3, 2023

Conversation

johnolek
Copy link
Contributor

@johnolek johnolek commented Feb 27, 2023

Description

This PR implements a Purchase Date dynamic segment for MAILPOET-4986

Code review notes

Sorry the commits aren't as cohesive as I'd like.

I really like the simplicity of using a subquery for everything, but I know conventional wisdom is that joins are more likely to perform better. Does the subquery approach seem problematic for performance reasons?

Edit: the queries seem to be performing slowly locally, but I haven't done a comparison with joins. I'm assuming I'll need to update to use joins other than in the negated cases.

Edit2: I updated the code to use standard joins for all but the negated cases, where I think we will need a subquery (correct me if I'm wrong).

I think it was probably a mistake to make an abstract class for shared functionality. It would probably be better to have some kind of date field helper that we inject instead. Does that seem like a good idea or is there a better approach?

Edit: I've updated the code to use DI and more helpers instead. I may have made it more confusing by trying to add too many helpers.

I think the refactoring I did for the subscribed date filter was true refactoring and shouldn't have changed any functionality, but I'd appreciate if you could take a close look to verify.

QA notes

Please be sure to test the Subscribed Date filter as well because I refactored much of its code.

Linked PRs

A change is required in premium due to some refactoring: https://github.com/mailpoet/mailpoet-premium/pull/688

Linked tickets

MAILPOET-4986

After-merge notes

N/A

@johnolek
Copy link
Contributor Author

@JanJakes I'd love your take on this PR since I know you've been looking deeply into this code for the automations work. It's still a draft because I want to refactor the shared functionality to not rely on an abstract class, but I'd love to get your opinion on the best way to handle it. I'd love to make it possible to re-use this code for the date options for custom fields as well.

Copy link
Contributor

@JanJakes JanJakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnolek Sorry for the delay. I wasn't looking into the related code that deeply 😄 Just trying to understand how it works on a higher level. It was my first time looking into dynamic segments, so I'm no expert, but I'll try to have a look.

I really like the simplicity of using a subquery for everything, but I know conventional wisdom is that joins are more likely to perform better. Does the subquery approach seem problematic for performance reasons?

If the subquery is not correlated, then perhaps it could be equivalent to a JOIN, and we could hope that the engine would use the same query plan. But maybe I'm too optimistic, not sure if the MySQL query planner is powerful-enough.

That said, you can try this by comparing the query plans via EXPLAIN, if you want. The only issue with that is that a query plan can differ based on the number of rows in the tables.

In general, this looks good to me. I like the refactorings that result in composition via the new helpers.

For the bigger picture, that is, compatibility with automations, and overall performance, I don't really have any ideas yet. I think dynamic segments (that require fetching all the subscribers) can't be implemented using simple automation-like "filters" (that check subscribers and their attributes one-by-one in PHP rather than via SQL), which means we'll probably need to keep the existing codebase roughly as it is, only maybe optimizing some of the queries. I also reflected on that here: https://wp.me/pcNwfB-1TM#comment-1574

One thing I'm pondering is whether we could do all the query compositions on the ORM query builder level rather than DBAL. It would probably make better APIs for extensibility by avoiding using direct table and column names, etc., but at the same time, it's slightly more limiting.

@johnolek johnolek force-pushed the MAILPOET-4986/purchase_date_dynamic_filter branch from a8858e9 to 6073d51 Compare March 2, 2023 22:51
@johnolek johnolek marked this pull request as ready for review March 2, 2023 22:54
@johnolek
Copy link
Contributor Author

johnolek commented Mar 2, 2023

Thanks for your review @JanJakes, and no worries about the delay!

If the subquery is not correlated, then perhaps it could be equivalent to a JOIN, and we could hope that the engine would use the same query plan. But maybe I'm too optimistic, not sure if the MySQL query planner is powerful-enough.

I imagine this could also differ based on which version of MySQL/mariadb is used, and we probably want to play it as safe as possible.

It was pretty straightforward to refactor the code to use joins on the main querybuilder for all but the negated cases. If there's a way to use standard joins for the negated cases too I would be happy to refactor that as well.

(thanks by the way for the link to the info on correlated queries)

One thing I'm pondering is whether we could do all the query compositions on the ORM query builder level rather than DBAL. It would probably make better APIs for extensibility by avoiding using direct table and column names, etc., but at the same time, it's slightly more limiting.

This sounds great to me, but we'll still have to rely on the DBAL to some extent for things like WC data, or is that not correct? I'm guessing we may not want to give up the finer control we get from the DBAL, especially as we're looking to minimize performance concerns for users with very large amounts of data.

I believe this should be good for a final review now, and please note that there's a related premium PR due to some of the refactoring (it's very simple).

Thanks again!

Copy link
Contributor

@JanJakes JanJakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👏


private function applyConditionsToQueryBuilder(string $operator, string $date, QueryBuilder $queryBuilder): QueryBuilder {
$orderStatsAlias = $this->wooFilterHelper->applyOrderStatusFilter($queryBuilder);
$quotedDate = $queryBuilder->expr()->literal($date);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tiny detail, but is there any reason for not using a placeholder (e.g., ? or a named one) and ->setParameter(...)? I guess Doctrine DBAL supports that as well. A similar question probably applies also below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a tiny detail at all! The reason I did it this way was because of the subquery and the fact that I'm calling $subQuery->getSQL(). With a prepared statement like you get with setParameter, the value doesn't come through when you call getSQL(), and you end up having to setParameter on the main query instead. I didn't like having to call setParameter like that, and I also didn't want to actually execute the subquery at this point, preventing any optimization that might occur at the DB level.

Using a placeholder seems like a much better way to go than my current solution, and I'll do that unless you think it would be better overall to use setParameter.

Honestly I haven't worked with doctrine very much, so any tips you have on best practices are very much appreciated!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JanJakes I took a stab at an approach that lets us use setParameter in 0c6614f. It still involves a string replacement, but only if we need to get the SQL string for the negated cases. It feels like there should be a more elegant solution and I plan on looking into it, but if you happen to have any advice I'm all ears!

$subscribersTable = $this->filterHelper->getSubscribersTable();

if (in_array($operator, [DateFilterHelper::NOT_ON, DateFilterHelper::NOT_IN_THE_LAST])) {
$subQuery = $this->filterHelper->getNewSubscribersQueryBuilder();
$this->applyConditionsToQueryBuilder($operator, $date, $subQuery);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already late in the day here, so I might not be fully getting this, but would changing the conditions in the switch help?

What if we change this:

case DateFilterHelper::IN_THE_LAST:
case DateFilterHelper::NOT_IN_THE_LAST:
  $queryBuilder->andWhere("DATE($orderStatsAlias.date_created) >= $quotedDate");
  break;
case DateFilterHelper::ON:
case DateFilterHelper::NOT_ON:
  $queryBuilder->andWhere("DATE($orderStatsAlias.date_created) = $quotedDate");

To sth like this:

case DateFilterHelper::IN_THE_LAST:
  $queryBuilder->andWhere("DATE($orderStatsAlias.date_created) < $quotedDate");
  break;
case DateFilterHelper::NOT_IN_THE_LAST:
  $queryBuilder->andWhere("DATE($orderStatsAlias.date_created) >= $quotedDate");
  break;
case DateFilterHelper::ON:
  $queryBuilder->andWhere("DATE($orderStatsAlias.date_created) != $quotedDate");
  break;
case DateFilterHelper::NOT_ON:
  $queryBuilder->andWhere("DATE($orderStatsAlias.date_created) = $quotedDate");
  break;

They're inner joins, so... wouldn't it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the NOT filters should return all subscribers that aren't returned by their inverse, i.e. if you had one segment that was not a specific date and another that was a specific date, you would have exactly 100% of your subscribers represented by both segments.

With the joins, we would only be returning users who have at least one purchase.

@JanJakes JanJakes assigned johnolek and unassigned JanJakes Mar 7, 2023
@johnolek johnolek changed the title Mailpoet 4986/purchase date dynamic filter Add purchase date dynamic filter [MAILPOET-4986] Mar 9, 2023
@johnolek johnolek assigned JanJakes and unassigned johnolek Mar 9, 2023
JanJakes
JanJakes previously approved these changes Mar 14, 2023
@JanJakes JanJakes assigned veljkho and Aschepikov and unassigned JanJakes Mar 14, 2023
@johnolek johnolek force-pushed the MAILPOET-4986/purchase_date_dynamic_filter branch from 0c6614f to 6f153b6 Compare March 17, 2023 16:10
@Aschepikov
Copy link
Collaborator

Hi @johnolek I found 1 issue, I can create a segment with an empty period
image
We shouldn't allow segment creation unless the period is selected.
image

@johnolek
Copy link
Contributor Author

Good catch, @Aschepikov, thank you!

@JanJakes I fixed the validation issue in 5872c91. I also had to do some rebasing due to merge conflicts I think came from #4786. Would you mind taking one more look at this whenever you have a chance?

JanJakes
JanJakes previously approved these changes Mar 24, 2023
@@ -156,3 +143,32 @@ export function SubscribedDateFields({ filterIndex }: Props): JSX.Element {
</Grid.CenteredRow>
);
}

export function dateFieldValidator(formItems: DateFormItem): boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a super-tiny observation is that I'd include a verb in the method name, e.g. validateDateField or isDateFieldValid. The current name sounds a bit like a service. Anyway, no need to fix it now.

Btw. do we need to validate that the date makes sense for any reason? Like that month is not 13, etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out, @JanJakes! I would like to move this PR forward so I'm going to hold off on more changes for now, but I will make a note to refactor this when I work on one of the other segmentation tickets.

As for the 13, I think it's very unlikely we would get such a value because we're relying on the date picker, and I'm also just re-using existing logic that we haven't been having issues with so far to my knowledge, so I'm inclined to leave it as is.

@JanJakes
Copy link
Contributor

@johnolek LGTM, approved. Feel free to assign it to QA, I left only one small comment.

@JanJakes JanJakes removed their assignment Mar 24, 2023
@johnolek johnolek assigned veljkho and unassigned johnolek Mar 27, 2023
@johnolek
Copy link
Contributor Author

There were some conflicts due to #4799, so I rebased and pushed again.

@johnolek johnolek added the priority in QA This label is for QA engineers to know which PR to test first. label Mar 29, 2023
Copy link
Collaborator

@Aschepikov Aschepikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GJ!

@Aschepikov Aschepikov merged commit 1d87c43 into trunk Apr 3, 2023
2 checks passed
@Aschepikov Aschepikov deleted the MAILPOET-4986/purchase_date_dynamic_filter branch April 3, 2023 08:20
@Aschepikov Aschepikov assigned johnolek and unassigned veljkho and Aschepikov Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority in QA This label is for QA engineers to know which PR to test first.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants