New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add purchase date dynamic filter [MAILPOET-4986] #4760
Conversation
@JanJakes I'd love your take on this PR since I know you've been looking deeply into this code for the automations work. It's still a draft because I want to refactor the shared functionality to not rely on an abstract class, but I'd love to get your opinion on the best way to handle it. I'd love to make it possible to re-use this code for the date options for custom fields as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnolek Sorry for the delay. I wasn't looking into the related code that deeply 😄 Just trying to understand how it works on a higher level. It was my first time looking into dynamic segments, so I'm no expert, but I'll try to have a look.
I really like the simplicity of using a subquery for everything, but I know conventional wisdom is that joins are more likely to perform better. Does the subquery approach seem problematic for performance reasons?
If the subquery is not correlated, then perhaps it could be equivalent to a JOIN
, and we could hope that the engine would use the same query plan. But maybe I'm too optimistic, not sure if the MySQL query planner is powerful-enough.
That said, you can try this by comparing the query plans via EXPLAIN
, if you want. The only issue with that is that a query plan can differ based on the number of rows in the tables.
In general, this looks good to me. I like the refactorings that result in composition via the new helpers.
For the bigger picture, that is, compatibility with automations, and overall performance, I don't really have any ideas yet. I think dynamic segments (that require fetching all the subscribers) can't be implemented using simple automation-like "filters" (that check subscribers and their attributes one-by-one in PHP rather than via SQL), which means we'll probably need to keep the existing codebase roughly as it is, only maybe optimizing some of the queries. I also reflected on that here: https://wp.me/pcNwfB-1TM#comment-1574
One thing I'm pondering is whether we could do all the query compositions on the ORM query builder level rather than DBAL. It would probably make better APIs for extensibility by avoiding using direct table and column names, etc., but at the same time, it's slightly more limiting.
a8858e9
to
6073d51
Compare
Thanks for your review @JanJakes, and no worries about the delay!
I imagine this could also differ based on which version of MySQL/mariadb is used, and we probably want to play it as safe as possible. It was pretty straightforward to refactor the code to use joins on the main querybuilder for all but the negated cases. If there's a way to use standard joins for the negated cases too I would be happy to refactor that as well. (thanks by the way for the link to the info on correlated queries)
This sounds great to me, but we'll still have to rely on the DBAL to some extent for things like WC data, or is that not correct? I'm guessing we may not want to give up the finer control we get from the DBAL, especially as we're looking to minimize performance concerns for users with very large amounts of data. I believe this should be good for a final review now, and please note that there's a related premium PR due to some of the refactoring (it's very simple). Thanks again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👏
|
||
private function applyConditionsToQueryBuilder(string $operator, string $date, QueryBuilder $queryBuilder): QueryBuilder { | ||
$orderStatsAlias = $this->wooFilterHelper->applyOrderStatusFilter($queryBuilder); | ||
$quotedDate = $queryBuilder->expr()->literal($date); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tiny detail, but is there any reason for not using a placeholder (e.g., ?
or a named one) and ->setParameter(...)
? I guess Doctrine DBAL supports that as well. A similar question probably applies also below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a tiny detail at all! The reason I did it this way was because of the subquery and the fact that I'm calling $subQuery->getSQL()
. With a prepared statement like you get with setParameter
, the value doesn't come through when you call getSQL()
, and you end up having to setParameter
on the main query instead. I didn't like having to call setParameter
like that, and I also didn't want to actually execute the subquery at this point, preventing any optimization that might occur at the DB level.
Using a placeholder seems like a much better way to go than my current solution, and I'll do that unless you think it would be better overall to use setParameter
.
Honestly I haven't worked with doctrine very much, so any tips you have on best practices are very much appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JanJakes I took a stab at an approach that lets us use setParameter
in 0c6614f. It still involves a string replacement, but only if we need to get the SQL string for the negated cases. It feels like there should be a more elegant solution and I plan on looking into it, but if you happen to have any advice I'm all ears!
$subscribersTable = $this->filterHelper->getSubscribersTable(); | ||
|
||
if (in_array($operator, [DateFilterHelper::NOT_ON, DateFilterHelper::NOT_IN_THE_LAST])) { | ||
$subQuery = $this->filterHelper->getNewSubscribersQueryBuilder(); | ||
$this->applyConditionsToQueryBuilder($operator, $date, $subQuery); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already late in the day here, so I might not be fully getting this, but would changing the conditions in the switch help?
What if we change this:
case DateFilterHelper::IN_THE_LAST:
case DateFilterHelper::NOT_IN_THE_LAST:
$queryBuilder->andWhere("DATE($orderStatsAlias.date_created) >= $quotedDate");
break;
case DateFilterHelper::ON:
case DateFilterHelper::NOT_ON:
$queryBuilder->andWhere("DATE($orderStatsAlias.date_created) = $quotedDate");
To sth like this:
case DateFilterHelper::IN_THE_LAST:
$queryBuilder->andWhere("DATE($orderStatsAlias.date_created) < $quotedDate");
break;
case DateFilterHelper::NOT_IN_THE_LAST:
$queryBuilder->andWhere("DATE($orderStatsAlias.date_created) >= $quotedDate");
break;
case DateFilterHelper::ON:
$queryBuilder->andWhere("DATE($orderStatsAlias.date_created) != $quotedDate");
break;
case DateFilterHelper::NOT_ON:
$queryBuilder->andWhere("DATE($orderStatsAlias.date_created) = $quotedDate");
break;
They're inner joins, so... wouldn't it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the NOT
filters should return all subscribers that aren't returned by their inverse, i.e. if you had one segment that was not a specific date and another that was a specific date, you would have exactly 100% of your subscribers represented by both segments.
With the joins, we would only be returning users who have at least one purchase.
0c6614f
to
6f153b6
Compare
Hi @johnolek I found 1 issue, I can create a segment with an empty period |
Good catch, @Aschepikov, thank you! @JanJakes I fixed the validation issue in 5872c91. I also had to do some rebasing due to merge conflicts I think came from #4786. Would you mind taking one more look at this whenever you have a chance? |
@@ -156,3 +143,32 @@ export function SubscribedDateFields({ filterIndex }: Props): JSX.Element { | |||
</Grid.CenteredRow> | |||
); | |||
} | |||
|
|||
export function dateFieldValidator(formItems: DateFormItem): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a super-tiny observation is that I'd include a verb in the method name, e.g. validateDateField
or isDateFieldValid
. The current name sounds a bit like a service. Anyway, no need to fix it now.
Btw. do we need to validate that the date makes sense for any reason? Like that month is not 13
, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out, @JanJakes! I would like to move this PR forward so I'm going to hold off on more changes for now, but I will make a note to refactor this when I work on one of the other segmentation tickets.
As for the 13
, I think it's very unlikely we would get such a value because we're relying on the date picker, and I'm also just re-using existing logic that we haven't been having issues with so far to my knowledge, so I'm inclined to leave it as is.
@johnolek LGTM, approved. Feel free to assign it to QA, I left only one small comment. |
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
MAILPOET-4986
5872c91
to
f612d47
Compare
There were some conflicts due to #4799, so I rebased and pushed again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GJ!
Description
This PR implements a Purchase Date dynamic segment for MAILPOET-4986
Code review notes
Sorry the commits aren't as cohesive as I'd like.
I really like the simplicity of using a subquery for everything, but I know conventional wisdom is that joins are more likely to perform better. Does the subquery approach seem problematic for performance reasons?Edit: the queries seem to be performing slowly locally, but I haven't done a comparison with joins. I'm assuming I'll need to update to use joins other than in the negated cases.Edit2: I updated the code to use standard joins for all but the negated cases, where I think we will need a subquery (correct me if I'm wrong).
I think it was probably a mistake to make an abstract class for shared functionality. It would probably be better to have some kind of date field helper that we inject instead. Does that seem like a good idea or is there a better approach?Edit: I've updated the code to use DI and more helpers instead. I may have made it more confusing by trying to add too many helpers.
I think the refactoring I did for the subscribed date filter was true refactoring and shouldn't have changed any functionality, but I'd appreciate if you could take a close look to verify.
QA notes
Please be sure to test the Subscribed Date filter as well because I refactored much of its code.
Linked PRs
A change is required in premium due to some refactoring: https://github.com/mailpoet/mailpoet-premium/pull/688
Linked tickets
MAILPOET-4986
After-merge notes
N/A