Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added role whitelist to internals of wp-cli guest author generation command. #575

Open
wants to merge 321 commits into
base: develop
Choose a base branch
from

Conversation

smistephen
Copy link
Contributor

Previously, as noted in #553, wp co-authors-plus create-guest-authors created a guest author for every user in the database.

This was a massive performance suck, as well as being largely unnecessary. Adding a subscriber as a guest author is not a common use case.

Now, only users having the role of administrator, editor, and author have guest authors generated for them.

The query to filter users by role is a touch heavy (it performs LIKEs on meta values), but testing it on a site with 1000 users showed no major issues.

It was also compared to simply querying all the users, and doing the filtering in PHP. Head to head, there was no significant difference in performance between the two approaches, and using the WordPress API is always a better bet than writing your own one-off custom filter algorithm.

Rebecca Hum and others added 30 commits May 31, 2017 15:19
…r-posts

Fixed WP CLI create-terms-for-posts if no co-authors found
…uthor

Replace hardcoded 'author' with $this->$coauthor_taxonomy
Fixes warning: `Warning: sprintf(): Too few arguments in /path/wp-content/plugins/co-authors-plus/php/class-coauthors-guest-authors.php on line 487`

Warning surfaces when deleting a guest author that is mapped to a WP user
Move parenthesis to fix esc_html and sprintf
…lete_guest_author

# Conflicts:
#	php/class-coauthors-guest-authors.php
The option defaults to creating guest authors for administrators, editors, authors, and contributors. Roles are provided in a comma separated list such as `--roles=author,freelancer`.

If the user provides a role that does not exist in their instance, execution is halted and an error is displayed such as `Role freelancer does not exist.`. This error message is made available for translation.
Some sites can have millions of subscribers. Querying for that many users presents a serious potential performance challenge.

Rather than take the risk of having an end user specify them in the --roles flag, they have now been explicitly banned.
Filtering users by role on the database has the potential to make for a very expensive query, as it performs a LIKE on an unindexed field.

But if the users were all brought over at once, unfiltered, the potential exists for an array with millions of elements, straining the application server's memory limit.

So the compromise is: batching.

Users are fetched from the database, unfiltered, in batches of user-defined size. They are then filtered, and a new batch is brought in.

This algorithm looks to strike a balance between load on the database server, and memory limits on the application server.
Copy link
Contributor

@sboisvert sboisvert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt a few comments inline.
There doesn't seem to be a dry run option and there doesn't seem to be a way to capture the log of what was changed. Could we add those?

// Subscribers cannot be made guest authors en masse
// due to potential performance issues (sites can have millions of subscribers).
if ( 'subscriber' === $role ) {
WP_CLI::error( __( 'For performance reasons, subscribers cannot be made guest authors via wp-cli. Please use the admin interface.', 'co-authors-plus' ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should have a "force" option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add that

// There are no arguments at this time
);
'roles' => array( 'administrator', 'editor', 'author', 'contributor' ),
'batch-size' => 1000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would folks need to specify this? What would be the difference if we were to limit it to batches of 100 at a time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to stop the script from fetching ALL of the users at once and crashing servers with low memory limits, while also allowing servers with greater resources to open up the throttle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to limit it to batches of 100 at a time, it would take longer but use less memory (a tradeoff I've had to make many times in the past running on low-end equipment, haha)

php/class-wp-cli.php Outdated Show resolved Hide resolved
php/class-wp-cli.php Outdated Show resolved Hide resolved
Previously, the translated string was broken up into two fragments, with the $role variable concatenated in between.

To enable easier translation, the string is now generated in one piece via sprintf (interpolated variables inside double quotes are not allowed).
…ber guest authors.

In the event that an end user accepts the performance hit of generating guest authors for all their subscribers, this flag allows them to do so.

Named --force-subscribers instead of --force to allow for future development.
…rms data modifications.

By default, the command outputs the IDs of the users it would have attempted to generate guest authors for, had this not been a dry run.

Passing in the --not-a-dry-run flag tells the code to actually make the attempts.

Props sboisvert for the idea to make this command dry by default.
…pt results.

This flag results in one of two potential log messages:

"User ID 1 is now linked to Guest Author 2"
"Error while attempting to generate guest author for user ID 1: errormessagehere"

The command logs to STDOUT by default, but can be easily redirected to the location of the end user's choice.
php/class-wp-cli.php Outdated Show resolved Hide resolved
php/class-wp-cli.php Outdated Show resolved Hide resolved
php/class-wp-cli.php Outdated Show resolved Hide resolved
if ( is_wp_error( $result ) ) {
$skipped++;
if ( true === $log_output ) {
foreach ( $result->get_error_messages() as $error_message ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it even be possible that the error from create_guest_author_from_user_id() has multiple error messages? Any scenario in which this may happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just me programming defensively.

WP_Error->get_error_messages() (https://developer.wordpress.org/reference/classes/wp_error/) returns an array of error messages even if there's just one.

I could use WP_Error->get_error_message(), which returns an int or string, but that makes me incredibly nervous, as it silently suppresses all other error messages even if they exist.

So yes, currently create_guest_author_from_user_id() only ever returns one error message. But I can't predict that it always will, so I coded it this way :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible. I was skeptical just because I have never seen that in any plugin/core piece I ever looked at, but yes :)

@TheCrowned
Copy link
Contributor

Hey! We may want to improve the inline documentation of the function to comply with WP_CLI PHPDoc standards. Resources that have been very helpful for me when I dealt with similar issue have been this Commands Cookbook - Annotating with PHPDoc and How to write custom WP_CLI commands for WordPress automation (scroll to PHPDoc comments and controls section).

For example, we'd want to split the description in short and long; all optional args should be wrapped in < >; add a synopsis; etc.

Also, boolean options shouldn't be treated as all others. For example, specifying [--<dry-run>] will automatically force it to be a boolean arg, and will allow to disable with --no-dry-run. This way, you should also manage to get rid of all the is_bool() checks, and any boolean option can be turned off with a no- in front of it :) This way everything is simpler and we avoid having a not-a-dry-run => false` by default, which is a bit confusing.

On a different point, I don't think that the dry run mode should be the default one - let developers toggle it if they want :) (that is also as other CLI commands work)

Finally, if I understand @sboisvert intentions correctly, he meant we should log the routine progress, but not necessarily give an option to do so. The logging should happen by default. Then, if someone doesn't want it, they can redirect it to /dev/null. Adding an option for that would really just be adding another way to write > /dev/null, and I don't think it's necesary :)

(I also left inline comments :))

@smistephen
Copy link
Contributor Author

I've made the formatting changes.

For the dry by default, that was in keeping with what @sboisvert said during training yesterday - he stated that cli scripts should be dry by default, and I can get behind that. I've modified it to --dry-run, which is true by default, and can be overridden by --no-dry-run.

Also, for the logging, could you clarify, @sboisvert? --log-output was my interpretation of your request for "a way to capture the log of what was changed", but I can change it if need be.

Thanks for all your input @TheCrowned! :D

Despite there being an index on the user_login column in wp_users, MySQL does not use it and instead performs a filesort.

So now the query orders by ID.

Props sboisvert for the subtle poke that something wasn't quite right with the query.
Copy link
Contributor

@TheCrowned TheCrowned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final thing: I'm not sure we want to include the command expected output in the example lines? I think it's enough to have a (brief) description of what it does, and then the command, without the related output :)

$force_subscribers = true;
} else {
$force_subscribers = false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point we can just have a $force_subscribers = $this->args['force_subscribers']; , and the same applies to the other bool options :)

if ( is_wp_error( $result ) ) {
$skipped++;
if ( true === $log_output ) {
foreach ( $result->get_error_messages() as $error_message ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible. I was skeptical just because I have never seen that in any plugin/core piece I ever looked at, but yes :)

@smistephen
Copy link
Contributor Author

The output is in keeping with documentation specification.

Due to changes in how the flags are read, we can now condense our checks into a series of oneliners.
@GaryJones
Copy link
Contributor

The key changes are going to need to be manually picked out of this, but there's value to be had. Adding to a future milestone.

@GaryJones GaryJones changed the base branch from main to develop August 25, 2023 15:23
@alecgeatches alecgeatches modified the milestones: 3.6, 3.7 Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet