Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2499 Add Opinion Ordering #2821

Open
wants to merge 32 commits into
base: main
Choose a base branch
from
Open

2499 Add Opinion Ordering #2821

wants to merge 32 commits into from

Conversation

flooie
Copy link
Contributor

@flooie flooie commented Jun 15, 2023

@mlissner

This PR - is a small change.

Add support for Django Ordered Model
Adds ordering for opinions inside opinion cluster.
Updates all fixtures to include a default value of 1 for order.

Once we reprocess Columbia - this should actually really shrink the code for the harvard merger and for the HTML template code displaying opinions.

@flooie flooie requested a review from mlissner June 15, 2023 19:44
Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a quick look, but I won't be able to deploy it until I get to Montreal, so a deep look will have to wait until then.

A few quick thoughts though:

  1. This library looks good, but one thing it says it does is add default ordering to the Meta class. That's logical for this library, but I don't really like default ordering because it slows down queries that don't need ordering. For example, this query is rough on the poor DB on a good day:

     Opinion.objects.all()
    

    But after this change, it becomes:

     Opinion.objects.all().order_by('order')
    

    Because the order_by is always applied by default. Adding ordering when you don't need it, naturally, hurts performance in the DB.

    I think it's much better to have developers explicitly request ordering when they need it, rather than having to remember to disable ordering when they don't.

  2. Did you check that the order field is properly added to any DRF responses?

@flooie
Copy link
Contributor Author

flooie commented Jun 16, 2023

@mlissner

Working backwards - Order does show up in DRF - filtering to fields order and id to make it easier to see

Screenshot 2023-06-16 at 11 22 48 AM

By default it sorts by order - so if we dont
want that feature we simply need to override
the django order with a custom ordered manager
in on the opinion class.

(I think)
@flooie
Copy link
Contributor Author

flooie commented Jun 16, 2023

@mlissner back to you if you can

I tried a couple things but override the custom ordering with a custom ordering manager was (appears to be) the right move.

@mlissner
Copy link
Member

@flooie I'm looking over my outstanding tasks and I think this one is back to you. It needs a rebase, conflicts resolved, and tests to pass, and I forget what it needed wrt to tweaks to the actual code that we were discussing when I was in an airport. :)

Mind giving it a careful look and letting me know when it's ready for another review?

# Conflicts:
#	poetry.lock
#	pyproject.toml
@quevon24
Copy link
Member

@flooie Well, after inspecting django-ordered-model I came to the conclusion that the only way to disable the default ordering is to overwrite some of the code in the library (and that is not an option because with major changes we will need to update the overridden code, I opened an issue in their repo to ask the reason to force the order, maybe we can contribute with that). You could leave the default behavior since it is not common to fetch all the objects in the frontend, the problem would be in the API, since by default it would return the opinions ordered by the ordering field.

I also found a library named django-orderable (https://github.com/incuna/django-orderable) which adds manual sort order to django objects via an abstract base class and admin classes, this library has no methods to move objects, its main goal is to be integrated into the admin and use a drag and drop interface to reorder the objects.

The best alternative is to manually add the ordering field in the opinion model (integerfield or floatfield), to do this, what we need to answer is:

  1. We plan to move the objects in a simpler way using methods? Like: move_up, move_down, move_top, move_bottom, etc. Doing this will require to update the position in the previous and/or next objects if we use ints, if we use floats we can generate smaller numbers between where you want to place the object.
  2. Or will we manually indicate the position in certain scripts like management commands? This will require to update the positions of previous/next objects directly in the command.
  3. We plan to have the migration auto-populate the field? (it could be copy the pk as the position since opinions are generally created in order)
  4. if you use this option, would we implement all the methods within the Opinion model? Or would we create an abstract model that could be reused elsewhere?
  5. This ordering only applies to the frontend when we show the cluster and its opinions?

So far these are my comments, any comment is welcome

@mlissner
Copy link
Member

Yeah, my reaction is that we're over thinking this and the various libraries don't really fit our needs. I think what we need is:

  1. A way of storing the ordering information:
    • A table to join to or as a field on the model?
    • Floats or ints?
  2. Some methods/functions that allow you to set the ordering for an item.
  3. Some tweaks to the front end to show items in the correct order.

I think that's mostly it. I don't think we need functions for move_up or move_top, fancy admin integration, or anything like that. I imagine when you guys are doing the import, you'll just need a setter, right?

Notionally:

opinion.order_position = 1
opinion.save()

Right? The libraries feel either too big or full of features we don't really want, so let's take from them what we do want, and leave the rest behind. I, for one, will be glad to have one fewer library using memory and having to be upgraded and maintained.

If this all easily works via a mixin, great. If not, we only need it for one model so far, so it's probably fine to not abstract it out, at least not yet.

What do you think?

@quevon24
Copy link
Member

quevon24 commented Jul 26, 2023

Yeah, my reaction is that we're over thinking this and the various libraries don't really fit our needs. I think what we need is:

  1. A way of storing the ordering information:

    • A table to join to or as a field on the model?
    • Floats or ints?
  2. Some methods/functions that allow you to set the ordering for an item.

  3. Some tweaks to the front end to show items in the correct order.

I think that's mostly it. I don't think we need functions for move_up or move_top, fancy admin integration, or anything like that. I imagine when you guys are doing the import, you'll just need a setter, right?

Notionally:

opinion.order_position = 1
opinion.save()

Right? The libraries feel either too big or full of features we don't really want, so let's take from them what we do want, and leave the rest behind. I, for one, will be glad to have one fewer library using memory and having to be upgraded and maintained.

If this all easily works via a mixin, great. If not, we only need it for one model so far, so it's probably fine to not abstract it out, at least not yet.

What do you think?

I think the same, I think it would be enough to add a field to the Opinion model and maybe some methods to handle the ordering of the object. Maybe modify the save method or use a signal in case it is necessary to move the opinions from the same cluster.

I also don't know if it would be a good idea, but I was thinking of using a custom migration to fill in the new field based on the pk of the cluster opinions, for example if we have the opinions [<Opinion 258965>, <Opinion 258966>, < Opinion 258967>] for a specific cluster we can fill the field in order: 1, 2, 3 since generally the opinions from sources such as Columbia or Harvard are created in the order of appearance. This can be done in the migration or with a management command to fill the field.

I think that using integers would be enough, to update the order of several opinions in the cluster we could use update() and F functions, for example:

To move next objects
qs = Opinion.objects.filter(cluster_id=cluster_id, order_position__gte=updated_object.order_position ).update(order_position=models.F('order_position ') + 1)

To move previous objects
qs = Opinion.objects.filter(cluster_id=cluster_id, order_position__lte=updated_object.order_position ).update(order_position=models.F('order_position ') - 1)

@flooie
Copy link
Contributor Author

flooie commented Jul 26, 2023

I'll die on the other hill. I dont see the major harm from just using the default ordering code. I prefer integers so I can say give me the second opinion - or the first opinion and know I can just also order it 1 2 3 and not worry about finding the right float value to make it work.

If you say itll be a performance issue down the line or the API - I guess I would cave, but I dont see it.

@quevon24
Copy link
Member

quevon24 commented Jul 26, 2023

I'll die on the other hill. I dont see the major harm from just using the default ordering code. I prefer integers so I can say give me the second opinion - or the first opinion and know I can just also order it 1 2 3 and not worry about finding the right float value to make it work.

If you say itll be a performance issue down the line or the API - I guess I would cave, but I dont see it.

Usually there were performance hits when you have a very large data set in postgresql and you needed to apply ordering and pagination (I mention it because of the API and the deeper you go in the results), but I just remembered that we already updated to postgresql 15 and in this version there were improvements in this aspect (https://www.percona.com/blog/postgresql-15-new-features-to-be-excited-about/#:~:text=Improved%20sorting%20algorithms,performance%20gains%20%E2%80%A6%20they%20just%20work!) and the pagination in the API is limited to a certain number of pages.

And my fault, I didn't realize that the API already applies an order by -id in Opinion endpoint (queryset in OpinionViewSet is forced to order by -id ). So it's probably not a big problem to use the library.

@mlissner @flooie My biggest concern was the API (working a lot with it makes me think about its performance), after seeing the last thing about the API, I no longer see any problem in using the library, but which do you think is the best option?

@quevon24
Copy link
Member

quevon24 commented Jul 26, 2023

And what do you think is the best approach to fill the field with the correct value? With a migration or with a management command?

@flooie
Copy link
Contributor Author

flooie commented Jul 26, 2023

I think we need to update all of harvard and Columbia - so a management command - I think - would be needed to parse the respective XMLs and assign an order.

@mlissner
Copy link
Member

Kevin, if we use the library you suggest (sorry, which one is it?), it'll add default ordering to the Opinion table, right? Would that mean that the API would be ordered by the new ordering table, or does it mean that we just have to make sure the API has ordering (which it does)?

For the migration, I'd suggest just using what we have now as a first pass, if necessary, then adding the correct ordering data when we re-add this data. Why do it in two passes if we know the ordering and are about to do a huge ingestion that will correct this?

@quevon24
Copy link
Member

quevon24 commented Jul 29, 2023

Kevin, if we use the library you suggest (sorry, which one is it?), it'll add default ordering to the Opinion table, right? Would that mean that the API would be ordered by the new ordering table, or does it mean that we just have to make sure the API has ordering (which it does)?

For the migration, I'd suggest just using what we have now as a first pass, if necessary, then adding the correct ordering data when we re-add this data. Why do it in two passes if we know the ordering and are about to do a huge ingestion that will correct this?

@mlissner After inspecting it a lot, the library that @flooie suggested Django Ordered Model works good, it requires to add the ordering option to model meta class, for example:

class Meta:
    ordering = ("order",)

This ordering would only affect the queryset when we retrieve objects using for example:

Opinion.objects.all()
Opinion.objects.filter(type="010combined")

but it won't apply if you execute:

Opinion.objects.filter(type="010combined").order_by("id")
Opinion.objects.order_by("date_created")

Also, this won't apply to every endpoint in the API because we are using ModelViewSet for Opinions endpoint which extends GenericAPIView and according to the documentation it requires to explicitly define the queryset, so every filter or ordering will be applied to the defined queryset, in this case the queryset is defined as:

queryset = (
        Opinion.objects.select_related("cluster", "author")
        .prefetch_related("opinions_cited", "joined_by")
        .order_by("-id")
    )

So if you apply a filter to the Opinions endpoint it will still return the results ordered by id unless you pass an order_by to the endpoint.

On the other hand, if you look at the API endpoint for the clusters, in the serializer we use Opinion.objects.all() for the queryset, so when you go to /api/rest/v3/clusters/, the sub_opinions will be displayed in order.

And regarding how to fill the order field, we continue to discuss it to see how to integrate it in the best possible way, for the moment we need to have the ordering field ready.

What do you think is the best option? To continue working on it and have the PR ready

@mlissner
Copy link
Member

Sorry, Kevin, it took me all day to get to your comment. Yes, this all sounds great! I don't think we have any more blockers as to strategy. Let's go with our original approach and let me know when this is ready for code review. (There's an expression about having to go around the world to discover how nice home is....I feel like we did this here!)

Test added for django-ordered-model library
Optimize imports in search/tests.py
@quevon24
Copy link
Member

quevon24 commented Aug 1, 2023

Thanks @mlissner, the code is ready for review

@quevon24 quevon24 requested a review from mlissner August 1, 2023 02:00
Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thank you. Let's see how it goes.

@mlissner
Copy link
Member

mlissner commented Aug 1, 2023

The current plan is to hold off on this until Harvard merger is finished, and then supplemented by the missing Columbia data. Once those are in, we'll merge this, then do a pass to backfill it into existing cases.

@mlissner
Copy link
Member

Kevin, you requested a review here, but according to my last comment, we were planning to wait on this until Colombia and Harvard data are merged, right?

@quevon24
Copy link
Member

Kevin, you requested a review here, but according to my last comment, we were planning to wait on this until Colombia and Harvard data are merged, right?

Sorry, it was my mistake, I was reviewing all my pending PRs, merging the PRs with main and correcting conflicts and I think I accidentally hit it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants