Add similarity information to entries #108

Draft: wants to merge 3 commits into base: master

LorenzoMinto (Member) commented Sep 16, 2022

To each entry in the feed, we add an index field and a similar_entries field. The similar_entries field lists the indexes of the other entries whose title similarity to this entry exceeds a threshold (set in config.py).

{
    "category": "Top News",
    "publish_time": "2022-09-21 15:42:34",
    "url": "https://edition.cnn.com/2022/09/21/politics/joe-biden-united-nations-general-assembly/index.html",
    "img": "https://cdn.cnn.com/cnnnext/dam/assets/220915115300-04-biden-rose-garden-0915-super-169.jpg",
    "title": "Russia's war to extinguish Ukraine 'should make your blood run cold,' Biden says",
    "description": "President Joe Biden returns to the green-marbled United Nations stage Wednesday hours after Russia's president announced in a provocative speech an escalation in his war effort in Ukraine, setting up a rhetorical showdown between the two leaders on the international stage.",
    "content_type": "article",
    "publisher_id": "caae10247386499c8496985ac0ad863ebabe95f760370cdb72c8e7d68d0355ad",
    "publisher_name": "CNN",
    "creative_instance_id": "",
    "url_hash": "708b46d85691bbe71cc65f294cf9e8a7a60512eb5f4358d3233c15d69ab5b8cc",
    "index": "4",
    "padded_img": "https://cdn.cnn.com/cnnnext/dam/assets/220915115300-04-biden-rose-garden-0915-super-169.jpg",
    "score": 30.037019041241365,
    "similar_entries": [
      "8",
      "13",
      "23",
      "31",
      "36",
      "43",
      "45"
    ]
}
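
For illustration, here is a minimal sketch of how the two fields could be populated. The PR text does not say which similarity measure is used, so Jaccard overlap of title tokens stands in for it here. The names `SIMILARITY_THRESHOLD`, `tokenize`, `jaccard`, and `add_similarity_info`, the threshold value, and the 0-based string indexes are all assumptions, not taken from the actual change.

```python
import re

# Hypothetical value; the PR only says the real threshold is set in config.py.
SIMILARITY_THRESHOLD = 0.5


def tokenize(title):
    """Lowercase a title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def add_similarity_info(entries, threshold=SIMILARITY_THRESHOLD):
    """Add "index" and "similar_entries" to each entry, in place.

    Indexes are stored as strings to match the sample entry in this PR.
    Stand-in measure: Jaccard overlap of title tokens (an assumption).
    """
    tokens = [tokenize(e["title"]) for e in entries]
    for i, entry in enumerate(entries):
        entry["index"] = str(i)
        entry["similar_entries"] = [
            str(j)
            for j in range(len(entries))
            if j != i and jaccard(tokens[i], tokens[j]) > threshold
        ]
    return entries


feed = add_similarity_info([
    {"title": "Biden speaks at United Nations"},
    {"title": "Biden speaks at the United Nations"},
    {"title": "Local bakery wins award"},
])
```

The pairwise comparison is quadratic in the number of entries, which is probably acceptable for a feed of a few hundred items but would need revisiting for much larger inputs.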

The client can use this information to improve the quality of the feed in either of two ways: (1) increasing feed diversity by spacing out entries that have similar titles, or (2) grouping similar entries together.
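
Both client-side uses can be sketched in a few lines. This is only an illustration under assumptions: the function names `group_similar` and `space_out` are hypothetical, grouping is done by following similar_entries links transitively, and spacing uses a simple greedy reorder. The actual client may do something different.

```python
def group_similar(entries):
    """Group entry indexes into clusters connected via "similar_entries" links."""
    by_index = {e["index"]: e for e in entries}
    seen = set()
    groups = []
    for entry in entries:
        if entry["index"] in seen:
            continue
        # Follow similarity links outward from this entry (iterative DFS).
        stack = [entry["index"]]
        seen.add(entry["index"])
        group = []
        while stack:
            idx = stack.pop()
            group.append(idx)
            for other in by_index[idx].get("similar_entries", []):
                if other in by_index and other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group, key=int))
    return groups


def space_out(entries):
    """Greedy reorder: avoid placing an entry right after one it is similar to."""
    remaining = list(entries)
    ordered = []
    while remaining:
        prev_similar = set(ordered[-1].get("similar_entries", [])) if ordered else set()
        # Take the first remaining entry that is not similar to the previous one;
        # fall back to the first remaining entry if every candidate is similar.
        pick = next((e for e in remaining if e["index"] not in prev_similar), remaining[0])
        remaining.remove(pick)
        ordered.append(pick)
    return ordered


feed = [
    {"index": "0", "similar_entries": ["1"]},
    {"index": "1", "similar_entries": ["0"]},
    {"index": "2", "similar_entries": []},
    {"index": "3", "similar_entries": []},
]
groups = group_similar(feed)
spaced = [e["index"] for e in space_out(feed)]
```

The greedy reorder only separates directly adjacent similar entries and otherwise preserves the original ranking, which keeps the score-based order mostly intact.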

LorenzoMinto self-assigned this Sep 16, 2022
LorenzoMinto marked this pull request as ready for review September 21, 2022 16:40
LorenzoMinto marked this pull request as draft December 4, 2023 17:31