Obtain location dynamically from link #1039

Open
jetlime opened this issue Jan 4, 2024 · 1 comment

@jetlime
Contributor

jetlime commented Jan 4, 2024

On some sites, such as the LinkedIn transparency reports, the terms of interest are located at dynamically named endpoints whose names can, for example, depend on time (e.g. October-2023-LinkedIn-DSA-Transparency-Report10.pdf). In most cases, however, the links to these dynamic endpoints are published at fixed locations. It thus makes sense to introduce a new declaration term: dynamic-fetch.

This term would fetch the document located at the dynamic endpoint dynamic-fetch.variable, which is extracted from the page at dynamic-fetch.location. It would be complementary to fetch.

It could potentially be defined as follows:

{
  "name": "Linkedin",
  "documents": {
    "Transparency Ad Library": {
      // This shall fetch the pdf doc at https://content.linkedin.com/content/dam/help/linkedin/en-us/October-2023-LinkedIn-DSA-Transparency-Report10.pdf
      "dynamic-fetch": {
        "variable": "div[class=\"t-14 article-content__rich-text hue-default-color\"] > ul > li:first-child > a.getAttribute('href')",
        "location": "https://www.linkedin.com/help/linkedin/answer/a1678508?hcppcid=search"
      }
    }
  }
}
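To make the proposal concrete, here is a minimal sketch of how a dynamic-fetch declaration could be resolved into the URL that the regular fetch logic would then download. It is not Open Terms Archive code. Assumptions: the single "variable" string above is split into a CSS selector plus the attribute to read (a CSS selector cannot call getAttribute() on its own), and the HTML download and HTML querying are injected as functions. All names (DynamicFetchDeclaration, resolveDynamicLocation, resolveDynamicFetch, AttributeExtractor) are hypothetical.

```typescript
// Hypothetical declaration shape. Assumption: the issue's "variable" string is
// split into a CSS selector and the attribute that holds the URL.
interface DynamicFetchDeclaration {
  location: string; // page that links to the document
  selector: string; // CSS selector of the <a> element
  attribute?: string; // attribute holding the URL, "href" by default
}

// Injected HTML query so the sketch needs no parser dependency.
type AttributeExtractor = (
  html: string,
  selector: string,
  attribute: string,
) => string | null;

// Turns the HTML found at `location` into the absolute URL of the document.
function resolveDynamicLocation(
  declaration: DynamicFetchDeclaration,
  locationHtml: string,
  extract: AttributeExtractor,
): string {
  const attribute = declaration.attribute ?? "href";
  const value = extract(locationHtml, declaration.selector, attribute)?.trim();
  if (!value) {
    // Failing loudly tells maintainers that the page or the selector changed.
    throw new Error(
      `No "${attribute}" for "${declaration.selector}" at ${declaration.location}`,
    );
  }
  // Links are often relative, so resolve them against the page they came from.
  return new URL(value, declaration.location).href;
}

// Full resolution step: download the location page, then extract the link.
// The returned URL would then be handed to the regular `fetch` logic.
async function resolveDynamicFetch(
  declaration: DynamicFetchDeclaration,
  fetchHtml: (url: string) => Promise<string>,
  extract: AttributeExtractor,
): Promise<string> {
  const html = await fetchHtml(declaration.location);
  return resolveDynamicLocation(declaration, html, extract);
}
```

Keeping the extractor injectable leaves the choice of HTML parser open. With jsdom, for instance, it could be written as `(html, selector, attribute) => new JSDOM(html).window.document.querySelector(selector)?.getAttribute(attribute) ?? null`.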

As I am pretty new to this tool, I would be happy to hear some feedback about this proposal!
If you share my vision, I would be happy to implement it :)

@MattiSG
Member

MattiSG commented Mar 14, 2024

Thanks @jetlime for this suggestion!

Indeed, it happens sometimes that terms are only available as a downloadable file behind a link. The idea of obtaining the URL dynamically from the DOM is a smart answer to that problem 👍

The main question we need to answer to decide whether it would be worth adding a new type of fetch is: are the location and DOM from which we obtain the link any more stable than the link itself? In the case at hand, DSA Transparency Reports are published every 6 months. We'd need to demonstrate that the location and DOM from which the link can be obtained change significantly less often than twice a year. Otherwise, the maintenance burden on collection maintainers would stay the same, and we would have increased software complexity for nothing 😰

The next investigation steps I see are:

  1. Identify at least 2 other cases where such a system would be used.
  2. Measure with the Wayback Machine (or any other reliable history tool) how often the location or link selector changed (l) vs how often the target of the link changed (t) in at least the last 2 years.
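Once the history has been collected, step 2 comes down to counting changes between consecutive captures. The sketch below assumes the captures were already turned by hand (or from a list of archived captures, such as the one the Wayback Machine's CDX server API can return) into hypothetical Snapshot records; the record shape and the countChanges name are illustrative only.

```typescript
// Hypothetical record of one archived capture of the location page.
interface Snapshot {
  capturedAt: string; // ISO 8601 date of the capture, e.g. "2023-10-15"
  location: string; // page where the link was published at that date
  selector: string; // selector that finds the link at that date
  target: string; // URL the link pointed to at that date
}

// l: how many times `location` or `selector` changed (updates that a
//    dynamic-fetch declaration would still need).
// t: how many times `target` changed (updates that a plain fetch needs).
function countChanges(snapshots: Snapshot[]): { l: number; t: number } {
  // ISO 8601 dates in the same format sort chronologically as plain strings.
  const sorted = [...snapshots].sort((a, b) =>
    a.capturedAt < b.capturedAt ? -1 : a.capturedAt > b.capturedAt ? 1 : 0,
  );
  let l = 0;
  let t = 0;
  let previous: Snapshot | undefined;
  for (const current of sorted) {
    if (previous) {
      if (current.location !== previous.location || current.selector !== previous.selector) {
        l++;
      }
      if (current.target !== previous.target) {
        t++;
      }
    }
    previous = current;
  }
  return { l, t };
}
```

Counting l and t independently mirrors the comparison above: each change in t is an update a plain fetch declaration would require, while each change in l is one a dynamic-fetch declaration would still require.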

If t > e × l, where e is some arbitrary multiplier encoding the effort it would take to implement this feature, we'll consider it 🙂
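The rule can also be read backwards: for a given t and e, it bounds how often the location or selector may change. Since t > e × l is equivalent to l < t / e (for e > 0), the largest acceptable whole number of such changes is ⌈t / e⌉ − 1. The function names below are hypothetical, and the numbers in the worked example are illustrative assumptions, not measurements.

```typescript
// Decision rule: consider the feature when t > e × l, with e > 0 the effort multiplier.
function worthImplementing(t: number, l: number, e: number): boolean {
  return t > e * l;
}

// Largest whole number of location/selector changes that still satisfies the rule,
// or -1 when even l = 0 does not (i.e. when t = 0).
function maxTolerableLocationChanges(t: number, e: number): number {
  return Math.ceil(t / e) - 1;
}
```

As a worked example, assuming each semi-annual report changes the link target, t would be about 4 over the 2-year window. With a hypothetical e = 3, maxTolerableLocationChanges(4, 3) is ⌈4/3⌉ − 1 = 1: the location or selector could change at most once in those 2 years for the feature to be considered.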

@MattiSG changed the title from "New Declaring term: Dynamic Fetching" to "Dynamic Fetching" on Mar 14, 2024
@MattiSG changed the title from "Dynamic Fetching" to "Obtain location dynamically from link" on Mar 14, 2024