Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider extraction errors as expected operational errors without crashing the engine #1069

Merged
merged 21 commits into from May 8, 2024

Conversation

Ndpnt
Copy link
Member

@Ndpnt Ndpnt commented May 2, 2024

Changed

  • Consider extraction errors as expected operational errors without crashing the engine

Added

  • Added label empty content to reported issues on GitHub when server returns empty content or when PDF content is unextractable
  • Added label invalid selector to reported issues on GitHub when CSS selector is invalid

Fixed

  • Prevent engine crashes caused by invalid URLs in source document links
  • Ensure the CLI option --types is not ignored
  • Fix counts of terms when they are specified via the CLI option --types
  • Fixed display issues for service ID and terms type associated with errors

Other technical improvements:

  • Make logger silent in test env
  • Also allow passing no params to the track function
  • Ensure all errors from fetch function will bubble as FetchDocumentError
  • Ensure all errors from extract function will bubble as ExtractDocumentError
  • Minor improvements in logged messages

@Ndpnt Ndpnt requested review from MattiSG and clementbiron May 2, 2024 15:35
@Ndpnt Ndpnt force-pushed the extraction-errors-as-operationnal-errors branch from d8383eb to 5cf29dc Compare May 2, 2024 15:37
@Ndpnt Ndpnt force-pushed the extraction-errors-as-operationnal-errors branch from 5cf29dc to 9bead1c Compare May 2, 2024 16:00
Copy link
Member

@clementbiron clementbiron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds great and, well done!

src/archivist/extract/index.js Outdated Show resolved Hide resolved
Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like very impactful and well-thought-out changes! I didn't review src/archivist/extract/index.test.js.

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
src/archivist/extract/errors.js Outdated Show resolved Hide resolved
src/archivist/extract/index.js Show resolved Hide resolved
src/archivist/extract/index.js Show resolved Hide resolved
src/reporter/index.js Outdated Show resolved Hide resolved
src/reporter/labels.json Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rename: termsWithNoText or termsWithoutText.

src/archivist/index.js Show resolved Hide resolved
src/archivist/index.js Show resolved Hide resolved
Co-authored-by: Matti Schneider <matti@opentermsarchive.org>
Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming typos and copywriting changes are taken into account, LGTM! 👏
So many fixes and improvements, nice! There were a few riders though, such as the terms types filtering and count correction; it's always better to ship them separately 😉

src/archivist/index.js Show resolved Hide resolved
src/archivist/index.js Show resolved Hide resolved
src/archivist/fetcher/fullDomFetcher.js Show resolved Hide resolved
src/archivist/fetcher/index.js Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
src/archivist/extract/index.js Show resolved Hide resolved
Ndpnt and others added 5 commits May 8, 2024 10:47
Co-authored-by: Clément Biron <clement.biron@gmail.com>
Co-authored-by: Matti Schneider <matti@opentermsarchive.org>
Co-authored-by: Matti Schneider <matti@opentermsarchive.org>
@Ndpnt Ndpnt enabled auto-merge May 8, 2024 08:57
@Ndpnt Ndpnt merged commit 572511c into main May 8, 2024
9 checks passed
@Ndpnt Ndpnt deleted the extraction-errors-as-operationnal-errors branch May 8, 2024 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants