Report request: items where one or more files is missing file size information #4990

Closed
andrewjbtw opened this issue May 14, 2024 · 3 comments
Labels
cocina reports request for report on metadata content

Comments


andrewjbtw commented May 14, 2024

Required so that we can implement: sul-dlss/cocina-models#708

We need file sizes in Cocina to keep the embed viewer happy. Some items, probably all pre-dating the Fedora migration, may have files where file size was never recorded.

Ideal output:
druid,list,of,files,without,size

Acceptable output:
druid (only)
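
A minimal sketch of how the "ideal output" could be produced, not the actual `FilesWithNoSize.report` implementation. It assumes each item is available as a Cocina-like hash in which file sets sit under `structural.contains` and files under each file set's own `structural.contains`, with `filename` and `size` keys. The helper names, druids, and filenames below are hypothetical.

```ruby
# Hypothetical helper: returns the filenames in a Cocina-like hash that have no size.
# Assumes the layout structural.contains[] (file sets) -> structural.contains[] (files).
def files_without_size(cocina)
  file_sets = cocina.dig('structural', 'contains') || []
  file_sets.flat_map do |file_set|
    files = file_set.dig('structural', 'contains') || []
    files.select { |file| file['size'].nil? }.map { |file| file['filename'] }
  end
end

# Builds one row per item: the druid followed by the files lacking a size.
# Items where every file has a size are skipped.
def report_rows(items)
  items.filter_map do |cocina|
    missing = files_without_size(cocina)
    [cocina['externalIdentifier'], *missing] unless missing.empty?
  end
end

# Illustrative data only; these druids and filenames are made up.
items = [
  { 'externalIdentifier' => 'druid:bb000aa0001',
    'structural' => { 'contains' => [
      { 'structural' => { 'contains' => [
        { 'filename' => 'page1.tif', 'size' => 1024 },
        { 'filename' => 'page1.jp2' }
      ] } }
    ] } },
  { 'externalIdentifier' => 'druid:bb000aa0002',
    'structural' => { 'contains' => [] } }
]

# Naive comma join; filenames containing commas would need real CSV quoting.
report_rows(items).each { |row| puts row.join(',') }
```

Dropping everything after the first column of each row gives the "acceptable" druid-only output.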

andrewjbtw added the cocina reports request for report on metadata content label May 14, 2024
jcoyne self-assigned this May 17, 2024
jcoyne added a commit that referenced this issue May 17, 2024

jcoyne commented May 17, 2024

There were only four items:

dor_services@dor-services-app-prod-b:~/dor_services/current$ bin/rails r -e production "FilesWithNoSize.report" > files_with_no_size_attribute.csv
dor_services@dor-services-app-prod-b:~/dor_services/current$ wc -l files_with_no_size_attribute.csv
5 files_with_no_size_attribute.csv
dor_services@dor-services-app-prod-b:~/dor_services/current$ cat files_with_no_size_attribute.csv
item_druid
druid:pb386cr1662
druid:np008tj3137
druid:hg262qg8763
druid:yx310tq1633


lwrubel commented May 20, 2024

@andrewjbtw what would you advise about these four druids?


andrewjbtw commented May 20, 2024

Two of these items have problems because they were only partially accessioned:

  • druid:hg262qg8763
  • druid:pb386cr1662

Both are from the same project. I've contacted the content owner about one of them; I wasn't aware of the other until now. I can try to work out what to do. The issue is that the files are listed in the Cocina because accessioning must have been started, but it was never completed.

I think these two are fine now. They came up in the report only because they were still in accessioning when it ran:

  • druid:np008tj3137
  • druid:yx310tq1633

In the big picture, I don't think we can require file size in Cocina (for sul-dlss/cocina-models#708) unless we change accessioning so that file size is generated differently. In assemblyWF, I don't think file size is computed until exif-collect. That's probably because assemblyWF doesn't have the full set of files for an object until after jp2-create has run. This must be why we never required file size in the first place.
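
To illustrate the ordering problem, here is a rough sketch of filling in sizes from files on disk. It is not how assemblyWF or exif-collect works; `backfill_sizes`, the staging directory, and the filenames are all hypothetical. The point is that a size can only be recorded for a file that already exists, so a derivative that has not been created yet stays without one.

```ruby
require 'tmpdir'

# Hypothetical helper: fills in 'size' for file entries that lack one,
# using the size of the matching file under staging_dir.
# Entries whose file is not on disk yet are returned unchanged.
def backfill_sizes(files, staging_dir)
  files.map do |file|
    path = File.join(staging_dir, file['filename'])
    next file if file['size'] || !File.file?(path)

    file.merge('size' => File.size(path))
  end
end

# Illustrative run: page1.tif exists on disk, page1.jp2 has not been created yet.
result = Dir.mktmpdir do |dir|
  File.binwrite(File.join(dir, 'page1.tif'), 'x' * 10)
  files = [{ 'filename' => 'page1.tif' }, { 'filename' => 'page1.jp2' }]
  backfill_sizes(files, dir)
end

p result
```

In this run the TIFF gets a size of 10 bytes while the JP2 entry is left without one, which mirrors why a size requirement could not be enforced before the derivatives exist.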
