Support Caption and Transcript Uploads Via API #5801

Open
3 of 10 tasks
joncameron opened this issue Apr 23, 2024 · 5 comments

joncameron commented Apr 23, 2024

Description

Following from #5710: captions and transcripts should be manipulable via the Avalon API so that these documents can be uploaded and attached to records programmatically.

As a user, I want to programmatically add captions, transcripts and supplemental files to master files on a media object.

API

Requirements

For a masterfile:

  • Upload/attach and set metadata:

    • Captions
      • Label
      • Language
      • Machine Generated?
      • Treat as Transcript?
    • Transcripts
      • Label
      • Language
      • Machine Generated?
    • Supplemental Files
      • Label
  • Deliberately designed API responses would be nice; that could be an extension beyond this work

    • We don't have a specific standard we're using right now for JSON responses/requests
    • This could offer pagination, other niceties
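
The per-type metadata fields in the requirements list above could be captured as a small lookup table. This is only a sketch: the key names (`machine_generated`, `treat_as_transcript`, and the `supplemental_file` type name) are assumptions modeled on the JSON examples in this issue, not taken from the Avalon codebase.

```python
# Hypothetical field whitelist derived from the requirements list;
# names are illustrative and not confirmed against Avalon itself.
ALLOWED_FIELDS = {
    "caption": {"label", "language", "machine_generated", "treat_as_transcript"},
    "transcript": {"label", "language", "machine_generated"},
    "supplemental_file": {"label"},
}


def unsupported_fields(file_type: str, metadata: dict) -> set:
    """Return the metadata keys that the given file type does not accept."""
    if file_type not in ALLOWED_FIELDS:
        raise ValueError(f"unknown file type: {file_type!r}")
    return set(metadata) - ALLOWED_FIELDS[file_type]
```

For example, `unsupported_fields("transcript", {"label": "x", "treat_as_transcript": "1"})` reports `treat_as_transcript`, since only captions carry that flag.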

Routes

Questions

  • Does the Content-Type need to be set explicitly on requests (VTT, DOCX, etc.)?

    • We probably don't require this, since Avalon is using the extension to determine mime type
  • Could we do both single request upload and multi-part upload with metadata in json?

    • Yes, but we can do the multiple request style for now, and maybe implement the multi-part upload style later if desired
  • Does the request URL need an extra parameter to return JSON? As in /master_files/#{fedora_id}/supplemental_files?format=json. I'm guessing we don't want to clobber the existing responses from the route.

    • Yes, we will need this to make sure JSON is returned and not a form data response
  • There is already some json handling (not sure what the extent is; may have been autogenerated with scaffolding), so we'll need to make sure existing usage doesn't get changed or is moved to a different route if needed

Request: Add new supplemental file to masterfile

POST /master_files/#{fedora_id}/supplemental_files
Content-Type: application/octet-stream
Content-Disposition: file; filename="filename.jpg"

[… binary file data …]

Request: replace existing supplemental file with a new binary file

PUT /master_files/#{fedora_id}/supplemental_files/#{id}
Content-Type: application/octet-stream
Content-Disposition: file; filename="filename.jpg"

[… binary file data …]

Request: Get data on supplemental file

GET /master_files/#{fedora_id}/supplemental_files/#{id}.json

    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    }
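
Because the example response encodes its flags as the string "1", a client would probably want to convert them to booleans. The following is a minimal sketch of that, assuming the field names shown above and assuming that a missing flag means false; `SupplementalFileInfo` and `parse_supplemental_file` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class SupplementalFileInfo:
    id: str
    type: str
    label: str
    language: str
    treat_as_transcript: bool
    machine_generated: bool


def parse_supplemental_file(raw: str) -> SupplementalFileInfo:
    """Parse one JSON record, treating the string "1" as true."""
    data = json.loads(raw)
    return SupplementalFileInfo(
        id=data["id"],
        type=data["type"],
        label=data["label"],
        language=data["language"],
        # Assumption: an absent flag (e.g. on a transcript) counts as false.
        treat_as_transcript=data.get("treat_as_transcript") == "1",
        machine_generated=data.get("machine_generated") == "1",
    )
```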

Request: Update data on supplemental file

PUT /master_files/#{fedora_id}/supplemental_files/#{id}.json

    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    }

Request: Get listing of supplemental files on masterfile

GET /master_files/ns064602j/supplemental_files.json
Returns an array of supplemental files

[
    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    },
    {
        "id": "141",
        "type": "transcript",
        "label": "Labelforit.vtt",
        "language": "French",
        "machine_generated": "1"
    }
]

Request: Delete supplemental file

DELETE /master_files/#{fedora_id}/supplemental_files/#{id}
Deletes a supplemental file from the masterfile

HTTP Response; no JSON returned

Done Looks Like

  • Routes implemented as per above
  • Routes respect Avalon API key authorization (in HTTP header)
  • Parent media object is saved and indexed as needed when supplemental files have new or updated information/files

QA

  • label, language and machine-generated fields populate from metadata when creating a new supplemental file
  • treat as transcript field populates from metadata when creating a new caption file
  • able to upload caption, transcript, supplemental file based on metadata parameters
  • PUT request is able to upload both metadata (label, language, machine-generated, treat as transcript) and supplemental file itself. This includes being able to change a caption to a transcript, etc.
  • for captions, should only be able to ingest .vtt or .srt files (any other file type should return an error)
  • when creating a new caption, can create with metadata only, or with metadata + file, or just file
  • if only a file is provided, will be created as a new supplemental file (not caption or transcript)
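
The last few QA items amount to a small decision rule, sketched here under assumptions: the type is requested through a `type` key in the metadata (as in the JSON examples above), a missing type falls back to a plain supplemental file, and `resolve_type` is a hypothetical helper rather than anything in Avalon.

```python
import os
from typing import Optional

CAPTION_EXTENSIONS = {".vtt", ".srt"}


def resolve_type(filename: Optional[str], metadata: Optional[dict]) -> str:
    """Decide what kind of record an upload should create."""
    requested = (metadata or {}).get("type")
    if requested is None:
        # File only (or metadata without a type): generic supplemental file.
        return "supplemental_file"
    if requested == "caption" and filename is not None:
        ext = os.path.splitext(filename)[1].lower()
        if ext not in CAPTION_EXTENSIONS:
            raise ValueError(f"captions must be .vtt or .srt, got {ext!r}")
    return requested
```

A caption created with metadata only passes through unchanged, since there is no file extension to check yet.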

Current Caption Upload Example

-----------------------------121240327742272709152918634858
Content-Disposition: form-data; name="authenticity_token"

lPz5ffZw/vXCFXNyCZdepS9+UZnf8BcFhAG0bgi8sBHGuZadKvDIfZsA/QHP/7eK46qzEFkd3Rh2rJkVI9ymaw==
-----------------------------121240327742272709152918634858
Content-Disposition: form-data; name="supplemental_file[tags][]"

caption
-----------------------------121240327742272709152918634858
Content-Disposition: form-data; name="supplemental_file[file]"; filename="lunchroom manners.srt"
Content-Type: text/x-srt

1
00:00:01,200 --> 00:00:21,000
[music]

2
00:00:22,200 --> 00:00:26,600
Just before lunch one day, a puppet show 
was put on at school.

3
00:00:26,700 --> 00:00:31,500
It was called "Mister Bungle Goes to Lunch".

... (rest of the file here)

-----------------------------121240327742272709152918634858--

Current Form Data sent on POST

{
	"_method": "put",
	"supplemental_file[label]": "Hipparchus (146 to 127 B.C.).vtt",
	"supplemental_file[language]": "French",
	"treat_as_transcript_131": "1",
	"machine_generated_131": "1",
	"cancel_edit_label": "",
	"save_label": ""
}

joncameron commented May 14, 2024

Language support needs adjusting for the API: the language value is currently set on the model from a default value. We'll need to change that behavior so that the value can be set up front when the supplemental file is created.

Should look at the JSON API standard and other schemas for API architecture.

GET to /supplemental_file/#{id} returns the binary; we don't have a place to get the metadata from one of these routes.
GET to /supplemental_file/#{id}.json (with appropriate headers) should get the JSON metadata about the file... and maybe offer a URL to the binary (/supplemental_file/#{id} or /supplemental_file/#{id}/caption etc.).
If mirroring for create/update, PUT or POST to .json should be the metadata and PUT or POST to /#{id} should be the binary.
For creation, we'd have to figure out what order we want to do this. Submit both things at the same time? What does our model expect or require?

@joncameron

The model has a couple required metadata fields but with Active Storage attachment, you can create the supplemental file object but not have a file attached immediately. We probably wouldn't be able to create the file without any associated metadata. At the very least it would need to be metadata first, file second. Ideally, though, find a way to bundle it all together.
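
One way to picture the "metadata first, file second" ordering is as a short request plan. This is a sketch only: it assumes the split suggested earlier in this thread (metadata to the `.json` route, binary to the plain `/:id` route), and `plan_create` is a hypothetical helper.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (HTTP method, path, what the body carries)


def plan_create(master_file_id: str, has_file: bool) -> List[Step]:
    """List the requests needed to create a supplemental file in two steps."""
    base = f"/master_files/{master_file_id}/supplemental_files"
    # The record is created from metadata alone; no attachment is required yet.
    steps: List[Step] = [("POST", f"{base}.json", "metadata")]
    if has_file:
        # ":id" stands for the id returned by the first request.
        steps.append(("PUT", f"{base}/:id", "binary"))
    return steps
```

Bundling both parts into a single multipart request would collapse this plan to one step.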

joncameron commented May 14, 2024

Ex:
POST to /media_objects
get the masterfile ID
POST to /master_files/id/supplemental_files

  • Add a "parent_id" value to the JSON response (media object or masterfile)

When you POST to supplemental_files, it will save or update the media object, OR don't worry about it because it's handled by the handling on the masterfile.

Supplemental file create/update should ensure that saves and index updates are done accordingly as needed.


masaball commented Jun 6, 2024

Request: Add new supplemental file to masterfile

Create with file and metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -F "file=@content_filepath" -F "metadata=<metadata_filepath" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with file and inline metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -F "file=@content_filepath" -F metadata='{"label": "Lunchroom", "language": "French"}' https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with just file:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -F "file=@content_filepath" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with just metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -d @metadata_filepath -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Create with just metadata inline:

curl -H "Avalon-Api-Key:abcdef123456" -X POST -d '{"label": "Lunchroom", "language": "French"}' -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json

Response:
{ "supplemental_file": :supplemental_file_id }
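
For scripted use, the "create with just metadata" call above might translate to Python's standard library roughly as follows. This sketch only builds the request object; the headers mirror the curl example, while `build_metadata_request` and its parameters are hypothetical names.

```python
import json
import urllib.request


def build_metadata_request(base_url: str, master_file_id: str,
                           api_key: str, metadata: dict) -> urllib.request.Request:
    """Build (but do not send) a metadata-only create request."""
    url = f"{base_url}/master_files/{master_file_id}/supplemental_files.json"
    return urllib.request.Request(
        url,
        data=json.dumps(metadata).encode("utf-8"),
        headers={
            "Avalon-Api-Key": api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response body is expected to carry the new supplemental file id, as shown above.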

Request: replace existing supplemental file with a new binary file

Update attached file:

curl -H "Avalon-Api-Key:abcdef123456" -X PUT -F "file=@content_filepath" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id

Response:
{"supplemental_file": :supplemental_file_id}

Request: Get data on supplemental file

curl -H "Avalon-Api-Key:abcdef123456" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Response:

    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    }

Request: Update data on supplemental file

Updating metadata requires ALL existing fields except language to be included in the payload. Any existing non-language metadata field that is left out of the payload will be removed.

Update metadata:

curl -H "Avalon-Api-Key:abcdef123456" -X PUT -d @metadata_filepath -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Payload file:

    {
        "type": "transcript",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "0",
        "machine_generated": "1"
    }

Update metadata inline:

curl -H "Avalon-Api-Key:abcdef123456" -X PUT -d '{"label": "label", "language": "French"}' -H "Content-Type:application/json" -H "Accept:application/json" https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Response:
{"supplemental_file": :supplemental_file_id}
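
Since fields left out of an update payload are removed, a client could fetch the current metadata first and merge its changes into it before sending the PUT. A minimal sketch, assuming the GET response shape shown earlier and assuming `id` should be left out of the payload (as in the payload file example); `build_update_payload` is a hypothetical helper.

```python
def build_update_payload(existing: dict, changes: dict) -> dict:
    """Merge changes into the current metadata so no field is dropped."""
    # Assumption: "id" is read-only, so it is not sent back.
    payload = {key: value for key, value in existing.items() if key != "id"}
    payload.update(changes)
    return payload
```

With this, changing only the language still resends the label, type, and both flags.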

Request: Get listing of supplemental files on masterfile

curl -H "Avalon-Api-Key:abcdef123456" "https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json"

Returns an array of supplemental files

[
    {
        "id": "131",
        "type": "caption",
        "label": "Hipparchus (146 to 127 B.C.).vtt",
        "language": "English",
        "treat_as_transcript": "1",
        "machine_generated": "1"
    },
    {
        "id": "141",
        "type": "transcript",
        "label": "Labelforit.vtt",
        "language": "French",
        "machine_generated": "1"
    }
]

Paginated:

curl -H "Avalon-Api-Key:abcdef123456" "https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files.json?per_page=1&page=2"

Returns array containing per_page results:

[
    {
        "id": "141",
        "type": "transcript",
        "label": "Labelforit.vtt",
        "language": "French",
        "machine_generated": "1"
    }
]
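
A client that wants every record could walk the pages in a loop. The sketch below assumes pages are numbered from 1 (consistent with `page=2` returning the second record above) and that a page past the end returns an empty array, which this thread does not confirm. `fetch_page` is a hypothetical callable that performs the actual GET.

```python
from typing import Callable, Iterator, List


def iter_supplemental_files(fetch_page: Callable[[int, int], List[dict]],
                            per_page: int = 10) -> Iterator[dict]:
    """Yield every record by requesting pages until one comes back short."""
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            return
        yield from batch
        if len(batch) < per_page:
            # A short page means there is nothing further to fetch.
            return
        page += 1
```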

Request: Delete supplemental file

curl -H "Avalon-Api-Key:abcdef123456" -X DELETE https://avalon-dev.dlib.indiana.edu/master_files/:master_file_id/supplemental_files/:id.json

Deletes a supplemental file from the masterfile

HTTP Response; no JSON returned
