Use Content-Disposition for output file name in Webcmdlets #19385

Open
wants to merge 27 commits into
base: master

Contributor

@CarloToso CarloToso commented Mar 21, 2023

PR Summary

  • Consider Content-Disposition in the autonaming of downloaded files
  • Add handling of edge case in naming
  • Add tests

PR Context

Contributes to #11671; follow-up to #19007.

PR Checklist

@CarloToso
Contributor Author

CarloToso commented Mar 22, 2023

This PR works correctly, but the new tests don't.

In HttpBin this code works as expected:

$ContentDisposition = [System.Net.Http.Headers.ContentDispositionHeaderValue]::new("attachment")
$ContentDisposition.FileName = 'DownloadedFile.txt'

$x = Invoke-WebRequest https://httpbin.org/response-headers?Content-Disposition=$ContentDisposition -OutFile .\ -PassThru -Verbose
#--> DownloadedFile.txt

@ImportTaste
Contributor

ImportTaste commented Mar 22, 2023

Though it named the file correctly, the Verbose message was VERBOSE: File Name: Downloads (Downloads is the name of the directory in this case, not the resulting file).

CarloToso and others added 3 commits March 23, 2023 09:22
return Path.Join(_qualifiedOutFile, contentDisposition);
}

if (response.RequestMessage.RequestUri.PathAndQuery != "/")
Collaborator

Could you please clarify why we need PathAndQuery? Maybe add a comment in the code.

Contributor Author

We need it to handle this case:

Invoke-WebRequest https://www.google.com/ -OutFile .\  #PathAndQuery == "/"
#--> OutFile: www_google_com
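
A minimal C# sketch of the case described above. `FallbackFileName` is a hypothetical helper written for illustration, not the method in this PR; it only shows why a root URL needs the host-based fallback.

```csharp
using System;

static class OutFileNameSketch
{
    // Hypothetical helper (not the PR's code): pick a file name from the request URI alone.
    public static string FallbackFileName(Uri uri)
    {
        // For a root URL such as https://www.google.com/, PathAndQuery is "/" and
        // the last segment is also "/", so the URL offers no usable file name.
        if (uri.PathAndQuery != "/")
        {
            return Uri.UnescapeDataString(uri.Segments[^1]);
        }

        // Fall back to the host name with dots replaced: www.google.com -> www_google_com.
        return uri.Host.Replace('.', '_');
    }
}
```

Under these assumptions, `https://www.google.com/` yields `www_google_com` and `https://example.com/files/report.pdf` yields `report.pdf`. Note that a URL with a query but no path, such as `https://example.com/?id=1`, still has `/` as its last segment, so the `PathAndQuery` check alone does not cover it.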

@iSazonov iSazonov changed the title Webcmdlets -OutFile content-disposition Use Content-Disposition for output file name in Webcmdlets Mar 24, 2023
@iSazonov iSazonov added Documentation Needed in this repo Documentation is needed in this repo CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log labels Mar 24, 2023

return Directory.Exists(_qualifiedOutFile) ? Path.Join(_qualifiedOutFile, lastUriSegment) : _qualifiedOutFile;
// File name not found use sanitized Host name instead
return Path.Join(qualifiedOutFile, response.RequestMessage.RequestUri.Host.Replace('.', '_'));
Collaborator

I want us to explicitly confirm this fallback.
/cc @SteveL-MSFT @mklement0

Contributor Author

I compared with curl.
curl simply fails to auto-name the file if the URL has no last segment.
[Screenshot: Cattura4]

{
// Get file name from last segment of Uri
string lastUriSegment = System.Net.WebUtility.UrlDecode(response.RequestMessage.RequestUri.Segments[^1]);
internal static string GetOutFilePath(HttpResponseMessage response, string qualifiedOutFile)
Collaborator

We should be smarter: check File.Exists for the resulting file name and, if there is a conflict, append (1), (2), and so on.

Contributor Author

curl doesn't check for existing files and just overwrites them.

Collaborator

We should discuss this too. Perhaps NoClobber should be added.
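
The conflict-numbering idea raised in this thread could look roughly like the C# sketch below. `GetUniquePath` is a hypothetical helper, and the ` (1)` suffix format and the injected `exists` callback are assumptions made for illustration; nothing here is part of the PR.

```csharp
using System;
using System.IO;

static class UniquePathSketch
{
    // Hypothetical helper: return `path` if it is free, otherwise the first
    // "name (n).ext" variant for which `exists` returns false.
    // `exists` would typically be File.Exists; it is injected so the logic
    // can be exercised without touching the disk.
    public static string GetUniquePath(string path, Func<string, bool> exists)
    {
        if (!exists(path))
        {
            return path;
        }

        string directory = Path.GetDirectoryName(path) ?? string.Empty;
        string stem = Path.GetFileNameWithoutExtension(path);
        string extension = Path.GetExtension(path);

        for (int i = 1; ; i++)
        {
            string candidate = Path.Combine(directory, $"{stem} ({i}){extension}");
            if (!exists(candidate))
            {
                return candidate;
            }
        }
    }
}
```

A check-then-write approach like this is inherently racy (another process can create the file between the check and the write), which is one reason a NoClobber-style option that simply refuses to overwrite may be the simpler design.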

@CarloToso
Contributor Author

TODO in follow-up PR: add sanitization such as #11671 (comment)

Comment on lines 918 to 919
// Resume requires OutFile and can't be used with OutFolder..
if (Resume.IsPresent && OutFile is null)
Collaborator

Why can't Resume be used with OutFolder? I'd expect us to get the same file name from the request.

And typo.

Suggested change
- // Resume requires OutFile and can't be used with OutFolder..
+ // Resume requires OutFile and can't be used with OutFolder.
  if (Resume.IsPresent && OutFile is null)

Contributor Author

It can't be used because the file name for OutFolder is calculated after the response arrives, whereas Resume needs the file name before the request is sent.

Collaborator

Maybe not for this PR. We could make a first request with range 0-0 (the first byte) and see whether we get the file name. It is not expensive.
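
The probe request suggested above might be sketched as follows in C#. `CreateProbeRequest` and `GetSuggestedFileName` are hypothetical helpers, not code from the PR; the sketch assumes the server honors `Range` and returns `Content-Disposition` on the ranged response, which is server-dependent.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;

static class RangeProbeSketch
{
    // Hypothetical helper: build a GET that asks only for the first byte (Range: bytes=0-0).
    public static HttpRequestMessage CreateProbeRequest(Uri uri)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        request.Headers.Range = new RangeHeaderValue(0, 0);
        return request;
    }

    // Hypothetical helper: read the server-suggested file name, if any.
    // Prefers the extended filename* parameter and strips surrounding quotes.
    public static string? GetSuggestedFileName(HttpResponseMessage response)
    {
        ContentDispositionHeaderValue? disposition = response.Content?.Headers.ContentDisposition;
        string? name = disposition?.FileNameStar ?? disposition?.FileName;
        return name?.Trim('"');
    }
}
```

A caller would send the probe with an `HttpClient`, take the file name from the response, and only then decide whether a partial file exists to resume from.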

@CarloToso
Contributor Author

Unrelated test failures

@pull-request-quantifier-deprecated

This PR has 119 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Medium
Size       : +86 -33
Percentile : 43.8%

Total files changed: 7

Change summary by file extension:
.cs : +28 -16
.resx : +8 -5
.ps1 : +50 -12

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

@mklement0
Contributor

@SteveL-MSFT, the suggested logic in the WG decision makes sense, but introducing a new parameter, -OutFolder, alongside -OutFile would be confusing, given that since v7.4.0 -OutFile already accepts a directory path (in which case it uses the ultimate target URL's last segment as the file name).

Wouldn't it make more sense to add the functionality to -OutFile when given a directory path? (Technically, that would be a breaking change from 7.4.0, however)

@SteveL-MSFT
Member

@mklement0 maybe it would be better to have -OutPath with -OutFile alias.

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Review - Needed The PR is being reviewed label May 13, 2024
@ImportTaste
Contributor

ImportTaste commented May 13, 2024

Wouldn't it make more sense to add the functionality to -OutFile when given a directory path? (Technically, that would be a breaking change from 7.4.0, however)

Why not add a -ContentDisposition switch? (-CD, -CDisp, & -Disp aliases; there really aren't enough parameter aliases in PowerShell.) Then it'll change the behavior of -OutFile to derive its filename from the Content-Disposition header.

And I agree with Steve that making -OutPath the true parameter name and making -OutFile an alias would be the best approach for clarity.

@mklement0
Contributor

@ImportTaste:

I can see the appeal of the additional -ContentDisposition switch allowing you to explicitly choose between using the content-disposition filename and the last-ultimate-URL-segment filename when a directory path is passed to -OutFile / -OutPath.

But to me the question is whether this level of control is needed in practice; perhaps the suggested implied logic (use content-disposition filename, if present, fall back to the last URL segment) is enough? The appeal is that you'll get the server-suggested filename by default, if available.

That said, as noted, this would technically be a breaking change to the current -OutFile behavior; if that is a concern, an additional parameter is a must.

@SteveL-MSFT SteveL-MSFT added WG-NeedsReview Needs a review by the labeled Working Group and removed WG-Reviewed A Working Group has reviewed this and made a recommendation labels May 15, 2024
@SteveL-MSFT
Member

The WG discussed this. It looks like in 7.4, if -OutFile points to a folder, then the last segment of the URL is used as the file name in that path.
Otherwise, if -OutFile points to a file, then that file is written to disk.
With this PR, we can recommend:

  • If the content-disposition header is set and -OutFile points to a folder, then use that name instead of the last segment
  • If the content-disposition header is set and -OutFile points to a file, use the user-specified file
  • Rename -OutFile to -OutPath and add an -OutFile alias

Order of precedence:

  • user specified file with -outpath
  • content-disposition header if exists
  • last segment of URL

We believe this is a bucket 3 breaking change only for the case where content-disposition is set and the user expected the last segment to be used. The expectation is that a user who expected the last segment to be used already uses -PassThru to know for sure the name of the file that was downloaded. As the ability to specify a folder with -OutFile was only introduced in 7.4, the number of users who could be affected is greatly diminished.
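
The order of precedence recommended above can be sketched in C# as below. `ResolveOutPath` and its parameters are hypothetical names chosen for illustration; this is not the PR's `GetOutFilePath`, and it deliberately leaves out any sanitization of the server-supplied name.

```csharp
#nullable enable
using System.IO;

static class OutPathSketch
{
    // Hypothetical helper illustrating the recommended precedence:
    //   1. a user-specified file, 2. the Content-Disposition file name, 3. the last URL segment.
    public static string ResolveOutPath(
        string outPath,
        bool outPathIsDirectory,
        string? contentDispositionFileName,
        string lastUrlSegment)
    {
        // 1. The user pointed at a file: always write there.
        if (!outPathIsDirectory)
        {
            return outPath;
        }

        // 2. The user pointed at a folder and the server suggested a name.
        //    No sanitization is done here; a real implementation would need it.
        if (!string.IsNullOrWhiteSpace(contentDispositionFileName))
        {
            return Path.Join(outPath, contentDispositionFileName);
        }

        // 3. Otherwise fall back to the last segment of the URL.
        return Path.Join(outPath, lastUrlSegment);
    }
}
```

In this sketch the caller decides whether `outPath` is a directory (for example with `Directory.Exists`) and extracts the header value and URL segment beforehand, which keeps the precedence logic free of I/O.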

@SteveL-MSFT SteveL-MSFT assigned iSazonov and unassigned PaulHigin May 15, 2024
@SteveL-MSFT SteveL-MSFT added WG-Reviewed A Working Group has reviewed this and made a recommendation and removed WG-NeedsReview Needs a review by the labeled Working Group labels May 15, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Review - Needed The PR is being reviewed label May 23, 2024
Contributor

This pull request has been automatically marked as Review Needed because there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author.

Labels

  • CL-General: Indicates that a PR should be marked as a general cmdlet change in the Change Log
  • Documentation Needed: in this repo Documentation is needed in this repo
  • Medium
  • Needs-Triage: The issue is new and needs to be triaged by a work group.
  • Review - Needed: The PR is being reviewed
  • WG-Cmdlets: general cmdlet issues
  • WG-Reviewed: A Working Group has reviewed this and made a recommendation
7 participants