
S3 list_objects_v2 paginator MaxItems only counts keys (Contents) not prefixes (CommonPrefixes) #2376

Open
bsmedberg-xometry opened this issue Apr 7, 2020 · 14 comments
Labels
documentation This is a problem with documentation. feature-request This issue requests a feature. p2 This is a standard priority issue pagination s3

Comments

@bsmedberg-xometry

Describe the bug

When using boto3 to iterate over an S3 bucket with a Delimiter, MaxItems counts only the keys, not the prefixes. So if a bucket contains only prefixes, MaxItems never stops the iteration, and listing can take unbounded time.

Steps to reproduce

Set up a bucket with 20000 keys of the form result1/results.txt ... result20000/results.txt

Run this code:

import boto3
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
    for prefix in result.get('CommonPrefixes', []):
        print("prefix {}".format(prefix['Prefix']))
    for key in result.get('Contents', []):
    print("key {}".format(key['Key']))

Expected behavior
The above program should print at most 2000 results. It actually prints all 20,000 prefixes, because MaxItems doesn't count CommonPrefixes.

@bsmedberg-xometry bsmedberg-xometry added the needs-triage This issue or PR still needs to be triaged. label Apr 7, 2020
@swetashre swetashre self-assigned this Apr 8, 2020
@swetashre swetashre added bug This issue is a confirmed bug. s3 and removed needs-triage This issue or PR still needs to be triaged. labels Apr 8, 2020
@swetashre
Contributor

@bsmedberg-xometry - I am able to reproduce the issue. Thank you for pointing it out. Marking this as bug.

@swetashre
Contributor

@bsmedberg-xometry - After some digging into the code base, I found that this is the expected behavior. When it is not the first request (i.e., a starting token is included), we only consider the first result key in the response for truncation. So in this case we only truncate the Contents of the response, not CommonPrefixes.

Can you please confirm whether it's the number of keys that exceeds MaxItems, or the number of CommonPrefixes?
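To illustrate the behavior described above, here is a simplified model of a paginator that charges MaxItems only against the primary result key (Contents). This is an illustrative sketch, not botocore's actual implementation, but it shows why a bucket full of CommonPrefixes sails past the cap:

```python
# Illustrative model (NOT botocore's actual code): MaxItems is charged only
# against the primary result key ('Contents'), so 'CommonPrefixes' entries
# never count toward the cap.

def paginate(pages, max_items):
    """Yield pages, truncating only the 'Contents' key against max_items."""
    remaining = max_items
    for page in pages:
        contents = page.get('Contents', [])
        if len(contents) > remaining:
            # Truncate the primary result key and stop iterating.
            yield {**page, 'Contents': contents[:remaining]}
            return
        remaining -= len(contents)
        # CommonPrefixes pass through without being counted at all.
        yield page

# A bucket containing only prefixes, as in the repro above:
pages = [{'CommonPrefixes': [{'Prefix': f'result{i}/'}]} for i in range(20000)]
total_prefixes = sum(
    len(p.get('CommonPrefixes', [])) for p in paginate(pages, max_items=2000))
# All 20,000 prefixes come back: the cap never triggers because Contents is empty.
```

Under this model, MaxItems=2000 would stop a listing of 20,000 keys after two thousand, but a listing of 20,000 prefixes runs to exhaustion.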

@swetashre swetashre added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed bug This issue is a confirmed bug. labels Apr 13, 2020
@bsmedberg-xometry
Author

In this case, there will be nothing in Contents ever, there will only be CommonPrefixes.

And the problem is not the truncation: the problem is that even after getting 2000 CommonPrefixes, it keeps making calls forever.

I have worked around this locally by doing something like this:

import boto3
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
found_keys = 0
found_prefixes = 0
for result in paginator.paginate(Bucket='mybucket', Delimiter='/', PaginationConfig={'MaxItems': 2000}):
    found_prefixes += len(result.get('CommonPrefixes', []))
    found_keys += len(result.get('Contents', []))
    if found_prefixes + found_keys > 2000:
        break # stop iterating here to prevent eternal iteration

However, I don't believe this is or should be the expected behavior of the boto3 paginator. If it is the expected behavior, then the paginator docs need to be updated to warn about it.
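A variant of the workaround above that enforces the cap exactly, by calling list_objects_v2 directly with ContinuationToken and truncating each page against a combined budget. This is a sketch (the bucket name is a placeholder, and the capping helper is my own, not a boto3 API):

```python
def cap_page(contents, prefixes, remaining):
    """Truncate a page's keys and prefixes so their combined count never
    exceeds the remaining budget. Returns (contents, prefixes, remaining)."""
    prefixes = prefixes[:remaining]
    remaining -= len(prefixes)
    contents = contents[:remaining]
    remaining -= len(contents)
    return contents, prefixes, remaining

def list_capped(bucket, delimiter='/', max_items=2000):
    """List at most max_items keys + prefixes, then stop making calls."""
    import boto3  # imported here so cap_page is usable without boto3 installed
    s3 = boto3.client('s3')
    remaining = max_items
    kwargs = {'Bucket': bucket, 'Delimiter': delimiter}
    while remaining > 0:
        page = s3.list_objects_v2(**kwargs)
        contents, prefixes, remaining = cap_page(
            page.get('Contents', []), page.get('CommonPrefixes', []), remaining)
        for p in prefixes:
            print("prefix {}".format(p['Prefix']))
        for k in contents:
            print("key {}".format(k['Key']))
        if not page.get('IsTruncated'):
            break  # no more pages on the server side
        kwargs['ContinuationToken'] = page['NextContinuationToken']
```

Unlike the break-after-overshoot version, this never prints more than max_items entries, and it stops issuing API calls as soon as the budget is spent.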

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Apr 13, 2020
@swetashre swetashre assigned kdaily and unassigned swetashre Mar 25, 2021
@shepazon
Contributor

shepazon commented Apr 1, 2021

While I agree with @bsmedberg-xometry (hey man, what's up? 😁) that this behavior seems really wrong (what good is the MaxItems option if the thing will just ignore it?), I am looking into what I can do to the documentation to explain this if in fact it's truly intended to work this way.

That said, I think it would be very helpful to be able to explain why it works this way. Can someone explain that to me, so I can write this up properly?

@bsmedberg-xometry
Author

Or even better: add a behavior to the S3 listobjects API call that actually does the right thing. I'm happy for this to turn into an S3 feature request instead of a boto problem if the problem is actually that S3 doesn't provide a logical pagination API.

@shepazon
Contributor

@kdaily LMK whether this should be documented as per the current behavior or if this is going to be fixed or otherwise addressed in the code, so I can slot in updating the documentation.

@kdaily kdaily added bug This issue is a confirmed bug. documentation This is a problem with documentation. and removed bug This issue is a confirmed bug. labels Jun 9, 2021
@github-actions

github-actions bot commented Jun 9, 2022

Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 9, 2022
@bsmedberg-xometry
Author

I believe that this issue is still valid. Neither of the docs at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#customizing-page-iterators or https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2 document this behavior, and there is still no way to limit the pagination to both prefixes and keys.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 10, 2022
@aBurmeseDev aBurmeseDev added the p2 This is a standard priority issue label Nov 11, 2022
@tim-finnigan tim-finnigan added feature-request This issue requests a feature. pagination labels Nov 21, 2022
@shepazon
Contributor

@nateprewitt: please note whether this is something that needs to be addressed in boto3, or if it's a documentation issue and how it should be handled.

@faruk-imerit

Hi there, issue still persists. Is there any plan for resolution?

@shepazon
Contributor

I am collecting the information needed to update the documentation, but it is taking time: I want to be sure the writeup reflects the actual behavior rather than a best guess at what everything means. Getting this resolved is on my current list of tasks, and has been for a few weeks. It has not been forgotten!

@shepazon
Contributor

@bsmedberg-xometry I just read through this again -- you say that the paginator loops infinitely? I can't imagine why that would be expected behavior. Can someone please confirm this again for me? I can't reproduce that. Instead, I get 20,000 CommonPrefixes and 1 item found and placed in Contents, then the paginator exits.

@shepazon
Contributor

shepazon commented Apr 2, 2024

I have completed the documentation update for botocore, and that PR is waiting to be merged. I'm now starting work on the same changes to the boto3 guide.

@shepazon
Contributor

shepazon commented Apr 3, 2024

The PR for the boto3 guide changes is also now complete.
