Fully implement PEP 527 #6792

Closed
di opened this issue Oct 8, 2019 · 13 comments · Fixed by #7529

Comments

@di
Member

di commented Oct 8, 2019

We still retain the ability for some projects to upload the legacy filetypes listed in PEP 527:

https://github.com/pypa/warehouse/blob/aa0d54019c322b52cb6428780808816d417abbd1/warehouse/packaging/models.py#L124

https://github.com/pypa/warehouse/blob/e1d0e4e41738fd07a0edeb77f95fedba0fdd41f8/warehouse/forklift/legacy.py#L463-L480

https://github.com/pypa/warehouse/blob/e1d0e4e41738fd07a0edeb77f95fedba0fdd41f8/warehouse/forklift/legacy.py#L1167-L1174

We should audit which projects currently have this ability, and whether they are still publishing deprecated filetypes. For example, Pillow is no longer publishing bdist_wininst files.
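
To make the current behaviour concrete, here is a minimal sketch of the kind of gate being described. It is not Warehouse's actual code: the function name `upload_allowed` is hypothetical, and only the five package types from PEP 527 and the per-project `allow_legacy_files` flag come from this thread.

```python
# Hypothetical sketch; not Warehouse's actual upload code.
LEGACY_PACKAGETYPES = frozenset(
    {"bdist_dmg", "bdist_dumb", "bdist_msi", "bdist_rpm", "bdist_wininst"}
)


def upload_allowed(packagetype: str, allow_legacy_files: bool) -> bool:
    """Return True if a file of this type may be uploaded to the project."""
    if packagetype in LEGACY_PACKAGETYPES:
        # Legacy types are accepted only for projects that kept the flag.
        return allow_legacy_files
    return True


print(upload_allowed("bdist_wininst", allow_legacy_files=True))
print(upload_allowed("bdist_wininst", allow_legacy_files=False))
print(upload_allowed("bdist_wheel", allow_legacy_files=False))
```

Fully implementing the PEP would amount to the legacy branch always rejecting, at which point the flag is no longer needed.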

@pradyunsg
Contributor

pradyunsg commented Oct 9, 2019

IMO we should also push these projects to drop the legacy formats and, if they won't, at least understand whether the reason is "the current toolchain doesn't satisfy the workflows the legacy formats did", "oh, we don't need it", or something else entirely.

@hugovk
Contributor

hugovk commented Oct 9, 2019

How long is the list?

If it's not too long, let's create issues at these projects; it's likely they're not aware of the deprecation. I'd expect many are already using wheels and can ditch the legacy formats.

For example with Pillow, I didn't know they were deprecated until it came up at https://discuss.python.org/t/deprecate-bdist-wininst/1929/12?u=hugovk, and we already distribute wheels so it was "oh, we don't need it".

And how about adding a deprecation warning to Twine when uploading them?
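
A rough sketch of what such a client-side warning could look like, assuming the file type is guessed from the filename extension. The mapping and the `check_legacy` helper are hypothetical and are not taken from Twine; `bdist_dumb` is left out because its `.tar.gz`/`.zip` archives cannot be told apart from sdists by extension alone.

```python
import warnings
from typing import Optional

# Hypothetical extension-to-type mapping for illustration; not Twine's own table.
LEGACY_EXTENSIONS = {
    ".dmg": "bdist_dmg",
    ".msi": "bdist_msi",
    ".rpm": "bdist_rpm",
    ".exe": "bdist_wininst",
}


def check_legacy(filename: str) -> Optional[str]:
    """Warn and return the legacy package type if the filename looks like one."""
    lowered = filename.lower()
    for ext, packagetype in LEGACY_EXTENSIONS.items():
        if lowered.endswith(ext):
            warnings.warn(
                f"{filename}: {packagetype} uploads are deprecated by PEP 527",
                FutureWarning,
                stacklevel=2,
            )
            return packagetype
    return None


# The first call emits a warning; the second (a made-up wheel name) does not.
check_legacy("GPy-1.9.9.win-amd64-py3.7.exe")
check_legacy("example-1.0-py3-none-any.whl")
```

`FutureWarning` is used here because it is shown to end users by default, unlike `DeprecationWarning`.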

@di
Member Author

di commented Oct 9, 2019

> How long is the list?

IIRC the "list" was any project that had previously uploaded one of these filetypes; essentially, we only blocked new projects.

A better list would be every project that has this ability that has actually published one of these filetypes in the last N months.

I'm guessing that this list is short enough that adding a deprecation notice to Twine would be unnecessary, but it's hard to say until we actually do an audit.

@di
Member Author

di commented Oct 28, 2019

There are currently 4,678 projects that have allow_legacy_files set:

warehouse=> select count(*) from projects where allow_legacy_files;
 count
-------
  4678
(1 row)

Recent uploads for individual deprecated filetypes:

bdist_dmg:

warehouse=> select filename, upload_time from release_files where packagetype='bdist_dmg' order by upload_time desc limit 10;
                    filename                    |        upload_time
------------------------------------------------+----------------------------
 python_igraph-0.7.1.post6-py2.7-macosx10.9.dmg | 2015-06-05 20:58:15.702734
 python_igraph-0.7.1.post6-py2.6-macosx10.9.dmg | 2015-06-05 20:56:47.202244
 python_igraph-0.7.1_4-py2.7-macosx10.9.dmg     | 2015-03-05 21:11:20.376493
 python_igraph-0.7.1_4-py2.6-macosx10.9.dmg     | 2015-03-05 21:10:22.376507
 python_igraph-0.7.1_3-py2.7-macosx10.9.dmg     | 2015-03-05 20:30:28.362479
 python_igraph-0.7.1_3-py2.6-macosx10.9.dmg     | 2015-03-05 20:28:18.772667
 python_igraph-0.7.1_2-py2.7-macosx10.9.dmg     | 2015-02-10 20:29:57.860577
 python_igraph-0.7.1_2-py2.6-macosx10.9.dmg     | 2015-02-10 20:12:25.660451
 python_igraph-0.7.1_1-py2.7-macosx10.9.dmg     | 2015-02-10 07:56:59.793387
 python_igraph-0.7.1_1-py2.6-macosx10.9.dmg     | 2015-02-09 20:48:42.300196
(10 rows)

bdist_dumb:

warehouse=> select filename, upload_time from release_files where packagetype='bdist_dumb' order by upload_time desc limit 10;
                    filename                     |        upload_time
-------------------------------------------------+----------------------------
 airspeed-0.5.13.macosx-10.14-x86_64.tar.gz      | 2019-10-22 00:49:01.646779
 py_nifty_cloud-0.9.5.macosx-10.14-x86_64.tar.gz | 2019-09-28 14:32:32.022959
 algorithmia-1.2.0.linux-x86_64.tar.gz           | 2019-08-02 19:12:17.972642
 htrc-0.1.51.macosx-10.7-x86_64.tar.gz           | 2019-07-30 14:19:25.724787
 htrc-0.1.51b1.macosx-10.7-x86_64.tar.gz         | 2019-07-24 17:42:03.053786
 htrc-0.1.51b0.macosx-10.7-x86_64.tar.gz         | 2019-07-24 15:36:59.304947
 airspeed-0.5.12.macosx-10.14-x86_64.tar.gz      | 2019-07-24 06:32:12.329112
 pysodium-0.7.2.linux-x86_64.tar.gz              | 2019-06-25 14:30:59.086794
 htrc-0.1.50.macosx-10.7-x86_64.tar.gz           | 2019-06-21 14:43:12.68958
 htrc-0.1.50b0.macosx-10.7-x86_64.tar.gz         | 2019-06-20 17:23:49.411197
(10 rows)

bdist_msi:

warehouse=> select filename, upload_time from release_files where packagetype='bdist_msi' order by upload_time desc limit 10;
              filename               |        upload_time
-------------------------------------+----------------------------
 pywincffi-0.5.0.win32-py3.6.msi     | 2017-11-18 18:55:27.694295
 pywincffi-0.5.0.win32-py3.5.msi     | 2017-11-18 18:55:26.221545
 pywincffi-0.5.0.win32-py3.4.msi     | 2017-11-18 18:55:25.067028
 pywincffi-0.5.0.win32-py3.3.msi     | 2017-11-18 18:55:23.828515
 pywincffi-0.5.0.win32-py2.7.msi     | 2017-11-18 18:55:22.319013
 pywincffi-0.5.0.win-amd64-py3.6.msi | 2017-11-18 18:55:21.128075
 pywincffi-0.5.0.win-amd64-py3.5.msi | 2017-11-18 18:55:20.016371
 pywincffi-0.5.0.win-amd64-py3.4.msi | 2017-11-18 18:55:18.597794
 pywincffi-0.5.0.win-amd64-py3.3.msi | 2017-11-18 18:55:17.357411
 pywincffi-0.5.0.win-amd64-py2.7.msi | 2017-11-18 18:55:16.060902
(10 rows)

bdist_rpm:

warehouse=> select filename, upload_time from release_files where packagetype='bdist_rpm' order by upload_time desc limit 10;
              filename               |        upload_time
-------------------------------------+----------------------------
 Aglyph-3.0.0-1.noarch.rpm           | 2018-03-16 02:39:14.304743
 Aglyph-3.0.0-1.src.rpm              | 2018-03-16 02:39:08.195363
 toughradius-5.0.0.6-1.noarch.rpm    | 2017-11-19 04:45:14.203082
 toughradius-5.0.0.6-1.src.rpm       | 2017-11-19 04:45:09.635322
 python-otopi-mdp-0.2.2-1.noarch.rpm | 2017-10-02 10:23:32.79819
 python-otopi-mdp-0.2.2-1.src.rpm    | 2017-10-02 10:23:23.707941
 toughradius-5.0.0.5-1.noarch.rpm    | 2017-08-20 07:48:30.370971
 toughradius-5.0.0.5-1.src.rpm       | 2017-08-20 07:48:26.664831
 cx_Oracle-6.0rc1-py35-1.x86_64.rpm  | 2017-06-17 00:14:38.838786
 cx_Oracle-6.0rc1-py27-1.x86_64.rpm  | 2017-06-17 00:14:20.067593
(10 rows)

bdist_wininst:

warehouse=> select filename, upload_time from release_files where packagetype='bdist_wininst' order by upload_time desc limit 10;
                filename                 |        upload_time
-----------------------------------------+----------------------------
 GPy-1.9.9.win-amd64-py3.7.exe           | 2019-10-17 08:37:05.494866
 GPy-1.9.9.win-amd64-py3.6.exe           | 2019-10-17 08:28:54.341581
 GPy-1.9.9.win-amd64-py3.5.exe           | 2019-10-17 08:10:01.336052
 GPy-1.9.9.win-amd64-py2.7.exe           | 2019-10-17 08:01:52.12953
 Trac-1.0.19.win-amd64.exe               | 2019-10-15 00:36:35.161141
 Trac-1.0.19.win32.exe                   | 2019-10-15 00:36:30.520495
 xrayutilities-1.5.3.win-amd64-py3.7.exe | 2019-10-09 10:12:48.375835
 xrayutilities-1.5.3.win-amd64-py3.6.exe | 2019-10-09 10:12:44.698438
 xrayutilities-1.5.3.win-amd64-py3.5.exe | 2019-10-09 10:12:41.362779
 xrayutilities-1.5.3.win-amd64-py2.7.exe | 2019-10-09 10:12:38.179182
(10 rows)

Of these it looks like bdist_dmg, bdist_msi, and bdist_rpm can just be shut off.

The bdist_wininst filetype is still getting a lot of uploads, but PEP 527 says that this is misleading:

> It's quite easy to look at the low usage of bdist_dmg and bdist_msi and conclude that removing them will be fairly low impact, however bdist_wininst has several orders of magnitude more usage. This is somewhat misleading though, because although it has more people uploading those files the actual usage of those uploaded files is fairly low. Taking a look at the previous 30 days, we can see that 90% of all downloads of bdist_wininst files from PyPI were generated by the mirroring infrastructure and 7% of them were generated by setuptools (which can currently be better covered by bdist_egg files).

Also bdist_dumb is still getting the occasional upload, but these projects would probably be better served by uploading wheels if they want platform-specific built distributions.
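
The "can just be shut off" call can be checked mechanically from the most recent upload per type in the query results above. A small sketch, assuming a 365-day window as an illustrative threshold and truncating the timestamps to whole seconds:

```python
from datetime import datetime, timedelta

# Most recent upload per legacy type, copied from the query results above.
latest_upload = {
    "bdist_dmg": datetime(2015, 6, 5, 20, 58, 15),
    "bdist_dumb": datetime(2019, 10, 22, 0, 49, 1),
    "bdist_msi": datetime(2017, 11, 18, 18, 55, 27),
    "bdist_rpm": datetime(2018, 3, 16, 2, 39, 14),
    "bdist_wininst": datetime(2019, 10, 17, 8, 37, 5),
}

as_of = datetime(2019, 10, 28)  # date of the comment above
window = timedelta(days=365)    # illustrative threshold (an assumption)

# Types with no upload inside the window.
inactive = sorted(t for t, ts in latest_upload.items() if as_of - ts > window)
print(inactive)  # ['bdist_dmg', 'bdist_msi', 'bdist_rpm']
```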

@hugovk
Contributor

hugovk commented Nov 5, 2019

Thanks. From the PEP's removal process:

> Finally, an email will be generated to the maintainers of all projects still given the legacy flag, which will inform them of the upcoming new restrictions on uploads and tell them that these restrictions will be applied to future uploads to their projects starting in 1 month. Finally, after 1 month all projects will have the legacy file type flag removed, and support for uploading these types of files will cease to exist on PyPI.

Would now be a good time to send out the email?

And should it go to just the bdist_dmg, bdist_msi, and bdist_rpm users first, or to all legacy format users?

@pradyunsg
Contributor

ooooh! Nice find @hugovk! ^>^

I'm in favor of dropping all legacy formats if the PEP has a clear mechanism to do so.

@di
Member Author

di commented Nov 5, 2019

I don't really think it's necessary to email all 4,678 projects when only a small fraction have actually used their legacy flag recently. Perhaps we should set a timeframe instead: if they've uploaded a deprecated distribution type in the last year?

@hugovk
Contributor

hugovk commented Nov 5, 2019

Yes, that sounds reasonable.

@pradyunsg
Contributor

Gentle nudge on this, given https://discuss.python.org/t/3115.

@di
Member Author

di commented Feb 3, 2020

OK, next steps would be:

  1. querying to get the subset of the 4,678 projects that have allow_legacy_files set which have had an upload in the last year
  2. querying for email addresses for all their maintainers
  3. drafting the email
  4. sending a bulk email

Would anyone like to help with step 3 (drafting the email)?

@hugovk
Contributor

hugovk commented Feb 3, 2020

For step 3, something along the lines of this?

Hello,

We're emailing because you're listed as the maintainer for a package that has uploaded a legacy file type to PyPI in the past year:

bdist_dmg
bdist_dumb
bdist_msi
bdist_rpm
bdist_wininst

Following PEP 527, it will soon not be possible to upload legacy file types. Existing uploads will remain on PyPI, but soon new ones cannot be uploaded.

https://www.python.org/dev/peps/pep-0527/

This restriction will apply to new uploads after 2020-04-01 [TODO decide exact date, must be at least 1 month from email date].

See PEP 527 for suggestions of replacement file types, and if you have any questions, please visit #6792 [TODO or https://discuss.python.org/somewhere or somewhere else?].

Thank you,

[TODO]

@di
Member Author

di commented Mar 12, 2020

OK, I've sent the notices to everyone that's uploaded one of these packages in the last year. The shutoff date is 30 days from today (2020-04-12).

For posterity, here's the SQL script I used to generate the affected users/projects:

SELECT 
  user_id, 
  projects.name as project_name, 
  packagetype 
FROM 
  (
    SELECT 
      roles.user_id as user_id, 
      roles.project_id as project_id, 
      packagetype 
    FROM 
      (
        SELECT 
          project_id, 
          packagetype 
        FROM 
          (
            SELECT 
              release_id, 
              packagetype 
            FROM 
              release_files 
            WHERE 
              (
                packagetype IN (
                  'bdist_dmg', 'bdist_dumb', 'bdist_msi', 
                  'bdist_rpm', 'bdist_wininst'
                ) 
                AND "upload_time" > (
                  localtimestamp - interval '365 days'
                )
              ) 
            GROUP BY 
              release_id, 
              packagetype
          ) f 
          JOIN releases ON releases.id = f.release_id 
        GROUP BY 
          project_id, 
          packagetype
      ) release 
      JOIN roles ON release.project_id = roles.project_id 
    GROUP BY 
      user_id, 
      roles.project_id, 
      packagetype
  ) p1 
  JOIN projects ON p1.project_id = projects.id;

Ran that like so:

psql service=pypi -t -A -F"," -f pep527.sql > pep527.csv
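
The nested GROUP BY layers in the query above only de-duplicate rows, so a flat `SELECT DISTINCT` over the same joins should return the same (user, project, package type) triples, assuming project names are unique. The sketch below checks that shape against an in-memory SQLite database; the table layout is inferred from the query, the sample rows are made up, and the cutoff is passed as a parameter in place of the PostgreSQL `localtimestamp - interval '365 days'` expression.

```python
import sqlite3

FLAT_QUERY = """
SELECT DISTINCT
  roles.user_id,
  projects.name AS project_name,
  release_files.packagetype
FROM release_files
JOIN releases ON releases.id = release_files.release_id
JOIN roles ON roles.project_id = releases.project_id
JOIN projects ON projects.id = releases.project_id
WHERE release_files.packagetype IN
      ('bdist_dmg', 'bdist_dumb', 'bdist_msi', 'bdist_rpm', 'bdist_wininst')
  AND release_files.upload_time > ?
ORDER BY 1, 2, 3
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE releases (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE release_files (release_id INTEGER, packagetype TEXT, upload_time TEXT);
CREATE TABLE roles (user_id TEXT, project_id INTEGER);
INSERT INTO projects VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO releases VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO release_files VALUES
  (10, 'bdist_wininst', '2019-10-01 00:00:00'),
  (11, 'bdist_wininst', '2019-11-01 00:00:00'),  -- same project and type: collapses
  (11, 'bdist_wheel',   '2019-11-01 00:00:00'),  -- not a legacy type
  (20, 'bdist_rpm',     '2017-01-01 00:00:00');  -- older than the cutoff
INSERT INTO roles VALUES ('u1', 1), ('u2', 1), ('u2', 2);
""")

# ISO-formatted timestamps compare correctly as strings in SQLite.
rows = conn.execute(FLAT_QUERY, ("2019-03-12 00:00:00",)).fetchall()
print(rows)
```

Each user with a role on project `alpha` appears once, even though two releases carried the same legacy type; project `beta` is excluded because its only legacy upload predates the cutoff.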

Then used the following script to turn that output into a CSV of mass emails:

import csv
from collections import defaultdict

users = defaultdict(list)
subject = "[PyPI] Notice: Deprecation of underused file types/extensions"
body_template = """
Hello,

We're emailing because you're listed as a maintainer or owner for a package that has uploaded a legacy file type to PyPI in the past year:

{project_list}

Following PEP 527, it will soon not be possible to upload legacy file types.

https://www.python.org/dev/peps/pep-0527/

This restriction will apply to new uploads after 30 days from today (2020-04-12). Existing uploads will remain on PyPI, but soon new ones cannot be uploaded.

See PEP 527 for suggestions of replacement file types, and if you have any questions, please comment on the tracking issue for this deprecation:

https://github.com/pypa/warehouse/issues/6792

Thank you,
The PyPI Administrators
"""

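# psql's -t flag omits the header row, so the column names are given explicitly.
with open("pep527.csv", newline="") as f:
    reader = csv.DictReader(f, fieldnames=["user_id", "project_name", "packagetype"])

    for row in reader:
        users[row["user_id"]].append((row["project_name"], row["packagetype"]))

# newline="" lets the csv module control line endings in the multi-line bodies.
with open("pep527-complete.csv", "w", newline="") as f: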
    writer = csv.DictWriter(f, fieldnames=["user_id", "subject", "body_text"])
    writer.writeheader()

    for user_id, projects in users.items():
        project_list = "\n".join(
            f"* Project: {project_name}, package type: {packagetype}"
            for project_name, packagetype in projects
        )

        writer.writerow(
            {
                "user_id": user_id,
                "subject": subject,
                "body_text": body_template.format(project_list=project_list),
            }
        )
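
One detail worth checking when reusing this: `psql -t` prints tuples only, with no header line, so `csv.DictReader` needs explicit `fieldnames` or it will consume the first data row as the header. A minimal sketch of the grouping step on in-memory sample input; the user IDs are made up, and the project/type pairs are borrowed from the query results earlier in the thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of headerless `psql -t -A -F","` output.
sample = io.StringIO(
    "u1,GPy,bdist_wininst\n"
    "u1,Trac,bdist_wininst\n"
    "u2,airspeed,bdist_dumb\n"
)

users = defaultdict(list)
reader = csv.DictReader(
    sample, fieldnames=["user_id", "project_name", "packagetype"]
)
for row in reader:
    users[row["user_id"]].append((row["project_name"], row["packagetype"]))

# One bullet list per user, ready to be substituted into the email body.
for user_id, projects in users.items():
    project_list = "\n".join(
        f"* Project: {name}, package type: {ptype}" for name, ptype in projects
    )
    print(user_id)
    print(project_list)
```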
