Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for type hints #439

Open
roeekl opened this issue Feb 12, 2020 · 20 comments · May be fixed by #695
Open

Support for type hints #439

roeekl opened this issue Feb 12, 2020 · 20 comments · May be fixed by #695

Comments

@roeekl
Copy link

roeekl commented Feb 12, 2020

It would be nice to add a support for type hints. I know most of the functions have str parameters and returning Dict but it still can help in other functions like prompt_for_user_token.

@stephanebruckert
Copy link
Member

Hey @roeekl, looks like this is new in python3.5. If there is a way to add it while keeping compatibility with older python versions, we would definitely merge such PR!

@IdmFoundInHim
Copy link
Contributor

IdmFoundInHim commented Apr 11, 2021

@stephanebruckert What is the minimum version of Python that spotipy 3 will need to support (considering 3.5 is already past EoL and the 3.6 EoL will probably pass before spotipy 3 comes out)?

3.7 introduced from __future__ import annotations, which helps with type hinting and may allow some post-3.7 improvements to be incorporated without breakage

@stephanebruckert
Copy link
Member

stephanebruckert commented Apr 11, 2021

@IdmFoundInHim I think we should choose what's easy for us. Developers should always think of upgrading their own version of python3. Starting at 3.7 sounds good to me?

@akathorn
Copy link

I use pylance strict type-checking in my project and the lack of type hints in Spotipy was irking me, so I started playing around with my own .pyi stubs using stubgen as a starting point.
I have annotated things lazily as I needed them, and soon I ran into the problem of representing the type of the JSON return values.
After playing around for a bit I realized that since Spotify responses have a fixed schema, they can be represented as TypedDicts with really nice results. I scrapped Spotify's documentation and generated a TypedDict for each return type. Fun stuff.

I have a small example of the type-checking here.

My favorite thing about this approach is that vscode (and certainly other tools) autocompletes the fields of the response:
Capture
Which means that you don't have to go to the Spotify docs or examine the dictionary at runtime to figure out what fields are inside, and what types they contain.
At runtime, the behavior is totally unmodified and they are still dicts. The new code is only used for type-checking.

Point is: if there is interest in this approach, I would be willing to go further with it and contribute it :)

@Peter-Schorn
Copy link
Contributor

@akathorn Open a pull request on the v3 branch with these changes. Type hints are an incredibly useful feature.

@akathorn akathorn linked a pull request Jun 23, 2021 that will close this issue
@stephanebruckert stephanebruckert linked a pull request Jun 26, 2021 that will close this issue
@akathorn
Copy link

akathorn commented Jul 1, 2021

I ran into an issue when representing Spotify JSON objects using TypedDict.
I didn't realize at first that some keys will be missing in certain circumstances. For instance, the key "restrictions" of AlbumObject is only present if there are restrictions for the album. This can't be represented with TypedDict yet, where it is assumed that all keys are always present.

Since there is more complexity in representing Spotify JSON objects, I'm going to split this work in two and send a simpler pull request annotating just the function parameters. I'll leave the return types as Dict[str, Any], which is not very expressive but better than nothing.

The current pull request will be for the more complex work of representing Spotify objects in the type system.

Regarding Spotify objects

As I mentioned, the problem right now is that keys in a TypeDict are always assumed to exist. This means that in here:

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
album = sp.album("4aawyAB9vmqN3uQ7FjRGTy")
restrictions = album["restrictions"]

The type checker will report that the type of restrictions is AlbumRestriction, which is misleading since during runtime this might end with a KeyError.

I'll think about it, but at this point I see two options:

  1. Warn the user that it might be necessary to double-check for the presence of certain keys. Of course this is tricky.
  2. PEP 655 solves this exact problem, so an option is to just wait until they implement it in typing_extensions.

@Peter-Schorn
Copy link
Contributor

Peter-Schorn commented Jul 2, 2021

This can't be represented with TypedDict yet, where it is assumed that all keys are always present.

Actually, there is a total parameter that allows you to control whether all keys must be present. See here. There isn't a simple way to mark individual keys as optional, but I think this solution is better than just using Dict[str, Any].

We could also take a completely different approach to the spotify objects: We could decode them into data classes instead of using dictionaries. tekore has already done this. I admit I like this idea more.

@IdmFoundInHim
Copy link
Contributor

Implementing dataclasses is a much bigger task than TypedDict: it would change how the data is accessed at runtime, breaking almost all spotipy-dependent code. It would also add the overhead of creating instances of a class every time a request is made. That could be justified if we leveraged the features of classes, e.g. abstracting paginated objects, but that would involve rewriting most of client.py. In short, it goes far beyond static type checking.

I would rather us not implement data from spotipy in a way that forces other projects to use an abundance of classes. I think we implement as total = false for now, and switch to NotRequired[...] as soon as it doesn't break anything. (It might be doable with the future import already.) Unfortunately, it may be difficult to deduce from the API documentation which values are sometimes absent.

@akathorn
Copy link

akathorn commented Jul 2, 2021

I thought about total=False, but it has the opposite problem in that all keys are assumed to be possibly missing. This means that to appease the gods of type-checking you must (as far as I know) write code like this:

if "name" in album:
    name = album["name"]

Or alternatively name = album.get("name", "default"). Which means that the typing scheme would encourage you to go against "ask for forgiveness, not for permission".

I personally think this makes using total=False a bit awkward. It is wonderful for autocomplete, but the type-checker is going to be complaining constantly, forcing to either ignore it or write defensive code that will be unnecessary once we can use NotRequired[...].

On the other hand, with Dict[str, Any] you lose a lot (A LOT) of information and safety, but the type-checker won't complain as easily:

restriction = album["restrictions"]  # Type: Any
print("things are restricted! Because: " + restriction["reason"])  # Type-checks

This is basically saying "I don't have a clue about what is inside", while using total=False is misleading since it says "this key might or might not be inside this dict", when in fact it is often safe to assume it is.

My point is that I would prefer a solution that gives very little information to one that provides somewhat-precise information, until NotRequired[...] is implemented. If going for TypeDict, I would prefer to have it with total=True. I think it is a better tradeoff between good type-checking and runtime safety.
I leave this decision on your hands though, I'm happy to implement this with TypeDict (either total or not total) right now if you think it is the best path.

@IdmFoundInHim
Copy link
Contributor

How about we use the workaround mentioned in the PEP's Motivation section? That is, use two TypedDict, one with the required keys and another with the not-required keys, liked through inheritance. When the ideal solution is supported, it should be relatively easy to programatically upgrade to the new syntax.

@akathorn
Copy link

akathorn commented Jul 2, 2021

*facepalm* 🤦🏻‍♂️ I totally missed this. This will certainly work, I'm gonna give it a go :)

@Peter-Schorn
Copy link
Contributor

By the way, I've created a Spotify web API library in Swift which has a struct for each object in the Spotify object model. Swift requires you to explicitly mark which properties might be None, so you can use these types as a guide for which keys can be missing.

If you're not familiar with swift, here's a quick guide on how to read the syntax:

public struct Artist: Hashable {
    
    let name: String
    
    // The '?' indicates this could be `None`
    let uri: String?
    
    let id: String?
 
    // Either an array of `SpotifyImage`, or `None`
    let images: [SpotifyImage]?
    
    let popularity: Int?
    
    // Either a dictionary of with keys of type `String` and values
    // of type `URL`, or `None`.
    let externalURLs: [String: URL]?
    
    let followers: Followers?
    
    let genres: [String]?
    
    let href: URL?
    
    let type: IDCategory
    
}

@akathorn
Copy link

akathorn commented Jul 7, 2021

Thank you, this is going to be useful.

Am I understanding it correctly, that the JSON decoder uses nil to represent both missing keys and null values? Since those need to be represented differently in Python, I'm still at a loss in some cases. Sometimes it is clear in the official docs, but not always. For instance, the releaseDate of an Album appears as optional in the Swift library, but the official docs don't say anything about it being missing/null.

Any suggestions on how to figure out what could be missing and what could be null? The only thing I can think of is checking out other libraries out there, but I imagine it could be quite common that they abstract missing keys and null values in a similar way. Tekore seems to be doing it too, as far as I can see.

@Peter-Schorn
Copy link
Contributor

Am I understanding it correctly, that the JSON decoder uses nil to represent both missing keys and null values?

Yes, that is correct. However, I will look into ways of distinguishing between these cases.

For instance, the releaseDate of an Album appears as optional in the Swift library, but the official docs don't say anything about it being missing/null.

Yes, it's extremely frustrating. Apparently, the developers were too lazy to document each property that might be null. I had to write extensive unit tests in order to manually determine which properties might be null or missing.

@akathorn
Copy link

akathorn commented Jul 8, 2021

Can I take a look at those tests? I wrote some code to automatize comparing a response with the expected TypedDict type. It detects the difference between null and missing, but of course the problem is being exhaustive. Perhaps I can programmatically adapt some of your test cases to use them with my code.

@Peter-Schorn
Copy link
Contributor

The tests are here, in the same repository. However, they are unlikely to be useful to you as they are also written in Swift. Essentially, they involve making calls to each of the API endpoints and ensuring the response is successfully decoded into the expected type. They aren't going to help you distinguish between null values and missing keys because once the JSON data is decoded into a Swift struct, that distinction is lost.

@akathorn
Copy link

akathorn commented Jul 9, 2021

I'm hoping for good examples of calls that return missing keys or null values. Right now I have tests such as:

def test_track_urn(self):
    track = self.spotify.track(self.creep_urn)
    self.assertType(track, self.spotify.track)

assertType checks the response against the method's signature, and reports any differences:

<Track>
  {
    required [dict]:
      * Missing required keys: {'linked_from'}
      {
        preview_url [primitive]:
          * None: expected <class 'str'> but got <class 'NoneType'>
      }
  }

Which shows that the key linked_from can be missing in the response, and preview_url can be null. These tests are quick to write since you only need to know the arguments for the call, the rest is automated.

My idea is finding a lot of examples, trying to be as exhaustive as possible, and correct the types until they fit every known case. There would certainly be a lot of false negatives because of edge cases and outliers for which there are no tests. But if we can't find an alternative, maybe fitting most cases would be good enough?
For the remaining keys a safe fallback could be to to set them both as not required and optional, which would mean having to do two checks but would ensure runtime safety.

@Peter-Schorn
Copy link
Contributor

Peter-Schorn commented Jul 16, 2021

Writing extensive unit tests is the only reliable way to validate the type hints. That's what I did for my library. Here's one tip: Make sure you run tests with local tracks. These tracks contain lots of missing data. They may not even have URIs! You have to add them to a playlist and then make a request for that playlist. Here's one of my playlists with local tracks:

https://open.spotify.com/playlist/0ijeB2eFmJL1euREk6Wu6C

@leleogere
Copy link

Having typed objects for each possible output Spotify API is able to return would greatly improve development workflow in IDEs.

@akathorn
Copy link

I didn't wrap this up in the end, in great part due to a chronic illness.
But anyway this inconsistency between null values and missing keys was really annoying me. I don't want to give wrong annotations because they give a false sense of safety, and testing all edge-cases seemed rather daunting to me.
I could try, but I don't know if I'll find the time and energy to finish this up anytime soon. I wrote a "type-checker" to automatically test annotations against the Spotify endpoints (using reflection), maybe if I can clean that up someone could eventually continue testing endpoints and tweaking the annotations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants