Add VisionKit bindings #592

Open
AuroraWright opened this issue Feb 10, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@AuroraWright

Is your feature request related to a problem? Please describe.
The VisionKit APIs seem to be more actively supported. For example, starting in Sonoma the text recognition supports vertical text for CJK languages (Japanese, Chinese, Korean), which is not yet supported in Vision.

Describe the solution you'd like
VisionKit bindings that can be used on macOS 13.0+.

Describe alternatives you've considered
There's no real alternative right now other than using the less frequently updated Vision API or invoking external command-line tools.

Additional context
The docs state that VisionKit "is only available in Catalyst", but that doesn't seem to be the case (anymore?) from Ventura onwards. There are apps using the new APIs on macOS (e.g. https://github.com/Shakshi3104/LiTeX; TextSniper also seems to use it, according to a friend's reverse engineering). Apple's API docs also claim it's available on macOS: https://developer.apple.com/documentation/visionkit/imageanalyzer

@AuroraWright AuroraWright added the enhancement New feature or request label Feb 10, 2024
@ronaldoussoren
Owner

The documentation claims that it is available on macOS 13, but ...

  • The headers included in the SDK (Xcode 15.3 beta 2) are empty
  • The Objective-C classes documented on the website are only available on iOS or through Mac Catalyst.
  • The class you link to is a Swift class that cannot be used in Objective-C

Currently PyObjC can only be used with interfaces that can be used in Objective-C code. It might be possible to expose Swift frameworks as well, but this likely requires significant engineering to design and implement.

@AuroraWright
Author

AuroraWright commented Feb 10, 2024

@ronaldoussoren right, sorry for not realizing and wasting your time!

On another note (this is probably not something to officially support/implement, I guess): I noticed there's an underlying Objective-C implementation for the stuff I need in VisionKit. It's not documented, but WebKit does use it directly.
I think I got most of the way there. This code seems to work up to the processRequest bit, where I'm not sure how to properly do the registerMetaDataForSelector (I don't really know much about Objective-C):

import Cocoa
import objc

ns_image = Cocoa.NSImage.alloc().initWithContentsOfFile_("/Users/aurora/Downloads/tg_image_1633323779.jpeg")

objc.loadBundle('VisionKit', globals(), '/System/Library/Frameworks/VisionKit.framework')
req=VKImageAnalyzerRequest.alloc().initWithImage_requestType_(ns_image, 1)
req.setLocales_('ja-JA')
objc.registerMetaDataForSelector(
        b"VKImageAnalyzer",
        b"processRequest:updateHandler:completionHandler:",
        {
            "arguments": {
                4: {
                    "callable": {
                        "retval": {"type": b"v"},
                        "arguments": {
                            0: {"type": b"^v"},
                            1: {"type": b"@"},
                            2: {"type": b"@"},
                            3: {"type": b"@"},
                        },
                    }
                }
            }
        },
)
analyzer=VKImageAnalyzer.alloc().init()

def update(self, progress:float):
    pass

def process(self, analysis:VKImageAnalysis):
    pass

analyzer.processRequest_updateHandler_completionHandler_(req, update, process)

According to WebKit source processRequest is defined like this:
(VKImageAnalysisRequestID)processRequest:(VKImageAnalyzerRequest *)request progressHandler:(void (^_Nullable)(double progress))progressHandler completionHandler:(void (^)(VKImageAnalysis *_Nullable analysis, NSError *_Nullable error))completionHandler;

How should I define it in registerMetaDataForSelector?

@ronaldoussoren
Owner

> @ronaldoussoren right, sorry for not realizing and wasting your time!

No need to apologise, it wouldn't be the first time that I missed a new API.

You got it almost right, but the method has two arguments that are blocks. Both return "void". The first one has a single argument of type double; the second has two arguments, both of which are Objective-C objects:

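objc.registerMetaDataForSelector(
    b"VKImageAnalyzer",
    # The keyword is progressHandler:, matching the WebKit declaration quoted above.
    b"processRequest:progressHandler:completionHandler:",
    {
        "arguments": {
            # Indices 0 and 1 are the implicit self and _cmd, so the request
            # is argument 2 and the two blocks are arguments 3 and 4.
            3: {
                # progressHandler: void (^)(double progress)
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},  # the block itself
                        1: {"type": b"d"},   # progress
                    },
                }
            },
            4: {
                # completionHandler: void (^)(VKImageAnalysis *, NSError *)
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},  # the block itself
                        1: {"type": b"@"},   # analysis
                        2: {"type": b"@"},   # error
                    },
                }
            },
        }
    },
)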
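
The nested dictionaries are easy to get wrong by hand, so here is a minimal sketch of a helper that builds the same structure. The helper name `block_signature` is hypothetical (it is not part of PyObjC), and the class name `VKCImageAnalyzer` is taken from the working code later in this thread. Only `objc.registerMetaDataForSelector` is a real PyObjC call, and it is guarded so that the dictionary can still be inspected where PyObjC is not installed.

```python
def block_signature(retval, *arg_types):
    """Build the "callable" metadata entry for one block argument.

    Hypothetical helper, not a PyObjC API. Index 0 of a block's own
    argument list is always the block itself (b"^v"); the parameters
    declared in the header start at index 1.
    """
    arguments = {0: {"type": b"^v"}}
    for index, encoding in enumerate(arg_types, start=1):
        arguments[index] = {"type": encoding}
    return {"callable": {"retval": {"type": retval}, "arguments": arguments}}


# Method arguments 0 and 1 are the implicit self and _cmd, so the request
# is argument 2 and the two handlers are arguments 3 and 4.
metadata = {
    "arguments": {
        3: block_signature(b"v", b"d"),         # void (^)(double progress)
        4: block_signature(b"v", b"@", b"@"),   # void (^)(analysis, error)
    }
}

try:
    import objc  # only available with PyObjC on macOS
except ImportError:
    objc = None

if objc is not None:
    objc.registerMetaDataForSelector(
        b"VKCImageAnalyzer",
        b"processRequest:progressHandler:completionHandler:",
        metadata,
    )
```

The type encodings are the standard Objective-C ones: `v` for void, `d` for double, `@` for an object.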

I haven't used the Vision framework myself yet, but it does seem to have some options for recognizing text, see https://developer.apple.com/documentation/vision/vnrecognizetextrequest?language=objc and https://developer.apple.com/documentation/vision/recognizing_text_in_images?language=objc (both have sample code in Swift, but hopefully that has enough context to be clear how to reproduce this in Python)
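
For reference, a rough sketch of how the linked Swift samples might translate to Python with the PyObjC Vision bindings. It assumes the `pyobjc-framework-Vision` package is installed; the function name `recognize_text` and its parameters are made up for this illustration, and it has not been checked against vertical text.

```python
def recognize_text(image_path, languages):
    """Return the top recognized string for each text observation.

    A sketch using the public Vision framework; `recognize_text` is a
    hypothetical helper name, not part of PyObjC.
    """
    # Imported lazily so this module can be imported without PyObjC.
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_(image_path)

    request = Vision.VNRecognizeTextRequest.alloc().init()
    request.setRecognitionLanguages_(list(languages))

    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    # PyObjC returns the NSError out-parameter as a second result.
    ok, error = handler.performRequests_error_([request], None)
    if not ok:
        raise RuntimeError(f"Vision request failed: {error}")

    lines = []
    for observation in request.results() or []:
        candidates = observation.topCandidates_(1)
        if candidates:
            lines.append(candidates[0].string())
    return lines
```

Usage would look like `recognize_text("/path/to/image.jpg", ["ja-JP", "en-US"])`, where the path and language identifiers are placeholders.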

@AuroraWright
Author

AuroraWright commented Feb 11, 2024

Thanks, that did the trick! For what it's worth, I do have a Vision framework option in my OCR program, but since it's for Japanese, where vertical text support is really helpful, I wanted to try getting the VisionKit stuff working too. It seems Apple updated VisionKit with vertical text support in Sonoma, but Vision still doesn't support it. In fact, while in Ventura Vision tried to read vertical text horizontally, in Sonoma it returns an empty array for the results.

This is the working VisionKit code:

import Cocoa
import objc
from PyObjCTools.AppHelper import runConsoleEventLoop, stopEventLoop

ns_image = Cocoa.NSImage.alloc().initWithContentsOfFile_("/Users/aurora/Downloads/Untitled.jpg")

# Load the framework so the VKC* classes become available in globals().
objc.loadBundle('VisionKit', globals(), '/System/Library/Frameworks/VisionKit.framework')

req = VKCImageAnalyzerRequest.alloc().initWithImage_requestType_(ns_image, 1)
req.setLocales_(['ja', 'en'])
analyzer = VKCImageAnalyzer.alloc().init()

# Describe the two block arguments so PyObjC can wrap the Python callables.
objc.registerMetaDataForSelector(
    b"VKCImageAnalyzer",
    b"processRequest:progressHandler:completionHandler:",
    {
        "arguments": {
            3: {
                # progressHandler: void (^)(double progress)
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},
                        1: {"type": b"d"},
                    },
                }
            },
            4: {
                # completionHandler: void (^)(analysis, error)
                "callable": {
                    "retval": {"type": b"v"},
                    "arguments": {
                        0: {"type": b"^v"},
                        1: {"type": b"@"},
                        2: {"type": b"@"},
                    },
                }
            },
        }
    },
)

def update(progress: float):
    pass

def process(analysis: VKCImageAnalysis, error: NSError):
    # The header declares analysis as nullable, so check it before use
    # and always stop the event loop, even on failure.
    if analysis is None:
        print("Analysis failed:", error)
    else:
        for line in analysis.allLines():
            print(line.string())
    stopEventLoop()

analyzer.processRequest_progressHandler_completionHandler_(req, update, process)
runConsoleEventLoop()

The only drawback is that objc.loadBundle() takes a couple of seconds, but I assume there's not much that can be done about that.
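
One thing that might be worth trying for the load time, as an untested sketch: `objc.loadBundle` accepts a `scan_classes` argument, and with `scan_classes=False` the classes are not added to the globals dictionary, so they have to be fetched by name with `objc.lookUpClass`. Whether this actually shortens the delay is an assumption, and the helper name `load_visionkit_classes` is made up for this example.

```python
VISIONKIT_PATH = "/System/Library/Frameworks/VisionKit.framework"


def load_visionkit_classes(*names):
    """Load VisionKit without class scanning and look up classes by name.

    Hypothetical helper; assumes the delay comes from class scanning,
    which has not been measured.
    """
    # Imported lazily so this module can be imported without PyObjC.
    import objc

    objc.loadBundle(
        "VisionKit", {}, bundle_path=VISIONKIT_PATH, scan_classes=False
    )
    return tuple(objc.lookUpClass(name) for name in names)
```

Usage would be along the lines of `VKCImageAnalyzer, VKCImageAnalyzerRequest = load_visionkit_classes("VKCImageAnalyzer", "VKCImageAnalyzerRequest")`. Type annotations that name framework classes (such as `VKCImageAnalysis` or `NSError`) would then need to be dropped or resolved the same way, since those names are no longer placed in globals().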

@ronaldoussoren
Owner

The WebKit SPI header for this: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/PAL/pal/spi/cocoa/VisionKitCoreSPI.h

That appears to use a private framework, see https://github.com/WebKit/WebKit/blob/7cd082919192095d0b017c6e5f7a36a47135bb8c/Source/WebCore/PAL/pal/cocoa/VisionKitCoreSoftLink.mm#L36

Exposing this through PyObjC shouldn't be too hard, but I don't know yet if I'll do so because I don't like exporting private APIs (mostly because those might break between releases of the OS).

The Swift interface for the framework also doesn't look too complicated; with some luck it is possible to expose that to Python. But as said, this does require some engineering because I currently don't interface with Swift frameworks. I don't know when I'll get around to this.
