-
First, let me state that I have indeed RTFM'd and found that the thread-"unsafety" of the […]
I am trying to read the metadata from a large number of image files, on the order of tens of thousands. To improve performance, I read the metadata from multiple files concurrently on multiple threads. According to the documentation, I should serialize calls to […]
I happen to have some files with embedded XMP metadata which share common XML namespace prefixes that map to different namespaces from one document to the next. In my particular use case, one file (file "A") maps the prefix "vr" to "http://www.communicatingastronomy.org/repository/1.0/". Another (file "B") maps the same prefix to "http://www.communicatingastronomy.org/repository/1.1/". The same conflict recurs across many files: the definition of "vr" bounces back and forth. When I call […]
I don't see an easy way to work around this issue. One simple means would be to enforce serial access to the […]
What do you think? Does anyone have advice on this? Is this behavior a bug in exiv2? Any help is much appreciated. Thank you.
(FYI, I've been building exiv2 from the main branch. The binary I'm using was built from commit 55712d4.)
(Also, let me add that the underlying call to […] Line 787 in 55712d4)
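The serial-access idea raised in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: `xmpMutex` and `withXmpLock` are hypothetical helper names, not part of Exiv2, and the sketch deliberately does not name which Exiv2 calls go inside the lock, since that depends on what the documentation requires to be serialized.

```cpp
#include <mutex>
#include <utility>

// Hypothetical helper (not an Exiv2 API): one process-wide mutex shared by
// every thread that needs to touch XMP state.
inline std::mutex& xmpMutex() {
    static std::mutex m;  // function-local static, initialized once (thread-safe since C++11)
    return m;
}

// Runs `fn` while holding the lock and returns whatever `fn` returns.
// Assumption: the caller places the calls that must not overlap inside `fn`
// and copies out any XMP values it needs before the lock is released.
template <typename Fn>
auto withXmpLock(Fn&& fn) -> decltype(std::forward<Fn>(fn)()) {
    std::lock_guard<std::mutex> guard(xmpMutex());
    return std::forward<Fn>(fn)();
}
```

The mutex lives in a non-template function on purpose: a `static` inside the template would give each instantiation its own mutex, which would defeat the locking. The trade-off is that everything inside the lock runs one thread at a time, so the remaining parallelism only covers work done outside it.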
Replies: 2 comments 5 replies
-
@mallman Michael. Thank you for raising this topic. I believe we've spoken in the past. Welcome back. I think you might be in trouble here.
The most obvious point to make is that you may have to avoid multi-threading in this case. What about multi-processing combined with multi-threading? If you can segment your data into collections of 1.0 and 1.1 files, you could process the collections in separate processes (both of which can be multi-threaded).
Exiv2 is fast. I have 80,000 images on my website and have scripts that occasionally read them all. I'm not bothered if they take 2 or 3 hours to run. I've just run a little test on my 8-year-old Mac mini with a spinning disk: 50 images/second. My MacBook Pro with the M1 chip and an SSD achieves 1000+ images/second.
Another thought is to "fix" your files once and for all. XMP is XML and usually easy to spot in your images. You could use the utility […]
I don't know much about the XMPsdk; however, I know that Adobe changed the API concerning prefixes. You give the namespace URI and a preferred prefix, and it returns the prefix to be used. If […]
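The suggestion to segment the files into 1.0 and 1.1 collections could be done with a plain byte search, since the XMP packet is usually readable text inside the image. The sketch below assumes the packet is stored uncompressed in a single-byte-compatible encoding such as UTF-8; `VrVersion`, `classifyVrNamespace` and `readFileBytes` are hypothetical names made up for illustration, and the two URIs are the ones quoted in the question.

```cpp
#include <fstream>
#include <iterator>
#include <string>

enum class VrVersion { None, V1_0, V1_1, Both };

// Classifies raw file bytes by which "vr" namespace URI they contain.
// This is a substring search, not an XML parse, so it cannot tell which
// prefix the URI is bound to.
VrVersion classifyVrNamespace(const std::string& bytes) {
    static const std::string kV10 = "http://www.communicatingastronomy.org/repository/1.0/";
    static const std::string kV11 = "http://www.communicatingastronomy.org/repository/1.1/";
    const bool has10 = bytes.find(kV10) != std::string::npos;
    const bool has11 = bytes.find(kV11) != std::string::npos;
    if (has10 && has11) return VrVersion::Both;
    if (has10) return VrVersion::V1_0;
    if (has11) return VrVersion::V1_1;
    return VrVersion::None;
}

// Reads a whole file into memory; yields an empty string if it cannot be opened.
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

A driver would call `classifyVrNamespace(readFileBytes(path))` for each file and write the paths into one list per version, then hand each list to its own process. Files that come back as `Both` or `None` would need a separate decision.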
-
I've had another thought about this. readMetadata() finds the XMP/XML "block" and then hands it to the Adobe XMPsdk. You could inject code to modify […]
The benefit of putting this code in readMetadata() or the XMPsdk file parser is that you are certain that your code is working on your behalf. The crude approach using […]