-
I'm trying to use simdjson to extract data out of an NDJSON-like log. I wrote some "naïvely idiomatic" code (just as I would have written it in Python or any other high-level language), and now I'm getting OUT_OF_ORDER_ITERATION exceptions. I read the "Field Access" paragraph in the simdjson API Basics doc, but I'm having trouble understanding what exactly I can and cannot do. This is the (reduced) sample of the code I wrote:
Here, I'm getting an exception. Given this example, what am I doing wrong? I'd like to understand the exact rules that I have to follow, and I'd like some pointers on how to rewrite this code to use simdjson correctly while retaining readability.
-
Thanks for your interest.
See point two, from a screenshot of our documentation:

Using the On Demand API, the rule is that you should be in one array or one object at a time, and once you hold an array (for example), then you need to consume it. This may involve storing it in your own data structure.

You cannot collect arrays and then parse these arrays later. You must consume them immediately.

For example:

```cpp
#include "simdjson.h"
#include <iostream>
#include <string_view>
using namespace std::literals;
using namespace simdjson;

int parse_timestamp(std::string_view v) {
  // could use
  // https://lemire.me/blog/2023/07/01/parsing-time-stamps-faster-with-simd-instructions/
  return 0; // do something
}

int main() {
  padded_string json = R"(
{"timestamp": "20230701205436",
 "data": [
   {"description": "Corsair HX1000i",
    "status": {"Current uptime":313321.11}}
 ]}
{"timestamp": "20230701205536",
 "data": [
   {"description": "Corsair HX1000i",
    "status": {"Current uptime":312321.111}}
 ]}
)"_padded;
  auto parser = ondemand::parser{};
  ondemand::document_stream stream = parser.iterate_many(json);
  for (auto doc : stream) {
    auto ts = parse_timestamp(doc["timestamp"]);
    for (auto device : doc["data"].get_array()) {
      if (device["description"].get_string() == "Corsair HX1000i"sv) {
        double uptime = device["status"]["Current uptime"].get_double();
        std::cout << "uptime: " << uptime << std::endl;
      }
    }
  }
}
```
If you need to use the JSON document as a database, then the DOM API might suit your needs better. Even so, I think that On Demand might suit your needs fine. You just need to parse the content into your own data structures. You use simdjson just as a way to grab the content, not as a database that allows you to go back and forth.