Json processing in postgres notation #1810

Open
wants to merge 32 commits into
base: master

lazarevnik
Contributor

Hello!
Here I've implemented:

  • JSON data type processing based on the FasterXML/jackson library
  • Functions to work with documents
  • Special operations for querying with PostgreSQL syntax

Here are also some queries that show how it works.

Unfortunately, I've run into some problems with the memory and network tests.
I'd be glad to discuss how to resolve them.

Contributor

katzyn left a comment

Thank you for working on this feature.

I didn't check all changes yet, so it's not a complete review.

H2 should work without any third-party libraries. JSON data type may not work without them, but all other features should not require them.

Also, it's not a good idea to allow usage of third-party data types in getObject() / setObject() / etc.; Jackson may be removed and replaced with something else. Perhaps JSON should be mapped to String or byte[]. If you need such support, reimplement it with reflection.
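
One possible reading of the reflection suggestion, as a minimal sketch rather than H2 code: the library is looked up by name at run time and only String crosses the API boundary. The class `OptionalJsonParser` and its methods are hypothetical; the only Jackson call assumed is `ObjectMapper.readTree(String)`.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper: touches Jackson only through reflection, so the
// surrounding code still loads and runs when the library is absent.
final class OptionalJsonParser {
    private static final String MAPPER_CLASS = "com.fasterxml.jackson.databind.ObjectMapper";
    private static final Object MAPPER;
    private static final Method READ_TREE;

    static {
        Object mapper = null;
        Method readTree = null;
        try {
            Class<?> c = Class.forName(MAPPER_CLASS);
            mapper = c.getDeclaredConstructor().newInstance();
            readTree = c.getMethod("readTree", String.class);
        } catch (ReflectiveOperationException | LinkageError e) {
            // Library missing or unusable: JSON support stays disabled.
            mapper = null;
            readTree = null;
        }
        MAPPER = mapper;
        READ_TREE = readTree;
    }

    private OptionalJsonParser() {
    }

    static boolean isAvailable() {
        return READ_TREE != null;
    }

    // Parses the text and returns it rendered back as a String,
    // so no third-party type leaks to the caller.
    static String validate(String json) {
        if (!isAvailable()) {
            throw new UnsupportedOperationException("JSON support requires Jackson on the classpath");
        }
        try {
            return String.valueOf(READ_TREE.invoke(MAPPER, json));
        } catch (InvocationTargetException e) {
            throw new IllegalArgumentException("Invalid JSON", e.getCause());
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, everything except `validate` keeps working without the library, which is the behavior the review asks for.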

throw getSyntaxError();
}
int len = args.length;
Value[] arr = (Value[]) Array.newInstance(Value.class, len);
Contributor

new Value[len]

String[] args;
if(param.startsWith("{") && param.endsWith("}")) {
param = param.substring(1, param.length() - 1);
args = param.replaceAll(" ", "").split(",");
Contributor

I don't think that Parser is a good place for such logic.
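
A sketch of how the brace-list splitting could live outside the parser, in a small helper of its own. The class name `JsonPathLiteral` is hypothetical, and unlike the snippet under review it trims each element instead of deleting every space in the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that keeps "{a,b,1}" handling out of the SQL parser.
final class JsonPathLiteral {
    private JsonPathLiteral() {
    }

    // Splits "{a, b, 1}" into ["a", "b", "1"].
    // Returns null when the text is not brace-delimited.
    static List<String> split(String literal) {
        String s = literal.trim();
        if (s.length() < 2 || !s.startsWith("{") || !s.endsWith("}")) {
            return null;
        }
        String body = s.substring(1, s.length() - 1).trim();
        List<String> out = new ArrayList<>();
        if (body.isEmpty()) {
            return out;
        }
        for (String part : body.split(",", -1)) {
            out.add(part.trim());
        }
        return out;
    }
}
```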

int len = args.length;
Value[] arr = (Value[]) Array.newInstance(Value.class, len);
for (int i = 0; i < len; i++) {
Value v = StringUtils.isNumber(args[i]) ? ValueInt.get(new Integer(args[i])) : ValueString.get(args[i]);
Contributor

Do not use new Integer(String), it is deprecated.
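
For illustration, a minimal sketch of the non-deprecated route: `Integer.valueOf(String)` (or `Integer.parseInt(String)` when a primitive is wanted) replaces the constructor. The helper `PathElements.toElement` is hypothetical and only mirrors the number-or-string decision made in the loop above.

```java
// Hypothetical helper: converts one path token to an Integer when it is a
// valid int, otherwise leaves it as a String.
final class PathElements {
    private PathElements() {
    }

    static Object toElement(String token) {
        try {
            // Integer.valueOf replaces the deprecated new Integer(String).
            return Integer.valueOf(token);
        } catch (NumberFormatException e) {
            // Not an int (or out of int range): keep the text as is.
            return token;
        }
    }
}
```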

@@ -1759,6 +1788,212 @@ protected Value getValueWithArgs(Session session, Expression[] args) {
String msgText = v1.getString();
throw DbException.fromUser(sqlState, msgText);
}
case JSON_FIELD: {
if(v0.getValueType() == Value.JSON) {
JsonNode json = ((ValueJson) v0).getObject();
Contributor

All such code with third-party classes should be isolated into separate methods or classes.

@@ -1112,6 +1151,8 @@ public static int getTypeFromClass(Class <?> x) {
return Value.TIMESTAMP;
} else if (LocalDateTimeUtils.OFFSET_DATE_TIME == x || LocalDateTimeUtils.INSTANT == x) {
return Value.TIMESTAMP_TZ;
} else if (com.fasterxml.jackson.databind.JsonNode.class == x) {
Contributor

Do not create hard run-time dependencies of third-party libraries.
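
One way to avoid the class literal, sketched under assumptions: compare class names as strings, so the check compiles and runs whether or not the library is present. `JsonTypeDetector` is a hypothetical name, and walking the superclass chain is an addition of this sketch; the line under review tests for exact equality only.

```java
// Hypothetical helper: detects a type by name instead of by class literal,
// which removes the hard run-time dependency on the library.
final class JsonTypeDetector {
    static final String JSON_NODE_CLASS = "com.fasterxml.jackson.databind.JsonNode";

    private JsonTypeDetector() {
    }

    // True if x or one of its superclasses has the given fully qualified name.
    static boolean hasSuperclassNamed(Class<?> x, String className) {
        for (Class<?> c = x; c != null; c = c.getSuperclass()) {
            if (className.equals(c.getName())) {
                return true;
            }
        }
        return false;
    }

    static boolean isJsonNodeType(Class<?> x) {
        return hasSuperclassNamed(x, JSON_NODE_CLASS);
    }
}
```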

@Override
public int compareTypeSafe(Value v, CompareMode mode) {
// TODO Auto-generated method stub
return 0;
Contributor

This method should be implemented.

Contributor Author

Fixed here
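
To illustrate why the stub matters: a comparison that always returns 0 makes every pair of values look equal, so sorted structures keep only one of them. The sketch below uses plain strings and hypothetical names; comparing by text is only one possible ordering and is not necessarily what the fix in this pull request does.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Demonstrates the effect of a "return 0" comparison on sorted collections.
final class JsonCompareDemo {
    // Equivalent of the auto-generated stub: everything compares as equal.
    static final Comparator<String> STUB = (a, b) -> 0;
    // A possible total order: compare the stored JSON text.
    static final Comparator<String> BY_TEXT = Comparator.naturalOrder();

    private JsonCompareDemo() {
    }

    // Number of documents that survive insertion into a sorted set.
    static int distinctCount(Comparator<String> order, String... docs) {
        TreeSet<String> set = new TreeSet<>(order);
        for (String d : docs) {
            set.add(d);
        }
        return set.size();
    }
}
```

A text-based order is meaningful only if equal documents always have the same text, which ties this to the normalization question raised elsewhere in the review.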

@@ -0,0 +1,50 @@
SELECT CAST('{"tag1":"simple string"}' AS JSON);
Contributor

Missing copyright header.

private static final ObjectMapper mapper = new ObjectMapper();

ValueJson(String str) throws IOException {
int memFirst = Utils.getMemoryUsed();
Contributor

Do not use this method in the main code, it's not reliable.

Contributor Author

It's not entirely clear how to estimate the memory used by various JSON objects.
String length is used here, but it may not always work.
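
A deterministic estimate derived from the text length could look like the sketch below. The constants are assumptions chosen for illustration, not measured values, and `JsonMemoryEstimate` is a hypothetical name; a parsed tree may well occupy more memory than this suggests.

```java
// Hypothetical estimate: fixed overhead plus two bytes per character.
// Unlike sampling the heap before and after allocation, the result does
// not depend on garbage collection timing.
final class JsonMemoryEstimate {
    // Assumed per-value overhead in bytes (illustrative, not measured).
    static final int ASSUMED_OVERHEAD_BYTES = 48;

    private JsonMemoryEstimate() {
    }

    static int estimate(String jsonText) {
        long bytes = ASSUMED_OVERHEAD_BYTES + 2L * jsonText.length();
        return (int) Math.min(Integer.MAX_VALUE, bytes);
    }
}
```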

this.json=mapper.createObjectNode();
this.string = this.json.toString();
}
this.string = str.replaceAll("\n", "");
Contributor

Perhaps JSON should be fully normalized into some canonical form.
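
As a first step towards a canonical form, insignificant whitespace can be removed without touching string literals. This is only a partial sketch under that narrow goal: it does not reorder keys, normalize numbers, or unify escape sequences, and `JsonWhitespace` is a hypothetical name.

```java
// Hypothetical helper: strips whitespace outside of JSON string literals.
final class JsonWhitespace {
    private JsonWhitespace() {
    }

    static String strip(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char ch = json.charAt(i);
            if (inString) {
                // Inside a string everything is significant.
                out.append(ch);
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
                out.append(ch);
            } else if (ch != ' ' && ch != '\t' && ch != '\n' && ch != '\r') {
                out.append(ch);
            }
        }
        return out.toString();
    }
}
```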

@@ -427,7 +427,7 @@ private void process(String sql, boolean allowReconnect) throws Exception {
if (statements != null) {
statements.add(sql);
}
-if (sql.indexOf('?') == -1) {
+if (sql.indexOf('?') == -1 || sql.indexOf("?'") != -1 || sql.indexOf("?|") != -1 || sql.indexOf("?&") != -1) {
Contributor

Such a check is not reliable. The characters next to each ? should be tested instead. It's not very critical for now, but at least a TODO comment should be placed here to indicate the problem.

Contributor Author

The current check is used to avoid collisions between the JSON operators and the parameter token, because it's not obvious how this can be resolved at the parser level.
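
Testing the character after each ? individually might be sketched as follows. `ParameterScan` is a hypothetical name, the three follow-up characters are taken from the condition in the diff, and the sketch still ignores ? inside string literals and operators followed by whitespace.

```java
// Hypothetical scan: a '?' counts as a parameter placeholder unless it is
// immediately followed by one of the characters used by the JSON operators.
final class ParameterScan {
    private ParameterScan() {
    }

    static boolean hasParameter(String sql) {
        int len = sql.length();
        for (int i = sql.indexOf('?'); i >= 0; i = sql.indexOf('?', i + 1)) {
            char next = i + 1 < len ? sql.charAt(i + 1) : '\0';
            if (next != '\'' && next != '|' && next != '&') {
                return true;
            }
        }
        return false;
    }
}
```

Unlike a whole-string `indexOf` test, this still reports a parameter when a statement mixes a JSON operator with a real placeholder.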

/**
* The PostgreSQL token "||"
*/
private static final int JSON_CONCAT = JSON_EXISTS_ALL + 1;
Contributor

How is it distinguished from string || concatenation, and why are these constants not consistent with the TOKENS array?

addFunction("JSON_EXISTS_ANY", JSON_EXISTS_ANY, 2, Value.BOOLEAN);
addFunction("JSON_EXISTS_ALL", JSON_EXISTS_ALL, 2, Value.BOOLEAN);
addFunction("JSON_CONCAT", JSON_CONCAT, 2, Value.JSON);
addFunction("JSON_DELETE_FIELD", JSON_DELETE_FIELD, 2, Value.JSON);
Contributor

Where did all these functions come from? Neither the SQL Standard nor the PostgreSQL documentation has them, with the exception of JSON_EXISTS, which is described in the standard (but I'm not sure that it is compatible with your implementation).

If you need them to support some operations, it's better to use your own class similar to BinaryOperation instead. Operations can be mapped to functions only when a similar function exists.

Contributor Author

Yes, for now these functions are used to implement the operators. As the functionality develops, I plan to add new ones, but I'll try to align their names with the standard or the PostgreSQL dialect.
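
A rough sketch of the "own operation class" idea, assuming nothing about H2's actual expression API: the operators get a dedicated class with an operation-type enum instead of being registered as named functions. All names here are hypothetical, and a `java.util.Map` stands in for a JSON object.

```java
import java.util.List;
import java.util.Map;

// Hypothetical operation class for the key-existence operators.
final class JsonKeyOperation {
    enum OpType {
        EXISTS("?"), EXISTS_ANY("?|"), EXISTS_ALL("?&");

        final String token;

        OpType(String token) {
            this.token = token;
        }
    }

    private final OpType opType;

    JsonKeyOperation(OpType opType) {
        this.opType = opType;
    }

    // Evaluates the operator against the top-level keys of the object.
    boolean evaluate(Map<String, ?> object, List<String> keys) {
        switch (opType) {
        case EXISTS:
            return keys.size() == 1 && object.containsKey(keys.get(0));
        case EXISTS_ANY:
            for (String k : keys) {
                if (object.containsKey(k)) {
                    return true;
                }
            }
            return false;
        case EXISTS_ALL:
            return object.keySet().containsAll(keys);
        default:
            throw new IllegalStateException("Unknown operation: " + opType);
        }
    }

    // Renders the operation back to SQL text using the operator token.
    String getSQL(String left, String right) {
        return left + ' ' + opType.token + ' ' + right;
    }
}
```

Keeping the operators in their own class leaves the function namespace free for names that actually exist in the standard or in a supported dialect.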

@katzyn
Contributor

katzyn commented Mar 16, 2019

JSON support was standardized in SQL:2016, but PostgreSQL is not compatible with the standard. I wonder how much complexity these weird PostgreSQL-only operations will create for us when we decide to support standard features.

#> and #>> may create problems in Oracle and MS SQL compatibility modes (they aren't tested well, so the absence of failures on Travis doesn't mean that everything works as expected). Oracle has standard JSON features, BTW. I'm not sure about all these ->>, @>, ?|, etc. @grandinj, maybe you have an opinion?

@grandinj
Contributor

@lazarevnik Thank you very much for this!

@katzyn I don't really like adding these operators, I'd prefer functions, they are much easier to extend and much less likely to cause trouble with parsing.

@lazarevnik Are you deliberately aiming at PostgreSQL compatibility?

@lazarevnik
Contributor Author

@katzyn Thank you for the review, it is very helpful.
@grandinj Yes, I'm implementing this particular dialect in order to use it later in Apache Ignite for caching. I also plan to extend the functionality and the set of supported dialects.

@katzyn
Contributor

katzyn commented Mar 18, 2019

There is a SQL Standard. Why do you need non-standard syntax for Ignite? Sorry, I don't understand it. H2 is not a PostgreSQL emulator. Many databases have their own legacy syntax constructions in different areas, but usually they begin to support standard features too.

@lazarevnik
Contributor Author

I want to work towards caching with read-through and write-through. Ignite lets you use SQL queries for this on a par with the ordinary cache mechanism. So, in order to work with a concrete RDBMS rather than an abstract standard, I used PostgreSQL syntax and operators.

@katzyn
Contributor

katzyn commented Mar 18, 2019

It means that you don't have a real reason, because you're not restricted to that proprietary syntax. Oracle uses the standard, MySQL is partially compatible with the standard, and there were some attempts to make PostgreSQL compatible with the standard in 2017, but it looks like they haven't been merged yet. I don't understand why you didn't discuss your addition with the H2 community earlier, but what's done is done.

  1. New data type is fine.
  2. Strong dependency on third-party libraries that were introduced here is not acceptable. H2 must work without them (with possible loss of JSON support).
  3. Non-standard operators are a bad idea. We may accept them, but for now I don't see a reason to have them. They may create more problems than benefits for us.

Could you extract ValueJson and all related changes in Value, storage backends, etc. into a separate pull request? Without all these functions and operators? I think that we can merge the data type by itself, clean up its implementation if necessary, and continue our discussion about functions and operators.

@lazarevnik
Contributor Author

Yes, I'll do it in the near future.

@lazarevnik
Contributor Author

Here is the new pull request.

<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.9.8</version>

Q: It would probably be good to change the scope of these dependencies to provided, to avoid dependency conflicts.

Suggested change
<version>2.9.8</version>
<version>2.9.8</version>
<scope>provided</scope>

Class<?> j;
try {
j = JdbcUtils.loadUserClass(JSON_CLASS_NAME);
} catch (Exception e) {

Please catch only the exceptions you expect, not one as broad as this.
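
A narrower catch might look like the sketch below. It uses plain `Class.forName` rather than `JdbcUtils.loadUserClass`, whose actual exception types should be checked against its source; `OptionalClassLoader` is a hypothetical name.

```java
// Hypothetical helper: treats only "class is not there" as an expected
// outcome; any other failure propagates to the caller.
final class OptionalClassLoader {
    private OptionalClassLoader() {
    }

    // Returns the class, or null when it is not on the classpath.
    static Class<?> loadIfPresent(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return null;
        }
    }
}
```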
