Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function proposal - groupBy() #7

Open
glenveegee opened this issue May 20, 2020 · 4 comments
Open

Function proposal - groupBy() #7

glenveegee opened this issue May 20, 2020 · 4 comments

Comments

@glenveegee
Copy link

Given an array of objects, group the array on a key or expression such that:

// Object array with string
    let returnValue = await applyFunction('fn:jmespath', 'groupBy(@, `a`)', [
      { a: 1, b: 2 },
      { a: 1, b: 3 },
      { a: 2, b: 2 },
      { a: null, b: 999 },
    ]);
    expect(returnValue).toStrictEqual({
      1: [
        { a: 1, b: 2 },
        { a: 1, b: 3 },
      ],
      2: [{ a: 2, b: 2 }],
      null: [{ a: null, b: 999 }],
    });

// Object array with expression reference
    returnValue = await applyFunction('fn:jmespath', 'groupBy(@, &a)', [
      { a: 1, b: 2 },
      { a: 1, b: 3 },
      { a: 2, b: 2 },
      { a: null, b: 999 },
    ]);
    expect(returnValue).toStrictEqual({
      1: [
        { a: 1, b: 2 },
        { a: 1, b: 3 },
      ],
      2: [{ a: 2, b: 2 }],
      null: [{ a: null, b: 999 }],
    });

// Object array with inconsistent schema
    returnValue = await applyFunction('fn:jmespath', 'groupBy(@, &a)', [
      { a: 1, b: 2 },
      { a: 1, b: 3 },
      { b: 4 },
      { a: null, b: 999 },
    ]);
    expect(returnValue).toStrictEqual({
      1: [
        { a: 1, b: 2 },
        { a: 1, b: 3 },
      ],
      null: [{ b: 4 }, { a: null, b: 999 }],
    });

// Inconsistent array member types
    try {
      returnValue = await applyFunction('fn:jmespath', 'groupBy(@, &a)', [
        { a: 1, b: 2 },
        `{ a: 1, b: 3 }`,
        { b: 4 },
        1234,
      ]);
    } catch (error) {
      expect(error.message).toEqualText(
        'TypeError: unexpected type. Expected Array<object> but received Array<object | string | number>',
      );
    }

In javascript (typescript) this might implemented as follows something like:

registerFunction(
  'groupBy',
  function (this: any, [memberList, exprefNode]: [JSONObject[], ExpressionNodeTree | string]) {
    if (!this._interpreter) return {};

    if (typeof exprefNode === 'string') {
      return memberList.reduce((grouped, member) => {
        if (exprefNode in member) {
          const key = member[exprefNode] as string;
          const currentMembers = (grouped[key] as any[]) || [];
          grouped[key] = [...currentMembers, member];
        }
        return grouped;
      }, {});
    }
    const interpreter = this._interpreter;
    const requiredType = Array.from(new Set(memberList.map(member => this.getTypeName(member))));
    const onlyObjects = requiredType.every(x => x === TYPE_OBJECT);
    if (!onlyObjects) {
      throw new Error(
        `TypeError: unexpected type. Expected Array<object> but received Array<${requiredType
          .map(type => this.TYPE_NAME_TABLE[type])
          .join(' | ')}>`,
      );
    }

    return memberList.reduce((grouped, member) => {
      const key = interpreter.visit(exprefNode, member) as string;
      const currentMembers = (grouped[key] as any[]) || [];
      grouped[key] = [...currentMembers, member];
      return grouped;
    }, {});
  },
  [{ types: [TYPE_ARRAY] }, { types: [TYPE_EXPREF, TYPE_STRING] }],
);
@glenveegee glenveegee changed the title Function proposal - groupBy Function proposal - groupBy() May 20, 2020
@innovate-invent
Copy link

innovate-invent commented Mar 13, 2022

Is this too high level? an alternate approach could be to add to_set() and zip().

  • to_set() takes a list and returns a list of unique elements
  • zip() behaves the same as python zip()
search([
      { a: 1, b: 2 },
      { a: 1, b: 3 },
      { a: 2, b: 2 },
      { a: null, b: 999 },
    ],
   let({root: @, k: sort(to_set([*].a))}, &zip(k, map(&root[? a == @ ], k)))
)

@glenveegee
Copy link
Author

As an inexperienced JMESPath user, that would definitely scare me off. As a side note I do think to_set and zip nevertheless are needed.

As a community of enthusiastic devs, I look forward to having discussions about the "why" of function support in JMESPath. I used to think we just needed to throw more functions at the problems but nowadays I much more skeptical.

@glenveegee
Copy link
Author

glenveegee commented Mar 14, 2022

I'm guilty in the past, of proposing new JMESPath functions in the spec because it was a solution to a specific individual issue I was facing. It would have been better to reign in my enthusiasm with some checkpoints before drafting a proposal:

  1. Does it solve a fundamental gap in the JMESPath design that would otherwise make JMESPath more useful (The example I'd use here is the let() or $ proposal that facilitate more than one scope in a query)
  2. Does it provide an obvious missing piece in the API (The example here would be items() - JMESPath has keys(), values() but not something that allows both keys and values in the same query context)
  3. Is it an overwhelmingly and frequently requested feature (in the community) that is lacking - The example here being something like constructing keys from source values in projections
  4. Will it greatly simplify/improve the developer experience/query complexity

@innovate-invent
Copy link

innovate-invent commented Mar 14, 2022

I agree with all of those points. # 4 is the item I generally struggle with, leaning on the conservative side. I am not strongly opposing this proposal, I just would prefer if any possible alternative is considered (contrived or otherwise) to better understand the problem being solved.

My main concern is that the output of group_by is going to need to be re-transformed immediately after. My proposed alternative allows customization of the keys, values, value merge criteria (merge values rather than insert them in a list), and allows pulling in additional merge elements (think "three-way group_by") during the transformation.

The other issue is that group_by would only work for lists of mappings of values with an equivalence relation and string conversion and no other data structure.

We have started a wiki for common patterns to aid newer users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants