Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Facets or similar but on all values in a subset? #16144

Open
jhancock4d opened this issue Mar 29, 2023 · 2 comments
Open

Facets or similar but on all values in a subset? #16144

jhancock4d opened this issue Mar 29, 2023 · 2 comments

Comments

@jhancock4d
Copy link

jhancock4d commented Mar 29, 2023

References:

https://www.telerik.com/kendo-angular-ui/components/grid/data-operations/
https://www.telerik.com/kendo-angular-ui/components/grid/data-binding/basics/
https://www.telerik.com/kendo-angular-ui/components/grid/data-binding/remote-data/
https://www.nuget.org/packages/ClientFiltering/

Result:

{
   TotalCount: 1205,
   Results: [], // Array of a flat type (could have hierarchial data but it isn't grouped abstractly
   Groups: [
       {
            FieldName: 'Field',
            Aggregates: [
                 {
                        FieldName: 'Field',
                        AggregateType: Sum,
                        Value: 99
                 }
            ]
       }
   ],
   Summaries: [
         {
                FieldName: 'Field',
                AggregateType: Sum,
                Value: 1099
         }
   ]
}

(After the first request, Groups and Summaries are null as you already have them and don't need to calculate them for subsequent page requests, just return the secondary page)

Problem:

When you use a grid control such as Telerik's Kendo etc. and you want to do everything server side so that the entire dataset isn't spewed over the wire, this is straight forward until you start grouping results. And it gets worse when you want to also have a chart based on the group.

All data has to be first security filtered so that they only see what they're supposed to, and then filtered by the filters defined in the grid columns.

I.e. if I have a list of users and I group the grid by an attribute of those users (i.e. let's say an organizational group) I have 4 problems that I have to solve:

  1. What data do I actually need to return to the client?
  2. How do I return all of the aggregates requested for the group to display in the chart?
  3. None of this can be done ahead of time. It has to be done based on whatever the user does to the grid.
  4. How to get summary aggregates that take the entire filtered dataset from above, and summarize on it without it being an N request problem for every aggregate?
  5. How to map the DTO field names against the data source field names given that you can't do a where on a projection.

The first one is relatively straight forward: If you order your resultset by the groups in the direction that the groups were requested to be displayed BEFORE you order by whatever other requested fields, you can then use skip/take to get the page of data that is needed for display. It's important that this occurs in this way because you want to return strongly typed data over the wire that isn't of arbitrary shape, otherwise you end up with subtle bugs over time if the server starts returning slightly different data. Your client won't know it and you won't get compile time errors (we strongly generate our client from the server using swagger for this reason.)

I have solved the Filtering problem and the ordering problem by using an AsyncDocumentQuery because of the excellent extension methods that let me use the WhereEquals etc. with a field name.

Problem 2 is much more difficult. Especially if the grid is grouped by one field and then a second etc. Basically you need to take the group, find all unique values in that group, and then aggregate by the requested fields and aggregate types on the security and grid filtered greater dataset.

The problem comes taking the pre-filtered dataset and then getting the groups with aggregates.

Then I discovered facets and I thought "Woohooo! This solves the whole problem!" But I can't figure out how to use facets to operate on the entire value on the field instead of breaking the values apart into words too. i.e. I want to get the same as facets but just the same as GroupBy(fieldname) instead of based on the individual words of the group by.

So, question No 1 is simple: Is it possible to create facets that don't break apart the values in the field and just operate on the unique values in the dataset by the field? (i.e. Select(fieldname) => aggregate on these values). This would allow me to programmatically build up the aggregations against the group.

If question No 1 is not possible, do you have a suggestion on how to go about getting the aggregates programmatically on a group by?

Then, on problem No 4. What you ideally want to do is GroupBy(true).Select({Aggregates}) or something similar, but I can't find a way to programmatically build up aggregates. It appears that I might be able to use facets but that breaks down by basically groups, whereas this problem is on the entire set, give me the aggregates of the given fields with the given aggregate type.

Finally, is there a way, or have you considered, creating an index that takes an entire document collection and enables this type of interaction? I.e. take the functionality of the Kendo Grid + a chart at the top that has the X axis based on the grouping of the grid and then defines the aggregate, pass that information into a rest api and have RavenDb be able to handle the entire thing automatically? Right now I have to create an index that includes all of the fields in the dataset that is flattened which is non-optimal for many reasons. Obviously I could put an OLAP server in front of this, but that just seems like overkill for creating interactive reporting when it appears that RavenDb is really close to just being able to do this.

I created a library that has both client side and server-side code that does all of the functionality against linq. (hence my other bug. See the nuget reference above) that the Telerik structure can be converted into, and then C# can read, convert to expression trees and execute. But it doesn't work against Raven because the DTO structure can be different than the source structure and Automapper AsDataSource.For doesn't work against RavenDb correctly (if you're using async) thus meaning that fields have to be converted to their original from the DTO fields that the client knows about, and grouping and building these aggregates also are not reliable. It does work against EF Core, but I see why Raven doesn't like it because EF does a select on the projection and does a where on the projection, which is slow. So I'm trying to adapt this to the methodology of RavenDb and get the same functionality so that I can use interactive Grids and Pivots to generating summary charts that the executives LOVE in companies. They love their power BI but they love it more when they can just go in and drag stuff around themselves without having to know anything about Power BI.

If I'm indeed not blind it would be ideal to get the following functionality on DocumentQuery:

  1. GroupAndAggregate...ThenGroupAndAggregate - This would allow n depth grouping that would take every unique value in the Group, list it, list every aggregate for each value and then sub group and do the same if applicable.
  2. AggregateBy that takes a list of fields and aggregate types and returns each of those as a result in a list.

These 2 would address No 2 and No 4 above.

And a way to create a "Report Index" that would take an entire collection as is and create some sort of index on it that allowed all fields to be filtered, grouped, ordered by and aggregated would be amazing. I was thinking that the way this would work is that it would take 3 generics: SourceType, OperatingType, OutputType (DTO) and then have a method on it called CreateReportIndex() or something similar. The SourceType is obviously the type of the document collection, the OperatingType is what the map generates and the outputType is what the reduce should create that is the same as the client sees as the DTO. And the reduce could be a straight projection and OperatingType could in many cases be the same as SourceType unless you were doing grouping etc on reduce. This solves No 5 and without CreateReportIndex ultimately would be a great way to create Projections and you could even use Mapperly/Automapper or similar in the second phase here to do the mapping. It would create a structured way of mapping that would be reversable back to the source so that filtering etc. could come in as the DTO values and be converted to the Source expressions necessary. And all of this could use a source generator in C# to precompile all of the mapping data from OutputType to SourceType I would think.

Is there a blog article for you here?? 😉

This image gives you a visual of what I'm trying to achieve.

image

@garayx
Copy link
Contributor

garayx commented Apr 3, 2023

Hi,

Can you post a sample document and the query result you are trying to achieve?

@jhancock4d
Copy link
Author

Here's a document:

{
    "Lesson": {
        "Id": 158089,
        "Name": "Opening Family Files",
        "Description": "How to open an RFA family file in Revit.",
        "Length": 212,
        "Tags": "family file, family, open family, ملف عائلي, عائله, عائله مفتوحة, 家庭档案, 家庭, 开放家庭, familie bestand, familie, open familie, fichier familial, famille, famille ouverte, Familiendatei, Familie, offene Familie, תיק משפחתי, משפחה, משפחה פתוחה, file di famiglia, famiglia, famiglia aperta, ファミリ ファイル、 ファミリ、オープン ファミリ, پرونده خانواده ، خانواده ، خانواده باز, arquivo familiar, família, família aberta, familia, familia, familia abierta, familje fil, Familj, öppen familj, qorDu' poS qorDu' teywI', qorDu'., 가족 파일, 가족, 개방형 가족, семейное досье, семья, открытая семья, Rodzina, Rodzina, otwarta rodzina, Rodinný soubor, Rodina, otevřená rodina, aile dosyası, aile, açık aile, ffeil teulu, teulu, teulu agored, Сімейний файл, родина, відкрита родина, gia đình, gia đình, mở gia đình"
    },
    "Course": {
        "Id": 100647,
        "Name": "Revit",
        "Description": "CADLearning for Autodesk® Revit® covers the use of Revit, the premier Building Information Modeling (BIM) tool for design and construction professionals in the Architecture, Engineering and Construction industry. This content has been devised to help users quickly learn BIM methods for creating and modifying geometry, organizing data, performing analysis, and presenting designs using the specialized functions within Revit.",
        "Length": 333920,
        "Tags": ""
    },
    "Topic": null,
    "Project": null,
    "Product": {
        "Id": 100199,
        "Name": "Revit",
        "Description": "Autodesk® Revit® is the premier Building Information Modeling (BIM) tool for engineering and design professionals in the industries of architecture, structural engineering, and construction. Revit is the most popular architectural design tool from Autodesk, as its versatile functionality includes model creation, data organization and analysis, and the creation of impressive documentation and presentations. And, because Revit works within the world of BIM, one of its greatest benefits is it gives users the ability to easily coordinate the model across disciplines.",
        "VersionId": 3523,
        "Version": "2023"
    },
    "PlayList": null,
    "Command": null,
    "LegacyId": null,
    "DateTime": "2023-03-13T18:16:38.9645640+00:00",
    "Origin": "127.0.0.1",
    "ClientId": "Portal",
    "User": {
        "Id": 179939,
        "UserName": "jhancock@cadlearning.com",
        "FullName": "James Hancock",
        "EmailAddress": "jhancock@cadlearning.com",
        "IsActive": true
    },
    "Organization": {
        "Id": 100219,
        "Name": "ACME Training Corp.",
        "OrganizationType": 2,
        "Division": null,
        "Group": "New Group",
        "Groups": [],
        "Level": null,
        "Country": null,
        "Location": null,
        "Title": null
    },
    "IsNew": true,
    "@metadata": {
        "@collection": "ActivityLessons",
        "Raven-Clr-Type": "Data.RavenDb.Activity.Lessons.LessonActivity, Data.RavenDb"
    }
}

I want to programmatically return all unique Lesson.Name as a group with each value and then a set of aggregates for that value.

I.e.:

[
   {
      value: 'Opening Family Files',
      aggregates: [
          {
             field: 'Course.Name',
             aggregateType: AggregateTypes.Count,
             value: 190
          },
          
          {
             field: 'Lesson.Length',
             aggregateType: AggregateTypes.Sum,
             value: 1000000000
          }
      ]
]

Note that the group values are the unique values in the fields, not all unique words in a field which is the only difference between what I'm asking for and facets. (I believe) So even a parameter on the facet option to not break apart by word would solve the problem and I could just use facets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants