New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial try to improve SearchSource consumtion in alerting rules #183211
base: main
Are you sure you want to change the base?
Conversation
/ci |
/ci |
💔 Build FailedFailed CI Steps
Test Failures
Metrics [docs]Public APIs missing comments
Page load bundle
Unknown metric groupsAPI count
ESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: |
const fields = await createFields(searchSourceFields); | ||
const createSearchSourceFn = async ( | ||
searchSourceFields: SerializedSearchSourceFields = {}, | ||
useDataViewLazy: boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we need to handle the hydration of searchSourceFields index differently for alerting, if we don't do this, e.g. in Discover and there are just the fields loaded, that are need, those are missing when a saved search is being loaded. We can work around of this, but I'd say, focus on the alerting relevant part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to create and make available both DataView and DataViewLazy objects during the transition.
Ack, no, that wouldn't work since the DataView would load all the fields...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's what I was thinking. That's why I was thinking, focusing on solving one problem, the more important one (alerting), keep the current way it works for the other one, so there are no tears
public async loadDataViewFields() { | ||
const dataView = this.getField('index'); | ||
if (!dataView || !(dataView instanceof DataViewLazy)) { | ||
return; | ||
} | ||
// eslint-disable-next-line no-console | ||
console.log('loadDataViewFields because of DataViewLazy'); | ||
const request = this.mergeProps(this, { body: {} }, ['query', 'filter']); | ||
let fields = dataView.timeFieldName ? [dataView.timeFieldName] : []; | ||
const sort = this.getField('sort'); | ||
if (sort) { | ||
const sortArr = Array.isArray(sort) ? sort : [sort]; | ||
for (const s of sortArr) { | ||
const keys = Object.keys(s); | ||
fields = fields.concat(keys); | ||
} | ||
} | ||
for (const query of request.query) { | ||
if (query.query) { | ||
const nodes = fromKueryExpression(query.query); | ||
const queryFields = getKueryFields([nodes]); | ||
fields = fields.concat(queryFields); | ||
} | ||
} | ||
const filters = request.filters; | ||
if (filters) { | ||
const filtersArr = Array.isArray(filters) ? filters : [filters]; | ||
for (const f of filtersArr) { | ||
fields = fields.concat(f.meta.field); | ||
} | ||
} | ||
fields = fields.filter((f) => Boolean(f)); | ||
|
||
if (dataView.getSourceFiltering() && dataView.getSourceFiltering().excludes.length) { | ||
// if source filtering is enabled, we need to fetch all the fields | ||
await dataView.getFields({ fieldName: ['*'] }); | ||
} else if (fields.length) { | ||
const fieldSpec = await dataView.getFields({ | ||
fieldName: fields, | ||
// fieldTypes: ['date', 'date_nanos'], | ||
}); | ||
const spec = Object.values(fieldSpec.getFieldMap()).map((field) => field.spec); | ||
dataView.setFields(spec); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this is new, and just needed for the alerting execution, where in this POC a DataViewLazy is being used. in this code we figure out when fields we need to load, that's the timefield, sort field, query fields, filter fields. once we know which fields we need, we get those, and they can build the query to ES
createLazy: (searchSourceFields: SerializedSearchSourceFields = {}) => { | ||
const fn = createSearchSource(indexPatterns, dependencies); | ||
return fn(searchSourceFields, true); | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if tried to add just a configuration to the create function which uses a DataViewLazy then, for whatever reason it didn't work. with a dedicated function it did. it's a POC, it doesn't need to be polished
/** | ||
* Field list, in extended array format | ||
*/ | ||
public fields?: IIndexPatternFieldList & { toSpec: () => DataViewFieldMap }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a change for DataViewLazy for the alerting case. we just load the field we need for the alert, and then it's used for the alert ES query execution. works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think the general technique is good, adding a field list to DataViewLazy of the same sort as DataView is likely to lead to confusion and problems elsewhere. The DataViewLazy instances are cached so its possible that two different code paths could be setting the field list with different content. Instead, we could create a DataView locally (without the data view service!) and populate the field list. I think you suggested this elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this could be an approach, instead of DataViewLazy. Just keep in mind in the alerting context caching ain't a problem, it because fields are just loaded once on every execution. That's why the given code already works. It's handling Alerting execution differently than other executions. This allows us to fix the main problem without having to worry about other usages, they continue working
getComputedFields({ fieldName = ['*'] }: { fieldName: string[] }) { | ||
const scriptFields: Record<string, estypes.ScriptField> = {}; | ||
if (!this.fields) { | ||
return { | ||
scriptFields, | ||
docvalueFields: [] as Array<{ field: string; format: string }>, | ||
runtimeFields: {}, | ||
}; | ||
} | ||
|
||
const fieldMap = ( | ||
await this.getFields({ | ||
fieldName, | ||
fieldTypes: ['date', 'date_nanos'], | ||
scripted: false, | ||
runtime: false, | ||
}) | ||
).getFieldMap(); | ||
|
||
const docvalueFields = Object.values(fieldMap).map((dateField) => { | ||
// Date value returned in "_source" could be in a number of formats | ||
// Use a docvalue for each date field to ensure standardized formats when working with date fields | ||
// dataView.flattenHit will override "_source" values when the same field is also defined in "fields" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied code from DataView to here, to make this functional. which it does. alternatively we could also turn a DataViewLazy into a regular DataView without more fields being loaded. would also work.
@@ -56,11 +57,10 @@ export async function fetchSearchSourceQuery({ | |||
const { logger, searchSourceClient } = services; | |||
const isGroupAgg = isGroupAggregation(params.termField); | |||
const isCountAgg = isCountAggregation(params.aggType); | |||
|
|||
const initialSearchSource = await searchSourceClient.create(params.searchConfiguration); | |||
const initialSearchSource = await searchSourceClient.createLazy(params.searchConfiguration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the actual consumption of this POC, a searchSource is created with a lazy data view, made compatible like a regular DataView, doesn't work 100%, like the test failures show, but with a bit more effort it would
@@ -127,3 +130,21 @@ export { | |||
} from './src/utils'; | |||
|
|||
export type { ExecutionContextSearch } from './src/expressions/types'; | |||
|
|||
export function getKueryFields(nodes: KueryNode[]): string[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a function I just copied from
kibana/x-pack/plugins/observability_solution/apm/common/utils/get_kuery_fields.ts
Line 11 in d228f48
export function getKueryFields(nodes: KueryNode[]): string[] { |
it should be sufficient to extract fields from a KQL query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lukas is addressing this here - #183573 - its a bit more thorough and has a couple more tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's great!
if (!useDataViewLazy) { | ||
if (typeof searchSourceFields.index === 'string') { | ||
fields.index = await indexPatterns.get(searchSourceFields.index); | ||
} else { | ||
fields.index = await indexPatterns.create(searchSourceFields.index); | ||
} | ||
} else { | ||
fields.index = await indexPatterns.create(searchSourceFields.index); | ||
if (typeof searchSourceFields.index === 'string') { | ||
fields.index = await indexPatterns.getDataViewLazy(searchSourceFields.index); | ||
} else { | ||
fields.index = await indexPatterns.createDataViewLazy(searchSourceFields.index); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattkime let me shortly and hopefully explain in a better way what needs to be considered to get rid of the field_caps requests for alerting execution and why this code part is important
Originally, when a search source is created and an index is given, which is usually the case, it leads to a field caps request. I was interested what happens when I switch DataView with DataViewLazy, so no data view field caps request is triggered, and those fields, that are necessary, are fetched when they are needed. with a bit of copying code around, so DataViewLazy worked more similar to DavaView (except that no automatic field caps request is triggered), this worked. So in the alerting case just the fields that were necessary were fetched.
However, when a a saved search with an id is fetched in Discover, then this lazy data view with just the necessary fields, is being used as the main data view, which mean, just a few fields were mapped, rest unmapped. When a new saved search was created, unpersisted, this was not the case, because the index / data view was already there. Works as expected. This triggered my "Rabbit Hole Detection Alarm", because changing the behavior generally of SearchSource, high likely means e.g. changing the way SavedSearch are handled in Discover, and maybe more. That's why I tried to handle SearchSource differently depending on context, which in alerting rule context meant, DataViewLazy FTW, just fetch what's needed. while better let Discover etc search source consumption to be the same .. for now ... interested in hear your thought or how you intended to work around this. because, of course, I could be wrong
console.log('loadDataViewFields because of DataViewLazy'); | ||
const request = this.mergeProps(this, { body: {} }, ['query', 'filter']); | ||
let fields = dataView.timeFieldName ? [dataView.timeFieldName] : []; | ||
const sort = this.getField('sort'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I find this code very useful although I need to see how it compares to the various computations in the SearchSource.flatten
method. I think it likely covers all cases where fields need to be loaded but I need to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, of course this needs to be evaluated, I might have missed something, however, in local testing, adding a alert from discover with query, sorting, filters, it seemed to work (I just created the alert executing every few seconds, watching kibana log output)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really only concerns scripted fields, so it doesn't need to be fetched. That said, skipping the field fetch for this is an optimization that can be left for later.
fields = fields.filter((f) => Boolean(f)); | ||
|
||
if (dataView.getSourceFiltering() && dataView.getSourceFiltering().excludes.length) { | ||
// if source filtering is enabled, we need to fetch all the fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How was this arrived at? I just don't see why its needed based on existing search source code but I might be overlooking something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly where we do this in search source, didn't look, but for sure it needs to be done
then there is data view source filtering enabled, we need to send all other fields that we don't need to filter out ... using the fields API (default), works differently in using _source
. You could configure a data view with a field excluded, add a debugger in the search source code, and watch what and where it happens in the code
however, the solution is pretty simple. if there are fields to be filtered, fetch all fields. here we can't work around until ES supports excluding fields in this request
root = this, | ||
searchRequest: SearchRequest = { body: {} }, | ||
filter?: string[] | ||
): SearchRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit that I don't really understand what these mergeProps changes are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also didn't
but now
What do I do with it? It's needed in loadDataViewFields
const request = this.mergeProps(this, { body: {} }, ['query', 'filter']);
for getting fields to fetch I need merged props, if mergeProps without filtering down to those needed, it runs into code that needs fields.
If I just merge query & filter, I get what's needed for the field retrieval.
@@ -894,6 +977,8 @@ export class SearchSource { | |||
} else { | |||
body.fields = filteredDocvalueFields; | |||
} | |||
// @ts-ignore | |||
body.fields = body.fields.filter((field) => Boolean(field)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this is filtering out empty values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made it work, I had undefined coming from somewhere, and filtered it out. should be solved differently, but I didn't have so much time I was like DataViewLazy, didn't do all the work :)
const request = this.mergeProps(this, { body: {} }, ['query', 'filter']); | ||
let fields = dataView.timeFieldName ? [dataView.timeFieldName] : []; | ||
const sort = this.getField('sort'); | ||
if (sort) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was poking through the code to determine if fields referenced in sort
need to be loaded. Did you figure this out? I dug for a bit but didn't come to a conclusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if you remove it, it breaks. I think it's because of this:
kibana/src/plugins/data/common/search/search_source/search_source.ts
Lines 673 to 679 in 844f515
case 'sort': | |
const sort = normalizeSortRequest( | |
val, | |
this.getField('index'), | |
getConfig(UI_SETTINGS.SORT_OPTIONS) | |
); | |
return addToBody(key, sort); |
which leads to this:
kibana/src/plugins/data/common/search/search_source/normalize_sort_request.ts
Lines 44 to 58 in 40303a8
if (indexPattern && typeof indexPattern !== 'string') { | |
const indexField = indexPattern.fields.find(({ name }) => name === sortField); | |
if (indexField && indexField.scripted && indexField.sortable) { | |
return { | |
_script: { | |
script: { | |
source: indexField.script, | |
lang: indexField.lang, | |
}, | |
type: castSortType(indexField.type), | |
...order, | |
}, | |
}; | |
} | |
} |
it seems to be necessary for sorting by a scripted field
@@ -110,7 +112,7 @@ export interface SearchSourceFields { | |||
/** | |||
* {@link IndexPatternService} | |||
*/ | |||
index?: DataView; | |||
index?: DataView | DataViewLazy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This affects any code that reads a data view from search source, causing typescript errors. I'm going to examine using DataViews internally with a custom field list.
It seems this wasn't immediately clear since type check often only reports a subset of type problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick note without having the ability to go into depth, this change would be redundant if for alerting no data view would be assigned initially , and just for the query a DataView with just the necessary fields that were fetched before would be assigned before the query is built .
Summary
This is a quick POC evaluating how we could make use of DataViewLazy to reduce field_caps requests without having to rewrite a ton of code.
Checklist
Delete any items that are not applicable to this PR.
Risk Matrix
Delete this section if it is not applicable to this PR.
Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.
When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:
For maintainers