There are many cases where modeling a real-world access pattern requires support for linearized operations. This includes the oft-termed client-side join, which is a very common pattern in distributed systems. For example, a user account may be looked up via some natural identifier, and immediately following, related rows may be looked up depending on the status of the account. These subsequent operations may depend on some local or surrogate type of key which is only known to the system. Further, there may be multiple look-ups across a number of data sources or partitions, concurrently or serially.
There are two mechanisms at play here which are not fully supported in NoSQLBench across drivers:
1. Capturing fields from a successful operation (for example, a user_id from a ResultSet).
2. Allowing concurrent dispatch of a set of related operations.
This discussion is meant to elaborate on the first of these specifically, so that interested contributors can help identify the best way to implement this for users.
Thanks
Some of the ideas in this topic were seeded in conversations with long-standing NB contributors Sebastian Estevez and Wei Deng. Thank you!
Precedents
NB already has the baseline capabilities needed to enable field capture.
Op Sequencing
NB already supports serialized execution of a set of operations. In fact, this is the default mode when there are multiple operations active in an activity. Each thread executes each of the operations according to the order and ratios provided, serially. This means that, although the operations may not be linearized in terms of data dependency, they are by default already sequenced as if they were. Extending this to allow for simple extraction of values is not a significant change.
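As a rough illustration of what "order and ratios" means here, the sketch below expands a list of ops with ratios into a flat sequence and maps each cycle onto it. This is a simplified model for discussion only: the function names are hypothetical, the op names are borrowed from the example later in this post, and NB's actual sequencing strategies may behave differently.

```python
def build_sequence(ops):
    """Expand (name, ratio) pairs into a flat, ordered op sequence."""
    sequence = []
    for name, ratio in ops:
        # Each op appears `ratio` times, in the order it was declared.
        sequence.extend([name] * ratio)
    return sequence


def op_for_cycle(sequence, cycle):
    """Pick the op for a given cycle number by wrapping around the sequence."""
    return sequence[cycle % len(sequence)]


# Hypothetical ops and ratios: one lookup by natural key, two dependent reads.
sequence = build_sequence([("userkey", 1), ("acctmeta", 2)])
executed = [op_for_cycle(sequence, cycle) for cycle in range(6)]
```

Under this model the dependent op always follows the op it depends on, which is why sequencing alone gets most of the way to linearized operations.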
Caching Values
NB already supports caching data per-thread, by name. This is used with some bindings which save and load values. However, to use this effectively, users have to have a projective model of how their data will relate and in what order the bindings will be invoked and initialized. This makes data-dependent access patterns almost too complicated to manage. Yet, the thread-local variable cache provides a suitable way to manage state. A future version may want to provide some stack-local or interstitial storage between linearized operations for better data-flow encapsulation and resiliency.
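To make the idea of a per-thread, by-name cache concrete, here is a minimal sketch using Python's `threading.local`. The `save` and `load` helpers are hypothetical stand-ins for the save and load bindings mentioned above, not NB's implementation.

```python
import threading

# Hypothetical per-thread variable cache; not NB's actual API.
_cache = threading.local()


def save(name, value):
    """Store a value under `name`, visible to the calling thread only."""
    if not hasattr(_cache, "vars"):
        _cache.vars = {}
    _cache.vars[name] = value
    return value


def load(name, default=None):
    """Return the calling thread's value for `name`, or `default` if unset."""
    return getattr(_cache, "vars", {}).get(name, default)


def worker(thread_id, results):
    # Each thread saves and then loads its own copy of "user_key".
    save("user_key", f"key-{thread_id}")
    results[thread_id] = load("user_key")


results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the storage is per-thread, a value captured by one op is only visible to later ops that run on the same thread, which matches the serialized per-thread execution described under Op Sequencing.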
Terms
Bind Point (extant term in NB) - A point in a data template at which data is injected
Capture Point (new term) - A point in a data template which designates data to be captured or extracted
Capture Field (new term) - A named field which is designated to be captured from an operation
Capture Variable (new term) - An instance of a named value which has been captured
Capture Syntax
The existing template forms allow for injection of data using a simple syntax for data binding points. It makes sense to use the same basic form for specifying data capture. Bind points can be set with {var} or {{Identity()}} syntax. The capture syntax should be similar, but distinct: [varname].
For some protocols, statement forms will already provide a suitable backdrop upon which to use the new capture syntax. For example, select field1,field5 from ... already specifies the logical name of a field, which can be implemented in a specific and predictable way. You could simply flag a field for capture as select [field1],field5 from .... In this way, the capture syntax is effectively marking up existing operations in a visually obvious way.
Direct capture fields
For some other protocols which do not provide a suitable text on which to use the above syntax, it may be necessary to specify captured fields directly. An op template could specify this as:
    op:
      # a typical bind point
      get_key: {{Mod(15000L); ToString()}}
      # an explicitly requested capture field
      capture: last_login_date
      # or, alternately
      capture:
        - last_login_date
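Since the capture key is shown in both a scalar and a list form, a driver would probably want to normalize it before use. A minimal sketch, assuming the op template has already been parsed into a dict (the `normalize_capture` helper is hypothetical):

```python
def normalize_capture(op):
    """Return the capture field names declared in a parsed op template."""
    capture = op.get("capture", [])
    if isinstance(capture, str):
        # Scalar form: capture: last_login_date
        return [capture]
    # List form: capture: [last_login_date, ...]
    return list(capture)


scalar_form = {"get_key": "{{Mod(15000L); ToString()}}", "capture": "last_login_date"}
list_form = {"get_key": "{{Mod(15000L); ToString()}}", "capture": ["last_login_date"]}
```

Both forms then yield the same list of field names, and an op with no capture key yields an empty list.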
Capture As
Anywhere a capture field is specified, it should also be possible to indicate the variable name under which the value is stored. This is analogous to the "as" syntax in CQL or SQL, for example [user_key as key1].
Using Values
A simplified syntax for injecting captured values back into other operations is suggested as {[name]}, which is essentially a placeholder for a {{Load('name')}} call. This can be implemented initially as a simple pre-processor rewrite rule. This has the advantage of distinguishing this pattern for specific improvements separate and isolated from the extant thread-local variable cache.
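The rewrite rule could be as small as a single regular expression. The sketch below is one possible reading, assuming capture variable names consist of word characters only; `rewrite_use_points` is a hypothetical name.

```python
import re

# Matches the proposed {[name]} form; assumes names are word characters only.
USE_POINT = re.compile(r"\{\[(\w+)\]\}")


def rewrite_use_points(template):
    """Rewrite each {[name]} into the equivalent {{Load('name')}} bind point."""
    return USE_POINT.sub(lambda m: "{{Load('%s')}}" % m.group(1), template)


rewritten = rewrite_use_points("select * from acct_meta where user_key={[key1]}")
```

Ordinary bind points such as {email} and capture points such as [field1] do not match the pattern, so they pass through unchanged.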
Example
    ops:
      userkey:
        raw: "select [user_key as key1] from user_accounts where user_email='{email}'"
      acctmeta:
        raw: "select * from acct_meta where user_key={[key1]}"
In this example, the first statement will be parsed, and then submitted to the driver as simply select user_key from user_accounts where user_email='user@example.org'. All traces of capture points or bind points are removed during op template pre-processing.
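The capture half of that pre-processing step might look like the following sketch. It assumes field and variable names are plain word characters and that a literal `[` never appears in the statement for other reasons (for example, collection indexing), which a real implementation would need to handle. The `extract_captures` helper is hypothetical, and bind point substitution is left out, so {email} stays in place here.

```python
import re

# Matches [field] or [field as alias], but not the {[name]} injection form.
CAPTURE_POINT = re.compile(r"(?<!\{)\[(\w+)(?:\s+as\s+(\w+))?\]")


def extract_captures(template):
    """Strip capture markup and return (cleaned statement, {variable: field})."""
    captures = {}

    def strip(match):
        field, alias = match.group(1), match.group(2)
        # Without an "as" clause, the variable takes the field's own name.
        captures[alias or field] = field
        return field

    cleaned = CAPTURE_POINT.sub(strip, template)
    return cleaned, captures


statement, captures = extract_captures(
    "select [user_key as key1] from user_accounts where user_email='{email}'"
)
```

After the driver executes the cleaned statement, it would read each mapped field from the result and store it under the variable name, here `user_key` stored as `key1`.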
Open Questions
Chained Dependency and Error Handling: For now, error handling within an op flow is not explicitly addressed.
The extant error handling mechanism can suffice if the op flow is contained within a cycle.
If operations span cycles, then errors which occur before dependent operations should be able to cancel those operations and mark them with a distinct error.
Sanity Checks: Unresolved references or unused captures should at least be called out to the user.
Value Cardinality: It may be necessary to provide multiple-valued capture, for example in result sets with multiple rows. If multiple values are captured, users would need a way to specify which element, portion, or totality they are referencing.
Statement synthesis: It may be necessary to generate operations within the cycle based on result set cardinality.
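As one possible reading of the sanity checks item, the sketch below walks an ordered op flow and reports two conditions: {[name]} references to variables that no earlier statement captured, and captures that no statement ever uses. The patterns and the `check_op_flow` helper are assumptions made for illustration, using the capture and injection syntax proposed above.

```python
import re

# Same assumed patterns as the proposed syntax: [field as alias] and {[name]}.
CAPTURE_POINT = re.compile(r"(?<!\{)\[(\w+)(?:\s+as\s+(\w+))?\]")
USE_POINT = re.compile(r"\{\[(\w+)\]\}")


def check_op_flow(statements):
    """Return (unresolved, unused) variable names for an ordered op flow."""
    captured, used, unresolved = [], set(), []
    for stmt in statements:
        # A reference only resolves against captures from earlier statements.
        for name in USE_POINT.findall(stmt):
            used.add(name)
            if name not in captured:
                unresolved.append(name)
        for field, alias in CAPTURE_POINT.findall(stmt):
            captured.append(alias or field)
    unused = [name for name in captured if name not in used]
    return unresolved, unused


good = check_op_flow([
    "select [user_key as key1] from user_accounts where user_email='{email}'",
    "select * from acct_meta where user_key={[key1]}",
])
bad = check_op_flow([
    "select [user_key as key1], [status] from user_accounts",
    "select * from acct_meta where user_key={[key2]}",
])
```

A check like this could run once at activity initialization and log warnings, rather than failing the run, which is in keeping with "at least called out to the user".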
Related
previous request: See #136