Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Term memoization #272

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

rescribet
Copy link
Collaborator

@rescribet rescribet commented Nov 9, 2018

As per request I've integrated some of the external changes to rdflib into the library.

This is still a WIP still contains some minor bugs, but the tests succeed and it works properly when integrated into our rdflib app with minimal changes, but some additional code isn't ported yet, which might come this weekend. But the overall mechanics and changes are already visible.

The largest change is having to use factory methods rather than constructor calls, IIRC this could be overcome by using ES5 functional types rather than ES6 classes, but I can see value in using factories and leaving the constructor semantics unchanged to prevent unexpected behaviour (e.g. new Literal('a') === new Literal('a') // => true).

TODO:

  • Check if remaining proxy code is still necessary, and port it if true
  • Memoize the literals
  • Performance benchmarks to back-up the claims
  • (after approval) Update general documentation
  • (after approval) Look into releasing items from memory for extremely long-running processes

@rescribet rescribet force-pushed the store-memoization branch 8 times, most recently from 7a6c734 to 9fbd794 Compare November 11, 2018 15:07
@rescribet
Copy link
Collaborator Author

rescribet commented Nov 11, 2018

So, I've created a turtle parsing benchmark, and the results show about a 40% decrease in memory usage (~55% after garbage collection), and an 8% reduction in processing time.

Looking at the difference before and after the gc call, there seems to be some additional space for reducing memory usage.

if (datatype) {
this.datatype = NamedNode.fromValue(datatype)
}
this.datatype = datatype ? NamedNode.fromValue(datatype) : XSD.string
}
copy () {
Copy link
Collaborator Author

@rescribet rescribet Nov 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method (copy) seems rather nonsensical now, since indexing is done with the value, so either reindexing logic has to be created, or the user should pass intended modifications to this function.

src/literal.js Outdated Show resolved Hide resolved
src/namespace.js Outdated Show resolved Hide resolved

nn.sI = ++this.termIndex
nn.term = ln
this.termMap[nn.sI] = this.nsMap[nn.value] = nn
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking whether this.nsMap[nn.value] already contains a value can aid in debugging, but introduces a runtime overhead. If rdflib has a 'dev build' it'd certainly be a good addition

@rescribet rescribet changed the title [WIP] Store memoization Store memoization Nov 13, 2018
@rescribet
Copy link
Collaborator Author

Collection still is missing from memoization since, due to its mutable nature, it might require a different strategy. Giving it a store index (should be system index? since the class spans multiple stores) would allow more performant indexedformula indexing (due to the way number indexes are handled vs string indexes)

Fletcher91 added 3 commits November 19, 2018 12:14
It was using `ns` assuming it's available globally
This deduplicates all objects entering the store, keeping at most one
instance for each value in memory
Fletcher91 added 3 commits November 19, 2018 13:20
This implements some of the methods shared over all memoized Node
descendants. Though RDF would only contain Named- blank-nodes and
literals, RDFlib also allows Collection and Graph types to be introduced
into a triple, so an additional layer of is introduced to overwrite some
of the core methods with more performant versions which take advantage
of the memoization (e.g. `equals()` using a single `===`).

Another major change is the introduction of `#generateString`, which
contains implementation details about how a class should be stringified.

Since there should only ever be a single instance for every unique node,
the `#toString` method is overridden to memoize the result of the first
time `#generateString` is called, saving later string concatenation
operations and garbage collection for a persistent memory allocation.
This has a few advantages:
* Makes rdflib.js comply with the rdfjs task force definition (http://rdf.js.org/#literal-interface)
* Makes behaviour consistent with the new `Term.literalByValue` implementation
* Downstream can use switch without additional undefined and null checking

We now only have one instance of xsd:string, so the overhead is only a
reference rather than an entire object for each literal.
@michielbdejong
Copy link
Collaborator

Awesome work! Trying to write down what you, @Vinnl and I just discussed f2f:

I do find it a bit scary that when you do:

var a = new A();
var b = new A();
a.value = 'x';
b.value ='y';
console.log(a.value);

You would get 'y' instead of 'x'.

So I think that we might be better off going for nodes that are just objects. So instead of:

var node = new NamedNode('http://example.com/friend')

we could maybe just do:

var node  = {
  termType: 'NamedNode',
  value: 'http://example.com/friend'
}

That would also be a lot cheaper for passing these to webworkers and back.

@elf-pavlik
Copy link
Contributor

elf-pavlik commented Jul 24, 2019

Not sure how this PR affects conforming to RDF/JS Data Model. As a note, one of latest changes to that draft provides DataFactory#fromTerm and DataFactory#fromQuad which can also take JSON serializable object and upgrade it to RDF/JS conformant object.

To go other way and 'downgrade' RDF/JS conformant object to JSON serializable object I recall issue opened by @RubenVerborgh rdfjs/data-model-spec#94 which might need revising.

That would also be a lot cheaper for passing these to webworkers and back.

Currently I experiment with prototype based on https://github.com/PolymerLabs/actor-helpers which actually aims at moving as much as possible into web workers. Easy round trip between RDF/JS conformant objects and JSON serializable objects will come there as requirement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants