Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to decode a block (fully or partially) will never be corrected #436

Open
6 tasks
EvanHahn opened this issue Jan 18, 2024 · 2 comments
Open
6 tasks
Assignees
Labels
mvp Requirement for MVP

Comments

@EvanHahn
Copy link
Contributor

Description

The problem

If a device fails to fully decode a block, the missing data will be forever missing from the SQLite index, and therefore the app. I think this can happen if there's a total or partial failure to decode a block.

Total failure to decode

The simplest example: when IndexWriter fails to decode a block, it discards it:

try {
const version = { coreDiscoveryKey: discoveryKey(key), index }
var doc = this.#mapDoc(decode(block, version), version)
} catch (e) {
this.#l.log('Could not decode entry %d of %h', index, key)
// Unknown or invalid entry - silently ignore
continue
}

Imagine that my device is on an older version than yours, and that your device has a data type I've never heard of. decode() will throw and the block will be "silently ignore"d.

We will never return to this block. This block will sync to my Hypercore but will be permanently missing from my SQLite index, even if I update.

Partial failure to decode

I believe a similar issue can happen when a data type gets a new field. A device will only add fields it understands:

// Snippet from https://github.com/digidem/mapeo-sqlite-indexer/blob/4e9484442204157a792d808da333768f49774c4d/index.js#L74-L77
this.#writeDocSql = db.prepare(
  `REPLACE INTO ${docTableName} (${docColumns.join(',')})
  VALUES (${docColumns.map((name) => `@${name}`).join(',')})`
)

If a new field is added, we'll never go back to insert it.

Possible solutions

I can think of a few solutions:

  1. When we detect a version change (e.g., a new field), completely rebuild the index from scratch.
    • Pros: should be simple to drop everything and reset.
    • Cons: will this be slow? Will it open a long-running transaction that blocks other database operations? Is it easy to detect these changes reliably? Will it require a "migrating" screen to be added to CoMapeo? Should there be a way to manually rebuild the index in some advanced settings in case we make a mistake?
  2. Store non-decoded blocks and fields for later processing. At some point (e.g., startup of a new app version), reprocess them.
  3. Some other option I'm not thinking of.
    • Pros: ???
    • Cons: ???

I think this is worth discussing, but I think option 1 is probably best in the short term. It seems fairly easy to implement, and we can always implement option 2 if performance becomes a concern. But I may be wrong!

Tasks

Assuming we take my preferred approach:

  • Add tests
    • Test a block that completely fails to decode (e.g., new data type)
    • Test a block that fails to decode a field (e.g., a new field)
  • Add mechanism to detect version changes (may already be done?)
  • Add function to completely rebuild the index from scratch
  • Work with frontend to figure out best API for this. Should it be done automatically, or should it expose APIs so the frontend can display some UI? Should there be events fired?
@EvanHahn
Copy link
Contributor Author

EvanHahn commented Mar 7, 2024

Seems related to #27.

@EvanHahn EvanHahn added the mvp Requirement for MVP label Mar 7, 2024
@EvanHahn EvanHahn self-assigned this Mar 7, 2024
@gmaclennan
Copy link
Member

I think option #1 seems preferable. We don't have to worry about storing versions as per #27 for the MVP, since we will know what versions the MVP ships with, so can add it in the next iteration. It would be good to build some metrics that run on mobile devices so that we can see how long a complete re-index takes. It is the same time that it takes to sync and add a new device to a project. I think it is ok that a user sees an "upgrading your database" message after an app update, but we should check that with the co-design team after we know how long it might take (big difference between 15 seconds and 10 minutes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mvp Requirement for MVP
Projects
None yet
Development

No branches or pull requests

2 participants