Make bulk loader work with arrays instead of strings #1101

Open
MatMoore opened this issue Dec 18, 2017 · 0 comments

Moved from https://trello.com/c/8zhBPuQT/12-make-bulk-loader-work-with-arrays-instead-of-strings.

What

Every night a job runs to rebuild the search index with new popularity data.
https://github.com/alphagov/search-analytics/blob/master/nightly-run.sh

The bulk load script reads text from standard input that represents Elasticsearch documents. It then passes that string to indexing code shared with the regular indexing path, even though the regular path takes a different argument type (an array of document hashes).

This makes the code hard to work on, because any value passed through it can be either a string or an array of hashes. The complexity leaks into all of the indexing code, e.g.:

    def bulk_payload(document_hashes_or_payload)
      if document_hashes_or_payload.is_a?(Array)
        index_items_from_document_hashes(document_hashes_or_payload)
      else
        index_items_from_raw_string(document_hashes_or_payload)
      end
    end
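
A minimal sketch of the direction the title suggests: convert the bulk loader's input into an array of hashes once, at the edge, so the shared indexing code only ever receives arrays. It assumes the standard input is in Elasticsearch bulk format (an `index` action line followed by a source line for each document). The names `document_hashes_from_bulk_text`, the single-argument `bulk_payload`, and the `_type`/`_id` keys are hypothetical and not taken from the real codebase.

```ruby
require "json"

# Hypothetical boundary conversion for the bulk loader: turn the raw text read
# from standard input into an array of document hashes before it reaches any
# shared indexing code. Assumes Elasticsearch bulk format, i.e. pairs of lines:
# an {"index": {...}} action line followed by the document source line.
def document_hashes_from_bulk_text(text)
  lines = text.each_line.map(&:strip).reject(&:empty?)
  raise ArgumentError, "expected action/source line pairs" if lines.size.odd?

  lines.each_slice(2).map do |action_line, source_line|
    # Assumption: every action is "index"; fetch raises KeyError otherwise.
    metadata = JSON.parse(action_line).fetch("index")
    # Keep the routing metadata (_type, _id) alongside the document fields.
    JSON.parse(source_line).merge("_type" => metadata["_type"], "_id" => metadata["_id"])
  end
end

# Hypothetical single code path: the indexing code accepts only an array of
# document hashes and builds the bulk request body from it.
def bulk_payload(document_hashes)
  document_hashes.flat_map { |doc|
    action = { "index" => { "_type" => doc["_type"], "_id" => doc["_id"] } }
    source = doc.reject { |key, _| %w[_type _id].include?(key) }
    [action.to_json, source.to_json]
  }.join("\n") + "\n"
end

raw = <<~BULK
  {"index": {"_type": "edition", "_id": "/vat-rates"}}
  {"title": "VAT rates", "popularity": 0.5}
BULK

docs = document_hashes_from_bulk_text(raw)
puts bulk_payload(docs)
```

With the conversion done at the boundary, a branch like `index_items_from_raw_string` would no longer be needed, and only one code path would have to be changed and tested.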

Why

There are two separate code paths that do essentially the same thing. Any change to this code has to be made in both paths in the same way, and both paths have to be tested.
