DuplicatiIndexer

📃 A way to easily find files that exist within a Duplicati backup

What is this

When Duplicati creates a backup, it creates an index file called filelist.json. This file can often be gigabytes in size, making it very hard to search for files within the backup.

This project creates an index file that is, on average, 40x smaller than filelist.json and can be searched quickly with this program.

How does it work

The program uses ijson to stream the massive JSON index, so the whole file never has to be loaded into memory at once, and builds a "MARISA trie" of the paths it finds. Because the paths are stored in a trie, they can be searched by prefix very quickly.
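The project itself relies on the marisa-trie package's compact, static MARISA trie, which is not reproduced here. To show why a trie makes prefix search fast, here is a minimal, stdlib-only sketch built on a plain nested-dict trie. The helpers build_trie and search_prefix, the END sentinel, and the sample paths are all hypothetical and are not part of the jsonpy code. The ijson streaming step is also left out: the sketch takes an in-memory list of paths. A query walks one node per character of the prefix and then visits only the subtree beneath it, instead of scanning every path in the backup.

```python
from typing import Dict, Iterable, List

# Hypothetical sentinel key meaning "a complete path ends at this node".
END = "\0"


def build_trie(paths: Iterable[str]) -> Dict:
    """Insert every path into a nested-dict trie, one character per level."""
    root: Dict = {}
    for path in paths:
        node = root
        for ch in path:
            # Reuse the existing child node, or create it if missing.
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def search_prefix(root: Dict, prefix: str) -> List[str]:
    """Return every stored path that starts with `prefix`, sorted."""
    node = root
    # Walk down one node per character; a missing character means no matches.
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    results: List[str] = []
    # Depth-first walk over only the subtree below the prefix.
    stack = [(node, prefix)]
    while stack:
        current, acc = stack.pop()
        for key, child in current.items():
            if key == END:
                results.append(acc)
            else:
                stack.append((child, acc + key))
    return sorted(results)


if __name__ == "__main__":
    trie = build_trie([
        "C:/Users/alice/notes.txt",
        "C:/Users/alice/photo.jpg",
        "C:/Users/bob/todo.txt",
        "D:/backup.log",
    ])
    print(search_prefix(trie, "C:/Users/alice/"))
    # → ['C:/Users/alice/notes.txt', 'C:/Users/alice/photo.jpg']
```

A dict-based trie like this uses far more memory than a MARISA trie. MARISA stores the same set of keys in a succinct, read-only structure, which helps explain why the real index can stay small.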

How do I use it?

1. Install Python 3 and Poetry.
2. Run poetry install in the project root directory.
3. Extract the filelist.json file from the backup's *.dlist.zip archive.
4. To create an index, run:
   poetry run python -m jsonpy create [input] [output]
   Without arguments, the command reads filelist.json by default and writes index.marisa.gz.
5. To search an existing index, run:
   poetry run python -m jsonpy search [input] [search term]
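The extraction step can also be scripted with Python's standard zipfile module. The sketch below rests on two assumptions: the archive is unencrypted (a plain *.dlist.zip, not an encrypted variant), and filelist.json sits at the top level of the archive. The extract_filelist helper is hypothetical and is not part of jsonpy. Because filelist.json can be gigabytes in size, the helper copies it to disk in chunks rather than reading it into memory all at once.

```python
import shutil
import zipfile
from pathlib import Path


def extract_filelist(dlist_zip: str, dest: str = "filelist.json") -> Path:
    """Copy filelist.json out of a Duplicati *.dlist.zip archive.

    Hypothetical helper. Assumes an unencrypted archive with filelist.json
    at its top level; raises KeyError if the member is missing.
    """
    out = Path(dest)
    with zipfile.ZipFile(dlist_zip) as archive:
        # Stream the member to disk chunk by chunk to keep memory use low.
        with archive.open("filelist.json") as src, out.open("wb") as dst:
            shutil.copyfileobj(src, dst)
    return out
```

For example, calling extract_filelist on a downloaded dlist archive writes filelist.json to the current directory. Running poetry run python -m jsonpy create with no arguments should then pick that file up by default.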

What is this licensed under?

  This Source Code Form is subject to the terms of the Mozilla Public
  License, v. 2.0. If a copy of the MPL was not distributed with this
  file, You can obtain one at http://mozilla.org/MPL/2.0/.

HarryPeach/DuplicatiIndexer

Folders and files

Latest commit

History

Repository files navigation

DuplicatiIndexer

What is this

How does it work

How do I use it?

What is this licensed under?

About

Resources

License

Stars

Watchers

Forks

Languages