parler-parse

Parler HTML goes in (stdin), structured JSON comes out (stdout).

Might be useful for feeding into Elasticsearch or cross-referencing with the video/images dump.

Usage

You will need a Rust compiler (the easiest way to get one is via rustup) to build from source. Then run the following commands in your terminal:

# clone the repo
git clone https://github.com/ilsken/parler-parse.git && cd parler-parse

# run the example
cargo run < examples/echo--parent-no-comment.html
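Since Elasticsearch is one of the suggested destinations, here is a minimal sketch of wrapping the tool's output for Elasticsearch's `_bulk` API, which expects newline-delimited JSON: an action line, then the document on its own line. It assumes the tool was run with `-c/--compact` so that each parsed document is a single line of JSON; the helper `to_bulk_ndjson` and the index name `parler-posts` are hypothetical, not part of this project.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: prefix one single-line JSON document with an
/// Elasticsearch bulk "index" action targeting `index`.
fn to_bulk_ndjson(doc: &str, index: &str) -> String {
    format!("{{\"index\":{{\"_index\":\"{index}\"}}}}\n{doc}\n")
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut out = io::stdout().lock();
    // Assumes compact output: one JSON document per input line.
    for line in stdin.lock().lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        out.write_all(to_bulk_ndjson(&line, "parler-posts").as_bytes())?;
    }
    Ok(())
}
```

The resulting file could then be sent with something like `curl -H 'Content-Type: application/x-ndjson' --data-binary @bulk.ndjson http://localhost:9200/_bulk` (host and port assumed).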

CLI options

USAGE:
    parler-indexer [FLAGS] [OPTIONS] [--] [path]...

FLAGS:
    -c, --compact      Output compact (single line) JSON. Defaults to true if stdin is not a terminal
    -h, --help         Prints help information
    -r, --recursive    Recursively search directories
    -V, --version      Prints version information

OPTIONS:
        --fail-log <fail file>              Write failed paths to a file
        --paths-from-file <path file>...    Read paths from a file
        --success-log <success file>        Write successfully processed paths to a file

ARGS:
    <path>...    HTML File(s) or directory of HTML File(s) to parse
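The help text says compact output defaults to on when stdin is not a terminal. A minimal sketch of that rule, assuming it boils down to "explicit flag, or piped input" (the helper `compact_default` is hypothetical, not the crate's actual code), using the standard library's `IsTerminal` trait (stable since Rust 1.70):

```rust
use std::io::IsTerminal;

/// Hypothetical helper: an explicit -c/--compact always wins; otherwise
/// compact output is used when stdin is not attached to a terminal.
fn compact_default(compact_flag: bool, stdin_is_terminal: bool) -> bool {
    compact_flag || !stdin_is_terminal
}

fn main() {
    let stdin_is_terminal = std::io::stdin().is_terminal();
    let compact = compact_default(false, stdin_is_terminal);
    println!("compact output: {}", compact);
}
```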

Where do I get the archives?

This project was developed against the "partial parler post text" archive available from Distributed Denial of Secrets.

Currently parses:

OG Meta
Posts + Echos
- Author (username + name + avatar + badge)
- Body
- Media Attachments (URL, Title, Excerpt, Type, ID (numeric and base62/hex encoded))
Comments + Replies + Engagements
Metrics (impressions, echoes, comment count, etc.)
All mentioned usernames in the post
Profile pages + all posts
Estimated timestamp offset in seconds (e.g. "3 days ago" -> -259200)
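The estimated offset turns Parler's relative timestamps into a signed number of seconds (negative means in the past), which matches the sample output below ("2 days ago" -> -172800). A minimal sketch of one way to do that, assuming the text always has the form "<number> <unit> ago"; the function name is hypothetical, and the month and year lengths are rough approximations chosen for illustration:

```rust
/// Hypothetical sketch: "2 days ago" -> Some(-172800).
/// Returns None for anything not shaped like "<number> <unit> ago".
fn approx_ts_offset(rel_ts: &str) -> Option<i64> {
    let mut parts = rel_ts.split_whitespace();
    let amount: i64 = parts.next()?.parse().ok()?;
    let unit = parts.next()?;
    if parts.next()? != "ago" {
        return None;
    }
    // Accept both singular and plural units ("day" / "days").
    let unit_secs = match unit.trim_end_matches('s') {
        "second" => 1,
        "minute" => 60,
        "hour" => 60 * 60,
        "day" => 24 * 60 * 60,
        "week" => 7 * 24 * 60 * 60,
        "month" => 30 * 24 * 60 * 60, // approximation
        "year" => 365 * 24 * 60 * 60, // approximation
        _ => return None,
    };
    Some(-amount * unit_secs)
}

fn main() {
    for rel in ["2 days ago", "1 hour ago", "yesterday"] {
        println!("{:?} -> {:?}", rel, approx_ts_offset(rel));
    }
}
```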

Roadmap

✅ Bug: the author field was null if a user only echoed a post (only the author of the echoed post was present). It can be populated from the OpenGraph meta title field
✅ Multi-threaded, recursive directory processing (crossbeam + rayon)
✅ Allow bulk / multi-threaded processing for all files in a directory for quickly importing into Elasticsearch/Meilisearch/Tantivy
[TODO] Add file metadata (created/modified date, path)
[TODO] WARC support + metadata
[TODO] Fix up timestamps based on metadata
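One possible way to fix up timestamps is to anchor `approx_ts_offset` to the time the page was captured. The sketch below assumes the capture time is known (for example a file's modification time, which is only an approximation for archives that have been copied around); `estimate_post_time` is a hypothetical helper, not part of this project.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: shift the capture time by `approx_ts_offset`
/// (negative = in the past) to estimate when the post was made.
fn estimate_post_time(captured_at: SystemTime, approx_ts_offset: i64) -> Option<SystemTime> {
    let delta = Duration::from_secs(approx_ts_offset.unsigned_abs());
    if approx_ts_offset < 0 {
        captured_at.checked_sub(delta)
    } else {
        captured_at.checked_add(delta)
    }
}

fn main() {
    // In practice the capture time might come from the HTML file, e.g.
    // `std::fs::metadata(path)?.modified()?`; a fixed value keeps this runnable.
    let captured_at = UNIX_EPOCH + Duration::from_secs(1_610_000_000);
    if let Some(t) = estimate_post_time(captured_at, -172_800) {
        let secs = t.duration_since(UNIX_EPOCH).expect("after epoch").as_secs();
        println!("estimated post time: {} seconds since the Unix epoch", secs);
    }
}
```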

Example output

{
  "opengraph_meta": {
    "title": "@AnthonyDaubs - AnthonyDaubs -",
    "owner": {
      "name": "AnthonyDaubs",
      "username": "@AnthonyDaubs"
    },
    "url": "/post/8c36602d9568482dacfc55d9b63d5a07",
    "image_url": "https://images.parler.com/af00acf47ba74651998fb9676aabd117_256"
  },
  "posts": [
    {
      "echo_by": null,
      "cards": [
        {
          "kind": "Post",
          "author": {
            "name": "AnthonyDaubs",
            "username": "@AnthonyDaubs",
            "avatar": {
              "url_raw": "https://images.parler.com/af00acf47ba74651998fb9676aabd117_256",
              "url": "https://images.parler.com/af00acf47ba74651998fb9676aabd117_256",
              "host": "images.parler.com",
              "is_external": false,
              "id": "af00acf47ba74651998fb9676aabd117"
            }
          },
          "rel_ts": "2 days ago",
          "approx_ts_offset": -172800,
          "body": "",
          "impression_count": 3,
          "is_sensitive_content": true,
          "media_items": [
            {
              "kind": "Video",
              "title": "",
              "link": {
                "label": "https://video.parler.com/Q2/s5/Q2s5oVN1pfgk_small.mp4",
                "url_raw": "https://video.parler.com/Q2/s5/Q2s5oVN1pfgk_small.mp4",
                "url": "https://video.parler.com/Q2/s5/Q2s5oVN1pfgk_small.mp4",
                "host": "video.parler.com",
                "is_external": false,
                "id": "Q2s5oVN1pfgk",
                "id_b62_dec": 1355361448748163000000
              },
              "excerpt": "",
              "source": {
                "label": "",
                "url_raw": "https://video.parler.com/Q2/s5/Q2s5oVN1pfgk_small.mp4",
                "url": "https://video.parler.com/Q2/s5/Q2s5oVN1pfgk_small.mp4",
                "host": "video.parler.com",
                "is_external": false,
                "id": "Q2s5oVN1pfgk",
                "id_b62_dec": 1355361448748163000000
              },
              "numeric_id": null
            }
          ]
        }
      ],
      "comments": [],
      "post_id": null,
      "mentions": [],
      "engagements": {
        "comment_count": 0,
        "echo_count": 0,
        "upvote_count": 0
      }
    }
  ]
}
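The `id` and `id_b62_dec` fields above come from the media URL. A minimal sketch of how they could be derived, assuming the ID is the last path segment with any extension and `_suffix` removed, and assuming the base62 alphabet is `0-9`, `A-Z`, `a-z` (neither is confirmed by this README; the function names are hypothetical). Under that alphabet, `Q2s5oVN1pfgk` decodes to 1355361448748163037782, whose leading 16 digits agree with the sample's `id_b62_dec`. Twelve base62 digits can exceed `u64::MAX`, so the sketch decodes into `u128`.

```rust
/// Assumed base62 alphabet: digits, then uppercase, then lowercase.
const ALPHABET: &[u8] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/// Hypothetical: ".../Q2s5oVN1pfgk_small.mp4" -> "Q2s5oVN1pfgk".
fn id_from_url(url: &str) -> Option<&str> {
    let file = url.rsplit('/').next()?;
    let stem = file.split('.').next()?;
    let id = stem.split('_').next()?;
    if id.is_empty() { None } else { Some(id) }
}

/// Decode a base62 string; None on empty input, bad characters, or overflow.
fn decode_base62(s: &str) -> Option<u128> {
    if s.is_empty() {
        return None;
    }
    let mut value: u128 = 0;
    for byte in s.bytes() {
        let digit = ALPHABET.iter().position(|&c| c == byte)? as u128;
        value = value.checked_mul(62)?.checked_add(digit)?;
    }
    Some(value)
}

fn main() {
    let url = "https://video.parler.com/Q2/s5/Q2s5oVN1pfgk_small.mp4";
    if let Some(id) = id_from_url(url) {
        println!("id = {}, decoded = {:?}", id, decode_base62(id));
    }
}
```

The same `id_from_url` also recovers the hex avatar ID from `https://images.parler.com/af00acf47ba74651998fb9676aabd117_256`.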

License

MIT licensed, so feel free to use it. If you want to use it for research, I'd love to hear about it and help if I can. Shoot me an email or message me on Twitter (@chris_tarquini).
