@gleanerio

GleanerIO

A set of projects implementing principles for indexing structured data on the web (schema.org), developed as part of NSF's EarthCube.


About

Gleaner is a tool for extracting JSON-LD from web pages. You provide Gleaner with a list of sites to index, and it accesses and retrieves pages based on each domain's sitemap.xml. Gleaner can then check the documents for well-formed, valid structure and process the JSON-LD data graphs into a form that can drive a search interface.
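The harvesting flow described above (read a sitemap.xml, fetch the listed pages, pull out the JSON-LD, check that it is well formed) can be sketched in Go using only the standard library. This is a minimal illustration, not Gleaner's actual code: the function names `sitemapLocs` and `extractJSONLD` are hypothetical, the pages are supplied as in-memory strings rather than fetched over HTTP, the example.org URLs are made-up sample data, and a regular expression stands in for a proper HTML parser. "Well formed" here means only that each block parses as JSON; structural validation (for example against SHACL shapes) is out of scope.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// urlSet mirrors the <urlset> root of a sitemaps.org sitemap.xml document.
// Tags without a namespace match elements in any namespace.
type urlSet struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// sitemapLocs (hypothetical helper) returns every <loc> entry in a sitemap.xml document.
func sitemapLocs(data []byte) ([]string, error) {
	var set urlSet
	if err := xml.Unmarshal(data, &set); err != nil {
		return nil, err
	}
	locs := make([]string, 0, len(set.URLs))
	for _, u := range set.URLs {
		locs = append(locs, strings.TrimSpace(u.Loc))
	}
	return locs, nil
}

// jsonLDScript matches <script type="application/ld+json"> blocks.
// A regular expression is a simplification; a real harvester would parse the HTML.
var jsonLDScript = regexp.MustCompile(`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD (hypothetical helper) returns the JSON-LD blocks in a page that
// parse as JSON, plus a count of blocks that do not.
func extractJSONLD(html string) (valid []string, malformed int) {
	for _, m := range jsonLDScript.FindAllStringSubmatch(html, -1) {
		body := strings.TrimSpace(m[1])
		if json.Valid([]byte(body)) {
			valid = append(valid, body)
		} else {
			malformed++
		}
	}
	return valid, malformed
}

func main() {
	sitemap := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
  <url><loc>https://example.org/dataset/2</loc></url>
</urlset>`)
	locs, err := sitemapLocs(sitemap)
	if err != nil {
		panic(err)
	}
	fmt.Println(locs) // [https://example.org/dataset/1 https://example.org/dataset/2]

	// Stand-in for a page fetched from one of the sitemap URLs.
	page := `<html><head>
<script type="application/ld+json">{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}</script>
<script type="application/ld+json">{"@type": "Dataset",</script>
</head></html>`
	valid, malformed := extractJSONLD(page)
	fmt.Println(len(valid), malformed) // 1 1
}
```

Counting malformed blocks instead of aborting lets a harvest continue past a single bad page while still reporting how many documents failed the well-formedness check.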

Pinned

  1. gleaner (Public)

    Gleaner: harvesting JSON-LD and structured data on the web

    Go 14 10

  2. nabu (Public)

    Nabu: Synchronize data graph objects with a triplestore

    Go 1 2

  3. scheduler (Public)

    Scheduling approaches related to Gleaner tooling

    Python 1 3

  4. archetype (Public)

    A testbench repo with the three primary personas of user, provider and indexer.

    CSS 4 1

  5. notebooks (Public)

    Jupyter notebooks for SHACL processing, JSON-LD framing and object operations

    Jupyter Notebook 1

  6. scienceonschemaexamples (Public)

    This repository will contain actual Science on Schema JSON-LD files, along with HTML pages that embed JSON-LD scripts, documented as possible test cases.

    HTML

Repositories

Showing 10 of 11 repositories
