sambekar15 edited this page Oct 1, 2020 · 3 revisions

Superglue Architecture

Superglue is a collection of components, each of which contributes functionality toward solving our core problems. In describing the architecture, we take special care to point out which components are clients and which are libraries, in order to clarify the entry points and extension points of the system.

![Architecture](.github/assets/Screen Shot 2020-10-01 at 9.55.28 AM.png)


Library components

Library components are not themselves executable; instead, they expose a programmatic interface that clients consume to invoke their functionality. Below, we give a high-level overview of the problem each library component solves.

Data-Access-Objects (DAO)

The DAO component defines programmatic access to Superglue’s data model and abstracts over the persistence mechanism used in a given deployment. This library is used by any client that needs to read or edit the state of the system.

Parser

The parser analyzes SQL scripts to identify which tables they reference, and whether each table is used as an input (e.g. SELECT) or an output (e.g. CREATE TABLE). This metadata is later used to construct a lineage graph of data moving through those tables.

Input: SQL scripts
Output: Mapping of scripts to their input and output tables
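
To make the input/output contract above concrete, here is a minimal sketch in Python. It is not Superglue's parser: the function names `extract_tables` and `parse_scripts` are hypothetical, and the regular expressions are a deliberately naive stand-in for a real SQL grammar. It only illustrates the shape of the mapping from scripts to their input and output tables.

```python
import re

# Naive patterns, assumed for illustration only. They ignore comments, CTEs,
# subqueries, and quoted identifiers, all of which a real SQL parser handles.
_OUTPUT_RE = re.compile(
    r"\b(?:CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|INSERT\s+INTO)\s+([\w.]+)",
    re.IGNORECASE,
)
_INPUT_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_tables(sql):
    """Return the tables one script reads (inputs) and writes (outputs)."""
    outputs = {name.lower() for name in _OUTPUT_RE.findall(sql)}
    inputs = {name.lower() for name in _INPUT_RE.findall(sql)}
    return {"inputs": sorted(inputs), "outputs": sorted(outputs)}


def parse_scripts(scripts):
    """Map each script name to its input and output tables."""
    return {name: extract_tables(sql) for name, sql in scripts.items()}


if __name__ == "__main__":
    example = {
        "daily_sales.sql": (
            "CREATE TABLE daily_sales AS "
            "SELECT o.id FROM orders o JOIN customers c ON o.cid = c.id"
        ),
    }
    print(parse_scripts(example))
```

Keeping the result as a plain mapping keyed by script name means downstream consumers only need the table names, not the SQL itself.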

Service

The service component provides an assortment of functionality which is used for fulfilling client requests.

Lineage

The LineageService class serves requests for lineage by stitching together a graph from the metadata collected by the parser.
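
The stitching step can be sketched as follows. This is an illustrative Python example, not the actual LineageService code: the functions `build_lineage` and `ancestors`, and the dictionary shape of the parser metadata, are assumptions made for the sketch. Each script contributes edges from its input tables to its output tables, and a lineage request becomes a graph traversal.

```python
from collections import defaultdict, deque


def build_lineage(script_tables):
    """Return a map from each output table to the tables it is derived from."""
    upstream = defaultdict(set)
    for tables in script_tables.values():
        for out in tables["outputs"]:
            upstream[out].update(tables["inputs"])
    return upstream


def ancestors(upstream, table):
    """Collect every table upstream of `table` with a breadth-first walk."""
    seen = set()
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, ()):
            if parent not in seen:  # the seen-set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical parser metadata: script name -> input and output tables.
    metadata = {
        "daily_sales.sql": {"inputs": ["orders", "customers"], "outputs": ["daily_sales"]},
        "report.sql": {"inputs": ["daily_sales"], "outputs": ["report"]},
    }
    graph = build_lineage(metadata)
    print(sorted(ancestors(graph, "report")))
```

Storing only upstream edges keeps the sketch short; a service that also answers "what depends on this table" would keep a second map in the opposite direction.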

Elasticsearch

The ElasticService class provides an Elasticsearch client and useful methods for manipulating Superglue data on ES, including creating indices, uploading documents, and managing aliases.

Client components

Client components are executables which consume the exposed interfaces of the library components to actually invoke the functionality provided.

Command-line interface

The command-line component is arguably the simplest way to interact with Superglue. It provides a handful of flags and parameters that can be used to configure what actions to take. A good example use case is using the CLI to run the parser over a collection of scripts.

REST API

The REST API provides an HTTP interface to Superglue’s functionality. Key among its capabilities is the ability for web clients to request the lineage of a given table or job, as well as execution statistics for active jobs. This API is the interface that supports the Superglue UI, a web application that visually presents the lineage and execution data.
