Expose the prefixes found in the top level @context, including remote @context. #193

Open
afs opened this issue Oct 28, 2021 · 10 comments
Labels
enhancement New feature or request Stale

Comments

@afs

afs commented Oct 28, 2021

Prefixes have no standing in the RDF data model but they are convenient for display of URIs.

Describe the solution you'd like
Expose the compact URI prefix mapping from the top-level @context, maybe as a method RdfDataset.prefixes() that returns a Map<String, String>. This would be limited to the prefixes from the top-level @context: the active context in scope at the end of parsing the top-level JSON, after any nested local contexts have dropped out of scope.

Describe alternatives you've considered
Secondary parsing of the JSON document at the JSON level (this is what Jena v4.2.0 does). This does not include remote @context, as that would require re-downloading the URL or interacting with any context cache.

Jena also requires the prefix URI to end in "/", "#" or ":", and Jena includes @vocab as prefix "". These are pragmatic Jena decisions that could be applied to the Map returned by Titanium.
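
To make those Jena conventions concrete, here is a minimal sketch in plain Java, with no Titanium or Jena calls. It assumes the top-level @context has already been parsed into a Map of term to value. The class name ContextPrefixes and the method fromContext are hypothetical and not part of either library. Passing @vocab through without the ending check is also an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ContextPrefixes {

    // Hypothetical helper: reduce a parsed top-level @context (term -> value)
    // to a prefix map using the conventions described in the issue text.
    static Map<String, String> fromContext(Map<String, ?> context) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : context.entrySet()) {
            // Only simple string definitions are considered in this sketch;
            // expanded term definitions (JSON objects) and numbers are skipped.
            if (!(entry.getValue() instanceof String)) {
                continue;
            }
            String term = entry.getKey();
            String uri = (String) entry.getValue();
            if (term.equals("@vocab")) {
                // @vocab becomes the empty prefix (assumption: not filtered by ending).
                prefixes.put("", uri);
            } else if (!term.startsWith("@") && endsLikeNamespace(uri)) {
                prefixes.put(term, uri);
            }
        }
        return prefixes;
    }

    // The rule as stated in the issue: the prefix URI must end in "/", "#" or ":".
    static boolean endsLikeNamespace(String uri) {
        return uri.endsWith("/") || uri.endsWith("#") || uri.endsWith(":");
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("@version", 1.1);
        context.put("@vocab", "http://example.org/vocab/");
        context.put("foaf", "http://xmlns.com/foaf/0.1/");
        context.put("name", "http://example.com/person#name"); // a term, not a prefix
        System.out.println(fromContext(context));
    }
}
```

In this sketch the "name" entry is dropped because its URI does not end in a namespace delimiter, which matches the later observation in this thread that such an entry is a short name for a URI and not a prefix.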

Additional context
This came up as part of JENA-2187.

@filip26
Owner

filip26 commented Oct 28, 2021

Hi @afs,
thank you for reporting that. Please help me understand the issue in order to prepare test cases.

Do I understand it right that the goal is to generate RDF Turtle from a given JSON-LD input?

The JSON-LD to RDF algorithm expands the input, and the expanded input (all prefixes are lost after this step) is converted into a node map.
So I'm thinking that maybe we could somehow utilize a compaction algorithm to get prefixed output, or just the prefixes.

@afs
Author

afs commented Oct 28, 2021

Hi @filip26,

Turtle output is one use; there are several different Turtle output formats, from "pretty" to a one-quad-per-line form which is "N-Quads+prefixes". Output does not happen when the JSON-LD is read in: the steps are read in, store, (later) write out.

Another use is converting URIs to convenient strings for UI display. In Jena, the dataset is the storage unit and it carries with it some prefixes.
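
As an illustration of the display use case, the following is a minimal sketch in plain Java of abbreviating a URI with a prefix map. The class PrefixDisplay and the method abbreviate are hypothetical names, not Jena or Titanium API, and the sketch does not check that the resulting local name is legal Turtle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixDisplay {

    // Hypothetical helper: abbreviate a URI as prefix:localName using the
    // longest matching namespace; fall back to <uri> when nothing matches.
    static String abbreviate(String uri, Map<String, String> prefixes) {
        String bestPrefix = null;
        String bestNamespace = "";
        for (Map.Entry<String, String> entry : prefixes.entrySet()) {
            String namespace = entry.getValue();
            if (uri.startsWith(namespace) && namespace.length() > bestNamespace.length()) {
                bestPrefix = entry.getKey();
                bestNamespace = namespace;
            }
        }
        if (bestPrefix == null) {
            return "<" + uri + ">";
        }
        return bestPrefix + ":" + uri.substring(bestNamespace.length());
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("foaf", "http://xmlns.com/foaf/0.1/");
        prefixes.put("skos", "http://www.w3.org/2004/02/skos/core#");
        // prints "foaf:name"
        System.out.println(abbreviate("http://xmlns.com/foaf/0.1/name", prefixes));
    }
}
```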

The prefixes normally come from the files parsed to build the dataset.

The process of going from Titanium to Jena is:

private void read(Document document, StreamRDF output, Context context) throws Exception {
    // JSON-LD to RDF
    RdfDataset dataset = JsonLd.toRdf(document).get();
    extractPrefixes(document, output::prefix);
    JenaTitanium.convert(dataset, output);
}

https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/lang/LangJSONLD11.java

StreamRDF is the abstraction for sending parser output.

  • get the Titanium dataset
  • find prefixes and send to output
  • convert the list of Titanium RdfNQuad to Jena Quad and send to output.

output is typically writing into a Jena DatasetGraph - the storage abstraction.

DatasetGraph has a method prefixes() to return the prefixes carried by the dataset.

For:

{
    "@context": {
        "@version": 1.1,
        "foaf" : "http://xmlns.com/foaf/0.1/",
        "skos" : "http://www.w3.org/2004/02/skos/core#"
    }
}

I was hoping to have RdfDataset provide a map "foaf" -> "http://xmlns.com/foaf/0.1/" , "skos" -> "http://www.w3.org/2004/02/skos/core#".

Conversion between systems:
https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/system/JenaTitanium.java

@filip26 filip26 added the enhancement New feature or request label Oct 28, 2021
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!

@github-actions github-actions bot added the Stale label Nov 28, 2021
@filip26 filip26 added this to the 1.2.0 milestone Dec 12, 2021
@filip26 filip26 removed the Stale label Dec 12, 2021
@filip26 filip26 self-assigned this Dec 12, 2021
@filip26
Owner

filip26 commented Dec 12, 2021

How to deal with conflicting prefixes?

e.g.

{
  "@context": {
    "name": "http://example.com/person#name",
    "details": "http://example.com/person#details"
  },
  "name": "Markus Lanthaler",
  "details": {
    "@context": {
      "name": "http://example.com/organization#name"
    },
    "name": "Graz University of Technology"
  }
}

converted into N-Quads:

_:b0 <http://example.com/person#details> _:b1 .
_:b0 <http://example.com/person#name> "Markus Lanthaler" .
_:b1 <http://example.com/organization#name> "Graz University of Technology" .

Which keys should the prefix map contain?

@afs
Author

afs commented Dec 13, 2021

"name": "http://example.com/person#name" isn't really a prefix - it's a short name for a URI.
Prefixes appear in Turtle as prefix:localName which is more like:
"person": "http://example.com/person#" and then person:name

Those can be nested as well so there is a decision point here.
There isn't a wrong answer.

RDF/XML can have nested xml namespaces declarations (the XML equivalent of prefixes). It is quite unusual to see nested XML namespaces in RDF/XML - I think they would be more common in JSON-LD.

JSON is slightly different to XML because XML is parsed in encounter order and JSON is a map.

Possibility 1: ignore the inner @context and only expose the document-wide declarations.
Possibility 2: slightly more complicated, "put in as nested - outer overrides inner".

It probably makes sense for the outer, document definition to be in the final outcome.
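
The two possibilities can be sketched in plain Java as follows. This assumes each @context scope has already been reduced to its own prefix map and that the scopes are supplied outermost first. The class ScopedPrefixes and its methods are hypothetical names, not Titanium or Jena API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ScopedPrefixes {

    // Possibility 1: expose only the document-wide (outermost) declarations.
    static Map<String, String> topLevelOnly(List<Map<String, String>> scopesOuterFirst) {
        if (scopesOuterFirst.isEmpty()) {
            return new LinkedHashMap<>();
        }
        return new LinkedHashMap<>(scopesOuterFirst.get(0));
    }

    // Possibility 2: include nested declarations, but an outer definition
    // of a prefix always wins over an inner one with the same name.
    static Map<String, String> outerOverridesInner(List<Map<String, String>> scopesOuterFirst) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> scope : scopesOuterFirst) {
            for (Map.Entry<String, String> entry : scope.entrySet()) {
                // putIfAbsent keeps whichever scope defined the prefix first,
                // which is the outermost one given the iteration order.
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> outer = new LinkedHashMap<>();
        outer.put("person", "http://example.com/person#");
        Map<String, String> inner = new LinkedHashMap<>();
        inner.put("person", "http://example.com/other#");
        inner.put("organization", "http://example.com/organization#");
        System.out.println(topLevelOnly(List.of(outer, inner)));
        System.out.println(outerOverridesInner(List.of(outer, inner)));
    }
}
```

Either way the outer, document-level definition of "person" ends up in the final map; the only difference is whether "organization" from the nested scope is exposed at all.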

HTH

@filip26
Owner

filip26 commented Dec 14, 2021

If the given example should produce a prefix map like this one:

{ 
  "person":  "http://example.com/person#", 
  "organization":  "http://example.com/organization#"
}

then we have to develop an algorithm for extracting and naming prefixes from a JSON-LD context. Perhaps we could start with a map of well-known prefixes (foaf, skos, ...).

The other option is to generate a prefix map from N-Quads, using a part of the URL as the prefix name.
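
A rough sketch of that second option in plain Java, under several assumptions: the IRIs have already been collected from the N-Quads, the namespace is everything up to the last "#" or "/", and the prefix name is the last path segment of the namespace. The class DerivedPrefixes, the method fromIris, and the naming rule itself are hypothetical and only one of many possible heuristics.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DerivedPrefixes {

    // Hypothetical heuristic: derive prefix name -> namespace from a list of IRIs.
    static Map<String, String> fromIris(List<String> iris) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (String iri : iris) {
            // Split at the last '#' or '/', whichever comes later.
            int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
            if (cut < 0 || cut == iri.length() - 1) {
                continue; // no local name to split off
            }
            String namespace = iri.substring(0, cut + 1);
            // Prefix name: last path segment of the namespace, delimiter removed.
            String stem = namespace.substring(0, namespace.length() - 1);
            String name = stem.substring(stem.lastIndexOf('/') + 1);
            // Accept only simple names; e.g. a version segment like "0.1" is rejected.
            if (!name.matches("[A-Za-z][A-Za-z0-9_-]*")) {
                continue;
            }
            // First namespace seen for a name wins; later conflicts are dropped.
            prefixes.putIfAbsent(name, namespace);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(fromIris(List.of(
                "http://example.com/person#details",
                "http://example.com/person#name",
                "http://example.com/organization#name")));
    }
}
```

For the three predicate IRIs in the N-Quads example earlier in the thread, this heuristic yields "person" and "organization". It finds nothing for a namespace such as http://xmlns.com/foaf/0.1/, where the last segment is a version number, which is one reason a map of well-known prefixes may still be needed.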

@filip26
Owner

filip26 commented Dec 14, 2021

Just a side note from another point of view: as I understand it, prefixes are about readability. Thus in some cases it would be more beneficial for a consumer to provide its own list of well-known prefixes in order to get easily readable output.

@afs
Author

afs commented Dec 14, 2021

Yes. The user can add them to the Jena graph, for example, or even read a Turtle file which only has prefixes. This happens when loading N-Triples (no prefixes, but common for large database dumps) and the user wants to get some nicer output.

@filip26
Owner

filip26 commented Dec 19, 2021

I'm preparing a low-level JsonLdProcessor API that will allow you to grab a context and/or optimize processing. The target version is 1.3.0.

@filip26 filip26 modified the milestones: 1.2.0, 1.3.0 Dec 19, 2021
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!

@github-actions github-actions bot added the Stale label Jan 19, 2022
@filip26 filip26 modified the milestones: 1.3.0, 1.4.0 Apr 8, 2022
@filip26 filip26 removed this from the 1.4.0 milestone Dec 4, 2023
2 participants