Minor improvements in documentation #5

Open
antolinos opened this issue May 16, 2022 · 5 comments


@antolinos

I am in the process of populating the scoring DB. It is very well documented but I found some minor pitfalls that could be easily improved.
The documentation says that the model has 3 fields:

  1. _id
  2. _group
  3. _fields

I found some issues when I was playing with it:

  1. The use of _. When I tried to POST /items, the documentation said that each item should match the structure defined in the model section. However, I only got it to work after removing the _, that is, using group instead of _group and fields instead of _fields. I saw that id is an alias of _id:
    id: str = Field(alias='_id')

I would propose keeping only id, at least in the documentation, as that seems more coherent to me.

  2. _fields type: it must be a dictionary, even though the documentation says that it can contain just a string:

It can contain a string or a complex nested json object.

The following item did not work for me:

 {"id":186877495,"group":"datasets","fields": "Ford Mustang 1964"}

and raises the following error:

{'detail': "An exception of type ValidationError occurred. Arguments:\n([ErrorWrapper(exc=DictError(), loc=('fields',))], <class 'app.models.items.ItemModel'>)"}

This worked:

{"id":186877495,"group":"datasets","fields":  {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}}
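
The behaviour reported above can be summarised in a small stand-alone check. This is only a sketch of the shape the endpoint appeared to accept, not the service's actual validation code: `check_item` is a hypothetical helper written with the Python standard library, and it does not reproduce the real ItemModel.

```python
from typing import Any, List


def check_item(item: Any) -> List[str]:
    """Return a list of problems; an empty list means the shape looks acceptable.

    Hypothetical helper: it mirrors the behaviour reported in this issue
    (keys without the leading underscore, `fields` must be a dictionary).
    """
    if not isinstance(item, dict):
        return ["item must be a JSON object"]
    problems = []
    for key in ("id", "group", "fields"):
        if key not in item:
            # Point out the underscore-prefixed name if that is what was sent.
            hint = " (found '_%s' instead)" % key if "_" + key in item else ""
            problems.append("missing key '%s'%s" % (key, hint))
    if "fields" in item and not isinstance(item["fields"], dict):
        problems.append(
            "'fields' must be a dictionary, got %s" % type(item["fields"]).__name__
        )
    return problems


rejected = {"id": 186877495, "group": "datasets", "fields": "Ford Mustang 1964"}
accepted = {
    "id": 186877495,
    "group": "datasets",
    "fields": {"brand": "Ford", "model": "Mustang", "year": 1964},
}

print(check_item(rejected))  # one problem: 'fields' is a str
print(check_item(accepted))  # no problems
```

Running such a check before posting gives a readable message instead of the DictError shown above.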
@nitrosx
Collaborator

nitrosx commented May 16, 2022

You are correct regarding the name of the fields. Let me know if you could submit a PR for the documentation.

Regarding the type of fields, you are also correct. My intention was to allow both a simple string and a complex structure, but the model currently requires a structure. In my mind both are valid options, although I have tested only with complex structures.

I will check whether it is feasible to allow fields to be either a simple string (for example, a simple abstract) or a complex structure (like the one that you provided as an example).

@antolinos
Author

antolinos commented May 16, 2022

This was my first and probably naive approach:

{
"id":186877495,
"group":"datasets",
"fields":"cm01  GridSquare_8305671cflat001 Near atomic resolution cryoEM structure of the type 6 secretion system membrane complex mx2005 10.15151/ESRF-ES-186874482 "
}

But I have not yet tried to weight the datasets.

By the way, what are the valid values for group?

@nitrosx
Collaborator

nitrosx commented May 16, 2022

I would structure the item in the following way:

{
  "id": 186877495,
  "group": "datasets",
  "fields": {
    "abstract": "cm01 GridSquare_8305671cflat001 Near atomic resolution cryoEM structure of the type 6 secretion system membrane complex mx2005 10.15151/ESRF-ES-186874482 "
  }
}

group can be any string that you would like to use, as long as it is consistent in terms of upper or lower case.

@antolinos
Author

> I would structure the item in the following way: { "id":186877495, "group":"datasets", "fields":{ "abstract" : "cm01 GridSquare_8305671cflat001 Near atomic resolution cryoEM structure of the type 6 secretion system membrane complex mx2005 10.15151/ESRF-ES-186874482 " } }

But it is not the abstract; it is the concatenation of the proposal name, dataset name, proposal title, visit name and doi. Looking at the weights computation, won't the word abstract bias the results, since it will be a term shared by all datasets?

> group can be any string that you would like to use, as long as it is consistent in terms of upper or lower case.

I did not get that. What does it mean? Should the values be upper or lower case? I thought that group (datasets or documents) was used by the search API to distinguish between proposal and publication:
[image]

If it is not the case, what is group used for?

@nitrosx
Collaborator

nitrosx commented May 16, 2022

Based on your latest post, I would structure your item as follows:

{
  "id": 186877495,
  "group": "datasets",
  "fields": {
    "proposal_name": "cm01",
    "dataset_name": "GridSquare_8305671cflat001",
    "proposal_title": "Near atomic resolution cryoEM structure of the type 6 secretion system membrane complex",
    "visit_name": "mx2005",
    "doi": "10.15151/ESRF-ES-186874482"
  }
}

although doi will not add any value.
The word abstract will add some bias, but if you have enough entries, the bias will be minimal.
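
One way to assemble an item with one named entry per metadata value is sketched below. `build_item` is a hypothetical helper, not part of the scoring service; the field names are the ones suggested above, and doi is left out here because it adds no value for scoring.

```python
import json
from typing import Any, Dict


def build_item(item_id: int, group: str, **fields: Any) -> Dict[str, Any]:
    """Assemble an item whose `fields` entry is a dictionary of named values.

    Hypothetical helper; empty values (None, "") are dropped.
    """
    return {
        "id": item_id,
        "group": group,
        "fields": {name: value for name, value in fields.items() if value},
    }


item = build_item(
    186877495,
    "datasets",
    proposal_name="cm01",
    dataset_name="GridSquare_8305671cflat001",
    proposal_title="Near atomic resolution cryoEM structure of the type 6 "
                   "secretion system membrane complex",
    visit_name="mx2005",
)

# JSON text that could be used as the body of a POST /items request.
payload = json.dumps(item, indent=2)
print(payload)
```

Keeping each value under its own key avoids inventing a catch-all name such as abstract for text that is really a concatenation of several things.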

The group is an arbitrary string that you assign in order to group a number of items together.
Example: if I want to score items derived from datasets separately from items derived from documents, I insert "datasets" in the items belonging to the first group and "documents" in the items belonging to the second one. When I request a score, I can specify that I want scores only for items belonging to the group "datasets", which limits the items I work on and improves performance.
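
As a toy illustration of why the case must be consistent, the sketch below selects items by group from an in-memory list. It assumes an exact, case-sensitive string comparison, which is what the remark about upper and lower case suggests but which is not confirmed here; `items_in_group` is a hypothetical helper, not the service's implementation.

```python
from typing import Any, Dict, Iterable, List


def items_in_group(items: Iterable[Dict[str, Any]], group: str) -> List[Dict[str, Any]]:
    """Keep only the items whose group matches exactly (case-sensitive)."""
    return [item for item in items if item.get("group") == group]


items = [
    {"id": 1, "group": "datasets", "fields": {"title": "a"}},
    {"id": 2, "group": "documents", "fields": {"title": "b"}},
    {"id": 3, "group": "Datasets", "fields": {"title": "c"}},  # different case
]

selected_ids = [item["id"] for item in items_in_group(items, "datasets")]
print(selected_ids)  # [1]: item 3 is skipped because "Datasets" != "datasets"
```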

The group in the scoring is not related to the type shown in the portal. Document type is a field of the panosc document model.
BTW, thank you for bringing this to my attention. I will make sure to update the documentation to make it clear.
