-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Model support for mzXML, sequences, and method files #664
Comments
Please review new model design and my notes ( |
@fkang-pu - I apologize. What I intended for this issue was just to distill the model changes we all agreed on in the meetings. However, my description in the issue was too broad. I just edited it to apply only to the model changes. Could you make a small design (either in a file or by filling out the issue owner section in the issue above) that contains only the elements of the schema that we came to consensus on in the meetings? This would consist solely of changed tables and fields. And, you can reference your larger scope powerpoint files and all our other files in the issue, e.g. in the comment section where it currently says "TBD". Your documents have been very helpful thus far and should lead to multiple small issues, this being one of them. |
Also, slides 22-26 seem to still include elements that we agreed needed to be removed in the last meeting, so when fleshing out the issue owner section, updates from the last meeting should be included. |
Regarding requirements,
|
I had a discussion with John about model changes yesterday afternoon. John prefers to make minimum changes to MSRun (e.g. link method file directly from ArchFile to MSRun). I like his suggestions. After reviewing the requirements proposed by Lance and Rob's design descriptions, I created a new ER diagram (part 7, slide 27) in uploaded file with prefix 20230329. Hopefully it will help us to speed up discussions this afternoon and come up with a final design. |
First of all - excellent! Thanks for the thorough review! I think this demonstrates the value of documenting and reviewing, not only the requirements, but the design as well. Not everyone is at every meeting and me all have our own interpretations.
You are correct. When I sat down to formalize requirements, my touchstone was what Josh was getting at in our meeting where we presented version 2, and when I typed it up:
...it occurred to me that unanalyzed/unpicked peaks could come from not only the current used mzXML files in any of a number of accucor/isocorr files, it could come from mzXML files that were not included ion the run. And that triggered a number of other thoughts which I alluded to in the review of Lance's milestone doc. You can see my comment by looking at the comment history of the doc, but to make searching for it unnecessary, it says:
It's somewhat of an indirect reference to requirement no. 3, but Lance indicated that we shouldn't have that as a requirement (in the implied context of the milestones), but he did seem open to including finer-grained requirements in the review discussion here, which is why I retained this requirement.
I think that excluding this requirement could bite us later when Josh wants to find the mzXML file referred to in the sequence in the method file that was totally unanalyzed. While I'd agree that it may not affect the model implementation we've discussed - it could, depending on how it's implemented in associated code updates that support the new model. For example, if Josh wants to be able to find these excluded mzXML files, but our implementation requires (to load a sample) that it has an MSRun child and that requires a scan group and a peakgroup to exist, it would mean a refactor. That refactor would be unnecessary if we account for this possibility. What made me think of this was the cold exposure validation. Michael had accidentally included 4 accucor files that he had intended to exclude (so they weren't in the sample sheet). It immediately reminded me, when I wrote these requirements of Josh's guiding feedback that he wanted to be able to obtain unanalyzed data. The mzXML files that went into those accucor files represented (by definition) unanalyzed data. They would show up in the sequence, which they would be able to find by downloading the method file, and go to find the mzXML file in tracebase, and if we don't include them, they won't be there. So including this requirement would prevent us from making a mistake that might make including them more difficult than it would need to be if we'd kept it in mind during this effort.
That is my suggestion, yes, in order to satisfy the ultimate goal that started this refactor: the ability to retrieve unanalyzed data from mzXML files. This requirement seems right in line with that goal.
I would argue that it is in lock step with the goal. Here's my argument:
We could rename mzXML files in TraceBase, but I would say that doing so is unnecessary (c.i.p.: the sha) and disruptive (breaks external data associations, c.i.p. lab notebooks and the originating file system). In fact, I think we have all sorts of integrity issues in TraceBase right now with duplicate samples with different names. In fact, I think we should not require sample names to be unique and instead create sample accession numbers using a study prefix and a number, but that is definitely out of scope here. My point is, we shouldn't be uniquely identifying mzXML files by their non-unique names. |
Closing this issue, and replacing with the mzXML Data Storage milestone. |
FEATURE REQUEST
Inspiration
In our TraceBase presentation to the Rabinowitz lab, Josh said he would like to be able to get from peak data/groups to the original sequence, and expressed a concern that it seems limiting (to accomplish that) since we don't store mzXML files (or their data).
Description
Add models and model fields to support mzXML files and their sequence (from a "method" file).
Alternatives
There has been lots of meeting discussions on this, which many proposals and docs. (Please link all relevant documentation in the comments below, if any are missing).
Dependencies
This issue cannot be started until the completion of the following issue(s):
Comment
TBD
ISSUE OWNER SECTION
Assumptions
None
Requirements
(Guidelines/lesser requirements)...
Limitations
None
Affected Components
A tentative list of anticipated repository items that will be changed, labeled
with "add", "delete", or "change". One item per line. (Mostly, this will be a
list of files.)
Primary Changes
Changes prompted by the primary changes
DESIGN
Interface Change description
No outward changes to the current interface. All of the views and loading interfaces should be the same.
Code Change Description
MSSequence
MSRun change (remove all existing and add...)
MSScanGroup
PeakGroup change (keep fields not mentioned...)
ms_run_id...
I didn't add the file and submission tables/fields - add as you see fit
Make changes to all supporting code to retain current functionality. E.g.:
Tests
The text was updated successfully, but these errors were encountered: