workorder-detail-fix: Enforce order item details for buggy job types #929

flashy-man · 2024-01-06T14:59:55Z

The fix: An 'eventful' handler that checks each job after it's dispatched to a workshop, correcting the details of any job items that have been altered from its parent order's.

The bug: For certain work orders that have been changed via gui/job-details (or other means), jobs created from the order will revert the changes. This is because a job_item's type doesn't exactly match what is expected for the job. Particularly, the "any" type will pedantically only match other "any" types, so for jobs like PrepareMeal, a "plant" or "liquid" type job_item will trigger a revert when it's dispatched to the workshop. ("plant" =/= "any", technically)

With that fixed, it's easier to order lavish meals of meticulous selection, or silk images sewn specifically on hemp ropes.

One possible issue: Eventful frequency is set to max (every tick) for job initiated event, which will affect every other script using it.

Before, I had it set up as a 100-tick periodic script, but there was some hassle with active jobs. Eventful's JOB_INITIATED check needs to run frequently enough to ensure that the job is caught before anyone starts gathering items. The items vector can be cleared if caught in time, just not sure if it's wise. I had to go in and manually un-task the cleared job items, and who knows what else could go wrong. Doing it just every 10 ticks might be fine as long as fastdwarf isnt on.

workorder-detail-fix.lua

TymurGubayev · 2024-01-06T20:28:56Z

Btw, the pre-commit checks are failing because of trailing whitespace, mixed line ending, and something about the end of file.

myk002

I haven't looked at the code yet, but here are some higher-order notes:

script should be in the fix/ subdir: e.g. fix/workorder-details
If this is useful to run for all players all the time, it should be added as an entry in the control panel registry and enabled by default
you'll need to implement the enabled API

…h enable api

myk002 · 2024-01-07T00:58:38Z

Ok, I finally got a chance to look at this properly.

So, for my context, is this a problem that only comes up when using gui/job-details, or can this happen with purely vanilla UI actions? Did you get gui/job-details from #799 ?

Is the code in this PR required for gui/job-details to function correctly? If so, maybe this should be part of that tool and be automatically engaged instead of being a separate "fix' command.

@TymurGubayev what do you think about this?

flashy-man · 2024-01-07T05:24:33Z

Yes the gui/job-details from #799 is the only tool I know to directly edit the order details like this, and this fix is required for the full expected functionality there. Shouldn't affect vanilla users at all.

However gui/job-details might not be the only tool benefiting in the long run. In theory it should also affect work orders imported via the orders plugin, though my testing says that orders currently destroys item details on export. workorder does the same.

As things are now it's only benefiting gui/job-details, but I can see that changing if any tool wants to get a hair more specific with work orders.

TymurGubayev · 2024-01-07T09:49:40Z

The workorder script behaving similarly to the orders plugin is kind of as expected because it's basically just a part of the plugin translated into Lua, plus minor adjustments about exactly how the imported manager orders are handled.

It'd be good to adjust both of them before the job-details PR is merged, otherwise, players are in for nasty surprises.

As for the strategy about when should this fix run: since there are 3 different sources of possible problems (orders plugin, workorder, and job-details scripts), we need to either always have it on; or periodically check the manager orders for problematic things and then turn the main fix on (is there maybe a hook we can use, f.e. when new manager order is created or the manager is about to perform their job?); or we adjust the 3 sources to make them enable this fix in some way when necessary (maybe set and store a flag in save data).

flashy-man · 2024-01-07T10:06:34Z

periodically check the manager orders for problematic things and then turn the main fix on

I like this one. Then it's not running a bunch of handlers when it doesn't need to, and no other scripts need to know about it*. In lieu of some appropriate hook, I could change the PR to be a periodic script like:

Load save, no handlers on yet
Check periodically for a funky order in orders list
Fix handler is enabled for the rest of the session if so (we don't need to check for funky orders anymore)

And then make that make sense with enabled API

*It's kind of a tough call on whether even the order checker should be on by default. Realistically only those who tamper with work orders will need to worry about this bug. Since only a few scripts mess with work orders right now, maybe it's best to let those scripts turn it on.

I'd maybe want to check on init when the fort is loaded. That way it could catch orders that have been modified previously without having to rely on save data. I'm fairly sure I can avoid any false positives and activating unnecessarily. But either way

flashy-man · 2024-01-09T13:42:42Z

As outlined, now it checks the order list first before arming the handler. Still to do:

Do some structured testing, auto-generate lots of orders, run checker every tick, etc, to weed out any missed edge cases or errors.
Think about checking the orders list once on fort init, even if we decide the fix shouldn't be enabled by default. It should be trivial (just add it as a recommended/default init somewhere in dfhack config.) It's just a matter of whether you guys think it's warranted. I've been annoyed by this bug for a while, so of course I think it is.
Public "check order list once" module function so other tools can check explicitly after editing/importing orders.
This could be a full daemon that auto-disarms the handlers when not needed. But with no way (yet) to undo the side-effects to the eventful handler frequency, it wouldn't make me feel much better.

Some semi-relevant ramblings i took out of my comments: (click me)

Only SewImage orders have the item-to-improve in the first slot of their items list. SewImage also isn't affected by Bug#10092 Whereas all other improvement jobs are. Coincidence? Perhaps not.

I looked deep into that particular bug and got stuck looking for the source of "some stack variable" that appeared to come in way before the manager order checker process touched it. Would make a lot of sense if it's related to the index of the improved item. The other thing is that the bugged improvement jobs' `item_type` gets mangled to a large 2-byte number, and neither of the bytes are familiar to me.

One thing that worries me about these bugs is, these checks could just be redundant and safe, or they could be guarding an actual issue with how the item details get filled. Limiting the scope of my fixes to just a few job types makes me feel better about it.

I do have a half-fix for 10092, however it isn't able to complete periodic orders yet. If there's not an "issue" for it yet I may raise one (along with others that came up in this PR)

ab9rf · 2024-01-09T18:28:50Z

i'm not thrilled with the idea of a script traversing the job list with each tick, might want to do some load testing to see what the performance impact is

myk002 · 2024-01-09T19:00:11Z

i'm not thrilled with the idea of a script traversing the job list with each tick, might want to do some load testing to see what the performance impact is

at least the iteration is done in cpp and not here. On the other hand, this will be a 5x increase over prioritize, which sets the period to 5 ticks for JOB_INITIATED.

I want to think about this a little more to see if there is a way to solve the problems in the source tools instead of as a periodic fixup. Alternately, is there behavior that could be merged into vanilla when creating the jobs? If we can identify legitimately "wrong" behavior, Putnam might make a vanilla fix.

TymurGubayev · 2024-01-09T20:33:02Z

If we can identify legitimately "wrong" behavior, Putnam might make a vanilla fix.

well, some job types are spawned with adjusted details, while others aren't, so IMHO it is a bug, even though it's not possible in vanilla.

<deleted, I'm stupid>

TymurGubayev · 2024-01-09T20:49:45Z

I think there are 3 possibilities

this gets recognized as a bug and is fixed in vanilla;
this fix is run every tick - since it only affects new jobs, maybe it's not that bad?
this fix is run less frequently, in which case it needs to deal with started item gathering, which gets us back to

The items vector can be cleared if caught in time, just not sure if it's wise. I had to go in and manually un-task the cleared job items, and who knows what else could go wrong.

I think the manual part here can be automated, and then maybe it'll just work?

Another possibility to solve the cleaning up is to cancel all related jobs, including the main one, and create a new one instead with the fix already applied.

flashy-man · 2024-01-09T23:33:35Z

I saw the word 'hook' near onJobInitiated event in eventful.cpp, so i hoped that it traverses a list of events collected via hooks, rather than traversing the whole job list. The only performance check i did was handler CPU time, which i can scarcely measure. (Better performance testing is on my list, also trying to understand eventful.cpp)

Also a period of 5-10 ticks could be just fine for this fix to be ~99% effective. I'm probably thinking too much of the odd dwarf who happens to be near the workshop at just the wrong time. He'd have to pick up the new job, start working on it, and get to the stage where the actual items are designated from job.job_items, all within X ticks. Worst case scenario here is that the wrong items are gathered, because someone probably had fastdwarf on, and the script has to have some spaghetti code to fix it. (or just leave it alone. We tried eh?)

Re: Editing the affected source tools instead. I've looked hard for ways to beat this bug at the source on order creation. I couldn't find any way to make the item types pass properly from the order, other than the binpatching I was originally doing. There's an instruction in the order-job-dispatch process that really does not want to let any item type pass other than -1 for the 'bugged' jobs. If it would consider job_item.vector_id like dfhack.job.isSuitableItem does, and also consider that the "any" type should match any type, it would probably be fine.

As far as asking for a vanilla fix: @TymurGubayev said it, it's basically almost a vanilla bug. Maybe if we ask really nicely. We just want to make our nobles happy right?

so, TODO: Test for handler triggering too late at 5-10 ticks. Get a nice ~~lua~~ performance test. (Maybe) workshop persuasive rant about how players need highly specific work orders

ab9rf · 2024-01-10T07:34:20Z

I saw the word 'hook' near onJobInitiated event in eventful.cpp, so i hoped that it traverses a list of events collected via hooks, rather than traversing the whole job list.

unfortunately the only way we have to detect new jobs is... to traverse the job list and look for any jobs that weren't on it the last time we traversed it. (see https://github.com/DFHack/dfhack/blob/180fd34a96e7ef2bbdb77c2a172ba43ade182464/library/modules/EventManager.cpp#L435C1-L463) so subscribing the "job initialized" hook forces the core event system to do a job list traversal at the frequency requested.

flashy-man · 2024-01-10T15:17:26Z

Hmmm. Triggering on every job init isn't ideal for this anyway. It'd be nice if there was an event hook on the order-job-dispatch process, to only catch those jobs created by the manager. But let's not go there ...

I'm going to look into this idea from @TymurGubayev:

(is there maybe a hook we can use, f.e. when new manager order is created or the manager is about to perform their job?)

Only thing: The job-order-dispatch process has nothing to do with manager's activities. It seems to just happen every 1650 ticks exactly ^{(i totally knew this and i didnt write it down :')*}

df.global.cur_year_tick / dfhack.getTickCount()
order dispatch tick: 106530, 621526015
order dispatch tick: 108180, 621542515 +(1650, 16500)
order dispatch tick: 109830, 621559015 +(1650, 16500)
order dispatch tick: 111480, 621575515 +(1650, 16500)

Instead of worrying about events on every job, it should be fine to scan the job list then (even in Lua, every 1650 ticks is not bad right?). I think the schedule formula is (cur_year_tick + 720) % 1650 == 0, but i'll have to test it on a longer game/more saves

~~If we go this route,~~ workorder-job-repeater ~~may have to be changed (since i think the point is to bypass this dispatch event)~~ nvm it just copies the old job... but maybe it shouldnt in case the order has been changed? graagh

Maybe we should ask/see if there are any planned vanilla updates to this system any time soon

flashy-man · 2024-01-13T23:23:28Z

Got the manager schedule (thanks to whoever found manager_timer).
It was really A) manager_timer == 0 and tick % 150 == 30. Timer decrements on the latter condition, starting at 10 until it reaches 0 (then it dispatches the jobs, and resets.) Ends up so it goes 11 times between dispatches. (11 * 150 = 1650)

Now the fix code is different. Still needs more testing, but I went through some different saves to make sure i could sync with all the manager schedules. (it even works on 47.05). Here is the flow:

Enable: determine next tick of manager job dispatch & schedule the handler for that tick
```
next_dispatch = tick + (-(tick - 30) % 150) + manager_timer * 150
```
Handler: runs every 150 ticks. Only on conditions A)***, B) manager exists and C) bugged orders exist, will it check the job list for new jobs, and correct them on the same tick they were dispatched. So order-traversing once every 1650 ticks, and only job-traversing then if we find a bugged order.

*** A) is at the top. Also, the script runs after the manager process, so the condition is slightly different. Not sure about sub-tick timing yet but it seems Lua scripts always run after game code

Very small fix for the active job issue. It still happens on that first run.
(i didnt test this at all aside from having done it several times in gm-editor)

-- experimental: remove the items and un-task them
--while job.items[0] do -- bad idea
while #job.items ~= 0 do
    job.items[0].item.flags.in_job = false
    job.items:erase(0)
end

Still a periodic script technically, but it's better. Since the main concern is doing less to fix it, i'll make an overdue write-up of the bug with some snapshots of the code. Could be something I missed, but I couldn't find any way to bypass that filtering process in the job dispatch.

binpatching was best for my personal use, but with all these OS and steam versions and blah blah, probably not the way to go.

(commits incoming)

myk002 · 2024-01-17T09:04:28Z

-- experimental: remove the items and un-task them
--while job.items[0] do -- bad idea
while #job.items ~= 0 do
    job.items[0].item.flags.in_job = false
    job.items:erase(0)
end

In cpp we have Job::disconnectJobItem(), which is more thorough than this code (e.g. it removes specific refs on the item). it's not exposed to Lua yet, but it could be exposed and used by this script, if needed.

internal/control-panel/registry.lua

docs/fix/workorder-detail-fix.rst

fix/workorder-detail-fix.lua

flashy-man · 2024-01-18T12:36:50Z

Woohoo review-- Main things:

New name: fix/workorder-details
Now re-syncing when dispatch tick is lost (only happens using timestream that I have seen)
No more trying to fix already-gathered items. It was crashing either due to the vector checking while loop, or not clearing item refs.

add workorder-detail-fix

c1960e0

TymurGubayev reviewed Jan 6, 2024

View reviewed changes

workorder-detail-fix.lua Outdated Show resolved Hide resolved

flashy-man added 5 commits January 6, 2024 08:31

better way to get the job type enums

0ac190c

print valid commands when no arg is passed

152fe5b

print valid cmds when no arg is passed

4e42bd8

fix bad commit

323121d

delete old job name -> id function

08a56b0

flashy-man added 2 commits January 6, 2024 12:48

workorder-detail-fix: use local variables

b4b2518

fix trailing whitespace; doc underline

9635fab

myk002 reviewed Jan 6, 2024

View reviewed changes

flashy-man added 7 commits January 6, 2024 13:20

move workorder-detail-fix to fix directory, add enable API

5c643b7

no arg parsing when being used as a module (for enable api)

69a5e1f

actually set enable state

e52adbe

trailing whitespace again

b30668f

hopefully fix line endings (sublime text calls them 'unix' endings)

fae0027

hopefully fix EOF error (just needed newline at end of file)

45f87de

only print when called from command line. also fix to work better wit…

47761b6

…h enable api

move doc to docs/fix

e1f170e

flashy-man added 4 commits January 7, 2024 02:37

workorder-detail-fix -> fix/workorder-detail-fix

ea76f59

add detail_fix_is_needed() to say when the fix is required

727200b

rework: only enable handler when detail_fix_is_needed

39a4db4

better status() function to account for more statuses

a4ce456

woowee doc underline

8ad4f77

flashy-man added 6 commits January 14, 2024 09:44

rename workorder-detail-fix in changelog

a5c2cc9

no more eventful; only repeatutil

4e950dd

merged changelog

a39ec2e

fix redundancy, confusing modulo math

d6b0775

line endings

f6275f0

copy flags whole instead of one-by-one

384625d

myk002 reviewed Jan 17, 2024

View reviewed changes

flashy-man added 11 commits January 17, 2024 18:06

alphebetize entry in control panel registry

3cff781

update docs: new name/usage

b0146e7

fix problems with status()/jobs_corrected

f504edc

fix redundant repeatutil usage

027f410

print jobs_corrected only when enabled

b7c0ce8

remove active job fix (unstable)

4ca5a42

use ipairs, utils.invert

f961c8f

better way to get manager; move local declarations

fa308ab

automatic re-sync if lost

e584bba

documented active job limitation

fcc8546

delimit block so goto can jump over local scope

fea88ca

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

workorder-detail-fix: Enforce order item details for buggy job types #929

workorder-detail-fix: Enforce order item details for buggy job types #929

flashy-man commented Jan 6, 2024

TymurGubayev commented Jan 6, 2024

myk002 left a comment

myk002 commented Jan 7, 2024

flashy-man commented Jan 7, 2024 •

edited

TymurGubayev commented Jan 7, 2024

flashy-man commented Jan 7, 2024 •

edited

flashy-man commented Jan 9, 2024 •

edited

ab9rf commented Jan 9, 2024

myk002 commented Jan 9, 2024

TymurGubayev commented Jan 9, 2024 •

edited

TymurGubayev commented Jan 9, 2024

flashy-man commented Jan 9, 2024 •

edited

ab9rf commented Jan 10, 2024

flashy-man commented Jan 10, 2024 •

edited

flashy-man commented Jan 13, 2024 •

edited

myk002 commented Jan 17, 2024

flashy-man commented Jan 18, 2024

workorder-detail-fix: Enforce order item details for buggy job types #929

Are you sure you want to change the base?

workorder-detail-fix: Enforce order item details for buggy job types #929

Conversation

flashy-man commented Jan 6, 2024

TymurGubayev commented Jan 6, 2024

myk002 left a comment

Choose a reason for hiding this comment

myk002 commented Jan 7, 2024

flashy-man commented Jan 7, 2024 • edited

TymurGubayev commented Jan 7, 2024

flashy-man commented Jan 7, 2024 • edited

flashy-man commented Jan 9, 2024 • edited

ab9rf commented Jan 9, 2024

myk002 commented Jan 9, 2024

TymurGubayev commented Jan 9, 2024 • edited

TymurGubayev commented Jan 9, 2024

flashy-man commented Jan 9, 2024 • edited

ab9rf commented Jan 10, 2024

flashy-man commented Jan 10, 2024 • edited

flashy-man commented Jan 13, 2024 • edited

myk002 commented Jan 17, 2024

flashy-man commented Jan 18, 2024

flashy-man commented Jan 7, 2024 •

edited

flashy-man commented Jan 7, 2024 •

edited

flashy-man commented Jan 9, 2024 •

edited

TymurGubayev commented Jan 9, 2024 •

edited

flashy-man commented Jan 9, 2024 •

edited

flashy-man commented Jan 10, 2024 •

edited

flashy-man commented Jan 13, 2024 •

edited