
AI-generated written text content #800 (Draft)

gistya wants to merge 12 commits into master
Conversation

@gistya gistya commented Aug 13, 2023

Description

This PR adds a gpt module with an overlay and a resizable window, generating AI-written text for in-game written works based on their in-game descriptions. gpt.lua sends those descriptions to a python helper script that runs in the background, gptserver.py, which in turn handles secure communication with OpenAI's servers.
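For orientation, the round trip looks roughly like this from the Lua side (a sketch only, assuming DFHack's luasocket plugin; the wire format and field names here are illustrative, not the PR's actual protocol):

local json = require('json')
local socket = require('plugins.luasocket')

-- sketch: send one JSON line to the helper, read one JSON line back
local function request_written_content(description)
    local client = socket.tcp:connect('127.0.0.1', 5001)
    client:setTimeout(30, 0)  -- seconds, milliseconds
    client:send(json.encode({description=description}) .. '\n')
    local response = client:receive()  -- luasocket defaults to reading one line
    client:close()
    return json.decode(response).text
end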

Security

While the gpt.lua dfhack script communicates with the helper script over an unencrypted localhost TCP connection, no sensitive information is ever passed between dfhack and the python helper, which ignores any connections from hosts other than localhost (127.0.0.1).

All communication between gptserver.py and OpenAI's cloud is performed via OpenAI's official python module, which handles all the nitty-gritty of the requests and employs up-to-date HTTPS security.

Your personal OpenAI key is stored on disk in a file named oaak.txt. The file is not encrypted, but it is only read by the python script (not by DFHack). Apply common-sense permissions to this file, as you would to an SSH private key.

OpenAI also lets you set a hard limit on monthly costs, so that if this script gets stuck in an infinite loop or your API key somehow leaks, you won't be on the hook for thousands of dollars. You can also regenerate your API key at any time on OpenAI's website, which is worth doing fairly frequently regardless of what other security you're using.

Should these measures be seen as insufficient, we could investigate storing the API key in the platform-specific system keychain, though this might make the python script less portable across platforms. Please comment below with your thoughts on this.

Costs

Use of this script requires an OpenAI API account with a positive balance. Signing up gets you $5 of free API credit that expires in 90 days, which should be good for 50 to 500 requests depending on the model used. They charge by the token (approximately 3/4 of a word), and the cost varies widely depending on which model is used.

The default model selection is the fast, cheap one. You may elect to use GPT-3.5 or GPT-4 by launching gptserver.py with the command-line option -gpt3 or -gpt4, but be ready for slower response times and higher per-token fees. Overall, a typical DF player should not expect costs to exceed $5-10/mo. even with heavy use, and as we move into the future these kinds of services will likely gain a free tier for non-commercial light use (similar to the public ChatGPT service).

Remaining work to be done:

Tests

At this point the PR contains no automated tests, which is why it's being posted as a draft PR.

I would like to ask the preliminary reviewers for advice on testing strategy as I have not yet looked into how testing of these scripts is done.

Integration testing

My initial thought is to use the test harness (?) to boot gptserver.py in a "mock server" mode where it gives canned responses to canned requests. We would want to add a Pipfile.lock (if there isn't one already) to ensure the sands of python dependencies don't shift out from under the test runners. (It might also come in handy for end users, so we don't have to list the dependencies in the docs; that's not very maintainable and I felt gross doing it.)

This setup would allow us to run tests that validate that the integration between the lua and python scripts hasn't been broken and that error states in the python script are handled properly (right now the UI doesn't reflect a death of the python script appropriately, which is a TODO before this merges).

End-to-end testing

If we know the knowledge descriptions of some Urist McTestbuddy in the test world, and we can macro the right screens being opened, then we could set up something like an end-to-end test with the only mocked component being OpenAI's service.

Unit testing

Since the script sanitizes response strings to strip characters that DF can't render properly, and uses heuristics to determine which type of written content a given description refers to, it would be good to have some unit tests around those functions. I've tried to make them as functional as possible to facilitate testing.
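For example, tests for those functions might look something like this (a sketch following the style of the repo's existing test files; sanitize_text and content_type_for are hypothetical names for the helpers described above):

-- test/gpt.lua (sketch only)
config.target = 'gpt'

local gpt = reqscript('gpt')

function test.sanitize_strips_unrenderable_chars()
    -- assumes curly quotes get straightened, since DF can't render them
    expect.eq('"a quote"', gpt.sanitize_text('\u{201C}a quote\u{201D}'))
end

function test.classifies_poem_descriptions()
    expect.eq('poem', gpt.content_type_for('This is a poem concerning dwarves.'))
end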

Since the script is stateful in its present form, I would also like to add tests that validate the state transitions are handled properly and robustly: there should be no combination of states from which the script cannot recover gracefully. Ideally it could be made completely stateless except for a single state store, but I haven't gotten that far yet. (Redux FTW)
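The shape I have in mind is something like this (just a sketch of the idea, not the current code; the action names are made up, though the Status values match the script's):

-- one store plus an explicit transition table, so an unexpected action in
-- a given state is ignored rather than wedging the UI
local store = {status=Status.start, text=''}

local transitions = {
    [Status.start]     = {submit=Status.waiting},
    [Status.waiting]   = {first_chunk=Status.receiving, fail=Status.done},
    [Status.receiving] = {chunk=Status.receiving, finish=Status.done},
    [Status.done]      = {reset=Status.start},
}

local function dispatch(action)
    local next_status = transitions[store.status][action.type]
    if not next_status then return end
    store.status = next_status
    if action.text then store.text = store.text .. action.text end
end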

Future directions

  • addition of a configuration UI panel within DFHack (or additional command-line options for the python script) for setting the various API parameters used in OpenAI requests
  • if/when dfhack integrates luasocket, then avoid using python altogether

I'm a bit leery of exposing implementation details specific to OpenAI within the DFHack UI, because I'm guessing a decent number of folks will want to rewrite the python script to talk to a different LLM, which would likely have its own, totally different API. Having to then also update a bunch of UI code could be annoying.


gistya commented Aug 13, 2023

pre-commit.ci autofix

docs/gpt.rst Outdated
@@ -0,0 +1,67 @@
gpt
=======
Member

Suggested change
=======
===

docs/gpt.rst Outdated
=======

.. dfhack-tool::
:summary: AI-generated written content!
Member

I can understand the exclamation point, but summaries have to end in a period to match the parser : P

How about "Generate written content with AI."

Author

OK will fix thanks.

docs/gpt.rst Outdated (resolved)
docs/gpt.rst Outdated
Comment on lines 13 to 19
``enable gpt``
=======
Enables the plugin. The overlay will be shown when a knowledge item or unit view sheet is open.

``disable gpt``
=======
Disables the plugin.
Member

this actually isn't necessary. players can turn the overlay on and off in gui/control-panel.

docs/gpt.rst Outdated (resolved)
docs/gpt.rst Outdated

Setup:

1. Register for an OpenAI API account. It must be a paid or active trial account.
Member

a url would be useful here

Author

fixed

docs/gpt.rst Outdated

1. Register for an OpenAI API account. It must be a paid or active trial account.
2. Generate an API token for your account.
3. Save your OpenAI API token to a file at the root of your DF directory, `oaak.txt`.
Member

we try not to put any user data in the DF root directory. I suggest putting it in dfhack-config/.

Author

OK, makes sense

docs/gpt.rst Outdated
3. Save your OpenAI API token to a file at the root of your DF directory, `oaak.txt`.
4. Install python. We used version 3.11 installed from the Microsoft Store.
5. Install python dependencies Flask and OpenAI: `pip install Flask` and `pip install OpenAI`.
6. Start the local helper python app: cd into dfhack/scripts directory & run `python gptserver.py`.
Member

I see two options here. If we deploy this with DFHack, then this python script should probably go in the DFHack/dfhack repo and get installed somewhere more appropriate (like maybe the DF root dir)

if this script gets distributed as a mod (see https://docs.dfhack.org/en/stable/docs/guides/modding-guide.html#the-structure-of-a-mod), then the python code can get included in the mod files

5. Install python dependencies Flask and OpenAI: `pip install Flask` and `pip install OpenAI`.
6. Start the local helper python app: cd into dfhack/scripts directory & run `python gptserver.py`.

Once the python helper is running, you may now enable and use the gpt plugin.
Member

I'd suggest having the overlay display a message like "cannot contact server -- do you have the Python helper running?" instead of requiring the player to remember to start the server first

@gistya gistya Aug 14, 2023

The overlay doesn't try to contact the server. It waits until you press the button and then, when there is something to submit, then it tries submitting. At that point, if it can't reach the server, the UI displays a message that says what you suggest (more or less).

The UI becomes fairly unresponsive though, since if the server can't be reached, the existing luasocket library throws a C++ exception, and Lua's lack of modern error handling means we have to resort to pcall, which seems to carry a nasty performance penalty, perhaps exacerbated by how often updateLayout is being called. Still looking into this one, but it seems to have a workaround at least, and isn't expected to happen very often.

Member

pcall shouldn't really be introducing much of a penalty, although errors themselves might... how often are you calling it?
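One way to keep both pcall and the thrown C++ exception off the hot updateLayout path would be to memoize the failure and back off for a while (a sketch; send_request is a stand-in name for the script's actual socket call):

-- remember that the helper was unreachable and skip further attempts for a
-- few seconds, so the error path isn't hit on every layout pass
local server_down_until_ms = 0

local function try_send(payload)
    if dfhack.getTickCount() < server_down_until_ms then
        return nil, 'cannot contact server -- is the python helper running?'
    end
    local ok, result = pcall(send_request, payload)
    if ok then return result end
    server_down_until_ms = dfhack.getTickCount() + 5000  -- retry after 5s
    return nil, 'cannot contact server -- is the python helper running?'
end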

docs/gpt.rst Outdated
Comment on lines 35 to 37
Tweaking additional OpenAI API parameters will require modifying `gptserver.py` to suit
your particular desires, until such time as someone may have added additional
configuration options in a future update to DFHack :D
Member

it would be fairly trivial to have a configuration screen that can open from the overlay and send options data to the python server

gpt.lua Outdated
--@ module = true

local json = require('json')
local dfhack = require('dfhack')
Member

this should not be necessary -- dfhack is already in the default environment

Comment on lines +23 to +28
local function string_from_Status(status)
if status == Status.start then return "start" end
if status == Status.waiting then return "waiting" end
if status == Status.receiving then return "receiving" end
if status == Status.done then return "done" end
end
Member

consider utils.invert(). e.g.

local utils = require('utils')

local Status = {
    ...
}
local Status_rev = utils.invert(Status)

local function string_from_Status(status)
    return Status_rev[status]
end

@gistya gistya Aug 14, 2023

OK will update it tomorrow. BTW this is one thing that Swift really got perfect, how they handle enums with associated values is pure genius. I've been experimenting with Swift for Windows. They recently added some very powerful C++ interoperability features where you can basically just start writing Swift and it compiles and runs as if you wrote C++, or something. Curious.

--

-- Whether or not to print debug output to the console.
local is_debug_output_enabled = false
Member

this comes up so often... we really need a centralized debugfilter-integrated Lua logging system. I'm not saying you have to write it. I'm just kvetching.

Author

amen 🗡️

local is_debug_output_enabled = false

-- Port on which to communicate with the python helper.
local port = 5001
Member

should probably get all this from the config file -- dfhack-config/gpt.json -- so players have a way to make changes (preferably through an in-game GUI)
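A sketch of that approach, assuming the json module's json.open wrapper (the key names here are made up for illustration):

local json = require('json')

-- json.open returns a wrapper whose .data table persists via :write()
local config = json.open('dfhack-config/gpt.json')
local port = config.data.port or 5001

local function set_port(new_port)
    port = new_port
    config.data.port = new_port
    config:write()
end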

Author

This is a constant (not meant to be changed).

local timeout = client_timeout_secs + client_timeout_msecs/1000

-- Number of onRenderFrame events to wait before polling again.
local polling_interval = 10
Member

the time between graphical frames can vary significantly. polling_interval_ms would be better; we have a timer API to get ms -- example: https://github.com/DFHack/dfhack/blob/develop/plugins/lua/overlay.lua#L432-L444

Author

OK will look into it thanks
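For reference, the ms-based version could look something like this (a sketch modeled on the linked overlay.lua pattern; poll_server is a stand-in for the script's polling routine):

local POLL_INTERVAL_MS = 100  -- illustrative value
local next_poll_ms = 0

function GPTBannerOverlay:onRenderFrame(dc, rect)
    local now_ms = dfhack.getTickCount()
    if now_ms >= next_poll_ms then
        next_poll_ms = now_ms + POLL_INTERVAL_MS
        poll_server()
    end
    GPTBannerOverlay.super.onRenderFrame(self, dc, rect)
end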

prompt_value = 'Return a response stating simply, "There has been an error."'
else
debug_log('Creating prompt for non-poem/non-star-chart/non-unsupported content_type: ' .. content_type)
prompt_value = 'In between the four carrots is a description of a written ' .. content_type .. ': ^^^^' .. knowledge_description .. '^^^^. \n\n' .. excerpts_prompt
Member

carrots -> carets

Comment on lines +466 to +470
if keys._MOUSE_R_DOWN or keys.LEAVESCREEN then
if view then
view:dismiss()
end
end
Member

do we actually need to close the window if the knowledge description screen is closed? why not just leave it open so the player can appreciate it? they can always close it with a right click on the window at any time.

Author

Thing is, right click usually doesn't close the knowledge description window; it just backs you out to the list of known written content items, at which point we want the window closed so the user can open another knowledge item without it being generated as soon as it's opened (which is what would happen if the window had remained open). If someone is trying to economize on which things they generate, that would be annoying.

The good news is that if you click back into the same knowledge item you just left, the text is all still there, and you can always alt-tab to the python cmd console to see the output there too, in case you forgot to screenshot a good one.

I would suggest running the plugin and playing around with it; if you then think I've misread the situation, please let me know and I'll make the adjustment, or maybe we can think of a third solution that's better than either of these options.

GPTBannerOverlay.ATTRS{
default_pos={x=-35,y=-2},
default_enabled=true,
viewscreens={'dwarfmode/ViewSheets/UNIT','dwarfmode/ViewSheets/ITEM'},
Member

this seems too broad -- do we need more fine-grained focus strings here? it's not hard to add more context in modules/Gui.cpp

Author

Just using what was available. If you give me one for the UNIT/Skills/Knowledge, I'll gladly use it.

@myk002 myk002 Aug 14, 2023

yeah, let me see if I can give you something better here.

ref: DFHack/dfhack#3675

Member

here we go:
[screenshot]
and with a knowledge item selected:
[screenshot]

end
end

self.subviews.label:updateLayout()
Member

this can get expensive. better to only update the layout when the text actually changes

Author

Maybe this is what's making the pcall() lock up the UI updating? I'll play around with it tomorrow.
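The gating myk002 describes could be as simple as this (a sketch; set_label_text is a hypothetical helper, not the PR's actual code):

-- skip the expensive relayout when the text hasn't changed
local last_label_text

local function set_label_text(self, text)
    if text == last_label_text then return end
    last_label_text = text
    self.subviews.label:setText(text)
    self.subviews.label:updateLayout()
end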

if status == Status.done then return "done" end
end

local Content_Type = {
Member

this feels... odd, but I need to read through the code a few more times before I feel comfortable suggesting changes


myk002 commented Aug 14, 2023

would it make sense to make the generated text persistent per world? then the same knowledge item would always get the same generated text per world (and it would cut down on API costs if the same item were viewed multiple times)


myk002 commented Aug 14, 2023

> I would like to ask the preliminary reviewers for advice on testing strategy as I have not yet looked into how testing of these scripts is done.

we have facilities for mocking out pretty much anything. You can even mock out part of the DF game state so the script can read test data from the df object (see https://github.com/DFHack/scripts/blob/master/test/deteriorate.lua)

another good unit test example: https://github.com/DFHack/scripts/blob/master/test/prioritize.lua
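Using those facilities, a test that exercises the error path without a real server might look like this (a sketch; send_request and generate are hypothetical names for the script's functions):

config.target = 'gpt'

local gpt = reqscript('gpt')

function test.reports_unreachable_server()
    -- replace the network call with a stub for the duration of the test
    mock.patch(gpt, 'send_request', mock.func(nil, 'connection refused'),
        function()
            local text, err = gpt.generate('a poem concerning dwarves')
            expect.eq(nil, text)
            expect.eq('connection refused', err)
        end)
end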

@gistya gistya left a comment (Author)

responding to the comments from myk002

gpt.lua Outdated
--@ module = true

local json = require('json')
local dfhack = require('dfhack')

Suggested change (remove this line):
local dfhack = require('dfhack')

gistya commented Aug 14, 2023

> would it make sense to make the generated text persistent per world? then the same knowledge item would always get the same generated text per world (and it would cut down on API costs if the same item were viewed multiple times)

Right now it's a bit of a compromise. It caches the last-received item for the session in memory, so if you re-open the same one many times in a row, it shows you the cached response.

However, as soon as you look at something else, that now occupies the cache, so if you go back to the first one, it regenerates fresh.

This can be nice, since it saves you from regenerating an entry if you just accidentally closed and reopened it. But it also lets you purposefully regenerate one, in case the first version was lame, by looking at something else and then coming back to the first thing.
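In code terms, the current behavior is roughly this (a paraphrase, not the actual implementation):

-- single-slot cache: looking at anything else evicts the previous response
local last_key, last_response

local function get_or_generate(key, generate_fn)
    if key ~= last_key then
        last_key = key
        last_response = generate_fn()
    end
    return last_response
end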

That can be a good thing if you're using the default cheap/fast engine, which is more of a quantity-over-quality compromise; in that case you'll run into more situations where you want to regenerate the same prompt.

However, if you're running GPT-4, which costs about 5x more but is also likely to produce a great result the first time, every time, then you probably won't want to regenerate the same one very often. In the current situation, the onus is on the user simply not to click again on something they don't want to regenerate; if they forgot it was generated before, that's on them.

Personally (and I'm not young) I have found it pretty easy to remember which ones I already generated, and I just generate one when I see something interesting that I'm fairly certain I haven't looked at before. If I'm confused, I can refer to my screenshots.

That being said, I can imagine every user will have a different set of expectations and preferences for how that ought to work, so in an ideal world it's configurable.

So, if we were to consider adding support for caching every response and persisting the responses to disk, what kind of database store do we have in the project's dependencies that's capable of handling this in a memory-efficient manner?


myk002 commented Aug 17, 2023

> So, if we were to consider adding support for caching every response and persisting the responses to disk, what kind of database store do we have in the project's dependencies that's capable of handling this in a memory-efficient manner?

how much text are we talking here? The excerpts I've seen are pretty small, and putting a few MB in JSON isn't out of the question. At runtime you'd cache it in a Lua table in memory. What are the upper bounds of the memory requirements? Note that we already link against zlib, so compression is an option if it becomes necessary.
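A sketch of that shape (the file location and key scheme are made up; assumes the json module's decode_file/encode_file helpers):

local json = require('json')

local CACHE_PATH = 'dfhack-config/gpt-cache.json'  -- hypothetical location

-- load once at startup; tolerate a missing or unreadable file
local ok, cache = pcall(json.decode_file, CACHE_PATH)
if not ok then cache = {} end

local function get_or_generate(world_id, description, generate_fn)
    local key = world_id .. ':' .. description
    if not cache[key] then
        cache[key] = generate_fn(description)
        json.encode_file(cache, CACHE_PATH)  -- flush each new entry to disk
    end
    return cache[key]
end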
