
AI-generated written text content #800 (Draft)

gistya wants to merge 12 commits into master
Conversation

@gistya gistya commented Aug 13, 2023

Description

This PR adds a gpt module with an overlay and a resizable window, generating AI-written text for in-game written works based on their in-game descriptions. gpt.lua sends those descriptions to a python helper script that runs in the background, gptserver.py, which in turn handles secure communication with OpenAI's servers.
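For orientation, the round trip looks roughly like this from the Lua side (a sketch only, assuming DFHack's luasocket plugin; the wire format and field names here are illustrative, not the PR's actual protocol):

local json = require('json')
local socket = require('plugins.luasocket')

-- sketch: send one JSON line to the helper, read one JSON line back
local function request_written_content(description)
    local client = socket.tcp:connect('127.0.0.1', 5001)
    client:setTimeout(30, 0)  -- seconds, milliseconds
    client:send(json.encode({description=description}) .. '\n')
    local response = client:receive()  -- luasocket defaults to reading one line
    client:close()
    return json.decode(response).text
end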

Security

While the gpt.lua dfhack script communicates with the helper script over an unencrypted localhost TCP connection, no sensitive information is ever passed between dfhack and the python helper, which ignores any connections from hosts other than localhost (127.0.0.1).

All communication between gptserver.py and OpenAI's cloud is performed via OpenAI's official python module, which handles all the nitty-gritty of the requests and employs up-to-date HTTPS security.

Your personal OpenAI key is stored on disk in a file named oaak.txt. The file is not encrypted, but it is only read by the python script (not by DFHack). Apply common-sense permissions to this file, as you would to an SSH private key.

OpenAI also lets you set a hard limit on monthly costs, so that if this script gets stuck in an infinite loop or your API key somehow leaks, you won't be on the hook for thousands of dollars. You can also regenerate your API key at any time on OpenAI's website, which is worth doing fairly frequently regardless of what other security you're using.

Should these measures be seen as insufficient, we could investigate storing the API key in the platform-specific system keychain, though this might make the python script less portable across platforms. Please comment below with your thoughts on this.

Costs

Use of this script requires an OpenAI API account with a positive balance. Signing up gets you $5 of free API credit that expires in 90 days, which should be good for 50 to 500 requests depending on the model used. They charge by the token (approximately 3/4 of a word), and the cost varies widely depending on which model is used.

The default model selection is the fast, cheap one. You may elect to use GPT-3.5 or GPT-4 by launching gptserver.py with the command-line option -gpt3 or -gpt4, but be ready for slower response times and higher per-token fees. Overall, a typical DF player should not expect costs to exceed $5-10/mo. even with heavy use, and as we move into the future these kinds of services will likely gain a free tier for non-commercial light use (similar to the public ChatGPT service).

Remaining work to be done:

Tests

At this point the PR contains no automated tests, which is why it's being posted as a draft PR.

I would like to ask the preliminary reviewers for advice on testing strategy as I have not yet looked into how testing of these scripts is done.

Integration testing

My initial thought is to use the test harness (?) to boot gptserver.py in a "mock server" mode where it gives canned responses to canned requests. We would want to add a Pipfile.lock (if there isn't one already) to ensure the sands of python dependencies don't shift out from under the test runners. (It might also come in handy for end users, so we don't have to list the dependencies in the docs; that's not very maintainable and I felt gross doing it.)

This setup would allow us to run tests that validate that the integration between the lua and python scripts hasn't been broken and that error states in the python script are handled properly (right now the UI doesn't reflect a death of the python script appropriately, which is a TODO before this merges).

End-to-end testing

If we know the knowledge descriptions of some Urist McTestbuddy in the test world, and we can macro the right screens being opened, then we could set up something like an end-to-end test with the only mocked component being OpenAI's service.

Unit testing

Since the script sanitizes response strings to strip characters that DF can't render properly, and uses heuristics to determine which type of written content a given description refers to, it would be good to have some unit tests around those functions. I've tried to make them as functional as possible to facilitate testing.
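For example, tests for those functions might look something like this (a sketch following the style of the repo's existing test files; sanitize_text and content_type_for are hypothetical names for the helpers described above):

-- test/gpt.lua (sketch only)
config.target = 'gpt'

local gpt = reqscript('gpt')

function test.sanitize_strips_unrenderable_chars()
    -- assumes curly quotes get straightened, since DF can't render them
    expect.eq('"a quote"', gpt.sanitize_text('\u{201C}a quote\u{201D}'))
end

function test.classifies_poem_descriptions()
    expect.eq('poem', gpt.content_type_for('This is a poem concerning dwarves.'))
end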

Since the script is stateful in its present form, I would also like to add tests that validate the state transitions are handled properly and robustly: there should be no combination of states from which the script cannot recover gracefully. Ideally it could be made completely stateless except for a single state store, but I haven't gotten that far yet. (Redux FTW)
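The shape I have in mind is something like this (just a sketch of the idea, not the current code; the action names are made up, though the Status values match the script's):

-- one store plus an explicit transition table, so an unexpected action in
-- a given state is ignored rather than wedging the UI
local store = {status=Status.start, text=''}

local transitions = {
    [Status.start]     = {submit=Status.waiting},
    [Status.waiting]   = {first_chunk=Status.receiving, fail=Status.done},
    [Status.receiving] = {chunk=Status.receiving, finish=Status.done},
    [Status.done]      = {reset=Status.start},
}

local function dispatch(action)
    local next_status = transitions[store.status][action.type]
    if not next_status then return end
    store.status = next_status
    if action.text then store.text = store.text .. action.text end
end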

Future directions

  • addition of a configuration UI panel within DFHack (or additional command-line options for the python script) for setting the various API parameters used in OpenAI requests
  • if/when dfhack integrates luasocket, then avoid using python altogether

I'm a bit leery of exposing implementation details specific to OpenAI within the DFHack UI, because I'm guessing a decent number of folks will want to rewrite the python script to talk to a different LLM, which would likely have its own, totally different API. Having to then also update a bunch of UI code could be annoying.


gistya commented Aug 13, 2023

pre-commit.ci autofix

docs/gpt.rst Outdated
@@ -0,0 +1,67 @@
gpt
=======
Member

Suggested change
=======
===

docs/gpt.rst Outdated
=======

.. dfhack-tool::
:summary: AI-generated written content!
Member

I can understand the exclamation point, but summaries have to end in a period to match the parser : P

How about "Generate written content with AI."

Author

OK will fix thanks.

docs/gpt.rst Outdated (resolved)
docs/gpt.rst Outdated
Comment on lines 13 to 19
``enable gpt``
=======
Enables the plugin. The overlay will be shown when a knowledge item or unit view sheet is open.

``disable gpt``
=======
Disables the plugin.
Member

this actually isn't necessary. players can turn the overlay on and off in gui/control-panel.

docs/gpt.rst Outdated (resolved)
docs/gpt.rst Outdated

Setup:

1. Register for an OpenAI API account. It must be a paid or active trial account.
Member

a url would be useful here

Author

fixed

docs/gpt.rst Outdated

1. Register for an OpenAI API account. It must be a paid or active trial account.
2. Generate an API token for your account.
3. Save your OpenAI API token to a file at the root of your DF directory, `oaak.txt`.
Member

we try not to put any user data in the DF root directory. I suggest putting it in dfhack-config/.

Author

OK, makes sense

docs/gpt.rst Outdated
3. Save your OpenAI API token to a file at the root of your DF directory, `oaak.txt`.
4. Install python. We used version 3.11 installed from the Microsoft Store.
5. Install python dependencies Flask and OpenAI: `pip install Flask` and `pip install OpenAI`.
6. Start the local helper python app: cd into dfhack/scripts directory & run `python gptserver.py`.
Member

I see two options here. If we deploy this with DFHack, then this python script should probably go in the DFHack/dfhack repo and get installed somewhere more appropriate (like maybe the DF root dir)

if this script gets distributed as a mod (see https://docs.dfhack.org/en/stable/docs/guides/modding-guide.html#the-structure-of-a-mod), then the python code can get included in the mod files

5. Install python dependencies Flask and OpenAI: `pip install Flask` and `pip install OpenAI`.
6. Start the local helper python app: cd into dfhack/scripts directory & run `python gptserver.py`.

Once the python helper is running, you may now enable and use the gpt plugin.
Member

I'd suggest having the overlay display a message like "cannot contact server -- do you have the Python helper running?" instead of requiring the player to remember to start the server first

@gistya gistya Aug 14, 2023

The overlay doesn't try to contact the server. It waits until you press the button and then, when there is something to submit, then it tries submitting. At that point, if it can't reach the server, the UI displays a message that says what you suggest (more or less).

The UI becomes fairly unresponsive though, since if the server can't be reached, the existing luasocket library throws a C++ exception, and Lua's lack of modern error handling means we have to resort to pcall, which seems to carry a nasty performance penalty, perhaps exacerbated by how often updateLayout is being called. Still looking into this one, but it seems to have a workaround at least, and isn't expected to happen very often.

Member

pcall shouldn't really be introducing much of a penalty, although errors themselves might... how often are you calling it?
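One way to keep both pcall and the thrown C++ exception off the hot updateLayout path would be to memoize the failure and back off for a while (a sketch; send_request is a stand-in name for the script's actual socket call):

-- remember that the helper was unreachable and skip further attempts for a
-- few seconds, so the error path isn't hit on every layout pass
local server_down_until_ms = 0

local function try_send(payload)
    if dfhack.getTickCount() < server_down_until_ms then
        return nil, 'cannot contact server -- is the python helper running?'
    end
    local ok, result = pcall(send_request, payload)
    if ok then return result end
    server_down_until_ms = dfhack.getTickCount() + 5000  -- retry after 5s
    return nil, 'cannot contact server -- is the python helper running?'
end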

docs/gpt.rst Outdated
Comment on lines 35 to 37
Tweaking additional OpenAI API parameters will require modifying `gptserver.py` to suit
your particular desires, until such time as someone may have added additional
configuration options in a future update to DFHack :D
Member

it would be fairly trivial to have a configuration screen that can open from the overlay and send options data to the python server

gpt.lua Outdated
--@ module = true

local json = require('json')
local dfhack = require('dfhack')
Member

this should not be necessary -- dfhack is already in the default environment

Comment on lines +23 to +28
local function string_from_Status(status)
if status == Status.start then return "start" end
if status == Status.waiting then return "waiting" end
if status == Status.receiving then return "receiving" end
if status == Status.done then return "done" end
end
Member

consider utils.invert(). e.g.

local utils = require('utils')

local Status = {
    ...
}
local Status_rev = utils.invert(Status)

local function string_from_Status(status)
    return Status_rev[status]
end

@gistya gistya Aug 14, 2023

OK will update it tomorrow. BTW this is one thing that Swift really got perfect, how they handle enums with associated values is pure genius. I've been experimenting with Swift for Windows. They recently added some very powerful C++ interoperability features where you can basically just start writing Swift and it compiles and runs as if you wrote C++, or something. Curious.

--

-- Whether or not to print debug output to the console.
local is_debug_output_enabled = false
Member

this comes up so often... we really need a centralized debugfilter-integrated Lua logging system. I'm not saying you have to write it. I'm just kvetching.

Author

amen 🗡️

local is_debug_output_enabled = false

-- Port on which to communicate with the python helper.
local port = 5001
Member

should probably get all this from the config file -- dfhack-config/gpt.json -- so players have a way to make changes (preferably through an in-game GUI)
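A sketch of that approach, assuming the json module's json.open wrapper (the key names here are made up for illustration):

local json = require('json')

-- json.open returns a wrapper whose .data table persists via :write()
local config = json.open('dfhack-config/gpt.json')
local port = config.data.port or 5001

local function set_port(new_port)
    port = new_port
    config.data.port = new_port
    config:write()
end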

Author

This is a constant (not meant to be changed).

local timeout = client_timeout_secs + client_timeout_msecs/1000

-- Number of onRenderFrame events to wait before polling again.
local polling_interval = 10
Member

the time between graphical frames can vary significantly. polling_interval_ms would be better; we have a timer API to get ms -- example: https://github.com/DFHack/dfhack/blob/develop/plugins/lua/overlay.lua#L432-L444

Author

OK will look into it thanks
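For reference, the ms-based version could look something like this (a sketch modeled on the linked overlay.lua pattern; poll_server is a stand-in for the script's polling routine):

local POLL_INTERVAL_MS = 100  -- illustrative value
local next_poll_ms = 0

function GPTBannerOverlay:onRenderFrame(dc, rect)
    local now_ms = dfhack.getTickCount()
    if now_ms >= next_poll_ms then
        next_poll_ms = now_ms + POLL_INTERVAL_MS
        poll_server()
    end
    GPTBannerOverlay.super.onRenderFrame(self, dc, rect)
end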

prompt_value = 'Return a response stating simply, "There has been an error."'
else
debug_log('Creating prompt for non-poem/non-star-chart/non-unsupported content_type: ' .. content_type)
prompt_value = 'In between the four carrots is a description of a written ' .. content_type .. ': ^^^^' .. knowledge_description .. '^^^^. \n\n' .. excerpts_prompt
Member

carrots -> carets

Comment on lines +466 to +470
if keys._MOUSE_R_DOWN or keys.LEAVESCREEN then
if view then
view:dismiss()
end
end
Member

do we actually need to close the window if the knowledge description screen is closed? why not just leave it open so the player can appreciate it? they can always close it with a right click on the window at any time.

Author

Thing is, right click usually doesn't close the knowledge description window; it just backs you out to the list of known written content items, at which point we want the window closed so the user can open another knowledge item without it being generated as soon as it's opened (which is what would happen if the window had remained open). If someone is trying to economize on which things they generate, that would be annoying.

The good news is that if you click back into the same knowledge item you just left, the text is all still there, and you can always alt-tab to the python cmd console to see the output there too, in case you forgot to screenshot a good one.

I would suggest running the plugin and playing around with it; if you then think I've misread the situation, please let me know and I'll make the adjustment, or maybe we can think of a third solution that's better than either of these options.

GPTBannerOverlay.ATTRS{
default_pos={x=-35,y=-2},
default_enabled=true,
viewscreens={'dwarfmode/ViewSheets/UNIT','dwarfmode/ViewSheets/ITEM'},
Member

this seems too broad -- do we need more fine-grained focus strings here? it's not hard to add more context in modules/Gui.cpp

Author

Just using what was available. If you give me one for the UNIT/Skills/Knowledge, I'll gladly use it.

@myk002 myk002 Aug 14, 2023

yeah, let me see if I can give you something better here.

ref: DFHack/dfhack#3675

Member

here we go:
[screenshot]
and with a knowledge item selected:
[screenshot]

end
end

self.subviews.label:updateLayout()
Member

this can get expensive. better to only update the layout when the text actually changes

Author

Maybe this is what's making the pcall() lock up the UI updating? I'll play around with it tomorrow.
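The gating myk002 describes could be as simple as this (a sketch; set_label_text is a hypothetical helper, not the PR's actual code):

-- skip the expensive relayout when the text hasn't changed
local last_label_text

local function set_label_text(self, text)
    if text == last_label_text then return end
    last_label_text = text
    self.subviews.label:setText(text)
    self.subviews.label:updateLayout()
end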

if status == Status.done then return "done" end
end

local Content_Type = {
Member

this feels... odd, but I need to read through the code a few more times before I feel comfortable suggesting changes


myk002 commented Aug 14, 2023

would it make sense to make the generated text persistent per world? then the same knowledge item would always get the same generated text per world (and it would cut down on API costs if the same item were viewed multiple times)


myk002 commented Aug 14, 2023

> I would like to ask the preliminary reviewers for advice on testing strategy as I have not yet looked into how testing of these scripts is done.

we have facilities for mocking out pretty much anything. You can even mock out part of the DF game state so the script can read test data from the df object (see https://github.com/DFHack/scripts/blob/master/test/deteriorate.lua)

another good unit test example: https://github.com/DFHack/scripts/blob/master/test/prioritize.lua
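Using those facilities, a test that exercises the error path without a real server might look like this (a sketch; send_request and generate are hypothetical names for the script's functions):

config.target = 'gpt'

local gpt = reqscript('gpt')

function test.reports_unreachable_server()
    -- replace the network call with a stub for the duration of the test
    mock.patch(gpt, 'send_request', mock.func(nil, 'connection refused'),
        function()
            local text, err = gpt.generate('a poem concerning dwarves')
            expect.eq(nil, text)
            expect.eq('connection refused', err)
        end)
end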

@gistya gistya left a comment (Author)

responding to the comments from myk002

gpt.lua Outdated
--@ module = true

local json = require('json')
local dfhack = require('dfhack')

Suggested change (remove this line):
local dfhack = require('dfhack')

gistya commented Aug 14, 2023

> would it make sense to make the generated text persistent per world? then the same knowledge item would always get the same generated text per world (and it would cut down on API costs if the same item were viewed multiple times)

Right now it's a bit of a compromise. It caches the last-received item for the session in memory, so if you re-open the same one many times in a row, it shows you the cached response.

However, as soon as you look at something else, that now occupies the cache, so if you go back to the first one, it regenerates fresh.

This can be nice, since it saves you from regenerating an entry if you just accidentally closed and reopened it. But it also lets you purposefully regenerate one, in case the first version was lame, by looking at something else and then coming back to the first thing.
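In code terms, the current behavior is roughly this (a paraphrase, not the actual implementation):

-- single-slot cache: looking at anything else evicts the previous response
local last_key, last_response

local function get_or_generate(key, generate_fn)
    if key ~= last_key then
        last_key = key
        last_response = generate_fn()
    end
    return last_response
end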

That can be a good thing if you're using the default cheap/fast engine, which is more of a quantity-over-quality compromise; in that case you'll run into more situations where you want to regenerate the same prompt.

However, if you're running GPT-4, which costs about 5x more but is also likely to produce a great result the first time, every time, then you probably won't want to regenerate the same one very often. In the current situation, the onus is on the user simply not to click again on something they don't want to regenerate; if they forgot it was generated before, that's on them.

Personally (and I'm not young) I have found it pretty easy to remember which ones I already generated, and I just generate one when I see something interesting that I'm fairly certain I haven't looked at before. If I'm confused, I can refer to my screenshots.

That being said, I can imagine every user will have a different set of expectations and preferences for how that ought to work, so in an ideal world it's configurable.

So, if we were to consider adding support for caching every response and persisting the responses to disk, what kind of database store do we have in the project's dependencies that's capable of handling this in a memory-efficient manner?


myk002 commented Aug 17, 2023

> So, if we were to consider adding support for caching every response and persisting the responses to disk, what kind of database store do we have in the project's dependencies that's capable of handling this in a memory-efficient manner?

how much text are we talking here? The excerpts I've seen are pretty small, and putting a few MB in JSON isn't out of the question. At runtime you'd cache it in a Lua table in memory. What are the upper bounds of the memory requirements? Note that we already link against zlib, so compression is an option if it becomes necessary.
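A sketch of that shape (the file location and key scheme are made up; assumes the json module's decode_file/encode_file helpers):

local json = require('json')

local CACHE_PATH = 'dfhack-config/gpt-cache.json'  -- hypothetical location

-- load once at startup; tolerate a missing or unreadable file
local ok, cache = pcall(json.decode_file, CACHE_PATH)
if not ok then cache = {} end

local function get_or_generate(world_id, description, generate_fn)
    local key = world_id .. ':' .. description
    if not cache[key] then
        cache[key] = generate_fn(description)
        json.encode_file(cache, CACHE_PATH)  -- flush each new entry to disk
    end
    return cache[key]
end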
