Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arbitrary callback data #1844

Merged
merged 54 commits into from
Jun 6, 2021
Merged

Arbitrary callback data #1844

merged 54 commits into from
Jun 6, 2021

Conversation

Bibo-Joshi
Copy link
Member

@Bibo-Joshi Bibo-Joshi commented Mar 23, 2020

This description is quite outdated, please see the discussion below

ToDo

  • When merged, whe should create a wiki page
  • Open an issue for the v14 milestone as reminder to make Persistence.store_callback_data default to True and make BP.get/update_callback_data abstract methods. Currently they're not b/c backwards compatibility …
  • Add/Update .. versionadded:: directives
  • Add a note to Deprecation: v15 #2347 to remove Defaults from tg.Bot completey and move them to tg.ext.Bot. We may have to think about how to handle signatures with DefaultVaule, then, i.e. if _insert_defaults should stay in tg.Bot or should be moved to tg.ext.Bot.
  • should probably be merged after Customize context #2262, so that we can update the added persistence logic according to Customize context #2262 (comment), see also 39a9f83
  • check that the "edit on GH" thingies redirect to the correct file

This PR adds the functionality to

  • pass arbitrary objects as callback_data to InlineKeyboardButtons
  • in accordance to that, the pattern argument for CallbackQueryHandler may now be a callable accepting the callback_data as argument.
  • validate that the callback_data received in an update is the same as the sent one

This is opt-in, so we're not forcing anyone to use it.
Closes #709

How does it work

Passing arbitrary objects as callback_data

This is achieved by replacing them by their id() and keeping a dictionary that maps the ids to the objects. I decided to use id() over uuid or similar because

  • Python guarantees that the id() is unique, so that's covered already and
  • immutables won't be stored multiple times. Since most callback data will probably still consist of strings, this seemed clever to me.

Of course, this ties in with persistence. As for the roles PR #1789: Adding stuff to persistence should be reserverd for major features. IMHO, this is one of them.

Signing the data

The callback data sent by the bot will read signature data. signature is generated using hmac from the standard library, where the key consists of the bots token and username and the message consits of the data and the chat_id, the message was sent to.

Upon receiving an update, CallbackQuery.de_json will check, that the data is valid by checking the signature. Malicious updates will be discarded and logged.

Generating the signature and validation done by methods added to the helpers module.

Users can decide, not to validate the data. This can be useful, when the token has been revoked and they want the old buttons to still work.

Wiki Page: Callback Data

Arbitrary objects as callback_data

Telegrams Bot API only accepts strings with length up to 64 bytes as callback_data for InlineKeyboardButtons, which sometimes is quite a limitation.

With PTB you are able, to pass any object as callback_data. This is achieved by storing the object and passing a unique id to Telegram. When a CallbackQuery is recieved, the id in the callback_data is replaced with the stored object. To use this feature, set the parameter arbitrary_callback_data for your Updater or Bot instance to True.

This means two thigs for you:

  1. If you don't use persistence, buttons won't work after restarting your bot, as the stored updates are lost. More precicely, the callback_data you will recieve is None. If you don't need persistence otherwise, you can set store_callback_data to True and all the others to False.

  2. When using the CallbackQueryHandler, the pattern argument can now be either

    • a regex expression, which will be used, if the callback_data is in fact a string
    • a callable accepting the callback_data as only argument. You can perform any kinds of tests on the callback_data and return True ore False accordingsly

Security of InlineKeyboardButtons

Callback updates are not sent by Telegram, but by the client. This means that they can be manipulated by a user. (While Telegram inofficially does try to prevent this, they don't want Bot devs to rely on them doing the security checks).

Most of the time, this is not really a problem, since callback_data often just is Yes, No, etc. However, if the callback data is something like delete message_id 123, the malicious user could delete any message sent by the bot.

To increase security of callbacks, PTB signs outgoing callback data. The signature not only makes sure that the callback_data is valid, but also that the update comes from the chat, the message with the InlineKeyboardButton was sent in. Incoming updates with an invalid signature will be rejected.

The signature is based on your bots token and username. This means that old buttons will no longer be accepted, if you revoke your token. If you need to revoke your token, e.g. during developement, and want the buttons to still work, you can pass

validate_callback_data = False

to either your Updater or Bot instance. Make sure to switch it to True after some time.

@Bibo-Joshi
Copy link
Member Author

IISC this doesn't handle inline keyboards attached to inline result messages yet. If we revisit this PR, that's one issue to think about

@Bibo-Joshi Bibo-Joshi added this to the v13.2 milestone Dec 17, 2020
@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented Jan 2, 2021

We just discussed this and will be going forward with it. A few notes:

  • Switch to UUID instead of id()
  • built the data storage as LRU cache limiting memory usage
    • store the timestamp along with the timestamp so if the user passes maxsize=None they can take care of deleting based on time stamps on their own.
    • Still provide CallbackQuery/Context/Bot.drop_callback_data for fine grained control

sketch for a LRU data class:

class LRUDict:
    data = {}
    maxsize = 1000

def put(self, key, object):
    if len(self.data) == maxsize:
        self.data.pop(next(iter(self.data)))  # dicts are only guaranteed to be ordered since py3.7, so maybe use OrderedDict instead …
    self.data[key] = object

Edit: Couple with linked list to make efficient

Copy link
Member

tsnoam commented Jan 2, 2021

@Bibo-Joshi Ordered dict is not LRU.
Ordered dict (in python semantics) refers to the order of insertion.
So it's either to remove and insert again for every object we use or we use a proper LRU.
A proper LRU will be implemented so that every access to a key will cause it to be marked as recently used. This is done internally by keeping a doubly linked list which has the most used entries in the head and least used entries in the tail.

Though, writing this, I think that if the intention is to immediately remove items when they're accessed then an Ordered Dict would be enough. (Though it might be limiting if we change the design)

without tests or pre-commit for now

# Conflicts:
#	telegram/bot.py
#	telegram/callbackquery.py
#	telegram/error.py
#	telegram/ext/basepersistence.py
#	telegram/ext/callbackqueryhandler.py
#	telegram/ext/dictpersistence.py
#	telegram/ext/dispatcher.py
#	telegram/ext/picklepersistence.py
#	telegram/ext/updater.py
#	telegram/utils/helpers.py
#	telegram/utils/webhookhandler.py
#	tests/test_bot.py
#	tests/test_callbackquery.py
#	tests/test_helpers.py
#	tests/test_persistence.py
#	tests/test_updater.py
@Bibo-Joshi
Copy link
Member Author

Merging master was surprisingly easy :D
Adapted the LRU idea. Namings can ofc be changed.
Tests run fine locally, let's see how they perform in CI.

@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented Jan 6, 2021

After internal discussion, using the uuid should be enough, so the latest commit removes the signing procedure.
However I became aware of two flaws in the current setup:

  • the data in the keyboard attached to the incoming CallbackQuery is currently not replaced
  • pressing a button does not automatically mean that the keyboard will be deleted. The bot might just call answer_callback_query(text='hint') to display some info, without changing the message or the keyboard. So we musn't delete the data on incoming CallbackQueries. This means that
    1. CallbackDataCache needs to be properly converted to a LRU-cache by updating the order of the items in the doubly linked queue
    2. We should provide convenience methods to delete the objects corresponding to the incoming query or the complete attached keyboard. Probably samething like CallbackQuery.{clear_data, clear_keyboard_data}, InlineKeyboardMarkup.clear_data, InlineKeyoardButton.clear_data() and Message.clear_keyboard_data

Edit I: We could think of making Message.edit_text/reply_markup/… and CallbackQuery.edit_message_text/reply_markup/… (i.e. everything that drops the keyboard) drop the data automatically 🤔

Edit II: just skipping CQs with invalid data is a bad idea. Users might wanna reply with a prompt telling them that the button no longer works. Rather set the data attribute to the InvalidCallbackData exception or something like that.

Edit III: Inline messages. We don't get the keyboard back when buttons from those are pressed. Idea: instead of making the LRU cache per button, make it per keyboard. we have space for 2 uuids in 64chars anyway. So once we get a CQ, we know all the other buttons/objects that we sent along with it and can make use of Edit I. Ofc when answering an inline query with multiple results that have an inline keyboard, we don't know which one actually gets sent, but that's what the LRU cache is good for.

@Bibo-Joshi Bibo-Joshi changed the title Arbitrary, signed callback data Arbitrary callback data Jan 6, 2021
Copy link
Member

@harshil21 harshil21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things that need to be changed:

  • There's still traces of InvalidButtonData all over in the docs, needs to be updated (I gave up after a point)
  • All instances of Any to object
  • The TODO section

There's also TTLCache from cachetools which would mean that we wouldn't need the access_time logic, as we can just use their implementation. Maybe that can help simplifying the code?

Regarding slots, should we make a new branch based on this and then merge here? The implementation detail should be kept separate from this imo. This PR is big enough already

Finally I think an example for this would help users

.pre-commit-config.yaml Outdated Show resolved Hide resolved
docs/source/telegram.ext.bot.rst Outdated Show resolved Hide resolved
telegram/callbackquery.py Outdated Show resolved Hide resolved
telegram/ext/basepersistence.py Show resolved Hide resolved
telegram/ext/bot.py Outdated Show resolved Hide resolved
tests/test_bot.py Outdated Show resolved Hide resolved
telegram/ext/callbackqueryhandler.py Outdated Show resolved Hide resolved
telegram/ext/callbackqueryhandler.py Outdated Show resolved Hide resolved
tests/test_bot.py Outdated Show resolved Hide resolved
telegram/ext/callbackdatacache.py Show resolved Hide resolved
@Bibo-Joshi Bibo-Joshi mentioned this pull request May 30, 2021
@Bibo-Joshi
Copy link
Member Author

Thanks for the thorough review @harshil21, you found a lot of things to improve 😃 So far I've gone through your comments and marked all as resolved that where either clear or where you reacted with 👍 to my comments. I will get to the TODO section soonish. Regarding the other comments:

There's also TTLCache from cachetools which would mean that we wouldn't need the access_time logic, as we can just use their implementation. Maybe that can help simplifying the code?

Unfortunately this doesn't allow for the entries not to expire :/

Regarding slots, should we make a new branch based on this and then merge here? The implementation detail should be kept separate from this imo. This PR is big enough already

Upsie, read that too late :D already merged that

Finally I think an example for this would help users

stay tuned 🎶

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there. Relax, I am just a little warning for the maintainers to release directly after merging your PR, otherwise we have broken examples and people might get confused :)

@Bibo-Joshi
Copy link
Member Author

I added an example that can be reviewed.
Also I just cherry-picked the stuff from #2262 (comment) + made according adjustments here, so all the open todo items in the description are "clean-up" after merge.
Only thing left from my side is to try & up the coverage a bit when the latest tests are finished

Copy link
Member

@harshil21 harshil21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay example looks good. However, I noticed one thing while running the example - When a user spam clicks a button (say '1'), it becomes an instance of InvalidCallbackData, and it fails. Of course when you remove the drop_callback_data line, this doesn't happen anymore. But maybe such behaviour should be at least documented somewhere?

Also can we remove the cast() in the example and ask mypy to ignore them? makes it a little complicated for beginners.

telegram/ext/dictpersistence.py Outdated Show resolved Hide resolved
telegram/ext/callbackdatacache.py Outdated Show resolved Hide resolved
@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented Jun 5, 2021

Okay example looks good. However, I noticed one thing while running the example - When a user spam clicks a button (say '1'), it becomes an instance of InvalidCallbackData, and it fails. Of course when you remove the drop_callback_data line, this doesn't happen anymore. But maybe such behaviour should be at least documented somewhere?

Ah, yes, that's right. In fact it could be considered a feature, see #2539 :D Anyway, I've updated the wiki above as follows:

However, if you want to keep memory usage low, you have
additional options to drop data:

  • on receiving a CallbackQuery, you can call context.drop_callback_data(callback_query). This will delete the data associated with the keyboard attached to the message that originated the CallbackQuery. Calling context.drop_callback_data is safe in any case where you change the keyboard, i.e. callback_query.edit_message_text/reply_markup/…
    Note: If the user clicks a button more than one time fast enough, but you call context.drop_callback_data for the first resulting CallbackQuery, the second one will have InvalidCallbackData as callback_data. However, this is usually not a problem, because one only wants one button click anyway.

Note that I also scraped the sentence about drop_callback_data not working for inline messages, b/c that's BS (commit incoming) :D

Also can we remove the cast() in the example and ask mypy to ignore them? makes it a little complicated for beginners.

I'm not too sure about this, tbh. If we drop cast, we'll have to add a type: ignore, which probably also confusing for beginners. I'd like to keep at least the cast in

# Get the data from the callback_data.
    number, number_list = cast(Tuple[int, List[int]], query.data)

because if you use type hinting, you'll have to do that cast in any case when using arbitrary callback data, so I think the example should showcase it. I could live with dropping the other two casts, though …. (edit: I did)

Copy link
Member

@harshil21 harshil21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

# Conflicts:
#	docs/source/telegram.ext.utils.types.rst
#	telegram/ext/__init__.py
#	telegram/ext/basepersistence.py
#	telegram/ext/callbackcontext.py
#	telegram/ext/conversationhandler.py
#	telegram/ext/dictpersistence.py
#	telegram/ext/dispatcher.py
#	telegram/ext/picklepersistence.py
#	telegram/ext/updater.py
#	telegram/ext/utils/types.py
#	telegram/utils/types.py
#	tests/test_persistence.py
#	tests/test_slots.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

make callbackquery safe
6 participants