{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":531177364,"defaultBranch":"main","name":"maze-transformer","ownerLogin":"understanding-search","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-08-31T16:55:46.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/124733501?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1715989243.0","currentOid":""},"activityList":{"items":[{"before":null,"after":"a52baca69a2c946065397cea7cc2ff6592345f9e","ref":"refs/heads/update-maze-dataset-tokenizers-step2","pushedAt":"2024-05-17T23:40:43.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"maze-dataset PR #37 moved token_utils.py and util.py to a different directory. Updated imports.","shortMessageHtmlLink":"maze-dataset PR #37 moved token_utils.py and util.py to a different d…"}},{"before":"20092c37a6b553d03e4d398592c68049c0d0a6a1","after":"3c7fb12e380fa4bc183457e4f637083403555695","ref":"refs/heads/main","pushedAt":"2024-05-14T03:31:16.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"allow HookedTransformer without ZanjHookedTransformer","shortMessageHtmlLink":"allow HookedTransformer without ZanjHookedTransformer"}},{"before":"e70c43a6d40e354e9fd9d7f5593f519993e1de40","after":null,"ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T03:00:01.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"}},{"before":"495d8d329a6f952d562f94563c4d45bf2ae1f5ce","after":"20092c37a6b553d03e4d398592c68049c0d0a6a1","ref":"refs/heads/main","pushedAt":"2024-05-14T03:00:00.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"update maze-dataset dep and poetry lockfile (#213)\n\n* update maze-dataset dep and poetry lockfile\r\n\r\n* fixed some imports\r\n\r\n* update another import\r\n\r\n* another import fix\r\n\r\n* hookedtransformer maze tokenizer compat\r\n\r\n* update wandb dep\r\n\r\n* update transformer_lens dep\r\n\r\n* fix import in notebook\r\n\r\n* re-ran this notebook. was failing in CI for some reason, idk?\r\n\r\n* better error when dataset cfgs dont match\r\n\r\n* (run format) - working locally, but configs differ in CI\r\n\r\n```\r\nValueError: ('dataset has different config than cfg.dataset_cfg, and allow_dataset_override iscollect_generation_meta', 'args': (), 'kwargs': {} False', \"{'applied_filters': {'self': [{'name': '}], 'other': []}}\")\r\n```\r\n\r\nprobably because we are loading a dataset with the new format. will patch but it will be hacky\r\n\r\n* special case for applied filters diff","shortMessageHtmlLink":"update maze-dataset dep and poetry lockfile (#213)"}},{"before":"78dc31b0214743ee8be504498211fd044e56b3fb","after":"e70c43a6d40e354e9fd9d7f5593f519993e1de40","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:50:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"special case for applied filters diff","shortMessageHtmlLink":"special case for applied filters diff"}},{"before":"a8beac9e43e4ade82524113bb142f851697419cb","after":"78dc31b0214743ee8be504498211fd044e56b3fb","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:24:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"(run format) - working locally, but configs differ in CI\n\n```\nValueError: ('dataset has different config than cfg.dataset_cfg, and allow_dataset_override iscollect_generation_meta', 'args': (), 'kwargs': {} False', \"{'applied_filters': {'self': [{'name': '}], 'other': []}}\")\n```\n\nprobably because we are loading a dataset with the new format. will patch but it will be hacky","shortMessageHtmlLink":"(run format) - working locally, but configs differ in CI"}},{"before":"f82cf01b8ea6fbf66852fdd295f962c2f042d012","after":"a8beac9e43e4ade82524113bb142f851697419cb","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:19:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"better error when dataset cfgs dont match","shortMessageHtmlLink":"better error when dataset cfgs dont match"}},{"before":"b9578410f607253b62a73a55f00effa806e2dabc","after":"f82cf01b8ea6fbf66852fdd295f962c2f042d012","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:13:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"re-ran this notebook. was failing in CI for some reason, idk?","shortMessageHtmlLink":"re-ran this notebook. was failing in CI for some reason, idk?"}},{"before":"fc5753e8b5ed1c26657a1673da3c0003f3e8ba79","after":"b9578410f607253b62a73a55f00effa806e2dabc","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:11:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"fix import in notebook","shortMessageHtmlLink":"fix import in notebook"}},{"before":"b77ed5f4da9cd9711161d394310496644ca55f43","after":"fc5753e8b5ed1c26657a1673da3c0003f3e8ba79","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:10:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"update transformer_lens dep","shortMessageHtmlLink":"update transformer_lens dep"}},{"before":"c5fb75a299d73d77a6aae9603a8b7af64f73e1cf","after":"b77ed5f4da9cd9711161d394310496644ca55f43","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T02:02:19.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"update wandb dep","shortMessageHtmlLink":"update wandb dep"}},{"before":"25fc0551f441fba478a791799cb72ef52865a267","after":"c5fb75a299d73d77a6aae9603a8b7af64f73e1cf","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T01:52:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"another import fix","shortMessageHtmlLink":"another import fix"}},{"before":"1f6c6604a30143f2302b02a66190b0b830ff9ac5","after":"25fc0551f441fba478a791799cb72ef52865a267","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T01:47:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"update another import","shortMessageHtmlLink":"update another import"}},{"before":"dba62be84a73a0d56d1ddc9993aa79bdf4a34daa","after":"1f6c6604a30143f2302b02a66190b0b830ff9ac5","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T01:43:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"fixed some imports","shortMessageHtmlLink":"fixed some imports"}},{"before":null,"after":"dba62be84a73a0d56d1ddc9993aa79bdf4a34daa","ref":"refs/heads/fix-compat","pushedAt":"2024-05-14T01:39:21.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"update maze-dataset dep and poetry lockfile","shortMessageHtmlLink":"update maze-dataset dep and poetry lockfile"}},{"before":"0ff866e92f6199402b104ce3e3e1dd69e616b38d","after":"ac2c09c93bb021f06ff9bcd5ece6b704c603cb5d","ref":"refs/heads/remove-unk-token","pushedAt":"2024-04-25T23:04:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"Update dependencies, including `maze-dataset = \"^1.0.0\"`","shortMessageHtmlLink":"Update dependencies, including maze-dataset = \"^1.0.0\""}},{"before":null,"after":"0ff866e92f6199402b104ce3e3e1dd69e616b38d","ref":"refs/heads/remove-unk-token","pushedAt":"2024-04-18T18:34:52.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"Add check on token for `maze-dataset` update","shortMessageHtmlLink":"Add check on <UNK> token for maze-dataset update"}},{"before":"b0a27b4250ac5420a845eea084e9591c305c2bec","after":null,"ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T22:52:45.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"}},{"before":"b3417f9634b9d40867864d72cb174e8d4f1b6bb9","after":"495d8d329a6f952d562f94563c4d45bf2ae1f5ce","ref":"refs/heads/main","pushedAt":"2024-03-05T22:52:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"fix tokenizer to work with new version of transformers library (#208)\n\nvarious fixes to make tokenizers work with the latest versions of HF `transformers` and `transformer_lens`\r\n\r\n# commit history\r\n* Try and fix tokenizer to work with new version of transformers library\r\n\r\nThe proposed solution is probably not backwards compatible, and is fairly hacky (it strips spaces, and I am not sure it properly assigns vocab / special tokens):\r\n\r\nThere is an issue with our tokenization in the new version of transformers. In particular, in the tokenize function from transformers.tokenization_utils.py the line tokens = self.tokens_trie.split(text) returns a list of tokens with spaces if the input sequence is (1,0)… (i.e. includes spaces). this wasn’t the case before, and I suspect stems from how I have to change the addition of the vocabulary in our tokenizer (to work with their new way of handling token addition via the _add_tokens method (we can’t just overwrite the dicts as these are now properties >.<). As a temporary fix we can manually remove spaces from sequences, but that’s quite disgusting\r\n\r\nThe best option might be to create token jsons and push a tokenizer to huggingface.\r\n\r\n* Updated poetry dependencies. `poetry.lock` now has `transformers 4.38.1`, `transformer-lens 1.14.0` among many other updates.\r\n\r\n* Added `self.init_kwargs[\"add_bos_token\"] = True` as an uninformed band-aid. Need to discuss if this makes any sense.\r\n\r\n* Tiny fix to `HuggingMazeTokenizer._tokenize` as described in the Github comment above. One unit test eliminated, other unit tests and notebook tests pass. A few notebooks are dumping their outputs directly to notebooks/ instead of a temp directory. Didn't delete them just for reference by a future fix.\r\n\r\n* Unit tests pass, my CPU won't let me run `make test` right now.\r\n\r\n* All tests pass\r\n\r\n* Updated `black` dependency to match CI version. Reran formatting.\r\n\r\n* run formatters\r\n\r\n* minor type hint fix\r\n\r\n* our special tokens aren't what HF special tokens are\r\n\r\n* re-run format??\r\n\r\n* improved test_maze_to_tokens_roundtrip, added comparison with manually inspected tokenization\r\n\r\n* throw exception on an empty space token\r\n\r\n* moved tokenizer test to maze-dataset\r\n\r\n* format\r\n\r\n---------\r\n\r\nCo-authored-by: aaron-sandoval <32021231+aaron-sandoval@users.noreply.github.com>\r\nCo-authored-by: mivanit ","shortMessageHtmlLink":"fix tokenizer to work with new version of transformers library (#208)"}},{"before":"dc7534fe2cb20e355866d5d08b3e76d564f6e4b7","after":"b0a27b4250ac5420a845eea084e9591c305c2bec","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T22:45:47.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"format","shortMessageHtmlLink":"format"}},{"before":"59838a4672711eff2dd5fd2334edaf061cea74c3","after":"dc7534fe2cb20e355866d5d08b3e76d564f6e4b7","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T22:43:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"throw exception on an empty space token","shortMessageHtmlLink":"throw exception on an empty space token"}},{"before":"161e1a0b867250b9486e05519542792d3eb182af","after":"59838a4672711eff2dd5fd2334edaf061cea74c3","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T22:36:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"improved test_maze_to_tokens_roundtrip, added comparison with manually inspected tokenization","shortMessageHtmlLink":"improved test_maze_to_tokens_roundtrip, added comparison with manuall…"}},{"before":"052a1d9b4027e11873e8e1494fe83041d6f0ff9d","after":"161e1a0b867250b9486e05519542792d3eb182af","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T21:44:30.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"re-run format??","shortMessageHtmlLink":"re-run format??"}},{"before":"1a961bbf088ca147130b8c4e2a7bce97b6fb0360","after":"052a1d9b4027e11873e8e1494fe83041d6f0ff9d","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T21:35:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"minor type hint fix","shortMessageHtmlLink":"minor type hint fix"}},{"before":"b1abc373e97c78ec652ce254dcb4f5dde2ac58b0","after":"1a961bbf088ca147130b8c4e2a7bce97b6fb0360","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-05T21:30:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"mivanit","name":null,"path":"/mivanit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19347900?s=80&v=4"},"commit":{"message":"run formatters","shortMessageHtmlLink":"run formatters"}},{"before":"4e8e450e757f208cf65d784b82ee6bffef8a51c2","after":"b1abc373e97c78ec652ce254dcb4f5dde2ac58b0","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-01T18:08:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"Updated `black` dependency to match CI version. Reran formatting.","shortMessageHtmlLink":"Updated black dependency to match CI version. Reran formatting."}},{"before":"f91ab15dac63baa376cbe90822c37eee2216ca1a","after":"4e8e450e757f208cf65d784b82ee6bffef8a51c2","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-03-01T17:44:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"All tests pass","shortMessageHtmlLink":"All tests pass"}},{"before":"767977bbee38fcc98102196afbf0a08bd8012215","after":"f91ab15dac63baa376cbe90822c37eee2216ca1a","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-02-29T23:16:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"Unit tests pass, my CPU won't let me run `make test` right now.","shortMessageHtmlLink":"Unit tests pass, my CPU won't let me run make test right now."}},{"before":"8ace9e06822ed59fc77e4e5974b5e25cbab49287","after":"767977bbee38fcc98102196afbf0a08bd8012215","ref":"refs/heads/tokenizer-fix","pushedAt":"2024-02-29T19:16:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"Tiny fix to `HuggingMazeTokenizer._tokenize` as described in the Github comment above. One unit test eliminated, other unit tests and notebook tests pass. A few notebooks are dumping their outputs directly to notebooks/ instead of a temp directory. Didn't delete them just for reference by a future fix.","shortMessageHtmlLink":"Tiny fix to HuggingMazeTokenizer._tokenize as described in the Gith…"}},{"before":null,"after":"62251669e1b4d02d35c5efeae0edc6e0f5696161","ref":"refs/heads/add-hf-tokenizer","pushedAt":"2024-02-23T22:55:02.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"aaron-sandoval","name":null,"path":"/aaron-sandoval","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/32021231?s=80&v=4"},"commit":{"message":"wip to enable saving and loading a HuggingMazeTokenizer to HF Hub. Merges in some new work done on `main`, like adding `save_vocabulary`. Might end up needing some tweaks before the branch is set up right.","shortMessageHtmlLink":"wip to enable saving and loading a HuggingMazeTokenizer to HF Hub. Me…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAETVzb1wA","startCursor":null,"endCursor":null}},"title":"Activity · understanding-search/maze-transformer"}