Ignoring errors #109

Closed
pawelgnatowski opened this issue Aug 16, 2022 · 32 comments
Labels
API API related issue

Comments

@pawelgnatowski

Describe the bug
When errors are encountered, the export process terminates.

To Reproduce
Steps to reproduce the behavior:

  1. Run slackdump like this:
     slackdump -f -export MyDump @channelList.txt

Expected behavior
Continue, or prompt whether to ignore the error.

Output
2022/08/16 13:53:39 application error: export error: failed to dump "xyz" (xxx): callback error: failed to dump channel xxx: strconv.Atoi: parsing "null": invalid syntax

Desktop (please complete the following information):

  • OS: Windows

Additional context
The same problem occurs during temporary Slack downtime, e.g. a 503 error.
Larger exports may take a long time to process.
It would be good to expose flags to ignore errors and/or retry X times after Y time, then ignore or exit.
Now I need to forgo the entire channel as it is failing. I also had to write a script that compares content to see if other channels failed (not fatally) and can be restarted later.

Otherwise gj bro!

@rusq
Owner

rusq commented Aug 17, 2022

Hey @pawelgnatowski , thank you for your report and glad that you found this tool useful.

I'll see what I can do with this.

@rusq
Owner

rusq commented Aug 17, 2022

  1. It seems that this error is propagated from the slack-go/slack library (there's no place in the slackdump code that calls Atoi) and is related to the JSONTime type. That's bad news: if it were our code, it would be possible to omit the particular field value, but with the slack library we'll need to sacrifice the whole batch of messages (ConversationsPerReq is the maximum, but the API may return fewer, depending on its mood), unless we add complex iterative logic that decreases the batch size and retries the API call until it eventually succeeds, thereby identifying the failing message in the batch. Of course, losing, say, 100 messages in a batch is not a tragedy compared to losing all channel data.
  2. Can you confirm that the command line in the issue is correct? The command line refers to "export" mode, while this error can only be returned if running in "conversation dump" mode. I will implement the "ignore errors" flag globally, asking out of interest.
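
The "decrease the batch size and retry" idea from point 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: `Message`, `fetchFunc`, `fakeFetch` and `fetchSkippingBad` are hypothetical names, and the offset-based paging is a simplification (the real API pages by cursor or timestamp), so this is not how slackdump or slack-go/slack actually work.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a Slack message.
type Message struct {
	ID int
}

// fetchFunc is a hypothetical page fetcher. It returns `limit` messages
// starting at `offset`, or an error if any message in that range cannot
// be decoded (the whole batch is lost, as described above).
type fetchFunc func(offset, limit int) ([]Message, error)

// fakeFetch simulates an API where the message at index `bad` breaks decoding.
func fakeFetch(bad int) fetchFunc {
	return func(offset, limit int) ([]Message, error) {
		page := make([]Message, 0, limit)
		for i := offset; i < offset+limit; i++ {
			if i == bad {
				return nil, errors.New(`strconv.Atoi: parsing "null": invalid syntax`)
			}
			page = append(page, Message{ID: i})
		}
		return page, nil
	}
}

// fetchSkippingBad reads messages [0, total) in batches of at most `batch`.
// When a batch fails, it halves the batch size and retries; once a batch of
// one message still fails, that message is recorded as skipped.
func fetchSkippingBad(fetch fetchFunc, total, batch int) (msgs []Message, skipped []int) {
	offset, limit := 0, batch
	for offset < total {
		if limit > total-offset {
			limit = total - offset
		}
		page, err := fetch(offset, limit)
		switch {
		case err == nil:
			msgs = append(msgs, page...)
			offset += limit
			limit = batch // go back to full-size batches
		case limit == 1:
			skipped = append(skipped, offset) // the failing message is isolated
			offset++
			limit = batch
		default:
			limit /= 2 // shrink the batch and try again
		}
	}
	return msgs, skipped
}

func main() {
	msgs, skipped := fetchSkippingBad(fakeFetch(6), 10, 4)
	fmt.Println(len(msgs), skipped) // prints "9 [6]"
}
```

The trade-off is extra API calls around every bad message, which matters under Slack's rate limits.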

@pawelgnatowski
Author

You may be right - i actually have tried full export first but due to another error and sheer size of 5k channels i wanted to limit the amount of channels, the only way i found is by providing channel list.
BTW. Is there any way to get saved-items (kind of starred elements?) and mentions and reactions - this is how i build my MVP list which i do not see exported anywhere.

Command:
slackdump.exe -f -export xxx @channels.txt

@rusq
Owner

rusq commented Aug 17, 2022

Ah, it makes sense now. Re starred items - no, for now Slackdump is quite simple - only gets channels, users, and conversations.

  1. Starred items (i just checked) is a separate api call. It's actually a very good feature suggestion.
  2. There's no dedicated API to get mentions, as they're just a markup within the message,
  3. but there's a reactions.list API endpoint which, similar to starred items, can be used to get all items that the user has reacted on. Will place it in the TODO list as well :)
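
For anyone scripting this in the meantime, here is a small sketch of how requests to the two list endpoints could be assembled. The helper `listURL` and the user ID are hypothetical; the sketch assumes the usual Web API conventions (token sent in an `Authorization: Bearer` header, paging with `cursor` and `limit`) and deliberately stops short of sending the request.

```go
package main

import (
	"fmt"
	"net/url"
)

// apiBase is the public Slack Web API root.
const apiBase = "https://slack.com/api/"

// listURL is a hypothetical helper that builds the URL for a list-style
// Web API method such as "stars.list" or "reactions.list".
// Empty parameter values are dropped. The token is intentionally not part
// of the URL; it is assumed to go into the Authorization header.
func listURL(method string, params map[string]string) string {
	q := url.Values{}
	for k, v := range params {
		if v != "" {
			q.Set(k, v)
		}
	}
	u := apiBase + method
	if enc := q.Encode(); enc != "" {
		u += "?" + enc // Encode sorts parameters by key
	}
	return u
}

func main() {
	// "U0000000" is a placeholder user ID.
	fmt.Println(listURL("reactions.list", map[string]string{"user": "U0000000", "limit": "100"}))
	fmt.Println(listURL("stars.list", map[string]string{"limit": "100", "cursor": ""}))
}
```

To page through results, the cursor returned by one response would be passed as `cursor` in the next request until it comes back empty.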

@pawelgnatowski
Author

sounds good, any ETA on the 1,3 - need to know how dirty i need to make my hands, as time window is closing fast. btw - i tried slack export viewer - i guess it is either full export or it does not work :(
but that i guess could tackle at a later time... damn i love Slack.

@rusq
Owner

rusq commented Aug 17, 2022

Sorry, no ETA on this - i do it in my free time, features are plentiful, and I got only two hands 😂 But I'll see what I can do

@rusq
Owner

rusq commented Aug 17, 2022

When I released it open source I hoped that there'd be people contributing, as it seems to be helpful, but I guess the time hasn't come yet.

@pawelgnatowski
Author

wish i could - haven't picked up Go yet.

@rusq
Owner

rusq commented Aug 17, 2022

That's no problem, Pawel :) Feature suggestion or bug report are also great contributions, feedback loop is very important.

@pawelgnatowski
Author

i definitely must go for 1 & 3 which means i'll probably use node or python for it. Can share some lessons learned for the APIs you have mentioned. Thanks for doing this project. Kudos!

@rusq
Owner

rusq commented Aug 17, 2022

Thank you :)

@rusq
Owner

rusq commented Aug 17, 2022

@pawelgnatowski I have made a quick and not so dirty patch to my fork of the slack library that will treat "null" as zero unix time (Jan 1, 1970), and built a Windows binary out of it (attached: slackdump-2.1.2-null-fix.zip). Could you please try it on the problematic channel and see if that works?
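
As an illustration of what "treat null as zero unix time" can mean in Go, here is a minimal sketch. It is not the actual patch from the fork: `NullSafeTime`, `File` and `parseFile` are hypothetical names, and the real JSONTime type in slack-go/slack may be implemented differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// NullSafeTime is a hypothetical Unix-seconds timestamp type that tolerates
// JSON null (and empty strings) by decoding them as 0, i.e. Jan 1, 1970.
type NullSafeTime int64

// UnmarshalJSON accepts a bare number, a quoted number, or null.
func (t *NullSafeTime) UnmarshalJSON(b []byte) error {
	s := strings.Trim(string(b), `"`)
	if s == "null" || s == "" {
		*t = 0
		return nil
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return err
	}
	*t = NullSafeTime(n)
	return nil
}

// Time converts the value to a time.Time in UTC.
func (t NullSafeTime) Time() time.Time {
	return time.Unix(int64(t), 0).UTC()
}

// File is a hypothetical, heavily trimmed file object.
type File struct {
	ID        string       `json:"id"`
	Created   NullSafeTime `json:"created"`
	Timestamp NullSafeTime `json:"timestamp"`
}

// parseFile decodes a single file object from a JSON string.
func parseFile(data string) (File, error) {
	var f File
	err := json.Unmarshal([]byte(data), &f)
	return f, err
}

func main() {
	f, err := parseFile(`{"id":"F0000000","created":1573757704,"timestamp":null}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(f.Created, f.Timestamp, f.Timestamp.Time().Year())
}
```

The cost of this approach is that a missing timestamp becomes indistinguishable from a real value of 0; a pointer or a separate "valid" flag would preserve that distinction.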

@rusq
Owner

rusq commented Aug 17, 2022

Also, I don't think that ignoring this kind of error will work: each API call returns a next-page token, which might be nil if an error occurs, so it would not be possible to get the next "page". Let's see if the quickfix from my previous comment works.

Regarding 503 and the server-class errors in general, I think it would be possible to handle it along with 429 rate limit errors, but in this case we'd have to wait at increasing time intervals, i.e. 1st attempt fails, wait 30 seconds, next attempt fails, wait 60 seconds, etc.
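
A rough sketch of that retry scheme follows. All names here (`statusError`, `retryable`, `withRetry`, `simulate`) are hypothetical, and the linearly growing wait (30s, 60s, 90s, ...) is just one reading of "increasing time intervals"; it is not slackdump's actual retry logic.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// statusError is a hypothetical error type carrying the HTTP status code.
type statusError struct{ Code int }

func (e *statusError) Error() string {
	return fmt.Sprintf("slack API returned HTTP %d", e.Code)
}

// retryable reports whether err is a 429 or a 5xx server-class error.
func retryable(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	return se.Code == http.StatusTooManyRequests || (se.Code >= 500 && se.Code < 600)
}

// withRetry calls `call` up to maxAttempts times. After the n-th retryable
// failure it waits n*base (30s, 60s, ... for base = 30s) before trying again.
// Non-retryable errors are returned immediately. `sleep` is injected so the
// waiting can be replaced in tests; pass time.Sleep in real use.
func withRetry(call func() error, maxAttempts int, base time.Duration, sleep func(time.Duration)) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = call(); err == nil || !retryable(err) {
			return err
		}
		if attempt < maxAttempts {
			sleep(time.Duration(attempt) * base)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// simulate runs withRetry against a call that fails `failures` times with the
// given status code, recording the waits in seconds instead of sleeping.
func simulate(failures, code, maxAttempts int) (calls int, waited []float64, err error) {
	call := func() error {
		calls++
		if calls <= failures {
			return &statusError{Code: code}
		}
		return nil
	}
	sleep := func(d time.Duration) { waited = append(waited, d.Seconds()) }
	err = withRetry(call, maxAttempts, 30*time.Second, sleep)
	return calls, waited, err
}

func main() {
	calls, waited, err := simulate(2, http.StatusServiceUnavailable, 5)
	fmt.Println(calls, waited, err) // prints "3 [30 60] <nil>"
}
```

For 429 responses specifically, honouring the server's Retry-After header would likely be a better wait than a fixed schedule.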

@pawelgnatowski
Author

> @pawelgnatowski I have made a quick and not so dirty patch to my fork of the slack library that will treat "null" as zero unix time (Jan 1, 1970), and built a Windows binary out of it (attached: slackdump-2.1.2-null-fix.zip). Could you please try it on the problematic channel and see if that works?

can't download it, blocked download - virus detected ;]

@pawelgnatowski
Author

> Also, I don't think that ignoring this kind of errors will work - each api call returns a next token, that might be nil in case that the error occurs, so it would not be possible to get the next "page". Let's see if the quickfix from my previous comment works.
>
> Regarding 503 and the server-class errors in general, I think it would be possible to handle it along with 429 rate limit errors, but in this case we'd have to wait at increasing time intervals, i.e. 1st attempt fails, wait 30 seconds, next attempt fails, wait 60 seconds, etc.

  1. ok - what if you change the pagination size to maybe omit the offending message and still get a token - or am I missing something?
  2. sounds reasonable.

@rusq
Owner

rusq commented Aug 17, 2022

> can't download it, blocked download - virus detected ;]

https://www.virustotal.com/gui/file/089be6d45ee681e8936d8d7b98c2e471d5e3bea887b509e0e7b332d592585388?nocache=1

Seems like a false positive? Anyway, I can understand the lack of trust.

Here are the changes in slackdump:
master...i109-qf

and here are the changes in the slack lib fork:
rusq/slack@master...null-time

Would you be able to checkout and build branch i109-qf on your machine and check if that works?

@pawelgnatowski
Author

Hey, not about trust, just literally it was blocked by browsers. I will be back in a couple of days and will try then.

@pawelgnatowski
Author

gave the zip file another try and it works ^_^, guess M$ updated defender or smth.
tried the faulty channel again:
2022/08/27 10:51:58 error saving "FGHGC2XFG-" to "xxx\attachments": callback error: download to "xxx\attachments\FGHGC2XFG-" failed, [src=]: received empty download URL
process continues though... <3

@rusq
Owner

rusq commented Aug 27, 2022

Thanks! Looks like there's some malformed file within that channel - there's an ID of this file ("FGHGC2XFG"), but no name, no URL etc. Very strange. But glad to hear that it works, I basically modified the slack library to ignore empty JSONTime. I'll submit the PR to upstream slack library. If that doesn't get through, i'll just maintain the change in the fork.

@rusq
Owner

rusq commented Aug 27, 2022

I have prepared a tool for #115, that shows the RAW output of the API - can I ask you to run it on that channel, and copy/paste the JSON for that file object with "ID": "FGHGC2XFG". Would be interesting to see what in the actual fuck is going on over there?
rawoutput.zip

It uses the same auth as the slackdump, so you could run it like this:

rawoutput.exe channel_id

it will generate the slackdump_raw.log file which is a dump of headers and JSON output from the API - could you please search for FGHGC2XFG, and paste the surrounding json object in this thread? Most likely it will be empty, but if it contains some identifiable information, i.e. slack workspace name, it would make sense to obfuscate it, or replace with meaningless strings. I'd be keen to see what fields of that malformed file are populated and which are not.

@pawelgnatowski
Author

{ "type":"message", "text":"ZZZZ)\n\nYYY\n\nHHH?", "files":[ { "id":"FGHGC2XFG", "mode":"tombstone" } ], "upload":true, "user":"XXX7CK4GK", "display_as_bot":false, "ts":"1551261378.012600", "thread_ts":"1551261378.012600", "reply_count":2, "reply_users_count":2, "latest_reply":"1551262603.013800", "reply_users":[ "XXX7CK4GK", "XXX1654PP" ], "is_locked":false, "subscribed":false }

@rusq
Owner

rusq commented Aug 27, 2022

Very interesting - it looks like it's a "deleted remote file" according to this doc.

Probably they are so rare, that no one ever had this special case with the slack lib. I searched through their issues and was unable to find anything on this.

Thank you!

@rusq
Owner

rusq commented Aug 27, 2022

TODO:

  • Handle tombstone files
  • Open an issue with slack-go/slack on "tombstone" files.
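
One possible way to handle tombstone files is to filter them out before the download step, as in the sketch below. The `File` struct is a hypothetical subset of the file object seen in the raw output above, and `downloadable` and `splitFiles` are illustrative names rather than slackdump functions.

```go
package main

import "fmt"

// File is a hypothetical subset of a Slack file object.
type File struct {
	ID                 string `json:"id"`
	Mode               string `json:"mode"`
	URLPrivateDownload string `json:"url_private_download"`
}

// downloadable reports whether a file has anything to fetch: tombstone
// entries and entries without a download URL are treated as not downloadable.
func downloadable(f File) bool {
	return f.Mode != "tombstone" && f.URLPrivateDownload != ""
}

// splitFiles separates files that can be downloaded from those to skip,
// so the caller can log the skipped IDs instead of failing the export.
func splitFiles(files []File) (ok, skipped []File) {
	for _, f := range files {
		if downloadable(f) {
			ok = append(ok, f)
		} else {
			skipped = append(skipped, f)
		}
	}
	return ok, skipped
}

func main() {
	files := []File{
		{ID: "FGHGC2XFG", Mode: "tombstone"},
		{ID: "F0000001", Mode: "hosted", URLPrivateDownload: "https://files.example.invalid/F0000001"},
	}
	ok, skipped := splitFiles(files)
	fmt.Println(len(ok), len(skipped)) // prints "1 1"
}
```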

@rusq
Owner

rusq commented Aug 30, 2022

@pawelgnatowski I was trying to reproduce this the other day, the same way I did with #119 (the test code is in the issue I've opened with slack lib slack-go/slack#1104), however I did not get the unmarshal error, until I've added a "timestamp":null piece to the file.

  1. If you still have the raw_output file that was generated, could you please search it for the string "null"?
  2. If it's there, could you please post it the way you did last time with the PII removed, so I could use it to open another issue with the slack lib?

Thank you!

@pawelgnatowski
Author

{ "type": "message", "text": "We're starting a data science community ", "files": [ { "id": "FSXXX79LN", "created": 1573757704, "timestamp": null, "name": "Data_Science_Community_of_Practice", "title": "Data Science Community", "mimetype": "application\/vnd.slack-docs", "filetype": "docs", "pretty_type": "Arugula", "user": "UXX617GTY", "editable": true, "size": 8886, "mode": "docs", "is_external": false, "external_type": "", "is_public": true, "public_url_shared": false, "display_as_bot": false, "username": "", "url_private": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/data_science_community_of_practice", "url_private_download": "https:\/\/files.slack.com\/files-pri\/T0XXX3EC-FSXXX79LN\/download\/data_science_community_of_practice", "permalink": "https:\/\/myteam.slack.com\/files\/T0XXX3EC\/FSXXX79LN", "permalink_public": "https:\/\/slack-files.com\/T0XXX3EC-FSXXX79LN-0733464a3f", "preview": "<p><br><br>We're staring a data science community of practicexxxxxxxxxxx<br><br><br><\/p>", "editor": null, "last_editor": null, "non_owner_editable": null, "updated": null, "is_starred": false, "has_rich_preview": false } ], "upload": true, "user": "UXX617GTY", "display_as_bot": false, "ts": "1561469761.006800", "thread_ts": "1561469761.006800", "reply_count": 12, "reply_users_count": 12, "latest_reply": "1562083633.018200", "reply_users": [ "UDVYYY3CH", "UDQYYYHGB", "UEJYYYG5Q", "UDZYYYPJR", "UERYYYZ6H", "U20YYY4UB", "UF7YYYFG9", "UCRYYYLUB", "UDCYYYYV8", "UE4YYY3S7", "UCXYYYYR1", "UE2YYYYNA" ], "is_locked": false, "subscribed": false }

@rusq
Owner

rusq commented Aug 31, 2022

Excellent, thank you! Reproduced straight away!

_experiments/slack/bug109$ go run .
2022/08/31 17:26:07 strconv.Atoi: parsing "null": invalid syntax
exit status 1
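
For readers who want to see the failure in isolation, a reproduction along these lines should be enough. This is a guess at the shape of the test program, not the contents of `_experiments/slack/bug109`: `strictTime`, `file` and `decode` are hypothetical, and the assumption is only that the timestamp type decodes its raw bytes with `strconv.Atoi`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// strictTime is a hypothetical timestamp type that, by assumption, parses
// the raw JSON bytes with strconv.Atoi and has no special case for null.
type strictTime int64

// UnmarshalJSON is also invoked for JSON null on a non-pointer field,
// which is what makes Atoi see the literal text "null".
func (t *strictTime) UnmarshalJSON(b []byte) error {
	n, err := strconv.Atoi(string(b))
	if err != nil {
		return err
	}
	*t = strictTime(n)
	return nil
}

// file is a trimmed file object with only the fields needed for the repro.
type file struct {
	ID        string     `json:"id"`
	Timestamp strictTime `json:"timestamp"`
}

// decode unmarshals one file object and returns any decoding error.
func decode(data string) error {
	var f file
	return json.Unmarshal([]byte(data), &f)
}

func main() {
	// A numeric timestamp decodes fine; a null one is expected to fail
	// with an Atoi error like the one in the log above.
	fmt.Println(decode(`{"id":"FSXXX79LN","timestamp":1573757704}`))
	fmt.Println(decode(`{"id":"FSXXX79LN","timestamp":null}`))
}
```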

@rusq
Owner

rusq commented Aug 31, 2022

Created an issue slack-go/slack#1107 and PR slack-go/slack#1106 for the upstream library.

@pawelgnatowski
Author

Btw. The stars and reactions APIs are super straightforward.
Added team and user ids and got what i needed.
Super easy!
Thanks for the tip!

@rusq
Owner

rusq commented Sep 1, 2022

Hey @pawelgnatowski , sorry, I was too focused on the API issue, and the reactions and bookmarks completely slipped my mind. I'll create a separate issue for those, not to lose track.

@pawelgnatowski
Author

No prob, like you said, you do it when you do it.
I used your suggestions and just went to Slack api pages and voila.
Anyway, maybe you know of a good way to browse and search the dump?
Would be awesome to also get full text search, also with docs, ppt etc.
Any suggestions/ideas?

@mootari

mootari commented Sep 1, 2022

> Any suggestions/ideas?

@pawelgnatowski Have a look at this discussion: #127

@rusq
Owner

rusq commented Sep 27, 2022

Merged the upstream slack library.
