Add docs into the repo #756
Conversation
Force-pushed the branch from 5ab8ab6 to 27f5f39 (compare).
Great work so far. 👍 What do you think about having the website on an orphan branch? We should also put the name change of Sam on the todo list (#543 (comment)).
Using a different repo would be even cleaner, but then it is harder to do coordinated changes (e.g. updating the tutorial to a new API), as Git & GitHub do not really support this. Plus “out of sight, out of mind” applies. Since the goal of this effort is to make updating the docs simpler, I think using the same branch is probably the best choice here. Zola is essentially zero-config (it only really requires setting the site URL). Opened #757 for the todos.
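To illustrate the zero-config claim, here is a minimal sketch of such a setup; the URL and the two flags are illustrative, not copied from this PR:

```sh
# Scaffold the site, then write a minimal config.toml.
zola init docs
cat > docs/config.toml <<'EOF'
# The site URL is the only value Zola really needs.
base_url = "https://simplepie.org"
# Everything else can stay disabled for now.
compile_sass = false
build_search_index = false
EOF
```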
Force-pushed the branch from 8094df0 to ef48853 (compare).
@mblaney The last remaining question content-wise is what to do with the demo:
We should also decide on the host we want to use:
Thanks @jtojnar. I wouldn't go for the subdomain option because that involves finding someone who can do DNS changes. It looks like the domain is owned by Automattic; if someone here wants to help make that happen, hopefully they will jump in, otherwise I would say continue with the other options. By process of elimination that means staying with the current host and either removing the demo or possibly using the GitHub Action you've suggested. Happy for you to decide on that.
@mblaney If we stay with the current host, keeping the demo working is not that hard. Do you have FTP or SSH credentials for the host? Depending on that, we will need to choose either https://github.com/marketplace/actions/ftp-deploy or https://github.com/marketplace/actions/web-deploy-anything. And either way, we will need to set up credentials on GitHub: https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository
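For the secrets part, one way to do it from the command line is with the `gh` CLI; a sketch, where the secret names are placeholders that would have to match whatever the chosen deploy action expects:

```sh
# Store the FTP credentials as encrypted repository secrets.
# Secret names are placeholders; use the names the deploy action expects.
gh secret set FTP_SERVER --repo simplepie/simplepie --body 'ftp.example.org'
gh secret set FTP_USERNAME --repo simplepie/simplepie --body 'deploy'
gh secret set FTP_PASSWORD --repo simplepie/simplepie  # prompts for the value
```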
I have FTP credentials but I'm not a project owner, so not sure how far I will get. Let me know when you're ready and I can try adding the credentials.
Force-pushed the branch from 9559027 to 3ed0fc1 (compare).
I have tested this on my repo and it seems to work well, including the demo: http://simplepie.ogion.cz/. So it should be ready now. @mblaney Now you should:
Nice one @jtojnar, your demo site looks great! I get a 404 for that Actions URL though?
@mblaney This is what I see in my fork: (screenshot) Or maybe we need someone with member status on the repo?
@jtojnar I noticed some styling errors on the API Docs page. Take a look at the left sidebar: http://simplepie.ogion.cz/api/ This is how it looks atm: http://simplepie.org/api/
Force-pushed the branch from 2ba363f to 5feb86f (compare).
@Art4 Tweaked the style, should be fixed now.
Thank you @jtojnar. I also noted some other things:
Force-pushed the branch from bb9a312 to 9818cd8 (compare).
One concern would be increased repo size:
Methodology
Then I ran … For the compressed sizes, I ran … This is not that drastic, but it would be a permanent cost going forward, so perhaps we should store the backup somewhere else. Also I noticed just …
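The exact commands were lost above; for anyone repeating the measurement, a generic sketch using standard Git plumbing (not necessarily what was used here):

```sh
# Working-tree size of the docs, and repository object size.
du -sh docs/
git count-objects -vH             # loose + packed object sizes
git gc --aggressive --prune=now   # repack so the numbers reflect compression
git count-objects -vH
```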
@mblaney Actually, it looks like WordPress supports export without the need to access the database. Are you able to log into the administration and get the export from http://simplepie.org/blog/wp-admin/export.php? And are you able to download the contents of the FTP server and upload it as an archive here?
Working on this now. Sorry for the delay.
Backing up the …
@skyzyx You can also try uploading the following PHP script and running it to create an archive on the server. It should be much faster than downloading individual files:

```php
<?php

set_time_limit(0);
error_reporting(E_ALL);

$archive = __DIR__ . '/simplepie_website_backup.zip';
$zip = new ZipArchive();
if ($zip->open($archive, ZipArchive::CREATE) !== true) {
    exit('Could not create the zip archive.');
}
echo 'Creating zip archive<br>';
$directory = new \RecursiveDirectoryIterator(
    // Or change the directory path.
    __DIR__,
    FilesystemIterator::KEY_AS_PATHNAME | FilesystemIterator::CURRENT_AS_FILEINFO | FilesystemIterator::SKIP_DOTS
);
$iterator = new \RecursiveIteratorIterator($directory);
foreach ($iterator as $info) {
    // Skip the archive itself so it is not added to the zip it is writing.
    if ($info->getPathname() === $archive) {
        continue;
    }
    echo 'Adding ' . $info->getPathname() . '<br>';
    // Store entries under paths relative to this directory.
    $zip->addFile($info->getPathname(), substr($info->getPathname(), strlen(__DIR__) + 1));
}
$zip->close();
echo 'Finished<br>';
```
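Assuming the script was uploaded to the web root, the archive ends up next to it and can be pulled over plain HTTP; both files should then be deleted over FTP so they do not stay publicly reachable:

```sh
# Download the archive created by the script above (path assumed from
# the script; adjust if it was uploaded elsewhere).
curl -LO http://simplepie.org/simplepie_website_backup.zip
```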
It took some time to finish the download, but it finally completed at 7.4 GiB. I removed the cache files, then tarred and gzipped the directory.
@mblaney I wanted to use the files from the backup as a base for generating the content, since the scraping is not perfect. But since I do not have FTP access, I cannot get at the backup. Could you please re-upload it somewhere publicly available? Also, do you have access to the WordPress administration? The export would be helpful for a similar reason. Otherwise, if the WordPress installation is too broken, could you try getting a database dump, e.g. by uploading a tool like Adminer and exporting the database using the credentials from the wp-config.php file?
hi @jtojnar the backup is just the WordPress install, so there's nothing usable like that, is there? It appears to be too broken to log in, and I don't want to re-upload it because it contains login credentials (even though I can't use them). I can try the database dump if you like, but I'm not sure that will provide anything better than scraping?
@mblaney IIRC the wiki system stores the content in the data directory, so that is the main thing I am after. The issue with scraping is that it is incomplete – there are some pages missing or returning error 500. I managed to get some of them out of the Internet Archive, but a DB dump would be preferred, since we can never be certain the scrape is complete.
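If shell access turns out to be available, a dump can be taken directly instead of uploading Adminer; a sketch, where host, user and database name are placeholders to be read off the `DB_*` constants in wp-config.php:

```sh
# Dump the WordPress database and compress it (placeholder credentials;
# --password with no value prompts interactively).
mysqldump --host=localhost --user=wp_user --password wp_database > simplepie_blog_dump.sql
gzip -9 simplepie_blog_dump.sql
```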
This was changed in simplepie#745 but without any rationale. The only HTML file is in tests and that should not be manually edited at all.
Ran the following within `nix-shell -I 'nixpkgs=channel:nixos-unstable' -p zola` to create the website tree:

```sh
zola init docs
```

Filled in the website URL and disabled everything for now. Then created templates based on the successive commits.
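For orientation, this is the standard tree `zola init` scaffolds (Zola's default layout, not specific to this PR):

```sh
# docs/
# ├── config.toml   # site configuration
# ├── content/      # pages and sections
# ├── sass/         # optional Sass sources
# ├── static/       # copied verbatim into the output
# ├── templates/    # Tera templates
# └── themes/       # optional themes
```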
There are only two markdown files and both use 2 spaces.
Ran the following within `nix-shell -I 'nixpkgs=channel:nixos-unstable' -p wget2 yq-go dos2unix 'python3.withPackages (ps: with ps; [ beautifulsoup4 pypandoc ])' nodePackages.prettier`:

```sh
# Download the website contents from the web, and the pages that fail
# with error 500 from Internet Archive.
wget2 --user-agent 'Mozilla/5.0 (X11; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0' \
    --mirror --force-directories --no-robots --retry-on-http-error=403 \
    --http2-request-window=1 --random-wait \
    --exclude-directories=/wiki/lib/exe/ http://simplepie.org
rm simplepie.org/blog/2006/03/06/forums-powered-by-punbb/index.html
wget2 https://web.archive.org/web/20190404091911/simplepie.org/blog/2006/03/06/forums-powered-by-punbb/ --directory-prefix=simplepie.org/blog/2006/03/06/forums-powered-by-punbb
rm simplepie.org/blog/2012/10/30/simplepie-1-3-1-is-now-available/index.html
sed -i 's/\xbb//' docs/content/blog/2006-03-06-forums-powered-by-punbb.html # fix encoding
wget2 https://web.archive.org/web/20210812123158/https://simplepie.org/blog/2012/10/30/simplepie-1-3-1-is-now-available/ --directory-prefix=simplepie.org/blog/2012/10/30/simplepie-1-3-1-is-now-available

# Copy the downloaded contents into the website tree.
cp -r simplepie.org/* docs/content
mkdir -p docs/static
mv docs/content/{scripts,favicon.ico,images,css,robots.txt} docs/static

# Standardize line endings.
dos2unix docs/**

# Drop API docs, we will generate them later.
rm -r docs/content/api
# Drop mint (analytics), it is abandoned.
rm -r docs/content/mint
# Drop dynamically generated demo pages.
rm -r docs/content/demo/newsblocks
# Drop downloads – we will just link GitHub.
rm docs/content/downloads/*\?* docs/content/downloads/*.zip
# Drop ancient scripts, no more font replacement using Flash,
# or tricks to make PNGs transparent in IE (Sleight).
rm -r docs/static/css/sIFR-* docs/static/scripts/

# Download headers explicitly since they are currently rotated by PHP
# and wget was not able to find them.
rm docs/static/images/headers/rotate-old.php
wget http://simplepie.org/images/headers/rotate-xspf.xml --directory-prefix docs/static/images/headers/
cat docs/static/images/headers/rotate-xspf.xml | yq -p=xml '"http://simplepie.org" + .playlist.trackList.track[].location' | xargs wget --directory-prefix docs/static/images/headers/

# Drop WordPress plug-in clutter.
rm -r docs/content/blog/wp-{content,includes,json}
# Drop feeds.
rm docs/content/blog/**/feed/index.html
rmdir docs/content/blog/**/feed

# Drop wiki noise.
rm docs/content/wiki/lib/exe/css.php?*
mv docs/content/wiki/lib/tpl/simplepie/wikistyles.css docs/static/css/
mv docs/content/wiki/lib/images/smileys/icon_exclaim.gif docs/static/images/
rm -r docs/content/wiki/lib
rm docs/content/wiki/feed.php*
rm -r docs/content/wiki/{_detail,_export}
find docs/content/wiki/ -name '*\?idx=*' -exec rm '{}' \;
find docs/content/wiki/ -name '*\?do=*' -exec rm '{}' \;
rm docs/content/wiki/_media/wiki/dokuwiki-128.png docs/content/wiki/wiki/dokuwiki
rmdir docs/content/wiki/wiki
mv 'docs/content/wiki/_media/tutorial/update_simplepie_cache.jpg?cache=' 'docs/content/wiki/_media/tutorial/update_simplepie_cache.jpg'
rm docs/content/wiki/_media/tutorial/update_simplepie_cache.jpg\?*

# Add extension to wiki pages.
find docs/content/wiki -type f ! -name '*.jpg' ! -name '*.html' -print0 | xargs -0 -I '{}' mv '{}' '{}.html'

# Remove duplicate wiki page.
rm docs/content/wiki/faq/Supported_Character_Encodings.html
echo /wiki/faq/Supported_Character_Encodings /wiki/faq/supported_character_encodings >> docs/static/_redirects
rm docs/content/wiki/plugins/wordpress/simplepie_plugin_for_wordpress.1.html

# Rename start files (used as directory index in DokuWiki) to _index.html used by Zola.
find docs/content/wiki/ -name start.html | sed -E 's#(docs/content/(.*))/start.html#mv "\0" "\1/_index.html"; echo "/\2/start /\2/" >> docs/static/_redirects#g' | sh -

# Simplify blog structure.
rm -r docs/content/blog/page docs/content/blog/index.html
find docs/content/blog/2* -name index.html | sed -E 's#docs/content/blog/(....)/(..)/(..)/(.*)/index.html#mv "\0" "docs/content/blog/\1-\2-\3-\4.html"#g' | sh -
rmdir docs/content/blog/*/*/*/*
rmdir docs/content/blog/*/*/*
rmdir docs/content/blog/*/*
rmdir docs/content/blog/????
ls docs/content/blog/*.html | sed -E 's#docs/content/blog/(....)-(..)-(..)-(.+)\.html#/blog/\1/\2/\3/\4/ /blog/\4/#g' >> docs/static/_redirects

# Prepare redirects for Apache.
sed -i 's/^/Redirect 302 /' docs/static/_redirects
mv docs/static/{_redirects,.htaccess}

# Manually extracted main template into templates/.
```
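Not part of the migration script itself, but a quick way to sanity-check the result after all the moves and renames (assuming a Zola version that ships the `check` subcommand):

```sh
# Build the site without rendering and verify that links still resolve.
cd docs
zola check
```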
Produced by markdownify.py
They are big and many are outdated.
So that Zola does not complain about being broken once we remove the wiki.
I downloaded the HTML from the live site and created a script (`markdownify.py`) that converts it to Markdown. This PR attempts to make everything work; we will remove/update the outdated content in a follow-up PR.

To regenerate the Markdown files, run `python3 markdownify.py` in the SimplePie branch after installing `python3-beautifulsoup4`, `python3-pypandoc` and `prettier`. Or, if you have Nix, you can just run `nix-shell -I 'nixpkgs=channel:nixos-unstable' -p 'python3.withPackages (ps: with ps; [ beautifulsoup4 pypandoc ])' nodePackages.prettier --run 'python3 markdownify.py'`.

Currently I am using Zola as the generator, as I am the most familiar with it. A different SSG can be used if preferred. To preview, run `zola serve` in the `docs` subdirectory.

TODO:
- Move the `h1` out of individual pages into the template.
- … the `docs/` directory.

Fixes: #543