Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tugboat Next.js server can be restarted via Composer command #17209

Open
7 of 9 tasks
Tracked by #17437 ...
timcosgrove opened this issue Feb 12, 2024 · 22 comments · May be fixed by #17400
Open
7 of 9 tasks
Tracked by #17437 ...

Tugboat Next.js server can be restarted via Composer command #17209

timcosgrove opened this issue Feb 12, 2024 · 22 comments · May be fixed by #17400
Assignees
Labels
Accelerated Publishing Blocked Issues that are blocked on factors other than blocking issues. Needs refining Issue status

Comments

@timcosgrove
Copy link
Contributor

timcosgrove commented Feb 12, 2024

Requirements

We want a Composer command that restarts a running Next.js server on Tugboat, so that we switch the code checkout used for the Next.js for QA purposes.

Acceptance criteria

Background & implementation details

This follows on from #16853. That ticket set up the infrastructure for choosing a branch for next-build and vets-website. The goal of this ticket is to make use of those branches and to restart the server.

In order to test preview functionality, the feature flag that currently manages CMS preview being switched between Content Build and Next Build will need to be enabled.

This was referenced Feb 14, 2024
@timcosgrove timcosgrove added the Blocked Issues that are blocked on factors other than blocking issues. label Feb 26, 2024
@alexfinnarn
Copy link
Contributor

alexfinnarn commented Feb 28, 2024

This work will be done based on the #17285 PR still in review to help with unblocking. Otherwise, we have to wait until that other PR is merged. We could keep updating the referenced PR but decided to branch off it instead.

@alexfinnarn alexfinnarn linked a pull request Mar 4, 2024 that will close this issue
19 tasks
@alexfinnarn
Copy link
Contributor

I spent some time looking over the Tugboat configuration and Tugboat documentation...probably more than I needed, but the build stages are a bit confusing and docs like https://docs.tugboatqa.com/reference/tugboat-configuration/ help to sort out the intricacies between builds with base previews, those without, and non-build related stages.

One thought I had while looking at the Tugboat config file relates to pre-built images. I know Tugboat maintains a Cypress related image at tugboatqa/cypress-included but in this repo's setup, all the Cypress related dependencies are gathered each build. Furthermore, I would think the VA could make an image on top of tugboatqa/php:8.1-apache-bookworm with the additional packages to save time in future builds.

One bad thing about building each time is that the versions of all these extra dependencies aren't versioned like we have for composer and npm dependencies. Seeing as how the Tugboat environments have been having reliability issues recently, using pre-built images and removing a lot of the package updates in the init stage would make sense to me. Then, only update the images when you need to update the base, e.g. when updating PHP versions and/or Cypress versions.

Another thing I noticed is a section where variables are echoed into a config file which makes checking out git branches harder. I wonder if we can simply set the env variables at that step?

# Old...
echo "NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}" >> ${TUGBOAT_ROOT}/next/envs/.env.tugboat

# New...
NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}

I think that would get merged into process.env without having to edit the envs file and therefore no need to stash/pop when trying to restart the next server.

Finally, getting to the actual point of this issue, I think the server can be killed and restarted in scripts/next-start.sh by first checking for a running process, killing any, and then starting the server again. Install and build commands don't interfere with the server commands and as long as the dependencies have been updated and the build without SSG enabled completes, the server should still respond to previews.

# In scripts/next-start.sh ...
cd ${ROOT}/next

# Kill any current running server.
# We can look for "/scripts/yarn/start.js" since that is what "yarn start" runs.
NEXT_SERVER_PIDS=$(ps aux | grep '[.]/scripts/yarn/start.js' | awk '{print $2}')

# In case we have multiple processes, loop through them.
for pid in ${NEXT_SERVER_PIDS}; do
    kill $pid
done

# Start the dev server. Vets-website assets need to be in place prior to this build.
yarn start

I might be missing that there are two builds, one for preview and one for checking to see the whole site builds, but I think previews are the main thing that needs to work in next-build, and this method will build then start that server.

@alexfinnarn
Copy link
Contributor

alexfinnarn commented Mar 5, 2024

Now, I'm getting this error on Tugboat which comes from bash -lc 'composer va:next:start' in the Tugboat config.yml file:

> # Prepare the next build server
> ! ./scripts/should-run-directly.sh || ./scripts/next-build.sh
/var/lib/tugboat/next/packages/env-loader/dist/cms-feature-flags.js:5
const removeTrailingSlash = (s) => s.endsWith('/') ? s.substring(0, s.length - 1) : s;

TypeError: Cannot read properties of undefined (reading 'endsWith')
    at removeTrailingSlash (/var/lib/tugboat/next/packages/env-loader/dist/cms-feature-flags.js:5:38)
    at getCmsFeatureFlags (/var/lib/tugboat/next/packages/env-loader/dist/cms-feature-flags.js:8:31)
    at processEnv (/var/lib/tugboat/next/packages/env-loader/dist/index.js:23:76)
    at Object.<anonymous> (/var/lib/tugboat/next/scripts/yarn/build.js:3:1)
    at Module._compile (node:internal/modules/cjs/loader:1191:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1245:10)
    at Module.load (node:internal/modules/cjs/loader:1069:32)
    at Function.Module._load (node:internal/modules/cjs/loader:904:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:22:47
Script ! ./scripts/should-run-directly.sh || ./scripts/next-build.sh handling the va:next:build event returned with error code 1

I did get this locally since I didn't copy the env file and properly set up next-build after cloning the repo. On Tugboat this should be fine with the env vars being written into the file also in config.yml but maybe the git stash/pop isn't working...so I'm rebuilding the PR attached to #17285 to see if that is successful.

One idea would be to turn the "echo variable into file" commands into a script and then call that after checking out the new code, if the problem is env vars not being picked up.

@alexfinnarn
Copy link
Contributor

alexfinnarn commented Mar 5, 2024

Also weird on the Tugboat previews...I added one commit to this issues draft PR that is based on another PR, but the size of the previews is way different.

Newer preview:
Screenshot 2024-03-05 at 1 38 46 PM

Older preview:
Screenshot 2024-03-05 at 1 38 40 PM

Since we are now cloning the repos multiple times and building...is this what's causing the Tugboat issues with space? Maybe we can't clone and build frontends both for next-build and content-build?

Also possible that we could look for files to delete in one of the stages before running commands...maybe some files are being cached that aren't needed?

@tjheffner
Copy link
Contributor

so the tugboat preview size is not necessarily tied to cloning the repos, it's tied to using the base preview or not. the newer PR env is using the base preview, so it's relatively small. the older one was built without a base preview (to test changes to init:), so it has the full size of everything. here's the size of the current CMS PR main base preview:
Screenshot 2024-03-05 at 10 52 13 AM

long way of saying, yes it is contributing to tugboat issues with space, but unfortunately there's no way around it until we are able to merge that PR (which depends on the BRD PR, which depends on them bumping the AMI) 😵

@alexfinnarn
Copy link
Contributor

alexfinnarn commented Mar 5, 2024

Okay, that makes sense. But then this Tugboat should be deleted, right? https://tugboat.vfs.va.gov/65d690d5067b1809aade9992 It is a test of the same PR without a base preview that then was abandoned after confirming an issue...I haven't exactly been going through all the Tugboats and making sure I don't have old ones around, but this is the first time I've seen why/how this could be an issue.

And then it makes me wonder if dependabot PRs should be sequenced so they only ever have so many open. Like maybe https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/.github/dependabot.yml#L9 should be 5 instead of 10?

On a quick count, I think about half the Tugboat PRs are dependabot...and IMHO, you ought to merge one in and then wait for the whole build and tests to run before merging in others. So, reducing that limit would reduce the desire to merge many in at once and potentially cause reliability issues.

@tjheffner
Copy link
Contributor

yeah, that first one could be deleted. the dependabot ones... usually they get moved through quicker, but I think with the other infra issues they've been lagging a bit. hopefully they get moved through this week, it's a CMS team task.

re: echoing env vars,

Another thing I noticed is a section where variables are echoed into a config file which makes checking out git branches harder. I wonder if we can simply set the env variables at that step?

Old...

echo "NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}" >> ${TUGBOAT_ROOT}/next/envs/.env.tugboat

New...

NEXT_PUBLIC_DRUPAL_BASE_URL=https://cms-${TUGBOAT_SERVICE_TOKEN}.${TUGBOAT_SERVICE_CONFIG_DOMAIN}

exporting directly to the ENV instead of to the .env file may work, but we have to test it. the next command is going to load envs from the .env.tugboat file based on the APP_ENV var being present (it's always present on tugboat). we do have a mechanism to pass vars as part of the commands, but that may break the script on different non-tugboat envs. so some fiddling is likely.

@alexfinnarn
Copy link
Contributor

oh yeah, and I think I have to build this with no preview or off of the other PR's preview...which I didn't specify on the Tugboat attached to this issue...so that is rebuilding now.

I will try debugging as much as I can locally and seeing why the git stash/pop might be causing issues or if that part is fine and it is another issue with env vars.

@alexfinnarn
Copy link
Contributor

Maybe I'll make some suggestions for tickets of Tugboat things to look into, but I wondered how the hell the previews are so big and then I saw the assets pulled in from a backup.

2024-03-05T20:11:12.636Z - 65e7718f067b1809aa330f30# /bin/sh -c curl --remote-name https://dsva-vagov-prod-cms-backup-sanitized.s3-us-gov-west-1.amazonaws.com/files/cms-prod-files-latest.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 24 63.8G   24 15.3G    0     0  75.4M      0  0:14:25  0:03:28  0:10:57 76.0M

63.8G is about two-thirds of the total size I saw for a preview without a base, and in Tugboat's own documentation they use stage_file_proxy to pull in needed files on demand in their config examples. I bet there is some networking restriction where Tugboat can't reach the production assets and therefore they all need to be backed up...or something like that; however, I've always seen junk in Drupal file directories. Maybe this files directory is clean (and I will look locally once my download finishes), but it will be the first time I've seen a clean files directory.

Another interesting benefit of using stage_file_proxy is that you could analyze the Tugboat files directory against production to see how much of a difference there is. Building on Tugboat or locally should show the currently used files whereas production could have many years of files that are no longer used but still backed up.

@alexfinnarn
Copy link
Contributor

Well, looky here: https://nextjs.org/docs/pages/building-your-application/configuring/environment-variables#referencing-other-variables So at least in page router, I think this works now?

Nope...it's only for variables within the actual .env file you are using or a higher-level .env file. I thought maybe it would be possible to write to a .env.fake file that isn't version controlled and then read from there, but that probably won't work. Oh well, it would be nice to simply use interpolation in the .env.tugboat and have dotenv support that by default but no use looking into that option any more.

I think https://github.com/motdotla/dotenv-expand/tree/master might work but I'm not going to try and modify next-build to make that happen. It should work to stash/pop...or maybe git reset, checkout, re-echo env vars, and then start the server.

@alexfinnarn
Copy link
Contributor

alexfinnarn commented Mar 6, 2024

I see now in Slack that Tugboat will likely be upgraded one minor version so that might play into this issue...but I will gather some questions for discussion purposes that I have from setting up next-build locally and how that might play into updating code and restarting the server.

  1. Should manual steps listed in next-build readme go into the next-install.sh script? I know the env vars are updated on Tugboat but not sure about vets-website part. For local setup, we'd only need to run ddev composer va:next:install and not any manual steps.
  2. If vets-web-setup.sh is re-run in next-build-frontend.sh do the symlinks mentioned here need to be recreated? https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/.tugboat/config.yml#L207
  3. I get an error prerendering page "/_playground/api-explorer" due to the oauth token so locally I delete that route. How do I get the proper keys? The readme says they are on AWS...
  4. I should make a fake PR in next-build so that I can check that out and make sure the server restarts with a noticeable change between the main branch and the PR branch. What is the best process to do that?
  5. How does the build:preview and start interact? I know restarting the server is needed and I think rebuilding as well, but not sure if there's more to preview vs. other server needs like building everything out.

@alexfinnarn
Copy link
Contributor

Answers from chat with @tjheffner...

  1. Might be a good idea but no need to think about now. Possible candidate for making team transition easier.
  2. Symlinks should persist, but recreate them if they don't exist.
  3. Is an actual bug but a fix in content release workflow. will open up a smaller PR. Can get by with secrets in .env.local and keys from Tugboat.
  4. Try changing the layout template and adding dummy CSS. Keep as draft PR and add do not merge label.
  5. Two build commands. Preview builds API/preview routes and server. we are only concerned with previews, not the whole build.

@alexfinnarn
Copy link
Contributor

I thought this error was related to the symlinking so I added those commands before the vets-web-setup.sh call in next-build-frontend.sh but it still errored at the same part.

error Command failed with exit code 137.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
cp: cannot stat './script/../build/localhost/generated/vendor.entry.js': No such file or directory
error Command failed with exit code 1.

I will look into this tomorrow.

@alexfinnarn
Copy link
Contributor

This actually fails locally for me even on the main branch even with just running scripts/vets-web-setup.sh. I get the same error as above. I thought maybe this was an issue with generated assets colliding when trying to rebuild, but I would expect the first build to succeed.

Now I will read more on the vets-website codebase and why this code is needed and fails.

# From "yarn build" in vets-website...

# Only run Webpack if the assetSource = local
if [ "${assetSource}" = "local" ]; then
    echo "Building application assets"
    yarn build:webpack $webpackArgs
   # This step fails...
    cp -v "${buildDir}generated/vendor.entry.js" "${buildDir}generated/shared-modules.entry.js"
else
    echo "Will fetch application assets from the content build script"
fi

@alexfinnarn
Copy link
Contributor

Even if I comment out the copy step the script fails. The webpack build runs NODE_OPTIONS=--max-old-space-size=8192 webpack --config config/webpack.config.js and I see that a SIGINT 137 is when the system kills the script potentially for memory issues. Since the -max-old-space-size=8192 is set, it makes me think they added that due to webpack consuming too much memory. I bet the build fails due to memory issues and that's why the directory doesn't exist.

So, I will play around with changing that setting and looking at building this not in the DDEV container.

@alexfinnarn
Copy link
Contributor

Success on the vets-website build in DDEV! I had to increase my memory allocation for Docker to 10GB but it worked after that. For some reason, I had 7.9GB before so maybe 8GB would have worked.

I will make a note in the AC to update any docs on Docker resources for setting up the va.gov-cms project since I had to alter mine. Not sure if 8GB is default for Docker Desktop or not.

@alexfinnarn
Copy link
Contributor

alexfinnarn commented Mar 7, 2024

Seems like things are working locally. I have the whole shebang running from "/admin/content/deploy/next", and the next server appears to be running with the script exiting successfully.

# From the next-build.txt log file.
> ! ./scripts/should-run-directly.sh || ./scripts/next-start.sh
Started next server with PID: 51391
> ./scripts/should-run-directly.sh || ddev composer va:next:start --
==> Done

# Looking for server start command.
alexfinnarn@va-gov-cms-web:/var/www/html$ ps -ax | grep "[.]/scripts/yarn/start.js"
51841 pts/0    Sl     0:01 /run/rosetta/rosetta /usr/local/bin/node /usr/local/bin/node --no-opt -r /proc/.reset ./scripts/yarn/start.js

Even though the pids don't match, when I killed 51841 the other process of 51391 was killed.

The thing I'm not sure how to test is exposing the port from within the PHP web container since it's not started locally on my machine. I see a vhost for storybook and maybe next-build should have one too? However, I think things are working well enough that I will start a build on Tugboat and see what happens.

@alexfinnarn
Copy link
Contributor

After spending a while debugging Tugboat builds, I figured out an error in the Tugboat shell. Apparently, even though APP_ENV is set earlier in the next build/start scripts, the yarn command doesn't pick up that env var so it is undefined when needed. Switching to APP_ENV=${APP_ENV} yarn in those commands does get the env var picked up. I think I removed that from the code thinking it would be picked up and the code would be cleaner, but after putting it back the build/start commands picked up the right env vars.

I also found that skipping the files and tests can save at least 30 minutes, and that made me wish we had commit flags for CI so we could turn things off when we want to debug faster rather than having to comment out lines in a config file. I've used commit flags to target only certain tests being run and it was helpful.

@alexfinnarn
Copy link
Contributor

hmm...I now have the PR working on Tugboat but I had to manually check if the next server was running and restart it. At first, I wasn't seeing a red footer like the PR I uploaded, but after manually restarting I did see what I wanted to see.

I think the issue might be with the code looking to kill the server and since that code is run in the background and via delegation scripts I'm not sure what the right processes are at this point. I saw next-route-worker that I had to kill to actually stop the next server entirely.

So, the current code might be building the next site version, updating git info, trying to kill the wrong PIDs, and then the changes don't look like they've taken effect since the server isn't actually restarted. I didn't see anything in the logs as far as errors go. I'll try rebuilding and cataloging the PIDs/processes that are up when Tugboat starts and then as the server is restarted. I was hoping to grep for a certain string but not sure that will work.

@alexfinnarn
Copy link
Contributor

After a few more Tugboat build errors...sometimes it says it's done half-way through running test, which is a lie...I was able to have next running by default and then switch to a branch with a red footer. However, upon trying to reset the environment for another QA tester, the footer is still red and the header looks broken. I didn't see any errors in the logs.

I'm not sure if this matters since someone would probably only switch branches once per environment...and once we're past the base preview issue, they should be able to refresh the preview to try another branch.

I will look into this more, but the code seems really close to doing what is expected.

@alexfinnarn
Copy link
Contributor

Actually, it works on Tugboat if I choose a branch and then another branch to build. Looking at the code, I don't think the current content-build process takes into account someone resetting things to the main branch as there is no checkout that happens in that case...so the code could be improved in that regard but if content-build doesn't have it, then no need to have next-build have it. Plus, I'm not sure how often, if ever, people use this feature. If people use it often maybe there is a case to keep improving but for now I think it is okay to have a dev reset the git state via Tugboat shell instead of add code if people want to switch back to the main, default git branch.

@timcosgrove
Copy link
Contributor Author

still blocked by needing updated AMI for BRD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accelerated Publishing Blocked Issues that are blocked on factors other than blocking issues. Needs refining Issue status
Projects
None yet
3 participants