Steps to reproduce
This will be tricky to reproduce, so these steps are my best guess based on what we observed when this occurred for us.
Configure a pipeline to commit a "test-file-this-is-the-wrong-name" file to a git repo and push it up to origin/main. Trigger this job every minute. If there's nothing to commit, that's OK; just end the job with success.
Kick off a long-running job in a separate pipeline (ours had been running from Jan 31 through Feb 9, trying to claim a locked pool-resource).
In the meantime, modify the configuration of the first pipeline to commit "test-file-this-is-the-right-name" instead of the wrongly named file.
Wait a few hours, or days, then manually remove the "test-file-this-is-the-wrong-name" file from the repo, and push the change up to origin/main.
Wait for a ghost build to create "test-file-this-is-the-wrong-name" and add it back to the repo.
Kill the long-running job.
Repeat.
Expected results
Step 4 should never occur, and all builds kicked off for the pipeline should appear in the Concourse UI.
Actual results
Step 4 occurs, and the build does not show up in the Concourse UI. Similarly, searching the Concourse database for a record of the build fails.
However, ATC logs show that the build was in fact triggered, even though the build ID used cannot be found anywhere in the database.
Additional context
We have 3 web nodes and 1 database node. The job names differed between the test-file-this-is-the-wrong-name and test-file-this-is-the-right-name configurations, which made it easier to find the logs for the ghost build.
This may have to do with database locking, but I don't know. I would consider this issue too far-fetched to be real if I had not seen the build in our logs and found it missing from our database.
ATC logs from our environment:
34551:{"timestamp":"2024-02-08T17:31:17.952396875Z","level":"info","source":"atc","message":"atc.tracker.notify.run.put-step.finished","data":{"build":"1","build_id":311872714,"exit-status":0,"job":"bump-healthchecker-in-networking-release","job-id":72956,"pipeline":"sandbox","session":"28.74164.3.73","step-name":"cf-networking-repo","team":"wg-arp-networking","version-info":{"version":{"ref":"60252ecaec6b65310415664163ff333b78933891"},"metadata":[{"name":"commit","value":"60252ecaec6b65310415664163ff333b78933891"},{"name":"author","value":"App Platform Runtime Working Group CI Bot"},{"name":"author_date","value":"2024-02-08 17:31:02 +0000"},{"name":"committer","value":"App Platform Runtime Working Group CI Bot"},{"name":"committer_date","value":"2024-02-08 17:31:02 +0000"},{"name":"message","value":"Upgrade silk-healthchecker\n"},{"name":"url","value":"https://github.com/cloudfoundry/cf-networking-release/commit/60252ecaec6b65310415664163ff333b78933891"}]}}}
web.dfabe417-7b47-4301-89f1-2db6ae4ada96.2024-02-09-21-21-39/web/web.stdout.log.4
108792:{"timestamp":"2024-02-08T17:31:31.843101381Z","level":"info","source":"atc","message":"atc.tracker.notify.run.put-step.finished","data":{"build":"1","build_id":311872719,"exit-status":0,"job":"bump-package-golang","job-id":72955,"pipeline":"sandbox","session":"28.74173.3.104","step-name":"cf-networking-repo","team":"wg-arp-networking","version-info":{"version":{"ref":"60252ecaec6b65310415664163ff333b78933891"},"metadata":[{"name":"commit","value":"60252ecaec6b65310415664163ff333b78933891"},{"name":"author","value":"App Platform Runtime Working Group CI Bot"},{"name":"author_date","value":"2024-02-08 17:31:02 +0000"},{"name":"committer","value":"App Platform Runtime Working Group CI Bot"},{"name":"committer_date","value":"2024-02-08 17:31:02 +0000"},{"name":"message","value":"Upgrade silk-healthchecker\n"},{"name":"url","value":"https://github.com/cloudfoundry/cf-networking-release/commit/60252ecaec6b65310415664163ff333b78933891"}]}}}
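For anyone trying to check their own logs for the same symptom, here is a minimal sketch of pulling the build IDs out of grep output like the lines above. It assumes each log line is a JSON object, optionally preceded by a grep-style "NNN:" line-number prefix, and that file-name lines contain no JSON. The function names are hypothetical, and the sample lines are trimmed copies of the two log entries above.

```python
import json


def parse_atc_log_line(line):
    """Parse one ATC log line, tolerating a leading grep-style 'NNN:' prefix.

    Returns the decoded JSON object, or None if the line holds no valid JSON.
    """
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def put_step_builds(lines):
    """Return (build_id, job, pipeline) for each put-step.finished event."""
    found = []
    for line in lines:
        event = parse_atc_log_line(line)
        if not event or not event.get("message", "").endswith("put-step.finished"):
            continue
        data = event.get("data", {})
        found.append((data.get("build_id"), data.get("job"), data.get("pipeline")))
    return found


# Trimmed versions of the two log lines above (most fields removed for brevity).
sample = [
    '34551:{"level":"info","source":"atc","message":"atc.tracker.notify.run.put-step.finished",'
    '"data":{"build":"1","build_id":311872714,"job":"bump-healthchecker-in-networking-release","pipeline":"sandbox"}}',
    "web.dfabe417-7b47-4301-89f1-2db6ae4ada96.2024-02-09-21-21-39/web/web.stdout.log.4",
    '108792:{"level":"info","source":"atc","message":"atc.tracker.notify.run.put-step.finished",'
    '"data":{"build":"1","build_id":311872719,"job":"bump-package-golang","pipeline":"sandbox"}}',
]

for build_id, job, pipeline in put_step_builds(sample):
    print(build_id, job, pipeline)
```

The resulting build IDs are the ones to look up in the builds table.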
Searching our ATC database for the build IDs mentioned in those logs:
atc=> select * from builds where id in ('311872719', '311872714');
id | name | status | scheduled | start_time | end_time | schema | private_plan | completed | job_id | reap_time | team_id | manually_triggered | interceptible | nonce | public_plan | pipeline_id | drained | create_time | aborted | rerun_of | rerun_number | inputs_ready | needs_v6_migration | span_context | resource_id | resource_type_id | created_by
----+------+--------+-----------+------------+----------+--------+--------------+-----------+--------+-----------+---------+--------------------+---------------+-------+-------------+-------------+---------+-------------+---------+----------+--------------+--------------+--------------------+--------------+-------------+------------------+------------
(0 rows)
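The same check can be scripted when there are many build IDs to verify. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the ATC Postgres database, with a cut-down builds table, and the helper name and the inserted row (id 1001) are hypothetical. Against a real Postgres database, a driver such as psycopg2 would use `%s` placeholders instead of `?`.

```python
import sqlite3


def find_ghost_build_ids(conn, logged_ids):
    """Return the logged build IDs that have no row in the builds table."""
    ids = sorted(set(logged_ids))
    if not ids:
        return []
    # One placeholder per ID keeps the query parameterized.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id FROM builds WHERE id IN ({placeholders})", ids
    ).fetchall()
    present = {row[0] for row in rows}
    return [build_id for build_id in ids if build_id not in present]


# Stand-in database: a minimal builds table containing one known build.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO builds (id, name) VALUES (?, ?)", (1001, "1"))

# The two IDs from the ATC logs, plus the one build that does exist.
ghosts = find_ghost_build_ids(conn, [311872719, 311872714, 1001])
print(ghosts)  # [311872714, 311872719]
```

Any ID this returns appears in the logs but has no database record, which is the ghost-build symptom described here.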
Screenshot of commit not being listed as an output of any builds:
Timeline:
Triaging info