Skip to content

Conversation

@farres1
Copy link
Contributor

@farres1 farres1 commented Jul 15, 2025

Motivation and Context

Enables containers to be automatically built and pushed to Artifact Registry, and deployed to Cloud Run on merge.

What has changed

Adds scripts to be run when changes are pushed to a PR, or when changes are merged to a branch

How to test?

  • Run triggers associated with these scripts in a dev project (in Google Cloud)
  • Check the triggers correctly build and push containers to Artifact Registry
  • Check the trigger associated with cloud-build-on-merge-dev correctly deploys to Google Cloud Run

Links

Ticket - https://jira.ons.gov.uk/browse/EQS-81

@farres1 farres1 requested review from a team as code owners July 15, 2025 11:22
@farres1 farres1 marked this pull request as draft July 22, 2025 13:31
@petechd petechd marked this pull request as ready for review July 31, 2025 08:01
Copy link
Contributor

@liamtoozer liamtoozer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, I would maybe prefer handling the release pipelines when we're ready 🤔 Also there are a few differences between this and the Converter Service now. We probably want to keep things consistent but happy to go with the flow

@@ -0,0 +1,12 @@
steps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding, but I'm not certain what this file is to be used for, as it's not used in the PR linked to this? I'm guessing that this file might be for our formal deployments, as I can see that it's using TAG_NAME? If so, might be worth suffixing the file with -formal.yaml or something to make it clear? We don't have tagging or version control set up yet so bit unsure? Manual Cloud Build triggers can be overridden in our infrastructure with the cloud_build_manual_trigger_for_app_enabled var I think?

Also, I'm not sure if this will preserve the same flags that we use for the Cloud Run deployment config in the infrastructure?

Copy link

@martyncolmer martyncolmer Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be for the deployment to prod or anywhere deploy only is required. We are looking to name the scripts after the function they carry out. So maybe cloudbuld-deploy-only.yaml would be an better name.

Copy link
Contributor

@liamtoozer liamtoozer Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is currently no reference to this file in our infrastructure repo, does that mean we will be manually creating a deployment trigger in our formal environments which uses this file, outside of the Terraform altogether? If that's the case, we might want to document this somewhere as part of our run books, release strategy documentation, or maybe as a comment in the file?

@@ -0,0 +1,44 @@
steps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite different to the one we have in the converter service: https://github.com/ONSdigital/eq-cir-converter-service/blob/main/cloud-build-on-merge-dev.yaml, and it seems to be named differently? I think the Converter Cloud Build file might have the advantage of being more concise, not relying on shell script for the steps (so might be less error prone), and matches Cloud Build best practices? We should aim for consistency across repos I think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are based of the authors cloudbuild scripts, which have been about for a while and used to be how it was done. We can update this to replace "docker" with "gcr.io/cloud-builders/docker", using the shell entry point does allow us to add log message, but i'm happy to drop those and use the default entry point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if using SHORT_SHA would work as expected for our staging env. In the Terraform, we're setting the latest tag in our staging environment tfvars to allow for continuous delivery/deployment. If we deploy with this particular file, then we would also need to keep updating our tfvars with the SHORT_SHA var each time we want to make a deployment. Using latest tag for this would prevent this I think?

- "-c"
- |
gcloud container images add-tag \
$_EQ_CIR_PROXY_SERVICE_IMAGE_REPO:$SHORT_SHA \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding, but for continuous delivery for our staging env, I would have thought we'd want to tag with latest?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is tag and deploy. it would be used for release to preprod after a release tag has been set in github

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is currently no reference to this file in our infrastructure repo, does that mean we will be manually creating a deployment trigger in our formal environments which uses this file, outside of the Terraform altogether? If that's the case, we might want to document this somewhere as part of our run books, release strategy documentation, or maybe as a comment in the file?

@@ -0,0 +1,34 @@
steps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to lint and test in Cloud Build (instead of GitHub Actions), I think we might want to also update this for the Converter Service too?

Also, we're still using GitHub Actions for linting and testing as it stands too?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do that for convertor too. We did discuss getting this through and review the best approach in a later ticket. At that discussion we can decide if we want to get rid of github actions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay no worries. Just a quick follow up - if we're running linting and testing both in GitHub Actions and in GCP, then it might be worth a temporary comment just to call this out? Otherwise it may be a bit confusing for others if we're intentionally doing both in the interim?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a couple of thoughts about this since it came up during the session, and the pros and cons of running lint/test in cloud build vs actions. From a developer QoL perspective, is it not easier to investigate failing tests within the Pull Request within GitHub rather than having to context switch to GCP and navigating the cloudbuild logs? Could just be me but would be curious to see the opinions of the other devs.

On top of this as commented below, it opens up some issues such as needing to run docker within docker for things like Megalinter, which I believe can potentially be a security vulnerability, and is certainly not best practice. I think we also need to ensure when running megalinter in cloudbuild that it is making use of the same config as when it is running locally.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE the first point, we've had tests in cloud build for a long time now and noone has had issue with switching into GCP to see the test results. For me having all the CI scripts in one place and format, rather than mix of both is preferable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for the script to be rejigged to remove the SH entry point and to use the default entry point

echo "*************************************************************"
echo "* Running Megalinter on the codebase... *"
echo "*************************************************************"
docker run --platform linux/amd64 --rm \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this build step in Cloud Run is run inside a docker a container already, so this will run a docker container inside another docker container so not sure if this looks quite right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I does look odd, will need to break down what this is doing exactly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be done as a separate piece of work?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for it to be changed to argument based approach with the default entry point, rather than the sh entry point

@petechd petechd force-pushed the EQS-81-proxy-infrastructure branch from df49137 to 8ffc1dd Compare August 4, 2025 09:57
echo "*************************************************************"
echo "* Deploying Docker image to Cloud Run... *"
echo "*************************************************************"
gcloud run deploy cir-proxy-service \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might lose all of the vars we're passing in for our Terraform deployment? E.g. LOG_LEVEL isn't being passed in here

args:
- "-c"
- |
gcloud run deploy cir-proxy-service \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might lose all of the vars we're passing in for our Terraform deployment? E.g. LOG_LEVEL isn't being passed in here

- name: docker
id: push_proxy_service_image
entrypoint: "sh"
waitFor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These steps are all run in sequence, so don't think we'd need waitFor at the moment?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed

@liamtoozer liamtoozer added the do not merge Do not merge this PR label Sep 18, 2025
@liamtoozer
Copy link
Contributor

Picking this PR up. Added do not merge label until dependent infrastructure PR is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Do not merge this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants