Add the ability to automate and schedule backups #553

Open
wants to merge 41 commits into main

Conversation

frouioui
Member

@frouioui frouioui commented May 13, 2024

Description

This Pull Request adds a new CRD called VitessBackupSchedule. Its main goal is to automate and schedule backups of Vitess, taking backups of the Vitess cluster at regular intervals based on a given cron schedule and strategy. Like most other components of the vitess-operator, this new CRD is managed by the VitessCluster: the VitessCluster controller is responsible for the whole lifecycle (creation, update, deletion) of the VitessBackupSchedule objects in the cluster. Inside the VitessCluster it is possible to define several VitessBackupSchedules as a list, allowing for multiple concurrent backup schedules.

Among other things, the VitessBackupSchedule object is responsible for creating Kubernetes Jobs at the desired time, based on the user-defined schedule. It also keeps track of older jobs and deletes them when they are too old, according to the user-defined parameters (successfulJobsHistoryLimit & failedJobsHistoryLimit). The jobs created by the VitessBackupSchedule object use the vtctld Docker image and execute a shell command that is generated based on the user-defined strategies. The end user can define as many backup strategies per schedule as they want; each strategy mirrors what vtctldclient is able to do. The Backup and BackupShard commands are available, and a map of extra flags lets the user pass as many flags as they want to vtctldclient.

A new end-to-end test is added to our BuildKite pipeline as part of this Pull Request to test the proper behavior of this new CRD.

Related PRs

Demonstration

For this demonstration I have set up a Vitess cluster by following the steps in the getting started guide, up until the very last step where we must apply the 306_down_shard_0.yaml file. My cluster is then composed of two keyspaces: customer with two shards, and commerce, which is unsharded. I then modify the 306... yaml file to contain the new backup schedule, as seen in the snippet right below. We want to create two schedules, one for each keyspace. The keyspace customer will have two backup strategies: one for each shard.

apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: example
spec:
  backup:
    engine: xtrabackup
    locations:
      - volume:
          hostPath:
            path: /backup
            type: Directory
    schedules:
      - name: "every-minute-customer"
        schedule: "* * * * *"
        resources:
          requests:
            cpu: 100m
            memory: 1024Mi
          limits:
            memory: 1024Mi
        successfulJobsHistoryLimit: 2
        failedJobsHistoryLimit: 3
        strategies:
          - name: BackupShard
            keyspaceShard: "customer/-80"
          - name: BackupShard
            keyspaceShard: "customer/80-"
      - name: "every-minute-commerce"
        schedule: "* * * * *"
        resources:
          requests:
            cpu: 100m
            memory: 1024Mi
          limits:
            memory: 1024Mi
        successfulJobsHistoryLimit: 2
        failedJobsHistoryLimit: 3
        strategies:
          - name: BackupShard
            keyspaceShard: "commerce/-"
  images:

Once the cluster is stable and all tablets are serving and ready, I re-apply my YAML file with the backup configuration:

$ kubectl apply -f test/endtoend/operator/306_down_shard_0.yaml 
vitesscluster.planetscale.com/example configured

Immediately, I can check that the new VitessBackupSchedule objects have been created.

$ kubectl get VitessBackupSchedule 
NAME                                          AGE
example-vbsc-every-minute-commerce-ac6ff735   7s
example-vbsc-every-minute-customer-8aaaa771   7s

Now I want to check the pods where the jobs created by VitessBackupSchedule are running. After about two minutes, we can see four pods, two for each schedule. The pods are marked as Completed once they have finished their job.

$ kubectl get pods
NAME                                                           READY   STATUS             RESTARTS        AGE
...
example-vbsc-every-minute-commerce-ac6ff735-1715897700-nkfzx   0/1     Completed          0              79s
example-vbsc-every-minute-commerce-ac6ff735-1715897760-qr4hp   0/1     Completed          0              19s
example-vbsc-every-minute-customer-8aaaa771-1715897700-rbsmd   0/1     Completed          0              79s
example-vbsc-every-minute-customer-8aaaa771-1715897760-kzn8t   0/1     Completed          0              19s
...

Now let's check our backups:

$ ls -l vtdataroot/backup/example/commerce/- vtdataroot/backup/example/customer/80- vtdataroot/backup/example/customer/-80 

vtdataroot/backup/example/commerce/-:
total 0
drwxr-xr-x  11 florentpoinsard  staff  352 May 16 16:15 2024-05-16.221502.zone1-0790125915
drwxr-xr-x  11 florentpoinsard  staff  352 May 16 16:16 2024-05-16.221602.zone1-0790125915

vtdataroot/backup/example/customer/-80:
total 0
drwxr-xr-x  11 florentpoinsard  staff  352 May 16 16:15 2024-05-16.221502.zone1-2289928654
drwxr-xr-x  11 florentpoinsard  staff  352 May 16 16:16 2024-05-16.221601.zone1-2289928654

vtdataroot/backup/example/customer/80-:
total 0
drwxr-xr-x  11 florentpoinsard  staff  352 May 16 16:15 2024-05-16.221511.zone1-4277914223
drwxr-xr-x  10 florentpoinsard  staff  320 May 16 16:16 2024-05-16.221609.zone1-2298643297

$ kubectl get vtb --no-headers
example-commerce-x-x-20240516-221502-2f185d5b-1854be28    2m7s
example-commerce-x-x-20240516-221602-2f185d5b-0a248174    67s
example-customer-80-x-20240516-221511-fefbca6f-8ede9c7d   2m7s
example-customer-80-x-20240516-221609-89028361-d9d1c1e4   67s
example-customer-x-80-20240516-221502-887d89ce-2fc618f4   2m7s
example-customer-x-80-20240516-221601-887d89ce-5b5b0acb   66s

frouioui added 20 commits May 8, 2024 10:30
@frouioui frouioui changed the title Add VitessBackupSchedule VitessBackupSchedule add the ability to automate backups May 16, 2024
@frouioui frouioui changed the title VitessBackupSchedule add the ability to automate backups Add the ability to automate and schedule backups May 16, 2024
@frouioui frouioui marked this pull request as ready for review May 17, 2024 17:01
@frouioui frouioui requested a review from mattlord May 17, 2024 17:03
Collaborator

@GuptaManan100 GuptaManan100 left a comment

Rest LGTM

Comment on lines -6586 to -6601
---
apiVersion: scheduling.k8s.io/v1
description: The vitess-operator control plane.
globalDefault: false
kind: PriorityClass
metadata:
  name: vitess-operator-control-plane
value: 5000
---
apiVersion: scheduling.k8s.io/v1
description: Vitess components (vttablet, vtgate, vtctld, etcd)
globalDefault: false
kind: PriorityClass
metadata:
  name: vitess
value: 1000
Collaborator

I don't think we want to delete these?

Member Author

We are not removing this, just moving it earlier in the file. I am using the raw output of kustomize, which I think is what we should do to avoid conflicts in the future.

Comment on lines -6585 to -6600
---
apiVersion: scheduling.k8s.io/v1
description: The vitess-operator control plane.
globalDefault: false
kind: PriorityClass
metadata:
  name: vitess-operator-control-plane
value: 5000
---
apiVersion: scheduling.k8s.io/v1
description: Vitess components (vttablet, vtgate, vtctld, etcd)
globalDefault: false
kind: PriorityClass
metadata:
  name: vitess
value: 1000
Collaborator

Same for this

Member Author

Same as #553 (comment)

Comment on lines +65 to +67
var watchResources = []client.Object{
	&kbatch.Job{},
}
Collaborator

This variable is only used once, and that is in a for loop when it only has one value. Why don't we unfurl that value and use it directly?

Member Author

I have just re-used the same pattern we use throughout the codebase. In other places, even if there is a single element, we use it this way, making it easy to understand all the resources created by a given controller when reading the top of the file.

return job, nil
}

func (r *ReconcileVitessBackupsSchedule) createJobPod(ctx context.Context, vbsc *planetscalev2.VitessBackupSchedule, name string) (pod corev1.PodSpec, err error) {
Contributor

have we taken the vtctld action timeout into account?

Member Author

I do not think so, what are you referring to?

Contributor

vtctldclient has a default command timeout of 1 hour, which is controlled by the --action-timeout flag. We will almost certainly want to increase that.

Contributor

or make it a flag, like extraFlags, on the backup schedule

Member Author

@frouioui frouioui May 24, 2024

I think this is solved, as people can set --action-timeout by adding it to the extraFlags field of VitessBackupScheduleStrategy. The flags set in extraFlags will be passed down to vtctldclient.

Contributor

might be worth adding a note about that in the release notes. i expect it will be a common issue people run into.

Member Author

fixed via 9739a94

// This field will be ignored if we have picked the strategy BackupShard.
// +optional
// +kubebuilder:example="zone1-0000000102"
TabletAlias string `json:"tabletAlias"`
Contributor

i kinda question the value of the cron-scheduled Backup (single tablet) strategy:

  • it's not uncommon to delete a PVC/Pod because it's in bad shape, and having to update backup schedules anytime that happens will be annoying.
  • an image/config rollout may switch a tablet from replica to primary, which means the next time a backup schedule runs it will take the primary offline.

i would say it's not worth adding this strategy with these two potential footguns, unless it is a hotly requested feature.

Member Author

@frouioui frouioui May 20, 2024

That's a good point. I think it is safe to drop this strategy, especially if I add what you are mentioning in #553 (comment). Having two strategies, BackupShard and BackupCluster, would give enough flexibility for most use cases. I am thinking we could even add a third BackupKeyspace strategy, which would be a good in-between solution between BackupShard and BackupCluster.

Contributor

Sounds good. BackupShard is a higher level request that can be intuitively and correctly executed over time as the cluster state changes.

Member Author

See comment #553 (comment), which explains the new strategies.

// This field will be ignored if we have picked the strategy BackupTablet.
// +optional
// +kubebuilder:example="commerce/-"
KeyspaceShard string `json:"keyspaceShard,omitempty"`
Contributor

If a shard only has one replica, then the BackupShard strategy will mean all @replica queries will fail. It may be worth having an option like MinHealthyReplicas for that strategy.

Member Author

The @replica queries will fail because the tablet will not be able to serve queries while being backed up? In that case, yeah, it would make sense to add such a parameter for the user to configure, with a default of 2 for instance.

Contributor

Good point. At least when not using online backup methods like xtrabackup.

Contributor

@mattlord mattlord left a comment

This is looking good! I only had some relatively minor comments. I'll come back and do another pass once you've made whatever changes you like based on the comments from myself and Max.

Thanks!

Comment on lines +2641 to +2647
<p>Strategy defines how we are going to take a backup.
If you want to take several backups within the same schedule you can add more items
to the Strategy list. Each VitessBackupScheduleStrategy will be executed by the same
kubernetes job. This is useful if for instance you have one schedule, and you want to
take a backup of all shards in a keyspace and don&rsquo;t want to re-create a second schedule.
All the VitessBackupScheduleStrategy are concatenated into a single shell command that
is executed when the Job&rsquo;s container starts.</p>
Contributor

I'm not sure what this means / looks like. An example somewhere would be nice IMO.

Member Author

There is already an example in the CRD:

// +kubebuilder:example="-"
// +kubebuilder:example="commerce"

Member Author

@frouioui frouioui May 28, 2024

In YAML it looks like this, which is what people can find in the examples/tests of the repo:

        strategies:
          - name: BackupShard
            keyspace: "commerce"
            shard: "-"

Contributor

@maxenglander maxenglander left a comment

another thought: might be nice to give users a way to assign annotations, and one or more affinity selection options, to the backup runner pods. that way they can influence things like scheduling and eviction.

for example, users might not want backup runner pods running on the same nodes as vttablet pods. and they might not want the backup runner pods to get evicted by an unrelated pod after they've been running for a long time.

@frouioui
Member Author

In commit bc74ab4, I have applied one of the most important suggestions discussed above, which is to remove the BackupTablet strategy in favor of BackupKeyspace and BackupCluster. The strategies can be used as follows:

# BackupKeyspace
        strategies:
          - name: BackupKeyspace
            cluster: "example"
            keyspace: "customer"
# BackupCluster
        strategies:
          - name: BackupCluster
            cluster: "example"

Meanwhile, the BackupShard strategy does not change. When run, we can see the following command-line arguments in the job's pod, which get executed upon creation of the container:

# BackupKeyspace
Args:
      /bin/sh
      -c
      /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/-80 && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/80-
# BackupCluster
Args:
      /bin/sh
      -c
      /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard commerce/- && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/-80 && /vt/bin/vtctldclient --server=example-vtctld-625ee430:15999 BackupShard customer/80-

cc @maxenglander @mattlord


// Cluster defines on which cluster you want to take the backup.
// This field is mandatory regardless of the chosen strategy.
Cluster string `json:"cluster"`
Contributor

i'm not sure i follow why this is necessary. my mental model is that a user defines []VitessBackupScheduleTemplate on the ClusterBackupSpec, and so implicitly each VitessBackupScheduleStrategy will be associated with the cluster where ClusterBackupSpec is defined.

Member Author

That's a good point @maxenglander, it is pretty useless. I ended up removing that field from VitessBackupScheduleStrategy and adding it to VitessBackupScheduleSpec. The VitessCluster controller will fill in that new field when it creates a new VitessBackupSchedule object. That way, VitessBackupSchedule is still able to select existing components by their cluster name, to avoid fetching the wrong data in the event where we have multiple VitessClusters running in our K8S cluster.

See b30aa09 for the change.

@frouioui
Member Author

another thought: might be nice to give users a way to assign annotations, and one or more affinity selection options, to the backup runner pods. that way they can influence things like scheduling and eviction.

for example, users might not want backup runner pods running on the same nodes as vttablet pods. and they might not want the backup runner pods to get evicted by an unrelated pod after they've been running for a long time.

In e6946fb I have added affinity and annotations to the VitessBackupScheduleTemplate, allowing the user to configure the affinity and annotations they want for the pods that take backups.

@@ -40,6 +40,8 @@ const (
TabletPoolNameLabel = LabelPrefix + "/" + "pool-name"
// TabletIndexLabel is the key for identifying the index of a Vitess tablet within its pool.
TabletIndexLabel = LabelPrefix + "/" + "tablet-index"
// BackupScheduleLabel is the key for identifying to which VitessBackupSchedule a Job belongs to.
BackupScheduleLabel = LabelPrefix + "/" + "backup-schedule"
Contributor

Could this also be accomplished with an owner reference?

Contributor

That might be a little neater, since if you delete a backup schedule, you'd probably also want to delete any currently running pods.

Member Author

The VitessBackupSchedule is already the owner of its own jobs, which makes it an owner of the pods created by those jobs. When deleting a backup schedule, all the associated pods and jobs are also removed.

$ kubectl get pods                                                                                                                                                                                          
NAME                                                         READY   STATUS              RESTARTS   AGE
...
example-vbsc-every-minute-94d12263-1717100280-9cscg          0/1     ContainerCreating   0          27s
...

$  kubectl get vitessbackupschedule                                                                                                                                                                           
NAME                                 AGE
example-vbsc-every-minute-94d12263   90s

$ kubectl delete vitessbackupschedule example-vbsc-every-minute-94d12263                                                                                                                                     
vitessbackupschedule.planetscale.com "example-vbsc-every-minute-94d12263" deleted

$ kubectl get pods                                                                                                                                                                                           
NAME                                                         READY   STATUS        RESTARTS      AGE
...
example-vbsc-every-minute-94d12263-1717100280-9cscg          1/1     Terminating   1 (31s ago)   67s
...

$ kubectl get pods                                                                                                                                                                                         
NAME                                                         READY   STATUS    RESTARTS      AGE
example-90089e05-vitessbackupstorage-subcontroller           1/1     Running   0             2m30s
example-commerce-x-x-vtbackup-init-c6db73c9                  0/1     Error     2 (26s ago)   2m26s
example-commerce-x-x-zone1-vtorc-c13ef6ff-5855667dbc-d5dcl   1/1     Running   1 (73s ago)   2m27s
example-etcd-faf13de3-1                                      1/1     Running   0             2m30s
example-etcd-faf13de3-2                                      1/1     Running   0             2m30s
example-etcd-faf13de3-3                                      1/1     Running   0             2m30s
example-vttablet-zone1-0790125915-4e37d9d5                   3/3     Running   0             2m27s
example-vttablet-zone1-2469782763-bfadd780                   3/3     Running   0             2m27s
example-vttablet-zone1-2548885007-46a852d0                   3/3     Running   0             2m27s
example-zone1-vtctld-1d4dcad0-646b8c9c77-vl69s               1/1     Running   1 (73s ago)   2m29s
example-zone1-vtgate-bc6cde92-569bbcf4df-lslph               1/1     Running   1 (72s ago)   2m27s
vitess-operator-6f87b84f88-6wjlt                             1/1     Running   0             2m33s

Member Author

Could this also be accomplished with an owner reference?

I am not 100% sure of what you mean by that.

Contributor

cool, this is what i meant by owner reference 👍

The VitessBackupSchedule is already the owner of its own jobs,

return err
}
if jobStartTime.Add(time.Minute * time.Duration(timeout)).Before(time.Now()) {
if err := r.client.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); (err) != nil {
Contributor

seems like a good thing to have a metric for

Member Author

@frouioui frouioui May 30, 2024

fixed via 46b6967 + 5809cbd

if shardIndex > 0 || ksIndex > 0 {
	cmd.WriteString(" && ")
}
createVtctldClientCommand(&cmd, vtctldclientServerArg, strategy.ExtraFlags, ks.name, shard)
Contributor

am i reading this right that it will be taking a backup of each keyspace and shard in sequence? that doesn't seem ideal to me because if each shard takes an hour to back up, and there are 32 shards, then the backups of the first shard and the last shard will be more than a day apart.

i think it would be better if there were at least the option for BackupCluster and BackupKeyspace to back up all keyspaces and shards in parallel.

might be better to limit this PR to only support BackupShard for now, and add support for the other options after more consideration of how to implement BackupKeyspace and BackupCluster.

Member Author

Let's do that: remove those two strategies as part of this PR, and I will work on a subsequent PR to add them back with a better approach. This PR is getting lengthy already.

Member Author

Fixed via 70ba063

Contributor

@mattlord mattlord May 31, 2024

IMO BackupAllShardsInKeyspace and BackupAllShardsInCluster are better names. It may seem nit-picky, but I think it's important as it reflects what it actually is: independent backups of the shards, i.e. it is NOT a single consistent backup of the keyspace or cluster at any physical or logical point in time.

Member Author

@frouioui frouioui May 31, 2024

I ended up removing the Keyspace and Cluster strategies in this PR as they will require a bigger refactoring. I am keeping that in mind for when we add them back, though.

Labels
enhancement New feature or request

5 participants