Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data loss with new relationship approach #2478

Open
cabcookie opened this issue Apr 19, 2024 · 6 comments
Open

Data loss with new relationship approach #2478

cabcookie opened this issue Apr 19, 2024 · 6 comments
Labels
Gen 2 question Further information is requested transferred

Comments

@cabcookie
Copy link

Environment information

System:
  OS: macOS 14.3.1
  CPU: (10) arm64 Apple M1 Pro
  Memory: 159.95 MB / 32.00 GB
  Shell: /bin/zsh
Binaries:
  Node: 18.19.0 - ~/.nvm/versions/node/v18.19.0/bin/node
  Yarn: undefined - undefined
  npm: 10.2.3 - ~/.nvm/versions/node/v18.19.0/bin/npm
  pnpm: undefined - undefined
NPM Packages:
  @aws-amplify/backend: 0.13.0-beta.20
  @aws-amplify/backend-cli: 0.12.0-beta.22
  aws-amplify: 6.0.28
  aws-cdk: 2.138.0
  aws-cdk-lib: 2.138.0
  typescript: 5.4.5
AWS environment variables:
  AWS_STS_REGIONAL_ENDPOINTS = regional
  AWS_NODEJS_CONNECTION_REUSE_ENABLED = 1
  AWS_SDK_LOAD_CONFIG = 1
No CDK environment variables

Description

The new relationship approach caused data loss for me.

Before the upgrade to @aws-amplify/backend@0.13.0-beta.20 this has been my data model:

DayPlan: a
    .model({
      owner: a.string().authorization([a.allow.owner().to(["read", "delete"])]),
      day: a.date().required(),
      dayGoal: a.string().required(),
      done: a.boolean(),
      todos: a.hasMany("DayPlanTodo"),
    })
    .authorization([a.allow.owner()]),
  DayPlanTodo: a
    .model({
      owner: a.string().authorization([a.allow.owner().to(["read", "delete"])]),
      todo: a.string().required(),
      done: a.boolean(),
      doneOn: a.date(),
      dayPlan: a.belongsTo("DayPlan"),
    })
    .authorization([a.allow.owner()]),

As I have to provide references now, I changed the model to this:

DayPlan: a
    .model({
      owner: a.string().authorization([a.allow.owner().to(["read", "delete"])]),
      day: a.date().required(),
      dayGoal: a.string().required(),
      done: a.boolean(),
      todos: a.hasMany("DayPlanTodo", "dayPlanTodosId"),
    })
    .authorization([a.allow.owner()]),
  DayPlanTodo: a
    .model({
      owner: a.string().authorization([a.allow.owner().to(["read", "delete"])]),
      todo: a.string().required(),
      done: a.boolean(),
      doneOn: a.date(),
      dayPlanTodosId: a.id().required(),
      dayPlan: a.belongsTo("DayPlan", "dayPlanTodosId"),
    })
    .secondaryIndexes((index) => [index("dayPlanTodosId")])
    .authorization([a.allow.owner()]),

Unfortunately, all records in DayPlanTodo have been deleted as a consequence. Lukily, I only did it in my sandbox environment.

@edwardfoyle edwardfoyle transferred this issue from aws-amplify/amplify-backend Apr 19, 2024
@AaronZyLee
Copy link
Contributor

AaronZyLee commented Apr 19, 2024

I think this is not related to the reference change but the GSI update in DayPlanTodo.

.secondaryIndexes((index) => [index("dayPlanTodosId")])

In the sandbox mode, when the GSI update is detected, the table will be deleted and recreated instead of the normal GSI update, which will wipe the existing data. This is the behavior in effect for SANDBOX ONLY to accelerate the GSI update time.
See code block https://github.com/aws-amplify/amplify-backend/blob/main/packages/backend-data/src/factory.ts#L259-L267

@AnilMaktala AnilMaktala added the question Further information is requested label Apr 19, 2024
@cabcookie
Copy link
Author

Interesting. I will test it in my staging environment and let you know

@AakashKB
Copy link

Facing this problem as well, is there a clear migration guide on how we can move from the old resource definition format to the new one while keep tables and access ids consistent?

I'm not seeing any documentation on this.

@cabcookie
Copy link
Author

I tested in my staging environment and indeed I didn't face data loss there. However, for the new schema definition expectation I had to be very careful to ensure I have the exact same table names and field names to ensure I do not loose data. In addition, the new schema definition creates new secondary indexes and deletes the existing ones which is very time consuming and Amplify pipeline-deploy as well as the step functions run into timeout issues as described here: #2488

@AaronZyLee
Copy link
Contributor

AFAIK, in Gen2 the table will only be replaced if the key schema for the table is changed (GSIs updates are exempted). The table names are not able to be modified in the any customer facing APIs of Gen2 because of the table replacement resulting from the rename. Neither will the table name is generated differently between sandbox and branch deployment environment

@AakashKB
Copy link

Issue isn't really data loss via table replacement, it is more naming relationship ids. In prior versions, this was automatically done.

Ex. Original

User: a
            .model({
                email: a.string().required(),
                relationships: a.hasMany("PlayerCharacterRelationship"),
                isOnboarded: a
                    .boolean()
                    .default(false)
                    .required(),
            })

New:

User: a
            .model({
                email: a.string().required(),
                relationships: a.hasMany("PlayerCharacterRelationship", "???"), //id needs to be manually defined
                isOnboarded: a
                    .boolean()
                    .default(false)
                    .required(),
            })

So now I have to manually check the table for each model definition to identify the exact id used and define it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Gen 2 question Further information is requested transferred
Projects
None yet
Development

No branches or pull requests

7 participants