
Running R Keras model with custom loss crashes when run twice on different data sets #1401

Open
william-denault opened this issue Feb 9, 2024 · 3 comments

Comments

@william-denault

william-denault commented Feb 9, 2024

I am trying to develop a specific type of neural net with a special type of loss function. The loss is somewhat strange, but in my project it actually makes sense.

Anyhow, I am trying to first set up the architecture and then fit the model to two different data sets: x_train/y_train and x_train2/y_train2.

I really don't get why it crashes when fitted to x_train2 and y_train2.

Here is the error:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : RuntimeError: in user code:

File "...\DOCUME~1\VIRTUA~1\R-TENS~1\Lib\site-packages\keras\src\engine\training.py", line 1401, in train_function  *
    return step_function(self, iterator)
File "...\R\cache\R\renv\library\comoR-4eef73a7\R-4.3\x86_64-w64-mingw32\reticulate\python\rpytools\call.py", line 16, in python_function  *
    raise error

RuntimeError: NA/NaN argument

and here is a minimal reproducible example:

rm(list = ls())
library(keras)  # needed for keras_model_sequential(), layer_dense(), fit(), etc.

y <- rnorm(1000)
x <- rnorm(1000)
num_classes <- 10
mat <- matrix(rnorm(1000 * num_classes), ncol = num_classes)
x_train <- x
y_train <- mat
length(x_train)
dim(y_train)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = 'relu', input_shape = c(1)) %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dense(units = num_classes, activation = 'softmax')

custom_loss <- function(y_true, y_pred) {
  tt <- 0
  for (i in 1:nrow(y_true)) {
    tt <- tt + log(sum(exp(y_true[i, ]) * y_pred[i, ]))
  }
  mse <- -tt
  return(mse)
}

blank_model <- model %>% compile(
  loss = custom_loss,
  optimizer = 'adam',
  metrics = c('accuracy')
)

model1 <- blank_model
model2 <- blank_model

history <- model1 %>% fit(
  x_train, y_train,
  epochs = 40,
  batch_size = 100
)

# drop one observation: the data set now has 999 rows, which is not a
# multiple of batch_size = 100
x_train2 <- x[-1]
y_train2 <- mat[-1, ]

history <- model2 %>% fit(
  x_train2, y_train2,
  epochs = 40,
  batch_size = 100
)

The model should fit easily to a different data set, but instead it crashes without any clear reason.

@t-kalinowski
Member

model1 <- blank_model
model2 <- blank_model

Keras models are modified in place; they are not copy-on-modify like most other R objects (they behave like R environments, not R lists). In the snippet above, model1 and model2 will continue to refer to the same model throughout the rest of the script. If you want to make a copy of a model, you can use clone_model() to generate a model with the same architecture. If you want the copied model to also have identical weight values, you can call set_weights(model2, get_weights(model1)).
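For illustration, here is a minimal sketch of making the two models truly independent. It reuses blank_model and custom_loss from the example above, and assumes that clone_model() returns an uncompiled model, so the clone is compiled again before fitting:

model1 <- blank_model
model2 <- clone_model(blank_model)          # same architecture, new object
set_weights(model2, get_weights(model1))    # optional: identical weight values
model2 %>% compile(                         # the clone is not compiled, so
  loss = custom_loss,                       # compile it again before fit()
  optimizer = 'adam',
  metrics = c('accuracy')
)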

Error in 1:nrow(y_true) : RuntimeError: NA/NaN argument

This is coming from the custom loss function. When defining a custom loss function or metric for the first time, I suggest inserting a diagnostic print() or browser() call to understand when it is called and what it is called with. Keras compiles custom loss functions by calling them with a tracing Tensor. Tensors implement a dim() method, so nrow(y_true) will work just fine if the Tensor has a defined batch size. In the first fit() call to model1, this is true. However, in the second fit() call to model2, the input data set size is not a multiple of the requested batch_size, so the last batch of the epoch will have a different size. Keras accounts for this by passing a tracing tensor with an unspecified batch size dim, hence why nrow(y_true) returns NA.
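For example, a quick diagnostic sketch (the shapes in the comments are what I'd expect given the batch sizes above):

custom_loss <- function(y_true, y_pred) {
  print(dim(y_true))  # e.g. 100 10 on the first trace, but NA 10 when the
                      # batch size dim is unspecified
  tt <- 0
  for (i in 1:nrow(y_true)) {  # 1:NA is what raises "NA/NaN argument"
    tt <- tt + log(sum(exp(y_true[i, ]) * y_pred[i, ]))
  }
  -tt
}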

You can get the runtime batch size of y_true as a tensor with tf$shape(y_true)[[1]], and use tfautograph::autograph() to write for loops that can iterate using a tensor value.
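For instance, a minimal sketch of the loss rewritten that way (custom_loss_ag is just an illustrative name; this assumes the tensorflow and tfautograph packages are attached):

library(tensorflow)
library(tfautograph)

custom_loss_ag <- autograph(function(y_true, y_pred) {
  n <- tf$shape(y_true)[[1]]          # runtime batch size, as a scalar tensor
  tt <- tf$constant(0, dtype = y_pred$dtype)
  for (i in tf$range(n)) {            # autograph turns this into a graph loop
    yt_i <- tf$gather(y_true, i)      # row i (0-based, to match tf$range())
    yp_i <- tf$gather(y_pred, i)
    tt <- tt + log(sum(exp(yt_i) * yp_i))
  }
  -tt
})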

I was unable to reproduce the crash by running the code. Please make sure you are running the latest versions of reticulate, tensorflow, and keras, as some segfault-causing bugs were fixed in recent releases. You can also try a preview of keras version 3 under the keras3 package: remotes::install_github("rstudio/keras")

@william-denault
Author

william-denault commented Feb 12, 2024

Dear Tomasz,

Thank you very much for your detailed and swift answer. I think my main problem was coming from the mismatch between the data set dimension and the batch size. Maybe this is a newbie question, but how do you handle cases where the size of your data set leads to poor divisibility (e.g., a prime number, or a size that only allows very small or very large divisors)? Do you just drop some data?

I am not sure what you mean by 'You can get the runtime batch size of y_true as a tensor with tf$shape(y_true)[[1]], and use tfautograph::autograph() to write for loops that can iterate using a tensor value.' What does tf refer to in this message?

@t-kalinowski
Member

If you're using the new keras3, you can use the op_* namespace; in particular, op_vectorized_map() lets you avoid writing a for loop:

custom_loss2 <- function(y_true, y_pred) {
  case_wise_tt <- op_vectorized_map(c(y_true, y_pred), function(x) {
    c(y_true1, y_pred1) %<-% x
    ## you can insert a browser() call here and work with the tracing
    ## tensors in an interactive context
    # browser()
    message("y_true1 = ", y_true1)
    message("y_pred1 = ", y_pred1)
    op_log(op_sum(op_exp(y_true1) * y_pred1))
  })
  tt <- op_sum(case_wise_tt)
  mse <- -tt
  mse
}

y_true <- op_arange(20) |> op_reshape(c(4, -1))
y_pred <- y_true + 20

custom_loss2(y_true, y_pred)

If you're using keras (v2), then you can use tensorflow for operating directly on tensors:

library(tensorflow)
custom_loss3 <- function(y_true, y_pred) {
  case_wise_tt <- tf$vectorized_map(
    elems = c(y_true, y_pred),
    fn = function(x) {
      c(y_true1, y_pred1) %<-% x
      message("y_true1 = ", y_true1)
      message("y_pred1 = ", y_pred1)
      log(sum(exp(y_true1) * y_pred1))
    }
  )
  tt <- sum(case_wise_tt)
  mse <- -tt
  mse
}

y_true <- tensorflow::as_tensor(0:19, shape = c(4, -1), dtype = "float64") 
y_pred <- y_true + 20

custom_loss3(y_true, y_pred)

Note that the tensorflow R package defines many Group generics, so log, exp, sum, etc. all dispatch to tf$math$log(), tf$math$exp(), and so on. If you're working with tensorflow tensors directly, the docs on the tensorflow site are very helpful, e.g., https://www.tensorflow.org/api_docs/python/tf/math/exp
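A minimal sketch of that dispatch in action:

library(tensorflow)

x <- as_tensor(c(1, 2, 3), dtype = "float64")
log(x)   # dispatches to tf$math$log()
exp(x)   # dispatches to tf$math$exp()
sum(x)   # dispatches to the tf reduction op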

If you have an actual need for a for loop because the tensor operations depend on state from the previous loop iteration (e.g., as they might in a sequence-processing model), then you can use tfautograph as shown here: https://t-kalinowski.github.io/tfautograph/articles/tfautograph.html
