TensorFlow profiler running into OOM issue on GPU #655

Open
rahul-fnu opened this issue Aug 10, 2023 · 3 comments

Comments

@rahul-fnu

rahul-fnu commented Aug 10, 2023

Running the TensorFlow profiler for longer than 10 seconds results in an OOM error and crashes the inference process, and the profiler returns DEADLINE_EXCEEDED. Is there any way to limit the sampling rate, or to reduce the amount of information being collected, to avoid crashing the process?

Here is the code that I run:
tensorflow_profiler.experimental.client("grpc://localhost:3222", "profiles", 30000)

@ndeepesh

Hi TensorFlow team,

Can you help us with the above? Is there a way to sample TensorFlow profiling on GPUs? This is blocking us from collecting any traces longer than 10 s.

@pritamdodeja

Have you tried doing this with Keras callbacks, using something like this:

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=fn_args.model_run_dir, profile_batch=(40, 80), histogram_freq=1,
    write_steps_per_second=True, write_graph=False)

And then passing the callback to model.fit?

@Rahulraj0308

@rahul-fnu To limit the sampling rate or reduce the amount of information the TensorFlow profiler collects, you can adjust the sampling_rate parameter of the tensorflow_profiler.experimental.client function.
For example: tensorflow_profiler.experimental.client("grpc://localhost:3222", "profiles", 30000, sampling_rate=0.5, events=["compute"])
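
A possible workaround, sketched under assumptions and not confirmed by the maintainers: as far as the documented signature goes, tf.profiler.experimental.client.trace takes service_addr, logdir, and duration_ms, plus optional worker_list, num_tracing_attempts, and options. It does not list sampling_rate or events. The sketch below lowers the host tracer verbosity through tf.profiler.experimental.ProfilerOptions and splits the 30 s request into shorter back-to-back captures, since the report says only captures longer than about 10 s fail. The helper names split_capture and trace_in_windows and the 5000 ms default window are hypothetical, and whether either measure avoids the OOM on a given GPU is untested.

```python
def split_capture(total_ms, window_ms):
    """Split a capture of total_ms into windows of at most window_ms each."""
    if total_ms <= 0 or window_ms <= 0:
        raise ValueError("total_ms and window_ms must be positive")
    full_windows, remainder = divmod(total_ms, window_ms)
    windows = [window_ms] * full_windows
    if remainder:
        windows.append(remainder)
    return windows


def trace_in_windows(service_addr, logdir, total_ms, window_ms=5000):
    """Request several short traces instead of one long one (hypothetical helper)."""
    # Imported lazily so split_capture can be used without TensorFlow installed.
    import tensorflow as tf

    # Assumption: reduced tracer levels shrink the collected trace.
    # host_tracer_level=1 keeps only critical host events (the default is 2),
    # python_tracer_level=0 disables Python tracing,
    # device_tracer_level=1 keeps device (GPU) tracing enabled.
    options = tf.profiler.experimental.ProfilerOptions(
        host_tracer_level=1,
        python_tracer_level=0,
        device_tracer_level=1,
    )
    for duration_ms in split_capture(total_ms, window_ms):
        tf.profiler.experimental.client.trace(
            service_addr, logdir, duration_ms, options=options
        )
```

Calling trace_in_windows("grpc://localhost:3222", "profiles", 30000) would issue six 5 s captures against the profiler server from the original report. The windows are not contiguous, so anything that happens between two captures is not recorded.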
