Module view does not show device time #733

Open
SolenoidWGT opened this issue Feb 25, 2023 · 4 comments
Labels
bug Something isn't working plugin PyTorch Profiler TensorBoard Plugin related

Comments

SolenoidWGT commented Feb 25, 2023

Hi guys, I've recently been trying to use torch.profiler to profile a large NLP model. However, I have run into some problems and would like some advice:

  1. I found that the module view appears on the TensorBoard page only when with_modules=True and with_stack=True are both set. Otherwise, the module view option cannot be found in the browser, like:
    image

  2. In the module view, there is no device time data, even though I have added ProfilerActivity.CUDA to the profiler activities.

image

Here is an example of my code:

    import torch

    with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ],
        schedule=torch.profiler.schedule(
            skip_first=5,
            warmup=1,
            wait=10,
            active=5),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./tensorboard'),
        with_modules=True,
        with_stack=True,
    ) as prof:
        for i in range(steps):
            ...  # do training (one iteration)
            prof.step()  # mark the step boundary for the profiler schedule
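
For readers checking how many iterations such a schedule needs, here is a minimal sketch of the step arithmetic. It assumes the documented order of `torch.profiler.schedule` phases (`skip_first`, then `wait`, `warmup`, `active` per cycle); the two helper names are hypothetical and not part of PyTorch.

```python
def steps_to_complete_first_window(skip_first, wait, warmup, active):
    """Iterations needed before the first active window has fully elapsed."""
    return skip_first + wait + warmup + active


def steps_for_all_cycles(skip_first, wait, warmup, active, repeat):
    """Iterations needed to run through `repeat` full cycles."""
    return skip_first + repeat * (wait + warmup + active)


# Schedule from the snippet above: skip_first=5, wait=10, warmup=1, active=5.
first_window = steps_to_complete_first_window(skip_first=5, wait=10, warmup=1, active=5)

# Schedule from the PyTorch tutorial: wait=1, warmup=1, active=3, repeat=2.
tutorial_total = steps_for_all_cycles(skip_first=0, wait=1, warmup=1, active=3, repeat=2)

print(first_window, tutorial_total)
```

With the schedule above, `steps` would need to be at least 21 for the first active window to run to completion.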

I'm not sure whether this is a version-related issue. Here is my version info:

torch 1.13.1+cu117
torch-tb-profiler 0.4.1
torchaudio 0.13.1+cu117
torchvision 0.14.1+cu117

UPDATE: Today I upgraded torch, but the problem still exists:

torch                   2.0.0.dev20230226+cu117
torch-tb-profiler       0.4.1
torchaudio              2.0.0.dev20230223+cu117
torchvision             0.15.0.dev20230226+cu117

SolenoidWGT commented Feb 25, 2023

By the way, because my trace.json file is very large, TensorBoard is very slow to open it and sometimes even runs out of memory (OOM). Is there a way to output the profiling data as raw text instead? Like this:
image
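
For reference, `torch.profiler` can render a plain-text summary through `key_averages().table()`, without going through TensorBoard. The following is a minimal sketch and not the code from this issue: the tiny `Linear` model, the tensor sizes, and `row_limit=10` are placeholder assumptions, and it profiles CPU only so that it runs without a GPU.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder workload (assumption): a tiny model instead of the real NLP model.
model = torch.nn.Linear(128, 64)
inputs = torch.randn(32, 128)

# CPU-only so the sketch runs anywhere; add ProfilerActivity.CUDA on a GPU machine.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    for _ in range(3):
        model(inputs).sum().backward()

# Aggregate events by operator name and render them as a plain-text table.
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(summary)
```

When a schedule is used, the same call can be made on the profiler object that is passed to the `on_trace_ready` callback. On the torch versions listed in this thread, `sort_by="cuda_time_total"` orders the rows by device time instead.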

SolenoidWGT commented Feb 27, 2023

2/27 UPDATE
I copied the resnet18 profiling example from the PyTorch tutorial:

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)


device = torch.device("cuda:0")
model = torchvision.models.resnet18(pretrained=True).cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()

def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        if step >= (1 + 1 + 3) * 2:
            break
        train(batch_data)
        prof.step()  # Need to call this at the end of each step to notify profiler of steps' boundary.

Unfortunately, the module view still doesn't show the device time, even though my PyTorch version has been upgraded to the latest version:

image

pytorch-triton          2.0.0+b8b470bc59
torch                   2.0.0.dev20230226+cu117
torch-tb-profiler       0.4.1
torchaudio              2.0.0.dev20230223+cu117
torchvision             0.15.0.dev20230226+cu117
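
A possible way to get per-section timings independently of the module view is to label regions with `torch.profiler.record_function` and read them back from `key_averages()`. This is only a sketch under stated assumptions: the two `Linear` layers and the label names `encoder_forward` and `head_forward` are hypothetical, and whether the device-time columns are populated for these ranges has not been verified against the versions listed above.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Placeholder two-stage model (assumption), standing in for real submodules.
encoder = torch.nn.Linear(128, 64)
head = torch.nn.Linear(64, 10)
inputs = torch.randn(32, 128)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    # Also capture device activity when a GPU is present.
    activities.append(ProfilerActivity.CUDA)
    encoder, head, inputs = encoder.cuda(), head.cuda(), inputs.cuda()

with profile(activities=activities) as prof:
    # Hypothetical labels; each labeled range becomes its own row in the summary.
    with record_function("encoder_forward"):
        hidden = encoder(inputs)
    with record_function("head_forward"):
        logits = head(hidden)

# Collect the aggregated event names and show only the labeled ranges.
labels = {event.key for event in prof.key_averages()}
print(sorted(label for label in labels if label.endswith("_forward")))
```

The labeled rows also appear in `prof.key_averages().table()`, next to the individual operators.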

@aaronenyeshi aaronenyeshi added the bug Something isn't working label Feb 27, 2023
@aaronenyeshi aaronenyeshi added the plugin PyTorch Profiler TensorBoard Plugin related label Jun 23, 2023
@Oliverwang11

I faced the same issue. Have you found a solution?

diichen commented Apr 30, 2024

same issue
