Module view does not show device time #733

SolenoidWGT · 2023-02-25T13:27:13Z

Hi guys, I've recently been trying to use torch.profiler to profile a large NLP model. However, I have run into some problems and would like some advice:

The module view only appears on the TensorBoard page when with_modules=True and with_stack=True are both set. Otherwise, the module view option cannot be found in the browser.
In the module view there is no device time data, even though I have added ProfilerActivity.CUDA to the profiler activities.

Here is an example of my code:

    import torch

    with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ],
        schedule=torch.profiler.schedule(
            skip_first=5,
            warmup=1,
            wait=10,
            active=5),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./tensorboard'),
        with_modules=True,
        with_stack=True,
    ) as prof:
        for i in range(steps):  # steps: total number of training iterations
            ...  # do training (one training step goes here)
            prof.step()  # tell the profiler that one step has finished
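For reference, here is a minimal pure-Python sketch of which iterations a schedule like the one above would actually record. The helper `schedule_phase` is hypothetical and only approximates how I understand `torch.profiler.schedule` to assign phases (skip, wait, warmup, active); it is not the library code.

```python
def schedule_phase(step, skip_first=0, wait=0, warmup=0, active=1, repeat=0):
    """Approximate the phase a profiler schedule assigns to a 0-indexed step."""
    if step < skip_first:
        return "skip"
    step -= skip_first
    cycle = wait + warmup + active
    # With repeat > 0 the profiler stops after that many full cycles.
    if repeat > 0 and step // cycle >= repeat:
        return "done"
    pos = step % cycle
    if pos < wait:
        return "wait"
    if pos < wait + warmup:
        return "warmup"
    return "active"


# Schedule from the snippet above: skip_first=5, wait=10, warmup=1, active=5.
first_example = [
    s for s in range(25)
    if schedule_phase(s, skip_first=5, wait=10, warmup=1, active=5) == "active"
]
print(first_example)  # [16, 17, 18, 19, 20]
```

Under these assumptions the first trace is only complete after iteration 20, so the training loop needs at least 21 steps before anything shows up in TensorBoard.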

I'm not sure whether this is a version-related issue. Here is my version info:

torch 1.13.1+cu117
torch-tb-profiler 0.4.1
torchaudio 0.13.1+cu117
torchvision 0.14.1+cu117

UPDATE: Today I upgraded my torch version, but the problem still exists:

torch                   2.0.0.dev20230226+cu117
torch-tb-profiler       0.4.1
torchaudio              2.0.0.dev20230223+cu117
torchvision             0.15.0.dev20230226+cu117


SolenoidWGT · 2023-02-25T13:36:35Z

By the way, because my trace.json file is very large, TensorBoard is very slow to open and sometimes even runs out of memory (OOM). Is there a way to output the profiling data as raw text?
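One possible way to get plain-text output without TensorBoard is the profiler's `key_averages().table()` summary, sketched below. The tiny `Linear` model and random input are hypothetical stand-ins for the real workload, and the sketch profiles CPU only so that it runs on any machine.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def profile_to_text(row_limit: int = 10) -> str:
    """Profile a tiny stand-in workload and return the summary table as a string."""
    model = torch.nn.Linear(64, 64)  # hypothetical stand-in for the real model
    inputs = torch.randn(8, 64)

    # CPU only here; add ProfilerActivity.CUDA on a GPU machine.
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        for _ in range(3):
            model(inputs).sum().backward()

    # key_averages() aggregates events by operator name; table() renders text.
    return prof.key_averages().table(
        sort_by="self_cpu_time_total", row_limit=row_limit
    )


summary = profile_to_text()
print(summary)
```

With CUDA activity enabled, the table should also carry CUDA time columns, and on the torch versions listed in this issue a sort key such as `cuda_time_total` can be used instead. Note that this summary is grouped by operator, not by module, so it does not replace the module view.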

SolenoidWGT · 2023-02-27T06:48:30Z

2/27 UPDATE
I copied the ResNet-18 profiling example from the PyTorch tutorial:

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)


device = torch.device("cuda:0")
model = torchvision.models.resnet18(pretrained=True).cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()

def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        if step >= (1 + 1 + 3) * 2:
            break
        train(batch_data)
        prof.step()  # Need to call this at the end of each step to notify profiler of steps' boundary.

Unfortunately, the module view still doesn't show device time, even though I have upgraded PyTorch to the latest version:

pytorch-triton          2.0.0+b8b470bc59
torch                   2.0.0.dev20230226+cu117
torch-tb-profiler       0.4.1
torchaudio              2.0.0.dev20230223+cu117
torchvision             0.15.0.dev20230226+cu117
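To narrow down whether device time is missing from the collected data or only from the module view, one option is to count event categories directly in the exported trace file. The sketch below is stdlib-only and assumes the file follows the Chrome trace layout with a `traceEvents` list. The helper name `count_event_categories` is hypothetical, and category names such as `kernel` and `cpu_op` are assumptions that may differ between profiler versions.

```python
import gzip
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_event_categories(trace_path: str) -> Counter:
    """Count complete ("ph" == "X") events per lowercased category."""
    path = Path(trace_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        trace = json.load(fh)  # loads the whole file; large traces need RAM
    events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
    return Counter(
        str(ev.get("cat", "")).lower()
        for ev in events
        if isinstance(ev, dict) and ev.get("ph") == "X"
    )


# Demo on a tiny synthetic trace (made-up events, not real profiler output).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.pt.trace.json"
    demo.write_text(json.dumps({"traceEvents": [
        {"ph": "X", "cat": "cpu_op", "name": "aten::mm", "dur": 12},
        {"ph": "X", "cat": "kernel", "name": "sgemm", "dur": 30},
        {"ph": "M", "name": "process_name"},
    ]}), encoding="utf-8")
    demo_counts = count_event_categories(str(demo))
print(demo_counts)
```

If a real trace written under ./log/resnet18 shows GPU kernel events here, the device data was probably collected and the gap is more likely in how the module view aggregates it.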

Oliverwang11 · 2024-04-22T11:35:13Z

I faced the same issue. Have you found a solution?

diichen · 2024-04-30T04:49:34Z

Same issue.

aaronenyeshi added the "bug" (Something isn't working) label Feb 27, 2023

aaronenyeshi added the "plugin" (PyTorch Profiler TensorBoard Plugin related) label Jun 23, 2023
