
Implement _VariableFunctionsClass.empty of torch #331

Closed
athitten opened this issue May 1, 2024 · 6 comments · Fixed by #353
Labels: enhancement (New feature or request), good first issue (Good for newcomers), nemo (Issues needed to support NVIDIA NeMo models), operators

Comments


athitten commented May 1, 2024

🚀 Feature

Implement torch.empty

Motivation

NeMo Vision Transformer

cc @apaz-cli @tfogal

athitten added the enhancement label on May 1, 2024
tfogal added the nemo label on May 1, 2024

k223kim commented May 3, 2024

Once this is implemented, I guess we can add torch.Tensor as well?


k223kim commented May 3, 2024

Hi Team! I am trying to figure out how I can allocate memory without initializing the values in the tensor. I am assuming I could do something like torch.full. However, I am not sure how torch.full returns a TensorProxy filled with a certain value, since it technically just returns a TensorProxy from _full_meta in prims.py. How does that fill_value come into play? I think I can do something similar but without the fill value. Or am I approaching this problem incorrectly? I would appreciate it if anyone could point me to resources or suggest a different way to implement this!

mruberry added the good first issue and operators labels and removed the triage review label on May 6, 2024

mruberry commented May 6, 2024

@k223kim Excellent question!

So you can implement this by adding a new EMPTY primitive, similar to the RANDN primitive (see _randn_meta), then implementing the EMPTY primitive using the torch executor (see _randn_prims_transform), and finally defining torch.empty to call clang.empty, which in turn calls prims.empty. For completeness, also add a direct implementation of torch.empty to the PyTorch executor.

The difference between torch.empty and clang.empty is that torch.empty handles PyTorch objects and translates them to thunder objects before calling clang.empty. The implementation of torch.full is an example of this.
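Roughly, the layering could look like the sketch below. This is a self-contained illustration rather than Thunder's actual code: TensorProxy, prims_empty, clang_empty, torch_empty, and executor_empty are stand-in names, and the real signatures and registration helpers in the repository differ.

```python
from dataclasses import dataclass
import torch

@dataclass
class TensorProxy:                 # trace-time stand-in: metadata only, no storage
    shape: tuple
    device: str
    dtype: torch.dtype

# prims layer: the meta describes the output of EMPTY without allocating anything
def prims_empty(shape, *, device, dtype):
    return TensorProxy(tuple(shape), device, dtype)

# clang layer: a thin, reusable wrapper over the primitive
def clang_empty(shape, *, device, dtype):
    return prims_empty(shape, device=device, dtype=dtype)

# torch layer: accepts PyTorch-style arguments, converts them to thunder-style
# objects, then defers to clang (mirroring how torch.full is structured)
def torch_empty(*shape, device=None, dtype=None):
    device = "cpu" if device is None else str(device)              # placeholder conversion
    dtype = torch.get_default_dtype() if dtype is None else dtype
    return clang_empty(shape, device=device, dtype=dtype)

# torch-executor implementation: at execution time the recorded EMPTY symbol is
# replaced by a real torch.empty call that actually allocates memory
def executor_empty(shape, *, device, dtype):
    return torch.empty(shape, device=device, dtype=dtype)

print(torch_empty(2, 3, dtype=torch.float32))   # a TensorProxy; nothing was allocated
```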

Let me know if you have any additional questions!

k223kim mentioned this issue May 7, 2024

k223kim commented May 7, 2024

Hey @mruberry! Thank you so much for the detailed guidance. I have submitted a draft PR with (hopefully) everything you mentioned. However, I do want to understand how this implementation works in more detail. I would appreciate it if you could answer these questions; that would be extremely helpful for future work on Thunder.

  • Currently, torch.empty calls clang.empty, which calls prims.empty. At the very end, it calls _empty_meta, which returns a TensorProxy. What I don't get is that this logic seems identical to the implementation of torch.full. How does one fill the tensor with a desired value while the other presumably just allocates memory without initializing it? What makes the difference?
  • I am trying to understand why we need clang.empty and prims.empty in the first place. Why are some methods only implemented in torch/__init__.py and others implemented in clang and/or prims?

Also, I have a question regarding the test case for torch.empty. For torch.randn, I can see that opinfos.py only checks shape, device, and dtype consistency. I think something similar should be done for torch.empty, since two empty tensors can be allocated in different memory, which results in different uninitialized data. Does that make sense to you?

Thanks again for your time reviewing! ⚡️


mruberry commented May 7, 2024

Hey @mruberry! Thank you so much for the detailed guidance. I have submitted a draft PR with (hopefully) everything you mentioned. However, I do want to understand how this implementation works in more detail. I would appreciate it if you could answer these questions; that would be extremely helpful for future work on Thunder.

Will do!

  • Currently, torch.empty calls clang.empty, which calls prims.empty. At the very end, it calls _empty_meta, which returns a TensorProxy. What I don't get is that this logic seems identical to the implementation of torch.full. How does one fill the tensor with a desired value while the other presumably just allocates memory without initializing it? What makes the difference?

Functions like thunder.torch.empty and clang.empty and prims.empty are called when thunder constructs its Python program. At this time the meta functions are called to understand what the output of the operations will be, but no computation on the actual tensor data occurs.

After the program is constructed and compiled, it is executed. As part of the compilation process, the symbols like thunder.torch.empty that are recorded when the program is being constructed are translated into calls like torch.empty() that actually manipulate tensor data. It is those calls that are then executed.

So, program construction --> compilation --> execution. thunder.torch.empty and its meta are called at program construction time and don't create any values. torch.empty is called at execution time to create a tensor.
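Here is a toy timeline in plain Python (not Thunder's machinery, and every name in it is illustrative) that mirrors those three stages:

```python
import torch

recorded = []                                    # the "program" built at construction time

def construct():
    # construction time: the symbol is recorded and a metadata-only proxy is returned
    recorded.append(("empty", (2, 3), torch.float32))
    return {"shape": (2, 3), "dtype": torch.float32}   # proxy-like: no tensor data exists yet

def compile_program(symbols):
    # compilation: each recorded symbol is translated into a real torch call
    def run():
        return [torch.empty(shape, dtype=dtype) for (_, shape, dtype) in symbols]
    return run

proxy = construct()                     # construction: no allocation has happened yet
executable = compile_program(recorded)  # compilation
tensors = executable()                  # execution: torch.empty finally allocates memory
```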

  • I am trying to understand why we need clang.empty and prims.empty in the first place. Why are some methods only implemented in torch/__init__.py and others implemented in clang and/or prims?

thunder is interested in understanding properties of operations, like how input metadata maps to output metadata, or how to create a grad formula. A lot of these properties can be implicitly defined by decomposing more complicated operators (like the ones in thunder.torch) into simpler operators like the prims. Without the implicit definition of these properties, each torch operator would have to define its own meta function and its own grad function, which would be challenging to maintain.

Additionally, some executors, like nvFuser, are interested in breaking down operations into a series of simpler operations that can be fused. Without the prims, each torch operation would effectively be a primitive, so executors like nvFuser would need special execution logic for each torch operation.

The core language (the "clang") is intended to provide common operations that are more usable than the primitives and that facilitate the creation of language definitions, like the torch language definition or the numpy language definition.
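As a toy illustration of why this pays off (again plain Python, not Thunder code): metas are hand-written only for the primitives, and composite ops inherit theirs by decomposition, so executors likewise only need to understand the primitives.

```python
from dataclasses import dataclass
import torch

@dataclass
class Proxy:
    shape: tuple
    dtype: torch.dtype

def prim_empty_meta(shape, dtype):            # hand-written meta for a primitive
    return Proxy(tuple(shape), dtype)

def prim_fill_meta(t, value):                 # hand-written meta for a primitive
    return Proxy(t.shape, t.dtype)

def full_meta(shape, value, dtype):
    # composite op: its meta behaviour falls out of the decomposition
    return prim_fill_meta(prim_empty_meta(shape, dtype), value)

def zeros_meta(shape, dtype):
    return full_meta(shape, 0, dtype)         # reuses full's decomposition in turn

print(zeros_meta((2, 3), torch.float32))      # Proxy(shape=(2, 3), dtype=torch.float32)
```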

Also, I have a question regarding the test case for torch.empty. For torch.randn, I can see that opinfos.py only checks shape, device, and dtype consistency. I think something similar should be done for torch.empty, since two empty tensors can be allocated in different memory, which results in different uninitialized data. Does that make sense to you?

That sounds great!
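A hedged sketch of that kind of check is below. The real tests go through the opinfo machinery in opinfos.py; this standalone version assumes thunder.jit as the entry point and that thunder's torch.empty accepts a tuple shape, so treat it as illustrative only.

```python
import torch
import thunder  # assumes lightning-thunder is installed

def test_empty_checks_metadata_only():
    def fn(shape):
        return torch.empty(shape, device="cpu", dtype=torch.float32)

    jfn = thunder.jit(fn)           # entry-point name assumed; older releases used thunder.compile
    expected = fn((2, 3))
    actual = jfn((2, 3))

    # Only metadata is compared. The contents of an empty tensor are uninitialized,
    # so a value comparison (e.g. torch.testing.assert_close) would be flaky by design.
    assert actual.shape == expected.shape
    assert actual.device == expected.device
    assert actual.dtype == expected.dtype
```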


k223kim commented May 8, 2024

Thanks so much @mruberry! This greatly helped my understanding of how things work! Appreciate your help as always. 🚀
