MPS backend thinks that subnormals are equal to zero #125051

Open
malfet opened this issue Apr 26, 2024 · 5 comments
Labels
module: correctness (silent) issue that returns an incorrect result silently module: mps Related to Apple Metal Performance Shaders framework triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@malfet
Contributor

malfet commented Apr 26, 2024

🐛 Describe the bug

import torch
x = torch.tensor([0.0, 0.0])
y = torch.tensor([1.0, -1.0])
na = torch.nextafter(x, y) 
print(na, na > x, na.to("mps") > x.to("mps"))

Prints

tensor([ 1.4013e-45, -1.4013e-45]) tensor([ True, False]) tensor([False, False], device='mps:0')
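
For context, the printed `1.4013e-45` is the smallest positive subnormal float32 value, 2^-149, whose bit pattern is `0x00000001`. The following is a minimal stdlib-only sketch for inspecting such values without PyTorch; the helper name `f32_from_bits` is made up for illustration and is not part of any library.

```python
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


smallest_subnormal = f32_from_bits(0x00000001)  # 2**-149
smallest_normal = f32_from_bits(0x00800000)     # 2**-126

print(smallest_subnormal == 2.0 ** -149)           # True
print(smallest_normal == 2.0 ** -126)              # True
print(0.0 < smallest_subnormal < smallest_normal)  # True
print(f"{smallest_subnormal:.4e}")                 # 1.4013e-45
```

Every float32 value whose magnitude lies strictly between 0 and `smallest_normal` is subnormal, which is the range this issue is about.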

Versions

2.3.0, nightlies

cc @kulinseth @albanD @DenisVieriu97 @jhavukainen

@malfet malfet added module: correctness (silent) issue that returns an incorrect result silently module: mps Related to Apple Metal Performance Shaders framework labels Apr 26, 2024
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 26, 2024
@skotapati

Confirmed that this issue still occurs on the latest torch; we're investigating on the MPS side.

@skotapati

MPS does not natively support double, so the float value next after zero will be subnormal. This is treated as zero, so MPS results not matching CPU in this case is intended behavior. Closing this issue for now.
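
The "treated as zero" behavior can be imitated on the CPU. The sketch below is only an emulation under an assumption: it supposes the device applies denormals-are-zero (DAZ) semantics to the inputs of the comparison, which this thread does not confirm. The names `daz` and `gt_daz` are hypothetical helpers, not MPS or PyTorch APIs.

```python
import math

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32


def daz(x: float) -> float:
    """Emulate denormals-are-zero: map float32 subnormals to a signed zero."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def gt_daz(a: float, b: float) -> bool:
    """Compare a > b after flushing subnormal inputs (assumed device behavior)."""
    return daz(a) > daz(b)


# nextafter(0, 1) and nextafter(0, -1) in float32
na = [2.0 ** -149, -(2.0 ** -149)]

print([v > 0.0 for v in na])          # IEEE 754 comparison: [True, False]
print([gt_daz(v, 0.0) for v in na])   # DAZ emulation:       [False, False]
```

The two printed lists match the CPU and `mps:0` results from the original report, which is consistent with the explanation above but does not prove where the flushing happens.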

@hvaara
Contributor

hvaara commented May 3, 2024

@skotapati Thanks for clarifying! Is this a hardware limitation? Could MPS possibly support subnormal numbers in fp32 in the future if it became a priority?

From what I can tell, subnormal numbers are supported in fp16. Is this correct? Would operations on subnormal numbers behave differently from operations on normal numbers in fp16 (e.g., be slower)?

Assuming there are adverse effects, is there a way to flag FTZ/DAZ with passthrough in MPS?
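
As background for the fp16 question, the sketch below shows where the subnormal range of IEEE 754 binary16 sits. It describes the number format only and says nothing about what the Metal hardware does with these values; `f16_from_bits` is a hypothetical helper name.

```python
import struct


def f16_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit unsigned integer as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


smallest_f16_subnormal = f16_from_bits(0x0001)  # 2**-24
smallest_f16_normal = f16_from_bits(0x0400)     # 2**-14

print(smallest_f16_subnormal == 2.0 ** -24)  # True
print(smallest_f16_normal == 2.0 ** -14)     # True

# An fp16 subnormal is far above the float32 normal threshold of 2**-126,
# so the same value stored as float32 is an ordinary normal number.
print(smallest_f16_subnormal > 2.0 ** -126)  # True
```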

@malfet
Contributor Author

malfet commented May 7, 2024

Weird: nextafter(0.0, 1.0) > 0.0 indeed returns false on a Metal device, according to this standalone reproducer:

import Metal

let shader_source = """
#include <metal_stdlib>
using namespace metal;

kernel void nextafter_pred(device float *data [[buffer(0)]],
                           device bool *pred [[buffer(1)]],
                           uint thread_index [[thread_position_in_grid]]) {
  data[thread_index] = nextafter(float(thread_index) - 8.0, 1e4);
  pred[thread_index] = data[thread_index] > 0.0;
}
"""

guard let device = MTLCopyAllDevices().first else { fatalError("No Metal device found") }

print("Using device \(device.name)")

let options = MTLCompileOptions()
options.languageVersion = .version3_1
options.fastMathEnabled = false
let library = try! device.makeLibrary(source:shader_source, options:options)
guard let mfunc = library.makeFunction(name: "nextafter_pred") else { fatalError("Can't find function") }

let nelem = 256
guard let dbuf = device.makeBuffer(length:nelem * MemoryLayout<Float>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") }
guard let pbuf = device.makeBuffer(length:nelem * MemoryLayout<Bool>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") }

guard let queue = device.makeCommandQueue() else { fatalError("Can't make queue") }
guard let cmdBuffer = queue.makeCommandBuffer() else { fatalError("Can't make command buffer") }
guard let computeEncoder = cmdBuffer.makeComputeCommandEncoder() else { fatalError("Can't make compute encoder") }
computeEncoder.setComputePipelineState(try! device.makeComputePipelineState(function: mfunc))
computeEncoder.setBuffer(dbuf, offset:0, index: 0)
computeEncoder.setBuffer(pbuf, offset:0, index: 1)
computeEncoder.dispatchThreads(MTLSizeMake(nelem, 1, 1), threadsPerThreadgroup:MTLSizeMake(nelem, 1, 1))
computeEncoder.endEncoding()
cmdBuffer.commit()
cmdBuffer.waitUntilCompleted()

let float_data = dbuf.contents().assumingMemoryBound(to: Float.self)
let bool_data = pbuf.contents().assumingMemoryBound(to: Bool.self)
for i in 0..<16 {
    print("\(i): \(float_data[i]) >0 is \(bool_data[i])")
}

@malfet malfet changed the title MPS backend thinks that small floats are less than zero MPS backend thinks that subnormals are equal to zero May 7, 2024
@hvaara
Contributor

hvaara commented May 7, 2024

What does hardcoding data[thread_index] = nextafter(nextafter(float(0), float(1)), float(1)) give you?
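
For reference, here is what that nested expression evaluates to under plain IEEE 754 float32 arithmetic. This is a stdlib-only sketch with a hypothetical helper, `next_up_f32`, that steps to the next float32 by incrementing the bit pattern; it only handles finite, non-negative inputs and is not a general `nextafter`.

```python
import math
import struct


def next_up_f32(x: float) -> float:
    """Next float32 above x, for finite non-negative x only (bit increment)."""
    if math.isnan(x) or math.isinf(x) or math.copysign(1.0, x) < 0:
        raise ValueError("only finite, non-negative inputs are supported")
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


once = next_up_f32(0.0)    # nextafter(0, 1)
twice = next_up_f32(once)  # nextafter(nextafter(0, 1), 1)

print(once == 2.0 ** -149)   # True
print(twice == 2.0 ** -148)  # True
print(twice < 2.0 ** -126)   # True: the result is still subnormal
```

Since the second step is still far below the smallest normal float32, a device that flushes subnormals would presumably report `> 0.0` as false for it too; that has not been checked on Metal here.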
