In some models I have observed that many layers are executed on the CPU even though I use the GPUCompute runtime and all operations should be supported by the GPU.
I built two small models to reproduce the behavior:
The first model just takes a rendered image (on the GPU) as input and calculates its height and width using the Shape operation. This model only exists to prove that Shape can be executed on the GPU. This is what the model looks like:
After execution, all tensors were on the GPU!
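Roughly, the model does something like this (a minimal sketch rather than the exact export I used; torch._shape_as_tensor is a helper that, as far as I know, maps directly to an ONNX Shape node):

import torch

class ShapeModel(torch.nn.Module):
    def forward(self, x):
        # torch._shape_as_tensor returns the shape as a 1-D int64 tensor;
        # slicing keeps only the height and width entries.
        return torch._shape_as_tensor(x)[2:]

model = ShapeModel()
model.eval()
test = torch.rand(1, 3, 112, 224)
torch.onnx.export(model, test, "shape_test.onnx", opset_version=11,
                  input_names=['input'], output_names=['height_width'],
                  dynamic_axes={'input': {0: 'batch', 2: 'height', 3: 'width'}})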
The next model is artificially blown up, but it simply creates identity matrices (one per entry in the input batch). It is based on this PyTorch code:
import torch

class IdentityModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # Create one 3x3 identity matrix per batch entry; x.shape[0] is the
        # dynamic batch size, so the export needs a Shape node in ONNX.
        identity_mat = torch.zeros(x.shape[0], 3, 3)
        identity_mat[:, 0, 0] = 1
        identity_mat[:, 1, 1] = 1
        identity_mat[:, 2, 2] = 1
        return identity_mat

model = IdentityModel()
model.eval()
test = torch.rand(5, 3, 112, 224)
model(test)

torch.onnx.export(model, test, "identity_test.onnx", verbose=False,
                  opset_version=11, input_names=['input'],
                  output_names=['identity_matrices'],
                  dynamic_axes={'input': {0: 'batch', 2: 'height', 3: 'width'}},
                  do_constant_folding=True)
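As a quick sanity check (independent of the runtime), the ops in the exported file can be listed with the onnx package; the torch.zeros(x.shape[0], 3, 3) call and the indexed assignments expand into quite a few ONNX nodes, which is why the graph is so blown up:

import onnx

exported = onnx.load("identity_test.onnx")
# List every node with its op type and outputs to see which layers
# the runtime actually has to schedule.
for node in exported.graph.node:
    print(node.op_type, list(node.output))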
This is how the beginning of the model looks (I added CPU and GPU tags depending on where the corresponding tensor ended up in my experiment):
So for some reason, “Shape” seems to work on the GPU for the first example but not for the second.

(The discussion started at Possible memory leak in 0-dimensional tensors - #5 by josh-o)