Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

0 votes
0 answers
23 views

Compilation Errors with CUDA Fortran and cuBLAS

I am trying to compile a Fortran program using CUDA and cuBLAS as per an example from the NVIDIA HPC SDK documentation. My setup includes an NVIDIA A100 GPU, and I have configured the CUDA and cuBLAS ...
Mao Yang
0 votes
1 answer
47 views

Problem with CMake and including a third-party library

I'm trying to configure CMake properly for my CUDA project. I'm using a third-party library, CGBN: https://github.com/NVlabs/CGBN/tree/master, and Catch2 for unit tests. Basically I am trying to build ...
drzejus
-2 votes
0 answers
20 views

Computing function values in parallel with Julia CUDA

I have a function f(x, y, z) defined as: global a = 1.0; function f(x, y, z) return a*x^2 + y*z end. How can I calculate the sum of the function values at 10000 different points using CUDA? I asked GPT and ...
Watheophy (115 reputation)
-2 votes
0 answers
19 views

CuPy: changes in included file not updated

I have an external CUDA kernel function that I call using a RawKernel. This kernel is defined in one .cu file. From this kernel, I call some auxiliary __device__ functions which ...
user23326971
-2 votes
0 answers
39 views

How can I use multiple CPU cores while running CUDA with one GPU? [closed]

I am trying to run a simulation programmed with CUDA on a server that has a single Intel Xeon with 36 cores and a single P4000 GPU. When I used mpirun -np 16 ./sim, it failed to run, stating that MPI ...
조준호
0 votes
0 answers
25 views

How many APIs does the CUTLASS library have?

I want to use the CUTLASS API. When I visited the CUTLASS website, I found that it lists a lot of class templates, including the common cutlass::gemm::device::gemm, etc. The problem is that there are so ...
user23931088
0 votes
0 answers
47 views

Inconsistent global memory access between blocks despite use of volatile, threadfence and disabling L1 cache

In the following minimal reproducible example for the construction of a tree, where bodies are inserted based on their position (so a 1D version of a quadtree/octree), when multiple blocks are used, some ...
larrycaverga
-4 votes
0 answers
31 views

Problems installing the NVIDIA driver on Ubuntu 22.04.2 LTS [closed]

Installing the NVIDIA driver failed. I tried to install it with the command below: sudo apt install nvidia-driver-545-open but I got the following log: Building for 6.5.0-44-generic 6.5.1-...
Mason Wong
1 vote
0 answers
36 views

Unable to include thrust/host_vector.h and others with CUDA 12.5

This test program compiled fine with CUDA 12.4 and lower, but fails to compile with 12.5.1: #include <thrust/host_vector.h> #include <thrust/scan.h> #include <iostream> int main() { ...
Matt (20.6k reputation)
-4 votes
1 answer
44 views

Continuously getting the error: 'nvidia-smi' is not recognized as an internal or external command, operable program or batch file [closed]

Disclaimer: I am not very experienced with Python. I have been trying to set up SAM (Segment Anything Model by Meta), but have been running into issues installing PyTorch. I have followed ...
F.O. (1 reputation)
0 votes
0 answers
43 views

What are the risks of increasing cudaLimitDevRuntimePendingLaunchCount?

I encountered an error while using dynamic parallelism: "launch failed because launch would exceed cudaLimitDevRuntimePendingLaunchCount". To resolve this issue, I increased ...
Weimin Chan
0 votes
1 answer
46 views

Can CUDA Thrust kernels operate in parallel on multiple streams?

I am attempting to launch thrust::fill on two different device vectors in parallel on different CUDA streams. However, when I look at the kernel launches in Nsight Systems, they appear to be ...
Nicolas Perrault
-2 votes
0 answers
29 views

My benchmark and Nsight Compute don't agree on which kernel is faster [closed]

I have two CUDA convolution kernels that convolve a 1024x1024 input with a 3x3 mask. I ran both of them 1000 times. The average execution time of kernel 1 is better than that of kernel 2 according ...
Madhusudana.A.V Madhusudan.A.V
0 votes
1 answer
56 views

cuBLAS matrix multiplication with row-major data

I read some related posts here and successfully performed row-major matrix multiplication with cuBLAS: A*B (column-major) = B*A (row-major). I wrote a wrapper to do this so that I can pass row ...
Weimin Chan
1 vote
0 answers
49 views

Cannot determine Numba type of <class 'clr._internal.CLRMetatype'>

I am very new to CUDA... I have written a module in .NET 6.0 and I need to speed up its execution using CUDA on an Ubuntu machine. The method I need to call is defined as: namespace ...
Gregory Gasteratos
