Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

14,521 questions

0 votes

0 answers

23 views

Compilation Errors with CUDA Fortran and cuBLAS

I am trying to compile a Fortran program using CUDA and cuBLAS as per an example from the NVIDIA HPC SDK documentation. My setup includes an NVIDIA A100 GPU, and I have configured the CUDA and cuBLAS ...

Mao Yang

asked 7 hours ago

0 votes

1 answer

47 views

Problem with CMake and including a third-party library

I'm trying to properly configure CMake for my CUDA project. I'm using a third-party library, CGBN: https://github.com/NVlabs/CGBN/tree/master and Catch2 for unit tests. Basically I am trying to build ...

drzejus

asked 7 hours ago

-2 votes

0 answers

20 views

Parallel Computing function value with Julia CUDA

I have a function f(x,y,z) defined as global a = 1.0 function f(x, y, z) return a*x^2 + y*z end How can I calculate the sum of the function values at 10000 different points using CUDA? I asked GPT and ...

Watheophy

asked 14 hours ago

-2 votes

0 answers

19 views

CuPy - Changes in included file not updated

I have an external CUDA kernel function which I call using a RawKernel. This kernel function is defined in a first .cu file. From this kernel function, I call some auxiliary __device__ functions which ...

user23326971

asked yesterday

-2 votes

0 answers

39 views

How can I use multiple CPU cores while running CUDA with one GPU? [closed]

I am trying to run a simulation programmed with CUDA on a server that has a single Intel Xeon with 36 cores and a single P4000 GPU. When I used mpirun -np 16 ./sim, it failed to run, stating that MPI ...

조준호

asked yesterday

0 votes

0 answers

25 views

How many APIs does the CUTLASS library have?

I want to use the CUTLASS API. When I visited the CUTLASS website, I found that it lists a lot of class templates, including the common cutlass::gemm::device::gemm, etc. The problem is that there are so ...

user23931088

asked yesterday

0 votes

0 answers

47 views

Inconsistent global memory access between blocks despite use of volatile, threadfence and disabling L1 cache

In the following minimal reproducible example for the construction of a tree, where bodies are inserted based on their position (so a 1D version of a quad/octree), when multiple blocks are used, some ...

larrycaverga

asked 2 days ago

-4 votes

0 answers

31 views

Problems installing the NVIDIA driver on Ubuntu 22.04.2 LTS [closed]

The NVIDIA driver installation failed. I tried to install the NVIDIA driver with the command below: sudo apt install nvidia-driver-545-open but I got the following log: Building for 6.5.0-44-generic 6.5.1-...

Mason Wong

asked 2 days ago

1 vote

0 answers

36 views

Unable to include thrust/host_vector.h and others with CUDA 12.5

This test program compiled fine with CUDA 12.4 and lower, but fails to compile w/ 12.5.1: #include <thrust/host_vector.h> #include <thrust/scan.h> #include <iostream> int main() { ...

Matt

20.6k

asked Jul 23 at 15:05

-4 votes

1 answer

44 views

Continuously getting the error: 'nvidia-smi' is not recognized as an internal or external command, operable program or batch file [closed]

Disclaimer: I am not super experienced with Python. I have been trying to set up SAM (Segment Anything Model by Meta), but have been running into issues with installing PyTorch. I have followed ...

F.O.

asked Jul 23 at 12:01

0 votes

0 answers

43 views

What are the risks of increasing cudaLimitDevRuntimePendingLaunchCount?

I encountered an error while using dynamic parallelism: "launch failed because launch would exceed cudaLimitDevRuntimePendingLaunchCount". To resolve this issue, I increased ...

Weimin Chan

asked Jul 23 at 10:27

0 votes

1 answer

46 views

Can CUDA Thrust Kernels operate in parallel on multiple streams?

I am attempting to launch thrust::fill on two different device vectors in parallel on different CUDA streams. However, when I look at the kernel launches in Nsight Systems, they appear to be ...

Nicolas Perrault

asked Jul 22 at 18:28

-2 votes

0 answers

29 views

My benchmark and Nsight Compute don't agree which kernel is faster [closed]

I have two CUDA convolution kernels which perform a 1024x1024 convolution with a mask size of 3x3. I ran both of them 1000 times. The average execution time of kernel 1 is better than that of kernel 2 according ...

Madhusudana.A.V Madhusudan.A.V

asked Jul 22 at 15:14

0 votes

1 answer

56 views

cuBLAS matrix multiplication with row-major data

I read some related posts here and succeeded in multiplying row-major matrices with cuBLAS: A*B (column-major) = B*A (row-major). I wrote a wrapper to do this so that I can pass row ...

Weimin Chan

asked Jul 21 at 18:56

1 vote

0 answers

49 views

Cannot determine Numba type of <class 'clr._internal.CLRMetatype'>

I am very new to CUDA... I have written a module in .NET 6.0 and I need to improve its execution time using CUDA on an Ubuntu machine. The method I need to call is defined as: namespace ...

Gregory Gasteratos

asked Jul 21 at 17:08
