Cannot create context on NVIDIA device with ECC enabled
<p>On a node with 4 NVIDIA GPUs, I enabled ECC memory protection on device 0 (all others have ECC disabled). Since enabling ECC on device 0, my CUDA application, which uses just one device, hangs when it tries to create the context on device 0 through the driver API. I don't know why it hangs at that point. If I set <code>CUDA_VISIBLE_DEVICES</code> to select a different device, it works fine, so it must have to do with enabling ECC. Any thoughts? Here is the output of <code>nvidia-smi</code>. (Why does it report 99% volatile GPU utilization when nothing is running there?)</p>
<pre><code>+------------------------------------------------------+
| NVIDIA-SMI 4.304.54   Driver Version: 304.54         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m               | 0000:02:00.0     Off |                    1 |
| N/A   29C    P0    49W / 225W |   0%   12MB / 4799MB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m               | 0000:03:00.0     Off |                    0 |
| N/A   22C    P8    15W / 225W |   0%   12MB / 4799MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m               | 0000:83:00.0     Off |                    0 |
| N/A   22C    P8    24W / 225W |   0%   11MB / 4799MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20m               | 0000:84:00.0     Off |                    0 |
| N/A   23C    P8    25W / 225W |   0%   11MB / 4799MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+</code></pre>
<p>EDIT: <code>nvidia-smi -a</code> reports ECC enabled on all devices. Strange!</p>
<pre><code>==============NVSMI LOG==============

Timestamp : Fri Apr 26 10:18:14 2013
Driver Version : 304.54

Attached GPUs : 4
GPU 0000:02:00.0
    Product Name : Tesla K20m
    Display Mode : Disabled
    Persistence Mode : Enabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324512044699
    VBIOS Version : 80.10.11.00.0B
    Inforom Version
        Image Version : 2081.0208.01.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : Compute
        Pending : Compute
    PCI
        Bus : 0x02
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102810DE
        Bus Id : 0000:02:00.0
        Sub System Id : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        User Defined Clocks : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 12 MB
        Free : 4787 MB
    Compute Mode : Default
    Utilization
        Gpu : 99 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 1
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 1
        Aggregate
            Single Bit
                Device Memory : 1
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 1
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 29 C
    Power Readings
        Power Management : Supported
        Power Draw : 49.51 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None

GPU 0000:03:00.0
    Product Name : Tesla K20m
    Display Mode : Disabled
    Persistence Mode : Enabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324512044821
    VBIOS Version : 80.10.11.00.0B
    Inforom Version
        Image Version : 2081.0208.01.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : Compute
        Pending : Compute
    PCI
        Bus : 0x03
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102810DE
        Bus Id : 0000:03:00.0
        Sub System Id : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        User Defined Clocks : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 12 MB
        Free : 4787 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 19 C
    Power Readings
        Power Management : Supported
        Power Draw : 15.22 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 324 MHz
        SM : 324 MHz
        Memory : 324 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None

GPU 0000:83:00.0
    Product Name : Tesla K20m
    Display Mode : Disabled
    Persistence Mode : Enabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324512044783
    VBIOS Version : 80.10.11.00.0B
    Inforom Version
        Image Version : 2081.0208.01.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : Compute
        Pending : Compute
    PCI
        Bus : 0x83
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102810DE
        Bus Id : 0000:83:00.0
        Sub System Id : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        User Defined Clocks : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 11 MB
        Free : 4788 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 22 C
    Power Readings
        Power Management : Supported
        Power Draw : 24.74 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 324 MHz
        SM : 324 MHz
        Memory : 324 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None

GPU 0000:84:00.0
    Product Name : Tesla K20m
    Display Mode : Disabled
    Persistence Mode : Enabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0324512044628
    VBIOS Version : 80.10.11.00.0B
    Inforom Version
        Image Version : 2081.0208.01.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : Compute
        Pending : Compute
    PCI
        Bus : 0x84
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102810DE
        Bus Id : 0000:84:00.0
        Sub System Id : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        User Defined Clocks : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 11 MB
        Free : 4788 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Temperature
        Gpu : 23 C
    Power Readings
        Power Management : Supported
        Power Draw : 25.47 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 324 MHz
        SM : 324 MHz
        Memory : 324 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None
</code></pre>
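The full `nvidia-smi -a` log is long, and the two details that matter for this question (the ECC mode and the volatile error counters of each GPU) are easy to miss in it. The following is a minimal sketch of a script that pulls just those fields out of the text. It assumes the `Key : Value` layout shown in the log above; the names `summarize_ecc` and `SAMPLE` are illustrative and not part of nvidia-smi, and `SAMPLE` is a heavily abbreviated copy of two GPU sections from that log.

```python
import re

# Abbreviated excerpt in the same "Key : Value" layout as the log in the
# question; most fields are omitted to keep the example short.
SAMPLE = """
Attached GPUs : 4
GPU 0000:02:00.0
    Product Name : Tesla K20m
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Total : 0
            Double Bit
                Device Memory : 1
                Total : 1
GPU 0000:03:00.0
    Product Name : Tesla K20m
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Total : 0
"""

# A GPU section starts with "GPU <domain>:<bus>:<device>.<function>".
GPU_HEADER = re.compile(
    r"\bGPU ([0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
)
ECC_MODE = re.compile(r"Ecc Mode Current : (\S+) Pending : (\S+)")
# The volatile counters list the single-bit total before the double-bit total.
VOLATILE = re.compile(
    r"Volatile Single Bit .*? Total : (\d+) Double Bit .*? Total : (\d+)"
)


def summarize_ecc(log_text):
    """Return one dict per GPU section found in `nvidia-smi -a` style text."""
    # Collapse all whitespace so the same patterns work on the original
    # multi-line output and on a copy whose line breaks were lost.
    flat = " ".join(log_text.split())
    # Splitting on a pattern with one capture group yields
    # [preamble, id1, body1, id2, body2, ...].
    parts = GPU_HEADER.split(flat)
    summaries = []
    for bus_id, body in zip(parts[1::2], parts[2::2]):
        mode = ECC_MODE.search(body)
        volatile = VOLATILE.search(body)
        summaries.append({
            "bus_id": bus_id,
            "ecc_current": mode.group(1) if mode else None,
            "ecc_pending": mode.group(2) if mode else None,
            "volatile_single_bit": int(volatile.group(1)) if volatile else None,
            "volatile_double_bit": int(volatile.group(2)) if volatile else None,
        })
    return summaries


if __name__ == "__main__":
    for gpu in summarize_ecc(SAMPLE):
        print(gpu)
```

On the sample, both GPUs report ECC as enabled, and only `0000:02:00.0` has a non-zero volatile double-bit total. That appears to correspond to the `1` in the "Volatile Uncorr. ECC" column of the `nvidia-smi` table for device 0.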
 
