Possibly "emerge nvidia-cuda-sdk" could be used to ensure that all the components required by the CUDA SDK are downloaded and installed. However, this is not necessary if the three components are installed in order.
Some of these steps have to be done as root, e.g. adding drivers to the Linux kernel and changing user account privileges. Other parts might be possible as a normal user, but this has not been investigated.
The ebuild option merge will both fetch the required installation kit and install it. However, I preferred to separate the download and installation phases by using ebuild fetch and later ebuild merge.
ebuild /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-190.42-r2.ebuild fetch
ebuild /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-190.42-r2.ebuild merge

(Still as root) the nVidia driver installation said it needed the following commands:
modprobe -r nvidia
eselect opengl set nvidia
ebuild /usr/portage/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-2.3.ebuild fetch
ebuild /usr/portage/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-2.3.ebuild merge

I have not used /etc/profile.
ebuild /usr/portage/dev-util/nvidia-cuda-sdk/nvidia-cuda-sdk-2.3.ebuild fetch
ebuild /usr/portage/dev-util/nvidia-cuda-sdk/nvidia-cuda-sdk-2.3.ebuild merge

There are various compilation warnings (e.g. for bank_checker) but the build seems to go through OK.
I have not (as yet) added nvidia-cuda-profiler or nvidia-cg-toolkit.
This was by far the most troublesome part.
Most of the problems stem from the fact that the PC has three different video adapters.
Our Gentoo X11 creates /etc/X11/XF86Config as the system starts (deleting the one that was there). Unfortunately it did not manage to get this correct automatically, and so X11 failed to start. Eventually we found the hardware configuration (above) which works and adapted X11 to it, saving the X11 configuration in xorg.conf. A fragment of xorg.conf:
Section "Device"
    Identifier "Card0"
    Driver     "nvidia"
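For orientation, a complete Device section would look like the sketch below. The BusID value is hypothetical; replace it with the value lspci reports for your card.

```
Section "Device"
    Identifier "Card0"
    Driver     "nvidia"
    BusID      "PCI:3:0:0"    # hypothetical; find yours with: lspci | grep -i vga
EndSection
```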
There are command scripts to do this but since I have only one device it seemed easier to create my own. (This needs to be run after each reboot.) The first mknod creates a Linux device for the hardware called "nvidia0". The 295 GTX appears as two devices, so nvidia1 is needed. If you have more nVidia cards, the next one should be nvidia2, and so on. The last mknod creates a (pseudo) Linux device "nvidiactl" shared by all your nVidia hardware.
#!/bin/bash
# WBL 26 Nov 2009 GeForce 295 treated as two cards
#
#REHL.bat
# much simplified by WBL 19 March 2009
#
# Startup script for nVidia CUDA
#
# chkconfig: 345 80 20
# description: Startup/shutdown script for nVidia CUDA

mknod -m 666 /dev/nvidia0 c 195 0
mknod -m 666 /dev/nvidia1 c 195 1
#mknod -m 666 /dev/nvidia2 c 195 2  # next card
mknod -m 666 /dev/nvidiactl c 195 255

That's it. All should be well now.
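A quick way to check the script did its job is to verify that the device nodes exist and are character devices. This is a small sketch; it only checks the nodes the script above creates.

```shell
#!/bin/sh
# Check each expected CUDA device node (created with major number 195 above).
for dev in /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl; do
    if [ -c "$dev" ]; then
        echo "$dev OK"
    else
        echo "$dev missing - re-run the mknod script"
    fi
done
```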
The nVidia-supplied SDK tool deviceQuery should now run and show details of your devices:
Not done yet. See CUDA 2.1
Try example /opt/cuda/sdk/C/bin/linux/release/simpleMultiGPU
Fix: Enable remote logins by adding

/etc/init.d/sshd start

to /etc/conf.d/local.start. Disable X11 on the GTX 295. Log in remotely and run CUDA jobs on the GeForce 295 (which is now free). You may now have to explicitly create /dev/nvidia* (cf. 11 above).
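For clarity, the local.start addition is just the single line below; Gentoo runs /etc/conf.d/local.start at the end of the boot sequence.

```
# /etc/conf.d/local.start -- commands here run at the end of boot
/etc/init.d/sshd start
```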
Run time limit on kernels: No on both CUDA-enabled devices (i.e. the two components of the 295).
Fix: The problem was caused by moving to a new version of the GNU C++ compiler, which now deprecates iostream.h etc. Resolved by adding -I/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include/g++-v4/backward to the g++ command line.
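As an illustration, a compile line with the fix applied might look like the fragment below. The source file name is hypothetical; the include path is the one given above and will differ with your gcc version.

```
g++ -I/usr/lib/gcc/i686-pc-linux-gnu/4.1.2/include/g++-v4/backward \
    -o bank_checker bank_checker.cpp
```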