Installing nVidia CUDA 2.3 on gentoo GNU/Linux

Brief notes on installation using gentoo's ebuild portage system

(If your gentoo build is more than a few months old, you might want to ensure its portage files are up to date.)

  1. nVidia supports CUDA on a number of Linux platforms. However, gentoo is not one of them.

  2. The three components of CUDA (nVidia device drivers, CUDA and CUDA SDK 2.3) can be installed in that order.

    Possibly "emerge nvidia-cuda-sdk" could be used to ensure all the missing components required by CUDA SDK are downloaded and installed. However this is not necessary if the three components are installed in order.

  3. These notes suggest using gentoo's ebuild package to download the required nVidia files (rather than doing it yourself).

  4. Log in as root.
    Eg via sudo -i

    Some of these steps have to be done as root. Eg adding drivers to the Linux kernel and changing user account privileges. Other parts might be done as a normal user but this has not been investigated.

  5. The packages and their versions supported by portage are stored in /usr/portage/
    (The unix command find /usr/portage -iname '*nvidia*.ebuild' may be helpful.)

    The ebuild option merge will both fetch the required installation kit and install it. However I preferred to separate the download and installation phases by using ebuild fetch and later ebuild merge.
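To see which nVidia ebuilds the portage tree offers before running ebuild, the find command can be wrapped in a small helper. This is only a sketch: list_nvidia_ebuilds is a made-up name, and the tree is assumed to live in /usr/portage unless another directory is given.

```shell
# List nVidia ebuilds under a portage tree, one path per line, sorted.
# list_nvidia_ebuilds is a hypothetical helper name, not part of portage.
# The tree location defaults to /usr/portage as in the notes above.
list_nvidia_ebuilds() (
    tree=${1:-/usr/portage}
    find "$tree" -type f -iname '*nvidia*.ebuild' | sort
)

# Examples:
#   list_nvidia_ebuilds
#   list_nvidia_ebuilds /usr/portage/dev-util
```

The paths it prints can be passed directly to ebuild fetch and ebuild merge.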

  6. Install nVidia device drivers.
    For CUDA 2.3 it appears version 190.* is required. I have used 190.42-r2
    ebuild /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-190.42-r2.ebuild fetch
    ebuild /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-190.42-r2.ebuild merge
    
    (Still as root) the nVidia drivers installation said it needed the following commands:
    modprobe -r nvidia 
    eselect opengl set nvidia
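Since CUDA 2.3 appears to need a 190.* driver, it may be worth checking which version is actually installed. The sketch below assumes that modinfo can find the nvidia kernel module and that its version field holds the driver version. driver_ok_for_cuda23 is a made-up helper name.

```shell
# Succeeds if the given driver version string is in the 190.* series,
# which is the series these notes found to work with CUDA 2.3.
# driver_ok_for_cuda23 is a hypothetical helper name.
driver_ok_for_cuda23() {
    case "$1" in
        190.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (assumes the nvidia kernel module is installed):
#   driver_ok_for_cuda23 "$(modinfo -F version nvidia)" \
#       || echo "nvidia driver is not a 190.* version"
```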
    

  7. Install nVidia CUDA
    ebuild /usr/portage/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-2.3.ebuild fetch
    ebuild /usr/portage/dev-util/nvidia-cuda-toolkit/nvidia-cuda-toolkit-2.3.ebuild merge
    
    I have not used /etc/profile

  8. Install nVidia CUDA SDK 2.3
    ebuild /usr/portage/dev-util/nvidia-cuda-sdk/nvidia-cuda-sdk-2.3.ebuild fetch
    ebuild /usr/portage/dev-util/nvidia-cuda-sdk/nvidia-cuda-sdk-2.3.ebuild merge
    
    There are various compilation warnings (eg for bank_checker) but the build seems to go through ok.
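The three installs above all follow the same pattern: ebuild fetch, then ebuild merge on the same file. A minimal sketch of that pattern as a shell helper follows. The function names and the PORTAGE_TREE variable are made up for illustration, and the merge is attempted only if the fetch succeeds.

```shell
# Build the path of an ebuild inside the portage tree.
# PORTAGE_TREE is a hypothetical variable; it defaults to /usr/portage.
ebuild_path() (
    category=$1
    package=$2
    version=$3
    printf '%s/%s/%s/%s-%s.ebuild\n' \
        "${PORTAGE_TREE:-/usr/portage}" "$category" "$package" "$package" "$version"
)

# Fetch first, and merge only if the fetch succeeded. Needs root.
fetch_then_merge() {
    path=$(ebuild_path "$@") || return 1
    ebuild "$path" fetch && ebuild "$path" merge
}

# Usage, as root, in the order used above:
#   fetch_then_merge x11-drivers nvidia-drivers      190.42-r2
#   fetch_then_merge dev-util    nvidia-cuda-toolkit 2.3
#   fetch_then_merge dev-util    nvidia-cuda-sdk     2.3
```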

  9. Added relevant users to video group by editing /etc/group

    I have not (as yet) added nvidia-cuda-profiler or nvidia-cg-toolkit
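For the video group membership in this step, a quick check of the group file can confirm the edit took. The sketch below is an illustration only: in_group is a made-up helper, wbl is a made-up user name, and gpasswd is shown as an alternative to editing /etc/group by hand rather than as what was done here.

```shell
# Succeeds if user $1 is listed as a member of group $2 in a group file.
# Only the member list (4th field) is checked, not the user's primary group.
# in_group is a hypothetical helper; the file defaults to /etc/group.
in_group() {
    awk -F: -v user="$1" -v group="$2" '
        $1 == group {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) found = 1
        }
        END { exit !found }
    ' "${3:-/etc/group}"
}

# Usage (wbl is a made-up user name; gpasswd needs root):
#   in_group wbl video || gpasswd -a wbl video
```

A user who is already logged in has to log in again before the new group membership takes effect.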

  10. Setting up X11

    This was by far the most troublesome part.
    Most of the problems stem from the fact that the PC has three different video adapters.

    1. One on the mother board. This appears to be disabled by the addition of any video device on the PCI bus.
    2. The GeForce GTX 295. Confusingly this GPU appears on the PCI bus and elsewhere as two devices.
    3. Finally there is a small Silicon Integrated Systems VGA compatible controller.
      In the previous configuration, whose main GPGPU was a videoless Tesla, this was used to drive the system monitor. I had hoped to continue to use it for graphics output and leave the 295 only for calculation. This has not been possible (so far), which means the interactive system appears to hang whilst the 295 is busy doing calculations and so not refreshing the screen.

    Our gentoo X11 creates /etc/X11/XF86Config as the system starts (deleting the one that was there). Unfortunately it did not manage to get this correct automatically and so X11 failed to start. Eventually we found the hardware configuration (above) which works and adapted X11 to it, saving the X11 configuration in xorg.conf. A fragment of xorg.conf:

    Section "Device"
    	Identifier  "Card0"
    	Driver      "nvidia"
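With several video adapters in one PC, a Device section in xorg.conf can also carry a BusID line to say which card it refers to. Whether the configuration above used one is not recorded, so treat this as an assumption. Xorg expects the form PCI:bus:device:function in decimal, whereas lspci prints the slot in hexadecimal. The sketch below converts one to the other. pci_slot_to_busid is a made-up helper name.

```shell
# Convert an lspci slot such as "0a:00.0" (hexadecimal bus:device.function)
# into the decimal "PCI:bus:device:function" form used by BusID in xorg.conf.
# Assumes the slot has no PCI domain prefix (lspci's usual default output).
# pci_slot_to_busid is a hypothetical helper name.
pci_slot_to_busid() (
    slot=$1
    bus=${slot%%:*}      # text before the first ':'
    rest=${slot#*:}      # text after the first ':'
    dev=${rest%%.*}      # text before the '.'
    fn=${rest#*.}        # text after the '.'
    printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
)

# Usage: take the slot from the first column of the lspci output, eg
#   pci_slot_to_busid 0a:00.0
```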
    

  11. create Linux devices for the new hardware

    There are command scripts to do this but since I have only one device it seemed easier to create my own. (This needs to be run after each reboot.) The first mknod creates a Linux device for the hardware called "nvidia0". The 295 GTX appears as two devices, so nvidia1 is needed. If you have more nVidia cards, the next one should be nvidia2, and so on. The last mknod creates a (pseudo) Linux device "nvidiactl" shared by all your nVidia hardware.

    #!/bin/bash
    # WBL 26 Nov 2009 GeForce 295 treated as two cards
    #
    #REHL.bat
    # much simplified by WBL 19 March 2009
    #
    # Startup script for nVidia CUDA
    #
    # chkconfig: 345 80 20
    # description: Startup/shutdown script for nVidia CUDA
    
    mknod  -m 666 /dev/nvidia0   c 195   0
    mknod  -m 666 /dev/nvidia1   c 195   1
    #mknod -m 666 /dev/nvidia2   c 195   2 #next card
    mknod  -m 666 /dev/nvidiactl c 195 255
    
    That's it. All should be well now.
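For a machine with a different number of cards, the mknod lines above can be generated rather than edited by hand. The following is a sketch, not a replacement for the script above. nvidia_mknod_commands is a made-up name, and it only prints the commands so they can be inspected before being run as root.

```shell
# Print the mknod commands for N nVidia devices plus the shared nvidiactl
# device, using major number 195 and mode 666 as in the script above.
# nvidia_mknod_commands is a hypothetical helper; it runs nothing itself.
nvidia_mknod_commands() (
    count=$1
    dev_dir=${2:-/dev}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "mknod -m 666 $dev_dir/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 $dev_dir/nvidiactl c 195 255"
)

# Usage, as root, for a GeForce 295 that appears as two devices:
#   nvidia_mknod_commands 2        # inspect the commands
#   nvidia_mknod_commands 2 | sh   # run them
```

As with the script above, this has to be repeated after each reboot.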

  12. Comfort checks. These should be done by the user (rather than root).
    The Linux command /usr/sbin/lspci lists what's connected to the PC's PCI bus. Amongst these you should see your nVidia card.

    The nVidia supplied SDK tool deviceQuery should now run and show details of your device:

    /opt/cuda/sdk/C/bin/linux/release/deviceQuery
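The checks in this step can be partly automated. The sketch below counts the nVidia adapters in lspci output and reports which of the expected device files are missing. Both function names are made up, and the text patterns matched in the lspci output are an assumption about how the cards are described there.

```shell
# Count nVidia display adapters in lspci output read from standard input.
# The matched descriptions are an assumption about lspci's wording.
count_nvidia_adapters() {
    grep -i 'nvidia' | grep -c -i -E 'vga compatible controller|3d controller' || true
}

# Print each expected device file that is not a character device.
# $1 is the number of nVidia devices, $2 the device directory (default /dev).
missing_nvidia_nodes() (
    count=$1
    dev_dir=${2:-/dev}
    i=0
    while [ "$i" -lt "$count" ]; do
        [ -c "$dev_dir/nvidia$i" ] || echo "$dev_dir/nvidia$i"
        i=$((i + 1))
    done
    [ -c "$dev_dir/nvidiactl" ] || echo "$dev_dir/nvidiactl"
    true
)

# Usage, as a normal user:
#   n=$(/usr/sbin/lspci | count_nvidia_adapters)
#   missing_nvidia_nodes "$n"
```

If missing_nvidia_nodes prints nothing, all the expected device files exist.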
    

  13. startup scripts

    Not done yet. See CUDA 2.1

  14. Starting to code
    /opt/cuda/sdk/ contains many examples and includes their C++ source code. /opt/cuda/sdk/C/src/simpleTemplates and /opt/cuda/sdk/C/src/template are good places to start.

  15. Early Coding problems:

CUDA 2.1
W. B. Langdon 28 November 2009.