From Olivier.Marsden at ec-lyon.fr Wed Jul 1 02:12:03 2009
From: Olivier.Marsden at ec-lyon.fr (Olivier Marsden)
Date: Wed, 01 Jul 2009 11:12:03 +0200
Subject: [cAos] problem booting caos/perceus nodes
In-Reply-To: <571f1a060906301241wcb82249w22f6bd9b12c31a1f@mail.gmail.com>
References: <4A4A2AFA.9090404@ec-lyon.fr> <571f1a060906301241wcb82249w22f6bd9b12c31a1f@mail.gmail.com>
Message-ID: <4A4B2863.3080500@ec-lyon.fr>

>> I now have two follow-up questions:
>> - I would like the nic cards (two dual-port cards per node) to be named
>> consistently, which doesn't happen for the moment; I know how to do
>> this with udev and 70-persistent-net.rules on a harddisk install, but
>> have no clue for a diskless install?
>
> What do you mean they are not consistent? Like on one boot, nic1 =
> eth0 and on the second nic2 = eth0?
>
> Do they use the same kernel module/driver?
>
> You can always create a nodescript to add the correct entries for
> udev, but it sounds like something else weirder is going on.

Each node has a hard-wired pci-express broadcom dual nic operated by the
tg3 driver, and an add-in pci-express intel dual nic card operated by the
e1000e driver. The order between 2 nics on the same card is reproducible,
i.e. if nic1-port1 = eth0 then nic1-port2 = eth1. However, occasionally
I'll have nic1-port1 = eth3 and nic2-port1 = eth0 (hope that's
understandable!).
If I want to use udev to correct this, would the recommended method be to
create a rule based on the driver, and put this in the vnfs?

>> - I am using a scratch partition on local hard disks. I have put a line
>> in the fstab in the vnfs, with the correct device name etc, and made
>> the mountpoint in the vnfs file system, but the mount is not performed
>> during the diskless install. What am I missing here? (sorry if this is
>> more of a perceus question; should I post it there?)
>
> Try to add "mount -a" to the /etc/rc.local in the VNFS.
> It has to do
> with the fact that /fastboot exists which bypasses local mounts. We
> already have this listed as a bug and are undergoing testing of
> solutions now.

I'll try that, thanks.

Olivier Marsden

From stefan at mdy.univie.ac.at Thu Jul 9 04:31:34 2009
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Thu, 9 Jul 2009 13:31:34 +0200
Subject: [cAos] Nvidia CUDA under Caos NSA 1.x
Message-ID: <20090709113134.GP19565@loop.mdy.univie.ac.at>

Just wanted to ask whether anyone has hints / suggestions concerning
the use of the NVIDIA CUDA environment under Caos NSA. In particular,
when downloading from nvidia, what distribution should I click (RHEL 4,
RHEL 5 ??)

Thanks in advance,

Stefan Boresch

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
Phone: -43-1-427752715                       Fax:   -43-1-427752790

From glykos at mbg.duth.gr Thu Jul 9 05:41:35 2009
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Thu, 9 Jul 2009 15:41:35 +0300 (EEST)
Subject: [cAos] Nvidia CUDA under Caos NSA 1.x
In-Reply-To: <20090709113134.GP19565@loop.mdy.univie.ac.at>
References: <20090709113134.GP19565@loop.mdy.univie.ac.at>
Message-ID:

> Just wanted to ask whether anyone has hints / suggestions concerning
> the use of the NVIDIA CUDA environment under Caos NSA. In particular,
> when downloading from nvidia, what distribution should I click (RHEL 4,
> RHEL 5 ??)

We had no problems with the following combination:

  NVIDIA-Linux-x86_64-180.22-pkg2.run
  cuda-sdk-linux-2.10.1215.2015-3233425.run
  cudatoolkit_2.1_linux64_fedora9.run

--
Dr Nicholas M.
Glykos, Department of Molecular Biology and Genetics,
Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece,
Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615,
http://utopia.duth.gr/~glykos/

From gmkurtzer at gmail.com Thu Jul 9 08:05:58 2009
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Thu, 9 Jul 2009 08:05:58 -0700
Subject: [cAos] Nvidia CUDA under Caos NSA 1.x
In-Reply-To: <20090709113134.GP19565@loop.mdy.univie.ac.at>
References: <20090709113134.GP19565@loop.mdy.univie.ac.at>
Message-ID: <571f1a060907090805v4b53f5bbr742d55b9021b1fc1@mail.gmail.com>

Try:

$ sudo smart install nvidia-cuda nvidia-cuda-devel

We have version 185.18.14 in nsa-testing too.

:)

On Thu, Jul 9, 2009 at 4:31 AM, Stefan Boresch wrote:
> Just wanted to ask whether anyone has hints / suggestions concerning
> the use of the NVIDIA CUDA environment under Caos NSA. In particular,
> when downloading from nvidia, what distribution should I click (RHEL 4,
> RHEL 5 ??)
>
> Thanks in advance,
>
> Stefan Boresch
>
> --
> Stefan Boresch
> Institute for Computational Biological Chemistry
> University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
> Phone: -43-1-427752715                       Fax:   -43-1-427752790
> _______________________________________________
> cAos mailing list
> cAos at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/caos
>

--
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/

From ron at nscee.edu Mon Jul 13 10:33:58 2009
From: ron at nscee.edu (Ron Young)
Date: Mon, 13 Jul 2009 10:33:58 -0700
Subject: [cAos] newbie question of disk layout
Message-ID: <20090713173358.C54CCFF892@ron2>

Hi Everyone:

I am a brand new caos user. I am trying to install caos-nsa on an x86-64
cluster and have a question about disk layout.

When I installed the os, I specified layout=bigsrv.
This created the following layout:

  Filesystem           1K-blocks      Used Available Use% Mounted on
  /dev/sda2             19694868   2375672  16318752  13% /
  /dev/sda1               489992     15069    449623   4% /boot
  /dev/sda5            939262724    206068 891344816   1% /srv
  none                    524288         4    524284   1% /tmp

Question: how can I specify that I want to create multiple filesystems
(e.g. /srv, /var, etc.)? Having one big filesystem will kill our
backups.

thanks

-ron young

===============================================================================
Ron Young, Research Support Analyst
National Supercomputing Center for Energy and the Environment
4505 Maryland Parkway, Box 454028, Las Vegas, NV 89154-4028
v (702) 895-4017 / f (702) 895-4156 / email: ron.young at nscee.edu

From gmkurtzer at gmail.com Mon Jul 13 10:54:35 2009
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Mon, 13 Jul 2009 10:54:35 -0700
Subject: [cAos] newbie question of disk layout
In-Reply-To: <20090713173358.C54CCFF892@ron2>
References: <20090713173358.C54CCFF892@ron2>
Message-ID: <571f1a060907131054k5ac17734qb8358e2dd4e3d898@mail.gmail.com>

Hello Ron,

One way would be to use the "layout=manual" boot option. It isn't the
simplest process in the world, but if you are familiar with fdisk, you
will be just fine.

We also have another installer that we hope to release soon (it is in
testing now). I can point you to it if you're interested. It takes a
kickstart-like configuration file and in there you can define any
partition layout you wish.

Thanks,
Greg

On Mon, Jul 13, 2009 at 10:33 AM, Ron Young wrote:
>
> Hi Everyone:
>
> I am a brand new caos user. I am trying to install caos-nsa on an x86-64
> cluster and have a question about disk layout.
>
> When I installed the os, I specified layout=bigsrv. This created the
> following layout:
>
>   Filesystem           1K-blocks      Used Available Use% Mounted on
>   /dev/sda2             19694868   2375672  16318752  13% /
>   /dev/sda1               489992     15069    449623   4% /boot
>   /dev/sda5            939262724    206068 891344816   1% /srv
>   none                    524288         4    524284   1% /tmp
>
> Question: how can I specify that I want to create multiple filesystems
> (e.g. /srv, /var, etc.)? Having one big filesystem will kill our
> backups.
>
> thanks
>
> -ron young
>
> ===============================================================================
> Ron Young, Research Support Analyst
> National Supercomputing Center for Energy and the Environment
> 4505 Maryland Parkway, Box 454028, Las Vegas, NV 89154-4028
> v (702) 895-4017 / f (702) 895-4156 / email: ron.young at nscee.edu
>

--
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/

From stefan at mdy.univie.ac.at Mon Jul 20 02:43:29 2009
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Mon, 20 Jul 2009 11:43:29 +0200
Subject: [cAos] Strange grub failure during install
Message-ID: <20090720094329.GW19565@loop.mdy.univie.ac.at>

Hi,

I am trying to throw caos on a new box using the base CD-ROM
(caos-nsa-base-1.0.8.x86_64.iso from caos.osuosl.org (md5sum checked both
on image and CD))

installer claims to have full hardware support, partitions, formats and
copies filesystem. However, grub install fails with something like:

  /sbin/grub-set-default line 96 cannot create tempfile
  mktemp [2 failure messages]
  /sbin/grub-install line 429 cannot create tempfile

and then hangs. I am a little bit stuck at this point. Booting into
rescue mode showed that there is indeed no /tmp on the root partition (but
isn't /tmp usually a ram file system anyways ??)

This is very recent hardware, but if the system knows how to format
the disk (I can chroot into the filesystem from rescue), it should be
able to install grub ?? Confused ...
Having always left installing grub to installers, a manual attempt
to install grub was without success (but that was most likely me)

Hints appreciated,

Stefan

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
Phone: -43-1-427752715                       Fax:   -43-1-427752790

From stefan at mdy.univie.ac.at Mon Jul 20 03:59:08 2009
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Mon, 20 Jul 2009 12:59:08 +0200
Subject: [cAos] Strange grub failure during install
In-Reply-To: <20090720094329.GW19565@loop.mdy.univie.ac.at>
References: <20090720094329.GW19565@loop.mdy.univie.ac.at>
Message-ID: <20090720105908.GX19565@loop.mdy.univie.ac.at>

So, for what it's worth, the normal (full) image has no problems ...

I think the problem is indeed a missing /tmp. When I rescue boot
into the aborted installation from the base CD, /newroot does not contain
a tmp directory. If I do this after an install from the full CD, there is
/newroot/tmp

Best regards,

Stefan

On Mon, Jul 20, 2009 at 11:43:29AM +0200, Stefan Boresch wrote:
> Hi,
>
> I am trying to throw caos on a new box using the base CD-ROM
> (caos-nsa-base-1.0.8.x86_64.iso from caos.osuosl.org (md5sum checked both
> on image and CD))
>
> installer claims to have full hardware support, partitions, formats and
> copies filesystem. However, grub install fails with something like:
>
>   /sbin/grub-set-default line 96 cannot create tempfile
>   mktemp [2 failure messages]
>   /sbin/grub-install line 429 cannot create tempfile
>
> and then hangs. I am a little bit stuck at this point. Booting into
> rescue mode showed that there is indeed no /tmp on the root partition (but
> isn't /tmp usually a ram file system anyways ??)
>
> This is very recent hardware, but if the system knows how to format
> the disk (I can chroot into the filesystem from rescue), it should be
> able to install grub ?? Confused ...
>
> Having always left installing grub to installers, a manual attempt
> to install grub was without success (but that was most likely me)
>
> Hints appreciated,
>
> Stefan
>
> --
> Stefan Boresch
> Institute for Computational Biological Chemistry
> University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
> Phone: -43-1-427752715                       Fax:   -43-1-427752790
>

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
Phone: -43-1-427752715                       Fax:   -43-1-427752790

From stefan at mdy.univie.ac.at Mon Jul 20 06:46:48 2009
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Mon, 20 Jul 2009 15:46:48 +0200
Subject: [cAos] Nvidia CUDA under Caos NSA 1.x
In-Reply-To: <571f1a060907090805v4b53f5bbr742d55b9021b1fc1@mail.gmail.com>
References: <20090709113134.GP19565@loop.mdy.univie.ac.at> <571f1a060907090805v4b53f5bbr742d55b9021b1fc1@mail.gmail.com>
Message-ID: <20090720134648.GZ19565@loop.mdy.univie.ac.at>

Greg,

On Thu, Jul 09, 2009 at 08:05:58AM -0700, Greg Kurtzer wrote:
> Try:
>
> $ sudo smart install nvidia-cuda nvidia-cuda-devel
>
> We have version 185.18.14 in nsa-testing too.
>
> :)
>

thanks for the welcome info. Some feedback, because it wasn't all
smooth sailing ... (all this applies to the 185.18.14 drivers in
nsa-testing -- I didn't try the default 180.x drivers) Oh, and this
is on 64bit Linux.

* Initially I thought that the abovementioned rpms include the cuda
  compiler and the SDK -- these still have to be installed by hand (I used
  the RHEL 5.3 package from the nvidia site)

  That should not really be an issue because installing these is simple
  enough. However, for some reason the 2.2.1 SDK would not install for me
  (extraction occurs, but the perl install script just hangs).
  I solved the problem by a 'cp -a' from /tmp into the default location;
  anyways, this looks like something nvidia has screwed up.

* Installing nvidia-devel / nvidia-cuda-devel enforces the deinstallation
  of freeglut-devel and mesalibs-devel. Unfortunately, no more glu.h and
  friends ... Many cuda testcases don't compile because of this.

  Since this is a test machine, I solved this by brute force, in the hope
  of not being bitten later:

    rpm -i --nodeps freeglut-devel mesalibs-devel
    rpm -i --nodeps --force nvidia-devel nvidia-cuda-devel

  Thus, I hopefully have the correct nvidia libs and include files,
  together with e.g. glu.h from freeglut-devel.

* The placing of libcuda in /usr/lib64/nvidia confuses the nvidia
  common.mk Makefile as it doesn't find -lcuda during linking. I fixed
  this by adding -L/usr/lib64/nvidia explicitly to $(LIB) in common.mk
  (strangely, this affects only a single test ...)

* The most fun then was that the cuda device (GTX285) wasn't found ...
  This was fixed by rerunning ldconfig. I think the following happened:
  One of the nvidia rpms apparently adds nvidia.conf containing
  /usr/lib64/nvidia to /etc/ld.so.conf.d, but it does not rerun ldconfig
  (or maybe it was only my brute force rpm fix above). Of course, I had
  not rebooted after (re)installing the rpms and so this crucial path was
  missing ...

Anyways, now things seem to work (all tests I actually executed seemed to
run fine), so I can start trying to make something useful with the
machine.

Thanks,

Stefan

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr.
17      A-1090 Vienna, Austria
Phone: -43-1-427752715                       Fax:   -43-1-427752790

From stefan at mdy.univie.ac.at Tue Jul 21 06:06:43 2009
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Tue, 21 Jul 2009 15:06:43 +0200
Subject: [cAos] Linux software raid
Message-ID: <20090721130643.GI19565@loop.mdy.univie.ac.at>

Am I overlooking something or are there no provisions / support for a
linux software raid in caos?

Note, I envision a relatively simplistic setup, i.e., a lowly
system disk which in case of a crash gets replaced and the
OS reinstalled, and two (storage) disks which depending on the system
are either configured as raid 0 (speed) or 1 (safety). In other words,
I don't even ask for booting from software raid ...

As far as I can figure out, caos doesn't provide rules for udev to
generate /dev/md* (presumably one could misuse CREATE_DEVS in
/etc/sysconfig/sysinit to generate missing devices); similarly,
adding raid0/raid1 to LOAD_MODULES (should) take(s) care of the
kernel support side. However, since

  mount -a -t tmpfs,ext2,ext3,ext4,reiserfs,xfs,jfs,btrfs,iso9660 -O no_netdev

is hardwired in sysinit, I don't see how I can assemble my raid before
sysinit is executed ... (other than editing /etc/init.d/sysconfig,
which presumably will get overwritten whenever /etc/init.d/sysconfig
is updated (*))

[(*) there is something going for the Debian policy that nothing ever
is supposed to overwrite files in /etc]

In one case I have worked around the issue by stuffing all the
necessary commands in rc.local, but it feels really clumsy ...

Please tell me that I am overlooking something obvious !

Thanks,

Stefan

PS: Offtopic: Any suggestions what file system to use for a 14TByte
storage partition (this is a hardware raid6 ;-) Ext3 (which I run
everywhere else) should be able to handle this, but I dread the first
fsck I have to run ...
The filesystem will mainly store large
"trajectories" from MD simulations (large (binary) files (1-10GB each)
which upon demand will be read sequentially)

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
Phone: -43-1-427752715                       Fax:   -43-1-427752790

From gas5x at yahoo.com Tue Jul 21 19:24:35 2009
From: gas5x at yahoo.com (Grigory Shamov)
Date: Tue, 21 Jul 2009 19:24:35 -0700 (PDT)
Subject: [cAos] Linux software raid
In-Reply-To: <20090721130643.GI19565@loop.mdy.univie.ac.at>
Message-ID: <795739.51588.qm@web111310.mail.gq1.yahoo.com>

That's a very interesting question!

On my cluster, with nodes being booted from diskless Perceus capsules, I
wanted to add a local scratch. So I built a raid-0 from the nodes' local
disks, for speed. Had to add mdadm to the caos capsule, created the raid
on each node by hand, and formatted it as ext4, also by hand. Somehow it
does mount after reboots -- although sometimes fsck stalls; usually I kill
it and reformat the scratch disk anew.

While it works, the procedure was quite inelegant. Is there a better way?

--
WBR,
Grigory Shamov
Chemistry, University of Manitoba

--- On Tue, 7/21/09, Stefan Boresch wrote:

> From: Stefan Boresch
> Subject: [cAos] Linux software raid
> To: caos at caoslinux.org
> Date: Tuesday, July 21, 2009, 6:06 AM
>
> Am I overlooking something or are there no provisions / support for a
> linux software raid in caos?
>
> Note, I envision a relatively simplistic setup, i.e., a lowly
> system disk which in case of a crash gets replaced and the
> OS reinstalled, and two (storage) disks which depending on the system
> are either configured as raid 0 (speed) or 1 (safety). In other words,
> I don't even ask for booting from software raid ...
>
> As far as I can figure out, caos doesn't provide rules for udev to
> generate /dev/md* (presumably one could misuse CREATE_DEVS in
> /etc/sysconfig/sysinit to generate missing devices); similarly,
> adding raid0/raid1 to LOAD_MODULES (should) take(s) care of the
> kernel support side. However, since
>
>   mount -a -t tmpfs,ext2,ext3,ext4,reiserfs,xfs,jfs,btrfs,iso9660 -O no_netdev
>
> is hardwired in sysinit, I don't see how I can assemble my raid before
> sysinit is executed ... (other than editing /etc/init.d/sysconfig,
> which presumably will get overwritten whenever /etc/init.d/sysconfig
> is updated (*))
>
> [(*) there is something going for the Debian policy that nothing ever
> is supposed to overwrite files in /etc]
>
> In one case I have worked around the issue by stuffing all the
> necessary commands in rc.local, but it feels really clumsy ...
>
> Please tell me that I am overlooking something obvious !
>
> Thanks,
>
> Stefan
>
> PS: Offtopic: Any suggestions what file system to use for a 14TByte
> storage partition (this is a hardware raid6 ;-) Ext3 (which I run
> everywhere else) should be able to handle this, but I dread the first
> fsck I have to run ... The filesystem will mainly store large
> "trajectories" from MD simulations (large (binary) files (1-10GB each)
> which upon demand will be read sequentially)
>
> --
> Stefan Boresch
> Institute for Computational Biological Chemistry
> University of Vienna, Waehringerstr. 17      A-1090 Vienna, Austria
> Phone: -43-1-427752715                       Fax:   -43-1-427752790
>
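
The rc.local workaround discussed in this thread could look roughly like
the fragment below. It is only a minimal sketch, not something any poster
showed: the device names /dev/md0, /dev/sdb1 and /dev/sdc1, the /scratch
mount point, and the MD_DEV, MD_MEMBERS, MOUNTPOINT and RUN variables are
all assumptions made for illustration. By default it only prints the
commands it would run.

```shell
#!/bin/sh
# Hypothetical fragment for /etc/rc.local in the VNFS: assemble a software
# RAID from local disks and mount it as scratch. All device names and the
# mount point below are assumptions; adjust them to the actual hardware.
MD_DEV=${MD_DEV:-/dev/md0}
MD_MEMBERS=${MD_MEMBERS:-"/dev/sdb1 /dev/sdc1"}
MOUNTPOINT=${MOUNTPOINT:-/scratch}
# Dry run by default: each command is echoed instead of executed.
# Set RUN to the empty string (RUN="") to really run the commands.
RUN=${RUN-echo}

assemble_scratch() {
    # Load the RAID personality, assemble the array, then mount it.
    # Each step only runs if the previous one succeeded.
    $RUN modprobe raid0 &&
    $RUN mdadm --assemble "$MD_DEV" $MD_MEMBERS &&
    $RUN mount "$MD_DEV" "$MOUNTPOINT"
}

assemble_scratch
```

Keeping the dry-run default makes it possible to check the generated
commands on a node before letting the script touch the disks. The array
itself still has to be created and formatted once beforehand, as described
above.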