From glykos at mbg.duth.gr Tue Mar 2 06:53:00 2010
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Tue, 2 Mar 2010 16:53:00 +0200 (EET)
Subject: [Caos] Caos & CUDA on a stateless node
Message-ID:

Dear All,

We just got a new node to add to our toy CAOS-based cluster. The new box comes with a GTX-295 for CUDA calculations. After inserting the nvidia kernel module (dmesg looks OK: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009) and creating the entries in /dev (as per toolkit instructions), our cuda programs still complain that they cannot detect any 'CUDA-capable devices'.

What am I missing? Do I need to install (on the node) all files and libraries distributed by nvidia with the driver (although there will be no X running on it)?

Thank you,
Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

From astevens at infiscale.com Tue Mar 2 08:15:35 2010
From: astevens at infiscale.com (astevens at infiscale.com)
Date: Tue, 2 Mar 2010 16:15:35 +0000
Subject: [Caos] Caos & CUDA on a stateless node
Message-ID: <28125120-1267546532-cardhu_decombobulator_blackberry.rim.net-300111341-@bda505.bisx.prod.on.blackberry>

I believe you do need to install all the extras on the newer ones. We had talked to some people from nvidia at SC09 but we never received the test gear they publicly promised so we can't verify locally. I will ping our DoD buddies that had us at that talk and see if it was just trade show talk ;) Anyone with non-flakey friends at nvidia is welcome to put us in contact, as full support would likely be as easy as getting us some local loaners. :)

I will ping a couple of our users for you and see if I can verify some settings.
Arthur

_______________________________________________
Caos mailing list
Caos at lists.infiscale.org
http://lists.infiscale.org/mailman/listinfo/caos

From glykos at mbg.duth.gr Tue Mar 2 08:27:26 2010
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Tue, 2 Mar 2010 18:27:26 +0200 (EET)
Subject: [Caos] Caos & CUDA on a stateless node
In-Reply-To:
References:
Message-ID:

Apologies for the e-mail traffic and for answering my own question! As it turned out, the cuda libraries (libcuda.so, libcuda.so.1, libcuda.so.190.53) are also needed for cuda programs to run.

Nicholas

On Tue, 2 Mar 2010, Nicholas M Glykos wrote:

> Dear All,
>
> We just got a new node to add to our toy CAOS-based cluster. The new box
> comes with a GTX-295 for CUDA calculations.

--
Dr Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

From stefan at mdy.univie.ac.at Wed Mar 3 00:05:36 2010
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Wed, 3 Mar 2010 09:05:36 +0100
Subject: [Caos] Softwareraid
Message-ID: <20100303080535.GJ3827@loop.mdy.univie.ac.at>

Just wondering whether there has been any progress concerning the support of software RAID under caos. For some of our compute nodes used for the analysis of large data sets, having a raid0 partition would be quite useful (this can be set up from rc.local, *although* earlier in the boot sequence wouldn't harm); in addition, on some machines I wouldn't mind a full install of the OS on (software) raid 1 (just did this under Debian, where it works out of the box like a charmm, and it's nice to see a working system with one harddisk plugged).

I am ready to do some work on my own (having the Debian system as template should help): I assume that the install should be doable after some trial and error (the live CD should help tremendously in this respect, right?). What worries me is maintaining the system, since the last time I checked there was no (or only very limited) support in the default init scripts.
Thus, upgrading the kernel and/or the sysinit package is likely to screw everything (it used to be that even rc.local was overwritten, but I believe this is fixed now).

Info, hints, comments, including why you think software raid is a bad idea(*) in the first place, are appreciated -- thanks in advance!

Stefan

(*) Why do I consider software raid: (1) we are poor, even the 120-150 Euros for a two-channel 3ware controller can be steep for us. (2) In the case of a crazy system crash (controller dead), I'd actually be more comfortable with a software raid (we are talking raid1) which any linux machine should be able to read without the need for additional hardware. Given (1), I don't have any spare raid controllers lying around ...

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17 A-1090 Vienna, Austria
Phone: -43-1-427752715 Fax: -43-1-427752790

From yates at cct.lsu.edu Mon Mar 8 15:11:44 2010
From: yates at cct.lsu.edu (Adam Yates)
Date: Mon, 08 Mar 2010 17:11:44 -0600
Subject: [Caos] papi / perfctr
Message-ID: <4B958430.9030704@cct.lsu.edu>

Hi all;

I need to recompile the kernel in caos-nsa (2.6.31.6-2.caos) to support papi / perfctr. Has anyone had experience doing this previously? I don't want to spend a ton of time on this if someone else has previously ironed out the issues in getting this accomplished.

Any help is greatly appreciated!

Thanks,
Adam

--
Adam Yates
Systems Administrator -- Research Infrastructure
Center for Computation and Technology
232 Johnston Hall, Baton Rouge, LA 70803
W: 225.578.8235 C: 225.663.0218

From gartim at gmail.com Wed Mar 17 11:50:22 2010
From: gartim at gmail.com (gary artim)
Date: Wed, 17 Mar 2010 11:50:22 -0700
Subject: [Caos] adding library (iblzma.so.0) to vnfs for R?
Message-ID:

Hi -- If you need liblzma.so.0 added to your nodes' vnfs, what is the cleanest way of doing this? Thanks for any advice! -- G.

From glykos at mbg.duth.gr Wed Mar 17 12:01:02 2010
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Wed, 17 Mar 2010 21:01:02 +0200 (EET)
Subject: [Caos] adding library (iblzma.so.0) to vnfs for R?
In-Reply-To:
References:
Message-ID:

> If you need liblzma.so.0 added to your nodes vnfs what is the cleanest
> way of doing this? Thanks for any advise!

You'll hear expert advice from the developers, but why would you want application-specific libraries in the capsule? Just place it in a suitable NFS-mounted volume (say, /usr/local/lib) and adjust LD_LIBRARY_PATH accordingly (and I'm sure I'll regret giving you this advice .-)

Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

From gartim at gmail.com Wed Mar 17 12:05:18 2010
From: gartim at gmail.com (gary artim)
Date: Wed, 17 Mar 2010 12:05:18 -0700
Subject: [Caos] adding library (iblzma.so.0) to vnfs for R?
In-Reply-To:
References:
Message-ID:

will try it out, thanks for the quick reply! -- Gary

On Wed, Mar 17, 2010 at 12:01 PM, Nicholas M Glykos wrote:
>> If you need liblzma.so.0 added to your nodes vnfs what is the cleanest
>> way of doing this? Thanks for any advise!
>
> You'll hear expert advice from the developers, but why would you want
> application-specific libraries in the capsule ? Just place it in a
> suitable NFS-mounted volume (say, /usr/local/lib) and adjust
> LD_LIBRARY_PATH accordingly (and I'm sure I'll regret giving you this
> advice .-)
>
> Nicholas

From gartim at gmail.com Wed Mar 17 12:09:48 2010
From: gartim at gmail.com (gary artim)
Date: Wed, 17 Mar 2010 12:09:48 -0700
Subject: [Caos] adding library (iblzma.so.0) to vnfs for R?
In-Reply-To:
References:
Message-ID:

yep, that worked. how do you keep track of what you did on those exported libs...just curious...G.

On Wed, Mar 17, 2010 at 12:05 PM, gary artim wrote:
> will try it out, thanks for the quick reply! -- Gary

From gas5x at yahoo.com Wed Mar 17 20:30:07 2010
From: gas5x at yahoo.com (Grigory Shamov)
Date: Wed, 17 Mar 2010 20:30:07 -0700 (PDT)
Subject: [Caos] kipmi kernel errors on Sun cluster
Message-ID: <639334.16902.qm@web111304.mail.gq1.yahoo.com>

Dear All,

I have somewhat puzzling problems with my small Perceus/CaosLinux cluster.

I have a server and seven compute nodes, all Sun X2200 M2 boxes, dual quad AMD Opterons, with kernel 2.6.28.3-1. The nodes have Sun ILOMs. The cluster works, but at some point (that I cannot reproduce, but it happened more than once) on some of the nodes a lot of errors come up, related to the kernel and IPMI.

The flood of error logs eventually reaches the server and sometimes makes it hang. During that time, the kipmi and ksoftirqd/1 processes on the nodes are active, consume 100% of CPU time each, and produce a lot of messages in the system log, "kipmi refcount overflow" and "ksoftirq refcount overflow".

Could you please suggest what might cause/trigger these kipmi problems, and how to get rid of them? Thank you very much in advance!

--
With best regards,
Grigory Shamov
Research Associate
Chemistry,
University of Manitoba

From astevens at infiscale.com Thu Mar 18 01:25:41 2010
From: astevens at infiscale.com (Arthur Stevens)
Date: Thu, 18 Mar 2010 01:25:41 -0700
Subject: [Caos] kipmi kernel errors on Sun cluster
In-Reply-To: <639334.16902.qm@web111304.mail.gq1.yahoo.com>
References: <639334.16902.qm@web111304.mail.gq1.yahoo.com>
Message-ID: <4BA1E385.8020003@infiscale.com>

Have you tried updating to the 2.6.31 kernel? Could you send me a dmesg?

Thanks,
Arthur

On 3/17/2010 8:30 PM, Grigory Shamov wrote:
> Dear All,
>
> I have somewhat puzzling problems with my small Perceus/CaosLinux cluster.

From stefan at mdy.univie.ac.at Tue Mar 23 05:33:46 2010
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Tue, 23 Mar 2010 13:33:46 +0100
Subject: [Caos] new packages / updates
Message-ID: <20100323123346.GH31216@loop.mdy.univie.ac.at>

Am I mistaken or have no new rpms shown up in the testing (let alone main) repository of CAOS-NSA since late December 2009? Is this something I should be worried about? I hope this doesn't indicate that caos can't be supported properly (which would be a damned shame!). In this sense a sign of life would be appreciated -- thanks!

Best regards,

Stefan

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr.
17 A-1090 Vienna, Austria
Phone: -43-1-427752715 Fax: -43-1-427752790

From gartim at gmail.com Tue Mar 23 08:52:32 2010
From: gartim at gmail.com (gary artim)
Date: Tue, 23 Mar 2010 08:52:32 -0700
Subject: [Caos] new packages / updates
In-Reply-To: <20100323123346.GH31216@loop.mdy.univie.ac.at>
References: <20100323123346.GH31216@loop.mdy.univie.ac.at>
Message-ID:

Good question, I just launched a cluster using caos, now I'm worried...Gary

From ryan.dooley at gmail.com Tue Mar 23 10:10:29 2010
From: ryan.dooley at gmail.com (Ryan Dooley)
Date: Tue, 23 Mar 2010 10:10:29 -0700
Subject: [Caos] new packages / updates
In-Reply-To:
References: <20100323123346.GH31216@loop.mdy.univie.ac.at>
Message-ID: <4BA8F605.3090004@gmail.com>

On 3/23/2010 8:52 AM, gary artim wrote:
> Good question, I just launched a cluster using caos, now I'm worried...Gary
>
> On Tue, Mar 23, 2010 at 5:33 AM, Stefan Boresch wrote:
>
>> Am I mistaken or have no new rpms shown up in the testing (let alone
>> main) repository of CAOS-NSA since late December 2009? Is this
>> something I should be worried about? I hope this doesn't indicate that
>> caos can't be supported properly (which would be a damned shame!).
>> In this sense a sign of life would be appreciated -- thanks!

According to http://lists.infiscale.org/pipermail/caos-devel/ there haven't been any commits since December 2009. And wiki.caoslinux.org seems to be down as well.

I wonder what we can do to revive the project.

Cheers,
Ryan

From astevens at infiscale.com Tue Mar 23 10:41:50 2010
From: astevens at infiscale.com (astevens at infiscale.com)
Date: Tue, 23 Mar 2010 17:41:50 +0000
Subject: [Caos] new packages / updates
Message-ID: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry>

The project is not DOA; it is just that there are only a couple of us who maintain it, and we have all been pretty hammered. I have been working on a debian rebuild (GravityOS) and the new Perceus. I prefer .deb over .rpm and like maintaining a smaller subset (kernel, libs, etc), and having 40K debian packages makes more sense when it's just a couple of guys maintaining stuff. Especially for Cloud stuff vs HPC.

The current iso seemed very stable, got Intel Cluster Ready certified, and we have not had any package requests in a couple of months. If anyone wants to help out, we could totally use some more packagers. We have a nice shell box for building as well. I have a new kernel and have been playing with an upgrade to gcc 4.4 that I can add to testing.

Feel free to ping me directly if anyone wants to help out or needs something specific packaged.

Thanks,
Arthur

From astevens at infiscale.com Tue Mar 23 10:49:52 2010
From: astevens at infiscale.com (Arthur Stevens)
Date: Tue, 23 Mar 2010 10:49:52 -0700
Subject: [Caos] new packages / updates
In-Reply-To: <4BA8F605.3090004@gmail.com>
References: <20100323123346.GH31216@loop.mdy.univie.ac.at> <4BA8F605.3090004@gmail.com>
Message-ID: <4BA8FF40.2000303@infiscale.com>

One last thing, we would provide upgrade scripts and assistance if we ever did cut over to a new distro.
We are a small camp, but we love our users and would never leave anyone in limbo, at least I never would :)

Arthur

From glykos at mbg.duth.gr Tue Mar 23 12:45:46 2010
From: glykos at mbg.duth.gr (Nicholas M Glykos)
Date: Tue, 23 Mar 2010 21:45:46 +0200 (EET)
Subject: [Caos] new packages / updates
In-Reply-To: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry>
References: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry>
Message-ID:

> The current iso seemed very stable, got Intel Cluster Ready certified
> and have not had any package request in a couple of months.

I am happy to second that.
I will admit that our cluster is more like a toy (with only 10 nodes, 40 cores), and that our applications are rather limited (molecular dynamics and some crystallography), but: We set up caos in January 2009, and since that time the only way to stop the cluster from running smoothly has been to cut its power (which, unfortunately, happens all too often in our part of the world). In other words, even if updates did exist, I would hesitate to fix something that, at least in our experience, is working flawlessly.

My two cents,
Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology
and Genetics, Democritus University of Thrace, University Campus,
Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/

From gartim at gmail.com Tue Mar 23 14:16:21 2010
From: gartim at gmail.com (gary artim)
Date: Tue, 23 Mar 2010 14:16:21 -0700
Subject: [Caos] new packages / updates
In-Reply-To:
References: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry>
Message-ID:

sounds good, had me worried -- I'm still training users and learning myself! as far as maintaining packages, I may be interested. Is there a single place to read up on what you need to know? -- Gary

On Tue, Mar 23, 2010 at 12:45 PM, Nicholas M Glykos wrote:
>> The current iso seemed very stable, got Intel Cluster Ready certified
>> and have not had any package request in a couple of months.
>
> I am happy to second that. I will admit that our cluster is more like a
> toy (with only 10 nodes, 40 cores), and that our applications are rather
> limited (molecular dynamics and some crystallography), but:
> We set-up caos on January 2009, and since that time the only way to stop
> the cluster from running smoothly has been to cut its power (which,
> unfortunately, happens all too often in our part of the world).

From stefan at mdy.univie.ac.at Wed Mar 24 02:47:48 2010
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Wed, 24 Mar 2010 10:47:48 +0100
Subject: [Caos] new packages / updates
In-Reply-To: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry>
References: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry>
Message-ID: <20100324094747.GA2680@loop.mdy.univie.ac.at>

Arthur,

On Tue, Mar 23, 2010 at 05:41:50PM +0000, astevens at infiscale.com wrote:
> Project is not DOA, it is there are just a couple of us that
> maintain it and we have all been pretty hammered.

first, I want to echo Nicholas' praise; caos overall has performed very nicely, there are no immediate problems, and perceus is just a gift from the heavens for someone doing system admin on the side and having to set up and maintain clusters.

> The current iso seemed very stable, got Intel Cluster Ready certified

Agreed, however, I have to play devil's advocate: ...

> and have not had any package request in a couple of months.

... and no security issues either?

In addition, I am a bit confused what to do in my situation: Over the next months I have to upgrade most of the group's infrastructure, which includes desktops, NFS servers, an older, second cluster not yet converted to perceus, mail server, web server, local dns server etc. My plan was to use Ubuntu and/or Debian squeeze for the desktops, and have the rest run caos; in fact, some of our big RAIDs for data (incidentally, we also do molecular dynamics) are already running caos. I can live with a small core of packages and add missing pieces myself as needed, even trying to package them up for others as well; *however*, aside from the cluster clients I need reasonably reliable *security updates* (some of these machines would/will/have to be open to the wide, evil net ...). I can't risk being rooted because of a security issue that was fixed upstream 6 months ago (and my primary job description is research and teaching, not sysadmin).

My gut reaction at this point is to switch to debian (testing/squeeze) for the infrastructure machines, which I have on some boxes and which is shaping up very nicely (IMHO). Incidentally, the cluster I can/could keep on caos. Then, of course, 'gravityOS' seems intriguing also, but where does it fit my plans? Is this mainly intended to replace caos as a 'lightweight' OS on the nodes? What would be the relation with debian upstream?

Of course, this also raises the question: given the problems of caos, what is the long-term status of perceus (and while I can/could do without caos, losing perceus would be catastrophic ...)?

Again, cheers and thanks for what you did with caos/perceus so far!!

Best regards,

Stefan

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr.
17 A-1090 Vienna, Austria
Phone: -43-1-427752715 Fax: -43-1-427752790

From astevens at infiscale.com Wed Mar 24 09:41:27 2010
From: astevens at infiscale.com (Arthur Stevens)
Date: Wed, 24 Mar 2010 09:41:27 -0700
Subject: [Caos] new packages / updates
In-Reply-To: <20100324094747.GA2680@loop.mdy.univie.ac.at>
References: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry> <20100324094747.GA2680@loop.mdy.univie.ac.at>
Message-ID: <4BAA40B7.3090803@infiscale.com>

Thanks guys. Really appreciate all the kind words :)

Basically, long-term plans are to eventually migrate all Caos users to the new, more feature-rich OS by next year, but not to discontinue Caos, especially with so many offering to help with packaging this week. We also have several HPC partners that use it commercially, so we are planning base support through 2012.

I accidentally ended up in charge of Caos when GMK got busy with his day job. I love it for HPC (it totally annihilates), but as a workstation or full-featured desktop it still makes me boot into Windows too often. Also, doing a lot of cloud lately, the OS needs more mainstream support and hundreds if not thousands of packages to meet all users' needs. We started looking around to see what the state of OSes is, and we liked the ease of use of Ubuntu (as well as its community support) but were turned off by the lack of performance, the heavy integration of Eucalyptus (do they not know what eucalyptus does to koalas? hehe) and the inclusion of closed source applications (that's not how we do linux, guys). Combined with the fact that we liked the sanity of Debian and how easy package management is, and that we were inundated with user requests to support it, we decided to reroll GravityOS ( www.gravityos.org ) and are currently testing an alpha.

We are not killing off Caos at all, just focusing more time in the direction requested most by our user base.
I will see about an updated Caos ISO with the new kernel and the few security updates we have to roll into the repo.

Basically we applied all the Caos sanity to Debian (small, lightweight core, hardened grsec protected kernel, well tested binaries, etc) and hope people will like it. VNFS footprint is very similar to Caos and aside from more hardware being supported, we don't see too much of a difference aside from having access to 40k + packages that allow us to sleep more :)

On the Perceus front (and we will ALWAYS support Perceus), we are making it very distro neutral with support for rhel, suse, debian (ubuntu), and even gentoo and slackware eventually. We also have it working on BSD now.

As always everyone has my commitment to assist any way possible. I can also be reached off the list directly and you can mail us at support at infiscale.com as well. The new ticketing system is going public this week as well, where bug/feature requests can be generated as well as voting on OS direction. Time to put the community back in the Community Assembled OS (caos). :)

Thanks everyone,
Arthur

On 3/24/2010 2:47 AM, Stefan Boresch wrote:
> Arthur,
>
> On Tue, Mar 23, 2010 at 05:41:50PM +0000, astevens at infiscale.com wrote:
>
>> Project is not DOA, it is there are just a couple of us that
>> maintain it and we have all been pretty hammered.
>
> first, I want to echo Nicholas' praise; caos overall has performed very
> nicely, there are no immediate problems, and perceus is just a gift from
> the heavans for someone doing system admin on the side and having to setup
> and maintain clusters.
>
>> The current iso seemed very stable, got Intel Cluster Ready certified
>
> Agreed, however, I have to play devil's advocate: ...
>
>> and have not had any package request in a couple of months.
>
> ... and no security issues either?
>
> Best regards,
>
> Stefan
>

From stefan at mdy.univie.ac.at Thu Mar 25 01:06:42 2010
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Thu, 25 Mar 2010 09:06:42 +0100
Subject: [Caos] new packages / updates
In-Reply-To: <4BAA40B7.3090803@infiscale.com>
References: <2107516966-1269366109-cardhu_decombobulator_blackberry.rim.net-592902775-@bda505.bisx.prod.on.blackberry> <20100324094747.GA2680@loop.mdy.univie.ac.at> <4BAA40B7.3090803@infiscale.com>
Message-ID: <20100325080642.GC2680@loop.mdy.univie.ac.at>

Arthur,

thanks for the outlook.

On Wed, Mar 24, 2010 at 09:41:27AM -0700, Arthur Stevens wrote:
> Thanks guys. Really appreciate all the kind words :)

credit where credit is due !!

> I accidentally ended up in charge of Caos when GMK got busy with his
> day job, I love it for HPC (it totally annihilates) but as a workstation
> or full featured desktop it makes me still boot into Windows too often.

my experience exactly ... (lacking utf-8 support, font handling ca. 5
years behind the mainstream distros; I tried in the fall, but these
are some of the reasons why I currently run ubuntu 9.10 on my
desktop instead of caos ..)

[snip]

> We are not killing off Caos at all, just focusing more time in the
> direction requested most by our user base. I will see about an updated
> Caos ISO with the new kernel and the few security updates we have to
> roll into the repo.

great, thanks! (I'll attach one mini bug report as a PS ...)

> On the Perceus front (and we will ALWAYS support Perceus), we are making
> it very distro neutral with support for rhel, suse, debian (ubuntu), and
> even gentoo and slackware eventually. We also have it working on BSD now.

again great!

[some rationale behind gravityOS snipped]

> Basically we applied all the Caos sanity to Debian (small, lightweight
> core, hardened grsec protected kernel, well tested binaries, etc) and
> hope people will like it. VNFS footprint is very similar to Caos and
> aside from more hardware being supported, we don't see too much of a
> difference aside from having access to 40k + packages that allow us to
> sleep more :)

All cheers from me if Debian becomes a first-class citizen; this means
I can and will (eventually) switch everything here to Debian (for me,
even on the desktop Debian squeeze behaves more sanely than ubuntu
'lucid').

I am going to take a serious look at gravityOS once it is in beta, but
I am very curious as to the core differences. Or, more pointedly, what
do I lose (on a server, not integrated into a perceus / abstractual
network) if I just run plain Debian? Running just an alternative kernel
should not be an issue; we use modified debian kernels from one of our
hardware vendors on some servers (thomas-krenn.com), and I run the
'liquorix' kernel on my netbook, which is otherwise pure
squeeze/testing.

Beyond a custom kernel, however, I have some qualms: running and
maintaining a parallel version of debian may not be as straightforward
as it seems. Some years ago I observed how transtec's HPC cluster
division (transtec is sort of a European Dell for business users, and
in some areas I like doing business with them; I guess in Europe their
HPC division would be one of your competitors) started with a
debian-based solution for their clusters (sort of their proprietary
'perceus'). It became more and more work for them and for the admins in
charge of the systems as the opinions of transtec's team and debian
diverged on how to set up x86-64 (again, this was some years ago). It
ended with transtec giving up and switching (of all things) to some
suse-based solution (at least the last time I checked).

That being said, I trust you'll let us know once gravityOS is ready for
closer inspection ...

Best regards,

Stefan

PS: [BUG REPORT:] Filenames ending in ~ in /etc/sysconfig/nics, e.g.,
/etc/sysconfig/nics/eth1~, should be ignored. (I thought this was fixed
at some point, or was it a similar problem in perceus ...) I spent 30
minutes yesterday hunting for why eth1 would always come up with the
wrong IP ;-)

-- 
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17
A-1090 Vienna, Austria
Phone: -43-1-427752715
Fax: -43-1-427752790