From justin at infiscale.com Fri Oct 1 00:32:22 2010
From: justin at infiscale.com (Justin Brown)
Date: Fri, 01 Oct 2010 00:32:22 -0700
Subject: [Caos] xhpl not working?
Message-ID:

I will look into the package to see if I can replicate that. Would you be able to test recompiling it on your system?

Thanks.

-Justin Brown.

Greg Kennedy wrote:

>I just did a cd install of Caos 1.0.29, choosing the Cluster image
>(which then downloaded a whole bunch of files). I've been able to
>provision a node successfully using a gpxe disk in a second PC.
>
>However I am running into problems trying to run the included xhpl app:
>[admin at locutus /]$ xhpl
>xhpl: Symbol `ompi_mpi_int' has different size in shared object,
>consider re-linking
>xhpl: Symbol `ompi_mpi_comm_null' has different size in shared object,
>consider re-linking
>xhpl: Symbol `ompi_mpi_double' has different size in shared object,
>consider re-linking
>xhpl: Symbol `ompi_mpi_comm_world' has different size in shared
>object, consider re-linking
>xhpl: Symbol `ompi_mpi_byte' has different size in shared object,
>consider re-linking
>Segmentation fault
>
>Does anyone have an idea of how this can be fixed? It seems like one
>of the libraries has been updated in the meantime and xhpl needs a
>recompile / repackage.
>_______________________________________________
>Caos mailing list
>Caos at lists.infiscale.org
>http://lists.infiscale.org/mailman/listinfo/caos

From robert.hamon at recherche-ste-justine.qc.ca Fri Oct 1 11:22:31 2010
From: robert.hamon at recherche-ste-justine.qc.ca (Robert Hamon)
Date: Fri, 1 Oct 2010 14:22:31 -0400
Subject: [Caos] mpi and infiniband error
Message-ID:

I'm trying to use mpirun on my installation of CAOS Linux 1.0.29 and it's giving me the error:

libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0

My setup is SunBlade 6000 cluster with Mellanox Technologies infiniband hca, all booting from the head node with caos linux.
It works fine if I specify '-mca btl tcp' but I'd really like to get openib working.

Any tip for me to get this working properly?

Thank you.

Robert

From pierreyves.langlois at gmail.com Fri Oct 1 11:52:04 2010
From: pierreyves.langlois at gmail.com (Pierre-Yves Langlois)
Date: Fri, 1 Oct 2010 14:52:04 -0400
Subject: [Caos] xhpl not working?
In-Reply-To:
References:
Message-ID:

I used xhpl a long time ago, but if my memory is good, you need to put the xhpl binary in the /root folder of your nodes to make this work...

PY

On Fri, Oct 1, 2010 at 3:32 AM, Justin Brown wrote:

> I will look into the package to see if I can replicate that. Would you be
> able to test recompiling it on your system?
>
> Thanks.
>
> -Justin Brown.
>
> Greg Kennedy wrote:
>
> >I just did a cd install of Caos 1.0.29, choosing the Cluster image
> >(which then downloaded a whole bunch of files). I've been able to
> >provision a node successfully using a gpxe disk in a second PC.
> >
> >However I am running into problems trying to run the included xhpl app:
> >[admin at locutus /]$ xhpl
> >xhpl: Symbol `ompi_mpi_int' has different size in shared object,
> >consider re-linking
> >xhpl: Symbol `ompi_mpi_comm_null' has different size in shared object,
> >consider re-linking
> >xhpl: Symbol `ompi_mpi_double' has different size in shared object,
> >consider re-linking
> >xhpl: Symbol `ompi_mpi_comm_world' has different size in shared
> >object, consider re-linking
> >xhpl: Symbol `ompi_mpi_byte' has different size in shared object,
> >consider re-linking
> >Segmentation fault
> >
> >Does anyone have an idea of how this can be fixed? It seems like one
> >of the libraries has been updated in the meantime and xhpl needs a
> >recompile / repackage.

From tmattox at gmail.com Fri Oct 1 12:45:33 2010
From: tmattox at gmail.com (Tim Mattox)
Date: Fri, 1 Oct 2010 15:45:33 -0400
Subject: [Caos] mpi and infiniband error
In-Reply-To:
References:
Message-ID:

I'd suggest you follow the Open MPI help directions and FAQ, which may lead to the answer, or to the Open MPI users mailing list:
http://www.open-mpi.org/community/help/
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 timattox at open-mpi.org || tmattox at gmail.com
    I'm a bright... http://www.the-brights.net/

From kennedy.greg at gmail.com Sat Oct 2 19:16:18 2010
From: kennedy.greg at gmail.com (Greg Kennedy)
Date: Sat, 2 Oct 2010 21:16:18 -0500
Subject: [Caos] xhpl not working?
In-Reply-To:
References:
Message-ID:

Sure, I can give a recompile a go. Just give me the instructions and I'll try it out : )

I have only one node running now (the master) and no compute nodes. I did attempt as root anyway

cd /root
cp `which xhpl` .
./xhpl

but still got a segfault.

I'm having another problem provisioning a couple of nodes too but I'll start another thread about that. I'm not sure if it is a Caos issue or a Perceus one. Guess it depends on how up-to-date the Perceus package in Caos is.

On Fri, Oct 1, 2010 at 2:32 AM, Justin Brown wrote:
> I will look into the package to see if I can replicate that. Would you be able to test recompiling it on your system?
>
> Thanks.
>
> -Justin Brown.
>
> Greg Kennedy wrote:
>
>>I just did a cd install of Caos 1.0.29, choosing the Cluster image
>>(which then downloaded a whole bunch of files). I've been able to
>>provision a node successfully using a gpxe disk in a second PC.
>>
>>However I am running into problems trying to run the included xhpl app:
>>[admin at locutus /]$ xhpl
>>xhpl: Symbol `ompi_mpi_int' has different size in shared object,
>>consider re-linking
>>xhpl: Symbol `ompi_mpi_comm_null' has different size in shared object,
>>consider re-linking
>>xhpl: Symbol `ompi_mpi_double' has different size in shared object,
>>consider re-linking
>>xhpl: Symbol `ompi_mpi_comm_world' has different size in shared
>>object, consider re-linking
>>xhpl: Symbol `ompi_mpi_byte' has different size in shared object,
>>consider re-linking
>>Segmentation fault
>>
>>Does anyone have an idea of how this can be fixed? It seems like one
>>of the libraries has been updated in the meantime and xhpl needs a
>>recompile / repackage.

From kennedy.greg at gmail.com Sat Oct 2 19:26:06 2010
From: kennedy.greg at gmail.com (Greg Kennedy)
Date: Sat, 2 Oct 2010 21:26:06 -0500
Subject: [Caos] Can't provision nodes using ADMtek chipset-based NICs
Message-ID:

I'm having problems when trying to get my master node to provision compute nodes, when the compute node has a NIC installed using an ADMtek 10/100 chipset. I've tried multiple nodes and three different cards from different manufacturers (using the same chips), and each time I get as far as the Perceus ASCII banner, then a "No supported network cards found" message.

I think this is a weird one to get because
1) this network card is supported (though in a "legacy NIC wrapper" mode?)
on the gpxe-1.0.1 floppy disk image I am using to boot the nodes, and
2) this network card is supported and works fine in the latest Knoppix distribution.

Is this a Caos problem, or something I can fix? Seems like maybe a missing driver in a netbooted kernel image. Or is it an upstream problem with Perceus itself and I need to take this message over to that mailing list?

-Greg

From justin at infiscale.com Sat Oct 2 19:49:10 2010
From: justin at infiscale.com (Justin Brown)
Date: Sat, 02 Oct 2010 19:49:10 -0700
Subject: [Caos] Can't provision nodes using ADMtek chipset-based NICs
Message-ID:

If you are using a boot floppy to supply the network driver, you may be lacking the support in perceus' kernel, which you are using when you see that ASCII banner.

You may want to try the perceus list. Someone may have experience with that chipset, or could advise on implementing the support.

-Justin Brown.

Greg Kennedy wrote:

>I'm having problems when trying to get my master node to provision
>compute nodes, when the compute node has a NIC installed using an
>ADMtek 10/100 chipset. I've tried multiple nodes and three different
>cards from different manufacturers (using the same chips), and each
>time I get as far as the Perceus ASCII banner, then a "No supported
>network cards found" message.
>
>I think this is a weird one to get because
>1) this network card is supported (though in a "legacy NIC wrapper"
>mode?) on the gpxe-1.0.1 floppy disk image I am using to boot the
>nodes, and
>2) this network card is supported and works fine in the latest Knoppix
>distribution.
>
>Is this a Caos problem, or something I can fix? Seems like maybe a
>missing driver in a netbooted kernel image. Or is it an upstream
>problem with Perceus itself and I need to take this message over to
>that mailing list?
>
>-Greg

From kennedy.greg at gmail.com Sat Oct 2 20:26:12 2010
From: kennedy.greg at gmail.com (Greg Kennedy)
Date: Sat, 2 Oct 2010 22:26:12 -0500
Subject: [Caos] Can't provision nodes using ADMtek chipset-based NICs
In-Reply-To:
References:
Message-ID:

Actually, I just pulled the source for Perceus 1.5.3 and checked the kernel-i386.config file:

# CONFIG_NET_TULIP is not set

which, according to this link:
http://cateee.net/lkddb/web-lkddb/NET_TULIP.html
means Perceus kernel has no support in place for my card(s).

I'll kick this over to the Perceus list. Maybe they can turn on module support for this and get it into the next version, or I could get instructions on building a new kernel in the meantime that includes support for this chip.

-Greg

On Sat, Oct 2, 2010 at 9:49 PM, Justin Brown wrote:
> If you are using a boot floppy to supply the network driver, you may be lacking the support in perceus' kernel, which you are using when you see that ASCII banner.
>
> You may want to try the perceus list. Someone may have experience with that chipset, or could advise on implementing the support.
>
> -Justin Brown.
>
> Greg Kennedy wrote:
>
>>I'm having problems when trying to get my master node to provision
>>compute nodes, when the compute node has a NIC installed using an
>>ADMtek 10/100 chipset. I've tried multiple nodes and three different
>>cards from different manufacturers (using the same chips), and each
>>time I get as far as the Perceus ASCII banner, then a "No supported
>>network cards found" message.
>>
>>I think this is a weird one to get because
>>1) this network card is supported (though in a "legacy NIC wrapper"
>>mode?) on the gpxe-1.0.1 floppy disk image I am using to boot the
>>nodes, and
>>2) this network card is supported and works fine in the latest Knoppix
>>distribution.
>>
>>Is this a Caos problem, or something I can fix? Seems like maybe a
>>missing driver in a netbooted kernel image. Or is it an upstream
>>problem with Perceus itself and I need to take this message over to
>>that mailing list?
>>
>>-Greg
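
The kernel config check described in the last message can be repeated for any driver option. The following is only a minimal sketch and not part of the original thread: the helper name kconfig_state is hypothetical, and the example call assumes that kernel-i386.config from the Perceus source tree is in the current directory. It only reads the file and reports whether an option is built in, built as a module, explicitly disabled, or not mentioned at all.

```shell
#!/bin/sh
# kconfig_state (hypothetical helper): report how a kernel config file
# treats a single option.
# Usage: kconfig_state <config-file> <OPTION>
kconfig_state() {
    config_file=$1
    option=$2
    if grep -q "^${option}=y\$" "$config_file"; then
        echo "built-in"
    elif grep -q "^${option}=m\$" "$config_file"; then
        echo "module"
    elif grep -q "^# ${option} is not set\$" "$config_file"; then
        echo "not set"
    else
        echo "absent"
    fi
}

# Example call, run only if the file is actually present.
# The file name and option are the ones quoted in the message above.
if [ -f kernel-i386.config ]; then
    echo "CONFIG_NET_TULIP: $(kconfig_state kernel-i386.config CONFIG_NET_TULIP)"
fi
```

A result of "not set" matches what was reported in the thread. Either "built-in" or "module" would mean the Tulip driver family is at least enabled in that kernel build.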