Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#43381 closed defect (fixed)

openmpi-default @1.7.5_1+gcc48 does not work with hwloc @1.9_0

Reported by: dstrubbe (David Strubbe)
Owned by: seanfarley (Sean Farley)
Priority: Normal
Milestone:
Component: ports
Version: 2.2.1
Keywords:
Cc: dstrubbe (David Strubbe), michael-lists@…
Port: openmpi-default hwloc

Description

I get the error below with a simple test program. It does work if I downgrade to hwloc @1.8.1_0 and rebuild openmpi, though. (Activating hwloc @1.9_0 again without rebuilding openmpi brings the error back.) I am on OS X 10.8.5 with Xcode 5.1. It worked fine on my other computer, with openmpi-default @1.7.5_1+gcc45, OS X 10.6.8 and Xcode 3.2.6.

$ mpif90-openmpi-mp test_new.f90
$ mpiexec-openmpi-mp -n 1 ./a.out

 [[38689,1],0] ORTE_ERROR_LOG: Error in file /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_tarballs_ports_science_openmpi/openmpi-default/work/openmpi-1.7.5/orte/util/nidmap.c at line 106
 [[38689,1],0] ORTE_ERROR_LOG: Error in file /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_tarballs_ports_science_openmpi/openmpi-default/work/openmpi-1.7.5/orte/mca/ess/env/ess_env_module.c at line 154
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relev-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
 Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpiexec-openmpi-mp detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[38689,1],0]
  Exit code:    1
--------------------------------------------------------------------------

A couple of threads describing a similar situation on Fedora suggest that it is caused by a conflict between different versions of Open MPI, which inspired me to try swapping the hwloc version:
http://www.open-mpi.org/community/lists/users/2013/07/22346.php
https://lists.fedoraproject.org/pipermail/users/2013-July/438349.html
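
The source of test_new.f90 is not included in the report. The log shows the failure happening inside MPI_Init, before any user code runs, so any program that initializes MPI should exercise the same path. Below is a minimal hypothetical stand-in. The program name `mpi_init_test` and its greeting are illustrative and are not the reporter's actual file.

```fortran
! Hypothetical minimal stand-in for test_new.f90 (not the reporter's file).
! The reported failure happens inside MPI_Init, so this is enough to hit it.
program mpi_init_test
  use mpi
  implicit none
  integer :: ierr, rank, nprocs

  ! With the mismatched hwloc, the job aborts here (orte_init fails).
  call MPI_Init(ierr)

  ! Only reached when MPI starts up correctly.
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print '(A,I0,A,I0)', 'Hello from rank ', rank, ' of ', nprocs

  call MPI_Finalize(ierr)
end program mpi_init_test
```

Compile and run it with the same commands as in the report (mpif90-openmpi-mp, then mpiexec-openmpi-mp -n 1 ./a.out). On a working installation it should print a single greeting line instead of the orte_init errors.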

Change History (5)

comment:1 Changed 10 years ago by mf2k (Frank Schima)

Cc: sean@… removed
Owner: changed from macports-tickets@… to sean@…
Port: changed from openmpi-default, hwloc to openmpi-default hwloc

comment:2 Changed 10 years ago by seanfarley (Sean Farley)

Thanks for the repro; I'll look into it.

comment:3 Changed 10 years ago by dstrubbe (David Strubbe)

Cc: dstrubbe@… added

Cc Me!

comment:4 Changed 10 years ago by seanfarley (Sean Farley)

Resolution: fixed
Status: changed from new to closed

Fixed in r122954. Sorry for the delay!

comment:5 Changed 10 years ago by michael-lists@…

Cc: michael-lists@… added
Last edited 10 years ago by michael-lists@…