Opened 3 years ago

Closed 3 years ago

#62175 closed defect (fixed)

mpich @3.4 runtime error

Reported by: derek-teaney Owned by: eborisch (Eric A. Borisch)
Priority: Normal Milestone:
Component: ports Version: 2.6.4
Keywords: Cc:
Port: mpich

Description

I installed mpich and mpich-default without problems and compiled a simple program, "cpi.cc".

However, I get a runtime error when I try to run even the simplest MPI program. This seems related to the following message on the MPICH discuss mailing list:

https://lists.mpich.org/pipermail/discuss/2020-August/006031.html

I am not sure whether this is a MacPorts problem. If it is not, any advice would be greatly appreciated.

This is the output:

➜ MPI git:(master) ✗ mpiexec -np 2 ./a.out

Assertion failed in file src/mpid/ch4/netmod/ofi/ofi_init.c at line 1988: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/ofi_init.c at line 1988: mapped_table[i] != FI_ADDR_NOTAVAIL

0   libpmpi.12.dylib                    0x000000010bc55d24 MPL_backtrace_show + 52
1   libpmpi.12.dylib                    0x000000010bbe1694 MPIR_Assert_fail + 36
2   libpmpi.12.dylib                    0x000000010bc2af97 MPIDI_OFI_mpi_init_hook + 6487
3   libpmpi.12.dylib                    0x000000010bc0b86f MPID_Init + 2383
4   libpmpi.12.dylib                    0x000000010ba27cb4 MPIR_Init_thread + 228
5   libmpi.12.dylib                     0x000000010b816a47 MPI_Init + 279
6   a.out                               0x000000010b7b0bb8 main + 104
7   libdyld.dylib                       0x00007fff684a8cc9 start + 1
0   libpmpi.12.dylib                    0x0000000105f31d24 MPL_backtrace_show + 52
1   libpmpi.12.dylib                    0x0000000105ebd694 MPIR_Assert_fail + 36
2   libpmpi.12.dylib                    0x0000000105f06f97 MPIDI_OFI_mpi_init_hook + 6487
3   libpmpi.12.dylib                    0x0000000105ee786f MPID_Init + 2383
4   libpmpi.12.dylib                    0x0000000105d03cb4 MPIR_Init_thread + 228
5   libmpi.12.dylib                     0x0000000104e3ea47 MPI_Init + 279
6   a.out                               0x0000000104ddabb8 main + 104
7   libdyld.dylib                       0x00007fff684a8cc9 start + 1
Abort(1) on node 0: Internal error
Abort(1) on node 0: Internal error

With a single process, everything runs smoothly:

➜ MPI git:(master) ✗ mpiexec ./a.out

Process 0 on MacBook-Pro-5.local
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000071
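For context, a cpi-style program estimates π with the midpoint rule on the integral of 4/(1+x²) over [0, 1], which equals π: with n intervals of width h = 1/n, rank r of p processes sums intervals r+1, r+1+p, ..., and MPI_Reduce with MPI_SUM adds the partial results on rank 0. The sketch below assumes cpi.cc follows MPICH's examples/cpi.c in this respect (the file itself is not shown in this ticket) and replaces the MPI calls with a serial loop over ranks so it runs without MPI; integrand, partial_pi and reduce_pi are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Integrand whose integral over [0, 1] is exactly pi.
double integrand(double x) { return 4.0 / (1.0 + x * x); }

// Midpoint-rule contribution of one rank out of `size`, using the cyclic
// split assumed above: this rank handles intervals rank+1, rank+1+size, ...
double partial_pi(int rank, int size, int n) {
    assert(n > 0 && size > 0 && rank >= 0 && rank < size);
    const double h = 1.0 / static_cast<double>(n);
    double sum = 0.0;
    for (int i = rank + 1; i <= n; i += size) {
        const double x = h * (static_cast<double>(i) - 0.5);  // midpoint of interval i
        sum += integrand(x);
    }
    return h * sum;
}

// Serial stand-in for MPI_Reduce(..., MPI_SUM, root 0): add every rank's share.
double reduce_pi(int size, int n) {
    double pi = 0.0;
    for (int rank = 0; rank < size; ++rank) {
        pi += partial_pi(rank, size, n);
    }
    return pi;
}
```

For example, reduce_pi(1, 100) and reduce_pi(2, 100) agree up to rounding, and both exceed π by close to h²/12 = 1/120000 ≈ 8.33e-6, the same size as the error printed above. The two-process case does not fail in this arithmetic; per the backtrace, it aborts earlier, inside MPI_Init.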

I am running:

Catalina 10.15.5 (19F96), 2.2 GHz Quad-Core Intel Core i7

Change History (8)

comment:1 Changed 3 years ago by ryandesign (Ryan Carsten Schmidt)

Owner: set to eborisch
Status: new → assigned

comment:2 Changed 3 years ago by eborisch (Eric A. Borisch)

Huh. I can recreate this with mpich-default, but I typically use mpich-clang* (because I also want OpenMP support baked in), and I have confirmed that it does not occur with mpich-clang10.

Short term, I would recommend one of the mpich-clang* ports. Long term, we may need to revisit whether -default should use one of our provided clang compilers rather than the system one, and report this upstream to see if we can get a resolution.

comment:3 Changed 3 years ago by eborisch (Eric A. Borisch)

comment:4 Changed 3 years ago by eborisch (Eric A. Borisch)

Well, it's more complicated than just using mpich-clang10: I was running mpich-clang10 +tuned plus some additional customizations of my own, and I'm tracking down exactly what made it work.

comment:5 Changed 3 years ago by eborisch (Eric A. Borisch)

Based on two quick spot-checks, it looks like installing with the +tuned variant makes it work.

comment:6 Changed 3 years ago by derek-teaney

Thanks, installing mpich-clang10 +tuned worked. Note that the +tuned variant is not documented in the output of "port info mpich".

I also tried compiling MPICH directly from source "out of the box" with the default tools, which (as I learned) means /usr/bin/gcc is Apple clang 12, and ran into the same runtime error.

I was unable to compile MPICH with MacPorts' /opt/local/bin/gcc-mp-10, because ./configure failed with an error claiming that -std=c99 does not work (it does), but that is a discussion for another group.

comment:7 Changed 3 years ago by eborisch (Eric A. Borisch)

This should be resolved now (as of the switch back to the ch3 device).
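Because the fix was a change of MPICH device (the failing assertion lives under src/mpid/ch4/netmod/ofi, and the port went back to ch3), it can help to confirm which device an installed MPICH uses. MPICH reports this on an "MPICH Device:" line in the string returned by MPI_Get_library_version and in the output of its mpichversion utility. The hypothetical helpers below assume exactly that "key, colon, whitespace, value" layout and only parse a string you pass in, so they do not call MPI themselves.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: extract the device name (e.g. "ch3:nemesis" or "ch4:ofi")
// from an MPICH version string. Assumes a line of the form
// "MPICH Device:<spaces/tabs><name>"; returns "" if no such line exists.
std::string mpich_device(const std::string& version) {
    const std::string key = "MPICH Device:";
    std::istringstream lines(version);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, key.size(), key) != 0) continue;  // not the device line
        // Skip the tabs/spaces that separate the key from the value.
        const auto start = line.find_first_not_of(" \t", key.size());
        return start == std::string::npos ? std::string() : line.substr(start);
    }
    return std::string();
}

// True if the parsed device name starts with "ch4" (the device whose OFI
// netmod raised the assertion in this ticket).
bool uses_ch4(const std::string& version) {
    return mpich_device(version).rfind("ch4", 0) == 0;
}

// Example (inside a function body):
//   assert(mpich_device("MPICH Device:\tch3:nemesis\n") == "ch3:nemesis");
```

For a string containing "MPICH Device:\tch3:nemesis", mpich_device returns "ch3:nemesis" and uses_ch4 returns false; a ch4:ofi build returns true.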

comment:8 Changed 3 years ago by eborisch (Eric A. Borisch)

Resolution: fixed
Status: assigned → closed