
Ticket #35508 (closed defect: fixed)

Opened 10 months ago

Last modified 10 months ago

arpack port does not work on Lion with GFortran 4.6.2 due to Accelerate problem

Reported by: gcrosswhite@… Owned by: mmoll@…
Priority: Normal Milestone:
Component: ports Version: 2.1.2
Keywords: Cc:
Port: arpack

Description

I have seen this problem before and thought it had been fixed in this port, but it has appeared again.

ARPACK uses the BLAS routine ZDOTC, which has a different calling convention in Accelerate.framework than the one GFortran uses, and this mismatch causes the crashes I have encountered in my code. I know this is the source of the problem because, when I downloaded arpack-ng and patched it manually, replacing

X = ZDOTC(...)

with

call ZDOTC(X,...)

then the problems went away.

I am not sure how people would prefer to see this problem solved, but I could submit a patch making the changes above if you all would like.

Attachments

patches.tar.gz (3.8 KB) - added by gcrosswhite@… 10 months ago.
Patches to change all CDOTC and ZDOTC calls to work with Accelerate.
patch-SRC-cneupd.f.diff (1.0 KB) - added by gcrosswhite@… 10 months ago.
Corrected patch for the file SRC/cneupd.f

Change History

comment:1 Changed 10 months ago by macsforever2000@…

  • Port set to arpack

In the future, please fill in the Port field and Cc the port maintainer(s).

comment:2 Changed 10 months ago by macsforever2000@…

  • Owner changed from macports-tickets@… to mmoll@…

comment:3 Changed 10 months ago by mmoll@…

I have Mountain Lion installed and can't reproduce this. I just reinstalled arpack @3.1.1_2+accelerate+gcc46+openmpi. Can you attach your main.log file?

comment:4 Changed 10 months ago by gcrosswhite@…

I didn't find anything in /opt/local/var/macports/logs, but I wasn't expecting to, since the port builds just fine. The problem is that the resulting library segfaults at runtime because it uses the wrong calling convention for some BLAS routines, such as zdotc.

To create a simple test case that illustrates the problem, I compiled the test program zndrv1.f in EXAMPLES/COMPLEX of the main ARPACK distribution and linked it against the MacPorts build of libarpack.a. The result was:

$ gfortran zndrv1.f /opt/local/lib/libarpack.a -framework Accelerate
$ ./a.out
zsh: segmentation fault  ./a.out

We can see where the segmentation fault is coming from by using gdb:


$ gdb ./a.out

GNU gdb 6.3.50-20050815 (Apple version gdb-1752) (Sat Jan 28 03:02:46 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ..

(gdb) run

Starting program: /Users/gcross/Downloads/ARPACK/EXAMPLES/COMPLEX/a.out 
Reading symbols for shared libraries +++++................................ done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000015
0x00007fff87f02d9a in zdotc_ ()

(gdb) backtrace

#0  0x00007fff87f02d9a in zdotc_ ()
#1  0x0000000100009f8f in zneupd_ ()
#2  0x0000000100002ac1 in MAIN__ ()
#3  0x0000000100003833 in main ()

So the crash occurs in zdotc, called from zneupd. When I linked the test program against my own build of libarpack.a, which includes the replacement I described earlier, the program ran just fine:

$ gfortran zndrv1.f /usr/local/lib/libarpack.a -framework Accelerate
$ ./a.out

 Ritz values (Real, Imag) and relative residuals
 -----------------------------------------------
               Col   1       Col   2       Col   3
  Row   1:    7.16197D+02   1.02958D+03   6.80426D-15
  Row   2:    7.16197D+02  -1.02958D+03   9.03466D-15
  Row   3:    6.87583D+02   1.02958D+03   1.11184D-14
  Row   4:    6.87583D+02  -1.02958D+03   1.58575D-14
  
  
 _NDRV1
 ====== 
  
  Size of the matrix is          100
  The number of Ritz values requested is            4
  The number of Arnoldi vectors generated (NCV) is           20
  What portion of the spectrum: LM
  The number of converged Ritz values is            4
  The number of Implicit Arnoldi update iterations taken is           25
  The number of OP*x is          392
  The convergence criterion is   1.11022302462515654E-016

So this doesn't quite answer your question, but it is the closest I can come at the moment to a log that records the problem, along with an easily available test case that triggers it.

comment:5 Changed 10 months ago by mmoll@…

Ah, I get it now. If you could submit a patch, that'd be great.

Changed 10 months ago by gcrosswhite@…

Patches to change all CDOTC and ZDOTC calls to work with Accelerate.

comment:6 Changed 10 months ago by gcrosswhite@…

I grepped through the sources and changed every call to CDOTC or ZDOTC so that it is treated as a subroutine, with the return value stored in the first argument, rather than as a function. I did some spot checks to make sure the resulting library is good. The changes made the double-precision complex tests work (e.g., zndrv* in EXAMPLES/COMPLEX), but for some reason many other tests, including the single-precision complex tests in COMPLEX/, fail both before and after the changes. However, they fail with an error message rather than a segfault, so I don't think their problem is related to this one; in particular, these changes don't seem to make anything worse.

I have attached the patches for all of the files that I changed; there are 24 in total: 4 base files × 2 precisions × 3 modes (sequential, parallel MPI, parallel BLACS).

VERY IMPORTANT: You were most likely going to do this anyway, but just to be sure: make sure that this patch is only applied when using Accelerate! Only Accelerate has the odd ABI quirk that requires this rather strange form of patch, so if the patch is applied when using, say, ATLAS, it will actually break things rather than fix them.

comment:7 Changed 10 months ago by mmoll@…

In r96280 I committed a change to the Portfile that applies your patches. Please give it a try. One of the patches, patch-SRC-cneupd.f.diff, was 0 bytes. Is that correct?

Changed 10 months ago by gcrosswhite@…

Corrected patch for the file SRC/cneupd.f

comment:8 Changed 10 months ago by gcrosswhite@…

Ugh, you're right: one of my patches got mangled somehow. I have attached the corrected version above. Since my own program doesn't use cneupd.f, I will try out the new port now.

comment:9 Changed 10 months ago by gcrosswhite@…

It works! :-)

comment:10 Changed 10 months ago by mmoll@…

  • Status changed from new to closed
  • Resolution set to fixed

Thanks for your patches. The last patch was added in r96338. Closing this issue.
