Opened 4 months ago

Last modified 9 days ago

#56954 assigned defect

py-numpy: numpy.polyfit broken with +gfortran variant on High Sierra

Reported by: mojca (Mojca Miklavec) Owned by: michaelld (Michael Dickens)
Priority: Normal Milestone:
Component: ports Version:
Keywords: Cc: jmroot (Joshua Root), dershow, jsalort (Julien Salort), DanielO (Daniel O'Connor), reneeotten (Renee Otten)
Port: py-numpy

Description

There seems to be an issue with numpy.polyfit under various versions of python I tested (2.7, 3.6, 3.7).

Running the following code under system python:

import numpy as np

points = np.array([[0, -2.1108348e+04], [3.2768000e+04, -2.7959160e+03], [6.5534000e+04,  1.4279546e+04], [4.9151000e+04,  6.6721514e+03], [1.6384000e+04, -1.3232387e+04]], dtype=np.float32)

koef1 = np.polyfit(points[:,0], points[:,1], 1)
koef2 = np.polyfit(points[:,0], points[:,1], 2)

works and returns

>>> koef1
array([  5.53485811e-01,  -2.13732832e+04], dtype=float32)
>>> koef2
array([ -4.00165135e-07,   5.79710186e-01,  -2.15881035e+04], dtype=float32)

However, with Python from MacPorts the result looks much worse:

>>> koef1 = np.polyfit(points[:,0], points[:,1], 1)
Python(83745,0x7fff9e546380) malloc: *** mach_vm_map(size=18446744072450498560) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
init_dgelsd failed init
__main__:1: RankWarning: Polyfit may be poorly conditioned
>>> koef2 = np.polyfit(points[:,0], points[:,1], 2)
Python(83745,0x7fff9e546380) malloc: *** mach_vm_map(size=18446744072450498560) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
init_dgelsd failed init
__main__:1: RankWarning: Polyfit may be poorly conditioned
>>> koef1
array([2.2937462e-317, 1.0438839e-312])
>>> koef2
array([5.05031302e+09, 8.97368555e+04, 2.23606798e+00])
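
For anyone who wants to check a suspect build without going through NumPy or LAPACK at all, here is a minimal sketch that recomputes the degree-1 fit from the closed-form least-squares formulas in plain Python. The helper name linear_fit is made up for this illustration; the data points are the ones from the script above. The result should agree with the koef1 values returned by the system Python to roughly float32 precision.

```python
# Cross-check of the degree-1 fit using the closed-form least-squares
# formulas, so no NumPy or LAPACK call is involved.
points = [
    (0.0, -2.1108348e+04),
    (3.2768000e+04, -2.7959160e+03),
    (6.5534000e+04, 1.4279546e+04),
    (4.9151000e+04, 6.6721514e+03),
    (1.6384000e+04, -1.3232387e+04),
]


def linear_fit(pts):
    """Return (slope, intercept) of the least-squares line through pts.

    Hypothetical helper for illustration only; not part of NumPy.
    """
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    # Centered sums keep the arithmetic well behaved for large x values.
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(points)
print(slope, intercept)
```

If these two numbers match the system Python output but not the MacPorts one, that points at the linear algebra backend rather than at the input data.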

Attachments (1)

numpy-test-suite.txt (149.9 KB) - added by mojca (Mojca Miklavec) 4 months ago.
failed tests of numpy

Change History (30)

comment:1 Changed 4 months ago by mojca (Mojca Miklavec)

The test suite for numpy can be run via

import numpy
numpy.test('full')

and returns a number of errors:

20 failed, 4529 passed, 417 skipped, 7 xfailed, 17 warnings in 161.32 seconds

Changed 4 months ago by mojca (Mojca Miklavec)

Attachment: numpy-test-suite.txt added

failed tests of numpy

comment:2 Changed 4 months ago by mf2k (Frank Schima)

I cannot reproduce your polyfit error. It works fine for me with python36 and ipython. Here are my test suite results:

NumPy version 1.15.0
...
4946 passed, 20 skipped, 7 xfailed in 274.98 seconds

For reference, what are your installed versions of the relevant ports?

$ port installed python36 py36-numpy py36-ipython
The following ports are currently installed:
  py36-ipython @6.4.0_0 (active)
  py36-numpy @1.15.0_0+gcc8 (active)
  python36 @3.6.6_0+optimizations (active)

comment:3 Changed 4 months ago by michaelld (Michael Dickens)

The commands you list work for me using MacPorts' Python 2.7, 3.6, and 3.7 && latest NumPy; all on 10.12 latest.

comment:4 Changed 4 months ago by reneeotten (Renee Otten)

I am seeing the same errors as reported by mojca with Python 2.7, 3.6, and 3.7. The only difference for me, compared to what Frank reports, is that all py-numpy ports were installed with the default +gfortran variant:

  py36-numpy @1.15.0_0+gfortran (active)

comment:5 Changed 4 months ago by mf2k (Frank Schima)

How about these ports? There was a problem with using llvm for them that surfaced recently.

$ port installed ld64 cctools
The following ports are currently installed:
  cctools @895_6+xcode (active)
  ld64 @3_1+ld64_xcode (active)

comment:6 Changed 4 months ago by michaelld (Michael Dickens)

All of my NumPy ports are installed using the default:

  py27-numpy @1.15.0_0+gfortran (active)
  py36-numpy @1.15.0_0+gfortran (active)
  py37-numpy @1.15.0_0+gfortran (active)

My cctools & ld64 are up to date, but since these NumPy ports were installed way before these latest issues with cctools & ld64, I doubt that's what's causing the issue here (though one never knows ;)

comment:7 in reply to:  5 Changed 4 months ago by reneeotten (Renee Otten)

Replying to mf2k:

How about these ports? There was a problem with using llvm for them that surfaced recently.

$ port installed ld64 cctools
The following ports are currently installed:
  cctools @895_6+xcode (active)
  ld64 @3_1+ld64_xcode (active)

seem to be the latest as well:

~> port installed ld64 cctools
The following ports are currently installed:
  cctools @895_6+xcode (active)
  ld64 @3_1+ld64_xcode (active)

doing an uninstall and clean, followed by sudo port -vst install py37-numpy didn't help either.

comment:8 Changed 4 months ago by mf2k (Frank Schima)

I switched my py36-numpy to use +gfortran and I can confirm the error reported in this ticket. I installed with the binary from the buildbot and also built from source, and the same failed result occurs. So there is definitely something wrong with the +gfortran variant for py-numpy. I think py-numpy should switch to +gcc8 as the default.

Last edited 4 months ago by mf2k (Frank Schima)

comment:9 Changed 4 months ago by michaelld (Michael Dickens)

What OSX version are folks having issues on?

py*-numpy +gfortran built from source works for me on 10.12. I haven't tried this on other OSX versions, but I can easily do so; I run "import numpy; numpy.test()" on all the OSX versions I have around & it works about the same on all of them. All built from source.

comment:10 Changed 4 months ago by michaelld (Michael Dickens)

I have no objection to switching to +gcc8; just want to make sure that's the correct fix.

comment:11 Changed 4 months ago by mf2k (Frank Schima)

I'm on High Sierra (10.13).

comment:12 Changed 4 months ago by michaelld (Michael Dickens)

Gotcha. I confirm the issue with 10.13. Doesn't happen with 10.12 or any prior (in my testing). So this bug is 10.13 only (in my testing).

Thus, wondering if this is NumPy or something else. The routine "init_dgelsd" looks like it's from LAPACK(E).

comment:13 Changed 4 months ago by mf2k (Frank Schima)

Summary: changed from "python: numpy.polyfit broken" to "py-numpy: numpy.polyfit broken with +gfortran variant on High Sierra"

comment:14 Changed 4 months ago by mf2k (Frank Schima)

Port: python27 python36 python37 removed

comment:15 Changed 4 months ago by michaelld (Michael Dickens)

Following the instructions to set a breakpoint at malloc_error_break, here's the backtrace for Python 2.7:

% lldb /opt/local/bin/python2.7
(lldb) target create "/opt/local/bin/python2.7"
Current executable set to '/opt/local/bin/python2.7' (x86_64).
(lldb) b malloc_error_break
Breakpoint 1: where = libsystem_malloc.dylib`malloc_error_break, address = 0x0000000000011962
(lldb) r test_numpy_10_13.py
Process 25452 launched: '/opt/local/bin/python2.7' (x86_64)
Process 25452 stopped
* thread #2, stop reason = exec
    frame #0: 0x000000010000519c dyld`_dyld_start
dyld`_dyld_start:
->  0x10000519c <+0>: popq   %rdi
    0x10000519d <+1>: pushq  $0x0
    0x10000519f <+3>: movq   %rsp, %rbp
    0x1000051a2 <+6>: andq   $-0x10, %rsp
Target 0: (Python) stopped.
(lldb) c
Process 25452 resuming
Python(25452,0x7fff9cda4380) malloc: *** mach_vm_map(size=18446744072618995712) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Process 25452 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff64772962 libsystem_malloc.dylib`malloc_error_break
libsystem_malloc.dylib`malloc_error_break:
->  0x7fff64772962 <+0>: pushq  %rbp
    0x7fff64772963 <+1>: movq   %rsp, %rbp
    0x7fff64772966 <+4>: nop    
    0x7fff64772967 <+5>: nopl   (%rax)
Target 0: (Python) stopped.
(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007fff64772962 libsystem_malloc.dylib`malloc_error_break
    frame #1: 0x00007fff6476fa08 libsystem_malloc.dylib`szone_error + 392
    frame #2: 0x00007fff64771ed6 libsystem_malloc.dylib`mvm_allocate_pages + 256
    frame #3: 0x00007fff64767475 libsystem_malloc.dylib`large_malloc + 464
    frame #4: 0x00007fff6476339d libsystem_malloc.dylib`szone_malloc_should_clear + 388
    frame #5: 0x00007fff647631bd libsystem_malloc.dylib`malloc_zone_malloc + 103
    frame #6: 0x00007fff647624c7 libsystem_malloc.dylib`malloc + 24
    frame #7: 0x0000000104fb5a2f _umath_linalg.so`DOUBLE_lstsq + 735
    frame #8: 0x0000000104934e27 umath.so`PyUFunc_GenericFunction + 19415
    frame #9: 0x00000001049378be umath.so`ufunc_generic_call + 174
    frame #10: 0x00000001000b0201 Python`PyObject_Call + 97
    frame #11: 0x000000010015b9aa Python`PyEval_EvalFrameEx + 9130
    frame #12: 0x00000001001593a4 Python`PyEval_EvalCodeEx + 2212
    frame #13: 0x0000000100163f0d Python`fast_function + 109
    frame #14: 0x000000010015b80c Python`PyEval_EvalFrameEx + 8716
    frame #15: 0x00000001001593a4 Python`PyEval_EvalCodeEx + 2212
    frame #16: 0x0000000100163f0d Python`fast_function + 109
    frame #17: 0x000000010015b80c Python`PyEval_EvalFrameEx + 8716
    frame #18: 0x00000001001593a4 Python`PyEval_EvalCodeEx + 2212
    frame #19: 0x0000000100158af2 Python`PyEval_EvalCode + 34
    frame #20: 0x0000000100186fed Python`PyRun_FileExFlags + 157
    frame #21: 0x0000000100186b24 Python`PyRun_SimpleFileExFlags + 740
    frame #22: 0x000000010019e71f Python`Py_Main + 3279
    frame #23: 0x00007fff645ba015 libdyld.dylib`start + 1
(lldb)

comment:16 Changed 4 months ago by michaelld (Michael Dickens)

and Python 3.6:

% lldb /opt/local/bin/python3.6
(lldb) target create "/opt/local/bin/python3.6"
Current executable set to '/opt/local/bin/python3.6' (x86_64).
(lldb) b malloc_error_break
Breakpoint 1: where = libsystem_malloc.dylib`malloc_error_break, address = 0x00007fff64772962
(lldb) r test_numpy_10_13.py
Process 69019 launched: '/opt/local/Library/Frameworks/Python.framework/Versions/3.6/Resources/Python.app/Contents/MacOS/Python' (x86_64)
python3.6(69019,0x7fff9cda4380) malloc: *** mach_vm_map(size=18446744072618991616) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Process 69019 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff64772962 libsystem_malloc.dylib`malloc_error_break
libsystem_malloc.dylib`malloc_error_break:
->  0x7fff64772962 <+0>: pushq  %rbp
    0x7fff64772963 <+1>: movq   %rsp, %rbp
    0x7fff64772966 <+4>: nop    
    0x7fff64772967 <+5>: nopl   (%rax)
Target 0: (Python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007fff64772962 libsystem_malloc.dylib`malloc_error_break
    frame #1: 0x00007fff6476fa08 libsystem_malloc.dylib`szone_error + 392
    frame #2: 0x00007fff64771ed6 libsystem_malloc.dylib`mvm_allocate_pages + 256
    frame #3: 0x00007fff64767475 libsystem_malloc.dylib`large_malloc + 464
    frame #4: 0x00007fff6476339d libsystem_malloc.dylib`szone_malloc_should_clear + 388
    frame #5: 0x00007fff647631bd libsystem_malloc.dylib`malloc_zone_malloc + 103
    frame #6: 0x00007fff647624c7 libsystem_malloc.dylib`malloc + 24
    frame #7: 0x0000000104ed4a2f _umath_linalg.cpython-36m-darwin.so`DOUBLE_lstsq + 735
    frame #8: 0x0000000104789267 umath.cpython-36m-darwin.so`PyUFunc_GenericFunction + 19415
    frame #9: 0x000000010478be3e umath.cpython-36m-darwin.so`ufunc_generic_call + 174
    frame #10: 0x00000001000ad243 Python`_PyObject_FastCallDict + 143
    frame #11: 0x00000001000ad5fc Python`_PyObject_FastCallKeywords + 97
    frame #12: 0x000000010014c356 Python`call_function + 443
    frame #13: 0x0000000100144c25 Python`_PyEval_EvalFrameDefault + 4479
    frame #14: 0x000000010014cb06 Python`_PyEval_EvalCodeWithName + 1747
    frame #15: 0x000000010014d1e9 Python`fast_function + 218
    frame #16: 0x000000010014c35d Python`call_function + 450
    frame #17: 0x0000000100144b8d Python`_PyEval_EvalFrameDefault + 4327
    frame #18: 0x000000010014cb06 Python`_PyEval_EvalCodeWithName + 1747
    frame #19: 0x000000010014d1e9 Python`fast_function + 218
    frame #20: 0x000000010014c35d Python`call_function + 450
    frame #21: 0x0000000100144b8d Python`_PyEval_EvalFrameDefault + 4327
    frame #22: 0x000000010014cb06 Python`_PyEval_EvalCodeWithName + 1747
    frame #23: 0x0000000100143a2c Python`PyEval_EvalCode + 42
    frame #24: 0x000000010016cd8f Python`run_mod + 54
    frame #25: 0x000000010016bd9e Python`PyRun_FileExFlags + 164
    frame #26: 0x000000010016b489 Python`PyRun_SimpleFileExFlags + 283
    frame #27: 0x000000010018026a Python`Py_Main + 3466
    frame #28: 0x0000000100001e1d Python`___lldb_unnamed_symbol1$$Python + 227
    frame #29: 0x00007fff645ba015 libdyld.dylib`start + 1
(lldb)

comment:17 Changed 4 months ago by michaelld (Michael Dickens)

and Python 3.7:

% lldb /opt/local/bin/python3.7
(lldb) target create "/opt/local/bin/python3.7"
Current executable set to '/opt/local/bin/python3.7' (x86_64).
(lldb) b malloc_error_break
Breakpoint 1: where = libsystem_malloc.dylib`malloc_error_break, address = 0x0000000000011962
(lldb) r test_numpy_10_13.py
Process 79296 launched: '/opt/local/bin/python3.7' (x86_64)
Process 79296 stopped
* thread #2, stop reason = exec
    frame #0: 0x000000010000519c dyld`_dyld_start
dyld`_dyld_start:
->  0x10000519c <+0>: popq   %rdi
    0x10000519d <+1>: pushq  $0x0
    0x10000519f <+3>: movq   %rsp, %rbp
    0x1000051a2 <+6>: andq   $-0x10, %rsp
Target 0: (Python) stopped.
(lldb) c
Process 79296 resuming
Python(79296,0x7fff9cda4380) malloc: *** mach_vm_map(size=18446744072618991616) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Process 79296 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff64772962 libsystem_malloc.dylib`malloc_error_break
libsystem_malloc.dylib`malloc_error_break:
->  0x7fff64772962 <+0>: pushq  %rbp
    0x7fff64772963 <+1>: movq   %rsp, %rbp
    0x7fff64772966 <+4>: nop    
    0x7fff64772967 <+5>: nopl   (%rax)
Target 0: (Python) stopped.
(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007fff64772962 libsystem_malloc.dylib`malloc_error_break
    frame #1: 0x00007fff6476fa08 libsystem_malloc.dylib`szone_error + 392
    frame #2: 0x00007fff64771ed6 libsystem_malloc.dylib`mvm_allocate_pages + 256
    frame #3: 0x00007fff64767475 libsystem_malloc.dylib`large_malloc + 464
    frame #4: 0x00007fff6476339d libsystem_malloc.dylib`szone_malloc_should_clear + 388
    frame #5: 0x00007fff647631bd libsystem_malloc.dylib`malloc_zone_malloc + 103
    frame #6: 0x00007fff647624c7 libsystem_malloc.dylib`malloc + 24
    frame #7: 0x0000000105256a2f _umath_linalg.cpython-37m-darwin.so`DOUBLE_lstsq + 735
    frame #8: 0x0000000104389267 umath.cpython-37m-darwin.so`PyUFunc_GenericFunction + 19415
    frame #9: 0x000000010438be3e umath.cpython-37m-darwin.so`ufunc_generic_call + 174
    frame #10: 0x00000001000bc6dd Python`_PyObject_FastCallKeywords + 359
    frame #11: 0x00000001001525d3 Python`call_function + 568
    frame #12: 0x000000010014a6ab Python`_PyEval_EvalFrameDefault + 2706
    frame #13: 0x0000000100152f31 Python`_PyEval_EvalCodeWithName + 1837
    frame #14: 0x00000001000bc83c Python`_PyFunction_FastCallKeywords + 225
    frame #15: 0x00000001001525da Python`call_function + 575
    frame #16: 0x000000010014a617 Python`_PyEval_EvalFrameDefault + 2558
    frame #17: 0x0000000100152f31 Python`_PyEval_EvalCodeWithName + 1837
    frame #18: 0x00000001000bc83c Python`_PyFunction_FastCallKeywords + 225
    frame #19: 0x00000001001525da Python`call_function + 575
    frame #20: 0x000000010014a586 Python`_PyEval_EvalFrameDefault + 2413
    frame #21: 0x0000000100152f31 Python`_PyEval_EvalCodeWithName + 1837
    frame #22: 0x0000000100149b91 Python`PyEval_EvalCode + 42
    frame #23: 0x0000000100177e7f Python`run_mod + 54
    frame #24: 0x0000000100176e9a Python`PyRun_FileExFlags + 164
    frame #25: 0x0000000100176579 Python`PyRun_SimpleFileExFlags + 283
    frame #26: 0x000000010018dc4e Python`pymain_main + 5114
    frame #27: 0x000000010018e3e0 Python`_Py_UnixMain + 104
    frame #28: 0x00007fff645ba015 libdyld.dylib`start + 1
    frame #29: 0x00007fff645ba015 libdyld.dylib`start + 1
(lldb) 

comment:18 Changed 4 months ago by michaelld (Michael Dickens)

So it seems like there's a memory allocation error in umath_linalg routine DOUBLE_lstsq ... or, something like that.
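
A side note on the sizes that malloc complains about: they are all just below 2**64, which is what a negative signed integer looks like once it is treated as an unsigned 64-bit size. The sketch below (the helper name as_signed_64 is made up for this illustration) reinterprets the three values from this ticket. This is only an interpretation of the log output, not a confirmed diagnosis, but it would be consistent with a negative or garbage size being passed to malloc.

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed.

    Hypothetical helper for illustration only.
    """
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Sizes copied from the mach_vm_map failure messages in this ticket.
sizes = [
    18446744072450498560,  # description, MacPorts Python
    18446744072618995712,  # comment 15, Python 2.7
    18446744072618991616,  # comments 16 and 17, Python 3.6 and 3.7
]

for size in sizes:
    print(size, "->", as_signed_64(size))
```

All three come out as negative numbers of roughly a billion in magnitude, so they would fit in a signed 32-bit integer.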

comment:19 Changed 4 months ago by michaelld (Michael Dickens)

lstsq == "least squares" ... which is for solving linear problems. So not a huge surprise given that the ticket issue is for polyfit ... fitting polynomial curves to data, which is a least squares type of problem.
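
To make the connection concrete, here is a rough sketch of a polynomial fit written as a linear least-squares problem: build the Vandermonde matrix and solve for the coefficients. Assumptions to note: the function name polyfit_normal_equations is made up, and the normal equations with Gaussian elimination are swapped in purely for illustration. NumPy itself goes through LAPACK (the init_dgelsd in the error message), which is more robust numerically, so this is not what NumPy does internally.

```python
def polyfit_normal_equations(xs, ys, deg):
    """Least-squares polynomial fit, highest power first.

    Illustration only: solves the normal equations (A^T A) c = A^T y,
    where A is the Vandermonde matrix of xs.
    """
    # Vandermonde matrix: one row [x**deg, ..., x, 1] per sample.
    rows = [[x ** p for p in range(deg, -1, -1)] for x in xs]
    m = deg + 1
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(m)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs


# Exact samples of y = 2x^2 + 3x + 1; the fit should recover
# coefficients close to [2, 3, 1].
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x * x + 3 * x + 1 for x in xs]
coeffs = polyfit_normal_equations(xs, ys, 2)
print(coeffs)
```

For the data in this ticket the x values go up to about 65534, so the normal equations would be badly conditioned at degree 2; that is one reason an SVD-based routine is used in practice.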

comment:20 Changed 4 months ago by michaelld (Michael Dickens)

Here is where the error message is coming from: https://github.com/numpy/numpy/blob/master/numpy/linalg/umath_linalg.c.src#L2561 .

and here's the printed warning: https://github.com/numpy/numpy/blob/master/numpy/lib/polynomial.py#L585 .

No idea if this is useful, but here we are!

comment:21 Changed 4 months ago by michaelld (Michael Dickens)

I don't have gcc8 installed yet, so I can't test with it. That said, py27-numpy +gcc7 works for me on 10.13 ... which is strange because the +gfortran variant has the same dependencies as +gcc7 ... which makes sense because ${prefix}/bin/gfortran-mp-7 is provided by gcc7 ... so, what is the difference in the Portfile when using the 2 variants? That seems to make the difference.

comment:22 Changed 4 months ago by mojca (Mojca Miklavec)

These are the ports that I have installed:

  cctools @895_6+xcode (active)
  ld64 @3_1+ld64_xcode (active)
  py27-numpy @1.15.0_0+gfortran (active)
  py36-numpy @1.15.0_0+gfortran (active)
  py37-numpy @1.15.0_0+openblas (active) # after playing with the idea that gfortran might be the cause, but it didn't change anything
  OpenBLAS @0.3.2_0+clang+gcc7+lapack (active)

Apparently either

sudo port install py37-numpy +gcc7

or

sudo port install py37-numpy +gcc8

fixed the problem, but it would be ideal to figure out why exactly before blindly changing the default.

comment:23 Changed 4 months ago by dershow

Cc: dershow added

comment:24 Changed 3 months ago by andreavicere

On my system (MacOS Mojave 10.14, macports 2.5.3) switching only to gcc8 did not suffice.

I had to also switch to openblas: running

sudo port install py27-numpy +gcc8 +openblas
sudo port install py37-numpy +gcc8 +openblas

fixed the numpy.polyfit issue in both Python 2 and 3.

Last edited 3 months ago by andreavicere

comment:25 Changed 8 weeks ago by jsalort (Julien Salort)

Cc: jsalort added

comment:26 Changed 7 weeks ago by DanielO (Daniel O'Connor)

Cc: DanielO added

comment:27 Changed 5 weeks ago by reneeotten (Renee Otten)

Cc: reneeotten added

comment:28 Changed 9 days ago by lpsinger (Leo Singer)

Just selecting openblas and not gcc8 was sufficient for me.

sudo port install py27-numpy +openblas

comment:29 Changed 9 days ago by reneeotten (Renee Otten)

FYI, the upstream bug-report is here.

Installing with +gcc8 and +openblas did work for me though...

  py27-numpy @1.15.4_0+gcc8+openblas (active)
  OpenBLAS @0.3.3_0+clang+gcc8+lapack (active)