Opened 17 months ago

Closed 7 months ago

#59022 closed defect (fixed)

py-numpy: Anything that imports numpy crashes python on sierra/10.12 in a vm/virtual machine

Reported by: ryandesign (Ryan Schmidt)
Owned by: michaelld (Michael Dickens)
Priority: Normal
Milestone:
Component: ports
Version:
Keywords: sierra haspatch
Cc: jmroot (Joshua Root), stromnov (Andrew Stromnov), reneeotten (Renee Otten), JackDunnNZ (Jack Dunn)
Port: py-numpy

Description

py36-matplotlib and py37-matplotlib do not build:

running build_ext
sh: line 1: 98187 Illegal instruction: 4  /opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 setup.py --no-user-cfg build
Command failed:  cd "/opt/local/var/macports/build/_opt_bblocal_var_buildworker_ports_build_ports_python_py-matplotlib/py37-matplotlib/work/matplotlib-3.1.1" && /opt/local/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 setup.py --no-user-cfg build 
Exit code: 132

py35-healpy fails with a similar error
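
For readers of the log: exit code 132 matches the "Illegal instruction: 4" message, because a shell reports a process killed by a signal as 128 plus the signal number, and SIGILL is signal 4. A minimal sketch of that decoding follows; the function name is made up for illustration.

```python
import signal


def describe_exit_code(code):
    """Interpret a shell-style exit code (128 + N means killed by signal N)."""
    if code > 128:
        signum = code - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with status %d" % code


if __name__ == "__main__":
    print(describe_exit_code(132))
```

So the build did not fail with an ordinary error; the python3.7 process itself was killed while running setup.py.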

Change History (27)

comment:1 Changed 17 months ago by reneeotten (Renee Otten)

It seems this only happens for matplotlib version 3.1.1 and specific versions of macOS. I am currently traveling, but will take a closer look after next week.

comment:2 Changed 16 months ago by reneeotten (Renee Otten)

I am not really sure how to proceed... I can't really tell why it does not work on macOS 10.12 (on 10.11 and lower the failures seem to be caused by its dependencies).

The failure on macOS 10.12 happens after: running build_ext

On macOS 10.13 and 10.14, the build continues after that with the following:

building 'matplotlib.ft2font' extension
creating build/temp.macosx-10.13-x86_64-3.7
creating build/temp.macosx-10.13-x86_64-3.7/src
/usr/bin/clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -pipe -Os -arch x86_64 -DFREETYPE_BUILD_TYPE=system -DPY_ARRAY_UNIQUE_SYMBOL=MPL_matplotlib_ft2font_ARRAY_API -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -D__STDC_FORMAT_MACROS=1 -Iextern/agg24-svn/include -I/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/numpy/core/include -I/opt/local/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/checkdep_freetype2.c -o build/temp.macosx-10.13-x86_64-3.7/src/checkdep_freetype2.o -I/opt/local/include/freetype2 -I/opt/local/include/libpng16
src/checkdep_freetype2.c:7:9: warning: Compiling with FreeType version 2.10.1. [-W#pragma-messages]
#pragma message("Compiling with FreeType version " \
        ^
1 warning generated.

Somehow it appears that this clang command does not work on macOS 10.12. I do not have access to a machine with that OS installed and don't really know how to continue with this. Does anyone have a suggestion?

comment:3 Changed 16 months ago by reneeotten (Renee Otten)

Also, py35-svipc fails with a similar error, and only on macOS 10.12.

To me this suggests it's something specific with macOS 10.12, and it started to happen more recently: at least py{36/37}-matplotlib used to build fine at the time of the last update (July 3) (and before the rev-bump due to qhull on August 26). From the log file it seems unlikely that the qhull update has anything to do with it, so is there anything else that has changed on the buildbot in between?

comment:4 Changed 16 months ago by ryandesign (Ryan Schmidt)

In #59383 we've found that it seems to be a NumPy problem.

comment:5 Changed 13 months ago by ryandesign (Ryan Schmidt)

Cc: jmroot stromnov reneeotten added
Keywords: sierra added
Owner: changed from reneeotten to michaelld
Port: py-numpy added; py-healpy py-matplotlib removed
Summary: changed from "py35-healpy, py36-matplotlib, py37-matplotlib: Illegal instruction: 4" to "py-numpy: Anything that imports numpy crashes python on sierra/10.12 in a vm/virtual machine"

Has duplicates #59383, #59979. Crash logs are attached to #59979.

The upstream bug report we have been presuming this relates to is https://github.com/numpy/numpy/issues/10330. However it is closed because it was thought not to be a numpy bug but rather a defect in the Xen hypervisor that that user was using. There is a sample program which was reporting that AVX was supported but trying to use AVX functions was causing a crash.

I have tried that test program on our Sierra and High Sierra buildbot workers. On both, it reports that AVX is not available. Also, the upstream bug report says the problem occurs when doing "simple array additions", but in our case, the problem occurs immediately after "import numpy". So perhaps our problem is not the same one.
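
For a second opinion on what the (virtual) CPU advertises on macOS, the feature list can also be read from sysctl. This is a minimal sketch, not the upstream test program. It assumes an Intel Mac where the machdep.cpu.features key exists and AVX shows up as the token AVX1.0; the helper names are made up for illustration.

```python
import subprocess


def parse_cpu_features(sysctl_output):
    """Split a whitespace-separated feature list into a set of upper-case tokens."""
    return {token.upper() for token in sysctl_output.split()}


def read_sysctl(key):
    """Return the value of a sysctl key, or an empty string if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", key],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        # sysctl is not installed or not executable on this system.
        return ""
    return result.stdout if result.returncode == 0 else ""


def cpu_advertises_avx():
    """True if the feature list contains AVX1.0 (assumed token name on Intel Macs)."""
    return "AVX1.0" in parse_cpu_features(read_sysctl("machdep.cpu.features"))


if __name__ == "__main__":
    print("AVX advertised:", cpu_advertises_avx())
```

This only shows what is advertised, not whether the instructions actually execute, which is exactly the distinction the upstream report is about.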

Our buildbot workers are virtual machines (running under VMware ESXi on Xserves) so that part may still be relevant. If anybody experiences this problem on Sierra running directly on a Mac, not in a VM, let us know.
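
To check whether a given machine is affected without crashing your own shell session, the import can be run in a child process and its exit status inspected. A minimal sketch, with made-up helper names; it relies on subprocess reporting death by signal as a negative return code on POSIX systems.

```python
import signal
import subprocess
import sys


def classify(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return "signal %d" % -returncode
    return "error %d" % returncode


def try_import(module, python=sys.executable):
    """Import a module in a fresh interpreter and report how that interpreter ended."""
    proc = subprocess.run(
        [python, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify(proc.returncode)


if __name__ == "__main__":
    print("import numpy:", try_import("numpy"))
```

On an affected system the expectation would be SIGILL here, matching the "Illegal instruction: 4" in the build log. A result of "error 1" would more likely mean numpy is simply not installed for that interpreter.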

comment:6 Changed 13 months ago by ryandesign (Ryan Schmidt)

A more likely upstream bug report is https://github.com/numpy/numpy/issues/13059, which is still open. In it, a user on Linux reports "import numpy" crashing in a VM. Their backtrace looks slightly different from ours, but we're both crashing in npy_cpu_supports.

comment:7 Changed 13 months ago by ryandesign (Ryan Schmidt)

I wrote up my own upstream bug report: https://github.com/numpy/numpy/issues/15342; we'll see what they say.

comment:8 Changed 12 months ago by lpsinger (Leo Singer)

Cc: lpsinger removed

comment:9 Changed 12 months ago by JackDunnNZ (Jack Dunn)

Cc: JackDunnNZ added

comment:10 Changed 12 months ago by ryandesign (Ryan Schmidt)

They pointed me to a PR from April that solves the problem by rewriting their CPU detection code. But it conflicts with another PR they want so I'm not sure when it will be merged.

comment:11 Changed 11 months ago by ryandesign (Ryan Schmidt)

The PR was merged February 5. Let's see if the next release after that fixes the problem. If so, we should trigger new builds of everything that uses numpy on the 10.12 buildbot worker.

comment:12 Changed 10 months ago by reneeotten (Renee Otten)

In 6f7e3ab64b991a8164164f2248d64f7c006f8b0e/macports-ports (master):

py-numpy: update to 1.18.2 and 1.16.4 (PY27)

See: #59022

comment:13 Changed 10 months ago by reneeotten (Renee Otten)

Ryan: how do we go about re-scheduling some builds (for example the ones listed in the ticket description) on the 10.12 buildbot to see if this resolves the issue? If you can/want to take care of it: great! Otherwise please provide me with some instructions on how one should do this. Thanks!

comment:14 Changed 10 months ago by ryandesign (Ryan Schmidt)

I scheduled builds for py-healpy and py-matplotlib; unfortunately they still failed the same way as before. The release notes for numpy 1.18.2 do not mention PR 13421 and the fix does not appear to have been included in this version.

You can schedule builds for 10.12 by filling a space-separated list of ports into the "port list" field; putting in a short explanation in the "reason" field (maybe with a ticket URL) is nice too. You need to log in with your buildbot account before the fields become visible. If you don't have a buildbot account I can make one for you.

comment:15 in reply to:  14 Changed 10 months ago by reneeotten (Renee Otten)

Replying to ryandesign:

I scheduled builds for py-healpy and py-matplotlib; unfortunately they still failed the same way as before. The release notes for numpy 1.18.2 do not mention PR 13421 and the fix does not appear to have been included in this version.

Thanks for doing that, Ryan. I must admit that I assumed (but didn't actually check) that the fix was included in the release. That's very unfortunate; I would have thought this could have gone into a minor release.

comment:16 Changed 7 months ago by reneeotten (Renee Otten)

In 199f5a1f76135d6eda4af6696ae656df7d0039b8/macports-ports (master):

py-numpy: update to 1.19.0

  • pin to version 1.18.5 for PY35

See: #59022

comment:17 Changed 7 months ago by ryandesign (Ryan Schmidt)

Thanks, that's great! The 10.12 buildbot worker is offline at the moment so we can't see if that fixed it but I will try to get it back online soon. I need to order a new SSD first.

Now what do we do about py35 and earlier? Can we backport the fix to those versions?

comment:18 in reply to:  17 Changed 7 months ago by reneeotten (Renee Otten)

Now what do we do about py35 and earlier? Can we backport the fix to those versions?

one probably could with enough motivation and time, but I'm not volunteering ;) Python 2.7 is already EOL and Python 3.5 will follow in a few months... so perhaps it's not worth the trouble anymore considering this only happens on an old system (macOS 10.12) and for the older PY versions?

comment:19 Changed 7 months ago by ryandesign (Ryan Schmidt)

If I volunteered to backport this, would you accept a PR?

I like looking through the buildbot waterfall screen to see what failed so I can investigate fixing it or reporting it. But so many of the failed builds on Sierra were from ports depending in some way on numpy which failed due to this bug. That wastes my time examining build failures only to determine they all have the same root cause that we haven't fixed, and it wastes our build server's time to keep trying to build things we already know don't work. I'd rather spend a little time actually fixing it.

comment:20 in reply to:  19 Changed 7 months ago by reneeotten (Renee Otten)

Replying to ryandesign:

If I volunteered to backport this, would you accept a PR?

that's up to @michaelld to comment on, but I personally don't see a reason why such a PR would not be accepted.

comment:21 Changed 7 months ago by michaelld (Michael Dickens)

Please do!

comment:22 Changed 7 months ago by ryandesign (Ryan Schmidt)

Keywords: haspatch added

The patch was easy to backport. It required no changes to apply to py35-numpy @1.18.5 and only a tiny change to apply to py27-numpy @1.16.6.

https://github.com/macports/macports-ports/pull/7550

To test this, I first installed py35-numpy @1.18.5 on a 10.12 VM and verified that trying to install py35-matplotlib and py35-svipc failed as reported in this ticket. I then added my patch, rebuilt py35-numpy, and verified that py35-matplotlib and py35-svipc were then able to build.

For py27-numpy @1.16.6 it's a slightly different situation. py27-matplotlib and py27-svipc built successfully with or without the numpy patch. Presumably numpy 1.18 uses the cpu detection functions during startup and so it crashes right away whereas 1.16 doesn't. However we do know that the pre-1.19 cpu detection functions are broken, so I think they should still be replaced in 1.16. I suspect that there are other codepaths that would lead to the cpu detection functions being used in 1.16 (just not right at startup) so that a crash might still happen during usage of numpy if we don't fix it.
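
As a rough aid for anyone triaging a similar report, the "pre-1.19" cutoff mentioned above can be expressed as a version check. A minimal sketch with made-up function names; the 1.19 threshold is taken from this ticket rather than verified against the numpy source, and a version string cannot show whether a distributor has patched an older release, as this PR does for 1.18.5 and 1.16.6.

```python
import re


def version_tuple(version):
    """Extract (major, minor) from a version string such as '1.18.5' or '1.19.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_upstream_cpu_detection_rewrite(version):
    """True for releases at or after 1.19, the cutoff assumed from this ticket."""
    return version_tuple(version) >= (1, 19)


if __name__ == "__main__":
    for candidate in ("1.16.6", "1.18.5", "1.19.0"):
        print(candidate, has_upstream_cpu_detection_rewrite(candidate))
```

In practice one would pass numpy.__version__ to the check, but only on a system where importing numpy does not itself crash.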

comment:23 Changed 7 months ago by ryandesign (Ryan Schmidt)

Note that I didn't read the code. I'm not particularly proficient in python. I know there are some differences between python 2 and 3. I hope there are no such relevant differences in this code. I figured if the code was not python 2 compatible then the build would have failed, and it didn't.

comment:24 Changed 7 months ago by ryandesign (Ryan Schmidt)

The bulk of the new code in the patch is written in C so that should be fine. The only big chunk of new python code that might potentially be incompatible with python 2 is the new test case, but the py-numpy port doesn't enable tests anyway.

I would like to merge this PR before bringing the 10.12 builder back online. Any objection to merging the PR now?

comment:25 Changed 7 months ago by reneeotten (Renee Otten)

Looks good to me, Ryan - thank you! I checked the Python code in the test; it seems PY2-compatible to me, but even if it were not, the tests are only executed on Linux so it doesn't really matter in this case.

comment:26 Changed 7 months ago by ryandesign (Ryan Schmidt)

In bc1a483dafed38c3692c5cb2aa89b6b35220145f/macports-ports (master):

py35-numpy: Backport new CPU detection code

See: #59022

comment:27 Changed 7 months ago by ryandesign (Ryan Schmidt)

Resolution: fixed
Status: changed from assigned to closed

In 3f78e4b15ec81d8989010f4b386a2e14f62aabfa/macports-ports (master):

py27-numpy: Backport new CPU detection code

Closes: #59022
