Opened 3 years ago

Closed 21 months ago

Last modified 21 months ago

#63480 closed defect (fixed)

pspp-devel @1.5.3_10+gui+x11: failed assertion `!"reached"' when building

Reported by: evanmiller (Evan Miller) Owned by: nerdling (Jeremy Lavergne)
Priority: Normal Milestone:
Component: ports Version: 2.7.1
Keywords: Cc: mascguy (Christopher Nielsen)
Port: pspp-devel

Description

This happens when attempting to build pspp-devel against either cairo or cairo-devel.

:info:build LSAN_OPTIONS="suppressions=/opt/local/var/macports/build/_Users_emiller_macports.local_math_pspp-devel/pspp-devel/work/pspp-1.5.3-gee1bfc/tests/lsan.supp:print_suppressions=0:$LSAN_OPTIONS" utilities/pspp-output convert doc/pspp-figures/descriptives.spv doc/pspp-figures/descriptives.png -O trim=true -O left-margin=0in -O right-margin=0in -O top-margin=0in -O bottom-margin=0in -O paper-size=7.5x99in --table-look=./doc/tutorial.stt
:info:build cairo-pattern.c:3392: failed assertion `!"reached"'
:info:build make[2]: *** [doc/pspp-figures/descriptives.png] Abort trap

It looks like the failed assertion is here:

https://cgit.freedesktop.org/cairo/tree/src/cairo-pattern.c#n3392

Full log to follow. It's a 32-bit PPC system, which is often relevant.

Attachments (1)

pspp-devel-main.log (169.9 KB) - added by evanmiller (Evan Miller) 3 years ago.

Change History (20)

Changed 3 years ago by evanmiller (Evan Miller)

Attachment: pspp-devel-main.log added

comment:2 Changed 3 years ago by evanmiller (Evan Miller)

The plot thickens.... running it again, I get

:info:build LSAN_OPTIONS="suppressions=/opt/local/var/macports/build/_Users_emiller_macports.local_math_pspp-devel/pspp-devel/work/pspp-1.5.3-gee1bfc/tests/lsan.supp:print_suppressions=0:$LSAN_OPTIONS" utilities/pspp-output convert doc/pspp-figures/aggregate.spv doc/pspp-figures/aggregate.png -O trim=true -O left-margin=0in -O right-margin=0in -O top-margin=0in -O bottom-margin=0in -O paper-size=7.5x99in --table-look=./doc/tutorial.stt
:info:build pspp-output(674,0xa000ed88) malloc: *** error for object 0x49079f0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug
:info:build pspp-output(674,0xa000ed88) malloc: *** set a breakpoint in szone_error to debug

So it looks like there's memory corruption happening somewhere. The initial assert failure is likely just a symptom of that.

comment:3 Changed 3 years ago by kencu (Ken)

No, I don't think this is memory corruption in the app, exactly.

This is the error we see caused by the new libgcc_s.1.dylib in gcc7 conflicting with the old libgcc_s.1.dylib in /usr/lib.

We saw the same error in several (but not all) software linked against libgcc7. The exact nature of what causes it is not 100% clear to me at least, but it did not happen with gcc 7.4.0 and it does happen since then.

There are three fixes I know of: build with -static-libgcc (I believe that will work, but I haven't actually done it to prove it), set DYLD_LIBRARY_PATH to /opt/local/lib/libgcc, or use gcc 7.4.0.

MacPorts decided to do the DYLD setting fix. There is an option in the legacysupport PortGroup to wrap the binaries that error, and set this automatically with a wrapper.

Inspect the legacysupport 1.1 PortGroup, and look in the cmake Portfile for a good example.
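For illustration only, a wrapper of the kind described above might look like the sketch below. It is not the legacysupport PortGroup's actual implementation; the function name make_libgcc_wrapper and its arguments are hypothetical. It writes a small script that prepends /opt/local/lib/libgcc to DYLD_LIBRARY_PATH and then execs the real binary.

```shell
# Sketch (hypothetical helper, not the PortGroup's code):
# make_libgcc_wrapper REAL_BINARY WRAPPER_PATH [LIBGCC_DIR]
make_libgcc_wrapper() {
    real_binary=$1
    wrapper_path=$2
    libgcc_dir=${3:-/opt/local/lib/libgcc}

    # The generated wrapper prepends libgcc_dir and keeps any value
    # of DYLD_LIBRARY_PATH that is already set in the environment.
    cat > "$wrapper_path" <<EOF
#!/bin/sh
DYLD_LIBRARY_PATH="$libgcc_dir\${DYLD_LIBRARY_PATH:+:\$DYLD_LIBRARY_PATH}"
export DYLD_LIBRARY_PATH
exec "$real_binary" "\$@"
EOF
    chmod +x "$wrapper_path"
}
```

The real binary would typically be moved aside first and the wrapper installed under the original name, so callers need no changes. Paths containing quotes or dollar signs are not handled by this sketch.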

comment:4 Changed 3 years ago by kencu (Ken)

The darwin gcc maintainer and I were also talking about the idea of replacing some of the older libraries in /usr/lib with newer ones that would not show this incompatibility, in particular the one mentioned. I have not yet tried that to see just what would happen, but it is an available option we might consider, depending on how much trouble this causes...

comment:5 Changed 3 years ago by evanmiller (Evan Miller)

Running the failing command manually, sometimes it succeeds, sometimes I get

GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.

Other times I get the incorrect checksum message.
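Because the outcome varies from run to run, it can help to repeat the command and count how often each exit status occurs. The following is a minimal sketch; tally_runs is a hypothetical helper name, and it only records exit statuses, not the messages themselves.

```shell
# Sketch: run a command COUNT times and tally the exit statuses.
# tally_runs COUNT COMMAND [ARGS...]
tally_runs() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        status=0
        # Discard output; only the exit status is recorded.
        "$@" >/dev/null 2>&1 || status=$?
        echo "exit=$status"
        i=$((i + 1))
    done | sort | uniq -c
}

# Example (from the build directory):
#   tally_runs 20 ./utilities/pspp-output convert \
#       doc/pspp-figures/aggregate.spv doc/pspp-figures/aggregate.png
```

In sh and bash, a process killed by a signal is reported as 128 plus the signal number, so an abort (SIGABRT) shows up as exit=134.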

To test your GCC theory, I used install_name_tool to remove the /usr/lib/libgcc_s.1.dylib linkage, and got the same errors:

$ sudo -u macports install_name_tool -change /usr/lib/libgcc_s.1.dylib /opt/local/lib/libgcc/libgcc_s.1.dylib utilities/.libs/pspp-output
$ sudo -u macports ./utilities/pspp-output  convert doc/pspp-figures/aggregate.spv doc/pspp-figures/aggregate.png -O trim=true -O left-margin=0in -O right-margin=0in -O top-margin=0in -O bottom-margin=0in -O paper-size=7.5x99in --table-look=./doc/tutorial.stt
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.
fish: Job 1, 'sudo -u macports ./utilities/...' terminated by signal SIGABRT (Abort)
$ sudo -u macports ./utilities/pspp-output  convert doc/pspp-figures/aggregate.spv doc/pspp-figures/aggregate.png -O trim=true -O left-margin=0in -O right-margin=0in -O top-margin=0in -O bottom-margin=0in -O paper-size=7.5x99in --table-look=./doc/tutorial.stt
pspp-output(314,0xa000ed88) malloc: *** error for object 0x491fcd0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug
pspp-output(314,0xa000ed88) malloc: *** set a breakpoint in szone_error to debug
$ otool -l utilities/.libs/pspp-output | grep 'name /'
         name /usr/lib/dyld (offset 12)
         name /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (offset 24)
         name /opt/local/lib/pspp/libpspp-1.5.3-gee1bfc.dylib (offset 24)
         name /opt/local/lib/libgsl.25.dylib (offset 24)
         name /opt/local/lib/pspp/libpspp-core-1.5.3-gee1bfc.dylib (offset 24)
         name /opt/local/lib/libxml2.2.dylib (offset 24)
         name /opt/local/lib/libpangocairo-1.0.0.dylib (offset 24)
         name /opt/local/lib/libpango-1.0.0.dylib (offset 24)
         name /opt/local/lib/libgobject-2.0.0.dylib (offset 24)
         name /opt/local/lib/libglib-2.0.0.dylib (offset 24)
         name /opt/local/lib/libharfbuzz.0.dylib (offset 24)
         name /opt/local/lib/libcairo.2.dylib (offset 24)
         name /opt/local/lib/libiconv.2.dylib (offset 24)
         name /opt/local/lib/libreadline.8.dylib (offset 24)
         name /opt/local/lib/libgslcblas.0.dylib (offset 24)
         name /opt/local/lib/libz.1.dylib (offset 24)
         name /opt/local/lib/libintl.8.dylib (offset 24)
         name /opt/local/lib/libgcc/libgcc_s.1.dylib (offset 24)
         name /opt/local/lib/libgcc/libgcc_s.1.dylib (offset 24)
         name /usr/lib/libSystem.B.dylib (offset 24)

comment:6 Changed 3 years ago by kencu (Ken)

You can't change the /usr/lib/libgcc_s.1.dylib to /opt/local/lib/libgcc/libgcc_s.1.dylib and have it work correctly.

There is funky stuff that goes on. As I recall, some of it is stub library stuff. Some of it involves objects being passed between some of those other linked-in dylibs and the main executable, and those other linked-in dylibs might themselves be linked against /usr/lib/libgcc_s.1.dylib.

So that is not actually as good a test as you would have hoped it might be.

Instead, do this (on the original unmodified executable):

DYLD_LIBRARY_PATH=/opt/local/lib/libgcc  ./utilities/pspp-output

and if it is like the other 10,000 examples of this, it will work properly.
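To see which of the other linked-in dylibs still reference the system libgcc, as described above, one could filter the output of otool -L. This is a rough sketch; system_libgcc_refs is a hypothetical helper that reads otool -L output on stdin, so it does not itself prove or disprove the theory.

```shell
# Sketch: read `otool -L` output on stdin and print any libgcc_s
# entries that point at the system copy in /usr/lib.
system_libgcc_refs() {
    # otool -L indents each dependency and appends version info in
    # parentheses; keep only the path.
    sed -n 's|^[[:space:]]*\(/usr/lib/libgcc_s\.[^ ]*\.dylib\) .*|\1|p'
}

# Example, using two of the libraries pspp-output links against:
#   for lib in /opt/local/lib/libcairo.2.dylib /opt/local/lib/libglib-2.0.0.dylib; do
#       refs=$(otool -L "$lib" | system_libgcc_refs)
#       if [ -n "$refs" ]; then echo "$lib -> $refs"; fi
#   done
```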

comment:7 Changed 3 years ago by kencu (Ken)

BTW I held my open repos back to gcc 7.4.0 for two years after MacPorts upgraded to 7.5.0 so I would not be bothered by this "bug".

It's not actually a bug, it's a true ABI incompatibility and it was never meant to work to pass objects back and forth between libgcc-4.2 and libgcc-7.5 -- gcc makes no such promises.

comment:8 Changed 3 years ago by evanmiller (Evan Miller)

Same errors using DYLD_LIBRARY_PATH=/opt/local/lib/libgcc after rebuilding the executable.

The symptoms look to me like a classic case of memory corruption due to programmer error.

comment:9 Changed 3 years ago by kencu (Ken)

ok, this one may be different then.

here's an example of the libgcc7 error

https://trac.macports.org/ticket/59832

comment:10 Changed 3 years ago by evanmiller (Evan Miller)

Just for the record, the linked GCC issue presents as "Non-aligned pointer being freed". Here we are seeing "incorrect checksum for freed object". Without knowing more, perhaps that distinction will help to debug malloc errors in other projects.
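As a rough aid for sorting logs by the messages quoted in this ticket, a sketch like the following could be used. The function name classify_failure and its labels are hypothetical, and whether the two malloc messages reliably separate the libgcc issue from other causes is untested.

```shell
# Sketch: label a log read from stdin by the failure messages
# quoted in this ticket. The first matching pattern wins.
classify_failure() {
    log=$(cat)
    case $log in
        *"Non-aligned pointer being freed"*)     echo "non-aligned-free" ;;
        *"incorrect checksum for freed object"*) echo "freed-object-checksum" ;;
        *"pthread_setspecific"*)                 echo "glib-pthread-setspecific" ;;
        *"failed assertion"*)                    echo "assertion" ;;
        *)                                       echo "unknown" ;;
    esac
}

# Example:
#   classify_failure < pspp-devel-main.log
```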

comment:11 Changed 3 years ago by kencu (Ken)

perhaps that might hold up as a solid differentiator.

having this constant underlying toolchain issue always in the wings, and knowing that the manifestations of it are unpredictable, certainly reduces confidence in the process.

whenever there is a disconnect between a created and freed object, the toolchain is the first thing to consider, unfortunately.

comment:12 Changed 3 years ago by evanmiller (Evan Miller)

Running in a debugger, I intermittently get:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000005
0x0028e5b4 in output_driver_destroy ()
(gdb) bt
#0  0x0028e5b4 in output_driver_destroy ()
#1  0x0028e678 in output_engine_pop ()
#2  0x00007fd8 in run_convert ()
#3  0x0000af7c in main ()

I still get the "incorrect checksum for freed object" error occasionally but haven't gotten a backtrace on it yet.

Given the intermittency I suspect some kind of uninitialized memory somewhere.
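One way to make this kind of failure less intermittent is to run the command with the system malloc's debugging variables enabled. The sketch below assumes the malloc on the machine in question honors these variables (the malloc man page there is the authority); run_with_malloc_debug is a hypothetical helper name.

```shell
# Sketch: run a command with malloc debugging variables set.
run_with_malloc_debug() {
    # MallocPreScribble fills newly allocated memory with 0xAA,
    # MallocScribble fills freed memory with 0x55, and
    # MallocGuardEdges puts guard pages around large allocations.
    env MallocPreScribble=1 MallocScribble=1 MallocGuardEdges=1 "$@"
}

# Example (from the build directory):
#   run_with_malloc_debug ./utilities/pspp-output convert \
#       doc/pspp-figures/aggregate.spv doc/pspp-figures/aggregate.png
```

If the crash becomes deterministic with scribbling enabled, that would be consistent with a read of uninitialized or already-freed memory, though it would not identify the culprit by itself.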

comment:13 Changed 3 years ago by kencu (Ken)

Methinks there is something wrong in glib2, on tiger at least. This should not be happening:

GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.

and seems possibly to be the root of all further evil.

comment:14 Changed 3 years ago by evanmiller (Evan Miller)

My hypothesis with that error was that the argument was being corrupted in memory. But it will be hard to know without either deep debugging or a sanitizer.

comment:15 Changed 3 years ago by evanmiller (Evan Miller)

Well, -fsanitize=address isn't supported on this machine, but it would be a useful flag to add on a platform where ASan is supported, to see if that turns up anything.
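A quick probe can tell whether a given compiler is able to build and link with ASan before the flag is added to a real build. This is a minimal sketch; supports_asan is a hypothetical helper, and the configure line in the comment assumes the PSPP build accepts the usual autoconf CFLAGS and LDFLAGS variables.

```shell
# Sketch: return 0 if the given compiler can build and link a
# trivial program with -fsanitize=address, non-zero otherwise.
supports_asan() {
    cc_bin=${1:-cc}
    tmpdir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$tmpdir/probe.c"
    status=0
    "$cc_bin" -fsanitize=address -o "$tmpdir/probe" "$tmpdir/probe.c" \
        >/dev/null 2>&1 || status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Example:
#   if supports_asan gcc; then
#       ./configure CFLAGS="-g -O1 -fsanitize=address -fno-omit-frame-pointer" \
#                   LDFLAGS="-fsanitize=address"
#   fi
```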

comment:16 Changed 3 years ago by evanmiller (Evan Miller)

Seeing a similar-looking issue with pspp-1.5.3-g39e99a:

:info:build LSAN_OPTIONS="suppressions=/opt/local/var/macports/build/_Users_emiller_macports.local_math_pspp-devel/pspp-devel/work/pspp-1.5.3-g39e99a/tests/lsan.supp:print_suppressions=0:$LSAN_OPTIONS" utilities/pspp-output convert doc/pspp-figures/crosstabs.spv doc/pspp-figures/crosstabs.png -O trim=true -O left-margin=0in -O right-margin=0in -O top-margin=0in -O bottom-margin=0in -O paper-size=7.5x99in --table-look=./doc/tutorial.stt
:info:build cairo-pattern.c:371: failed assertion `other->status == CAIRO_STATUS_SUCCESS'
:info:build make[2]: *** [doc/pspp-figures/crosstabs.png] Abort trap

comment:17 Changed 2 years ago by evanmiller (Evan Miller)

The latest beta (g82a757) seems to have resolved this issue.

comment:18 Changed 2 years ago by mascguy (Christopher Nielsen)

Cc: mascguy added

comment:19 Changed 21 months ago by nerdling (Jeremy Lavergne)

Resolution: fixed
Status: assigned → closed

According to the referenced commit from evanmiller, this should have been fixed and shipped in v1.5.5. That'd map to pspp-devel @1.6.0_0 for macports.

Given that, this looks to have been fixed upstream already.

Please re-open if that's not the case as I have no hardware to verify this.
