Opened 2 years ago

Closed 20 months ago

Last modified 20 months ago

#64252 closed defect (fixed)

darktable @4.x+quartz: Crash when opening "darkroom" view of Sony NEX-6 ARW file

Reported by: thomasrussellmurphy (Thomas Russell Murphy)
Owned by: mascguy (Christopher Nielsen)
Priority: Normal
Milestone:
Component: ports
Version: 2.7.0
Keywords: pending
Cc: MarcusCalhoun-Lopez (Marcus Calhoun-Lopez), parafin, ryandesign (Ryan Carsten Schmidt)
Port: darktable GraphicsMagick

Description

Initially reported to upstream (https://github.com/darktable-org/darktable/issues/10653) and told that this crash does not replicate on an official build for macOS.

Describe the bug/issue: Importing a Sony NEX-6 ARW (RAW) file works initially in the lighttable view, but attempting to open the image in the darkroom view results in a crash while the file is loading.

To Reproduce _Please provide detailed steps to reproduce the behaviour, for example:_

  1. Acquire a sample NEX-6 ARW [Real world shot ISO 100 (Zipped file - 16.4MB)](http://download.dpreview.com/sony_nex6/DSC00421.ARW.zip) from dpreview [Sony NEX-6 Review page 15](https://www.dpreview.com/reviews/sony-alpha-nex-6/15)
  2. Unzip file and place in a convenient directory
  3. Select import > add to library > (navigate to directory) DSC00421.ARW
  4. RAW file loads in lighttable view
  5. Select the darkroom view
  6. Image starts to load and render the RAW file
  7. Crash to desktop
  8. Restart results in immediate crash

Expected behavior: The darkroom view renders the ARW with default RAW development settings. No crash, and I am able to start adjusting RAW settings.

Platform _Please fill as much information as possible in the list given below. Please state "unknown" where you do not know the answer and remove any sections that are not applicable _

Additional context _Please provide any additional information you think may be useful, for example:_

  • Can you reproduce with another darktable version(s)? _Not attempted_
  • Can you reproduce with a RAW or Jpeg or both? RAW-file-format
  • Are the steps above reproducible with a fresh edit (i.e. after discarding history)? N/a
  • If the issue is with the output image, attach an XMP file (you'll have to change the extension to .txt)
  • Is the issue still present using an empty/new config-dir (e.g. start darktable with --configdir "/tmp")? yes (very repeatable, have to clear config to fix crash-on-reopen)
  • Do you use lua scripts? no
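
The fresh-config check from the list above can be scripted. This is a minimal sketch, assuming a POSIX shell with `mktemp`; the helper name `fresh_configdir` is made up for illustration, and only the `--configdir` option comes from the ticket text.

```shell
# Hypothetical helper (not part of darktable or MacPorts):
# create an empty, throwaway config directory and print its path.
fresh_configdir() {
  mktemp -d "${TMPDIR:-/tmp}/darktable-config.XXXXXX"
}

# Usage (commented out so that sourcing this snippet launches nothing):
#   darktable --configdir "$(fresh_configdir)"
```

Using a new directory for each run avoids the crash-on-reopen state described above without touching the regular configuration.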

Top of stack trace from crash reporter

Process:               darktable [45286]
Path:                  /opt/local/bin/darktable
Identifier:            org.darktable.darktable
Version:               3.6.1 (3.6.1)
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           darktable [45286]
User ID:               502

Date/Time:             2021-12-18 19:23:14.687 -0600
OS Version:            Mac OS X 10.14.6 (18G9323)
Report Version:        12
Bridge OS Version:     5.5 (18P4759a)
Anonymous UUID:        A72D88A0-DE09-2340-0C1A-CCC3D5030DA7


Time Awake Since Boot: 4700000 seconds

System Integrity Protection: enabled

Crashed Thread:        6  worker res 1

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000015dd5d000
Exception Note:        EXC_CORPSE_NOTIFY

VM Regions Near 0x15dd5d000:
    MALLOC_LARGE           000000015b858000-000000015dd5d000 [ 37.0M] rw-/rwx SM=PRV  
--> 
    STACK GUARD            000070000df9f000-000070000dfa0000 [    4K] ---/rwx SM=NUL  stack guard for thread 15

Application Specific Information:
abort() called

Change History (43)

comment:1 Changed 2 years ago by mascguy (Christopher Nielsen)

Cc: mascguy removed
Owner: set to mascguy
Status: new → assigned

comment:2 Changed 2 years ago by mascguy (Christopher Nielsen)

Tested with the X11 variant, and the crash doesn't occur in that case. So that's good news.

Next up is installing the Quartz variant - which is the setup covered by this ticket - to see what's happening.

comment:3 Changed 2 years ago by Christopher Nielsen <mascguy@…>

In 5c0ef123a355a4ddacab7c97de88305f68bb1857/macports-ports (master):

darktable: update to 3.8.1
See: #64252

comment:4 Changed 2 years ago by thomasrussellmurphy (Thomas Russell Murphy)

Now darktable seems to consistently crash, with a fresh config (clearing ~/.config/darktable since I don't have it set up well) each time: a) on quit and b) upon importing the ARW file to the lighttable (can't get to the darkroom view, even).

Process:               darktable [6712]
Path:                  /opt/local/bin/darktable
Identifier:            org.darktable.darktable
Version:               3.8.1 (3.8.1)
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           darktable [6712]
User ID:               502

Date/Time:             2022-02-27 10:56:56.576 -0600
OS Version:            Mac OS X 10.14.6 (18G9323)
Report Version:        12
Bridge OS Version:     5.5 (18P4759a)
Anonymous UUID:        A72D88A0-DE09-2340-0C1A-CCC3D5030DA7


Time Awake Since Boot: 2900000 seconds

System Integrity Protection: enabled

Crashed Thread:        10  worker res 1

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000144c56000
Exception Note:        EXC_CORPSE_NOTIFY

VM Regions Near 0x144c56000:
    MALLOC_LARGE           0000000140150000-0000000144c56000 [ 75.0M] rw-/rwx SM=PRV  
--> 
    STACK GUARD            000070000a788000-000070000a789000 [    4K] ---/rwx SM=NUL  stack guard for thread 1

Application Specific Information:
abort() called

comment:5 Changed 22 months ago by Christopher Nielsen <mascguy@…>

In 439af838c5d96bc23bec55d8e9b594154eb2097d/macports-ports (master):

darktable: update to 4.0.0; add libheif

  • CC: @MarcusCalhoun-Lopez

See: #64252

comment:6 Changed 21 months ago by mascguy (Christopher Nielsen)

Keywords: pending added

comment:7 Changed 21 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: assigned → closed

Testing locally with version 4.0.0 +quartz, I no longer see a crash.

Let me know if you're still having issues. If so, we can reopen.

comment:8 Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

With darktable @4.0.0_2+quartz I get a crash at step 4 of my reproduction process, still. Fresh config directory.

However, it does appear to quit cleanly if I haven't imported anything, now.

comment:9 in reply to:  8 Changed 21 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: closed → reopened

comment:10 in reply to:  8 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to thomasrussellmurphy:

With darktable @4.0.0_2+quartz I get a crash at step 4 of my reproduction process, still. Fresh config directory.

However, it does appear to quit cleanly if I haven't imported anything, now.

I tested on multiple macOS releases, from 10.12 through 10.15. And while most of those work fine, the crash did indeed occur with 10.14. (Which matches with what you're testing on, so that's good.)

Need to do more digging.

comment:11 Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

Thanks for continuing to investigate! Please let me know if you need additional input.

comment:12 Changed 21 months ago by mascguy (Christopher Nielsen)

Cc: parafin added
Summary: darktable @3.6.1_0+quartz: Crash when opening "darkroom" view of Sony NEX-6 ARW file → darktable @4.x+quartz: Crash when opening "darkroom" view of Sony NEX-6 ARW file

Interestingly enough, this crash doesn't occur when darktable is built with +debug.

Also, it appears that the issue is originating from within GraphicsMagick:

Magick: abort due to signal 11 (SIGSEGV) "Segmentation Fault"...
Abort trap: 6

Confirmed by thread stack trace:

Thread 8 Crashed:: worker res 1
0   libsystem_kernel.dylib        	0x00007fff6069f2c2 __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff6075abf1 pthread_kill + 284
2   libsystem_c.dylib             	0x00007fff606096a6 abort + 127
3   libGraphicsMagick.3.dylib     	0x000000011021c4bb MagickPanicSignalHandler + 52
4   libsystem_platform.dylib      	0x00007fff6074fb5d _sigtramp + 29
5   ???                           	0x00007f9a9f941f00 0 + 140302078975744
6   libdarktable.dylib            	0x000000010cf09bec dt_interpolation_resample + 252
7   libdarktable.dylib            	0x000000010cf0a5c8 dt_interpolation_resample_roi + 88
8   libdarktable.dylib            	0x000000010cfa4560 dt_iop_clip_and_zoom_roi + 80
9   libdemosaic.so                	0x000000011a07591a process + 21322
10  libdarktable.dylib            	0x000000010cfebfb5 pixelpipe_process_on_CPU + 405
11  libdarktable.dylib            	0x000000010cfe9150 dt_dev_pixelpipe_process_rec + 4848

[...more dt_dev_pixelpipe_process_rec...]

89  libdarktable.dylib            	0x000000010cfe6753 dt_dev_pixelpipe_process + 1075
90  libdarktable.dylib            	0x000000010cf90081 dt_dev_process_preview_job + 481
91  libdarktable.dylib            	0x000000010cf65ff1 dt_dev_process_preview_job_run + 17
92  libdarktable.dylib            	0x000000010cf5ef4d dt_control_work_res + 525
93  libsystem_pthread.dylib       	0x00007fff607582eb _pthread_body + 126
94  libsystem_pthread.dylib       	0x00007fff6075b249 _pthread_start + 66
95  libsystem_pthread.dylib       	0x00007fff6075740d thread_start + 13

Getting warmer!

comment:13 Changed 21 months ago by mascguy (Christopher Nielsen)

Just for kicks, I also tried installing GraphicsMagick with +q32, for another point of reference. But that didn't make a difference.


comment:14 Changed 21 months ago by mascguy (Christopher Nielsen)

Cc: ryandesign added
Port: GraphicsMagick added

Presently, port GraphicsMagick doesn't have a +debug variant, which might help diagnosing this.

I'll see if I can quickly figure out what's necessary, and create a PR for @ryandesign.

comment:15 Changed 21 months ago by parafin

GraphicsMagick is a red herring in the backtrace: it installs its own SIGSEGV handler, so unless the application does the same, it shows up in any backtrace of an application linking to it. Which, I would say, borders on malicious behaviour.

For the source of the crash see frames 5-6, which are darktable's own code. This is probably a compiler bug. darktable ended official support for building on macOS 10.14 as of the 4.0 release (the Xcode version that can be installed there is too old for the features we want, especially in rawspeed). It is possible, though, to build on macOS 10.15 with a newer Xcode but with 10.14 as the deployment target (older ones are again not officially supported). This is how the official macOS DMG package is created.

As a debug suggestion I can propose disabling OpenMP to see if it avoids the crash. It will of course degrade performance significantly.
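
The reading of the backtrace described above (skip the signal-handling frames, look at what is underneath) can be sketched as a small filter. This assumes the macOS crash-reporter frame layout shown in comment 12 (index, image, address, symbol); the function name `first_app_frame` and the skip rules are illustrative choices, not part of any tool.

```shell
# Hypothetical helper: read crash-reporter frames on stdin and print the
# first one that is neither system-library code, an unresolved "???" frame,
# nor GraphicsMagick's panic signal handler.
first_app_frame() {
  awk '
    $1 ~ /^[0-9]+$/ &&                    # only numbered frame lines
    $2 !~ /^libsystem_/ &&                # skip abort/pthread_kill/_sigtramp
    $2 != "???" &&                        # skip the unresolved frame
    $4 != "MagickPanicSignalHandler" {    # skip the installed handler
      print
      exit
    }
  '
}

# Usage (commented out): first_app_frame < crash_thread.txt
```

Because every `libsystem_*` frame is skipped, the filter reports the nearest application frame rather than the faulting instruction itself, so a crash inside a system routine is attributed to its caller.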

comment:16 in reply to:  15 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to parafin:

For the source of the crash see frames 5-6, which are darktable's own code. This is probably a compiler bug.

Confirmed, and the crash doesn't occur if we build with a newer MacPorts Clang.

I'll commit a fix shortly. Thanks for the quick response!

comment:17 Changed 21 months ago by mascguy (Christopher Nielsen)

Thomas, I may not be able to push a formal fix until later today, or perhaps even tomorrow.

So in the interim, you can fix your installation, by doing the following:

$ sudo port -f uninstall darktable
$ sudo port -N install darktable +quartz configure.compiler=macports-clang-13

comment:18 Changed 21 months ago by Christopher Nielsen <mascguy@…>

Resolution: fixed
Status: reopened → closed

In 9c746d99bc7766acf129ee2dd264119a52595bc9/macports-ports (master):

darktable/darktable-devel: fix 10.14 crash; enable AVIF
Fixes: #64252
See: #65474

comment:19 in reply to:  17 Changed 21 months ago by mascguy (Christopher Nielsen)

Thomas, now that a formal fix has been pushed, you needn't do anything special. Just wait at least two hours, resync your ports, and then upgrade.

Once that's been done, let me know if all is well!

comment:20 Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

Cleanly updating to darktable @4.0.0_3+quartz (active) still results in the import -> crash behavior with the specified file and an empty configuration before start. I also hit a build failure (since cleaned up) after following the recommended forced uninstall and compiler override.

comment:21 in reply to:  20 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to thomasrussellmurphy:

Cleanly updating to darktable @4.0.0_3+quartz (active) still results in the import -> crash behavior with the specified file and an empty configuration before start. I also hit a build failure (since cleaned up) after following the recommended forced uninstall and compiler override.

The previous compilation error was expected, as there was another change needed. But the crash is troubling, as it's no longer occurring for my 10.14 installations.

Is the stack trace still similar, per our earlier comments from today?

comment:22 Changed 21 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: closed → reopened

comment:23 Changed 21 months ago by mascguy (Christopher Nielsen)

Also, can you install with both +quartz and +debug, and test with that?

comment:24 in reply to:  12 ; Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

Reinstalling with +debug added results in. . . no crash. So potential workaround in that. Switching back to the default, I get the same pattern of offsets as comment 12, now as Thread 9 Crashed:: worker res 1.

comment:25 in reply to:  24 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to thomasrussellmurphy:

Reinstalling with +debug added results in. . . no crash. So potential workaround in that. Switching back to the default, I get the same pattern of offsets as comment 12, now as Thread 9 Crashed:: worker res 1.

That's great news!

Can you also provide the output from port info --depends_build darktable? I want to verify which MacPorts clang is being used. (It should default to clang-13, but just want to be sure before going any further.)
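
The check requested here can be automated with a small filter over the `port info` output. A minimal sketch, assuming the build dependencies are printed as a comma-separated `depends_build:` line; the function name `uses_macports_clang` is hypothetical.

```shell
# Hypothetical helper: succeed if the text on stdin lists a MacPorts
# clang-<version> port among its dependencies.
uses_macports_clang() {
  grep -Eq '(^|[ ,])clang-[0-9]+'
}

# Usage (commented out so this snippet does not require MacPorts):
#   port info --depends_build darktable | uses_macports_clang \
#     && echo "builds with MacPorts clang" \
#     || echo "builds with the Xcode/system compiler"
```

The pattern only looks for a `clang-<number>` entry, so it says nothing about which version is selected beyond what the dependency list already shows.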

comment:26 Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

Indeed not. I get depends_build: cmake, cctools, gettext, intltool, pkgconfig, po4a, perl5.34 with the straight install.

comment:27 in reply to:  26 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to thomasrussellmurphy:

Indeed not. I get depends_build: cmake, cctools, gettext, intltool, pkgconfig, po4a, perl5.34 with the straight install.

Ah, you must have Xcode 11.x installed, which is a potential scenario that I missed.

Can you provide the output from xcodebuild -version?

comment:28 Changed 21 months ago by Christopher Nielsen <mascguy@…>

In 71126ccc8d425643ceb20cc30fc97ccabddc811f/macports-ports (master):

darktable/darktable-devel: avoid xcode clang 11; causes runtime crash
See: #64252

comment:29 Changed 21 months ago by mascguy (Christopher Nielsen)

Thomas, once you sync your ports, can you re-check the output from port info --depends_build darktable?

comment:30 Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

Xcode 11.3.1
Build version 11C504

Heading off to sync and re-install. At sync, I now see depends_build: cmake, cctools, gettext, intltool, pkgconfig, po4a, perl5.34, clang-13.

Thanks for all the support!

comment:31 in reply to:  30 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to thomasrussellmurphy:

Heading off to sync and re-install. At sync, I now see depends_build: cmake, cctools, gettext, intltool, pkgconfig, po4a, perl5.34, clang-13.

Thanks for all the support!

My pleasure, glad we could help!

And let me know when you've verified that the issue is fixed. I'll keep the ticket open until then, just-in-case.

comment:32 Changed 21 months ago by thomasrussellmurphy (Thomas Russell Murphy)

I got one crash-on-quit with the new install, but haven't been able to replicate with fresh config. In any case, the basic image loading now does work with this sample file, as does moving between modes once the file is loaded.

comment:33 in reply to:  32 Changed 21 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: reopened → closed

Replying to thomasrussellmurphy:

I got one crash-on-quit with the new install, but haven't been able to replicate with fresh config. In any case, the basic image loading now does work with this sample file, as does moving between modes once the file is loaded.

That's awesome news, thanks for confirming!

comment:34 Changed 21 months ago by jmroot (Joshua Root)

I'd be very wary of assuming this to be a compiler bug. Yes, those exist sometimes, but when code mysteriously behaves differently, it is much more likely to be the optimiser taking different shortcuts in response to undefined behaviour in the code. I guess +debug probably compiles with -O0, hiding the issue, so to diagnose this it might be necessary to add -g to the normal flags in order to get line number information in the backtrace. (Note that the build directory needs to be present at runtime for this to work, so use port's -k flag to keep it.)
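
A rebuild along these lines might look like the sketch below. It is untested and rests on two assumptions: that `configure.optflags` can be overridden on the `port` command line in the same way as `configure.compiler` in comment 17, and that `-Os -g` is a fair stand-in for "the normal flags plus -g". Neither has been checked against the darktable Portfile. The function name and the `DRY_RUN` switch are made up for illustration.

```shell
# Hypothetical helper: rebuild darktable with debug info while keeping
# optimisation on, and keep the build directory (-k) for line numbers.
# Set DRY_RUN=1 to print the commands instead of running them.
rebuild_with_debug_info() {
  run=${DRY_RUN:+echo}
  $run sudo port -f uninstall darktable
  # ASSUMPTION: configure.optflags is accepted as a command-line override.
  $run sudo port -k -N install darktable +quartz configure.optflags="-Os -g"
}

# Usage (commented out): DRY_RUN=1 rebuild_with_debug_info
```

To reproduce the original crash, the build would also have to use the Xcode compiler that triggered it, not a newer MacPorts clang.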

comment:35 in reply to:  34 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to jmroot:

I'd be very wary of assuming this to be a compiler bug. Yes, those exist sometimes, but when code mysteriously behaves differently, it is much more likely to be the optimiser taking different shortcuts in response to undefined behaviour in the code. I guess +debug probably compiles with -O0, hiding the issue, so to diagnose this it might be necessary to add -g to the normal flags in order to get line number information in the backtrace. (Note that the build directory needs to be present at runtime for this to work, so use port's -k flag to keep it.)

For sure, an optimization-related bug seems like a good possibility. Eventually I'll revisit this in more detail, though my first priority was to ensure it's no longer a blocker for Thomas and other users.

First I'd like to disable the GraphicsMagick lib signal handler though, per your comments in issue:65630.

comment:36 Changed 21 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: closed → reopened

Upstream PR opened for darktable, to disable signal handlers for GraphicsMagick. Thanks @parafin!

12324 - graphicsmagick: use new API to not install signal handlers

comment:37 in reply to:  36 Changed 21 months ago by mascguy (Christopher Nielsen)

Replying to mascguy:

Upstream PR opened for darktable, to disable signal handlers for GraphicsMagick. Thanks @parafin!

Testing locally with this patch - combined with building via Xcode Clang - results in the following stacktrace:

Thread 7 Crashed:: worker res 1
0   libsystem_platform.dylib      	0x00007fff7c71f6de _platform_memmove$VARIANT$Nehalem + 254
1   libdarktable.dylib            	0x000000010ce921fc dt_interpolation_resample + 252
2   libdarktable.dylib            	0x000000010ce92bd8 dt_interpolation_resample_roi + 88
3   libdarktable.dylib            	0x000000010cf2cb70 dt_iop_clip_and_zoom_roi + 80
4   libdemosaic.so                	0x000000011a90f91a process + 21322
5   libdarktable.dylib            	0x000000010cf745c5 pixelpipe_process_on_CPU + 405
6   libdarktable.dylib            	0x000000010cf71760 dt_dev_pixelpipe_process_rec + 4848

Definitely an improvement detail-wise, with the GraphicsMagick signal handlers disabled!

Thoughts relative to the crash within _platform_memmove?

comment:38 Changed 21 months ago by Christopher Nielsen <mascguy@…>

In f90992e6447fc4682f9835cbd3c9351358192745/macports-ports (master):

darktable/darktable-devel: disable GraphicsMagick signal handlers
See: #64252

comment:39 Changed 21 months ago by jmroot (Joshua Root)

There's clearly inlining happening, as dt_interpolation_resample just calls either dt_interpolation_resample_sse or dt_interpolation_resample_plain. That line number information would be really helpful.

comment:40 Changed 20 months ago by mascguy (Christopher Nielsen)

Resolution: fixed
Status: reopened → closed

While it would be awesome if we could determine the issue with Xcode Clang, there simply isn't enough time to go down this rabbit hole. Closing as fixed.

comment:41 Changed 20 months ago by jmroot (Joshua Root)

Undefined behaviour is (by definition, for better or worse) not an issue with the compiler, it's an issue with the code. Just because the code doesn't crash when built with a different compiler doesn't mean it's doing what it's supposed to do. I wouldn't call building with -g a rabbit hole, it's pretty basic debugging. I realise upstream doesn't seem interested in investigating, and if you don't want to either then fair enough I guess, I just wanted to make it clear that this is not so much "fixed" as "it compiles, ship it."

comment:42 in reply to:  41 Changed 20 months ago by mascguy (Christopher Nielsen)

Replying to jmroot:

Undefined behaviour is (by definition, for better or worse) not an issue with the compiler, it's an issue with the code. Just because the code doesn't crash when built with a different compiler doesn't mean it's doing what it's supposed to do. I wouldn't call building with -g a rabbit hole, it's pretty basic debugging. I realise upstream doesn't seem interested in investigating, and if you don't want to either then fair enough I guess, I just wanted to make it clear that this is not so much "fixed" as "it compiles, ship it."

Yep, understood.

But to correct you on one point: It's not accurate to state that upstream isn't interested in investigating. Indeed, they've dealt with this in the past, and require a newer compiler because of it.

comment:43 in reply to:  41 Changed 20 months ago by parafin

Replying to jmroot:

Undefined behaviour is (by definition, for better or worse) not an issue with the compiler, it's an issue with the code. Just because the code doesn't crash when built with a different compiler doesn't mean it's doing what it's supposed to do. I wouldn't call building with -g a rabbit hole, it's pretty basic debugging. I realise upstream doesn't seem interested in investigating, and if you don't want to either then fair enough I guess, I just wanted to make it clear that this is not so much "fixed" as "it compiles, ship it."

Compiler bugs do exist and I’ve encountered several myself. Well, you just have to open gcc bugzilla to believe it;)

So I don’t see why exactly you state that darktable triggers an undefined behavior as if it were a fact. While there’s for sure enough bugs of various nature in darktable, I don’t actually believe this to be one of them. Specifically OpenMP support historically was very problematic in various compilers, with differences between implementations and bugs (e.g. compiler just crashing, or is undefined behaviour again to blame?).
