Opened 2 years ago

Closed 5 months ago

#64141 closed defect (fixed)

gdal @3.3.1_2+postgresql13+proj8: Crash with GeoTIFF that uses JPEG compression

Reported by: bal-agates
Owned by: Veence (Vincent)
Priority: Normal
Milestone:
Component: ports
Version: 2.7.1
Keywords:
Cc: cooljeanius (Eric Gallager)
Port: gdal

Description

The initial symptom was QGIS crashing on a specific GeoTIFF file. The problem appeared to be in gdal. See https://github.com/OSGeo/gdal/issues/4948 for details of the problem. In short, the MacPorts gdal port uses these configure settings:

--with-libtiff=internal
--with-jpeg=internal
--with-proj=${prefix}/lib/proj8

Port proj8 @8.2.0_0+tiff depends on port tiff. libproj.22.dylib references libtiff.5.dylib.

Port tiff @4.3.0_0 depends on port libjpeg-turbo. libtiff.5.dylib references libjpeg.8.dylib.

Port libjpeg-turbo @2.1.2_0 provides libjpeg.8.dylib

Apparently a reference to libjpeg.8.dylib anywhere in the libgdal dependency hierarchy can cause those functions to be used instead of the built-in ones generated with "--with-jpeg=internal". The built-in JPEG functions are version 62 and the libjpeg-turbo ones are version 80, and apparently the two are completely incompatible.
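As a rough sketch of how one might confirm such a chain, the helper below filters `otool -L` output for libjpeg references. The function name jpeg_refs and the /opt/local library path are assumptions for illustration, not something taken from the Portfile.

```shell
#!/bin/sh
# Print libjpeg dylib paths found in `otool -L` output read from stdin.
jpeg_refs() {
    # Dependency lines are indented: "<path> (compatibility version ...)".
    sed -n 's/^[[:space:]][[:space:]]*\([^[:space:]]*libjpeg[^/[:space:]]*\.dylib\).*/\1/p'
}

# Assumed path: a default /opt/local prefix and the libtiff name given above.
lib=/opt/local/lib/libtiff.5.dylib
if command -v otool >/dev/null 2>&1 && [ -f "$lib" ]; then
    otool -L "$lib" | jpeg_refs
fi
```

Running the same filter over each library that libgdal references would show where libjpeg.8.dylib enters the hierarchy.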

I am not certain of the best way to fix the problem. What I did on my machine was modify the Portfile to use

--with-libtiff=/opt/local
--with-jpeg=/opt/local

That appears safe for builds with variant proj8, though it would probably be better to use ${prefix}. Those settings could be changed just for that variant. One would need to check the other variants to see if the same solution applies. Another solution might be to explicitly make ports tiff and libjpeg-turbo dependencies of gdal and then change the above settings for all builds. Maybe there is another solution I haven't thought of?

Attachments (3)

libgdal_libs.txt (12.5 KB) - added by bal-agates 2 years ago.
libgdal library references (recursive) for 3.3.1_2 on macOS12 arm64
Portfile.patch (1.4 KB) - added by bal-agates 2 years ago.
gdal Portfile proposed revision 3 changes
Portfile.patch.2 (1.5 KB) - added by bal-agates 2 years ago.
gdal Portfile patch that changes --with-libtiff, --with-geotiff, --with-jpeg


Change History (11)

comment:1 in reply to:  description Changed 2 years ago by ryandesign (Ryan Carsten Schmidt)

Keywords: gdal GeoTIFF JPEG compression removed
Owner: set to Veence
Status: new → assigned

Replying to bal-agates:

Another solution might be to explicitly make ports tiff and libjpeg-turbo dependencies of gdal and then change the above settings for all builds.

While I don't know if there is some reason why the port currently uses bundled libtiff and jpeg, your suggestion seems ideal.

The Portfile switched from using MacPorts libtiff, geotiff and jpeg to using the internal ones in [81f2b8da05f2fc855bb44c69e9a8e483790f4a15/macports-ports] but the commit message does not say why.

Changed 2 years ago by bal-agates

Attachment: libgdal_libs.txt added

libgdal library references (recursive) for 3.3.1_2 on macOS12 arm64

comment:2 Changed 2 years ago by bal-agates

That commit changed (3) related settings

-                 --with-libtiff=${prefix} \
-                 --with-geotiff=${prefix} \
-                 --with-jpeg=${prefix}\
+                 --with-libtiff=internal \
+                 --with-geotiff=internal \
+                 --with-jpeg=internal\

I only changed (2) of them which fixed the problem I had encountered. I made my change based on analyzing the library references in libgdal (see attached libgdal_libs.txt). In libgdal I did not see any references to libgeotiff. Looking at it now it looks like ports PDAL and liblas are dependent on both libgeotiff and gdal so it probably makes sense to change all (3). I am not sure if the gdal built-in geotiff has incompatibilities with libgeotiff but the potential exists.

Note that high-level projects grass7 and qgis are at least indirectly dependent on both gdal and libgeotiff.

comment:3 Changed 2 years ago by bal-agates

I tried changing my Portfile to something more similar to that before commit 81f2b (i.e. using --with-geotiff=${prefix}) but started getting a runtime link problem

dyld[40747]: Symbol not found: _GTIFFree
  Referenced from: /opt/local/lib/libpdalcpp.13.0.0.dylib
  Expected in: /opt/local/lib/libgdal.29.dylib
zsh: abort      QGIS3

I am not certain if I made some error in rebuilding gdal dependents. I did rebuild port PDAL after gdal. I haven't studied the PDAL Portfile or its sources to understand if it is depending on gdal's --with-geotiff=internal.
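A sketch of one way to check whether a given libgdal build still exports the symbol named in the dyld message: the helper below reads `nm -g` output and succeeds only if the symbol is listed as defined. The function name defines_symbol is hypothetical; the library path is the one from the dyld output above.

```shell
#!/bin/sh
# Succeed if `nm -g` output on stdin lists the symbol as defined, i.e. with
# any type letter other than U (U marks an undefined reference).
defines_symbol() {
    awk -v sym="$1" '$NF == sym && $(NF-1) != "U" { found = 1 } END { exit !found }'
}

lib=/opt/local/lib/libgdal.29.dylib
if command -v nm >/dev/null 2>&1 && [ -f "$lib" ]; then
    if nm -g "$lib" | defines_symbol _GTIFFree; then
        echo "_GTIFFree is defined in $lib"
    else
        echo "_GTIFFree is not defined in $lib"
    fi
fi
```

If the symbol is missing from libgdal, anything built against a gdal that used --with-geotiff=internal would presumably need rebuilding.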

By changing only (2) of the (3) GDAL configure settings my QGIS and GDAL seem to be working, at least as far as I have tested. I will attach Portfile.patch representing what seems to be working for me.

Changed 2 years ago by bal-agates

Attachment: Portfile.patch added

gdal Portfile proposed revision 3 changes

comment:4 Changed 2 years ago by bal-agates

I was able to resolve my "symbol not found" dyld problem using a Portfile that changes all (3) [--with-libtiff, --with-geotiff, --with-jpeg]. After applying Portfile.patch.2 (attached) to my gdal Portfile I uninstalled all dependent ports and rebuilt them. On my system that was:

$ sudo port uninstall qgis3
$ sudo port uninstall grass7
$ sudo port uninstall PDAL
$ sudo port uninstall liblas
$ sudo port uninstall py39-gdal
$ sudo port uninstall gdal
$ sudo port -s -k install gdal
$ sudo port -s -k install py39-gdal
$ sudo port -s -k install liblas
$ sudo port -s -k install PDAL
$ sudo port -s -k install grass7
$ sudo port -s -k install qgis3

I used the "-s" switch to build from source. I think by default gdal was installing from a pre-built dist and I wanted to make sure the source cmake/configure was run on each. The "-k" switch really wasn't needed but I wanted to save the work data so if I ran into problems I could see what configure found.
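The sequence above follows a simple pattern: uninstall in reverse dependency order, then rebuild from source in dependency order. A minimal sketch that only prints the commands (it does not run them) is below; the function name rebuild_plan is made up for illustration, and the port list is the one from my system.

```shell
#!/bin/sh
# Print the uninstall/rebuild commands for ports given in dependency order
# (the port everything else depends on comes first).
rebuild_plan() {
    reversed=""
    for p in "$@"; do
        reversed="$p $reversed"
    done
    # Dependents must be uninstalled first, so walk the list backwards.
    for p in $reversed; do
        echo "sudo port uninstall $p"
    done
    # Rebuild from source (-s), keeping the work directory (-k).
    for p in "$@"; do
        echo "sudo port -s -k install $p"
    done
}

rebuild_plan gdal py39-gdal liblas PDAL grass7 qgis3
```

Printing rather than executing leaves room to review the list before anything is removed.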

My theory as to my last dyld problem was that I had rebuilt gdal and PDAL but that qgis3 was built before making changes to gdal. The prior configure for qgis3 found some functions in the original gdal that were no longer there.

With a clean slate MacPorts should have no problems. I am uncertain if there are edge cases if the gdal Portfile is updated when a user already has gdal dependents built? Does one need to bump the revision level on all gdal dependent ports to make sure they are rebuilt when the gdal portfile is updated or is there some other mechanism to make sure that happens?

Changed 2 years ago by bal-agates

Attachment: Portfile.patch.2 added

gdal Portfile patch that changes --with-libtiff, --with-geotiff, --with-jpeg

comment:5 Changed 2 years ago by cooljeanius (Eric Gallager)

Cc: cooljeanius added

comment:6 Changed 18 months ago by bal-agates

I recently upgraded all of my ports and this issue is still valid. A better issue title would be "gdal generates runtime error with GeoTIFF that uses JPEG compression". For example

$ gdal_translate AZ_Sahuarita_314990_1958_62500_geo.tif -of GTiff j.tif
ERROR 1: JPEGLib:Wrong JPEG library version: library is 62, caller expects 80
Assertion failed: (sp->cinfo.comm.is_decompressor), function JPEGSetupDecode, file tif_jpeg.c, line 962.
zsh: abort

In the above, JPEGLib version 62 comes from the gdal built-in and version 80 comes from libjpeg (port libjpeg-turbo). The gdal Portfile has "--with-jpeg=internal". Possibly cmake is finding and using the JPEGLib version from the libjpeg header file when building applications like gdal_translate? Or multiple objects with the same name get randomly selected during linking? Note that when gdal variant +proj8 is built, one dependency chain is gdal -> proj8 -> tiff -> libjpeg-turbo, so there are likely multiple JPEG library objects with the same name when building applications. QGIS3 includes other libraries that depend on libjpeg-turbo.
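To see which version a given header would compile in, one can look for the JPEG_LIB_VERSION define. The sketch below assumes the MacPorts headers live under /opt/local/include; the function name jpeg_lib_version is hypothetical, and the header bundled in the gdal source tree would have to be passed in by hand.

```shell
#!/bin/sh
# Print "<header>: <version>" for each given header that defines JPEG_LIB_VERSION.
jpeg_lib_version() {
    for header in "$@"; do
        [ -f "$header" ] || continue
        version=$(sed -n 's/^[[:space:]]*#[[:space:]]*define[[:space:]][[:space:]]*JPEG_LIB_VERSION[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p' "$header" | head -n 1)
        [ -n "$version" ] && printf '%s: %s\n' "$header" "$version"
    done
    return 0
}

# Assumed header locations for a default MacPorts prefix.
jpeg_lib_version /opt/local/include/jconfig.h /opt/local/include/jpeglib.h
```

Comparing this output with the same check on the header bundled with gdal would show whether the 62 and 80 in the error message come from mismatched headers.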

This problem is still causing the latest qgis3 @ 3.28.0 to ungracefully crash when importing a GeoTIFF with JPEG compression. That crash is probably just a result of not handling the gdal error well.

I tried my old fix of modifying the portfile with the changes in <Portfile.patch.2>. This seemed to fix gdal_translate, however, QGIS3 failed to build. I do not understand the new QGIS3 build failure.

ld: warning: -undefined dynamic_lookup may not work with chained fixups
make[2]: Leaving directory `/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_gis_qgis3/qgis3/work/build'
[ 69%] Built target python_module_qgis__core
make[1]: Leaving directory `/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_gis_qgis3/qgis3/work/build'
make: *** [all] Error 2

I haven't been able to find much on "-undefined dynamic_lookup may not work with chained fixups" and it is curious that a warning causes the build to stop. I believe this is related to recent changes in the Xcode 14 compiler. I was able to work through that by adding the linker switch "-Wl,-no_fixup_chains" but then ran into a crash on startup. Not sure how to debug further.

My current build environment

macOS 12.6 (Monterey)
$ clang++ --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin21.6.0
Thread model: posix

comment:7 Changed 14 months ago by justinbb

This is still a showstopper, still a very bad bug. The worst is that in the current state of things, it looks okay at build time and both GDAL and dependent apps such as QGIS seem to work just fine. Then you display a JPEG-compressed GeoTIFF and it breaks. QGIS simply crashes, because the error return from GDAL is "handled" by failing an assert. (groan.)

It is a problem all over the GDAL world, with equivalent bugs posted in other package managers. The GDAL people should really rip out the embedded, ancient libjpeg. But in the meantime (a very long meantime, this has been going on for years) it's necessary to fix the portfile to allow reliably generating a functional library!

In my configuration, simply doing port install qgis3 causes libjpeg-turbo to be installed as a dependency of a dependency somewhere. The GDAL library is mangled because it pulls in jpeglib.h from /opt/local/include, which is version 80, but it includes the code for the embedded version 62. Thus the inconsistency is compiled into libgdal!

Quick-and-dirty workaround was to deactivate libjpeg-turbo (port deactivate libjpeg-turbo), then port install gdal, then activate libjpeg-turbo again (port activate libjpeg-turbo).

comment:8 Changed 5 months ago by Schamschula (Marius Schamschula)

Resolution: fixed
Status: assigned → closed