Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#43145 closed defect (fixed)

Perl PortGroup fails to reinplace non-ascii Makefiles

Reported by: mojca (Mojca Miklavec) Owned by: macports-tickets@…
Priority: Normal Milestone:
Component: ports Version: 2.2.1
Keywords: Cc: danielluke (Daniel J. Luke), neverpanic (Clemens Lang), anddam (Andrea D'Amore), cooljeanius (Eric Gallager), jeremyhu (Jeremy Huddleston Sequoia)
Port: p5-test-base p5-yaml

Description (last modified by anddam (Andrea D'Amore))

When building p5-yaml on 10.8 and 10.9, sed fails with

DEBUG: Executing reinplace: /usr/bin/sed {/^CCFLAGS *=/s/$/ /} < /opt/local/var/macports/build/_opt_mports_dports_perl_p5-yaml/p5.16-yaml/work/YAML-0.90/Makefile >@ file14
DEBUG: sed: RE error: illegal byte sequence

(see https://build.macports.org/builders/buildports-mavericks-x86_64/builds/2454) because the Makefile contains a non-ascii character (in Latin 1 encoding, invalid UTF).

This comes from the perl PortGroup:

# CCFLAGS can be passed in to "configure" but it's not necessarily inherited.
# LDFLAGS can't be passed in (or if it can, it's not easy to figure out how).
post-configure {
    fs-traverse file ${configure.dir} {
        if {[file isfile ${file}] && [file tail ${file}] eq "Makefile"} {
            ui_info "Fixing flags in [string map "${configure.dir}/ {}" ${file}]"
            reinplace "/^CCFLAGS *=/s/$/ [get_canonical_archflags cc]/" ${file}
            reinplace "/^OTHERLDFLAGS *=/s/$/ [get_canonical_archflags ld]/" ${file}
        }
    }
}

Suggestions from IRC:

neverpanic:

set LC_ALL to C I think that's a common issue with ill-formatted UTF-8 files /usr/bin/sed changed on 10.8 and above You can either patch that to be valid UTF-8 or set configure.env LC_ALL=C in the Portfile It is kind of weird to use the locale settings for that. I mean you could set the correct locale for the file, but then any messages to your terminal would be broken…

Dar1us:

it refuses to process files which aren't validly encoded given the current LC settings .. and by default that is UTF-8

Change History (18)

comment:1 Changed 10 years ago by cooljeanius (Eric Gallager)

Cc: egall@… added

Cc Me!

comment:2 Changed 10 years ago by mojca (Mojca Miklavec)

Port: p5-test-base added

comment:3 Changed 10 years ago by danielluke (Daniel J. Luke)

see also: https://lists.macosforge.org/pipermail/macports-dev/2012-August/019993.html

(yes the perl portgroup should be updated)

Last edited 10 years ago by danielluke (Daniel J. Luke) (previous) (diff)

comment:4 Changed 10 years ago by anddam (Andrea D'Amore)

Description: modified (diff)

The portgroup cannot possibly know what encoding will the produced Makefile use.

Port p5-yaml is using ExtUtils::MakeMaker that in turn relies on Pod::Man. Pod::Man can produce utf8 Makefiles, search utf8 in in this page but seems that MakeMaker doesn't have any option to specify that, see this request.

Since the portgroup is patching Makefiles we could sanitize the file by adding a new variable specifying a charset to convert from. In this case the portfile writer would specify something like

perl5.makefile_encoding    iso8859-1
perl5.setup         YAML 0.90

and the post-configure phase would take care of convert the file to UTF-8 (or any other target encoding depending on the darwin release) before reinplaces.

comment:5 Changed 10 years ago by danielluke (Daniel J. Luke)

... or we could just set LANG=C and things will work just like they used to.

comment:6 Changed 10 years ago by jmroot (Joshua Root)

Cc: jeremyhu@… added

The nicest solution would be to iconv the Makefile.PL to UTF-8 first so any info that ends up being installed doesn’t have a weird encoding. But in any case, this is the exact reason why reinplace has a -locale flag.

comment:7 Changed 10 years ago by mojca (Mojca Miklavec)

Just to explain where the weird encoding comes from.

The original file dist.ini contains UTF-8 encoded

author = Ingy döt Net

Then this apparently gets translated into

"AUTHOR" => "Ingy d\x{f6}t Net",

in Makefile.PL. During the configure phase, the following command is called (more or less):

perl5.18 Makefile.PL

which results in

# --- MakeMaker ppd section:
# Creates a PPD (Perl Package Description) for a binary distribution.
ppd :
	$(NOECHO) $(ECHO) '<SOFTPKG NAME="$(DISTNAME)" VERSION="$(VERSION)">' > $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '    <ABSTRACT>YAML Ain'\''t Markup Language (tm)</ABSTRACT>' >> $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '    <AUTHOR>Ingy döt Net</AUTHOR>' >> $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '    <IMPLEMENTATION>' >> $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '        <ARCHITECTURE NAME="darwin-thread-multi-2level-5.18" />' >> $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '        <CODEBASE HREF="" />' >> $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '    </IMPLEMENTATION>' >> $(DISTNAME).ppd
	$(NOECHO) $(ECHO) '</SOFTPKG>' >> $(DISTNAME).ppd

in Latin 1 encoding.

Maybe a proper fix in ExtUtils::MakeMaker (to allow UTF-8 output) would be better (other than asking the upstream to remove non-ascii from the Makefile.PL).

Last edited 10 years ago by mojca (Mojca Miklavec) (previous) (diff)

comment:8 Changed 10 years ago by jmroot (Joshua Root)

If the output is always Latin 1, that at least makes it easy to pass the right locale to reinplace.

comment:9 in reply to:  8 Changed 10 years ago by anddam (Andrea D'Amore)

Replying to jmr@…:

The nicest solution would be to iconv the Makefile.PL to UTF-8 first so any info that ends up being installed doesn’t have a weird encoding.

I'm not sure that this would help since Pod::Man needs a specific flag to enable utf8 and seems that ExtUtils::MakeMaker isn't passing it when calling Pod::Man. Also you'd need to specify the =encoding in the temporary produced POD file.

utf8
By default, Pod::Man produces the most conservative possible *roff output to try to ensure that it will work with as many different *roff implementations as possible.
[…]
Be aware that, when using this option, the input encoding of your POD source must be properly declared unless it is US-ASCII or Latin-1. POD input without an =encoding command will be assumed to be in Latin-1, and if it's actually in UTF-8, the output will be double-encoded.

But in any case, this is the exact reason why reinplace has a -locale flag.

Correct, so we could specify a perl5.reinplace_locale variable and pass that.

Since the post-configure is traversing the filesystem checking for all "Makefile" named files and the encoding isn't necessarily the same for all files I think that this reinplace_locale should be a structure of {{file_name file_locale} …} couples that should be parsed per each file.

Replying to jmr@…:

If the output is always Latin 1, that at least makes it easy to pass the right locale to reinplace.

The output could be plain ASCII (but I guess that wouldn't be a problem), but not all modules use ExtUtiles::MakeMaker, I've seen at least on Makefile.PL directly generating the Makefile.

comment:10 Changed 10 years ago by mojca (Mojca Miklavec)

We are neither recoding files nor changing any non-ascii portions of the text. So I would say that encoding is pretty much irrelevant even if it's not Latin 1, as long as it's kept intact. The "C" local should do just as well.

comment:11 Changed 10 years ago by mojca (Mojca Miklavec)

See also #44630.

At least the following ports are affected (but there might be more):

  • p5-io-all
  • p5-spiffy
  • p5-test-base
  • p5-yaml
  • p5-yaml-libyaml

comment:12 in reply to:  10 Changed 10 years ago by danielluke (Daniel J. Luke)

Replying to mojca@…:

We are neither recoding files nor changing any non-ascii portions of the text. So I would say that encoding is pretty much irrelevant even if it's not Latin 1, as long as it's kept intact. The "C" local should do just as well.

+1 for adding -locale C to the reinplace calls in the perl5 portgroup

comment:13 Changed 10 years ago by danielluke (Daniel J. Luke)

Resolution: fixed
Status: newclosed

comment:14 Changed 10 years ago by mojca (Mojca Miklavec)

Please also remove the patch in the other four ports then.

comment:15 in reply to:  14 Changed 10 years ago by danielluke (Daniel J. Luke)

Replying to mojca@…:

Please also remove the patch in the other four ports then.

done, although since the perl5 portgroup change didn't break them, I don't believe I really had an obligation to correct them.

comment:16 Changed 10 years ago by mojca (Mojca Miklavec)

Thank you. I'm just cross-referencing the numbers (r123726, r123727, r123728, r123729). Sure, the ports weren't broken, but we would probably forget to remove the patch(es) later and could keep them forever without any good reason.

comment:17 in reply to:  16 Changed 10 years ago by danielluke (Daniel J. Luke)

Replying to mojca@…:

Sure, the ports weren't broken, but we would probably forget to remove the patch(es) later and could keep them forever without any good reason.

historically, it has been up to the individual port maintainers to handle non-breaking modifications to base (or portgroups), not the person/persons making changes to base or portgroups.

comment:18 Changed 10 years ago by mojca (Mojca Miklavec)

(You can probably also move the checksums out of the brackets in p5-test-base.)

Note: See TracTickets for help on using tickets.