Opened 4 years ago

Closed 3 years ago

#52420 closed defect (invalid)

neomutt: Umlauts, accents are wrongly displayed

Reported by: mimaoffice (Michele Marcionelli) Owned by: lbschenkel (Leonardo Brondani Schenkel)
Priority: Normal Milestone:
Component: ports Version: 2.3.4
Keywords: Cc:
Port: neomutt

Description

I updated mutt-devel to neomutt. Now the umlauts (ö,ü,...) and accents (è,á,...) are not displayed correctly anymore. For instance, I get a '?' or '\123' instead.

A fix would be very appreciated ;-)

Bests, Michele

Change History (12)

comment:1 Changed 4 years ago by ryandesign (Ryan Schmidt)

Owner: changed from macports-tickets@… to leonardo@…
Summary: changed from Umlauts, accents are wrongly displayed in neomutt to neomutt: Umlauts, accents are wrongly displayed

comment:2 Changed 4 years ago by lbschenkel (Leonardo Brondani Schenkel)

That's very odd. I use all sorts of accented characters as well and I never experienced any issue.

I'll need some more details:

  • Does it happen with all e-mails or just some e-mails?
  • Does it happen in the UI as well? Try typing a mutt command with accents.
  • Which terminal do you use? The built-in Terminal? iTerm? Something else?
  • Could you please run the command locale and post the output?
  • Could you please run the command mutt -v and post the output?
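For reference, here is a minimal sketch that gathers these details in one go. The function name `collect_locale_report` is hypothetical. It also assumes that `TERM_PROGRAM` is set by Terminal.app and iTerm2; other terminals may not set it.

```shell
# Hypothetical helper: collect the terminal, locale and mutt details
# requested above into a single report.
collect_locale_report() {
  # Assumption: Terminal.app and iTerm2 export TERM_PROGRAM; others may not.
  printf '== terminal ==\n%s\n' "${TERM_PROGRAM:-unknown}"
  printf '== locale ==\n'
  locale
  printf '== mutt -v ==\n'
  if command -v mutt >/dev/null 2>&1; then
    mutt -v
  else
    printf 'mutt not found in PATH\n'
  fi
}

# Usage: collect_locale_report > report.txt 2>&1
```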

comment:3 Changed 4 years ago by neverpanic (Clemens Lang)

Try unset LC_CTYPE (and possibly also unset LC_ALL) before starting neomutt.

It seems that a number of terminal emulators have a feature that helpfully sets the locale variables automatically, but ends up setting them to an unsupported value. Leaving them unset does not cause problems, but setting them to unsupported values breaks special characters. I bisected this to https://github.com/neomutt/neomutt/commit/f28b4cace4a1f26eace3120ae08e0b943557139f, which comes from mutt upstream, so it is not a neomutt problem.

In iTerm, unticking "Preferences > Profiles > $yourprofile > Terminal > Environment > Set locale variables automatically" and opening a new Shell session fixed this for me. Terminal.app has a similar setting.
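If you would rather keep the terminal setting as it is, the same workaround can be limited to a single process. The sketch below drops LC_CTYPE and LC_ALL only for neomutt, so that LANG (or the system default) applies. The function name `run_without_ctype` is hypothetical.

```shell
# Hypothetical helper: run a command with LC_CTYPE and LC_ALL removed from
# its environment, leaving the interactive shell's variables untouched.
run_without_ctype() {
  (
    # The subshell isolates the unset from the calling shell.
    unset LC_CTYPE LC_ALL
    exec "$@"
  )
}

# Usage: run_without_ctype neomutt
```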

comment:4 Changed 4 years ago by mimaoffice (Michele Marcionelli)

Just for your information: the old mutt-devel (Mutt 1.6.0 (2016-04-01)) worked well with the default settings, i.e. with the "set locale variables automatically" setting for Terminal (or iTerm), which corresponds to the following locale:

LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Now with the new neomutt (NeoMutt 20160916 (1.7.0), just installed today) I get the following results:

1) with default locale (see above):

  • in the mail listing I read the subject "Z??rich" instead of "Zürich"
  • while reading the email I see in the header "Subject: Z??rich" and in the body "Z\303\274rich"

2) with unset LC_CTYPE, locale prints:

LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

and I see:

  • in the mail listing I read the subject "Zrich" instead of "Zürich"
  • while reading the email, both header and body are OK

3) with export LC_ALL=en_US.UTF-8, locale prints:

LANG=
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

and everything seems to work fine; export LC_ALL=en_US or export LC_ALL=de_CH also work fine.

What is the difference from (1)? Why doesn't (1) work correctly with neomutt?
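The body text "Z\303\274rich" in case (1) is the UTF-8 encoding of "ü" (the two bytes 0xC3 0xBC) written as octal escapes. That suggests the message bytes themselves were fine and only their display depended on the locale. You can look at those bytes in a shell like this, assuming the script file is saved as UTF-8:

```shell
# printf expands \ooo octal escapes in its format string, so this
# reassembles the two UTF-8 bytes of "ü"; a UTF-8 terminal shows Zürich.
printf 'Z\303\274rich\n'

# od -An -to1 dumps each byte in octal without offsets; xargs squeezes
# od's column padding. Prints: 303 274
printf 'ü' | od -An -to1 | xargs
```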

I also noticed that if LC_ALL is not set but all the other LC_* variables are set, even to different values, mutt displays umlauts and accents correctly; but if I unset just one of the LC_* variables (so that locale reports "C" for it), mutt fails:

LANG=
LC_COLLATE="de_AT"
LC_CTYPE="de_DE"
LC_MESSAGES="en_US"
LC_MONETARY="de_CH"
LC_NUMERIC="fr_FR"
LC_TIME="en_GB"
LC_ALL=

==> is ok for mutt

LANG=
LC_COLLATE="de_AT"
LC_CTYPE="de_DE"
LC_MESSAGES="en_US"
LC_MONETARY="C"
LC_NUMERIC="fr_FR"
LC_TIME="en_GB"
LC_ALL=

==> is no longer ok for mutt
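To see at a glance which categories have fallen back to "C" in a given setup, here is a small sketch that filters `locale` output. The function name `c_categories` is hypothetical. This ticket does not settle which mixed combinations neomutt accepts; the sketch only makes the "C" entries easy to spot.

```shell
# Hypothetical helper: print the names of categories whose value is "C" or
# "POSIX". Reads `locale`-style lines (NAME="value" or NAME=value) from stdin.
c_categories() {
  while IFS='=' read -r name value; do
    value=${value#\"}   # strip a leading double quote, if any
    value=${value%\"}   # strip a trailing double quote, if any
    case $value in
      C|POSIX) printf '%s\n' "$name" ;;
    esac
  done
}

# Usage: locale | c_categories
```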

HTH ;-)

Bests, Michele

comment:5 Changed 4 years ago by neverpanic (Clemens Lang)

Yes, mutt-devel is not expected to have this problem, because it doesn't include the commit that changed this behavior. Instead of updating mutt-devel to 1.7.0 (which has the same problem), we switched to neomutt, which is based on mutt 1.7.0.

A "C" locale not supporting special characters does not surprise me. I'm not sure how mixing a single "C" category with other locales behaves.

I do not really know why some combinations don't work. The mosh remote shell client, for example, aborts on connection with LC_CTYPE=UTF-8, telling me that this is an invalid locale (even though /usr/share/locale/UTF-8/LC_CTYPE exists). We should probably report this to the developers of mutt, but the answer may well be that there's a problem with our locale configuration rather than with their code.

comment:6 Changed 4 years ago by lbschenkel (Leonardo Brondani Schenkel)

This is what I suspected, but I just wanted to confirm before blaming your set-up.

Unfortunately, the LC_CTYPE="UTF-8" that Terminal.app sets by default is not recognized as a valid locale by a lot of software (the locale is supposed to be something like "de_CH.UTF-8"). The macOS userland can cope with it (as can many BSDs), but if you ever ssh to a Linux box, for example, you'll instantly get the warning "setlocale: LC_CTYPE: cannot change locale (UTF-8)", because the value is rejected by the GNU userland.
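A rough way to check whether a given name is installed as a locale on the current machine is to look it up in `locale -a`. This sketch rests on a few assumptions. glibc lists codesets in a normalized spelling (for example en_US.utf8), so both sides are lowercased and stripped of hyphens before comparing. On macOS, where /usr/share/locale/UTF-8 exists, "UTF-8" should presumably match; on a typical Linux host it will not. The function names are hypothetical.

```shell
# Hypothetical helpers. normalize_locale lowercases a name and drops hyphens,
# so en_US.UTF-8 and glibc's en_US.utf8 compare equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# locale_installed succeeds if the normalized name appears in `locale -a`.
# It fails if the name is missing or the locale command is unavailable.
locale_installed() {
  wanted=$(normalize_locale "$1")
  locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' |
    grep -qxF -- "$wanted"
}

# Usage:
#   locale_installed "$LC_CTYPE" || echo "LC_CTYPE=$LC_CTYPE is not installed here"
```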

Mutt 1.7 is now affected by this. Before 1.7, it didn't pay attention to your environment's locale (only to the locale you set in the configuration file), but that setting was removed, and it now determines the locale and the charset from the environment variables, as virtually all other software does. Unfortunately, it got bitten by the broken macOS defaults.
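The lookup order behind this is the standard POSIX one for each category. For the character type, a non-empty LC_ALL wins, then a non-empty LC_CTYPE, then LANG, then the implementation default. Here is a minimal sketch; the function `effective_ctype` is hypothetical and takes the three values as arguments instead of reading the environment.

```shell
# Hypothetical helper: which value governs LC_CTYPE, following the POSIX order.
# Arguments: $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (pass "" for unset).
# Empty values count as unset. The final default is implementation-defined;
# "C" is assumed here because that is what `locale` reported in comment:4.
effective_ctype() {
  printf '%s\n' "${1:-${2:-${3:-C}}}"
}

# Usage: effective_ctype "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"
```

With the values from comment:4, this gives "UTF-8" for case (1), "C" for case (2) and "en_US.UTF-8" for case (3), since a non-empty LC_ALL overrides LC_CTYPE. In the same way, once LC_CTYPE is unset, LANG takes over, which is what the shell snippet in this comment relies on.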

This is so annoying in general that I have the following in my shell initialization scripts:

if [ "${LC_CTYPE}" = "UTF-8" ]; then
  # OS X likes to set LC_CTYPE="UTF-8". In general this is supported by BSD
  # systems, but not Linux (which will become a problem when ssh'ing to
  # foreign systems). Check if it is possible to unset LC_CTYPE without
  # changing the charset, and unset it if that is true.
  # "=" is used instead of "==": "==" inside [ ] is not POSIX, and zsh
  # rejects it.
  if [ "$(LC_CTYPE= locale charmap)" = "UTF-8" ]; then
    unset -v LC_CTYPE
  fi
fi

If you change your shell initialization scripts to contain export LANG=de_CH.UTF-8 (or whatever your locale of choice is) *before* the snippet above, then LC_CTYPE will be unset automatically and you should no longer have these issues.

Note that, as "cal" correctly said, the change in behavior you experienced was introduced in vanilla Mutt, not in NeoMutt or in this port. All I can do in this situation is suggest that you change your settings.

comment:7 in reply to: 6; Changed 4 years ago by larryv (Lawrence Velázquez)

Replying to leonardo.schenkel@…:

Unfortunately the LC_CTYPE="UTF-8" that Terminal.app sets by default is not recognized as a valid locale by a lot of software (the locale is supposed to be something like "de_CH.UTF-8").

FWIW, my Terminal.app automatically uses a correct locale. My shell startup settings do not touch LANG.

LANG=en_US.UTF-8

I don’t know why it would use just “UTF-8” for anything. Perhaps this happens when the system Language & Region settings are customized? I use the stock “United States” settings.

comment:8 Changed 4 years ago by lbschenkel (Leonardo Brondani Schenkel)

I don't touch LANG either. On all Macs I have ever owned, in all OS X versions I can remember, Terminal.app sets the locale like this:

LANG=<locale from regional settings>.UTF-8
LC_CTYPE=UTF-8

I never saw plain "C" for LANG, though, which is curious.

Why it uses "UTF-8" for LC_CTYPE and not a well-formed locale name is a mystery to me, especially since it does that just fine for LANG. And since LANG is already defined as a UTF-8 locale, there's no need to set LC_CTYPE at all.

I'm not an expert, but my understanding is that the relevant specs say all locale names must have the form lang_country_variant.charset, with notable exceptions ("C", "POSIX"). Even though "UTF-8" exists as a locale on the Mac, it's not supposed to be a valid locale name. I even tried to create such a locale with locale-gen on Linux systems, and it refused to do so.
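To illustrate that convention, here is a heuristic check of a name against the common X/Open form language[_territory][.codeset][@modifier], with "C" and "POSIX" as special cases. This is only a sketch under that assumption. It does not check whether the locale is installed, and real systems accept a few names outside this pattern. The function name is hypothetical.

```shell
# Hypothetical heuristic: does $1 look like language[_territory][.codeset][@modifier]?
looks_like_locale_name() {
  case $1 in
    C|POSIX) return 0 ;;   # the special names mentioned above
  esac
  # language: 2-3 lowercase letters; territory: 2 uppercase letters;
  # codeset: letters, digits, hyphens; modifier: letters and digits.
  printf '%s\n' "$1" |
    grep -Eqx '[a-z]{2,3}(_[A-Z]{2})?(\.[A-Za-z0-9-]+)?(@[A-Za-z0-9]+)?'
}

# Usage:
#   looks_like_locale_name de_CH.UTF-8 && echo ok
#   looks_like_locale_name UTF-8 || echo suspicious
```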

comment:9 in reply to: 7 Changed 4 years ago by mimaoffice (Michele Marcionelli)

Replying to larryv@…:

Replying to leonardo.schenkel@…:

I don’t know why it would use just “UTF-8” for anything. Perhaps this happens when the system Language & Region settings are customized? I use the stock “United States” settings.

That was the problem. I'm using "English" as the Language and "Switzerland" as the Region, and this generates the wrong default locale mentioned above. If I set the Language to "Italian (Switzerland)", "German (...)", or "French (...)", everything works fine.

This is definitely not a *mutt problem... rather an OS X one.

Thank you guys for helping me with this problem.

Bests, Michele

comment:10 in reply to: 8 Changed 4 years ago by larryv (Lawrence Velázquez)

Replying to leonardo.schenkel@…:

I don't touch LANG either. On all Macs I have ever owned, in all OS X versions I can remember, Terminal.app sets the locale like this:

LANG=<locale from regional settings>.UTF-8
LC_CTYPE=UTF-8

I have seen the opposite: I have never had LC_CTYPE set automatically. My environment has always contained LANG alone.

If I had to, I would guess that the system doesn’t try to guess a country if you tweak the system settings in a way that doesn’t explicitly denote one. But I have no idea. UTF-8 is obviously just a character encoding without any locale information at all.

comment:11 Changed 4 years ago by larryv (Lawrence Velázquez)

FYI, more information on the Mac locale fun: https://bugs.python.org/issue18378#msg215215

comment:12 Changed 3 years ago by lbschenkel (Leonardo Brondani Schenkel)

Resolution: invalid
Status: changed from new to closed