Opened 17 years ago

Closed 17 years ago

#11978 closed defect (fixed)

portindex doesn't encode non-ASCII characters correctly in ISO-8859-1 locales

Reported by: vinc17@…
Owned by: kballard (Lily Ballard)
Priority: Normal
Milestone:
Component: ports
Version: 1.4.40
Keywords:
Cc: vinc17@…
Port:

Description

In ISO-8859-1 locales, portindex does a UTF-8 encoding twice:

$ grep xmlittre PortIndex
stardict-xmlittre 329
portdir textproc/stardict-xmlittre variants universal description {XMLittrÃ© dictionary for stardict} name stardict-xmlittre version 2.4.2 categories textproc homepage http://francois.gannaz.free.fr/Littre/accueil.php revision 0 epoch 0 maintainers vincent-opdarw@vinc17.org long_description {XMLittrÃ© dictionary for stardict.}

instead of

stardict-xmlittre 327
portdir textproc/stardict-xmlittre variants universal description {XMLittré dictionary for stardict} name stardict-xmlittre version 2.4.2 categories textproc homepage http://francois.gannaz.free.fr/Littre/accueil.php revision 0 epoch 0 maintainers vincent-opdarw@vinc17.org long_description {XMLittré dictionary for stardict.}

Change History (6)

comment:1 Changed 17 years ago by kballard (Lily Ballard)

It sounds like it's reading the UTF-8-encoded Portfile as ISO-8859-1, and then UTF-8-encoding that for output. That's a bit odd, as I tried testing exactly that case and it seemed to autodetect UTF-8, but I guess it's not working right.
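The suspected mechanism can be reproduced outside MacPorts. Below is a minimal sketch in Python, used only for illustration (portindex itself is written in Tcl); `double_encode` is a hypothetical helper name, not something from the MacPorts sources.

```python
def double_encode(text: str) -> bytes:
    """Simulate the suspected bug: UTF-8 bytes misread as ISO-8859-1, then encoded as UTF-8 again."""
    utf8_bytes = text.encode("utf-8")          # bytes as stored in the Portfile
    misread = utf8_bytes.decode("iso-8859-1")  # wrong decoding: every byte becomes one character
    return misread.encode("utf-8")             # output step encodes those characters a second time


# "\u00e9" is e-acute, written as an escape so the script does not depend on the terminal encoding.
correct = "XMLittr\u00e9".encode("utf-8")
broken = double_encode("XMLittr\u00e9")

print(correct.hex(" "))  # ends in c3 a9
print(broken.hex(" "))   # ends in c3 83 c2 a9
```

Each non-ASCII character grows from two bytes to four, which is the pattern to look for in a hex dump of the generated PortIndex.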

comment:2 Changed 17 years ago by kballard (Lily Ballard)

Owner: changed from macports-dev@… to eridius@…
Status: changed from new to assigned

comment:3 Changed 17 years ago by kballard (Lily Ballard)

I set my locale to ISO-8859-1 and re-ran PortIndex, and the entry for xmlittre is still UTF-8-encoded. Can you give me instructions to reproduce your results?

comment:4 Changed 17 years ago by vinc17@…

The locale command outputs:

LANG="POSIX"
LC_COLLATE="POSIX"
LC_CTYPE="en_US.ISO8859-1"
LC_MESSAGES="POSIX"
LC_MONETARY="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_ALL="POSIX/en_US.ISO8859-1/POSIX/POSIX/POSIX/POSIX"

(LANG and LC_COLLATE are set to POSIX, LC_CTYPE is set to en_US.ISO8859-1, and the other variables are not set.)

prunille:~/software/dports> grep xmlittr PortIndex | hexdump -C | tail -5
00000130  64 65 73 63 72 69 70 74  69 6f 6e 20 7b 58 4d 4c  |description {XML|
00000140  69 74 74 72 c3 83 c2 a9  20 64 69 63 74 69 6f 6e  |ittrÃ.© diction|
00000150  61 72 79 20 66 6f 72 20  73 74 61 72 64 69 63 74  |ary for stardict|
00000160  2e 7d 0a                                          |.}.|
00000163
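The sequence c3 83 c2 a9 starting at offset 0x144 is what a doubly UTF-8-encoded é looks like. As a quick check, the following Python sketch (illustrative only; `undo_double_utf8` is a hypothetical helper) reverses one layer of encoding on the bytes from the dump line at offset 0x140:

```python
def undo_double_utf8(data: bytes) -> str:
    """Reverse one spurious ISO-8859-1 -> UTF-8 round trip and return the intended text."""
    once = data.decode("utf-8")            # yields the mojibake characters U+00C3 U+00A9
    original = once.encode("iso-8859-1")   # map each character back to the single byte it came from
    return original.decode("utf-8")        # decode the real UTF-8 payload


# Bytes copied from the hexdump line at offset 0x140.
dumped = bytes.fromhex("69 74 74 72 c3 83 c2 a9 20 64 69 63 74 69 6f 6e")
recovered = undo_double_utf8(dumped)
print(recovered)
```

If the data had been encoded only once, the middle step would fail or produce different text, so a clean round trip is a reasonable sign of double encoding.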

comment:5 Changed 17 years ago by kballard (Lily Ballard)

Ah hah. I guess I wasn't setting it to ISO-8859-1 correctly. I can reproduce this issue now. Seems like the solution is to replace the source command with a combination of open/fconfigure/read/close, so that the file's encoding can be set explicitly.
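The general idea of opening a file and stating its encoding explicitly, rather than inheriting one from the locale, can be sketched as follows. This is a Python analogue for illustration only, not the Tcl code used in portindex; `read_portfile` and the temporary sample file are hypothetical.

```python
import os
import tempfile


def read_portfile(path: str, encoding: str = "utf-8") -> str:
    """Read a file with an explicit encoding instead of the locale default."""
    with open(path, "r", encoding=encoding) as fd:
        return fd.read()


# Write a UTF-8 file with a non-ASCII description (hypothetical sample content).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("description {XMLittr\u00e9 dictionary for stardict}\n".encode("utf-8"))
    sample_path = tmp.name

as_utf8 = read_portfile(sample_path)                  # explicit, correct decoding
as_latin1 = read_portfile(sample_path, "iso-8859-1")  # what an ISO-8859-1 default would produce
os.remove(sample_path)

print(len(as_utf8), len(as_latin1))  # the misread text is one character longer
```

Reading with the wrong encoding does not raise an error here, because every byte is a valid ISO-8859-1 character; the damage only becomes visible once the text is written out again.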

comment:6 Changed 17 years ago by kballard (Lily Ballard)

Resolution: fixed
Status: changed from assigned to closed

Ok, I killed the silly [fconfigure $fd -encoding utf-8] calls I had in there and now I do an [encoding system utf-8] in dportinit. This will cause all file access to default to utf-8 (but stdin and stdout keep their original values, so it should display non-ASCII text just fine).

Committed in r25975.
