New Ticket     Wiki     Browse Source     Timeline     Roadmap     Ticket Reports     Search

Ticket #18736 (closed enhancement: fixed)

Opened 3 years ago

Last modified 22 months ago

distname is not percent-encoded before gluing it into the URL

Reported by: ryandesign@… Owned by: macports-tickets@…
Priority: Normal Milestone: MacPorts 1.9.0
Component: base Version: 1.7.0
Keywords: Cc: snc@…, blb@…, raimue@…
Port:

Description

MacPorts base does not percent-encode the distname before putting it in the final URL. This causes distnames containing spaces to be completely unavailable (causes a Tcl error), causes distnames containing a percent character to probably not download at all (probably not a common need), and causes distnames containing a plus character to work sometimes but not others, at the whim of the web server (since a plus should really be encoded as %2b the web server is completely within its right to not know what we're asking for).

I first noted this in #18733 in response to r47662.

Attachments

urlencode.diff Download (1.6 KB) - added by ryandesign@… 3 years ago.
patch-macports-curl-escape.diff Download (2.0 KB) - added by raimue@… 3 years ago.

Change History

  Changed 3 years ago by ryandesign@…

I searched high and low to see how to do percent-encoding in Tcl. I found a function http::mapReply which does it. It's part of the http module included with Tcl. It seems like I should just be able to do something like this:

package require http
return [http::mapReply ${distname}]

Unfortunately while I can get this to work with MacPorts tcl, I can't get it to work with the OS-provided tcl; it can't find the http::mapReply procedure. And even if we find out why that is and how to fix it, it won't work because http::mapReply  was seriously broken until tcl 8.4.12; Mac OS X 10.4.11 has tcl 8.4.7 so it would not include the fix.

The http module appears to have originally been written in tcl (http.tcl) but is now  apparently written in C. I copied the tcl code from http.tcl and modified it as suggested in  the bug report so that it is not broken. I'll attach the patch I came up with.

Changed 3 years ago by ryandesign@…

follow-up: ↓ 3   Changed 3 years ago by raimue@…

  • cc raimue@… added

I also can't find mapReply in the  Tcl docs. So this seems to have been an internal proc.

Now you already made the effort of writing this in Tcl, but as we are already using and linking with libcurl, we could use  curl_easy_escape() for this purpose.

in reply to: ↑ 2 ; follow-up: ↓ 11   Changed 3 years ago by ryandesign@…

Replying to raimue@…:

Now you already made the effort of writing this in Tcl, but as we are already using and linking with libcurl, we could use  curl_easy_escape() for this purpose.

That would be fine with me, except if it does what the docs say: "All input characters that are not a-z, A-Z or 0-9 are converted to their 'URL escaped' version" -- if that's what it really does then that's broken just like tcl < 8.4.12 was broken: it needs to convert all characters that are not a-z, A-Z, 0-9, -, ., _ or ~.

follow-up: ↓ 5   Changed 3 years ago by raimue@…

According to the man page, libcurl implements  RFC2396, Tcl goes with  RFC3986. It is not really broken, just outdated. Which is not a serious issue, as decoders are advised to decode these characters normally.

There is also a simple approach using the public http API, which is so simple that we could drop the wrapper as well. Although calling ::http::formatQuery with an odd number of arguments seems not to be documented (could also call with an additional empty string and strip the last char).

package require http

proc urlencode {str} {
    return [::http::formatQuery $str]
}

But if we are going to duplicate this, we should probably copy from the  latest version which is not using regsub for performance reasons. It is not implemented in C as stated on the Tclers wiki.

in reply to: ↑ 4 ; follow-ups: ↓ 6 ↓ 10   Changed 3 years ago by ryandesign@…

Replying to raimue@…:

According to the man page, libcurl implements  RFC2396, Tcl goes with  RFC3986. It is not really broken, just outdated. Which is not a serious issue, as decoders are advised to decode these characters normally.

It seems silly if you want to download foo-1.0.tar.gz to request foo%2d1%2e0%2etar%2egz but I agree it should still work.

There is also a simple approach using the public http API, which is so simple that we could drop the wrapper as well.

That would be fine if it works. I had not tried [::http::something] I had tried [http::something] which was how the example I saw used it. I'm not so familiar with tcl syntax when it comes to packages.

Although calling ::http::formatQuery with an odd number of arguments seems not to be documented (could also call with an additional empty string and strip the last char).

formatQuery is for formatting a query string (e.g. "?a=b&c=d"), but that's not what we're doing, so I don't feel we should be using that function. It seems mapReply is the correct function to use.

But if we are going to duplicate this, we should probably copy from the  latest version which is not using regsub for performance reasons. It is not implemented in C as stated on the Tclers wiki.

Ok.

in reply to: ↑ 5 ; follow-up: ↓ 7   Changed 3 years ago by raimue@…

FYI ::http::* starts from the top level namespace, http::* looks for a http namespace in the current namespace first.

Replying to ryandesign@…:

formatQuery is for formatting a query string (e.g. "?a=b&c=d"), but that's not what we're doing, so I don't feel we should be using that function. It seems mapReply is the correct function to use.

http::mapReply is private API (at least in 8.4 and 8.5, seems to have changed for 8.6), we should not use it. But we cannot even use it before we call another http::* function, as it is not listed in pkgIndex.tcl. http::formatQuery is a wrapper around http::mapReply which is public. Actually we want to encode a string to be used as value in a query string, so it is not absolutely wrong.

in reply to: ↑ 6   Changed 3 years ago by ryandesign@…

Replying to raimue@…:

http::mapReply is private API (at least in 8.4 and 8.5, seems to have changed for 8.6), we should not use it. But we cannot even use it before we call another http::* function, as it is not listed in pkgIndex.tcl.

That's a shame.

http::formatQuery is a wrapper around http::mapReply which is public. Actually we want to encode a string to be used as value in a query string, so it is not absolutely wrong.

The  query string is the part of a URL after the question mark. We are constructing a distfile URL which does not contain a question mark, meaning it does not have a query string component.

So what about using curl_easy_escape() as you suggested?

Changed 3 years ago by raimue@…

  Changed 3 years ago by raimue@…

patch-macports-curl-escape.diff is a patch to add a curl escape command, using curl_easy_escape() internally.

$ rlwrap /usr/bin/tclsh
% source /Library/Tcl/macports1.0/macports_fastload.tcl
0
% package require Pextlib
1.0
% curl escape foo~bar+baz!
foo%7Ebar%2Bbaz%21

  Changed 3 years ago by ryandesign@…

Thanks.

When trying to compile MacPorts with this change, I got:

ld: Undefined symbols:
_curl_easy_escape

I found that on this Intel Tiger Mac, I somehow had both library version 3 and 4 of curl, with all the symlinks pointing to version 3, and only version 4 has the curl_easy_escape function:

$ ls -l /usr/lib/libcurl*
lrwxr-xr-x   1 root  wheel      15 Sep  5  2008 /usr/lib/libcurl.2.dylib -> libcurl.3.dylib
lrwxr-xr-x   1 root  wheel      15 Sep  5  2008 /usr/lib/libcurl.3.0.0.dylib -> libcurl.3.dylib
-rwxr-xr-x   1 root  wheel  384692 Oct 24 02:02 /usr/lib/libcurl.3.dylib
-rwxr-xr-x   1 root  wheel  460136 Nov  6 15:53 /usr/lib/libcurl.4.dylib
lrwxr-xr-x   1 root  wheel      15 Sep  5  2008 /usr/lib/libcurl.dylib -> libcurl.3.dylib
$ grep curl_easy_escape /usr/lib/libcurl*
Binary file /usr/lib/libcurl.4.dylib matches
$

I checked two other Tiger Macs (one Intel and one PowerPC) and they only have library version 4, and everything else is just symlinks to it:

$ ls -l libcurl*
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.2.dylib -> libcurl.4.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 16  2007 libcurl.3.0.0.dylib -> libcurl.3.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.3.dylib -> libcurl.4.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.4.0.0.dylib -> libcurl.4.dylib
-rwxr-xr-x  1 root  wheel  460136 Nov  6 15:53 libcurl.4.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.dylib -> libcurl.4.dylib
$

And I checked two Leopard Macs (one Intel, one PowerPC) which also look the same (except that the libcurl.3.0.0.dylib symlink is missing).

Once I fixed the symlinks on the first Mac I could build it.

in reply to: ↑ 5   Changed 3 years ago by ryandesign@…

Replying to ryandesign@…:

It seems silly if you want to download foo-1.0.tar.gz to request foo%2d1%2e0%2etar%2egz but I agree it should still work.

I've had this patch in place for awhile locally, and it seems to work with most servers, but  not with downloads.sourceforge.net.

in reply to: ↑ 3   Changed 3 years ago by snc@…

  • cc snc@… added

Replying to ryandesign@…:

That would be fine with me, except if it does what the docs say: "All input characters that are not a-z, A-Z or 0-9 are converted to their 'URL escaped' version" -- if that's what it really does then that's broken just like tcl < 8.4.12 was broken: it needs to convert all characters that are not a-z, A-Z, 0-9, -, ., _ or ~.

We can probably summarize that as "Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits." (from  PHP).

  Changed 3 years ago by jmr@…

  • milestone changed from MacPorts 1.8.0 to MacPorts 1.8.1

Pushing this back a little since it's not a regression and will apparently be nontrivial to fix.

  Changed 2 years ago by wsiegrist@…

  • milestone changed from MacPorts 1.8.1 to MacPorts 1.8.2

  Changed 2 years ago by jmr@…

1.8.1 milestone is closing, moving open tickets out

  Changed 2 years ago by jmr@…

  • milestone changed from MacPorts 1.8.2 to MacPorts Future

  Changed 22 months ago by jmr@…

  • status changed from new to closed
  • resolution set to fixed
  • milestone changed from MacPorts Future to MacPorts 1.9.0
Note: See TracTickets for help on using tickets.