Opened 12 years ago

Closed 11 years ago

#18736 closed enhancement (fixed)

distname is not percent-encoded before gluing it into the URL

Reported by: ryandesign (Ryan Schmidt) Owned by: macports-tickets@…
Priority: Normal Milestone: MacPorts 1.9.0
Component: base Version: 1.7.0
Keywords: Cc: nerdling (Jeremy Lavergne), blb@…, raimue (Rainer Müller)
Port:

Description

MacPorts base does not percent-encode the distname before putting it in the final URL. This causes distnames containing spaces to be completely unavailable (causes a Tcl error), causes distnames containing a percent character to probably not download at all (probably not a common need), and causes distnames containing a plus character to work sometimes but not others, at the whim of the web server (since a plus should really be encoded as %2b the web server is completely within its right to not know what we're asking for).

I first noted this in #18733 in response to r47662.

Attachments (2)

urlencode.diff (1.6 KB) - added by ryandesign (Ryan Schmidt) 12 years ago.
patch-macports-curl-escape.diff (2.0 KB) - added by raimue (Rainer Müller) 12 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 12 years ago by ryandesign (Ryan Schmidt)

I searched high and low to see how to do percent-encoding in Tcl. I found a function http::mapReply which does it. It's part of the http module included with Tcl. It seems like I should just be able to do something like this:

package require http
return [http::mapReply ${distname}]

Unfortunately while I can get this to work with MacPorts tcl, I can't get it to work with the OS-provided tcl; it can't find the http::mapReply procedure. And even if we find out why that is and how to fix it, it won't work because http::mapReply was seriously broken until tcl 8.4.12; Mac OS X 10.4.11 has tcl 8.4.7 so it would not include the fix.

The http module appears to have originally been written in tcl (http.tcl) but is now apparently written in C. I copied the tcl code from http.tcl and modified it as suggested in the bug report so that it is not broken. I'll attach the patch I came up with.

Changed 12 years ago by ryandesign (Ryan Schmidt)

Attachment: urlencode.diff added

comment:2 Changed 12 years ago by raimue (Rainer Müller)

Cc: raimue@… added

I also can't find mapReply in the Tcl docs. So this seems to have been an internal proc.

Now you already made the effort of writing this in Tcl, but as we are already using and linking with libcurl, we could use curl_easy_escape() for this purpose.

comment:3 in reply to:  2 ; Changed 12 years ago by ryandesign (Ryan Schmidt)

Replying to raimue@…:

Now you already made the effort of writing this in Tcl, but as we are already using and linking with libcurl, we could use curl_easy_escape() for this purpose.

That would be fine with me, except if it does what the docs say: "All input characters that are not a-z, A-Z or 0-9 are converted to their 'URL escaped' version" -- if that's what it really does then that's broken just like tcl < 8.4.12 was broken: it needs to convert all characters that are not a-z, A-Z, 0-9, -, ., _ or ~.

comment:4 Changed 12 years ago by raimue (Rainer Müller)

According to the man page, libcurl implements RFC2396, Tcl goes with RFC3986. It is not really broken, just outdated. Which is not a serious issue, as decoders are advised to decode these characters normally.

There is also a simple approach using the public http API, which is so simple that we could drop the wrapper as well. Although calling ::http::formatQuery with an odd number of arguments seems not to be documented (could also call with an additional empty string and strip the last char).

package require http

proc urlencode {str} {
    return [::http::formatQuery $str]
}

But if we are going to duplicate this, we should probably copy from the latest version which is not using regsub for performance reasons. It is not implemented in C as stated on the Tclers wiki.

comment:5 in reply to:  4 ; Changed 12 years ago by ryandesign (Ryan Schmidt)

Replying to raimue@…:

According to the man page, libcurl implements RFC2396, Tcl goes with RFC3986. It is not really broken, just outdated. Which is not a serious issue, as decoders are advised to decode these characters normally.

It seems silly if you want to download foo-1.0.tar.gz to request foo%2d1%2e0%2etar%2egz but I agree it should still work.

There is also a simple approach using the public http API, which is so simple that we could drop the wrapper as well.

That would be fine if it works. I had not tried [::http::something] I had tried [http::something] which was how the example I saw used it. I'm not so familiar with tcl syntax when it comes to packages.

Although calling ::http::formatQuery with an odd number of arguments seems not to be documented (could also call with an additional empty string and strip the last char).

formatQuery is for formatting a query string (e.g. "?a=b&c=d"), but that's not what we're doing, so I don't feel we should be using that function. It seems mapReply is the correct function to use.

But if we are going to duplicate this, we should probably copy from the latest version which is not using regsub for performance reasons. It is not implemented in C as stated on the Tclers wiki.

Ok.

comment:6 in reply to:  5 ; Changed 12 years ago by raimue (Rainer Müller)

FYI ::http::* starts from the top level namespace, http::* looks for a http namespace in the current namespace first.

Replying to ryandesign@…:

formatQuery is for formatting a query string (e.g. "?a=b&c=d"), but that's not what we're doing, so I don't feel we should be using that function. It seems mapReply is the correct function to use.

http::mapReply is private API (at least in 8.4 and 8.5, seems to have changed for 8.6), we should not use it. But we cannot even use it before we call another http::* function, as it is not listed in pkgIndex.tcl. http::formatQuery is a wrapper around http::mapReply which is public. Actually we want to encode a string to be used as value in a query string, so it is not absolutely wrong.

comment:7 in reply to:  6 Changed 12 years ago by ryandesign (Ryan Schmidt)

Replying to raimue@…:

http::mapReply is private API (at least in 8.4 and 8.5, seems to have changed for 8.6), we should not use it. But we cannot even use it before we call another http::* function, as it is not listed in pkgIndex.tcl.

That's a shame.

http::formatQuery is a wrapper around http::mapReply which is public. Actually we want to encode a string to be used as value in a query string, so it is not absolutely wrong.

The query string is the part of a URL after the question mark. We are constructing a distfile URL which does not contain a question mark, meaning it does not have a query string component.

So what about using curl_easy_escape() as you suggested?

Changed 12 years ago by raimue (Rainer Müller)

comment:8 Changed 12 years ago by raimue (Rainer Müller)

patch-macports-curl-escape.diff is a patch to add a curl escape command, using curl_easy_escape() internally.

$ rlwrap /usr/bin/tclsh
% source /Library/Tcl/macports1.0/macports_fastload.tcl
0
% package require Pextlib
1.0
% curl escape foo~bar+baz!
foo%7Ebar%2Bbaz%21

comment:9 Changed 12 years ago by ryandesign (Ryan Schmidt)

Thanks.

When trying to compile MacPorts with this change, I got:

ld: Undefined symbols:
_curl_easy_escape

I found that on this Intel Tiger Mac, I somehow had both library version 3 and 4 of curl, with all the symlinks pointing to version 3, and only version 4 has the curl_easy_escape function:

$ ls -l /usr/lib/libcurl*
lrwxr-xr-x   1 root  wheel      15 Sep  5  2008 /usr/lib/libcurl.2.dylib -> libcurl.3.dylib
lrwxr-xr-x   1 root  wheel      15 Sep  5  2008 /usr/lib/libcurl.3.0.0.dylib -> libcurl.3.dylib
-rwxr-xr-x   1 root  wheel  384692 Oct 24 02:02 /usr/lib/libcurl.3.dylib
-rwxr-xr-x   1 root  wheel  460136 Nov  6 15:53 /usr/lib/libcurl.4.dylib
lrwxr-xr-x   1 root  wheel      15 Sep  5  2008 /usr/lib/libcurl.dylib -> libcurl.3.dylib
$ grep curl_easy_escape /usr/lib/libcurl*
Binary file /usr/lib/libcurl.4.dylib matches
$

I checked two other Tiger Macs (one Intel and one PowerPC) and they only have library version 4, and everything else is just symlinks to it:

$ ls -l libcurl*
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.2.dylib -> libcurl.4.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 16  2007 libcurl.3.0.0.dylib -> libcurl.3.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.3.dylib -> libcurl.4.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.4.0.0.dylib -> libcurl.4.dylib
-rwxr-xr-x  1 root  wheel  460136 Nov  6 15:53 libcurl.4.dylib
lrwxr-xr-x  1 root  wheel      15 Mar 26  2008 libcurl.dylib -> libcurl.4.dylib
$

And I checked two Leopard Macs (one Intel, one PowerPC) which also look the same (except that the libcurl.3.0.0.dylib symlink is missing).

Once I fixed the symlinks on the first Mac I could build it.

comment:10 in reply to:  5 Changed 12 years ago by ryandesign (Ryan Schmidt)

Replying to ryandesign@…:

It seems silly if you want to download foo-1.0.tar.gz to request foo%2d1%2e0%2etar%2egz but I agree it should still work.

I've had this patch in place for awhile locally, and it seems to work with most servers, but not with downloads.sourceforge.net.

comment:11 in reply to:  3 Changed 12 years ago by nerdling (Jeremy Lavergne)

Cc: snc@… added

Replying to ryandesign@…:

That would be fine with me, except if it does what the docs say: "All input characters that are not a-z, A-Z or 0-9 are converted to their 'URL escaped' version" -- if that's what it really does then that's broken just like tcl < 8.4.12 was broken: it needs to convert all characters that are not a-z, A-Z, 0-9, -, ., _ or ~.

We can probably summarize that as "Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits." (from PHP).

comment:12 Changed 12 years ago by jmroot (Joshua Root)

Milestone: MacPorts 1.8.0MacPorts 1.8.1

Pushing this back a little since it's not a regression and will apparently be nontrivial to fix.

comment:13 Changed 11 years ago by wsiegrist@…

Milestone: MacPorts 1.8.1MacPorts 1.8.2

comment:14 Changed 11 years ago by jmroot (Joshua Root)

1.8.1 milestone is closing, moving open tickets out

comment:15 Changed 11 years ago by jmroot (Joshua Root)

Milestone: MacPorts 1.8.2MacPorts Future

comment:16 Changed 11 years ago by jmroot (Joshua Root)

Milestone: MacPorts FutureMacPorts 1.9.0
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.