Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#56975 closed defect (worksforme)

livecheck broken in GitHub PortGroup

Reported by: l2dy (Zero King) Owned by:
Priority: Normal Milestone:
Component: ports Version:
Keywords: Cc: ryandesign (Ryan Carsten Schmidt), cjones051073 (Chris Jones)
Port:

Description

GitHub is returning different content with and without Accept: text/html header.

$ port -v live h2o
Error: cannot check if h2o was updated (regex didn't match)
$ curl 'https://github.com/h2o/h2o/tags'
  <li role="option" class="autocomplete-item">v2.3.0-beta1</li>
  <li role="option" class="autocomplete-item">v2.2.5</li>
  <li role="option" class="autocomplete-item">v2.2.4</li>
  <li role="option" class="autocomplete-item">v2.2.3</li>
  <li role="option" class="autocomplete-item">v2.2.2</li>
  <li role="option" class="autocomplete-item">v2.2.1</li>
  <li role="option" class="autocomplete-item">v2.2.0</li>
  <li role="option" class="autocomplete-item">v2.2.0-beta3</li>
  <li role="option" class="autocomplete-item">v2.2.0-beta2</li>

Change History (17)

comment:1 Changed 6 years ago by ryandesign (Ryan Carsten Schmidt)

Cc: ryandesign added

At one point I looked into switching the github portgroup's tag livecheck over to an atom feed like the commit livecheck already uses, since downloading the full page HTML is unnecessary and wasteful.

But it looks like now the URL we're already using is returning shorter output, which would be fine.

But this also points out that MacPorts isn't sending Accept: text/html when doing livecheck. And probably we should be, right?

comment:2 Changed 6 years ago by cjones051073 (Chris Jones)

Cc: cjones051073 added

comment:3 Changed 6 years ago by cjones051073 (Chris Jones)

Funnily enough, I stumbled over this myself yesterday and was looking into it a bit, before finding this ticket, so thought I would post what I had found just FYI...

I can confirm appending "Accept: text/html" to the curl http headers does seem to fix things.

E.g. without I see for instance,

> port -d live tigervnc
<snip>
DEBUG: Executing org.macports.livecheck (tigervnc)
DEBUG: Port (livecheck) version is 1.7.1
DEBUG: Fetching https://github.com/TigerVNC/tigervnc/tags
DEBUG: The regex is "archive/v([^"]+)\.tar\.gz"
Error: cannot check if tigervnc was updated (regex didn't match)

whereas with

~/Projects/MacPorts/base > git diff
diff --git a/src/pextlib1.0/curl.c b/src/pextlib1.0/curl.c
index d8202bf4..26984fb5 100644
--- a/src/pextlib1.0/curl.c
+++ b/src/pextlib1.0/curl.c
@@ -465,7 +465,8 @@ CurlFetchCmd(Tcl_Interp* interp, int objc, Tcl_Obj* CONST objv[])
                }

                /* Clear the Pragma: no-cache header */
                headers = curl_slist_append(headers, "Pragma:");
+               headers = curl_slist_append(headers, "Accept: text/html");
                theCurlCode = curl_easy_setopt(theHandle, CURLOPT_HTTPHEADER, headers);
                if (theCurlCode != CURLE_OK) {
                        theResult = SetResultFromCurlErrorCode(interp, theCurlCode);

I see

> port -d live tigervnc
<snip>
DEBUG: Executing org.macports.livecheck (tigervnc)
DEBUG: Port (livecheck) version is 1.7.1
DEBUG: Fetching https://github.com/TigerVNC/tigervnc/tags
DEBUG: The regex is "archive/v([^"]+)\.tar\.gz"
DEBUG: The regex matched "archive/v1.9.0.tar.gz", extracted "1.9.0"
DEBUG: The regex matched "archive/v1.8.90.tar.gz", extracted "1.8.90"
DEBUG: The regex matched "archive/v1.8.0.tar.gz", extracted "1.8.0"
DEBUG: The regex matched "archive/v1.7.90.tar.gz", extracted "1.7.90"
DEBUG: The regex matched "archive/v1.7.1.tar.gz", extracted "1.7.1"
DEBUG: The regex matched "archive/v1.7.0.tar.gz", extracted "1.7.0"
DEBUG: The regex matched "archive/v1.6.90.tar.gz", extracted "1.6.90"
DEBUG: The regex matched "archive/v1.6.0.tar.gz", extracted "1.6.0"
DEBUG: The regex matched "archive/v1.5.90.tar.gz", extracted "1.5.90"
DEBUG: The regex matched "archive/v1.5.0.tar.gz", extracted "1.5.0"
tigervnc seems to have been updated (port version: 1.7.1, new version: 1.9.0)

So would do the trick...

Whether this is the right thing to do is another question.

Should this header be appended to all curl calls, or just the livecheck ones ?

Would it be better instead to live with the shorter output and adjust the regex accordingly ?

Last edited 6 years ago by cjones051073 (Chris Jones) (previous) (diff)

comment:4 Changed 6 years ago by raimue (Rainer Müller)

We cannot add the header to all requests through libcurl, as web servers might no longer send us some files. For example, if we want to download a .tar.gz but only accept text/html as content type, the server might reject our request.

Probably we should have a parameter to allow passing arbitrary headers such as this "Accept:" into CurlFetchCmd from Tcl. Then it could only be used in the livecheck code.

It gets harder to find the right regex if the content is different from what is served to a browser, so I think we should change this in base. The quickfix is of course to adapt the regex.

comment:5 in reply to:  4 Changed 6 years ago by cjones051073 (Chris Jones)

Replying to raimue:

We cannot add the header to all requests through libcurl, as web servers might no longer send us some files. For example, if we want to download a .tar.gz but only accept text/html as content type, the server might reject our request.

Probably we should have a parameter to allow passing arbitrary headers such as this "Accept:" into CurlFetchCmd from Tcl. Then it could only be used in the livecheck code.

Yeah, I figured the same. So how about something like

~/Projects/MacPorts/base > git diff
diff --git a/src/pextlib1.0/curl.c b/src/pextlib1.0/curl.c
index d8202bf4..d20b26c0 100644
--- a/src/pextlib1.0/curl.c
+++ b/src/pextlib1.0/curl.c
@@ -178,6 +178,7 @@ CurlFetchCmd(Tcl_Interp* interp, int objc, Tcl_Obj* CONST objv[])
                };
                char* effectiveURL = NULL;
                char* userAgent = PACKAGE_NAME "/" PACKAGE_VERSION " libcurl/" LIBCURL_VERSION;
+               char* httpHeader = "";
                int optioncrsr;
                int lastoption;
                const char* theURL;
@@ -240,6 +241,18 @@ CurlFetchCmd(Tcl_Interp* interp, int objc, Tcl_Obj* CONST objv[])
                                        theResult = TCL_ERROR;
                                        break;
                                }
+                       } else if (strcmp(theOption, "--append-http-header") == 0) {
+                               /* check we also have the parameter */
+                               if (optioncrsr < lastoption) {
+                                       optioncrsr++;
+                                       httpHeader = Tcl_GetString(objv[optioncrsr]);
+                               } else {
+                                       Tcl_SetResult(interp,
+                                               "curl fetch: --append-http-header option requires a parameter",
+                                               TCL_STATIC);
+                                       theResult = TCL_ERROR;
+                                       break;
+                               }
                        } else if (strcmp(theOption, "--progress") == 0) {
                                /* check we also have the parameter */
                                if (optioncrsr < lastoption) {
@@ -466,6 +479,8 @@ CurlFetchCmd(Tcl_Interp* interp, int objc, Tcl_Obj* CONST objv[])

                /* Clear the Pragma: no-cache header */
                headers = curl_slist_append(headers, "Pragma:");
+               /* Append any optional headers */
+               headers = curl_slist_append(headers, httpHeader);
                theCurlCode = curl_easy_setopt(theHandle, CURLOPT_HTTPHEADER, headers);
                if (theCurlCode != CURLE_OK) {
                        theResult = SetResultFromCurlErrorCode(interp, theCurlCode);
diff --git a/src/port1.0/portlivecheck.tcl b/src/port1.0/portlivecheck.tcl
index 0b88cc4c..2320d2bc 100644
--- a/src/port1.0/portlivecheck.tcl
+++ b/src/port1.0/portlivecheck.tcl
@@ -75,7 +75,7 @@ proc portlivecheck::livecheck_main {args} {

     ui_debug "Port (livecheck) version is ${livecheck.version}"

-    set curl_options {}
+    set curl_options { "--append-http-header" "Accept: text/html" }
     if {[tbool livecheck.ignore_sslcert]} {
         lappend curl_options "--ignore-ssl-cert"
     }

?

It gets harder to find the right regex if the content is different from what is served to a browser, so I think we should change this in base. The quickfix is of course to adapt the regex.

comment:6 Changed 6 years ago by cjones051073 (Chris Jones)

It seems things are even more complicated. Whilst the above largely fixes most ports using the GitHub port group, it has cased issues with others, such as those fetching from sourceforge. The issue here is the feed checked is rss and adding the "Accept: text/html" header breaks these fetches.

> curl -H"Accept: text/html" https://sourceforge.net/projects/expat/rss
project_rss
>

So it seems more flexibility is required here...

Last edited 6 years ago by cjones051073 (Chris Jones) (previous) (diff)

comment:7 Changed 6 years ago by cjones051073 (Chris Jones)

OK, so a second try.

Now in the live check does not hardcode adding the "Accept: text/html" header. Instead it declares a new livecheck.httpheader option (defaults to {}), which then only the GitHub port group uses. See

https://github.com/macports/macports-base/compare/master...cjones051073:CurlFetch-AddHTTPHeaderOption

and

https://github.com/macports/macports-ports/compare/master...cjones051073:GitHubPortGroup-SetHTTPHeader

for the diffs in base and ports respectively.

I've checked with a few ports, and with this livecheck is fixed for GitHub ports, and also not broken for sourceforge...

comment:8 Changed 6 years ago by cjones051073 (Chris Jones)

On reflection, rename livecheck.httpheader live check.curloptions ... Obviously can change further.

So, what do people think ? Is this going in the right direction ?

comment:9 Changed 6 years ago by jmroot (Joshua Root)

If you're trying to match what browsers do, something like Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 would be what you want.

comment:10 Changed 6 years ago by cjones051073 (Chris Jones)

Well, my thinking was more to not change anything beyond where it is needed, hence the new livecheck option that defaults to doing nothing, and is only used by the GitHub group.

comment:11 Changed 6 years ago by cjones051073 (Chris Jones)

But yes, using the line you suggest as the default for all livecheck fetches does seem to work OK.

comment:12 Changed 6 years ago by cjones051073 (Chris Jones)

So, what about my submitting a PR for the base changes above ?

comment:13 Changed 6 years ago by l2dy (Zero King)

Resolution: worksforme
Status: newclosed

GitHub reverted the change.

comment:14 Changed 6 years ago by cjones051073 (Chris Jones)

OK, thats good. I still wonder though if it wouldn't still make sense to send headers that better match what browsers do ?

comment:15 Changed 6 years ago by ryandesign (Ryan Carsten Schmidt)

Yes, we should change MacPorts to send an Accept header that's similar to what a browser sends.

We should also see if we can make the github portgroup request a smaller document for livecheck, such as the atom feed, to save bandwidth.

We should also have livecheck request a compressed response to save bandwidth; see #55011.

comment:16 Changed 6 years ago by cjones051073 (Chris Jones)

OK. In that case I will submit what I have so far as a PR. Most of the changes are simply to implement the option to pass a Header to the curl fetch, which would be needed regardless of if and where it is used...

Note: See TracTickets for help on using tickets.