Opened 4 years ago

Last modified 2 years ago

#50969 assigned enhancement

Provide a generic extract command in base

Reported by: mojca (Mojca Miklavec) Owned by:
Priority: Normal Milestone: MacPorts 2.7.0
Component: base Version: 2.3.4
Keywords: Cc: RJVB (René Bertin), ci42, ryandesign (Ryan Schmidt), raimue (Rainer Müller), neverpanic (Clemens Lang), ian.rees@…
Port:

Description (last modified by mojca (Mojca Miklavec))

It would be extremely useful to introduce a command like

file_extract foo.tar.xz

in the core, so that ports could even "manually" extract (or compress) additional files when needed.

At the moment MacPorts supports fetching distfiles from different mirrors, but it doesn't support extracting different filetypes, so all the "hard work" of extracting the proper file has to be repeated manually in the Portfile. It might also be useful to be able to extract additional tarballs when fetching from git/svn, so the command would be helpful in many different contexts.

MacPorts could also have the ability to automatically choose the extract method based on the filename (that is, when a file extension is specified; use_xz would then only influence the default type, which could theoretically be overridden).

(The other nice-to-have functionality would be the ability to create (reproducible) tarballs on demand as requested in #50896.)
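A rough sketch of what such a command might look like, dispatching on the file suffix. Note that `file_extract`, its signature, and the suffix table below are purely a proposal to illustrate the request; none of this exists in base:

```tcl
# Hypothetical helper (NOT existing base API): choose the extract
# pipeline from the file suffix and run it via MacPorts' system command.
proc file_extract {tarball {destdir .}} {
    switch -glob -- ${tarball} {
        *.tar.xz         { set cmd "xz -dc ${tarball} | tar -xf - -C ${destdir}" }
        *.tar.bz2        { set cmd "bzip2 -dc ${tarball} | tar -xf - -C ${destdir}" }
        *.tar.gz - *.tgz { set cmd "gzip -dc ${tarball} | tar -xf - -C ${destdir}" }
        *.zip            { set cmd "unzip -q ${tarball} -d ${destdir}" }
        default          { return -code error "don't know how to extract ${tarball}" }
    }
    system ${cmd}
}
```

A Portfile could then call e.g. `file_extract ${distpath}/foo.tar.xz ${worksrcpath}` in a post-extract block, and use_xz and friends would merely set the default for files without a recognized suffix.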

Change History (13)

comment:1 Changed 4 years ago by mojca (Mojca Miklavec)

Description: modified (diff)

comment:2 Changed 4 years ago by RJVB (René Bertin)

At the moment MacPorts supports fetching distfiles from different mirrors, but it doesn't support extracting different filetypes, so all the "hard work" of extracting the proper file has to be repeated manually in the Portfile.

Shouldn't that read "would have to be repeated manually in the Portfile if an error wasn't raised by the automatic extraction process", assuming you don't rewrite the extract block yourself?

I'd add a nice-to-have feature, at least until "base" provides an integrated way of fetching files in multiple formats:

file_fetch <name> [site]

that would do an on-demand fetch of the named file, looking it up on the master_sites or else from the given site. The site argument might be redundant if adding unused sites to master_sites has no negative side-effects. As far as I can see that would mostly require refactoring portfetch::fetchfiles from portfetch.tcl slightly, and exporting the procedure for extracting a single file.

Here's an example from a work-in-progress Portfile of mine where such a function would come in handy:

post-fetch {
    if {![file exists ${distpath}/${qtwkdistfile}]} {
        ui_msg "--->  Fetching qtwebkit"
        # find a better way to fetch the archive with some form of optional progress
        # report that doesn't scroll 
        system -W ${distpath} "wget --progress=bar:force:noscroll --quiet --show-progress https://github.com/qtproject/qtwebkit/archive/${qtwebkit_commit}/qtwebkit-snapshot.tar.gz"
        # give the file a more evocative name
        file rename ${distpath}/qtwebkit-snapshot.tar.gz ${distpath}/${qtwkdistfile}
    }
}

I don't see how the automatic fetch/extract mechanism could achieve this - it would require syntax to tell fetch to save a given distfile under a different name, and extract to use that new name.

I'd still need a manual extract, but that's probably required anyway (and not just because using hfsCompression really makes a difference here):

post-extract {
    if {![file exists ${distpath}/${qtwkdistfile}]} {
        ui_error "This shouldn't happen"
    } elseif {![file exists ${worksrcpath}/qtwebkit/.git] && [file exists ${distpath}/${qtwkdistfile}]} {
        ui_msg "--->  Extracting ${qtwkdistfile}"
        if {[file exists ${prefix}/bin/bsdtar]} {
            system -W ${worksrcpath} "bsdtar -x -k --hfsCompression -f ${distpath}/${qtwkdistfile}"
        } else {
            system -W ${worksrcpath} "tar -xf ${distpath}/${qtwkdistfile}"
        }
        file rename ${worksrcpath}/qtwebkit-${qtwebkit_commit} ${worksrcpath}/qtwebkit
        file mkdir ${worksrcpath}/qtwebkit/.git
    }
}

comment:3 Changed 4 years ago by RJVB (René Bertin)

Cc: rjvbertin@… added

Cc Me!

comment:4 Changed 4 years ago by RJVB (René Bertin)

Ahem, after squinting some more at portfetch.tcl I realised that there is already a 'curl' command that's available directly in Tcl, so I can replace my wget above with

    if {[llength [info commands ui_progress_download]] > 0 } {
        curl fetch --remote-time --progress ui_progress_download \
            https://github.com/qtproject/qtwebkit/archive/${qtwebkit_commit}/qtwebkit-snapshot.tar.gz \
            ${distpath}/${qtwkdistfile}
    } else {
        curl fetch --remote-time --progress builtin \
            https://github.com/qtproject/qtwebkit/archive/${qtwebkit_commit}/qtwebkit-snapshot.tar.gz \
            ${distpath}/${qtwkdistfile}
    }

Still, I'd say that something like this would be nice to have:

proc file_fetch {url dest} {
    if {[llength [info commands ui_progress_download]] > 0 } {
        curl fetch --remote-time --progress ui_progress_download \
            ${url} ${dest}
    } else {
        curl fetch --remote-time --progress builtin \
            ${url} ${dest}
    }
}

(also to simplify the code in "base", as the check for ui_progress_download is currently repeated in multiple locations.)

Last edited 4 years ago by RJVB (René Bertin) (previous) (diff)

comment:5 Changed 4 years ago by mojca (Mojca Miklavec)

I agree that if we add file_extract, file_compress, ... then adding file_fetch would be a natural follow-up.

But if I understand correctly, the only reason you need to rewrite the fetch phase yourself in this particular example is that you want to rename the file afterwards to prevent clashes? I usually worked around that by using dist_subdir, but dist_subdir is global to the port. So if a port needs one huge file that never changes (and is properly versioned) alongside another small file that keeps the same name (perhaps only its URL changes) but changes on a regular basis, using dist_subdir is very suboptimal.

Supporting renaming of files is not a completely unreasonable request. In particular not in the context of adding more flexibility like different suffixes / extract methods, files with the same name from different URLs, very unfortunate filenames, fetching files to different dist_subdir-s, ... who knows what else. But it would require a different syntax.

comment:6 Changed 4 years ago by RJVB (René Bertin)

No, in my current case there is also the issue that the main file uses a different compression scheme, so putting the additional file in distfiles will cause the extraction phase to fail.

However, one can think of other scenarios where one would want to download additional file(s) outside of the regular mechanism, which is mostly designed for source files. Configuration, templates, data: there are many kinds of additional things a port could want to provide automatically when it is (re)installed, things that are kept up to date upstream but for which the installed software doesn't provide an integrated updating mechanism.

I admit, it's still a bit far-fetched. OTOH, almost all the work has been done, cf. my example file_fetch procedure above ... which could also be provided through a PortGroup of course.

Last edited 4 years ago by RJVB (René Bertin) (previous) (diff)

comment:7 Changed 4 years ago by mojca (Mojca Miklavec)

I have another example of a need for "post-fetch". A project forgot to compress its files (they have the correct name foo-version.tar.gz, but are in fact plain tar files). A solution would be either to adapt the extract phase for that single file, or to compress the file after it has been fetched and store only the compressed version (under the same name as the file that was just fetched).

This is somewhat similar to René's case with the need to rename the file after it has been fetched.
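A conceivable post-fetch workaround for the mislabelled tarball, sketched as a Portfile fragment. The distfile name is illustrative, and the checksums in the Portfile would have to be taken on the recompressed file:

```tcl
post-fetch {
    # Upstream's "foo-1.0.tar.gz" is really an uncompressed tar archive
    # (name is illustrative). Compress it in place so the standard
    # extract phase, which expects gzip input, succeeds.
    set df ${distpath}/foo-1.0.tar.gz
    if {[exec file -b --mime-type ${df}] ne "application/gzip"} {
        file rename ${df} ${df}.tar
        # -n omits the timestamp so the result is reproducible and
        # its checksum stays stable across re-fetches.
        system "gzip -9 -n -c ${df}.tar > ${df}"
        file delete ${df}.tar
    }
}
```

Since post-fetch runs before the checksum phase, the recorded checksums would simply describe the recompressed file rather than what the server actually serves.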

comment:8 in reply to:  7 Changed 4 years ago by RJVB (René Bertin)

Replying to mojca@…:

This is somewhat similar to René's case with the need to rename the file after it has been fetched.

Yes, and my case used to be even more similar. Before I discovered the github mirror where one can fetch a tarball, I actually had to do a manual git clone in the post-fetch. And to avoid the associated overhead I archived that check-out so that a rebuild would be able to unpack a tarball in the post-extract phase.

comment:9 Changed 4 years ago by raimue (Rainer Müller)

Cc: raimue@… added
Summary: Provide a generic extract (and compress) commands in base → Provide a generic extract command in base

comment:10 Changed 4 years ago by raimue (Rainer Müller)

Example usage of the simplified distfiles option on macports-dev: https://lists.macosforge.org/pipermail/macports-dev/2016-March/032671.html

comment:11 Changed 3 years ago by ci42

Cc: ci42 added

comment:12 Changed 2 years ago by neverpanic (Clemens Lang)

Cc: ryandesign neverpanic ian.rees@… added
Milestone: MacPorts 2.6.0
Owner: raimue@… deleted
Status: new → assigned

comment:13 Changed 2 years ago by mojca (Mojca Miklavec)

Milestone: MacPorts 2.6.0 → MacPorts 2.7.0