ZIP file structures ================ ``` r devtools::load_all("~/rrr/usethis") #> ℹ Loading usethis library(fs) ``` ## Different styles of ZIP file usethis has an unexported function `tidy_unzip()`, which is used under the hood in `use_course()` and `use_zip()`. It is a wrapper around `utils::unzip()` that uses some heuristics to choose a good value for `exdir`, which is the “the directory to extract files to.” Why do we do this? Because it’s really easy to *not* get the desired result when unpacking a ZIP archive. Common aggravations: - Instead of the unpacked files being corraled within a folder, they explode as “loose parts” into the current working directory. Too little nesting. - The unpacked files are contained in a folder, but that folder itself is contained inside another folder. Too much nesting. `tidy_unzip()` tries to get the nesting just right. Why doesn’t unzipping “just work”? Because the people who make `.zip` files make lots of different choices when they actually create the archive and these details aren’t baked in, i.e. a successful roundtrip isn’t automatic. It usually requires some peeking inside the archive and adjusting the unpack options. This README documents specific `.zip` situations that we anticipate. ## Explicit parent folder Consider the foo folder: ``` bash tree foo #> foo #> └── file.txt #> #> 1 directory, 1 file ``` Zip it up like so: ``` bash zip -r foo-explicit-parent.zip foo/ ``` This is the type of ZIP file that we get from GitHub via links of the forms and . Inspect it in the shell: ``` bash unzip -Z1 foo-explicit-parent.zip #> foo/ #> foo/file.txt ``` Or from R: ``` r foo_files <- unzip("foo-explicit-parent.zip", list = TRUE) with( foo_files, data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name)) ) #> Name dirname basename #> 1 foo/ . foo #> 2 foo/file.txt foo file.txt ``` Note that the folder `foo/` is explicitly included and all of the files are contained in it (in this case, just one file). ## Implicit parent folder Consider the foo folder: ``` bash tree foo #> foo #> └── file.txt #> #> 1 directory, 1 file ``` Zip it up like so: ``` bash zip -r foo-implicit-parent.zip foo/* ``` Note the use of `foo/*`, as opposed to `foo` or `foo/`. This type of ZIP file was reported in . The example given there is . Inspect our small example in the shell: ``` bash unzip -Z1 foo-implicit-parent.zip #> foo/file.txt ``` Or from R: ``` r foo_files <- unzip("foo-implicit-parent.zip", list = TRUE) with( foo_files, data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name)) ) #> Name dirname basename #> 1 foo/file.txt foo file.txt ``` Note that `foo/` is not included and its (original) existence is just implicit in the relative path to, e.g., `foo/file.txt`. Here’s a similar look at the example from issue \#1961: ``` bash ~/rrr/usethis/tests/testthat/ref % unzip -l ~/Downloads/Species\ v2.3.zip Archive: /Users/jenny/Downloads/Species v2.3.zip Length Date Time Name --------- ---------- ----- ---- 1241 04-27-2023 09:16 species_v2/label_encoder.txt 175187560 04-06-2023 15:13 species_v2/model_arch.pt 174953575 04-14-2023 12:28 species_v2/model_weights.pth --------- ------- 350142376 3 files ``` Note also that the implicit parent folder `species_v2` is not the base of the ZIP file name `Species v2.3.zip`. ## No parent Consider the foo folder: ``` bash tree foo #> foo #> └── file.txt #> #> 1 directory, 1 file ``` Zip it up like so: ``` bash (cd foo && zip -r ../foo-no-parent.zip .) ``` Note that we are zipping everything in a folder from *inside* the folder. Inspect our small example in the shell: ``` bash unzip -Z1 foo-no-parent.zip #> file.txt ``` Or from R: ``` r foo_files <- unzip("foo-no-parent.zip", list = TRUE) with( foo_files, data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name)) ) #> Name dirname basename #> 1 file.txt . file.txt ``` All the files are packaged in the ZIP archive as “loose parts”, i.e. there is no explicit or implicit top-level directory. ## No parent, the DropBox Variation This is the structure of ZIP files yielded by DropBox via links of this form . I can’t figure out how to even do this with zip locally, so I had to create an example on DropBox and download it. Jim Hester reports it is possible with `archive::archive_write_files()`. It’s basically like the “no parent” example above, except it includes a spurious top-level directory `"/"`. Inspect our small example in the shell: ``` bash unzip -Z1 foo-loose-dropbox.zip #> / #> file.txt ``` Or from R: ``` r # curl::curl_download( # "https://www.dropbox.com/sh/5qfvssimxf2ja58/AABz3zrpf-iPYgvQCgyjCVdKa?dl=1", # destfile = "foo-loose-dropbox.zip" # ) foo_files <- unzip("foo-loose-dropbox.zip", list = TRUE) with( foo_files, data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name)) ) #> Name dirname basename #> 1 / / #> 2 file.txt . file.txt ``` Also note that, when unzipping with `unzip` in the shell, you get this result: Archive: foo-loose-dropbox.zip warning: stripped absolute path spec from / mapname: conversion of failed inflating: file.txt which indicates some tripping over the `/`. Only `file.txt` is left behind. This is a pretty odd ZIP packing strategy. But we need to plan for it. ## Only subdirectories For testing purposes, we also want an example where all the files are inside subdirectories. Examples based on the yo directory here: ``` bash tree yo #> yo #> ├── subdir1 #> │   └── file1.txt #> └── subdir2 #> └── file2.txt #> #> 3 directories, 2 files ``` Zip it up, in all the usual ways: ``` bash zip -r yo-explicit-parent.zip yo/ zip -r yo-implicit-parent.zip yo/* (cd yo && zip -r ../yo-no-parent.zip .) ``` Again, I couldn’t create the DropBox variant locally, so I did it by downloading from DropBox. ``` r # curl::curl_download( # "https://www.dropbox.com/sh/afydxe6pkpz8v6m/AADHbMZAaW3IQ8zppH9mjNsga?dl=1", # destfile = "yo-loose-dropbox.zip" # ) ``` Inspect each in the shell: ``` bash unzip -Z1 yo-explicit-parent.zip #> yo/ #> yo/subdir2/ #> yo/subdir2/file2.txt #> yo/subdir1/ #> yo/subdir1/file1.txt ``` ``` bash unzip -Z1 yo-implicit-parent.zip #> yo/subdir1/ #> yo/subdir1/file1.txt #> yo/subdir2/ #> yo/subdir2/file2.txt ``` ``` bash unzip -Z1 yo-no-parent.zip #> subdir2/ #> subdir2/file2.txt #> subdir1/ #> subdir1/file1.txt ``` ``` bash unzip -Z1 yo-loose-dropbox.zip #> / #> subdir1/file1.txt #> subdir2/file2.txt #> subdir1/ #> subdir2/ ```