GSoC Week 3-4

For some reason, I totally forgot the blog for week 3, so here is a bundle of the progress from week 3 and week 4.

Summary of the two weeks

At this stage, the main focus is on making git-sparse-checkout work better with git-mv. With the ongoing “from out-of-cone to in-cone” series wrapping up, I’m also working to make the complementary “from in-cone to out-of-cone” move possible. Read the previous GSoC posts for more context.

In the meantime, I have also started some experiments towards integration with sparse-index. They build on the latest “from in-cone to out-of-cone” work brewing in my branch.

This week I’m working to ship PATCH v5 (please refer to all the code here) to address the issues raised in PATCH v4.

The good news is that PATCH v5 is being queued into the ‘next’ branch, which means it could potentially be merged into ‘master’. That marks this stage of the work as almost done.

What’s going on in PATCH v5

  1. Fix style-nits.

  2. Add t1092 tests (2/8) for “mv: add check_dir_in_index() and solve general dir check issue” (8/8).

There is really not much to say about v5: it addressed some questions/ideas raised in v4, and that’s it.

What I was doing in week 3

I spent the whole week experimenting with moving from in-cone to out-of-cone. It complements the ongoing out-of-cone to in-cone series.

In this form of move, the <destination> is a SKIP_WORKTREE_DIR (an enum flag in builtin/mv.c), which means it is a directory that exists only in the index and is missing from the working tree. Such a directory is the result of all its files being sparsified, so it has been removed from the working tree (it is also known as a “sparse-directory entry” when sparse-index is on).
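To make the idea concrete, here is a minimal toy model in Python. It is not git's C code: the function name is_skip_worktree_dir, the dict-based index, and the set of on-disk directories are all assumptions made for illustration, and the real check in builtin/mv.c may inspect the index differently.

```python
def is_skip_worktree_dir(index, worktree_dirs, directory):
    """Toy check: does `directory` exist only in the index?

    `index` maps each path to a bool standing in for the
    CE_SKIP_WORKTREE bit; `worktree_dirs` is the set of directories
    that are present on disk. Both are hypothetical stand-ins.
    """
    prefix = directory.rstrip("/") + "/"
    flags = [skip for path, skip in index.items() if path.startswith(prefix)]
    if not flags:
        # Nothing under this prefix: the directory is not in the index.
        return False
    missing_on_disk = directory.rstrip("/") not in worktree_dirs
    # Every file under the directory is sparsified and the directory
    # itself is gone from the working tree.
    return missing_on_disk and all(flags)


index = {
    "folder1/a": True,
    "folder1/deeper/b": True,
    "folder2/file": False,
}
worktree_dirs = {"folder2"}
```

With this data, folder1 and folder1/deeper count as such directories, while folder2 (present on disk) and a path that is absent from the index do not.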

It is worth noting that “cone mode” and out-of-cone together mean that the <destination> can only be a directory of the kind described above. For the reasoning behind this conclusion and more information about “cone mode” (which is an essential concept), please see git-sparse-checkout(1), section “INTERNALS — CONE PATTERN SET”.

To make this form of move possible, a few steps are needed:

  1. When <destination> is not present in the working tree, use the check_dir_in_index function to see whether <destination> is in the index as a SKIP_WORKTREE_DIR, or whether it is a “sparse-directory entry”. If so, proceed to the next step; otherwise, stop.

  2. Check whether the cache_entry being moved is dirty (“Changes not staged for commit”). As background, mv usually performs a move in two steps: first rename(2) the file on disk, then rename the corresponding cache_entry.

     If it is not dirty: we turn on the CE_SKIP_WORKTREE bit for the moved cache_entry and simply delete the corresponding file from disk. The reasoning is that moving <source> from in-cone to out-of-cone is expected to “sparsify” the file: its CE_SKIP_WORKTREE bit is turned on and the file is gone from disk.

     If it is dirty: we create the leading directories so that the file can be moved. For example, in git mv folder2/file folder1/deeper/, folder1/deeper/ is a SKIP_WORKTREE_DIR, so we do something like mkdir -p folder1/deeper to make sure that rename("folder2/file", "folder1/deeper/file") can work. In this case we don’t remove the resulting file (here “folder1/deeper/file”), and we don’t turn on the CE_SKIP_WORKTREE bit of its cache_entry. The change to this file has not been staged yet, so we leave it on disk for safety and let the user decide what to do.

  3. In the dirty case, we also warn the user about which paths are dirty and were therefore left on disk instead of being sparsified.
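The clean/dirty decision in the steps above can be sketched as a small Python model. This is only an illustration under stated assumptions: the Entry class, the move_out_of_cone function, the set that stands in for files on disk, and the wording of the warning are all hypothetical, and the real logic in builtin/mv.c works on cache_entry structures and calls rename(2).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    skip_worktree: bool = False  # stand-in for the CE_SKIP_WORKTREE bit
    dirty: bool = False          # has changes not staged for commit


def move_out_of_cone(entry, dest_dir, worktree_files):
    """Toy move of one in-cone entry into a SKIP_WORKTREE_DIR.

    `worktree_files` is a set of paths present on disk (hypothetical
    stand-in for the working tree). Returns a warning string for a
    dirty entry, or None for a clean one.
    """
    basename = entry.path.rsplit("/", 1)[-1]
    new_path = dest_dir.rstrip("/") + "/" + basename

    # The file leaves its old location in both cases.
    worktree_files.discard(entry.path)
    entry.path = new_path

    if entry.dirty:
        # Dirty: keep the file on disk at the new location (the real
        # code would create leading directories and rename(2) it) and
        # leave the skip-worktree bit off.
        worktree_files.add(new_path)
        return f"{new_path}: moved, but kept on disk due to unstaged changes"

    # Clean: sparsify, i.e. set the bit and keep the file off disk.
    entry.skip_worktree = True
    return None
```

Moving a clean folder2/file into folder1/deeper/ leaves nothing on disk and sets the bit. Moving a dirty one leaves folder1/deeper/file on disk, keeps the bit off, and returns a warning.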

What I was doing in week 4

So week 3 was for experimenting with moving from in-cone to out-of-cone. In week 4, I was building the integration with sparse-index on top of week 3’s result. Now that mv and sparse-checkout interact reasonably well, it’s time to get my feet wet with sparse-index.

One of the head-on obstacles I met is that mv does not know about sparse-index at all. For example, take git mv folder1 deep, where folder1 is a sparse-directory entry, deep is a normal directory, and the expected result is deep/folder1. mv needs to search the index, find every cache_entry whose name starts with folder1/, and move these entries one by one. However, with sparse-index on, folder1 is stored in the index as the single entry folder1/, and all the files under it are pruned from the index. mv therefore cannot locate any files under it, and the whole “moving a directory” logic breaks.
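A toy Python model shows why the prefix search comes up empty. The index is reduced to a sorted list of path names, and entries_under is a hypothetical helper, not a git function. The file names under folder1 are made up for the example.

```python
def entries_under(index_paths, directory):
    """Return the index paths that live under `directory`.

    Mimics the "find every entry that starts with folder1/" search;
    the collapsed entry "folder1/" itself is not a file under it.
    """
    prefix = directory.rstrip("/") + "/"
    return [p for p in index_paths if p.startswith(prefix) and p != prefix]


# Full index: every file under folder1 has its own entry.
full_index = ["deep/a", "folder1/0/1", "folder1/a", "folder2/a"]

# Sparse index: folder1 is collapsed into one sparse-directory entry.
sparse_index = ["deep/a", "folder1/", "folder2/a"]
```

Searching full_index for folder1 finds two entries to move, while the same search on sparse_index finds none, which is exactly where the directory-move logic has nothing to work with.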

There are several possible solutions here:

  1. Expand the folder1/ sparse-directory entry first, so all the files under it are back, then we should be able to utilize the original mv logic.
  2. Treat sparse-index as a special case and make some new logic for it, which could require more effort than the previous solution.
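The first option can be sketched in the same toy style. Everything here is an assumption for illustration: expand_sparse_dir and move_dir are hypothetical helpers, the index is just a sorted list of names, and the contents of the directory are passed in as a plain list, whereas git would read them from the tree that the sparse-directory entry refers to.

```python
def expand_sparse_dir(index_paths, sparse_dir, tree_contents):
    """Replace the entry "folder1/" with one entry per file under it.

    `tree_contents` lists file paths relative to `sparse_dir`
    (a hypothetical stand-in for reading the tree object).
    """
    expanded = []
    for path in index_paths:
        if path == sparse_dir:
            expanded.extend(sparse_dir + name for name in tree_contents)
        else:
            expanded.append(path)
    return sorted(expanded)


def move_dir(index_paths, src_dir, dst_dir):
    """Original-style directory move: rename every entry under src_dir."""
    src_prefix = src_dir.rstrip("/") + "/"
    dst_prefix = dst_dir.rstrip("/") + "/" + src_prefix
    moved = []
    for path in index_paths:
        if path.startswith(src_prefix):
            moved.append(dst_prefix + path[len(src_prefix):])
        else:
            moved.append(path)
    return sorted(moved)


sparse_index = ["deep/a", "folder1/", "folder2/a"]
full_index = expand_sparse_dir(sparse_index, "folder1/", ["0/1", "a"])
result = move_dir(full_index, "folder1", "deep")
```

After expansion the per-file entries are back, so the unchanged prefix-based move produces deep/folder1/0/1 and deep/folder1/a. The cost of this approach is that the expanded part of the index loses the compactness sparse-index provides.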

What’s next

There are potentially many similar logic conflicts, and I’m still working through them to make sure things work with sparse-index.