GSoC Week 1


Background

For my GSoC project “More Sparse Index Integrations”, I will integrate more Git commands with the “sparse index” feature. The first command to integrate is git-mv.

While experimenting with git-mv, I found that it still has some erroneous or surprising logic when working with git-sparse-checkout. This RFC (Request for Comments) patch is relevant.

Because working correctly with git-sparse-checkout is a prerequisite for integrating with “sparse index”, I decided to first make git-mv work better with git-sparse-checkout.

Check out this patch to see this WIP (Work In Progress) series.

About this week

At this stage, the main focus is making git-sparse-checkout work better with git-mv.

This week I’m working to ship a WIP v3 (please refer to the code here) to address the issues raised in WIP v2.

What is wrong between git-sparse-checkout and git-mv

git-mv is one of the oldest commands in Git and it does two things:

  1. Move a file/directory/symlink
  2. Rename a file/directory/symlink

It works just fine until we try to make it work with git-sparse-checkout.

git-sparse-checkout can make some files exist in the index but be absent from the working tree. One of the cases where git-mv gets confused is git mv <source> <destination> when <source> happens to be a “sparse file”, that is, a file that is present in the index but cannot be found in the working tree. In this case, git-mv complains because the file it is trying to move is not on disk.
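To make the failure concrete, here is a toy Python model of the situation. It is only a sketch: `Entry`, `naive_mv`, and the dictionaries standing in for the index and the working tree are hypothetical names for illustration, not Git's actual data structures or code.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Toy stand-in for an index entry (hypothetical, not Git's cache_entry)."""
    blob: str
    skip_worktree: bool = False


def naive_mv(index, worktree, src, dst):
    """Move src to dst, insisting that src exists in the working tree."""
    if src not in worktree:
        # This is the complaint a sparse file triggers: it is indexed,
        # but there is nothing on disk to move.
        raise FileNotFoundError(f"bad source: {src}")
    worktree[dst] = worktree.pop(src)
    index[dst] = index.pop(src)


# out/b.txt is a "sparse file": present in the index, absent on disk.
index = {"in/a.txt": Entry("A"), "out/b.txt": Entry("B", skip_worktree=True)}
worktree = {"in/a.txt": "A"}
```

Calling `naive_mv(index, worktree, "out/b.txt", "in/b.txt")` on this model raises `FileNotFoundError`, even though the index knows about `out/b.txt`.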

How to solve it

What we would like git-mv to do is this: if it cannot find the <source> in the working tree, it should go ahead and check whether this <source> is in the index (only if supplied with --sparse). If it is there, perform the move/rename.

Here is my proposed solution:

  1. If the <source> is not on-disk, look for the cache_entry with the same name in the index. If found and the command is supplied with a --sparse option (e.g. git mv --sparse <source> <destination>), perform the move.

  2. Do the same thing if <source> is a directory.

  3. After the move, if <source> is out-of-cone and <destination> is in-cone, we have to deal with this cross-cone movement (see what is a “cone” here). Ideally, when moving from out-of-cone (<source> was absent from the working tree) to in-cone (<destination> should be present in the working tree), <destination> should be checked out from the index to the working tree. To achieve this result, we should

    • clear the SKIP_WORKTREE bit on the corresponding cache_entry
    • check out this cache_entry to the working tree
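The three steps above can be sketched in a toy Python model. Everything here is an assumption made for illustration: `Entry`, `in_cone`, and `sparse_mv` are hypothetical names, the index and working tree are plain dictionaries, `dst` is taken to be the full new path, and the cone check is simplified to "top-level files, or anything under a listed directory". It is not the code from the patch series.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Toy stand-in for an index entry (hypothetical, not Git's cache_entry)."""
    blob: str
    skip_worktree: bool = False


def in_cone(path, cone_dirs):
    """Simplified cone check: top-level files, or anything under a cone directory."""
    if "/" not in path:
        return True
    return any(path.startswith(d.rstrip("/") + "/") for d in cone_dirs)


def sparse_mv(index, worktree, cone_dirs, src, dst, sparse=False):
    # Steps 1 and 2: match a single file, or every entry under a directory.
    sources = [p for p in index if p == src or p.startswith(src + "/")]
    if not sources:
        raise FileNotFoundError(f"bad source: {src}")
    if any(p not in worktree for p in sources) and not sparse:
        raise ValueError(f"{src} is not on disk; retry with sparse=True")
    for old in sources:
        new = dst + old[len(src):]
        entry = index.pop(old)
        index[new] = entry
        if old in worktree:
            # Ordinary case: the file is on disk, so move it as well.
            worktree[new] = worktree.pop(old)
        elif entry.skip_worktree and in_cone(new, cone_dirs):
            # Step 3: out-of-cone to in-cone. Clear the bit, then
            # "check out" the entry into the working tree.
            entry.skip_worktree = False
            worktree[new] = entry.blob
```

For example, with `cone_dirs=["in"]` and a sparse entry `out/b.txt`, `sparse_mv(index, worktree, ["in"], "out/b.txt", "in/b.txt", sparse=True)` leaves `in/b.txt` in both the index and the working tree with its skip-worktree flag cleared. This sketch deliberately does not handle moving from in-cone to out-of-cone.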

Limitations

With the implementation above, we have addressed moving from out-of-cone to in-cone. How about the reverse action: moving from in-cone to out-of-cone?

The idea is basically the opposite of the solution mentioned above:

  • set the SKIP_WORKTREE bit on the corresponding cache_entry
  • remove this cache_entry’s file counterpart from the working tree
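In the same toy style, the reverse direction might look like the sketch below. The names `Entry` and `move_out_of_cone` are hypothetical, the index and working tree are plain dictionaries, and the function naively assumes the on-disk file matches the index, which is exactly the assumption questioned in the next paragraph.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Toy stand-in for an index entry (hypothetical, not Git's cache_entry)."""
    blob: str
    skip_worktree: bool = False


def move_out_of_cone(index, worktree, src, dst):
    """Rename src to an out-of-cone dst in the index and drop it from disk."""
    entry = index.pop(src)
    index[dst] = entry
    # Mark the entry as sparse so it is no longer expected on disk.
    entry.skip_worktree = True
    # Remove the file counterpart from the working tree.
    # No check is made for unstaged modifications here.
    worktree.pop(src, None)
```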

However, there are a few tricky points here:

  1. git-mv does not work when the <destination> is absent from the working tree. However, moving out-of-cone implies that the <destination> is absent from the working tree. How to adjust git-mv’s logic to support this is therefore an interesting problem.
  2. Moving out-of-cone requires removing the working tree file. But what if the working tree file is not up-to-date (i.e. it is modified but the changes are not yet added to the index)? If we remove it directly, those changes will be lost. So how can we do this properly, for both safety and better UX?
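To illustrate the second point, the toy model below detects a modified working tree file before removing it. Refusing the move is only one conceivable behaviour, chosen here to show where the data loss would occur; it is not a decision made in this post. `Entry`, `is_dirty`, and `careful_move_out_of_cone` are hypothetical names, and comparing strings stands in for whatever up-to-date check Git would really perform.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Toy stand-in for an index entry (hypothetical, not Git's cache_entry)."""
    blob: str
    skip_worktree: bool = False


def is_dirty(index, worktree, path):
    """True when the on-disk content differs from what the index records."""
    return path in worktree and worktree[path] != index[path].blob


def careful_move_out_of_cone(index, worktree, src, dst):
    if is_dirty(index, worktree, src):
        # Deleting the file now would silently discard unstaged changes.
        raise RuntimeError(f"{src} has unstaged changes; refusing to remove it")
    entry = index.pop(src)
    entry.skip_worktree = True
    index[dst] = entry
    worktree.pop(src, None)
```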

I will try to address these points next week.
