GSoC Week 1
Background
For my GSoC project “More Sparse Index Integrations”, I will integrate more
Git commands with the “sparse index” feature. The first command to integrate
is git-mv.
When I was experimenting with git-mv, I found that it still has some
erroneous/weird logic when working with git-sparse-checkout. This RFC
(Request for Comments) patch is relevant.
And because being able to work correctly with git-sparse-checkout is a
prerequisite for integrating with “sparse index”, I decided that I should
optimize git-mv to work better with git-sparse-checkout first.
Check out this patch to see this WIP (Work In Progress) series.
About this week
At this stage, the main focus is making git-sparse-checkout work better
with git-mv.
This week I’m working to ship a WIP v3 (please refer to the code here) to address the issues raised in WIP v2.
What is wrong between git-sparse-checkout and git-mv
git-mv is one of the oldest commands in Git and it does two things:
- Move a file/directory/symlink
- Rename a file/directory/symlink
It works just fine until we try to make it work with git-sparse-checkout.
git-sparse-checkout can make some files exist in the index but absent
in the working tree. One of the few cases where git-mv gets confused is
when you run git mv <source> <destination> and <source> happens to be a
“sparse file”, which is present in the index but cannot be found in the
working tree. In this case, git-mv naturally complains because the file it
is trying to move is not on-disk.
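To make the failure mode concrete, here is a toy Python model. It is only a sketch and not Git's code: the index is assumed to be a dict mapping each path to a flag standing in for the SKIP_WORKTREE bit, the working tree is a set of on-disk paths, and the function names are made up for illustration.

```python
# Toy model, not Git's implementation. All names here are illustrative.
index = {
    "in-cone/a.txt": False,     # tracked and checked out
    "out-of-cone/b.txt": True,  # tracked, SKIP_WORKTREE set: a "sparse file"
}
worktree = {"in-cone/a.txt"}    # paths that exist on disk


def is_sparse_file(path, index, worktree):
    """True if path is in the index with the skip flag set and is not on disk."""
    return index.get(path, False) and path not in worktree


def on_disk_only_check(path, worktree):
    """A source check that consults only the working tree."""
    return path in worktree
```

In this model, on_disk_only_check rejects out-of-cone/b.txt even though the index tracks it, which is the situation described above.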
How to solve it
What we would like git-mv to do is this: if it cannot find the <source> in
the working tree, it should go ahead and see if this <source> is in the index
(only if supplied with --sparse). If it is there, perform the move/rename.
Here is my proposed solution:
- If the <source> is not on-disk, look for the cache_entry with the same name in the index. If found and the command is supplied with a --sparse option (e.g. git mv --sparse <source> <destination>), perform the move.
- Do the same thing if <source> is a directory.
- After the move, if <source> is out-of-cone and <destination> is in-cone, we have to deal with this cross-cone movement (see what is a “cone” here). The ideal result is that, when moving from out-of-cone (<source> was absent in the working tree) to in-cone (<destination> should be present in the working tree), <destination> is checked out from the index to the working tree. To achieve this result, we should
  - disable the SKIP_WORKTREE bit from the corresponding cache_entry
  - checkout this cache_entry to the working tree
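The steps above can be sketched in a small Python model. This is a sketch under stated assumptions, not the patch itself: CacheEntry, in_cone, and sparse_mv are hypothetical names, the cone test is simplified to a directory-prefix check, and adding a path to the worktree set stands in for checking the entry out from the index.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Toy stand-in for an index entry; not Git's real struct."""
    path: str
    skip_worktree: bool = False  # models the SKIP_WORKTREE bit


def in_cone(path, cone_dirs):
    """Simplified cone test: is path underneath one of cone_dirs?"""
    return any(path.startswith(d.rstrip("/") + "/") for d in cone_dirs)


def sparse_mv(index, worktree, cone_dirs, source, destination, sparse=False):
    """Move a file or directory; index maps path -> CacheEntry."""
    source, destination = source.rstrip("/"), destination.rstrip("/")
    # A file matches exactly; a directory matches every entry beneath it.
    moved = [p for p in index if p == source or p.startswith(source + "/")]
    if not moved:
        raise FileNotFoundError(f"bad source: {source}")
    # Without sparse=True, a source that is missing on disk is an error.
    if not sparse and any(p not in worktree for p in moved):
        raise FileNotFoundError(f"{source} is not on disk; retry with sparse=True")
    for old in moved:
        new = destination + old[len(source):]
        entry = index.pop(old)
        entry.path = new
        index[new] = entry
        if old in worktree:
            # Ordinary move of a file that exists on disk.
            worktree.remove(old)
            worktree.add(new)
        elif in_cone(new, cone_dirs):
            # Cross-cone move: clear the bit, then "check out" the entry.
            entry.skip_worktree = False
            worktree.add(new)


# Usage: move a sparse file into the cone directory "in".
index = {"out/b.txt": CacheEntry("out/b.txt", skip_worktree=True)}
worktree = set()
sparse_mv(index, worktree, ["in"], "out/b.txt", "in/b.txt", sparse=True)
```

After the call, the entry lives at in/b.txt with its skip flag cleared, and the path is present in the modeled working tree.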
Limitations
With the implementation above, we have addressed moving from out-of-cone to in-cone. How about the reverse action: moving from in-cone to out-of-cone?
The idea is basically the opposite of the solution mentioned above:
- enable the SKIP_WORKTREE bit of the corresponding cache_entry
- remove this cache_entry’s file counterpart from the working tree
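As a rough illustration, the two steps could look like this in a toy Python model. CacheEntry and sparsify_after_move are hypothetical names, the working tree is assumed to be a set of on-disk paths, and the sketch deliberately ignores both the absent-destination problem and the question of unsaved modifications.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Toy stand-in for an index entry; not Git's real struct."""
    path: str
    skip_worktree: bool = False  # models the SKIP_WORKTREE bit


def sparsify_after_move(entry, worktree):
    """Apply the two steps to an entry whose new path is out-of-cone."""
    entry.skip_worktree = True    # step 1: enable the bit
    worktree.discard(entry.path)  # step 2: drop the on-disk counterpart


# Usage: the entry has already been renamed to an out-of-cone path.
entry = CacheEntry("out/a.txt")
worktree = {"out/a.txt", "in/keep.txt"}
sparsify_after_move(entry, worktree)
```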
However, there are a few tricks here:
- git-mv does not work when the <destination> is absent from the working tree. However, moving to out-of-cone implies that the <destination> is absent from the working tree. Therefore, how to tweak git-mv’s logic so we can add this feature becomes an interesting problem.
- Moving to out-of-cone needs to remove the working tree file. But what if the working tree file is not up-to-date (i.e. it is modified but not added to the index yet)? If we directly remove it, the information will be lost. So how can we do it properly for both safety and better UX?
I will try to answer these questions next week.