GSoC Week 7-8

5 minute read


In the past two weeks, besides keeping the work on git-mv, I was also working on the sparse-index integration with git-rm. I shipped one iteration of git-mv series and two iterations of git-rm series. At the time of writing, git-mv is v2 and git-rm is also v2.

Latest work on git-mv. Latest work on git-rm.


Please reference the URL in the “Summary” section for the latest work on git-mv.

The work on git-mv is in its second phase, “from-in-to-out”, as the first phase “from-out-to-in” was merged into master branch in week 4.

“from-in-to-out” series was introduced in the week 5-6 blog, so I am not reiterating its basic idea. This discussion is mainly about what is the difference between the v2 and v1 of the series.

In v2, there is not a major functionality rewrite, but mostly style-nits and commit message improvement. I’m going to copy-paste part of the cover-letter (this is where we write a summary of the changes introduced, and possibly other worthy information in a patch series) of my v2 as a demonstration of the changes since v1. I will write below each numbered item to explain the change.

  1. Change “grep -x” to ^ and $ in t7002 test, and remove some useless “-x” (lines that are not ambiguous).

    I was using grep in the v1 patch to test if a path appears in an output. However, during the testing phase, I noticed that, when I want to verify “file1” is not in the output, grep returns true if the output contains something like “file123”, even though “file1” does not exist. To address this problem, I give an -x option to grep. -x option of grep is to match the whole line, which means the pattern “file1” only matches when a line is exactly “file1”, not “file123” or “file1xyz”. The intent here is good, but reviewers think that it is easier to read if I instead use ^ and $ to enclose my pattern, mainly because these specifiers have the exact same meaning, and they are more commonly known than -x option.

  2. Rename check_dir_in_index() to empty_dir_has_sparse_contents() and add more precise documentation.

    check_dir_in_index() was introduced in the “from-out-to-in” series of git-mv. The method’s name and documentation are not properly describing its purpose and prerequisite. This patch is trying to rephrase so the method is better explained.

  3. Reverse return values from empty_dir_has_sparse_contents() to comply with the method’s name.

    Same as point 2. Reverse the method’s binary (1 and 0) return value to better describe its behavior.

  4. Make some commit messages more natural/fluent/without typos.

    As the title suggests, nothing big here.

  5. Split commit “mv: from in-cone to out-of-cone” to two commits, one does in-to-out functionality, the other one does advice.

    There is an advise (how Git advise the user to do something, a method that prints information) added to aid the functionality added in this patch. Reviewer find it clearer to split the patch into two patches: one for the functionality, the other for the advise.

  6. Take care a few memory leaks (xstrdups).

    I was using the returned pointer from xstrdup (this is a wrapper in Git’s codebase that wraps around standard library strdup, the wrapper prints an out-of-memory message when the memory is depleted) in a statement. However, without properly assigning the pointer to a variable, that chunk of allocated memory is effectively out-of-reach, causing a memory leak. I fixed it by assigning the returned pointer to a variable, and free the variable after done using it.

  7. Add a new way to recursively cleanup empty WORKING_DIRECTORY.

    During a in-to-in (normal move without sparse concept) move, when <source> argument of git-mv is a directory, git-mv performs the move by 1) moving the whole directory in the worktree, 2) move the directory’s children entries in the index individually. However, during an in-to-out move, due to the special logic, we don’t do the first step above, rather we just check each entry and perform the move. This change can lead to the <source> directory remaining on-disk even when it is empty. This patch is adding a way to cleanup such directory.


Please reference the URL in the “Summary” section for the latest work on git-rm.

The work on git-rm is turning on sparse-index feature within the command. And add tests to to test the command’s compatibility with sparse-index. This sparse-index integration process is relatively standardized as I mentioned in the last blog. Instead of laying out every steps, I prefer to talk about what stands out to me during the process.

  1. Working with pathspec in sparse-index enabled repo

    Git uses pathspec as an important feature in many builtin commands. git-rm is one of these commands. It is convenient to use this feature most of the time, but it brings some troubles when a command, like git-rm itself, wants to work in a sparse-index enabled repo.

    One of the most important things during integrating with sparse-index, is we need to decide when should Git expand the sparse-index. For git-rm, when the pathspec matches something outside of the sparse-checkout definition, we know that Git needs to expand the index, because the user may want to remove things out there.

    The very process of deciding if pathspec needs an expanded index is the trouble here, as pathspec works differently from a literal path.

    For example, when “deep” is set as the sparse-checkout definition, a literal path “folder1/file.c” is clearly not in the definition. But how about a pathspec like “*/file.c”? This pathspec could possibly match both “deep/file.c” and “folder1/file.c”, so we decide this pathspec may need to expand the index.

    I’m lucky that there is a pre-existing method that decides whether a pathspec needs to expand the index. All I did here is simply extract the method to be a public one so I can use it to fullfil the need.

  2. Bug caused by shell expansion

    When I was adding tests to t1092, the pathspecs I supplied to git-rm are not quoted. The bare argument without quotes makes the shell want to expand them when there are wildcards in them, and it lead to unexpected behavior. My mentor helped me with this subtle and tricky bug. The solutions is to put quotes around the argument.

What’s next

I’m looking into git-grep as next command for sparse-index integration. By working on multiple commands at the same time, I can maximize the speed of work as it’s kind of parallelism ;)

I’ll read the reviews from the mailing list and make modifications to my sent series if necessary.

I’m also trying to review some of the patches on the mailing list.