Git Cherry-pick

How Cherry-picking is handled in KEDEHub.

Cherry-pick in general

For each of one or more existing commits, the git cherry-pick <oid> command creates a new commit whose diff is identical to that of <oid> and whose parent is the current commit[2].

Git follows these steps:

  1. Compute the diff between the commit <oid> and its parent.
  2. Apply that diff to the current HEAD.
  3. Create a new commit whose root tree matches the new working directory and whose parent is the commit at HEAD.
  4. Move the ref at HEAD to that new commit.
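The four steps can be replayed by hand with Git plumbing commands. This is a sketch under stated assumptions, not how git cherry-pick is actually implemented: it works in a throwaway repository created with mktemp, uses a hypothetical identity user_1 <user_1@example.com>, and replaces the three-way merge that the real command performs in step 2 with a plain git apply.

```shell
#!/bin/sh
# Hypothetical sketch: replay the four cherry-pick steps with plumbing commands
# in a throwaway repository. Real `git cherry-pick` runs a three-way merge;
# the plain patch used for step 2 here is a simplification.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name user_1                 # hypothetical identity
git config user.email user_1@example.com
echo 1 > 1 && git add 1 && git commit -q -m c1
base=$(git symbolic-ref --short HEAD)       # initial branch, master or main
git checkout -q -b dev
echo 2 > 2 && git add 2 && git commit -q -m c2
oid=$(git rev-parse HEAD)                   # the commit to cherry-pick
git checkout -q "$base"

# Steps 1 and 2: diff <oid> against its parent, apply it to index and worktree.
git diff "$oid^" "$oid" | git apply --index

# Step 3: commit the resulting tree with the current HEAD as its parent,
# copying message, author name, e-mail and author date from <oid>.
tree=$(git write-tree)
new=$(GIT_AUTHOR_NAME="$(git log -1 --format=%an "$oid")" \
      GIT_AUTHOR_EMAIL="$(git log -1 --format=%ae "$oid")" \
      GIT_AUTHOR_DATE="$(git log -1 --format=%ad --date=raw "$oid")" \
      git commit-tree -p HEAD -m "$(git log -1 --format=%B "$oid")" "$tree")

# Step 4: move the branch that HEAD points to onto the new commit.
git update-ref HEAD "$new"
git log -1 --format=fuller
```

The printed commit has the same AuthorDate as c2, and its parent is the commit that was at HEAD before step 4.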

It is important to recognize that cherry-pick does not “move” the commit on top of the current HEAD. Instead, it creates a new commit whose diff matches that of the existing commit. There are then two copies of the same diff, i.e. code contributed by the same author on the same date.
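One way to check that the two copies really carry the same diff is git patch-id, which hashes a patch while ignoring commit metadata. A minimal sketch in a throwaway repository, assuming the hypothetical identity user_1 <user_1@example.com> and adding an extra commit c3 so that the copy gets a different parent than the original:

```shell
#!/bin/sh
# Hypothetical sketch: compare an original commit with its cherry-picked copy.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name user_1                 # hypothetical identity
git config user.email user_1@example.com
echo 1 > 1 && git add 1 && git commit -q -m c1
base=$(git symbolic-ref --short HEAD)       # initial branch, master or main
git checkout -q -b dev
echo 2 > 2 && git add 2 && git commit -q -m c2
orig=$(git rev-parse HEAD)
git checkout -q "$base"
echo 3 > 3 && git add 3 && git commit -q -m c3   # base no longer matches c2's parent
git cherry-pick "$orig" > /dev/null
copy=$(git rev-parse HEAD)

# Two different commit objects ...
echo "original: $orig"
echo "copy:     $copy"
# ... with the same patch id, i.e. the same diff.
git show "$orig" | git patch-id --stable
git show "$copy" | git patch-id --stable
```

The commit ids differ, while the first column printed by git patch-id is identical for both.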

How is Cherry-picking handled in KEDEHub?

Cherry-picking keeps the author name and author date of the original commit in the newly created commit. In addition, the cherry-picked commit records, as its committer and commit date, who performed the cherry-pick and when. If we ever need to answer the question "who committed this code the first time?", we can retrieve that by tracing the cherry-pick back to its source and reading the unchanged author data.

Here is an example showing that the author name and date are preserved in the cherry-picked commit:

                            
rm -rf 1
mkdir 1
cd 1
git init -b master

echo 1 > 1
git add 1
git commit -m c1

git checkout -b dev
echo 2 > 2
git add 2
git commit -m c2
git log --all --reverse --date-order --format=fuller
----------
commit fada57f3fbb8ff6addb9f4c1222d82a2427053ad (master)
Author:     user_1
AuthorDate: Thu May 19 15:38:37 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:38:37 2022 +0300

c1

commit a4ec7c5ae39f5254418c14966dd1450f13fc1ea6 (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 15:38:59 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:38:59 2022 +0300

c2
----------

c2=$(git rev-parse HEAD)
sleep 1
git checkout master
git cherry-pick "$c2"
git log --all --reverse --date-order --format=fuller
----------
commit fada57f3fbb8ff6addb9f4c1222d82a2427053ad
Author:     user_1
AuthorDate: Thu May 19 15:38:37 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:38:37 2022 +0300

c1

commit a4ec7c5ae39f5254418c14966dd1450f13fc1ea6 (dev)
Author:     user_1
AuthorDate: Thu May 19 15:38:59 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:38:59 2022 +0300

c2

commit 34c3e82f8ad935da538a4e3bb98790cb35a3a4ef (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 15:38:59 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:40:51 2022 +0300

c2
----------

cd ..
rm -rf 1
                        
                        

We can see that the change from c2 now exists in two commits: a4ec7c5 on dev and 34c3e82 on master. Both copies carry the same Author and AuthorDate; only the CommitDate differs.
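In the example above one identity plays both roles, so Author and Commit coincide. A sketch in which a second, hypothetical user (user_2) performs the cherry-pick makes the split visible; the e-mail addresses and the fixed committer date are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sketch: user_1 authors c2, a made-up second identity (user_2)
# cherry-picks it. Author fields are copied, committer fields are not.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name user_1                 # hypothetical identity
git config user.email user_1@example.com
echo 1 > 1 && git add 1 && git commit -q -m c1
base=$(git symbolic-ref --short HEAD)       # initial branch, master or main
git checkout -q -b dev
echo 2 > 2 && git add 2 && git commit -q -m c2
c2=$(git rev-parse HEAD)
git checkout -q "$base"

# Cherry-pick as user_2, with a fixed committer date so the difference is visible.
GIT_COMMITTER_DATE="1893456000 +0000" \
  git -c user.name=user_2 -c user.email=user_2@example.com cherry-pick "$c2" > /dev/null

# %an/%ad = author name/date, %cn/%cd = committer name/date.
for rev in "$c2" HEAD; do
  git log -1 --date=iso --format='%h  author: %an %ad  committer: %cn %cd' "$rev"
done
```

Both lines show user_1 and the same author date; only the second line shows user_2 as committer.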

KEDEHub uses the unchanged author name and author date to find the cherry-picked copies of the original commit, and excludes those copies from KEDE calculations.
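A minimal sketch of the matching idea, grouping commits by author name and author date with git log and awk. It illustrates the principle only and is not KEDEHub's implementation; the function name find_copies and the identity are hypothetical, and the author dates are borrowed from the example above:

```shell
#!/bin/sh
# Hypothetical sketch of the matching idea: group commits by author name and
# author date. An illustration only, not KEDEHub's actual implementation.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name user_1                 # hypothetical identity
git config user.email user_1@example.com
echo 1 > 1 && git add 1 && git commit -q -m c1 --date="2022-05-19 15:38:37 +0300"
base=$(git symbolic-ref --short HEAD)       # initial branch, master or main
git checkout -q -b dev
echo 2 > 2 && git add 2 && git commit -q -m c2 --date="2022-05-19 15:38:59 +0300"
c2=$(git rev-parse HEAD)
git checkout -q "$base"
# A distinct committer date guarantees the copy is a different commit object.
GIT_COMMITTER_DATE="2022-05-19 15:40:51 +0300" git cherry-pick "$c2" > /dev/null

# Print "name<TAB>author-timestamp: id id ..." for every (name, date) pair
# shared by more than one commit, i.e. candidate cherry-picked copies.
find_copies() {
  git log --all --format='%an%x09%at%x09%H' |
    awk -F '\t' '{ key = $1 "\t" $2; n[key]++; ids[key] = ids[key] " " $3 }
                 END { for (k in n) if (n[k] > 1) print k ":" ids[k] }'
}
find_copies
```

The output is a single group containing c2 and its copy on the initial branch; c1 is not reported because its author date is unique.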

How to lie with Cherry-picking?

There might be people who would like to game KEDEHub in order to get a higher KEDE. There are several ways to create false, fraudulent commits using cherry-pick.

If you pass --reset-author to git commit together with -C, -c or --amend (or when committing after a conflicting cherry-pick), the authorship of the resulting commit is reset to the identity currently configured, and the author timestamp is renewed[3]. With --author and --date you can instead set any author name and author date you like. The git cherry-pick command does not pass --reset-author when it runs git commit, but here is what you can do:

  • If you run git cherry-pick -n instead of git cherry-pick, the changes are applied to the working tree and the index, but git commit is not run. You then run git commit yourself, for example with -C <oid> --reset-author, which reuses the original message while resetting the author and the author date.
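A sketch of the bullet above, in a throwaway repository with the hypothetical identity user_1 <user_1@example.com>: git cherry-pick -n stages the changes from c2, and git commit -C with --reset-author reuses c2's message while giving the new commit the current time as its author date.

```shell
#!/bin/sh
# Hypothetical sketch: cherry-pick without committing, then commit with a
# renewed author date. Identity and dates are made up for illustration.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name user_1                 # hypothetical identity
git config user.email user_1@example.com
echo 1 > 1 && git add 1 && git commit -q -m c1 --date="2022-05-19 16:08:42 +0300"
base=$(git symbolic-ref --short HEAD)       # initial branch, master or main
git checkout -q -b dev
echo 2 > 2 && git add 2 && git commit -q -m c2 --date="2022-05-19 16:08:57 +0300"
c2=$(git rev-parse HEAD)
git checkout -q "$base"

git cherry-pick -n "$c2"                    # changes staged, no commit yet
git commit -q -C "$c2" --reset-author       # c2's message, new author date

# Same subject, different author dates.
git log -1 --format='%s  AuthorDate: %ad' "$c2"
git log -1 --format='%s  AuthorDate: %ad' HEAD
```

Replacing --reset-author with --date=<date> would instead let the author date be set to an arbitrary value.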

Here is an example of changing the author date while cherry-picking:

                            
rm -rf 1
mkdir 1
cd 1
git init -b master

echo 1 > 1
git add 1
git commit -m c1

git checkout -b dev
echo 2 > 2
git add 2
git commit -m c2
git log --all --reverse --date-order --format=fuller
----------
commit 677233c000425b3bd0348f19b056ce014de6e2ea (master)
Author:     user_1
AuthorDate: Thu May 19 16:08:42 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:08:42 2022 +0300

c1

commit 7c26cb9be9bef362de0805c8620e002be269085f (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 16:08:57 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:08:57 2022 +0300

c2
----------

c2=$(git rev-parse HEAD)
sleep 1
git checkout master
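git cherry-pick -n "$c2"
# --amend rewrites the commit at HEAD (c1): the changes staged from c2 are folded into it, and --date 'now' gives it a new author date
git commit --amend --no-edit --date 'now'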
git log --all --reverse --date-order --format=fuller
----------
commit 677233c000425b3bd0348f19b056ce014de6e2ea
Author:     user_1
AuthorDate: Thu May 19 16:08:42 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:08:42 2022 +0300

c1

commit 7c26cb9be9bef362de0805c8620e002be269085f (dev)
Author:     user_1
AuthorDate: Thu May 19 16:08:57 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:08:57 2022 +0300

c2

commit c1c67487ff10dbd4350f3c5c4ac1f2145ad02bdc (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 16:09:56 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:09:56 2022 +0300

c1
----------

cd ..
rm -rf 1
                        
                        

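We see a new commit on master, still titled c1 because --amend keeps the message of the commit it rewrites. Its tree now also contains the change from c2, but its author date is new, so matching on author name and author date no longer identifies it as a copy of c2. In this way code changes are duplicated and, if used in KEDE calculations, inflate individual performance.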

References

1. Git Basics - Undoing Things

2. git-cherry-pick - Apply the changes introduced by some existing commits

3. --reset-author
