Git Rebase

How rebasing is handled in KEDEHub.

Rebase in general

Rebasing is intended to rewrite local repository history before it is combined with a remote repository. With the rebase command, you can take all the changes that were committed on one branch and replay them on a different branch. When you rebase, you're abandoning existing commits and creating new ones that are similar but different.

This operation works by going to the common ancestor of the two branches (the one you're on and the one you're rebasing onto), getting the diff introduced by each commit of the branch you're on, saving those diffs to temporary files, resetting the current branch to the same commit as the branch you are rebasing onto, and finally creating new commits by applying each change in turn. At the end, there is a new set of commits with different OIDs but the same code changes as the original commits[1].

For example, to update dev with changes from master, where dev is out of sync with master, we run git rebase master dev. The result is a set of new commits with new OIDs that have the same content as those already on dev, placed after the commits from master. Each commit OID in Git is based on a number of factors, one of which is the OID of the commit that comes before it. Since the parents of the existing dev commits are now different, their OIDs are different too. The author metadata (author name, author date) and the changes to files are retained, while the commit date is set to the time of the rebase[4].
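The dependence of a commit OID on its parent can be checked directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the messages base-a and base-b, the e-mail address, and the fixed date are illustrative and not part of the examples in this article. It uses the plumbing command git commit-tree to create two commits with the same tree, author, dates, and message that differ only in their parent.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Fixed identity and dates (illustrative values), so that only the parent differs
export GIT_AUTHOR_NAME=user_1 GIT_AUTHOR_EMAIL=user_1@example.com
export GIT_COMMITTER_NAME=user_1 GIT_COMMITTER_EMAIL=user_1@example.com
export GIT_AUTHOR_DATE='2022-05-19 16:55:40 +0300'
export GIT_COMMITTER_DATE='2022-05-19 16:55:40 +0300'

echo 1 > 1
git add 1
tree=$(git write-tree)

# Two different root commits to serve as parents
base_a=$(git commit-tree "$tree" -m base-a)
base_b=$(git commit-tree "$tree" -m base-b)

# Same tree, author, dates, and message; only the parent differs
child_a=$(git commit-tree "$tree" -p "$base_a" -m c2)
child_b=$(git commit-tree "$tree" -p "$base_b" -m c2)

echo "child on base-a: $child_a"
echo "child on base-b: $child_b"
```

The two printed OIDs differ even though everything except the parent is identical, which is the same effect a rebase has on the commits it replays.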

Rebasing makes for a cleaner history. If you examine the log of a rebased branch, it looks like a linear history. It appears that all the work happened in series, even when it originally happened in parallel branches.

Here is an example that verifies that the author name and date are preserved in the rebased commit:

                            
rm -rf 1
mkdir 1
cd 1
git init

echo 1 > 1
git add 1
git commit -m c1

git checkout -b dev
echo 2 > 2
git add 2
git commit -m c2
git log --all --reverse --date-order --format=fuller
----------
commit e6c484fd08ae5d2d8198d9a3a683f98c854d8073 (master)
Author:     user_1
AuthorDate: Thu May 19 16:55:40 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:55:40 2022 +0300

c1

commit fcbdfc74e9dc59c91f1158a3fd8ba6fc21df35dc (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 16:56:09 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:56:09 2022 +0300

c2
----------

sleep 1
git checkout master
echo 3 > 3
git add 3
git commit -m c3
git log --all --reverse --date-order --format=fuller
----------
commit e6c484fd08ae5d2d8198d9a3a683f98c854d8073
Author:     user_1
AuthorDate: Thu May 19 16:55:40 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:55:40 2022 +0300

c1

commit fcbdfc74e9dc59c91f1158a3fd8ba6fc21df35dc (dev)
Author:     user_1
AuthorDate: Thu May 19 16:56:09 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:56:09 2022 +0300

c2

commit ab61c3dc404cc1911b205b3269a02193d97120ee (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 17:04:20 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:04:20 2022 +0300

c3

----------
sleep 1
git checkout dev
git rebase master
git log --all --reverse --date-order --format=fuller
----------
commit e6c484fd08ae5d2d8198d9a3a683f98c854d8073
Author:     user_1
AuthorDate: Thu May 19 16:55:40 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 16:55:40 2022 +0300

c1

commit ab61c3dc404cc1911b205b3269a02193d97120ee (master)
Author:     user_1
AuthorDate: Thu May 19 17:04:20 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:04:20 2022 +0300

c3

commit a9ffb1c9f3c071a51f20e0d4f886c39097baa8c7 (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 16:56:09 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:05:26 2022 +0300

c2
----------

cd ..
rm -rf 1
                            
                        

We see that c2 has a new OID because it was recreated anew. However, AuthorDate is the same for both versions of c2; only CommitDate has changed.

Here is the previous example again, but this time we stay on master and name both branches in the command: git rebase master dev. The author name and date are again preserved in the rebased commit:

                            
rm -rf 1
mkdir 1
cd 1
git init

echo 1 > 1
git add 1
git commit -m c1

git checkout -b dev
echo 2 > 2
git add 2
git commit -m c2
git log --all --reverse --date-order --format=fuller
----------
commit 96f58c8f3cac3f539aec7d7082118495a48c3684 (master)
Author:     user_1
AuthorDate: Thu May 19 17:29:11 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:29:11 2022 +0300

c1

commit 0be6fb0403e244e2b4805697ed94e198bc3233f6 (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 17:29:26 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:29:26 2022 +0300

c2
----------

sleep 1
git checkout master
echo 3 > 3
git add 3
git commit -m c3
git log --all --reverse --date-order --format=fuller
----------
commit 96f58c8f3cac3f539aec7d7082118495a48c3684
Author:     user_1
AuthorDate: Thu May 19 17:29:11 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:29:11 2022 +0300

c1

commit 0be6fb0403e244e2b4805697ed94e198bc3233f6 (dev)
Author:     user_1
AuthorDate: Thu May 19 17:29:26 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:29:26 2022 +0300

c2

commit 603a604dd3189266ea1dd9d3f076fda29edbd3fd (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 17:30:02 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:30:02 2022 +0300

c3
----------

sleep 1
# staying on master, not switching to dev
git rebase master dev
git log --all --reverse --date-order --format=fuller
----------
commit 96f58c8f3cac3f539aec7d7082118495a48c3684
Author:     user_1
AuthorDate: Thu May 19 17:29:11 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:29:11 2022 +0300

c1

commit 603a604dd3189266ea1dd9d3f076fda29edbd3fd (master)
Author:     user_1
AuthorDate: Thu May 19 17:30:02 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:30:02 2022 +0300

c3

commit 2f977d82ea0e5fb23ddd77931f1d278a5e93ff7a (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 17:29:26 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:30:42 2022 +0300

c2
----------

cd ..
rm -rf 1
                            
                        

We see that c2 again has a new OID because it was recreated anew. However, AuthorDate is the same for both versions of c2; only CommitDate has changed.

Rebase vs. Merge

Rebasing replays changes from one line of work onto another in the order they were introduced, whereas merging takes the endpoints and merges them together.

Both merging and rebasing effectively unite the history of two or more branches. There are two camps on integrating changes from one branch into another: one prefers rebasing for a clean and organized history, and the other prefers merging for the most accurate representation of the project's development. The rebase operation provides a cleaner and more concise view of the history than merge commits scattered throughout it. When public branches have to be combined, merging keeps the state of the repository consistent for all developers, because no existing commit is rewritten.
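The difference between the two approaches can be seen by integrating the same branches both ways. This is a minimal sketch, assuming a throwaway repository in a temporary directory; the branch names dev-merged and dev-rebased are hypothetical helpers introduced only for the comparison.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
git config user.name user_1
git config user.email user_1@example.com

echo 1 > 1; git add 1; git commit -q -m c1
git checkout -q -b dev
echo 2 > 2; git add 2; git commit -q -m c2
git checkout -q master
echo 3 > 3; git add 3; git commit -q -m c3

# Variant 1: merge master into a copy of dev
git checkout -q -b dev-merged dev
git merge -q --no-edit master

# Variant 2: rebase a copy of dev onto master
git checkout -q -b dev-rebased dev
git rebase -q master

# Count the merge commits (commits with more than one parent) on each branch
merges_after_merge=$(git rev-list --merges --count dev-merged)
merges_after_rebase=$(git rev-list --merges --count dev-rebased)
echo "merge commits on dev-merged:  $merges_after_merge"
echo "merge commits on dev-rebased: $merges_after_rebase"

# The original c2 (still pointed to by dev) is kept by the merge but not by the rebase
git merge-base --is-ancestor dev dev-merged && echo "dev-merged contains the original c2"
git merge-base --is-ancestor dev dev-rebased || echo "dev-rebased contains a recreated c2"
```

The merged branch keeps the original c2 and adds a two-parent merge commit, while the rebased branch is linear and contains a recreated c2.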

The Perils of Rebasing

Git provides the interactive mode rebase -i to help developers review commits in the to-do list before starting the rebase process. Working in this mode, developers can rewrite a repository's commit history in several ways:

  • Flatten (or squash in Git terminology) several commits into one so that it looks like the changes were all done at once.
  • Delete one of the commits.
  • Add commits.
  • Split one commit into two.
  • Reorder the commits.
  • Fix up commits: create a commit whose message starts with fixup! and let the rebase fold it into the specific earlier commit it fixes[10].
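As an illustration of the last item in the list above, here is a minimal sketch of a fixup commit being folded into its target. It assumes a throwaway repository in a temporary directory. Setting GIT_SEQUENCE_EDITOR=true accepts the generated to-do list unchanged, so the interactive rebase runs without opening an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name user_1
git config user.email user_1@example.com

echo 1 > 1; git add 1; git commit -q -m c1
echo 2 > 2; git add 2; git commit -q -m c2
echo 3 > 3; git add 3; git commit -q -m c3

# Correct the content introduced by c2 and record it as "fixup! c2"
echo 2-fixed > 2
git add 2
git commit -q --fixup=HEAD~1

before=$(git rev-list --count HEAD)

# --autosquash moves the fixup commit right after c2 and marks it as "fixup"
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
git log --format=%s
```

After the rebase the fixup commit is gone from the log and its change is part of the recreated c2.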

Duplicated commits after rebase

Rebasing can create duplicate commits[9]. Suppose you push commits to a remote branch. Then:

  1. Others pull them and base work on them.
  2. You rewrite those commits with git rebase and force push them.
  3. Your collaborators have to re-merge their work.
  4. Things get messy when you try to pull their work back into yours[1].

Here is an example of duplicated commits on one and the same branch:

All metadata about the two commits is the same, but their hashes and their points in the DAG are different.

Here is an example of how to create duplicate commits:

                                
rm -rf remote_1
mkdir remote_1
cd remote_1
git init --bare local_remote.git
cd ..

rm -rf 1
git clone ./remote_1/local_remote.git 1
cd 1
git remote -v show origin

echo 1 > 1
git add 1
git commit -m c1
git push origin master

git checkout -b dev
git push --set-upstream origin dev
echo 2 > 2
git add 2
git commit -m c2
git push origin dev
git log --all --reverse --date-order --format=fuller
----------
commit 75c0910f5c06c64f506f5ccf2db257224c4972fb (origin/master, master)
Author:     user_1
AuthorDate: Fri May 20 09:20:01 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:01 2022 +0300

c1

commit a0f3999f38fc7651e4f4daed587b6f064cd6c903 (HEAD -> dev, origin/dev)
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:23 2022 +0300

c2
----------

sleep 1
git checkout master
echo 3 > 3
git add 3
git commit -m c3
git push origin master
git log --all --reverse --date-order --format=fuller
----------
commit 75c0910f5c06c64f506f5ccf2db257224c4972fb
Author:     user_1
AuthorDate: Fri May 20 09:20:01 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:01 2022 +0300

c1

commit a0f3999f38fc7651e4f4daed587b6f064cd6c903 (origin/dev, dev)
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:23 2022 +0300

c2

commit 2571cedbd6a37438e924e979cf4e70dd5cc4ede1 (HEAD -> master, origin/master)
Author:     user_1
AuthorDate: Fri May 20 09:20:59 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:59 2022 +0300

c3
----------

git checkout dev
git rebase master
git log --all --reverse --date-order --format=fuller
----------
commit 75c0910f5c06c64f506f5ccf2db257224c4972fb
Author:     user_1
AuthorDate: Fri May 20 09:20:01 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:01 2022 +0300

c1

commit a0f3999f38fc7651e4f4daed587b6f064cd6c903 (origin/dev)
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:23 2022 +0300

c2

commit 2571cedbd6a37438e924e979cf4e70dd5cc4ede1 (origin/master, master)
Author:     user_1
AuthorDate: Fri May 20 09:20:59 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:59 2022 +0300

c3

commit 1312acc265e38dcf590089e90d3bcc47466c0c17 (HEAD -> dev)
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:22:33 2022 +0300

c2
----------

# And now we have duplicate commits locally. If we were to run git push we would send them up to the server.

git push origin dev
----------
To /Users/dimitarbakardzhiev/git/./remote_1/local_remote.git
! [rejected]        dev -> dev (non-fast-forward)
error: failed to push some refs to '/Users/dimitarbakardzhiev/git/./remote_1/local_remote.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
----------

git pull
git log --all --reverse --date-order --format=fuller
----------
commit 75c0910f5c06c64f506f5ccf2db257224c4972fb
Author:     user_1
AuthorDate: Fri May 20 09:20:01 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:01 2022 +0300

c1

commit a0f3999f38fc7651e4f4daed587b6f064cd6c903 (origin/dev)
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:23 2022 +0300

c2

commit 2571cedbd6a37438e924e979cf4e70dd5cc4ede1 (origin/master, master)
Author:     user_1
AuthorDate: Fri May 20 09:20:59 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:59 2022 +0300

c3

commit 1312acc265e38dcf590089e90d3bcc47466c0c17
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:22:33 2022 +0300

c2

commit b536e3b6a53cd078dfc3d3ffeffe20e23700c9c5 (HEAD -> dev)
Merge: 1312acc a0f3999
Author:     user_1
AuthorDate: Fri May 20 09:24:44 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:24:44 2022 +0300

Merge branch 'dev' of /Users/dimitarbakardzhiev/git/./remote_1/local_remote into dev                                        
----------



git push origin dev
git log --all --reverse --date-order --format=fuller
----------
commit 75c0910f5c06c64f506f5ccf2db257224c4972fb
Author:     user_1
AuthorDate: Fri May 20 09:20:01 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:01 2022 +0300

c1

commit a0f3999f38fc7651e4f4daed587b6f064cd6c903
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:23 2022 +0300

c2

commit 2571cedbd6a37438e924e979cf4e70dd5cc4ede1 (origin/master, master)
Author:     user_1
AuthorDate: Fri May 20 09:20:59 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:20:59 2022 +0300

c3

commit 1312acc265e38dcf590089e90d3bcc47466c0c17
Author:     user_1
AuthorDate: Fri May 20 09:20:23 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:22:33 2022 +0300

c2

commit b536e3b6a53cd078dfc3d3ffeffe20e23700c9c5 (HEAD -> dev, origin/dev)
Merge: 1312acc a0f3999
Author:     user_1
AuthorDate: Fri May 20 09:24:44 2022 +0300
Commit:     user_1
CommitDate: Fri May 20 09:24:44 2022 +0300

Merge branch 'dev' of /Users/dimitarbakardzhiev/git/./remote_1/local_remote into dev
----------

git reflog
----------
b536e3b (HEAD -> dev, origin/dev) HEAD@{0}: pull: Merge made by the 'recursive' strategy.
1312acc HEAD@{1}: rebase (finish): returning to refs/heads/dev
1312acc HEAD@{2}: rebase (pick): c2
2571ced (origin/master, master) HEAD@{3}: rebase (start): checkout master
a0f3999 HEAD@{4}: checkout: moving from master to dev
2571ced (origin/master, master) HEAD@{5}: commit: c3
75c0910 HEAD@{6}: checkout: moving from dev to master
a0f3999 HEAD@{7}: commit: c2
75c0910 HEAD@{8}: checkout: moving from master to dev
75c0910 HEAD@{9}: commit (initial): c1

----------


cd ..
rm -rf 1
rm -rf remote_1
                                
                            

We see that c2 is duplicated, and that AuthorDate is the same for both versions.

To avoid getting to this stage when working alone, we could have run git push --force at the point where we ran git pull instead. This would have sent our commits with the new hashes to the server without issue.
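The forced push can be sketched as follows, assuming a local bare repository as the remote, as in the example above. The sketch uses git push --force-with-lease, which is an addition to the original text: it behaves like --force but refuses to overwrite the remote branch if it has moved since our last fetch.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare local_remote.git
git clone -q local_remote.git 1 2>/dev/null
cd 1
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
git config user.name user_1
git config user.email user_1@example.com

echo 1 > 1; git add 1; git commit -q -m c1
git push -q origin master

git checkout -q -b dev
echo 2 > 2; git add 2; git commit -q -m c2
git push -q --set-upstream origin dev

git checkout -q master
echo 3 > 3; git add 3; git commit -q -m c3
git push -q origin master

git checkout -q dev
git rebase -q master

# Replace the remote dev with the rebased dev instead of pulling
git push -q --force-with-lease origin dev

# Count how many commits named c2 are reachable from any branch
copies=$(git log --all --format=%s | grep -c '^c2$')
echo "copies of c2 reachable from any branch: $copies"
```

Because the old c2 is no longer referenced by dev or origin/dev, only the rebased copy remains in the branch history.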

If working in a team we probably should not be using git rebase in the first place. To update dev with changes from master, we should, instead of running git rebase master dev, run git merge master whilst on dev.

If you somehow don't notice that you have duplicated commits and continue working on top of them, you have made a real mess for yourself and your team. The size of the mess is proportional to the number of commits you have on top of the duplicates.

That is why git rebase should not be used on any commit history that has already been made available to other members of a development team[1]. Doing so alters the cryptographic hash values of the rewritten commits and causes consistency issues for every local repository that represents the project as it was before the rebase took place (i.e., the repositories of all other team members who previously cloned the repository).

Rebase local changes before pushing to clean up your work, but never rebase anything that you've pushed somewhere[1].

How is rebasing handled in KEDEHub?

Before we can answer this, let's talk about what history means.

One perspective is that a repository's commit history is a record of what actually happened. It's like a historical document and shouldn't be tampered with. From this angle, changing existing commits via rebasing is like lying about what actually transpired.

The opposing perspective is that the commit history is the story of how your project was made. That would mean we are not interested in the drafts, but only in the final source code. People in this camp use tools like git rebase and filter-branch to tell a coherent story of how they got from A to B. This is called "retrospective coherence": the history in a git repository (especially a stable official project repository) may not reflect what actually happened to arrive at that particular state. The use of rebasing is one reason to mine satellite repositories, the ones developers use to write new features, as these repositories may contain the complete, unmodified development history[4].

When analyzing repositories we are interested in the real messy history of what happened. However, every team and every project is different. On top of that people may try to intentionally change history in order to present themselves in a better light.

KEDEHub can handle both merging and rebasing. Of course, with the former we have a much more detailed view of what actually happened. The latter provides us with enough data for an aggregated view of the history. Since most of the time we are interested in the aggregates, we should be fine.

KEDEHub utilizes the unchanged author and date to find the rebased copies of the original commit. Those copies are not included in KEDE calculations.
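The idea of matching rebased copies by author and date can be sketched with plain Git commands. This is only an illustration under stated assumptions, not KEDEHub's actual implementation: the key (author e-mail plus author timestamp), the helper function commit_at, and the branch dev-before-rebase are hypothetical. Fixed author dates are used so that commits created within the same second do not collide.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
git config user.name user_1
git config user.email user_1@example.com

# Hypothetical helper: commit one file with a fixed author date
# usage: commit_at <name> <author-date>
commit_at() {
  echo "$1" > "$1"
  git add "$1"
  GIT_AUTHOR_DATE="$2" git commit -q -m "$1"
}

commit_at c1 '2022-05-19 16:55:40 +0300'
git checkout -q -b dev
commit_at c2 '2022-05-19 16:56:09 +0300'
git branch dev-before-rebase            # keep the original c2 reachable
git checkout -q master
commit_at c3 '2022-05-19 17:04:20 +0300'

git checkout -q dev
git rebase -q master                    # recreates c2 with the same author date

# Keys (author e-mail, author timestamp) that occur on more than one commit
dupes=$(git log --all --format='%ae %at' | sort | uniq -d)
echo "keys shared by several commits:"
echo "$dupes"
```

Only the key of c2 is reported, because the original and the rebased copy share the author and author date while c1 and c3 are unique.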

Below is an example of what a rebase looks like in the commit history[5].

The figure depicts the remote master branch and the developer's local clone, which is indicated by the review timeline. At the time the review started, the codebase in the master branch was represented by commit 79432. Hence, this is the first revision's parent. Right after the first revision (commit d3b8b) was submitted, a reviewer observed that the proposed source code change was prone to a NullPointerException. Next, the developer checked whether the codebase had changed in the master branch. Since this was not the case, the developer fixed the defect in the local clone and submitted the second revision, as represented by commit 9c046. The new commit 9c046 was created by amending the previous commit d3b8b[10]. Thus, only the new commit 9c046 remained as part of the repository history, and the first commit d3b8b is effectively lost.

The master branch remained the same while the first and second revisions occurred. Hence, the developer did not have to update the local clone to apply the changes. A rebasing operation was not necessary, and any difference in the source code between the first and second revisions is due to changes performed by the developer.

However, between the second and third revisions, 5 commits were integrated into the remote master branch. These commits influenced commit 9c046 because they modified the remote master branch. They are considered external commits to commit 9c046. In this case, the developer needed to rebase the local clone onto the updated master branch to obtain the up-to-date version of the code on which the third revision could be based. Hence, despite being sequentially related in the development, the second and third revisions were based on different versions of the codebase and do not share the same parent commit.

For this particular code review, in the first two revisions, the author made modifications to parts of a large class named CheckoutDialog. However, in between the second and third revisions, the 5 external commits that were integrated into the codebase moved large pieces of code from CheckoutDialog to the BranchSelectionAndEditDialog class. Hence, when the rebasing was performed, all the work performed in commit 9c046 was lost. The developer noticed that and added the lost code into the new commit 4fc0e. After the PR was accepted by the reviewer, commit 4fc0e was merged into the master branch.

This rebasing operation caused rework for the developer who had to locate the pieces of code in the new class to adapt the original source code changes to the external changes in the codebase that were incorporated by the external commits.

With KEDEHub we don't have any issues calculating KEDE for the above example. KEDEHub counts the symbols added and deleted for all commits that are flagged with a star and uses them to calculate KEDE. Here is the explanation:

  • Commit 79432 is included.
  • Commit d3b8b is not included because it was replaced by the amend and was never pushed to the remote.
  • Commit 9c046 is not included because it was replaced by the rebase and was never pushed to the remote.
  • The 5 commits from master are included. The duplicated 5 commits from the rebased review are excluded.
  • Commit 4fc0e is included because it was pushed to remote.
  • The merge is the same commit 4fc0e.

How to lie with rebasing in KEDEHub?

There might be people who would like to game KEDEHub in order to get a higher KEDE. There are several ways to create false, fraudulent commits using rebase.

For example:

                            
rm -rf 1
mkdir 1
cd 1
git init

echo 1 > 1
git add 1
git commit -m c1
git log --all --reverse --date-order --format=fuller
----------
commit 478aef215a0acb1278ca05256b04898b9e75832f (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 15:30:54 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:30:54 2022 +0300

c1
------------

git rebase HEAD --exec "git commit --amend --no-edit --date 'now'"
git log --all --reverse --date-order --format=fuller
----------
commit 7964d6eaf734f5092b6553d84b5bd3fd359d84a1 (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 15:32:16 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 15:32:16 2022 +0300

c1
----------

cd ..
rm -rf 1
                            
                        

We see that AuthorDate is changed.

Here is another example of how to change the author date of the rebased commit[11]:

                            
rm -rf 1
mkdir 1
cd 1
git init

echo 1 > 1
git add 1
git commit -m c1

git checkout -b dev
echo 2 > 2
git add 2
git commit -m c2
git log --all --reverse --date-order --format=fuller
----------
commit ffa780247b9d4dbb21516f927b2d3e44ee34ccff (master)
Author:     user_1
AuthorDate: Thu May 19 17:57:14 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:57:14 2022 +0300

c1

commit 493e2b0a3522f71a91644092acfc4b7175396876 (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 17:57:31 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:57:31 2022 +0300

c2
----------

sleep 1
git checkout master
echo 3 > 3
git add 3
git commit -m c3
git log --all --reverse --date-order --format=fuller
----------
commit ffa780247b9d4dbb21516f927b2d3e44ee34ccff
Author:     user_1
AuthorDate: Thu May 19 17:57:14 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:57:14 2022 +0300

c1

commit 493e2b0a3522f71a91644092acfc4b7175396876 (dev)
Author:     user_1
AuthorDate: Thu May 19 17:57:31 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:57:31 2022 +0300

c2

commit 9d2920296a207af961fb8d6455b8f35568b833ea (HEAD -> master)
Author:     user_1
AuthorDate: Thu May 19 17:58:08 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:58:08 2022 +0300

c3
----------

sleep 1
git checkout dev
git rebase master --reset-author-date
git log --all --reverse --date-order --format=fuller
----------
commit ffa780247b9d4dbb21516f927b2d3e44ee34ccff
Author:     user_1
AuthorDate: Thu May 19 17:57:14 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:57:14 2022 +0300

c1

commit 9d2920296a207af961fb8d6455b8f35568b833ea (master)
Author:     user_1
AuthorDate: Thu May 19 17:58:08 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:58:08 2022 +0300

c3

commit 42a7d4ac7ea9a7aa49abba8f825cee5aa26e83fd (HEAD -> dev)
Author:     user_1
AuthorDate: Thu May 19 17:58:40 2022 +0300
Commit:     user_1
CommitDate: Thu May 19 17:58:40 2022 +0300

c2
----------

cd ..
rm -rf 1
                            
                        

We see that c2 has a new OID because it was recreated anew. This time, however, the AuthorDate of commit c2 has changed as well.

References

1. Git Branching - Rebasing

2. git-rebase - Reapply commits on top of another base tip

3. T. Ji, L. Chen, X. Yi and X. Mao, "Understanding Merge Conflicts and Resolutions in Git Rebases," 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 2020, pp. 70-80, doi: 10.1109/ISSRE5003.2020.00016.

4. C. Bird, P. C. Rigby, E. T. Barr, D. J. Hamilton, D. M. German and P. Devanbu, "The promises and perils of mining git," 2009 6th IEEE International Working Conference on Mining Software Repositories, 2009, pp. 1-10, doi: 10.1109/MSR.2009.5069475.

5. M. Paixao and P. H. Maia, "Rebasing in Code Review Considered Harmful: A Large-Scale Empirical Investigation," 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2019, pp. 45-55, doi: 10.1109/SCAM.2019.00014.

6. S. W. Flint, J. Chauhan and R. Dyer, "Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data," 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), 2021, pp. 85-96, doi: 10.1109/MSR52588.2021.00022.

7. Vladimir Kovalenko, Fabio Palomba, and Alberto Bacchelli. 2018. Mining file histories: should we consider branches? Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. Association for Computing Machinery, New York, NY, USA, 202–213. https://doi.org/10.1145/3238147.3238169

9. Git commits are duplicated in the same branch after doing a rebase

10. fixup=[(amend|reword):] commit

11. --reset-author-date
