Git Squash

How squashing is handled in KEDEHub

Squash in General

Git squash is a technique for combining multiple commits into a single one. Git has no standalone squash command; squashing is done through commands such as git rebase -i and git merge --squash. It is useful when you want to simplify your commit history, particularly before merging a feature branch into a main branch or when consolidating a series of small, incremental commits into one coherent commit.

Here’s a breakdown of how git squash works and when to use it:

When to Use Git Squash:

  1. Clean Up History: After working on a feature branch where you've made multiple small commits (e.g., fixing typos, debugging, or minor adjustments), squashing helps to present a cleaner, more logical commit history.
  2. Before Merging: Many teams use squash when merging feature branches to keep the main branch history clean and to avoid cluttering it with unnecessary or intermediary commits.
  3. Creating a Single Commit: If the feature or defect fix can be represented as one logical change, squashing lets you condense it into a single meaningful commit.

How to Use Git Squash:

  1. Interactive Rebase (git rebase -i)

    This is the most commonly used method for squashing. It allows you to choose exactly which commits to squash and in what order.

    • Run: git rebase -i HEAD~N
      Here, N is the number of recent commits you want to interactively rebase. This command opens a text editor showing your most recent commits.
    • In the editor, you'll see a list of commits, oldest first, like this:

      pick a1b2c3 First commit
      pick d4e5f6 Second commit
      pick g7h8i9 Third commit

    • To squash the second and third commits into the first one, change the lines to:

      pick a1b2c3 First commit
      squash d4e5f6 Second commit
      squash g7h8i9 Third commit

    • Save and close the editor. Git will squash the commits and allow you to edit the commit message for the squashed commit.
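
The steps above can also be tried end to end in a throwaway repository. The following is a minimal sketch, not a recommended workflow: the file name notes.txt and the identity "Demo User" are made up for the demo, and GIT_SEQUENCE_EDITOR with sed stands in for the manual edit of the todo list so that the script runs without opening an editor.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # throwaway repository in a temp dir
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"

echo base > README && git add README && git commit -q -m "Base commit"

# Three small commits that will be squashed into one.
for n in First Second Third; do
  echo "$n" >> notes.txt
  git add notes.txt
  git commit -q -m "$n commit"
done

# Stand-in for the manual edit: keep the first "pick" and turn the
# following ones into "squash". GIT_EDITOR=true accepts the combined
# commit message without changes.
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~3

git log --oneline   # the squashed commit followed by the base commit
```

In everyday use you would simply run git rebase -i HEAD~3 and edit the list by hand; the environment variables are only there to make the example non-interactive.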

  2. Squashing During a Merge (git merge --squash)

    Another way to squash is during a merge process. This is typically used when merging a feature branch into the main branch.

    • Run: git checkout main (or the target branch you are merging into).
    • Run: git merge --squash feature-branch
      This applies all the changes from feature-branch to your working tree and staging area, but does not create a commit yet and does not record a merge.
    • Then, to finalize: git commit
      This will create a single commit for all changes, and you can write a new commit message summarizing the changes from the feature branch.
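
As a rough illustration of the merge-based approach, the sketch below runs the same commands in a throwaway repository. The file feature.txt, the commit messages, and the "Demo User" identity are assumptions made for the demo.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # throwaway repository in a temp dir
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo base > README && git add README && git commit -q -m "Base commit"

# Feature branch with two small commits.
git checkout -q -b feature-branch
echo one > feature.txt && git add feature.txt && git commit -q -m "Add feature"
echo two >> feature.txt && git commit -q -am "Fix typo in feature"

# Squash-merge into main: the changes are staged, nothing is committed yet.
git checkout -q main
git merge -q --squash feature-branch
git status --short                       # feature.txt shows up as staged

# Finalize with a single commit that summarizes the branch.
git commit -q -m "Add feature (squashed)"
git log --oneline                        # squashed commit plus the base commit
```

Note that the resulting commit has a single parent, so Git does not treat feature-branch as merged; the branch usually has to be deleted with git branch -D rather than -d.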

Benefits of Git Squash:

  • Cleaner Commit History: Keeps the history of the main branch more concise and easier to read.
  • Single Logical Changes: A series of small, incremental commits that don't have much independent meaning can be combined into a single, more meaningful change.
  • Simplified Debugging: When reviewing the history, it's easier to understand when commits represent significant changes rather than every minor step along the way.

The Perils of Squashing

While squashing commits in Git can help streamline and simplify your commit history, there are several perils or potential drawbacks that teams and developers need to consider. These downsides are particularly relevant when working on large, collaborative projects or when commit history is essential for future reference and analysis.

  1. Loss of Granular History

    • Peril: Squashing combines multiple commits into one, which removes the detailed step-by-step history of how a feature or defect fix evolved. This granular history includes individual commits, their messages, and timestamps, all of which can provide valuable context for future debugging or code reviews.
    • Why It’s a Problem: If a defect is introduced at an intermediate step, or if a small but important change is made during development, squashing can make it harder to trace when exactly the change was introduced. The individual evolution of the code becomes lost, leaving only a summary of the final state.
    • Example: Imagine a developer made multiple commits to debug a problem and then squashed those commits into one. Later, when trying to track down a related issue, the detailed commit history would no longer be available to provide insights into what changed at each step.

  2. Loss of Author Information

    • Peril: When multiple developers contribute to different commits that get squashed, Git records only one author (typically the author of the first commit or the person performing the squash). This can obscure who contributed to specific changes within the squashed commit.
    • Why It’s a Problem: In large teams or organizations, it's important to track contributions accurately for accountability, recognition, and review purposes. Squashing can make it seem like only one person was responsible for all the changes, even though multiple developers worked on the feature.
    • Example: If three developers worked on a feature branch and made separate commits, squashing would collapse their work into a single commit that reflects only one developer’s contribution.
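
The effect can be reproduced in a throwaway repository, as sketched below. The names Alice, Bob, and Carol and the files a.txt and b.txt are hypothetical. The sketch also shows one possible mitigation: copying the original authors into Co-authored-by trailers, a commit message convention that some hosting platforms recognize. Whether a given tool reads those trailers is an assumption to verify for your setup.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"              # throwaway repository in a temp dir
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Carol Maintainer"      # hypothetical person doing the squash
git config user.email "carol@example.com"
echo base > README && git add README && git commit -q -m "Base commit"

# Two different (hypothetical) authors commit on the feature branch.
git checkout -q -b feature-branch
echo a > a.txt && git add a.txt
git commit -q --author="Alice Dev <alice@example.com>" -m "Add a.txt"
echo b > b.txt && git add b.txt
git commit -q --author="Bob Dev <bob@example.com>" -m "Add b.txt"

# Collect the original authors before they disappear from main's history.
trailers=$(git log --format='Co-authored-by: %an <%ae>' main..feature-branch | sort -u)

git checkout -q main
git merge -q --squash feature-branch
git commit -q -m "Add a.txt and b.txt" -m "$trailers"

git log -1 --format='author: %an'   # only the person who ran the squash
git log -1 --format=%b              # the trailers still name Alice and Bob
```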

  3. Reduced Debugging and Blame Accuracy

    • Peril: Git’s blame functionality, which is used to track who last modified each line of code, becomes less accurate when multiple commits are squashed. Instead of pointing to the original author and commit where the change occurred, Git blame points to the squashed commit, which doesn’t reflect the true history of line-by-line changes.
    • Why It’s a Problem: When defects arise, developers often use Git blame to understand who wrote a particular line of code and why. Squashing obscures this information, making it harder to investigate the root cause of issues or seek clarification from the right person.
    • Example: After squashing a series of commits, a defect is found in a part of the code. Git blame will attribute the entire block of changes to the squashed commit, making it difficult to pinpoint the exact developer responsible for the defect.
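
A small experiment makes the difference visible. In this sketch (with hypothetical authors and a made-up file app.txt), git blame on the feature branch names the original author of each line, while git blame on main after a squash merge attributes both lines to the person who created the squashed commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"              # throwaway repository in a temp dir
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Carol Maintainer"      # hypothetical person doing the squash
git config user.email "carol@example.com"
echo base > README && git add README && git commit -q -m "Base commit"

# Each line of app.txt is written by a different (hypothetical) author.
git checkout -q -b feature-branch
echo "line from Alice" > app.txt && git add app.txt
git commit -q --author="Alice Dev <alice@example.com>" -m "Add first line"
echo "line from Bob" >> app.txt
git commit -q -a --author="Bob Dev <bob@example.com>" -m "Add second line"

echo "blame on feature-branch:"
git blame --line-porcelain app.txt | grep '^author '

git checkout -q main
git merge -q --squash feature-branch
git commit -q -m "Add app.txt (squashed)"

echo "blame on main after the squash:"
git blame --line-porcelain app.txt | grep '^author '
```

As long as the unsquashed branch still exists, running git blame against it (for example git blame feature-branch -- app.txt) recovers the per-line authorship.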

  4. Impact on Code Review

    • Peril: Squashing can hide the development process, including incremental changes, experiments, or defect fixes, that might have been important during code review.
    • Why It’s a Problem: Reviewing small, incremental commits is often easier because each commit reflects a small, logical step. When commits are squashed, reviewers only see the final, aggregated change, which can make it harder to understand the developer’s thought process or to spot specific mistakes introduced at different stages.
    • Example: A developer might have iterated on a feature with multiple small commits to test and debug different parts. Squashing those commits into one can make the review process less transparent, as the intermediate decisions and fixes are hidden.

  5. Difficulty in Collaboration

    • Peril: If multiple people are working on the same branch or topic, squashing can cause merge conflicts and confusion when trying to integrate their work.
    • Why It’s a Problem: In collaborative environments, developers often branch off from one another’s work. Squashing a set of commits rewrites the commit history, which can cause conflicts or confusion for others whose work was based on the earlier history. It may lead to issues when syncing branches or resolving conflicts.
    • Example: Developer A is working on a feature branch and squashes commits. Developer B, who is working on a different part of the same branch, might experience conflicts when trying to merge their work, because the commit history has changed.

  6. Loss of Historical Context

    • Peril: Squashing can obscure the decision-making process that led to certain code changes, making it harder to understand the context or reasoning behind them in the future.
    • Why It’s a Problem: Developers often leave detailed commit messages that describe the reasoning for certain changes. Squashing eliminates those intermediate commit messages, which can contain useful context about why a particular approach was taken or why a defect was fixed in a certain way.
    • Example: A feature might have evolved over several days of development, with each commit message documenting why certain decisions were made. After squashing, only the final commit message remains, removing the context from the earlier commits.
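
Some of this context can be carried over into the squashed commit message. When squashing by hand, Git typically pre-fills the message editor with the original messages, but they are easy to delete or overwrite. The sketch below shows one possible approach under made-up names (client.cfg, the commit messages, "Demo User"): it collects the original subjects before the squash and passes them as the body of the new commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # throwaway repository in a temp dir
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo base > README && git add README && git commit -q -m "Base commit"

# Two commits whose messages explain why each change was made.
git checkout -q -b feature-branch
echo "retry=3" > client.cfg && git add client.cfg
git commit -q -m "Add retry because the upstream API is flaky"
echo "timeout=30" >> client.cfg
git commit -q -am "Raise timeout after load test failures"

# Build a summary body from the original subjects, oldest first.
body=$(git log --reverse --format='- %s' main..feature-branch)

git checkout -q main
git merge -q --squash feature-branch
git commit -q -m "Harden client configuration" -m "$body"

git log -1 --format=%B   # summary line followed by the original subjects
```

This keeps the reasoning but not the individual timestamps or diffs, so it only partly offsets the loss described above.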

  7. Loss of Commit-Level Information for Auditing

    • Peril: For some projects, especially those in regulated industries or projects that require a high degree of traceability, squashing removes the granular history needed for auditing purposes.
    • Why It’s a Problem: In industries like aerospace, healthcare, or government, where strict auditing and traceability are required, each change needs to be documented in detail. Squashing removes commit-level granularity, which can make it difficult to meet compliance or auditing requirements.
    • Example: If a defect is discovered in production, auditors might require information on each incremental change leading to the final feature. Squashing eliminates this information, making it harder to provide detailed documentation of what happened and when.

  8. Potential for Merging Conflicts

    • Peril: Squashing involves rewriting history, which can lead to conflicts when other branches or collaborators have based their work on the unsquashed commits.
    • Why It’s a Problem: Rewriting the history of a branch by squashing commits can cause conflicts for other developers working on related branches or codebases, especially if they are working on shared code. It can make integration or collaboration more challenging.
    • Example: After squashing commits on a feature branch, another developer’s branch might experience conflicts during merging because the commit history has changed.
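
The history rewrite behind this peril can be observed directly. In the sketch below, a collaborator's branch (hypothetically named colleague-work) is created from the unsquashed feature branch. The feature branch is then squashed in place with git reset --soft, one of several ways to squash, and the two branches no longer share the squashed commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # throwaway repository in a temp dir
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo base > README && git add README && git commit -q -m "Base commit"

git checkout -q -b feature-branch
echo one > f.txt && git add f.txt && git commit -q -m "Step 1"
echo two >> f.txt && git commit -q -am "Step 2"

# A collaborator branches off the unsquashed history.
git branch colleague-work

before=$(git rev-parse feature-branch)
# Squash the branch in place: this replaces Step 1 and Step 2 with a new commit.
git reset -q --soft main
git commit -q -m "Steps 1 and 2 (squashed)"
after=$(git rev-parse feature-branch)

echo "tip before squash: $before"
echo "tip after squash:  $after"
if git merge-base --is-ancestor feature-branch colleague-work; then
  echo "colleague-work still builds on feature-branch"
else
  echo "colleague-work no longer builds on feature-branch"
fi
```

From Git's point of view, colleague-work now carries two commits that feature-branch does not have, even though the content is identical, which is why later merges or rebases between the two can produce conflicts.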

Squashing is useful for keeping a clean, concise commit history, especially when merging feature branches into a main branch. However, it should be used with caution, particularly in collaborative projects or those requiring detailed historical information. It's best to squash commits when:

  • The commits represent small, incremental changes that don’t need to be preserved individually.
  • The team agrees that the lost history is not critical.
  • Commit squashing occurs at the end of the feature branch’s lifecycle, before merging into the main branch, to avoid disrupting ongoing work.

Teams should strike a balance between keeping a clean history and retaining valuable commit information, especially in cases where future debugging, traceability, or detailed performance metrics are essential.

How is squashing handled in KEDEHub?

Since there are multiple ways to squash commits, the way data appears in KEDEHub can vary depending on your practices and the timing of your commits.

Understand how your organization handles squashing, and use that context to determine which metrics will be most relevant for you.

Local Squashing: If you squash locally before pushing, KEDEHub will only analyze the squashed commits.
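
On the Git side, local squashing before a push might look like the sketch below. It only illustrates what ends up on the remote; how KEDEHub then reads that history is not shown here. The bare repository standing in for the remote, the file work.txt, and the commit messages are assumptions made for the demo, and git reset --soft is just one of several ways to squash locally.

```shell
set -eu
work=$(mktemp -d)                             # throwaway directory for the demo
git init -q --bare "$work/remote.git"         # stand-in for the real remote
git init -q "$work/local" && cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"              # hypothetical identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo base > README && git add README && git commit -q -m "Base commit"
git push -q -u origin main

# Three local commits that have not been pushed yet.
for n in 1 2 3; do
  echo "step $n" >> work.txt
  git add work.txt
  git commit -q -m "WIP step $n"
done

# Squash everything that is not on the remote yet, then push one commit.
git reset -q --soft origin/main
git commit -q -m "Implement feature"
git push -q origin main

git log --oneline origin/main   # the base commit plus one squashed commit
```

Because the three WIP commits never reach the remote, any tool that reads the remote history sees only the single squashed commit.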

Remote Squashing: If you push commits before squashing, KEDEHub’s behavior depends on the order and timing of the analysis:

  • If KEDEHub analyzes commits before squashing, and the branch is not deleted, KEDEHub will analyze the pre-squashed commits initially and later remove them once the squashed commit is analyzed.
  • If the branch is squashed and deleted without merging, KEDEHub will continue to analyze the pre-squashed commits but won't show metrics for the squashed commit.
  • If the branch is squashed and merged into another branch before deletion, KEDEHub will analyze only the squashed commit from the merged branch.
  • If KEDEHub analyzes commits before squashing, and the branch is merged and analyzed again before deletion, KEDEHub will initially analyze the pre-squashed commits and remove them once the squashed commit is analyzed.
  • If KEDEHub does not analyze the branch before squashing and deletion, it will only show the squashed commit and never analyze the pre-squashed commits.

In summary, the timing of squashing and merging affects how KEDEHub displays your data. As a best practice, aim to squash and merge commits within a short timeframe, avoiding processes that span multiple days. Frequent, same-day squashing and merging of smaller commits works best.
