Merge Identities

Using KEDEMatcher

Overview

Identity matching in Git means recognizing which commit identities belong to the same developer. Developers commit under a range of email addresses: corporate, personal, or anonymous ones such as those ending in "users.noreply.github.com". The names they commit under also vary: full names with or without surnames, names with typographical errors, pseudonyms, and sometimes no name at all. The challenge is to aggregate these diverse identities for each developer while keeping them apart from the identities of other developers. Getting this right is essential for accurate information about a developer's contributions and activity in Git repositories.
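
To make the aggregation problem concrete, here is a minimal sketch that groups commit identities sharing a normalized email address or a normalized name. It is an illustration only and not KEDEMatcher's actual algorithm: the function names (normalize_email, normalize_name, group_identities) and the sample addresses are hypothetical.

```python
from collections import defaultdict


def normalize_email(email):
    """Trim and lower-case an email address; return None if it is missing."""
    email = (email or "").strip().lower()
    return email or None


def normalize_name(name):
    """Lower-case a name and collapse whitespace; return None if it is missing."""
    name = " ".join((name or "").lower().split())
    return name or None


def group_identities(identities):
    """Group (name, email) pairs that share a normalized email or name.

    Hypothetical helper: a simple union-find over exact matches, not the
    matching logic KEDEMatcher itself uses.
    """
    parent = list(range(len(identities)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (kind, value) -> index of the first identity using it
    for idx, (name, email) in enumerate(identities):
        keys = (("email", normalize_email(email)), ("name", normalize_name(name)))
        for key in keys:
            if key[1] is None:
                continue  # a missing name or email never links two identities
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, identity in enumerate(identities):
        groups[find(idx)].append(identity)
    return list(groups.values())


commits = [
    ("Jane Doe", "jane@corp.example"),
    ("jane doe", "jane@home.example"),   # same name, personal address
    ("J. Doe", "Jane@Corp.example"),     # same address, different name
    ("Bob", "bob@corp.example"),
    ("", ""),                            # missing name and email
]
groups = group_identities(commits)
```

Exact matching like this would wrongly merge two different developers who share a name and would miss names with typos, which is one reason a semi-automatic approach is useful.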

KEDEMatcher addresses this problem by performing semi-automatic identity recognition. The KEDEMatcher client application is an open-source project, hosted at https://github.com/kedehub/kedematcher. It must be installed on a computer with Python 3.11 and network access to the KEDEHub server. Macs with the Apple M1 architecture are not supported because of an issue with installing the apjv package.

Provision a new company

Install virtualenv

                            
pip install virtualenv
                            
                        

Install KEDEMatcher in virtual environment

                            
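# Clone the repository and enter it
git clone https://github.com/kedehub/kedematcher.git kedematcher
cd kedematcher/
# Create a virtual environment inside the clone and activate it
python3 -m virtualenv env
source env/bin/activate
# Upgrade pip, then install the dependencies
pip install pip --upgrade
pip install -r requirements.txt
pip install python-Levenshtein
pip install numpy --upgrade
# Leave the virtual environment
deactivate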
                            
                        

Configuration directory

KEDEMatcher uses the same configuration as KEDEGit. If KEDEGit is not already set up, set it up first as described in the KEDEGit documentation.

Reference

Merge identities for a single project

The identity-merge command will:

  1. Determine all authors who belong to the same individual
  2. Create a new KEDEHub user for that individual

To merge identities for a single project, run the command below. Make sure to pass the PROJECT_ID, not the project name:

                            
python3 -m kedehub identity-merge -p PROJECT_ID
                            
                        

The following is the output from the identity-merge command:

                            
(env) python3 -m kedehub identity-merge -p math
Confing dir: /home/ec2-user/.config/KedeGit
First pass...
Matching by: EmailMatcher:  75%|███████████████████████████████████████████████████████████████████████████▊                         | 3/4 [00:00<00:00, 20763.88it/s]
Matching by: EmailNameMatcher:  67%|██████████████████████████████████████████████████████████████████                                 | 2/3 [00:00<00:00, 348.09it/s]
Saving users: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.31it/s]
Successfully merged 3 into 3 users with 0 authors for 1 projects. Created 1 new users. 
Second pass...
Matching by: EmailNameMatcher:  67%|█████████████████████████████████████████████████████████████████▎                                | 2/3 [00:00<00:00, 3580.29it/s]
Saving users: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 69.72it/s]
Successfully merged 3 into 3 users with 0 authors for 1 projects. Created 0 new users. 
                            
                        

Merge identities for a company

To merge identities on all projects for a company, use:

                            
python3 -m kedehub identity-merge
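
When several projects need merging one after another, the two invocations above can be scripted. The following is a minimal sketch, assuming it runs inside the activated virtual environment. The helper names build_merge_command and merge_identities are hypothetical and not part of KEDEMatcher; only the identity-merge command and its -p option come from this page.

```python
import subprocess
import sys


def build_merge_command(project_id=None):
    """Return the argument list for identity-merge.

    Hypothetical helper: without a project_id the command covers all
    projects of the company, as described above.
    """
    command = [sys.executable, "-m", "kedehub", "identity-merge"]
    if project_id is not None:
        command += ["-p", project_id]
    return command


def merge_identities(project_ids):
    """Run identity-merge once per project ID, stopping at the first failure."""
    for project_id in project_ids:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_merge_command(project_id), check=True)
```

Calling merge_identities(["math"]) would run the same command as the single-project example. Using sys.executable keeps the call on the interpreter of the active virtual environment.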
                            
                        

Getting started