Analyzing a New Project: A Step-by-step Guide

Utilizing KEDEGit and KEDEMatcher to Compute KEDE and Related Statistics

New Project Initialization

In this article, we show the most efficient and straightforward way to initialize a new project in KEDEHub. The process uses only a subset of the available command-line commands; a comprehensive list of all commands can be found here.

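To initialize a project, run the following commands:

    # Clone the repository to analyze; repo_1 stands for the repository's URL
    git clone repo_1 ~/git/repo_1

    # Switch to the KEDEGit directory and activate its virtual environment
    cd ~/kedegit/
    source ~/kedegit/env/bin/activate

    # Run init-project inside screen: -U enables UTF-8 mode, -L logs the output to screenlog.0
    screen -U -L python3 -m kedehub init-project NEW_PROJECT ~/git/repo_1

    # After init-project finishes (or after you detach from screen), confirm the project is listed
    python3 -m kedehub list-projects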

The following is the output from the init-project command:

Assigned project ID: cli to project: CLI
Repository https://github.com/okteto/okteto
Processing commits:   0%|                                                                                                          | 0/3577 [00:00<?, ?it/s]
The diff is too large. We only processed the first 200 changed out of 1552 files
Processing commits:   0%|▏                                                                                                         | 6/3577 [00:03<19:10,  3.10it/s]
The diff is too large. We only processed the first 200 changed out of 231 files
Processing commits:   1%|▍                                                                                                         | 19/3577 [00:06<09:02,  6.56it/s]
The diff is too large. We only processed the first 200 changed out of 1301 files
Processing commits:   1%|█▎                                                                                                        | 51/3577 [00:11<08:17,  7.09it/s]
The diff is too large. We only processed the first 200 changed out of 1052 files
Processing commits:   4%|███▊                                                                                                      | 150/3577 [00:23<10:42,  5.34it/s]
Processing commits:   7%|██████▉                                                                                                      | 235/3577 [02:15<11:47,  4.72it/s]
The diff is too large. We only processed the first 200 changed out of 282 files
Processing commits:   7%|██████▉                                                                                                      | 239/3577 [04:20<10:43:53, 11.57s/it]
The diff is too large. We only processed the first 200 changed out of 463 files
Processing commits:  15%|███████████████▍                                                                                             | 522/3577 [04:53<04:50, 10.51it/s]
The diff is too large. We only processed the first 200 changed out of 2357 files
Processing commits:  54%|█████████████████████████████████████████████████████████                                                    | 1944/3577 [06:57<02:48,  9.70it/s]
The diff is too large. We only processed the first 200 changed out of 271 files
Processing commits:  84%|███████████████████████████████████████████████████████████████████████████████████████▊                     | 2990/3577 [08:20<00:48, 12.19it/s]
The diff is too large. We only processed the first 200 changed out of 393 files
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████    | 3577/3577 [09:08<00:00,  6.52it/s]
Updating templates for persons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:51<00:00,  1.63it/s]
Calculating Daily KEDE for persons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:15<00:00,  5.29it/s]
Calculating Weekly KEDE for persons: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:11<00:00,  7.12it/s]
Successfully initialized project with ID = cli

You may see the message "The diff is too large. We only processed the first 200 changed out of N files". It means that a single commit changed more than 200 files, and only the first 200 of them were processed.
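
To find out how many commits were truncated in a run, you can scan the screen log for this message. The sketch below is a hypothetical helper (the function name is ours, not part of KEDEGit). It assumes the output was logged to screenlog.0 by screen -L, as in the command above, and that the message keeps the exact wording shown in the example output.

```shell
# Hypothetical helper, not part of KEDEGit: summarize the
# "The diff is too large" messages found in a saved init-project log.
summarize_truncated_commits() {
    log_file="${1:-screenlog.0}"
    # Progress bars redraw with carriage returns; split them into lines first
    tr '\r' '\n' < "$log_file" |
        awk '/The diff is too large/ {
                 # "... the first 200 changed out of 1552 files":
                 # the limit is 5 fields from the end, the total 1 field from the end
                 limit = $(NF - 5)
                 total = $(NF - 1)
                 commits += 1
                 skipped += total - limit
             }
             END { printf "%d commits truncated, %d changed files skipped\n", commits, skipped }'
}
```

For example, run summarize_truncated_commits ~/kedegit/screenlog.0 after init-project finishes. For the nine messages shown in the example output above, it reports 9 truncated commits and 6102 skipped files.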

The command prints the new project ID at the start, as Assigned project ID: NEW_PROJECT_ID to project: NEW_PROJECT, and repeats it when it finishes, as Successfully initialized project with ID = NEW_PROJECT_ID. In the example above, the ID is cli for the project CLI. Make sure to note down NEW_PROJECT_ID: you will need it for all subsequent work on NEW_PROJECT.
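
If you prefer not to copy the ID by hand, it can be read back from the screen log. This is a minimal sketch with a hypothetical function name, assuming the log contains the "Assigned project ID: ... to project: ..." line in the format shown above.

```shell
# Hypothetical helper, not part of KEDEGit: print the project ID recorded in a
# saved init-project log, using the "Assigned project ID: ... to project: ..." line.
extract_project_id() {
    log_file="${1:-screenlog.0}"
    tr '\r' '\n' < "$log_file" |
        sed -n 's/.*Assigned project ID: \([^ ]*\) to project: .*/\1/p' |
        head -n 1
}
```

Usage: NEW_PROJECT_ID=$(extract_project_id ~/kedegit/screenlog.0). For the example output above, this yields cli, the value later passed to identity-merge -p.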

To detach from the screen session while init-project keeps running, press Ctrl+a, then d. The full list of screen key bindings can be found here. To reattach to a detached session, run screen -r. To check the screen log, run tail -n 10 screenlog.0 in ~/kedegit/, the directory from which screen was started.
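
Because the progress bars redraw themselves with carriage returns, tail -n 10 screenlog.0 can print long, overlapping lines. The sketch below (a hypothetical helper, assuming the log contains progress lines like those in the example output) prints only the most recent progress report.

```shell
# Hypothetical helper: print the most recent progress line from the screen log.
latest_progress() {
    log_file="${1:-screenlog.0}"
    # Split carriage-return redraws into separate lines, keep lines with a
    # progress counter such as "19/3577 [", and print the last one
    tr '\r' '\n' < "$log_file" | grep -E '[0-9]+/[0-9]+ \[' | tail -n 1
}
```

For example, run latest_progress ~/kedegit/screenlog.0 while init-project is still running in the detached session.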

                        

When you are done, deactivate the KEDEGit virtual environment:

    deactivate

Now you can proceed to merge the identities of the project's contributors.

Merging Identities

The identity-merge command serves to:

  1. Identify all authors associated with a single individual.
  2. Generate a new KEDEHub user for that individual.

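To merge identities for the new project, run the commands below. Pass the project ID to -p, not the project's name:

    # Switch to the KEDEMatcher directory and activate its virtual environment
    cd ~/kedematcher/
    source ~/kedematcher/env/bin/activate

    # Merge identities; replace NEW_PROJECT_ID with the ID assigned by init-project
    python3 -m kedehub identity-merge -p NEW_PROJECT_ID

    # Leave the virtual environment
    deactivate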

Here is an example output from the identity-merge command:

First pass...
Matching by: EmailMatcher:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████▊ | 84/85 [00:00<00:00, 1801.11it/s]
Matching by: EmailNameMatcher:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▊ | 84/85 [00:00<00:00, 97.06it/s]
Saving users: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:25<00:00,  2.71it/s]
Successfully merged 84 into 68 users with 16 authors for 1 projects. Created 68 new users. 
Second pass...
Matching by: EmailNameMatcher:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▊ | 84/85 [00:00<00:00, 95.72it/s]

Do you want to merge this author: Javier López Barba j...@...m> 
    into this user: Javier López Barba 
    with email:j...@...m ?(yes/no)yes

Do you want to merge this author: Francisco Miranda f...@...m 
    into this user: Francisco Miranda 
    with email:f...@..m ?(yes/no)yes
Saving users: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66/66 [00:00<00:00, 987.09it/s]
Successfully merged 68 into 66 users with 2 authors for 1 projects. Created 0 new users. 

(Note: To safeguard privacy, email addresses have been obfuscated.)

Please note the following:

  1. Initially, 85 git authors were detected as contributors to the project repository.
  2. After the first pass, 84 of these were automatically consolidated into 68 KEDEHub users.
  3. One git author was excluded by the applied filters, so no user was created for it.
  4. In the second pass, KEDEMatcher asked for human confirmation on two authors; after both merges were approved, the 68 users were consolidated into 66.
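
The counts in these observations can be cross-checked against the summary lines of the identity-merge output. The following sketch is a hypothetical helper (not part of KEDEMatcher). It assumes you saved that output to a text file, for example by running identity-merge under screen -L as with init-project, and that the summary lines keep the format shown above. For each pass it prints the count before and after merging and checks that the drop equals the reported number of authors, which holds for both passes in the example (84 - 68 = 16 and 68 - 66 = 2).

```shell
# Hypothetical helper, not part of KEDEMatcher: summarize each identity-merge pass.
# Assumes summary lines of the form
# "Successfully merged <before> into <after> users with <n> authors ..."
summarize_merge_passes() {
    log_file="$1"
    tr '\r' '\n' < "$log_file" |
        awk '/Successfully merged [0-9]+ into [0-9]+ users/ {
                 # Locate the word "merged" so leading text on the line does not matter
                 for (i = 1; i <= NF; i++) if ($i == "merged") break
                 before = $(i + 1); after = $(i + 3); authors = $(i + 6)
                 pass += 1
                 status = (before - after == authors) ? "consistent" : "MISMATCH"
                 printf "pass %d: %d -> %d users (%d authors reported, %s)\n", pass, before, after, authors, status
             }'
}
```

On the example output above, it reports 84 -> 68 users for the first pass and 68 -> 66 users for the second.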

Excluding Templates

Once the git authors have been consolidated into users, the next step is to identify potential template commits. Detailed instructions for this task are provided here.

Getting started