KEDEGit local client

How to use KEDEGit for calculating KEDE and other statistics

Overview

KEDEGit is a local Python application responsible for:

  1. Analyzing local Git repositories
  2. Sending commands to the KEDEHub SaaS

The client application KEDEGit is an open-source project, available at https://github.com/kedehub/kedegit. KEDEGit must be installed on a computer with Python 3.9 and network access to the folders where the target Git repositories are cloned. One organization can have multiple client applications installed on different computers, each analyzing a different set of Git repositories. For example, each department may have its own Git repositories and its own KEDEGit client application. However, all data collected by all KEDEGit client applications is stored under the same company name, so we recommend that organizations maintain only one KEDEGit client application.

Tip: We recommend installing only one KEDEGit local client for all your repositories.

Furthermore, the KEDEGit client can connect to popular code-sharing platforms like GitHub, GitLab, and Bitbucket, and clone Git repositories from these platforms. Organizations can use this feature if they want to centralize all KEDEHub-related activities on a single computer. This computer can be maintained by operations or system administrators from another department, instead of developers and their line managers.

Analysis is performed on local clones of Git repositories, ensuring:

  • Your source code and commit messages remain secure on your premises, with no transfer to KEDEHub
  • No capture of your intellectual property through source code analysis
  • No analysis of commit messages

Provision a new company

Install virtualenv

                            
pip install virtualenv
                            
                        

Install KEDEGit in a virtual environment

                            
git clone https://github.com/kedehub/kedegit.git kedegit

cd kedegit/

python3 -m virtualenv env

source ~/kedegit/env/bin/activate

pip install pip --upgrade

pip install -r requirements.txt

pip install python-Levenshtein

pip install numpy --upgrade

deactivate
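
KEDEGit needs Python 3.9, so you may want to confirm which interpreter the virtual environment provides. The following minimal sketch is not part of KEDEGit. It treats 3.9 as a minimum rather than an exact version, which is an assumption, and the function names are hypothetical. Run it after activating the environment with source ~/kedegit/env/bin/activate.

```python
import sys

# Assumption: Python 3.9 is treated as the minimum supported version.
MIN_VERSION = (3, 9)


def meets_min_version(version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """Return True when running inside a venv/virtualenv environment."""
    # venv and virtualenv >= 20 point sys.base_prefix at the base interpreter,
    # so it differs from sys.prefix inside an environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version OK:", meets_min_version(sys.version_info))
    print("Virtual environment active:", in_virtualenv())
```

Both lines should print True when the script runs inside the activated environment with a suitable interpreter.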
                            
                        

Configure KEDEGit

Configuration directory

KEDEGit uses Confuse to manage its configuration, with the application name KedeGit. The configuration paths for different platforms are listed in the Confuse documentation. You can also set an override configuration directory with an environment variable; for KEDEGit, the variable is KEDEGITDIR. This guide shows how to use KEDEGit on Amazon EC2, where the configuration directory is:

                                
    /home/ec2-user/.config/KedeGit
                                
                            
You need to create it before proceeding further.
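
The lookup described above can be sketched as follows. This is a simplified, Linux-only approximation of how Confuse picks the user configuration directory: it checks the KEDEGITDIR override first, then $XDG_CONFIG_HOME, then ~/.config. It ignores system-wide, macOS and Windows locations, so check the Confuse documentation for the full rules. The function names are hypothetical.

```python
import os
from pathlib import Path

APP_NAME = "KedeGit"          # application name used by KEDEGit (from this guide)
OVERRIDE_VAR = "KEDEGITDIR"   # override environment variable (from this guide)


def resolve_config_dir(env=None, home=None):
    """Approximate the Linux user configuration directory Confuse would use.

    Simplification: only the override variable and the XDG user directory
    are considered.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get(OVERRIDE_VAR):
        return Path(env[OVERRIDE_VAR])
    if env.get("XDG_CONFIG_HOME"):
        base = Path(env["XDG_CONFIG_HOME"])
    else:
        base = home / ".config"
    return base / APP_NAME


def ensure_config_dir():
    """Create the resolved directory (and any parents) if it is missing."""
    config_dir = resolve_config_dir()
    config_dir.mkdir(parents=True, exist_ok=True)
    return config_dir
```

On the EC2 instance used in this guide, with neither variable set, resolve_config_dir() yields /home/ec2-user/.config/KedeGit.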

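Allowed and excluded file types

Copy kede-config.json, which lists the allowed and excluded file types, to the configuration directory: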

                                
    cp docs/kede-config.json /home/ec2-user/.config/KedeGit
                                
                            
Edit kede-config.json if needed to match your architecture, technologies, and preferences.

Set up the configuration file

                                

    cp docs/empty_config.yaml /home/ec2-user/.config/KedeGit/config.yaml
                                
                            

Open config.yaml and add values for company name, user and token from your invitation email.

                            
    nano /home/ec2-user/.config/KedeGit/config.yaml
                            
                        
                            
    server:
      protocol: https
      host: api.kedehub.io
      port: 443

    company:
      name:
      user:
      token:
                            
                        

Test if everything is OK

                                
    source ~/kedegit/env/bin/activate

    python3 -m kedehub list-projects

    deactivate
                                
                            
That command should list all projects for a company. If the company has no projects yet, nothing will be listed.
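
If list-projects fails, one quick check is whether config.yaml parses into server and company sections with every value filled in. The sketch below is not part of KEDEGit. It assumes PyYAML is available in the virtual environment (Confuse depends on it), and its function names are hypothetical. A file whose nesting was lost also shows up as missing settings.

```python
REQUIRED = {
    "server": ("protocol", "host", "port"),
    "company": ("name", "user", "token"),
}


def missing_settings(config):
    """Return 'section.key' names that are absent or empty in a parsed config dict."""
    missing = []
    for section, keys in REQUIRED.items():
        values = config.get(section) or {}
        for key in keys:
            if values.get(key) in (None, ""):
                missing.append(f"{section}.{key}")
    return missing


def check_config(path):
    """Parse config.yaml with PyYAML and return the missing settings."""
    import yaml  # imported lazily so missing_settings() works without PyYAML

    with open(path, encoding="utf-8") as fh:
        return missing_settings(yaml.safe_load(fh) or {})
```

For example, check_config("/home/ec2-user/.config/KedeGit/config.yaml") returns an empty list once the company name, user and token from the invitation email are filled in.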

Reference

Initializing a New Project

Create a new project called PROJECT, consisting of a local Git repository located at ~/git/repo_1:

                            
    python3 -m kedehub init-project PROJECT ~/git/repo_1
                            
                        

PROJECT will be the name you see in the KEDEHub web client.

After the project is successfully initialized, the system displays its ID as follows: Assigned project ID: PROJECT_ID to project: PROJECT. Note down PROJECT_ID; you will need it for all subsequent work on PROJECT.

To ignore commits from the repository prior to a certain date, use the --earliest-commit-date option:

    python3 -m kedehub init-project PROJECT ~/git/repo_1  --earliest-commit-date "2020-01-01"

The following is the output from the init-project command:

                            
(venv39) $ python3 -m kedehub init-project kedegit ~/git/kedegit_public
Confing dir: /Users/user/.config/KedeGit
Adding repo: /Users/user/git/kedegit_public to project kedegit
KedeGit Init time 0:00:00.038585
Assigned project ID: kedegit to project: kedegit
Repository https://github.com/kedehub/kedegit.git
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:01<00:00, 12.89it/s]
Updating templates for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 22.04it/s]
Calculating Daily KEDE for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 20.89it/s]
Calculating Weekly KEDE for persons: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 24.78it/s]
Successfully initialized project with ID = kedegit 
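
If you create projects from a script, you can build the documented init-project command as an argument list and read the assigned project ID from the output, instead of copying it by hand. This is a minimal sketch with several assumptions:

- The regular expression is inferred from the sample output: an ID without spaces, and a name that may contain spaces.
- The YYYY-MM-DD date format is taken from the example.
- The helper names are hypothetical.

```python
import re
import shlex
import subprocess
from datetime import date
from pathlib import Path

# Matches e.g. "Assigned project ID: kedegit to project: kedegit".
# Assumption from the sample output: the ID has no spaces, the name may have some.
_ASSIGNED = re.compile(r"^Assigned project ID: (\S+) to project: (.+?)\s*$", re.MULTILINE)


def init_project_cmd(project, repo_path, earliest_commit_date=None):
    """Build the documented init-project command as an argument list."""
    cmd = ["python3", "-m", "kedehub", "init-project", project,
           str(Path(repo_path).expanduser())]
    if earliest_commit_date is not None:
        # Raises ValueError for strings that are not ISO dates, e.g. "2020-13-01".
        date.fromisoformat(earliest_commit_date)
        cmd += ["--earliest-commit-date", earliest_commit_date]
    return cmd


def assigned_project_ids(output):
    """Return (project_id, project_name) pairs found in KEDEGit output."""
    return _ASSIGNED.findall(output)


def run_init_project(project, repo_path, earliest_commit_date=None):
    """Run init-project and return the assigned (ID, name) pairs."""
    cmd = init_project_cmd(project, repo_path, earliest_commit_date)
    print("Running:", shlex.join(cmd))
    # stderr is merged into stdout because the guide does not say which
    # stream carries the "Assigned project ID" line.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, check=True)
    return assigned_project_ids(result.stdout)
```

The same parser also matches the "Assigned project ID" lines that bulk-import-repos prints.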
                            
                        

Adding a New Repository to an Existing Project

To add a new local Git repository located at ~/git/repo_2 to the existing project PROJECT_ID, run:

                            
    python3 -m kedehub add-repository PROJECT_ID ~/git/repo_2
                            
                        

To ignore commits from the repository prior to a certain date, use the --earliest-commit-date option:

                            
    python3 -m kedehub add-repository PROJECT_ID ~/git/repo_2  --earliest-commit-date "2020-01-01"
                            
                        

Importing Multiple Repositories

To add several local Git repositories located in one folder, such as ~/git/repos, to the project PROJECT, use the following command:

    python3 -m kedehub bulk-import-repos --workdir ~/git/repos -p PROJECT
                            
                        

The project name is optional. If the -p option is not used, a new project will be initialized for each of the repositories in the specified folder.
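
Before running bulk-import-repos, you can preview which folders it is likely to pick up. The sketch below lists the immediate subfolders of the work directory that contain a .git entry. It assumes bulk-import-repos treats each such subfolder as one repository, as the sample output suggests (context, apply and destroy-stack under ~/git/github_actions). The guide does not say whether nested folders are scanned. git_repos_in is a hypothetical helper.

```python
from pathlib import Path


def git_repos_in(workdir):
    """Return the immediate subfolders of `workdir` that look like Git clones."""
    root = Path(workdir).expanduser()
    # A normal clone has a ".git" directory, while worktrees use a ".git" file,
    # so only existence is checked.
    return sorted(p for p in root.iterdir() if p.is_dir() and (p / ".git").exists())


if __name__ == "__main__":
    import sys

    for repo in git_repos_in(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(repo.name)
```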

The following is the output from the bulk-import-repos command:

                                
Importing repo 1 of 3
Importing repo https://github.com/okteto/context with project name Github Actions
Adding repo: /home/ec2-user/git/github_actions/context to project Github Actions
KedeGit Init time 0:00:04.613556
Assigned project ID: github_actions to project: Github Actions
Repository https://github.com/okteto/context
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 86/86 [00:06<00:00, 12.85it/s]
Updating templates for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  7.31it/s]
Calculating Daily KEDE for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  5.61it/s]
Calculating Weekly KEDE for persons: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  7.39it/s]
Importing repo 2 of 3
Importing repo https://github.com/okteto/apply with project name Github Actions
Adding repo: /home/ec2-user/git/github_actions/apply to project Github Actions
KedeGit Init time 0:00:00.034135
Assigned project ID: github_actions to project: Github Actions
Repository https://github.com/okteto/apply
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:09<00:00,  8.53it/s]
Updating templates for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  7.42it/s]
Calculating Daily KEDE for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  4.99it/s]
Calculating Weekly KEDE for persons: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  6.90it/s]
Importing repo 3 of 3
Importing repo https://github.com/okteto/destroy-stack with project name Github Actions
Adding repo: /home/ec2-user/git/github_actions/destroy-stack to project Github Actions
KedeGit Init time 0:00:00.031647
Assigned project ID: github_actions to project: Github Actions
Repository https://github.com/okteto/destroy-stack
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 86/86 [00:05<00:00, 14.54it/s]
Updating templates for persons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00,  7.75it/s]
Calculating Daily KEDE for persons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  4.30it/s]
Calculating Weekly KEDE for persons: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00,  5.70it/s]
Successfully imported 3 out of 3 repos
                                
                            

Updating Local Project Repositories

New code is constantly contributed to all company repositories. To update local repositories from remote sources (performing a git pull), use the following command for all company projects:

                            
    python3 -m kedehub update-repos
                            
                        

Updating Project Statistics

The update-projects command performs the following actions:

  • Analyzes all new commits
  • Detects templates and fraudulent code for the new commits
  • Calculates KEDE and other statistics for the new commits

Updating All Project Statistics

To update the statistics for all projects within a company with new code contributions, execute:

                                
    python3 -m kedehub update-projects
                                
                            

Updating Specific Project Statistics

To update the statistics for a single existing project, execute:

                                
    python3 -m kedehub update-projects -p PROJECT_ID   
                                
                            

To update the statistics for multiple existing projects, execute:

                                
    python3 -m kedehub update-projects -p PROJECT_ID_1 PROJECT_ID_2  
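
To keep statistics current, for example from a nightly scheduled job, you can run update-repos followed by update-projects. The sketch below builds the documented update-projects command, for all projects or for the IDs passed with -p, and stops at the first failing step. It assumes python3 resolves to the KEDEGit virtual environment, so run it with the environment activated. The function names are hypothetical.

```python
import subprocess

BASE = ["python3", "-m", "kedehub"]


def update_projects_cmd(project_ids=()):
    """Build update-projects for all projects, or only for the given IDs (-p)."""
    cmd = BASE + ["update-projects"]
    if project_ids:
        cmd += ["-p", *project_ids]
    return cmd


def refresh_all(project_ids=()):
    """Pull new commits with update-repos, then update the statistics."""
    # update-repos always covers all company projects. check=True raises
    # CalledProcessError, so statistics are not updated if pulling failed.
    subprocess.run(BASE + ["update-repos"], check=True)
    subprocess.run(update_projects_cmd(project_ids), check=True)
```

For example, refresh_all(["PROJECT_ID_1", "PROJECT_ID_2"]) pulls all repositories and then updates only those two projects.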
                                
                            

Cleaning Project Data

In some cases, you may want to delete all calculated statistics for an existing project and re-analyze all commits. Use the --clean option to achieve this:

                                
    python3 -m kedehub update-projects -p PROJECT_ID --clean
                                
                            

Cloning Repositories to a Temporary Folder

To conserve space on the computer where an existing project's repositories are cloned, use the --temp option. It deletes the local clones of the selected project's repositories and clones them again into a temporary folder, which is removed after the analysis. Because the repositories are cloned afresh, all new commits are included. Execute the following command:

                                
    python3 -m kedehub update-projects -p PROJECT_ID   --temp
                                
                            

The command performs the following actions:

  1. Deletes the local clones of all project repositories listed in your config.yaml file
  2. Clones all the deleted repositories to a temporary folder
  3. Analyzes all commits
  4. Detects templates and fraudulent code
  5. Calculates KEDE and other statistics for the project
  6. Deletes the temporary folder
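
The clone, analyze and delete sequence above follows a common pattern, sketched below with Python's tempfile module. This illustrates the idea only and is not KEDEGit's implementation. clone and analyze are hypothetical callables, and clone defaults to a plain git clone.

```python
import subprocess
import tempfile
from pathlib import Path


def _git_clone(url, dest):
    """Default clone step: a quiet `git clone` into `dest`."""
    subprocess.run(["git", "clone", "--quiet", url, str(dest)], check=True)


def analyze_in_temp_clone(url, analyze, clone=_git_clone):
    """Clone `url` into a throwaway folder, run `analyze` on it, return its result.

    Leaving the with-block removes the temporary folder, even if a step fails.
    """
    with tempfile.TemporaryDirectory(prefix="kedegit-") as tmp:
        dest = Path(tmp) / "repo"
        clone(url, dest)
        return analyze(dest)
```

The design trades disk space for network traffic. Nothing is kept between runs, so every run downloads the full history again.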

Clone and analyze a single GitHub repository

To clone and analyze a single repository REPO from COMPANY_ORG with KEDEGit, use the clone-import-github-repo command with the --url option. Here is an example command that uses the GitHub access token GITHUB_ACCESS_TOKEN:

                            
    python3 -m kedehub clone-import-github-repo --workdir ~/git/COMPANY --url "https://github.com/COMPANY_ORG/REPO" --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN
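
Typing the token on the command line leaves it in your shell history. The sketch below reads it from an environment variable instead; the name GITHUB_ACCESS_TOKEN is a hypothetical choice. It also derives the --org value from the repository URL, assuming, as in the example, that the organization in --org matches the one in the URL. The token is still visible in the process's argument list while the command runs. The helper names are hypothetical.

```python
import os
import subprocess
from urllib.parse import urlparse


def github_org_and_repo(url):
    """Split 'https://github.com/ORG/REPO' (optionally ending in .git) into (ORG, REPO)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"not a GitHub repository URL: {url}")
    org, repo = parts
    return org, repo.removesuffix(".git")


def clone_import_repo_cmd(workdir, url, token, temp=False):
    """Build the documented clone-import-github-repo command, taking --org from the URL."""
    org, _ = github_org_and_repo(url)
    cmd = ["python3", "-m", "kedehub", "clone-import-github-repo",
           "--workdir", os.path.expanduser(workdir), "--url", url,
           "--org", org, "--token", token]
    if temp:
        cmd.append("--temp")
    return cmd


def run_clone_import(workdir, url, temp=False):
    """Run the command with the token taken from the environment."""
    token = os.environ["GITHUB_ACCESS_TOKEN"]  # hypothetical variable name
    subprocess.run(clone_import_repo_cmd(workdir, url, token, temp), check=True)
```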
                            
                        

The above command:

  1. Clones the COMPANY_ORG/REPO repository to the specified --workdir.
  2. Creates a new project for the REPO repository in the specified --workdir. The project name will be the repository name REPO. If a project with the same name exists, a sequential number will be added to the name (e.g., REPO1).
  3. Analyzes all commits for the REPO repository in the specified --workdir.
  4. Detects templates and fraudulent code
  5. Calculates KEDE and other statistics for the new project.
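
The naming rule in step 2 can be sketched as below. The guide only gives the example REPO1, so two details are assumptions: the suffix starts at 1, and the smallest free number is taken. next_project_name is a hypothetical helper.

```python
def next_project_name(repo, existing):
    """Return `repo`, or `repo` plus the smallest free suffix 1, 2, ... if taken."""
    if repo not in existing:
        return repo
    n = 1
    while f"{repo}{n}" in existing:
        n += 1
    return f"{repo}{n}"
```

When clone-import-github creates one project per repository, the same rule would presumably apply to each name in turn, with every chosen name added to the set of existing ones.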

Clone repository in a temporary folder

If you need to save space on the computer where the repository is cloned, you can use the --temp option. Here's an example command:

                            
    python3 -m kedehub clone-import-github-repo --workdir ~/git/COMPANY --url "https://github.com/COMPANY_ORG/REPO" --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN --temp
                            
                        

The above command:

  1. Clones the COMPANY_ORG/REPO repository to the specified --workdir in a temporary folder.
  2. Creates a new project for the REPO repository in the specified --workdir. The project name will be the repository name REPO. If a project with the same name exists, a sequential number will be added to the name (e.g., REPO1).
  3. Analyzes all commits for the REPO repository in the specified --workdir.
  4. Deletes the temporary folder.
  5. Detects templates and fraudulent code
  6. Calculates KEDE and other statistics for the new project.

Clone and analyze GitHub account

To save time when using KEDEHub, you can clone and analyze all repositories from a GitHub account using the command "python3 -m kedehub clone-import-github". For example:

                            
    python3 -m kedehub clone-import-github --workdir ~/git/COMPANY --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN
                            
                        

This command will clone all of the COMPANY_ORG repositories into --workdir and create a new project for each repository. The project name will be the same as the repository name, unless a project with that name already exists. In that case, a sequential number will be added to the project name. After cloning, the command will analyze all of the commits for each repository and calculate KEDE and other statistics for the new projects.

To put all of the repositories from the GitHub account under a new project named PROJECT, use the -p option:

                            
    python3 -m kedehub clone-import-github --workdir ~/git/COMPANY --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN -p PROJECT
                            
                        

Clone repositories in a temporary folder

To save space on the computer where the project repositories are cloned, use the --temp option.

                            
    python3 -m kedehub clone-import-github --workdir ~/git/COMPANY --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN --temp
                            
                        

The --temp option will clone all of the repositories in --workdir into a temporary folder. It then creates a new project for each repository and analyzes all commits, calculating KEDE and other statistics for each new project. The temporary folder is then deleted.

Detecting Templates and Fraudulent Code

Source code templates are one or more source code files, or pieces of code copied from a source and added to a source code file. In software development, a template could be a ready-to-use open-source framework.

Fraud occurs when one or more source code files are added to a repository, or a piece of source code is added to a source code file with the sole purpose of manipulating KEDE and other statistics. In software development, fraud can be committed by adding dead code (unused classes and methods) to a source code file.

KEDEHub features a Templates & Fraud Detection Engine to identify commits made of source code templates or source code committed to manipulate statistics. If not filtered out, such commits can artificially inflate the statistics of a software developer, team, or project. The filtering is done automatically before KEDEHub calculates KEDE and the other statistics.
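
The effect of the filtering can be illustrated with the two commits from the interactive-mode example later in this guide: 13,605 and 5,963 added characters on 2020-01-28, 19,568 in total. The sketch only shows flagged commits being excluded before totals are computed. It does not show how the engine decides that a commit is a template, or how KEDE is then calculated. The Commit type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    hexsha: str
    added_chars: int


def added_chars_excluding(commits, template_shas):
    """Sum added characters, skipping commits flagged as templates or fraud."""
    return sum(c.added_chars for c in commits if c.hexsha not in template_shas)
```

Flagging the first commit as a template reduces that day's total from 19,568 to 5,963 characters.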

In some cases, you may want to re-apply the Templates & Fraud Detection Engine to a specific software developer or project. To do so, use the KEDEGit templates command.

Tip: After re-filtering templates, remember to run the stats command to recalculate KEDE and the other statistics.

KEDEGit can operate in two modes concerning template commits: automatic and interactive.

Automatic Mode

In this mode, KEDEHub updates templates without user interaction. To update all templates for a project, use the -p option:

                                
    python3 -m kedehub templates update -p PROJECT_ID
                                
                            

To update templates for a specific author on a specific project, use the -p and -a options:

                                
    python3 -m kedehub templates update -p PROJECT_ID -a AUTHOR
                                
                            

To update templates for a specific author across all projects, use the -a option:

                                
    python3 -m kedehub templates update -a AUTHOR
                                
                            

Interactive Mode

In this mode, KEDEHub shows all suspected template commits and updates templates after asking the user for consent. Interactive mode is selected with the find subcommand. To find templates for a specific author across all projects, use the -a option:

                                
    python3 -m kedehub templates find -a AUTHOR
                                
                            

To change the reporting interval, use the -r option. The default is quarterly (q). Other values include w for weekly and d for daily:

                                
    python3 -m kedehub templates find -p PROJECT_ID -r w
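
The three reporting intervals group commits by period. The sketch below shows one way to label the period a date falls into. Using ISO weeks (Monday to Sunday) and calendar quarters is an assumption, since the guide does not say how KEDEGit defines its periods. period_key is a hypothetical helper.

```python
from datetime import date


def period_key(day, interval="q"):
    """Label the reporting period containing `day` for interval d, w or q."""
    if interval == "d":
        return day.isoformat()
    if interval == "w":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if interval == "q":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    raise ValueError(f"unknown interval: {interval!r}")
```

With ISO weeks, a date such as 2019-12-30 falls in week 2020-W01 but in quarter 2019-Q4.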
                                
                            

When the templates command enters interactive mode, it displays the outliers found and guides the user through its interactive command loop. The loop shows the list of outliers and prompts the user with "Type 'y' if this commit is a template".

                                
    ----------------------------------------------------------------
    Templates finding for: AUTHOR
    ----------------------------------------------------------------

    For date: 2020-01-28 there are total of 19,568 chars added
            Out of all daily commits pick commits that are template.
                    repository: REPO_1.git
                    commit time: 2020-01-28T13:11:58.000000000
                    hexsha: 51508442cbae9075709c1800eea0abc85e6b6f91
                    added chars: 13,605
                            Type "y" if this commit is a template: y
                    repository: REPO_1.git
                    commit time: 2020-01-28T15:37:08.000000000
                    hexsha: 64593bd87d65c5883e163c171a4939b05ab1b386
                    added chars: 5,963
                            Type "y" if this commit is a template: y
                                
                            

The commits listed for a day may come from more than one repository. For each commit, the user can type "y" for "yes" or press Enter for "no".

At the end, the selected templates are displayed.

                                

    Below are the templates you have selected:
    +--------------------------------------------+------------------------------------------+-------------------+
    | Repository                                 |            hexsha of the template commit | Chars of template |
    +--------------------------------------------+------------------------------------------+-------------------+
    | REPO_1.git                                 | 60d327a5eee3a297aaef79dde7846551b1d4c2d1 |             7,914 |
    +--------------------------------------------+------------------------------------------+-------------------+
                                
                            

The user is then asked, "Are you ready to save the above templates?" The user can type "y" for "yes" or press Enter for "no". If the answer is "yes", the selected outliers are added to the templates for the person being examined.
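
The interactive flow above asks for one y/Enter answer per suspected commit, then a final confirmation. It can be sketched as follows. This is not KEDEGit's code:

- The prompts are shortened.
- ask stands in for input() so the logic can be tested.
- Accepting an upper-case Y or surrounding spaces is a choice of this sketch.

```python
def is_yes(answer):
    """Only "y" counts as yes; Enter (empty input) or anything else means no."""
    return answer.strip().lower() == "y"


def choose_templates(commits, ask=input):
    """Ask about each suspected commit, then confirm before returning the selection.

    Returns the selected commits, or an empty list if the user declines to save.
    """
    selected = [c for c in commits
                if is_yes(ask(f'{c}: Type "y" if this commit is a template: '))]
    if selected and is_yes(ask("Are you ready to save the above templates? ")):
        return selected
    return []
```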

Calculating KEDE for a Project

Calculate daily and weekly KEDE for the project PROJECT_ID:

                            
    python3 -m kedehub stats calculate-kede -p PROJECT_ID

    python3 -m kedehub stats calculate-weekly-kede -p PROJECT_ID
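
The guide shows -p with a single project ID for the stats commands, so to recalculate several projects you can loop over them. The following minimal sketch, with hypothetical helper names, runs the daily command before the weekly one for each project and stops at the first failure.

```python
import subprocess

BASE = ["python3", "-m", "kedehub", "stats"]


def kede_cmds(project_id):
    """Daily, then weekly, KEDE commands for one project."""
    return [BASE + ["calculate-kede", "-p", project_id],
            BASE + ["calculate-weekly-kede", "-p", project_id]]


def calculate_kede(project_ids):
    """Run both commands for every project, stopping at the first failure."""
    for project_id in project_ids:
        for cmd in kede_cmds(project_id):
            subprocess.run(cmd, check=True)
```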
                            
                        

Fixing Incorrectly Calculated KEDE for a Project

After you finish adding repositories to a project, we recommend recalculating KEDE. To recalculate KEDE for PROJECT_ID, run:

                            
    python3 -m kedehub fix-kede -p PROJECT_ID
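
Following the recommendation above, a script that adds several repositories can run fix-kede once, after the last add-repository. The sketch builds the documented commands in that order. The function names are hypothetical.

```python
import subprocess

BASE = ["python3", "-m", "kedehub"]


def add_repos_then_fix_cmds(project_id, repo_paths):
    """add-repository for each path, followed by a single fix-kede."""
    cmds = [BASE + ["add-repository", project_id, path] for path in repo_paths]
    cmds.append(BASE + ["fix-kede", "-p", project_id])
    return cmds


def run_all(cmds):
    """Run the commands in order; check=True stops at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, run_all(add_repos_then_fix_cmds("PROJECT_ID", ["/home/ec2-user/git/repo_2", "/home/ec2-user/git/repo_3"])) would add both repositories and then recalculate KEDE once. Here repo_3 is a placeholder path.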
                            
                        

Getting started