KEDEGit local client
How to use KEDEGit for calculating KEDE and other statistics
Overview
KEDEGit is a local Python application responsible for:
- Analyzing local Git repositories
- Sending commands to the KEDEHub
The client application KEDEGit is an open-source project, which can be found here. KEDEGit must be installed on a computer with Python 3.11 and network access to the folders where the target Git repositories are cloned. One organization can have multiple client applications installed on different computers, each analyzing a different set of Git repositories. For example, each department may have its own Git repositories and its own KEDEGit client application. However, all data collected by all KEDEGit client applications will be stored under the company name. Therefore, we recommend organizations maintain only one KEDEGit client application.
Tip: We recommend installing only one KEDEGit local client for all your repositories.
Furthermore, the KEDEGit client can connect to popular code-sharing platforms like GitHub, GitLab, and Bitbucket, and clone Git repositories from these platforms. Organizations can use this feature if they want to centralize all KEDEHub-related activities on a single computer. This computer can be maintained by operations or system administrators from another department, instead of developers and their line managers.
Analysis is performed on local clones of Git repositories, ensuring:
- Your source code and commit messages remain secure on your premises, with no transfer to KEDEHub
- No capture of your intellectual property through source code analysis
- No analysis of commit messages
Configure KEDEGit
Configuration directory
KEDEGit uses Confuse for managing its configuration. In our case the application name is KedeGit. The configuration paths for different platforms are listed here. Users can also add an override configuration directory with an environment variable. The environment variable name for KEDEGit is KEDEGITDIR.
In this guide the configuration directory is:
- For Mac/Linux containers: ~/git/kedegit/docs
- For Windows containers: %HOME%\git\kedegit\docs
- For Mac executables: ~/.config/KedeGit
- For Linux executables: ~/.config/KedeGit
- For Windows executables: %HOME%\AppData\Roaming\KedeGit
If you run KEDEGit and KEDEMatcher as containers you can change it later to any other directory.
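The lookup order described above (an environment override first, then a per-platform default) can be sketched in a few lines of Python. This is an approximation for illustration only: the function name resolve_config_dir is hypothetical, the defaults are the executable paths listed in this guide, and Confuse's own search covers more locations than shown here.

```python
import os
import sys


def resolve_config_dir(environ=None, platform=None):
    """Return the directory this guide expects KEDEGit to read its config from.

    Approximation only: Confuse itself checks more candidate locations.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    # An explicit KEDEGITDIR override always wins.
    override = environ.get("KEDEGITDIR")
    if override:
        return override

    if platform.startswith("win"):
        # Mirrors the Windows executable path listed in this guide.
        home = environ.get("HOME", "%HOME%")
        return home + "\\AppData\\Roaming\\KedeGit"

    # Mac and Linux executables share the same default in this guide.
    home = environ.get("HOME", "~")
    return home + "/.config/KedeGit"


if __name__ == "__main__":
    print(resolve_config_dir())
```

Passing environ and platform explicitly makes it easy to check what the result would be on another machine without changing the real environment.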
Set Allowed and excluded file types
Allowed and excluded file types are in ~/git/kedegit/docs/kede-config.json. Edit the file if needed to match your architecture, technology and preferences.
Copy kede-config.json to the same configuration directory.
Set configuration file
For Mac/Linux, execute the command below to create a new empty config.yaml.
cp ~/git/kedegit/docs/docker_empty_config.yaml ~/git/kedegit/docs/config.yaml
For Windows, copy docker_empty_config.yaml as config.yaml.
Open config.yaml and add values for company name, user and token from your invitation email.
server:
  protocol: http
  host: host.docker.internal
  port: 5400
company:
  name:
  user:
  token:
If you are using Podman, replace host: host.docker.internal with host: host.containers.internal as the hostname used to access services running on the host machine.
This hostname resolves to an IP address assigned to the host on a bridge network managed by Podman.
If you are running KEDEGit as an executable, replace host: host.docker.internal with host: localhost as the hostname used to access services running on the host machine.
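The choice of host value depends only on how KEDEGit is run, so it can be captured in a small lookup. The sketch below generates the config.yaml text for each case. The function name render_config and the runtime labels are hypothetical, and the nesting of the keys under server and company is an assumption about the file layout rather than something this guide states explicitly.

```python
# Hostnames named in this guide for each way of running KEDEGit.
HOSTS = {
    "docker": "host.docker.internal",
    "podman": "host.containers.internal",
    "executable": "localhost",
}


def render_config(runtime, name="", user="", token=""):
    """Build config.yaml text; the nested key layout is an assumption."""
    host = HOSTS[runtime]  # raises KeyError for an unknown runtime label
    lines = [
        "server:",
        "  protocol: http",
        f"  host: {host}",
        "  port: 5400",
        "company:",
        f"  name: {name}",
        f"  user: {user}",
        f"  token: {token}",
    ]
    # Strip trailing spaces left behind by empty values.
    return "\n".join(line.rstrip() for line in lines) + "\n"


if __name__ == "__main__":
    print(render_config("podman"), end="")
```

The company name, user, and token still have to come from your invitation email; the sketch leaves them empty by default.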
Commands Reference
Initializing a New Project
Create a new project called PROJECT, consisting of a local Git repository located at ~/git/repo_1:
kedehub init-project PROJECT ~/git/repo_1
PROJECT will be the name you see in KEDEHub web client as explained here.
To ignore commits from the repository prior to a certain date, use the --earliest-commit-date option:
kedehub init-project PROJECT ~/git/repo_1 --earliest-commit-date "2020-01-01"
After successfully initializing the new project, the system will display the new project ID as follows:
Assigned project ID: PROJECT_ID to project: PROJECT
Make sure to note down the PROJECT_ID, as it is required by all subsequent commands that work with the newly created PROJECT.
The following is the output from the init-project command:
Confing dir: /Users/user/.config/KedeGit
Adding repo: /Users/user/git/kedegit_public to project kedegit
KedeGit Init time 0:00:00.038585
Assigned project ID: kedegit to project: kedegit
Repository https://github.com/kedehub/kedegit.git
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:01<00:00, 12.89it/s]
Updating templates for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 22.04it/s]
Calculating Daily KEDE for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 20.89it/s]
Calculating Weekly KEDE for persons: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 24.78it/s]
Successfully initialized project with ID = kedegit
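Because the project ID is needed by later commands, it can be convenient to pull it out of the command output automatically. The following is a minimal sketch that scans captured output for the "Assigned project ID" line. The function name find_project_id is hypothetical, and the pattern assumes that project IDs contain no whitespace, which matches the sample outputs in this guide but is not otherwise confirmed.

```python
import re

# Matches lines such as: Assigned project ID: kedegit to project: kedegit
ASSIGNED = re.compile(
    r"^Assigned project ID: (?P<id>\S+) to project: (?P<name>.+)$"
)


def find_project_id(output):
    """Return (project_id, project_name) from captured output, or None."""
    for line in output.splitlines():
        match = ASSIGNED.match(line.strip())
        if match:
            return match.group("id"), match.group("name")
    return None


if __name__ == "__main__":
    sample = (
        "KedeGit Init time 0:00:00.038585\n"
        "Assigned project ID: kedegit to project: kedegit\n"
    )
    print(find_project_id(sample))
```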
Adding a New Repository to an Existing Project
Add a new local Git repository located at ~/git/repo_2 to PROJECT_ID:
kedehub add-repository PROJECT_ID ~/git/repo_2
To ignore commits from the repository prior to a certain date, use the --earliest-commit-date option:
kedehub add-repository PROJECT_ID ~/git/repo_2 --earliest-commit-date "2020-01-01"
Importing Multiple Repositories
To add several new local Git repositories located in a specific folder, such as ~/git/repos, to a PROJECT, use the following command:
kedehub bulk-import-repos --workdir ~/git/repos -p PROJECT
The project name is optional.
If the -p option is not used, a new project will be initialized for each of the repositories in the specified folder.
The following is the output from the bulk-import-repos command:
Importing repo 1 of 3
Importing repo https://github.com/okteto/context with project name Github Actions
Adding repo: /home/ec2-user/git/github_actions/context to project Github Actions
KedeGit Init time 0:00:04.613556
Assigned project ID: github_actions to project: Github Actions
Repository https://github.com/okteto/context
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 86/86 [00:06<00:00, 12.85it/s]
Updating templates for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 7.31it/s]
Calculating Daily KEDE for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.61it/s]
Calculating Weekly KEDE for persons: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 7.39it/s]
Importing repo 2 of 3
Importing repo https://github.com/okteto/apply with project name Github Actions
Adding repo: /home/ec2-user/git/github_actions/apply to project Github Actions
KedeGit Init time 0:00:00.034135
Assigned project ID: github_actions to project: Github Actions
Repository https://github.com/okteto/apply
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:09<00:00, 8.53it/s]
Updating templates for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00, 7.42it/s]
Calculating Daily KEDE for persons: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00, 4.99it/s]
Calculating Weekly KEDE for persons: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00, 6.90it/s]
Importing repo 3 of 3
Importing repo https://github.com/okteto/destroy-stack with project name Github Actions
Adding repo: /home/ec2-user/git/github_actions/destroy-stack to project Github Actions
KedeGit Init time 0:00:00.031647
Assigned project ID: github_actions to project: Github Actions
Repository https://github.com/okteto/destroy-stack
Processing commits: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 86/86 [00:05<00:00, 14.54it/s]
Updating templates for persons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 7.75it/s]
Calculating Daily KEDE for persons: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:02<00:00, 4.30it/s]
Calculating Weekly KEDE for persons: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 5.70it/s]
Successfully imported 3 out of 3 repos
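Before running a bulk import it can help to preview which folders under the working directory look like Git clones. The sketch below lists the immediate subfolders that contain a .git entry. It is an independent check, not KEDEGit's own discovery logic: the function name list_git_repos is hypothetical, and whether bulk-import-repos considers only immediate subfolders is an assumption.

```python
from pathlib import Path


def list_git_repos(workdir):
    """Return names of immediate subfolders of workdir that contain .git."""
    root = Path(workdir).expanduser()
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / ".git").exists()
    )


if __name__ == "__main__":
    folder = Path("~/git/repos").expanduser()
    if folder.is_dir():
        for name in list_git_repos(folder):
            print(name)
```

Comparing the number of folders listed with the "Successfully imported N out of M repos" line is a quick way to notice a clone that was skipped.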
Updating Local Project Repositories
New code is constantly contributed to all company repositories.
To update local repositories from remote sources (performing a git pull), use the following command for all company projects:
kedehub update-repos
Updating Project Statistics
The update-projects command performs the following actions:
- Analyzes all new commits
- Detects templates and fraudulent code for the new commits
- Calculates KEDE and other statistics for the new commits
Updating All Project Statistics
To update the statistics for all projects within a company with new code contributions, execute:
kedehub update-projects
Updating Specific Project Statistics
To update the statistics for a single existing project, execute:
kedehub update-projects -p PROJECT_ID
To update the statistics for multiple existing projects, execute:
kedehub update-projects -p PROJECT_ID_1 PROJECT_ID_2
Cleaning Project Data
In some cases, you may want to delete all calculated statistics for an existing project and re-analyze all commits.
Use the --clean option to achieve this:
kedehub update-projects -p PROJECT_ID --clean
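A regular refresh usually means pulling the repositories first and then updating the statistics. The sketch below only builds the argument lists for those two commands so that they can be passed to subprocess.run or a scheduler. The function name build_update_commands is hypothetical, and rejecting --clean without a project ID is a conservative choice made here because this guide only shows --clean together with -p.

```python
def build_update_commands(project_ids=(), clean=False):
    """Return argv lists for a pull followed by a statistics update."""
    if clean and not project_ids:
        # This guide only shows --clean combined with -p.
        raise ValueError("--clean is only shown together with -p in this guide")

    update = ["kedehub", "update-projects"]
    if project_ids:
        update += ["-p", *project_ids]
    if clean:
        update.append("--clean")
    return [["kedehub", "update-repos"], update]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in build_update_commands(["PROJECT_ID_1", "PROJECT_ID_2"]):
        print(" ".join(argv))
```

To execute the commands, each list can be handed to subprocess.run(argv, check=True), which stops the sequence at the first failure.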
Cloning Repositories to a Temporary Folder
To conserve space on the computer where an existing project's repositories are cloned, use the --temp option.
This option deletes the local cloned repositories for the selected project and then clones the repositories again.
This ensures that all new commits are included in the local clones.
Execute the following command:
kedehub update-projects -p PROJECT_ID --temp
The command performs the following actions:
- Deletes the local clones of all project repositories listed in your config.yaml file
- Clones all the deleted repositories to a temporary folder
- Analyzes all commits
- Detects templates and fraudulent code
- Calculates KEDE and other statistics for the project
- Deletes the temporary folder
Clone and analyze a single GitHub repository
To clone and analyze a single repository REPO from COMPANY_ORG using KEDEGit, use the clone-import-github-repo command with the --url option. Here is an example command with a GitHub access token GITHUB_ACCESS_TOKEN:
kedehub clone-import-github-repo --workdir ~/git/COMPANY --url "https://github.com/COMPANY_ORG/REPO" --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN
The above command:
- Clones the COMPANY_ORG/REPO repository to the specified --workdir.
- Creates a new project for the REPO repository in the specified --workdir. The project name will be the repository name REPO. If a project with the same name exists, a sequential number will be added to the name (e.g., REPO1).
- Analyzes all commits for the REPO repository in the specified --workdir.
- Detects templates and fraudulent code
- Calculates KEDE and other statistics for the new project.
Clone repository in a temporary folder
If you need to save space on the computer where the repository is cloned, you can use the --temp option. Here's an example command:
kedehub clone-import-github-repo --workdir ~/git/COMPANY --url "https://github.com/COMPANY_ORG/REPO" --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN --temp
The above command:
- Clones the COMPANY_ORG/REPO repository to the specified --workdir in a temporary folder.
- Creates a new project for the REPO repository in the specified --workdir. The project name will be the repository name REPO. If a project with the same name exists, a sequential number will be added to the name (e.g., REPO1).
- Analyzes all commits for the REPO repository in the specified --workdir.
- Deletes the temporary folder.
- Detects templates and fraudulent code
- Calculates KEDE and other statistics for the new project.
Clone and analyze GitHub account
To save time when using KEDEHub, you can clone and analyze all repositories from a GitHub account using the command "kedehub clone-import-github". For example:
kedehub clone-import-github --workdir ~/git/COMPANY --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN
This command will clone all of the COMPANY_ORG repositories into --workdir and create a new project for each repository. The project name will be the same as the repository name, unless a project with that name already exists. In that case, a sequential number will be added to the project name. After cloning, the command will analyze all of the commits for each repository and calculate KEDE and other statistics for the new projects.
If you want to put all of the repositories for the GitHub account under a new project named PROJECT, you can use the -p option:
kedehub clone-import-github --workdir ~/git/COMPANY --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN -p PROJECT
Clone repositories in a temporary folder
To save space on the computer where the project repositories are cloned, you can use the --temp option:
kedehub clone-import-github --workdir ~/git/COMPANY --org COMPANY_ORG --token GITHUB_ACCESS_TOKEN --temp
The --temp option will clone all of the repositories in --workdir into a temporary folder. It then creates a new project for each repository and analyzes all commits, calculating KEDE and other statistics for each new project. The temporary folder is then deleted.
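Typing an access token directly on the command line leaves it in the shell history, so one option is to read it from the environment and build the command from there. The sketch below shows this for clone-import-github. The helper names are hypothetical, reading the token from an environment variable called GITHUB_ACCESS_TOKEN is an assumption made for this example rather than a KEDEGit feature, and combining -p with --temp is not shown in this guide.

```python
import os


def token_from_env(environ=None, var="GITHUB_ACCESS_TOKEN"):
    """Read the GitHub token from the environment (variable name is assumed)."""
    environ = os.environ if environ is None else environ
    token = environ.get(var)
    if not token:
        raise RuntimeError(f"Set {var} before running the import")
    return token


def build_clone_import(workdir, org, token, project=None, temp=False):
    """Return the argv list for kedehub clone-import-github."""
    argv = [
        "kedehub", "clone-import-github",
        # Expand ~ here because no shell will do it for us.
        "--workdir", os.path.expanduser(workdir),
        "--org", org,
        "--token", token,
    ]
    if project:
        argv += ["-p", project]
    if temp:
        argv.append("--temp")
    return argv


if __name__ == "__main__":
    demo = build_clone_import("/tmp/COMPANY", "COMPANY_ORG", "<token>", temp=True)
    print(" ".join(demo))
```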
Detecting Templates and Fraudulent Code
Source code templates are one or more source code files, or pieces of code copied from a source and added to a source code file. In software development, a template could be a ready-to-use open-source framework.
Fraud occurs when one or more source code files are added to a repository, or a piece of source code is added to a source code file with the sole purpose of manipulating KEDE and other statistics. In software development, fraud can be committed by adding dead code (unused classes and methods) to a source code file.
KEDEHub features a Templates & Fraud Detection Engine to identify commits made of source code templates or source code committed to manipulate statistics. If not filtered out, such commits can artificially inflate the statistics of a software developer, team, or project. The filtering is done automatically before KEDEHub calculates KEDE and the other statistics.
In some cases, you may want to re-apply the Templates & Fraud Detection Engine to a specific software developer or project.
For that you can use KEDEGit and the templates command.
Tip: After re-filtering templates, do not forget to run the stats command to recalculate KEDE and the other statistics.
KEDEGit can operate in two modes concerning template commits: automatic and interactive.
Automatic Mode
In this mode, KEDEHub updates templates without user interaction.
To update all templates for a project, use the -p option:
python -m kedehub templates update -p PROJECT_ID
To update templates for a specific author on a specific project, use the -p and -a options:
python -m kedehub templates update -p PROJECT_ID -a AUTHOR
To update templates for a specific author across all projects, use the -a option:
python -m kedehub templates update -a AUTHOR
Interactive Mode
In this mode, KEDEHub shows all suspected template commits and updates templates after asking the user for consent.
Interactive mode is set with the find option.
To find templates for a specific author across all projects, use the -a option:
python -m kedehub templates find -a AUTHOR
To change the reporting interval, use the -r option.
The default is quarterly (q).
Other values include w for weekly and d for daily:
python -m kedehub templates find -p PROJECT_ID -r w
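The options for interactive mode combine freely in the examples above, so a small helper can assemble the command from named arguments. The sketch below maps readable interval names to the single-letter values documented here. The function name build_templates_find is hypothetical, and only the three interval values named in this guide are included; other values may exist.

```python
import sys

# Interval letters documented in this guide; other values may exist.
INTERVALS = {"quarterly": "q", "weekly": "w", "daily": "d"}


def build_templates_find(author=None, project_id=None, interval="quarterly"):
    """Return the argv list for an interactive templates search."""
    argv = [sys.executable, "-m", "kedehub", "templates", "find"]
    if project_id:
        argv += ["-p", project_id]
    if author:
        argv += ["-a", author]
    # Raises KeyError for an interval name not listed above.
    argv += ["-r", INTERVALS[interval]]
    return argv


if __name__ == "__main__":
    print(" ".join(build_templates_find(project_id="PROJECT_ID", interval="weekly")))
```

Using sys.executable keeps the command tied to the interpreter of the active virtual environment, which matters when KEDEGit is installed in one.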
When the templates command enters interactive mode, it displays the outliers found and guides the user through its interactive command loop.
The loop shows the list of outliers and prompts the user with "Type 'y' if this commit is a template".
----------------------------------------------------------------
Templates finding for: AUTHOR
----------------------------------------------------------------
For date: 2020-01-28 there are total of 19,568 chars added
Out of all daily commits pick commits that are template.
repository: REPO_1.git
commit time: 2020-01-28T13:11:58.000000000
hexsha: 51508442cbae9075709c1800eea0abc85e6b6f91
added chars: 13,605
Type "y" if this commit is a template: y
repository: REPO_1.git
commit time: 2020-01-28T15:37:08.000000000
hexsha: 64593bd87d65c5883e163c171a4939b05ab1b386
added chars: 5,963
Type "y" if this commit is a template: y
There may be more than one repository for a commit. The user can type "y" for "yes" or press Enter for "no".
At the end, the selected templates are displayed.
Below are the templates you have selected:
+--------------------------------------------+------------------------------------------+-------------------+
| Repository | hexsha of the template commit | Chars of template |
+--------------------------------------------+------------------------------------------+-------------------+
| REPO_1.git | 60d327a5eee3a297aaef79dde7846551b1d4c2d1 | 7,914 |
+--------------------------------------------+------------------------------------------+-------------------+
The user is then asked, "Are you ready to save the above templates?" The user can type "y" for "yes" or press Enter for "no". If "yes", the selected outliers are added to the templates for the person being examined.
Calculating KEDE for a Project
Calculate daily and weekly KEDE for the PROJECT.
kedehub stats calculate-kede -p PROJECT_ID
kedehub stats calculate-weekly-kede -p PROJECT_ID
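When templates have been re-filtered for a project, the statistics need to be recalculated afterwards, so the three commands are often run as one sequence. The sketch below returns them in the order they appear in this guide. The function name refresh_project is hypothetical, and whether the daily calculation must precede the weekly one is not stated here; the order simply follows the examples.

```python
def refresh_project(project_id):
    """Return the commands to re-filter templates and recalculate KEDE."""
    return [
        # Re-apply template detection in automatic mode.
        ["python", "-m", "kedehub", "templates", "update", "-p", project_id],
        # Recalculate daily KEDE, then weekly KEDE.
        ["kedehub", "stats", "calculate-kede", "-p", project_id],
        ["kedehub", "stats", "calculate-weekly-kede", "-p", project_id],
    ]


if __name__ == "__main__":
    for argv in refresh_project("PROJECT_ID"):
        print(" ".join(argv))
```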
Fixing Incorrectly Calculated KEDE for a Project
After you finish adding repositories to a project, it is recommended to recalculate KEDE. To recalculate KEDE for PROJECT_ID, execute:
kedehub fix-kede -p PROJECT_ID
Appendix
Install virtualenv
pip install virtualenv
Install KEDEGit in virtual environment
git clone https://github.com/kedehub/kedegit.git kedegit
cd kedegit/
python3 -m virtualenv env
source ~/kedegit/env/bin/activate
pip install pip --upgrade
pip install -r requirements.txt
pip install python-Levenshtein
pip install numpy --upgrade
deactivate
Test if everything is OK
source ~/kedegit/env/bin/activate
python3 -m kedehub list-projects
deactivate
That command should list all projects for a company.
If the company has no projects yet nothing will be listed.