KEDEHub Starter

How to install KEDEHub on your laptop in 10 minutes

Overview

Welcome to an exclusive exploration of KEDEHub!

  1. KEDEHub Starter is free to use for:

    • Small teams with fewer than 10 active contributors per month
    • Open Source projects

    If your organization has several small teams, each of them may use KEDEHub Starter for free.

  2. Duration

    Your journey with KEDEHub is unlimited in duration. Hopefully it lasts forever!

  3. Feedback & Support

    Your feedback drives our innovation. Share your experience, questions, or challenges at any time. Our dedicated support team is here to help. Contact us at support@kedehub.com.

Your insights and satisfaction are paramount to us. Dive in, explore, and let's improve your organization together with KEDEHub.

How to install KEDEHub locally

Prerequisites: Git and either Docker or Podman. Make sure your container engine has enough resources allocated; we recommend at least 6 CPUs, 4 GB of RAM and 10 GB of disk space. Windows, Linux and macOS are supported. On Windows, you need WSL2 or Podman, and you should use the latest PowerShell.
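Every container command in this guide is given twice, once for Docker and once for Podman. If you script the steps, a small helper can check the prerequisites and pick whichever engine is installed. This is a minimal sketch; the function names kede_detect_engine and kede_check_prereqs are hypothetical, and preferring Docker when both engines are present is our own choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the container engine to use (docker or podman).
# Prefers docker when both are installed; returns 1 if neither is found.
kede_detect_engine() {
  local engine
  for engine in docker podman; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  echo "error: neither docker nor podman found in PATH" >&2
  return 1
}

# Hypothetical helper: check that git and a container engine are available.
kede_check_prereqs() {
  command -v git >/dev/null 2>&1 || { echo "error: git not found in PATH" >&2; return 1; }
  kede_detect_engine >/dev/null
}
```

For example, `ENGINE=$(kede_detect_engine) && "$ENGINE" images` lists images with whichever engine was found.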

Pull the KEDEHub Image

The KEDEHub image is hosted in a public repository on Amazon Elastic Container Registry (ECR), so you can pull and use it without authenticating.

  1. Image location.

                                            
    public.ecr.aws/kedehub/kedehub-localhost-image
                                            
                                        

  2. Pull the Image:

    You can pull the image using the Docker command:

                                            
    docker pull public.ecr.aws/kedehub/kedehub-localhost-image && docker tag public.ecr.aws/kedehub/kedehub-localhost-image kedehub-localhost-image
                                            
                                        

    You can pull the image using the Podman command:

                                            
    podman pull public.ecr.aws/kedehub/kedehub-localhost-image && podman tag public.ecr.aws/kedehub/kedehub-localhost-image kedehub-localhost-image
                                            
                                        

  3. Verify that kedehub-localhost-image is listed.

    You can list the current images using Docker:

                                            
    docker images
                                            
                                        

    You can list the current images using Podman:

                                            
    podman images
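Steps 1 to 3 can be bundled into one function that pulls, tags and then verifies the image. A minimal sketch; kede_pull_image is a hypothetical name, and it only chains the pull, tag and image inspect subcommands, which both docker and podman provide. It assumes the public.ecr.aws/kedehub/NAME to NAME naming used throughout this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull a KEDEHub image from public ECR, give it the
# short local tag used in this guide, and confirm the tag exists.
# Usage: kede_pull_image ENGINE NAME   (ENGINE is docker or podman)
kede_pull_image() {
  local engine=$1 name=$2
  local remote="public.ecr.aws/kedehub/${name}"
  "$engine" pull "$remote" || return 1
  "$engine" tag "$remote" "$name" || return 1
  # 'image inspect' exits non-zero if the image is not present locally.
  if "$engine" image inspect "$name" >/dev/null 2>&1; then
    echo "OK: ${name} is available locally"
  else
    echo "error: ${name} not found after pulling" >&2
    return 1
  fi
}
```

For example, `kede_pull_image docker kedehub-localhost-image`. The same call should work for the kedegit-image and kedematcher-image images, which follow the same naming.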
                                            
                                        

Run the image

After pulling the image, run a container from it by following these steps:

  1. Before running the container, create a volume. This volume will store your data, ensuring it persists even if the container is stopped or deleted.

                                            
    docker volume create kedehubdata
                                            
                                        

                                            
    podman volume create kedehubdata
                                            
                                        

  2. Verify that the volume was created.

    List the current volumes and check that kedehubdata is there:

                                            
    docker volume ls
                                            
                                        
                                            
    podman volume ls
                                            
                                        

  3. Run the KEDEHub container:

                                            
    docker run --name kedehub-localhost-container -p 8080:80 -p 5400:5400 -v kedehubdata:/var/lib/postgresql/data -d kedehub-localhost-image
                                            
                                        
                                            
    podman run --name kedehub-localhost-container -p 8080:80 -p 5400:5400 -v kedehubdata:/var/lib/postgresql/data -d kedehub-localhost-image
                                            
                                        

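    With this setup, all data is stored in the kedehubdata volume, which is mounted at /var/lib/postgresql/data inside the container. The data remains in the volume even if you stop or remove the container, and a new container started with the same volume attached will see the same data. The -p 8080:80 option publishes the web application on port 8080 of your machine, and -p 5400:5400 publishes port 5400, which KEDEGit is configured to connect to later in this guide.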

    Check if the container is running:

                                            
    docker container ls -a
                                            
                                        
                                            
    podman container ls -a
                                            
                                        

    You should see something like:

                                            
    CONTAINER ID   IMAGE                     COMMAND                  CREATED          STATUS          PORTS                                                                                        NAMES
    a7d5e3bf41a0   kedehub-localhost-image   "/usr/local/bin/star…"   32 seconds ago   Up 30 seconds   0.0.0.0:5400->5400/tcp, :::5400->5400/tcp, 5432/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp   kedehub-localhost-container
                                            
                                        

  4. Use the command below to check that the KEDEHub instance is up and running:

                                            
    curl -X 'GET' 'http://localhost:8080/api/public/projects/count' -H 'accept: application/json'
    
                                            
                                        

    If everything is OK, it should return:

                                            
    {"count": 0}    
                                            
                                        

    Open the KEDEHub website:

                                            
    http://localhost:8080/
                                            
                                        

  5. Create a new company
                                        
    http://localhost:8080/signup_company/#/ 
                                        
                                    

    The designated admin user will receive an invitation email at the address provided when the company was created. Click the link in that email to complete the company creation. Keep the email: it contains the user and token values you will need to configure KEDEGit.
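Steps 3 and 4 can also be scripted. Because the data lives in the kedehubdata volume, you can replace the container (for example, after pulling a newer image) and then poll the count endpoint until the new container answers. A minimal sketch, assuming the container name, ports, volume and endpoint used above; the function names, retry count and delay are our own hypothetical choices. Removing a container with rm -f does not delete the named kedehubdata volume.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the KEDEHub container while keeping the data
# in the kedehubdata volume. Same name, ports and mount as in step 3.
# Usage: kede_recreate_container ENGINE   (docker or podman)
kede_recreate_container() {
  local engine=$1
  # Remove the old container if it exists; the named volume is kept.
  "$engine" rm -f kedehub-localhost-container >/dev/null 2>&1
  "$engine" run --name kedehub-localhost-container \
    -p 8080:80 -p 5400:5400 \
    -v kedehubdata:/var/lib/postgresql/data \
    -d kedehub-localhost-image
}

# Hypothetical helper: extract N from a response such as {"count": N}.
kede_parse_count() {
  printf '%s\n' "$1" | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: poll the endpoint from step 4 until it answers.
# Usage: kede_wait_ready [TRIES] [DELAY_SECONDS]
kede_wait_ready() {
  local tries=${1:-30} delay=${2:-2} i=1 count
  local url='http://localhost:8080/api/public/projects/count'
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors as failures, -s: no progress output.
    count=$(kede_parse_count "$(curl -fs -H 'accept: application/json' "$url")")
    if [ -n "$count" ]; then
      echo "KEDEHub is up with $count projects"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "error: no answer from $url" >&2
  return 1
}
```

For example, `kede_recreate_container docker && kede_wait_ready` restarts KEDEHub on Docker and waits up to about a minute for it to respond.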

Analyze Git repositories

KEDEGit is a Python application responsible for:

  1. Analyzing local Git repositories
  2. Sending commands to KEDEHub

Analysis is performed on local clones of Git repositories, ensuring:

  • Your source code and commit messages remain secure on your premises, with no transfer to KEDEHub
  • No capture of your intellectual property through source code analysis
  • No analysis of commit messages

KEDEGit is an open-source client application, available at https://github.com/kedehub/kedegit. It must be installed on the same computer as KEDEHub and must be able to access the directory where the target Git repositories are cloned (locally or over the network). Your user must also be the owner of the target Git repository directory.
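Before pointing KEDEGit at a repository, a quick check that the directory is a Git working copy owned by your user can save a failed run. A minimal sketch; kede_check_repo is a hypothetical name, and the checks are our reading of the requirements above. One plausible reason for the ownership requirement is that recent Git versions refuse to work in repositories owned by another user unless they are listed in safe.directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a local clone before handing it to KEDEGit.
# Checks that DIR exists, is a Git working copy, and is owned by the current user.
# Usage: kede_check_repo DIR
kede_check_repo() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "error: $dir does not exist" >&2; return 1
  fi
  # A working copy has a .git directory (or a .git file for worktrees).
  if [ ! -e "$dir/.git" ]; then
    echo "error: $dir is not a Git working copy" >&2; return 1
  fi
  # -O is true when the file is owned by the current effective user ID.
  if [ ! -O "$dir" ]; then
    echo "error: $dir is not owned by $(id -un)" >&2; return 1
  fi
  echo "OK: $dir"
}
```

For example, `kede_check_repo ~/git/kedegit && kede_check_repo ~/git/kedematcher`.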

Clone KEDEGit

In this guide the KEDEGit directory is:

  • For Mac/Linux: ~/git/kedegit/

  • For Windows: C:\Users\dimit\git\kedegit

On Windows, ~/ corresponds to C:\Users\dimit for the user 'dimit'.

Navigate to the ~/git/ directory and clone the KEDEGit repo:

                                
git clone https://github.com/kedehub/kedegit.git
                                
                            
Then clone the KEDEMatcher repo:
                                
git clone https://github.com/kedehub/kedematcher.git
                                
                            

Configure KEDEGit

Configuration directory

KEDEGit uses the Confuse library to manage its configuration, with the application name KedeGit. The configuration paths for the different platforms are listed here. You can also override the configuration directory with the KEDEGITDIR environment variable.

In this guide the configuration directory is:

  • For Mac/Linux containers: ~/git/kedegit/docs
  • For Windows containers: %USERPROFILE%\git\kedegit\docs
  • For Mac executables: ~/.config/KedeGit
  • For Linux executables: ~/.config/KedeGit
  • For Windows executables: %APPDATA%\KedeGit

If you run KEDEGit and KEDEMatcher as containers, you can later change this to any other directory, as long as you mount that directory into the containers accordingly.

Set Allowed and excluded file types

Allowed and excluded file types are in ~/git/kedegit/docs/kede-config.json. Edit the file if needed to match your architecture, technology and preferences.

Copy kede-config.json to the same configuration directory.
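The directory list above and the copy step can be expressed as a small script for Mac/Linux. A minimal sketch, assuming "the same configuration directory" means the directory listed above for the executables (for containers, kede-config.json already sits in ~/git/kedegit/docs); kedegit_config_dir and kedegit_copy_file_types are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the KEDEGit configuration directory on Mac/Linux.
# Usage: kedegit_config_dir [container|executable]
kedegit_config_dir() {
  if [ "${1:-executable}" = "container" ]; then
    # Host directory that the containers mount at /root/.config/KedeGit.
    echo "$HOME/git/kedegit/docs"
  elif [ -n "${KEDEGITDIR:-}" ]; then
    # Explicit override for the executables.
    echo "$KEDEGITDIR"
  else
    echo "$HOME/.config/KedeGit"
  fi
}

# Hypothetical helper: copy kede-config.json into the executables' config directory.
kedegit_copy_file_types() {
  local dest
  dest=$(kedegit_config_dir executable)
  mkdir -p "$dest" && cp "$HOME/git/kedegit/docs/kede-config.json" "$dest/"
}
```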

Set configuration file

On Mac/Linux, run the command below to create a new, empty config.yaml:

                                

cp ~/git/kedegit/docs/docker_empty_config.yaml ~/git/kedegit/docs/config.yaml
                                
                            

On Windows, copy docker_empty_config.yaml to config.yaml in the same directory.

Open config.yaml and add values for company name, user and token from your invitation email.

                                
server:
  protocol: http
  host: host.docker.internal
  port: 5400

company:
  name:
  user:
  token:
                                
                            

If you are using Podman, replace host: host.docker.internal with host: host.containers.internal, the hostname Podman provides for reaching services running on the host machine. It resolves to an IP address assigned to the host on a bridge network managed by Podman.

If you are running KEDEGit as an executable, replace host: host.docker.internal with host: localhost, because KEDEGit then runs directly on the host machine.
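The three host cases above can be captured in a script that writes config.yaml for you. A minimal sketch, assuming the template's keys (server: protocol, host, port; company: name, user, token) with two-space indentation; kede_server_host and kede_write_config are hypothetical names. Values are written unquoted, so a value containing YAML special characters such as : or # would need quoting by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: host name KEDEGit should use to reach KEDEHub,
# following the Docker, Podman and executable cases described above.
kede_server_host() {
  case "$1" in
    docker)     echo "host.docker.internal" ;;
    podman)     echo "host.containers.internal" ;;
    executable) echo "localhost" ;;
    *) echo "error: mode must be docker, podman or executable" >&2; return 1 ;;
  esac
}

# Hypothetical helper: write (overwrite) config.yaml from the values in your
# invitation email.
# Usage: kede_write_config FILE MODE COMPANY USER TOKEN
kede_write_config() {
  local file=$1 host
  host=$(kede_server_host "$2") || return 1
  cat > "$file" <<EOF
server:
  protocol: http
  host: $host
  port: 5400

company:
  name: $3
  user: $4
  token: $5
EOF
}
```

For example, `kede_write_config ~/git/kedegit/docs/config.yaml podman YourCompany your-user your-token`.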

Check for Windows-Specific Formatting Issues

Ensure that:

  • Line endings (CRLF vs. LF) are consistent. You can convert line endings with an editor such as Notepad++ or Visual Studio Code (VS Code).
  • No special characters, such as non-ASCII characters or a byte order mark (BOM, \ufeff), have been inserted by accident, which can happen when the file is edited on different systems.
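If you have a Unix-style shell at hand (Mac, Linux, WSL or Git Bash), the two most common problems, CRLF line endings and a UTF-8 BOM, can also be checked and fixed from the command line. A minimal sketch; kede_check_format and kede_fix_format are hypothetical names. It does not look for other non-ASCII characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first three bytes of a file as hex, e.g. "efbbbf".
kede_first_bytes() {
  head -c 3 "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: report a UTF-8 BOM or CRLF line endings in FILE.
# Returns 1 if either problem is found.
kede_check_format() {
  local f=$1 status=0
  # A UTF-8 BOM is the byte sequence EF BB BF (U+FEFF) at the very start.
  if [ "$(kede_first_bytes "$f")" = "efbbbf" ]; then
    echo "$f: starts with a UTF-8 BOM"; status=1
  fi
  # Any carriage return indicates CRLF (Windows) line endings.
  if grep -q "$(printf '\r')" "$f"; then
    echo "$f: contains CRLF line endings"; status=1
  fi
  return "$status"
}

# Hypothetical helper: remove the BOM and all carriage returns from FILE in place.
kede_fix_format() {
  local f=$1 tmp="$1.tmp.$$"
  if [ "$(kede_first_bytes "$f")" = "efbbbf" ]; then
    tail -c +4 "$f" > "$tmp"      # skip the three BOM bytes
  else
    cat "$f" > "$tmp"
  fi
  tr -d '\r' < "$tmp" > "$f" && rm -f "$tmp"
}
```

For example, `kede_check_format ~/git/kedegit/docs/config.yaml || kede_fix_format ~/git/kedegit/docs/config.yaml`.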

There are two ways to run KEDEGit: inside a container or as a system executable.

Running KEDEGit inside a Docker or Podman container

Pull the KEDEGit image

Run one of the following commands, depending on your container engine:

                                    
docker pull public.ecr.aws/kedehub/kedegit-image && docker tag public.ecr.aws/kedehub/kedegit-image kedegit-image 
                                    
                                
                                    
podman pull public.ecr.aws/kedehub/kedegit-image && podman tag public.ecr.aws/kedehub/kedegit-image kedegit-image 
                                    
                                

Test the KEDEGit Container

                                    
docker run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit kedegit-image:latest list-projects
                                    
                                
                                    
podman run --rm --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit kedegit-image:latest list-projects
                                    
                                

For Windows run:

                                    

podman run --rm --name kedegit-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit -v c:\Users\dimit\git:/usr/data kedegit-image:latest list-projects
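The container commands in this section differ only in the KEDEGit subcommand at the end. If you run them often, a shell function can hold the common part. A minimal sketch for Mac/Linux; the name kedegit_run is hypothetical, and it always mounts both the configuration directory and ~/git, which the list-projects test above does not strictly need.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one KEDEGit subcommand in a throwaway container
# with the mounts used in this guide (Mac/Linux paths).
# Usage: kedegit_run ENGINE SUBCOMMAND [ARGS...]   (ENGINE is docker or podman)
kedegit_run() {
  local engine=$1
  shift
  set -- -v "$HOME/git/kedegit/docs:/root/.config/KedeGit" \
         -v "$HOME/git:/usr/data" \
         kedegit-image:latest "$@"
  if [ "$engine" = "docker" ]; then
    # Docker Engine on Linux needs this mapping for host.docker.internal to resolve.
    "$engine" run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container "$@"
  else
    "$engine" run --rm --name kedegit-container "$@"
  fi
}
```

For example, `kedegit_run podman init-project NEW_PROJECT /usr/data/kedegit` is equivalent to the Podman init-project command in the next section.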
                                    
                                

Initializing a New Project

For testing KEDEGit we will use its own source code repository, https://github.com/kedehub/kedegit, which you cloned to `~/git/kedegit`.

Now, using the command below, we will initialize a new project called NEW_PROJECT from the local Git repository at `~/git/kedegit`. Because ~/git is mounted at /usr/data inside the container, the repository path passed to KEDEGit is /usr/data/kedegit.

                                
docker run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest init-project NEW_PROJECT /usr/data/kedegit
                                
                            
                                
podman run --rm --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest init-project NEW_PROJECT /usr/data/kedegit
                                
                            

If you are running Podman on Windows, pay attention to how you mount the local Git folder from a Windows path into the Linux container. Podman supports several notation schemes, as presented here. The following example uses Windows-style paths:

                                    
podman run --rm --name kedegit-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit -v c:\Users\dimit\git:/usr/data kedegit-image:latest init-project NEW_PROJECT /usr/data/kedegit
                                    
                                

You may also need to adjust permissions on your local repository Windows folder or use the 'ro' option as explained in the Appendix.
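If you drive Podman from a Unix-style shell on Windows (for example Git Bash or WSL) rather than PowerShell, you may prefer the forward-slash form of a Windows path. A minimal sketch, assuming Podman accepts the /c/Users/... notation for C:\Users\..., which to our understanding is one of the notations listed in Podman's Windows documentation; win_to_podman_path is a hypothetical name, and the conversion is purely textual.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Windows path such as c:\Users\dimit\git into
# the forward-slash form /c/Users/dimit/git. No existence checks are made.
win_to_podman_path() {
  local p drive
  # Replace every backslash with a forward slash.
  p=$(printf '%s\n' "$1" | sed 's#\\#/#g')
  case "$p" in
    [A-Za-z]:*)
      # Lower-case the drive letter and turn "c:" into "/c".
      drive=$(printf '%s' "${p%%:*}" | tr '[:upper:]' '[:lower:]')
      p="/$drive${p#?:}"
      ;;
  esac
  printf '%s\n' "$p"
}
```

For example, `-v "$(win_to_podman_path 'c:\Users\dimit\git')":/usr/data` in place of the Windows-style mount.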

The following is the output from the init-project command:

                                
Adding repo: /usr/data/kedegit to project NEW_PROJECT
Assigned project ID: new_project to project: NEW_PROJECT
Processing Repository: https://github.com/kedehub/kedegit.git
Processing commits: 100%|██████████| 37/37 [00:03:00:00, 10.99it/s]
Updating templates for persons: 100%|██████████| 3/3 [00:00:00:00, 25.92it/s]
Calculating Daily KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 24.87it/s]
Calculating Weekly KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 31.39it/s]
Successfully initialized project with ID = new_project 
                                
                            

NEW_PROJECT will be the name you see in the KEDEHub web client as explained here.

After the project is initialized, KEDEGit displays its ID in the form Assigned project ID: PROJECT_ID to project: PROJECT (in the output above, the ID is new_project). Note down the PROJECT_ID: you will need it for all subsequent work on the newly created project.
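If you script project creation, you can capture the ID instead of copying it by hand. A minimal sketch that parses the "Assigned project ID" line in the format shown above; kede_project_id is a hypothetical name. Since we do not know whether KEDEGit writes this line to standard output or standard error, the usage example merges both.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read KEDEGit output on stdin and print the project ID
# from the first line of the form "Assigned project ID: <ID> to project: <NAME>".
kede_project_id() {
  sed -n 's/^[[:space:]]*Assigned project ID: \([^[:space:]]*\) to project: .*/\1/p' | head -n 1
}
```

For example, run `~/git/kedegit/dist/mac_dist/kedegit init-project NEW_PROJECT ~/git/kedegit 2>&1 | tee init.log` and then `PROJECT_ID=$(kede_project_id < init.log)`.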

Adding a New Repository to an Existing Project

Now, we will add another repository to `NEW_PROJECT`: KEDEMatcher, located at https://github.com/kedehub/kedematcher and cloned to `~/git/kedematcher`. Note that add-repository takes the project ID (new_project), not the project name. Execute the commands below.

For Mac/Linux run:

                                    
docker run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest add-repository new_project /usr/data/kedematcher
                                    
                                
                                    
podman run --rm --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest add-repository new_project /usr/data/kedematcher
                                    
                                

For Windows run:

                                    

podman run --rm --name kedegit-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit -v c:\Users\dimit\git:/usr/data kedegit-image:latest add-repository new_project /usr/data/kedematcher
                                    
                                

The following is the output from the add-repository command:

                                
    Adding repo: /usr/data/kedematcher to project new_project
    Assigned project ID: new_project to project: new_project
    Processing Repository: https://github.com/kedehub/kedematcher.git
    Processing commits: 100%|██████████| 15/15 [00:01:00:00, 13.18it/s]
    Processed Repository: https://github.com/kedehub/kedematcher.git
    Updating templates for persons: 100%|██████████| 3/3 [00:00:00:00, 23.84it/s]
    Calculating Daily KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 22.29it/s]
    Calculating Weekly KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 29.82it/s]
    Successfully initialized project with ID = new_project 
                                
                            

Updating Project Statistics

The update-projects command performs the following actions:
  1. Analyzes all new commits
  2. Calculates KEDE and other statistics for the new commits

To update the statistics for all projects within a company with new code contributions, execute:

                                
docker run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest update-projects
                                
                            
                                
podman run --rm --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest update-projects
                                
                            

For Windows run:

                                    
podman run --rm --name kedegit-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit -v c:\Users\dimit\git:/usr/data kedegit-image:latest update-projects
                                    
                                

The following is the output from the update-projects command:

                                
    Updating Kedehub for project: new_project, #1 of 1
    Processing Repository: https://github.com/kedehub/kedegit.git
    Processing commits: 100%|██████████| 39/39 [00:00:00:00, 269.49it/s]
    Processed Repository: https://github.com/kedehub/kedegit.git
    Processing Repository: https://github.com/kedehub/kedematcher.git
    Processing commits: 100%|██████████| 15/15 [00:00:00:00, 409.60it/s]
    Processed Repository: https://github.com/kedehub/kedematcher.git
    Successfully updated 0 out of 1 projects
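Since KEDEGit analyzes your local clones, update-projects presumably only sees new commits that have already been pulled into those clones. A minimal sketch that fast-forwards each clone before the update; kede_refresh_clones is a hypothetical name, and it assumes every clone has an upstream branch configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fast-forward each local clone under ROOT so that new
# commits are present locally before update-projects runs.
# Usage: kede_refresh_clones ROOT REPO_DIR_NAME...
kede_refresh_clones() {
  local root=$1 repo failed=0
  shift
  for repo in "$@"; do
    # --ff-only refuses to create merge commits in your working copies.
    if ! git -C "$root/$repo" pull --ff-only; then
      echo "warning: could not update $root/$repo" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `kede_refresh_clones ~/git kedegit kedematcher`, then run update-projects as shown above.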
                                
                            

Updating Project Statistics for a single existing project with PROJECT_ID_1

For Mac/Linux run:

                                    
docker run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest update-projects -p PROJECT_ID_1  
                                    
                                
                                    
podman run --rm --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest update-projects -p PROJECT_ID_1  
                                    
                                

For Windows run:

                                    
podman run --rm --name kedegit-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit -v c:\Users\dimit\git:/usr/data kedegit-image:latest update-projects -p PROJECT_ID_1
                                    
                                

Updating Project Statistics for multiple existing projects e.g. PROJECT_ID_1 PROJECT_ID_2

For Mac/Linux run:

                                    
docker run --rm --add-host=host.docker.internal:host-gateway --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest update-projects -p PROJECT_ID_1 PROJECT_ID_2  
                                    
                                
                                    
podman run --rm --name kedegit-container -v ~/git/kedegit/docs:/root/.config/KedeGit -v ~/git:/usr/data kedegit-image:latest update-projects -p PROJECT_ID_1 PROJECT_ID_2  
                                    
                                

For Windows run:

                                    
podman run --rm --name kedegit-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit -v c:\Users\dimit\git:/usr/data kedegit-image:latest update-projects -p PROJECT_ID_1 PROJECT_ID_2
                                    
                                

More information here.

Running KEDEGit as a system executable

Download the KEDEGit executable for your environment

Download the latest KEDEGit executable from here into C:\Users\dimit\git\kedegit\dist\win_dist\kedegit on Windows or ~/git/kedegit/dist/mac_dist/kedegit on Mac/Linux.

Note: For Windows use the latest PowerShell!
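On Mac/Linux, a file downloaded through a browser usually lacks the executable bit, so you may need to restore it before the commands below will run. A minimal sketch; kede_make_runnable is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a downloaded binary executable and confirm it.
# Usage: kede_make_runnable PATH
kede_make_runnable() {
  if [ ! -f "$1" ]; then
    echo "error: $1 not found" >&2
    return 1
  fi
  chmod +x "$1" && [ -x "$1" ]
}
```

For example, `kede_make_runnable ~/git/kedegit/dist/mac_dist/kedegit`. The same applies to the KEDEMatcher executable.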

Test KEDEGit

For Mac/Linux run:

                                    
~/git/kedegit/dist/mac_dist/kedegit list-projects
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedegit\dist\win_dist\kedegit list-projects
                                    
                                

On a fresh installation, this command should return nothing, because no projects exist yet.

Initializing a New Project

For testing KEDEGit we will use its own source code repository, https://github.com/kedehub/kedegit, which you cloned to `~/git/kedegit`.

Now, using the commands below, we will initialize a new project called NEW_PROJECT from the local Git repository at `~/git/kedegit`.

For Mac/Linux run:

                                    
~/git/kedegit/dist/mac_dist/kedegit init-project NEW_PROJECT ~/git/kedegit
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedegit\dist\win_dist\kedegit init-project NEW_PROJECT  C:\Users\dimit\git\kedegit
                                    
                                

The following is the output from the init-project command:

                                
    Adding repo: C:\Users\dimit\git\kedegit to project NEW_PROJECT
    Assigned project ID: new_project to project: NEW_PROJECT
    Processing Repository: https://github.com/kedehub/kedegit.git
    Processing commits: 100%|██████████| 37/37 [00:03:00:00, 10.99it/s]
    Updating templates for persons: 100%|██████████| 3/3 [00:00:00:00, 25.92it/s]
    Calculating Daily KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 24.87it/s]
    Calculating Weekly KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 31.39it/s]
    Successfully initialized project with ID = new_project 
                                
                            

NEW_PROJECT will be the name you see in the KEDEHub web client as explained here.

After the project is initialized, KEDEGit displays its ID in the form Assigned project ID: PROJECT_ID to project: PROJECT (in the output above, the ID is new_project). Note down the PROJECT_ID: you will need it for all subsequent work on the newly created project.

Adding a New Repository to an Existing Project

Now, we will add another repository to `NEW_PROJECT`: KEDEMatcher, located at https://github.com/kedehub/kedematcher and cloned to `~/git/kedematcher`. Note that add-repository takes the project ID (new_project), not the project name. Execute the command below:

For Mac/Linux run:

                                    
~/git/kedegit/dist/mac_dist/kedegit add-repository new_project ~/git/kedematcher
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedegit\dist\win_dist\kedegit add-repository new_project C:\Users\dimit\git\kedematcher\
                                    
                                

The following is the output from the add-repository command:

                                    
Adding repo: C:\Users\dimit\git\kedematcher to project new_project
Assigned project ID: new_project to project: new_project
Processing Repository: https://github.com/kedehub/kedematcher.git
Processing commits: 100%|██████████| 15/15 [00:01:00:00, 13.18it/s]
Processed Repository: https://github.com/kedehub/kedematcher.git
Updating templates for persons: 100%|██████████| 3/3 [00:00:00:00, 23.84it/s]
Calculating Daily KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 22.29it/s]
Calculating Weekly KEDE for persons: 100%|██████████| 3/3 [00:00:00:00, 29.82it/s]
Successfully initialized project with ID = new_project 
                                    
                                

Updating Project Statistics

The update-projects command performs the following actions:
  1. Analyzes all new commits
  2. Calculates KEDE and other statistics for the new commits

For Mac/Linux run:

                                    
~/git/kedegit/dist/mac_dist/kedegit update-projects
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedegit\dist\win_dist\kedegit update-projects
                                    
                                

The following is the output from the update-projects command:

                                
Updating Kedehub for project: new_project, #1 of 1
Processing Repository: https://github.com/kedehub/kedegit.git
Processing commits: 100%|██████████| 39/39 [00:00:00:00, 269.49it/s]
Processed Repository: https://github.com/kedehub/kedegit.git
Processing Repository: https://github.com/kedehub/kedematcher.git
Processing commits: 100%|██████████| 15/15 [00:00:00:00, 409.60it/s]
Processed Repository: https://github.com/kedehub/kedematcher.git
Successfully updated 0 out of 1 projects
                                
                            

Updating Project Statistics for a single existing project with PROJECT_ID_1

For Mac/Linux run:

                                    
~/git/kedegit/dist/mac_dist/kedegit  update-projects -p PROJECT_ID_1 
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedegit\dist\win_dist\kedegit update-projects -p PROJECT_ID_1 
                                    
                                

Updating Project Statistics for multiple existing projects e.g. PROJECT_ID_1 PROJECT_ID_2

For Mac/Linux run:

                                    
~/git/kedegit/dist/mac_dist/kedegit  update-projects -p PROJECT_ID_1 PROJECT_ID_2 
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedegit\dist\win_dist\kedegit update-projects -p PROJECT_ID_1 PROJECT_ID_2 
                                    
                                

More information here.

Merge Identities

Identity matching in the context of Git involves the process of accurately identifying and distinguishing developers based on the various email addresses and names they use when committing work. Developers may use a range of email types, such as corporate, personal, or even anonymous addresses like "users.noreply.github.com." Similarly, the names they commit under can vary significantly, including full names with or without surnames, names with typographical errors, pseudonyms, or sometimes even missing names. The challenge of identity matching lies in aggregating these diverse identities for each individual developer and differentiating them from the identities of other developers. This process is crucial for obtaining precise information about a developer's contributions and activities in Git repositories.

KEDEMatcher addresses this problem by performing semi-automatic identity recognition. KEDEMatcher is an open-source client application, available at https://github.com/kedehub/kedematcher. It must be installed on the same computer as KEDEHub and must be able to access the directory where the target Git repositories are cloned (locally or over the network).

Note: macOS on the Apple M1 architecture is not supported. This is caused by an issue with the apjv package.

Clone KEDEMatcher

Navigate to the ~/git/ directory and clone the KEDEMatcher repo:

                                
git clone https://github.com/kedehub/kedematcher.git
                                
                            

Configure KEDEMatcher

KEDEMatcher uses the same configuration as KEDEGit. If you have not done so already, set up KEDEGit as explained here.

There are two ways to run KEDEMatcher: inside a container or as a system executable.

Running KEDEMatcher inside a Docker or Podman container

Pull the KEDEMatcher image

Run one of the following commands, depending on your container engine:

                                    
docker pull public.ecr.aws/kedehub/kedematcher-image && docker tag public.ecr.aws/kedehub/kedematcher-image kedematcher-image 
                                    
                                
                                    
podman pull public.ecr.aws/kedehub/kedematcher-image && podman tag public.ecr.aws/kedehub/kedematcher-image kedematcher-image 
                                    
                                

Merge identities for a single project

The identity-merge command will:

  1. Determine all authors who belong to the same individual
  2. Create a new KEDEHub user for that individual

To merge identities for a single project, run the command below. Make sure to pass the PROJECT_ID (in this case new_project), not the project name:

                                
docker run --rm --add-host=host.docker.internal:host-gateway --name kedematcher-container -v ~/git/kedegit/docs:/root/.config/KedeGit kedematcher-image:latest identity-merge -p new_project
                                
                            
                                
podman run --rm --name kedematcher-container -v ~/git/kedegit/docs:/root/.config/KedeGit kedematcher-image:latest identity-merge  -p new_project
                                
                            

If you are running Podman on Windows, pay attention to how you mount folders from Windows paths into the Linux container. Podman supports several notation schemes, as presented here. The following example uses Windows-style paths:

                                    
podman run --rm --name kedematcher-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit kedematcher-image:latest identity-merge  -p new_project
                                    
                                

You may also need to adjust permissions on your local repository Windows folder or use the 'ro' option as explained in the Appendix.

The following is the output from the identity-merge command:

                                
First pass...
Matching by: EmailMatcher:  67%|██████▋   | 2/3 [00:00:00:00, 5155.87it/s]
Matching by: EmailNameMatcher:  67%|██████▋   | 2/3 [00:00:00:00, 496.72it/s]
Saving users: 100%|██████████| 2/2 [00:00:00:00, 23.99it/s]
Successfully merged 2 into 2 users with 0 authors for 1 projects. Created 2 new users. 
Second pass...
Matching by: EmailNameMatcher:  67%|██████▋   | 2/3 [00:00:00:00, 3128.91it/s]
Saving users: 100%|██████████| 2/2 [00:00:00:00, 40329.85it/s]
Successfully merged 2 into 2 users with 0 authors for 1 projects. Created 0 new users. 
                                
                            

Merge identities for a company

To merge identities on all projects for a company, use:

                                
docker run --rm --add-host=host.docker.internal:host-gateway --name kedematcher-container -v ~/git/kedegit/docs:/root/.config/KedeGit kedematcher-image:latest identity-merge
                                
                            
                                
podman run --rm --name kedematcher-container -v ~/git/kedegit/docs:/root/.config/KedeGit kedematcher-image:latest identity-merge
                                
                            

If you are running Podman on Windows, pay attention to how you mount folders from Windows paths into the Linux container. Podman supports several notation schemes, as presented here. The following example uses Windows-style paths:

                                    
podman run --rm --name kedematcher-container -v c:\Users\dimit\git\kedegit\docs:/root/.config/KedeGit kedematcher-image:latest identity-merge
                                    
                                

You may also need to adjust the permissions on your local repository folder in Windows, or mount it with the 'ro' (read-only) option, as explained in the Appendix.

More information here.

Running KEDEMatcher as a system executable

Download the KEDEMatcher executable for your environment

Download the latest KEDEMatcher from here into C:\Users\dimit\git\kedematcher\dist\win_dist\kedematcher on Windows or ~/git/kedematcher/dist/mac_dist/kedematcher on Mac/Linux.

Note: For Windows use the latest PowerShell!

Merge identities for a single project

The identity-merge command will:

  1. Determine all authors who belong to the same individual
  2. Create a new KEDEHub user for that individual

To merge identities on a single project, run the command below. Make sure to pass the project ID (in this case 'new_project'), not the project name.

For Mac/Linux run:

                                    
~/git/kedematcher/dist/mac_dist/kedematcher identity-merge  -p new_project
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedematcher\dist\win_dist\kedematcher identity-merge  -p new_project
                                    
                                

The following is the output from the identity-merge command:

                                
First pass...
Matching by: EmailMatcher:  67%|██████▋   | 2/3 [00:00:00:00, 5155.87it/s]
Matching by: EmailNameMatcher:  67%|██████▋   | 2/3 [00:00:00:00, 496.72it/s]
Saving users: 100%|██████████| 2/2 [00:00:00:00, 23.99it/s]
Successfully merged 2 into 2 users with 0 authors for 1 projects. Created 2 new users. 
Second pass...
Matching by: EmailNameMatcher:  67%|██████▋   | 2/3 [00:00:00:00, 3128.91it/s]
Saving users: 100%|██████████| 2/2 [00:00:00:00, 40329.85it/s]
Successfully merged 2 into 2 users with 0 authors for 1 projects. Created 0 new users. 
                                
                            

Merge identities for a company

To merge identities on all projects for a company, use:

For Mac/Linux run:

                                    
~/git/kedematcher/dist/mac_dist/kedematcher identity-merge
                                    
                                

For Windows run:

                                    
C:\Users\dimit\git> .\kedematcher\dist\win_dist\kedematcher identity-merge
                                    
                                

More information here.

Successful Installation

To see the final result, open the "Organization" tab in the KEDEHub web interface. You should see the page shown below:

Organization

In the header menu, you may notice that:

  • The "Projects" tab shows a count of one project
  • The "People" tab shows a count of one active user

Appendix

Git's handling of directory ownership within a container

When mapping volumes from a Windows host to a Linux container, Windows filesystem permissions do not translate directly to Linux permissions.

When you use KEDEGit and KEDEMatcher in a containerized environment such as Docker or Podman, you may run into Git's security check on the ownership of the repository directory. Git refuses to work in a repository whose directory is owned by a different user than the one running the Git command, so that it cannot be tricked into using configuration from a repository that someone else controls. When volumes are mapped from Windows into Linux containers, the ownership seen inside the container often does not match the container user. This triggers the check and produces errors such as "detected dubious ownership" or "unsafe repository."
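To see whether a folder is likely to trip this check, you can compare its numeric owner with your own user ID. Run this from a shell where the folder is visible, for example inside the container if the image provides bash. The sketch below assumes GNU or BSD stat is available. It mirrors the idea behind Git's check rather than Git's actual implementation, and the helper name owned_by_current_user is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare a directory's owner UID with the current user's UID,
# roughly the comparison behind Git's "dubious ownership" error (not Git's code).
# owned_by_current_user is a hypothetical helper name.

owned_by_current_user() {
  local dir="$1" owner
  # GNU stat (Linux) uses -c for a format; BSD stat (macOS) uses -f.
  owner="$(stat -c '%u' "$dir" 2>/dev/null || stat -f '%u' "$dir" 2>/dev/null)" || return 2
  if [ "$owner" = "$(id -u)" ]; then
    return 0
  fi
  echo "$dir is owned by UID $owner, but you are UID $(id -u)" >&2
  return 1
}
```

For example, owned_by_current_user path_to_the_repository returns 0 when the owners match. It returns 1 and prints both user IDs when they differ, and returns 2 if the folder cannot be read.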

Here are some strategies for handling permission issues in volume mapping:

  1. Change Repository Ownership:

    On Windows, use the takeown command to change the owner of the repository folder to the user running the Git command:

                                                
    takeown /f path_to_the_repository /r /d y
                                                
                                            

    On Linux, use the chown command:

                                                
    chown -R username:group path_to_the_repository
                                                
                                            

    This ensures that the user executing KEDEGit and KEDEMatcher commands owns the repository directory.

  2. Examine and adjust the permissions on your repository folder in Windows: make them as permissive as possible to test whether this changes how they are perceived inside the container. To check the permissions on Windows:
    • Right-click on the repository folder.
    • Select "Properties".
    • Go to the "Security" tab to view or modify the permissions.
  3. Read-Only Volume Mounting:

    Using the ro (read-only) option with the -v or --volume flag when mounting directories in Podman or Docker is a viable way to manage some types of access and security issues, including problems like the "detected dubious ownership" or "unsafe repository" errors with Git. By mounting the directory as read-only, you ensure that the Git repository cannot be modified by any processes running inside the container, thus maintaining the integrity of your data. This approach doesn't require any changes to Git configurations or user permissions settings inside the container, making it simpler to manage and less error-prone.

    Given that KEDEGit's Git usage inside the container is limited to read-only operations such as git log, git show, and git diff, the ro (read-only) mount option is an effective and straightforward solution to the "detected dubious ownership" or "unsafe repository" issues.

    Here's how you could adjust your Podman or Docker command to mount the volume as read-only:

                                                
    podman run -v path_to_the_repository:/usr/data:ro ...
                                                
                                            
                                                
    docker run -v path_to_the_repository:/usr/data:ro ...
                                                
                                            
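If you switch between Podman and Docker, a small wrapper can choose whichever engine is installed and append the ro option for you. This is a minimal sketch under that assumption. ro_volume and container_engine are hypothetical helper names, and path_to_the_repository and /usr/data are the placeholders used above.

```shell
#!/usr/bin/env bash
# Sketch: pick an installed container engine and build a read-only volume spec.
# ro_volume and container_engine are hypothetical helper names.

# Build the value for -v with the read-only (ro) option appended.
ro_volume() {
  printf '%s:%s:ro\n' "$1" "$2"
}

# Print "podman" or "docker", preferring Podman when both are installed.
container_engine() {
  local engine
  for engine in podman docker; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  return 1
}
```

Usage: "$(container_engine)" run -v "$(ro_volume path_to_the_repository /usr/data)" ..., where ... stands for the rest of your command, as in the examples above.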

Fix DNS resolution in WSL2

  1. Turn off generation of /etc/resolv.conf

    Using your Linux prompt, modify (or create) /etc/wsl.conf with the following content:

                                                
    [network]
    generateResolvConf = false
                                                
                                            

    You can create or extend the file with:

                                                
    printf '[network]\ngenerateResolvConf = false\n' | sudo tee -a /etc/wsl.conf
                                                
                                            

  2. Restart the WSL2 Virtual Machine

    Exit all of your Linux prompts and run the following PowerShell command:

                                                
    wsl --shutdown
                                                
                                            

  3. Create a custom /etc/resolv.conf

    Open a new Linux prompt and cd to /etc. If resolv.conf is a symbolic link to another file, remove the link with:

                                                
    sudo rm resolv.conf
                                                
                                            

    Create a new resolv.conf containing the line nameserver X.X.X.X, where X.X.X.X is the IP address of the DNS server you want to use:

                                                
    echo "nameserver X.X.X.X" | sudo tee resolv.conf
                                                
                                            

  4. Restart the WSL2 Virtual Machine:

                                                
    wsl --shutdown
                                                
                                            

  5. Start a new Linux prompt and check that both files contain the expected settings:

                                                
    cat /etc/resolv.conf
    
    cat /etc/wsl.conf
                                                                                                
                                                
                                            
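Steps 1 and 3 can also be scripted from inside WSL2, for example if you need to repeat them on several machines. The sketch below assumes the following:

  • bash is available and the script runs as root (for example via sudo)
  • /etc/wsl.conf does not already contain a [network] section

The function names are hypothetical, and the optional path arguments let you try them on scratch files first. Steps 2 and 4 (wsl --shutdown) still have to be run from PowerShell.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 3; run inside the WSL2 distribution as root (e.g. with sudo).
# disable_resolv_generation and write_resolv_conf are hypothetical helper names.
# Assumption: /etc/wsl.conf does not already contain a [network] section.

# Step 1: stop WSL from regenerating /etc/resolv.conf (safe to run twice).
disable_resolv_generation() {
  local conf="${1:-/etc/wsl.conf}"
  if ! grep -qs '^generateResolvConf *= *false' "$conf"; then
    printf '[network]\ngenerateResolvConf = false\n' >> "$conf"
  fi
}

# Step 3: replace resolv.conf (or the symlink WSL created) with a plain file.
write_resolv_conf() {
  local nameserver="$1" conf="${2:-/etc/resolv.conf}"
  rm -f "$conf"   # removes the symlink itself, not its target
  printf 'nameserver %s\n' "$nameserver" > "$conf"
}
```

For example, after saving it as wsl-dns.sh (a hypothetical file name), run sudo bash -c '. ./wsl-dns.sh && disable_resolv_generation' before step 2. Then run sudo bash -c '. ./wsl-dns.sh && write_resolv_conf X.X.X.X' before step 4.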

Cloning Repositories on Windows from Azure DevOps

This PowerShell script uses a Personal Access Token (PAT) to authenticate with Azure DevOps and retrieve a list of repositories. It keeps repositories whose URLs contain "SEARCH_WORD_1" or "SEARCH_WORD_2" (excluding those containing "SEARCH_WORD_1%") and then clones each matching repository. To authenticate each clone, the script embeds the PAT in the clone URL, which allows unattended access to Azure DevOps repositories.

                            
# Set PAT token
$PAT = "YOUR TOKEN"
 
$uri = "https://dev.azure.com/YOUR_ORG/_apis/git/repositories?api-version=6.0"
 
$headers = @{
    Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$($PAT)"))
}
 
# Make the API request to get the list of repositories
$response = (Invoke-RestMethod -Uri $uri -Method Get -Headers $headers)
 
# Keep URLs that contain SEARCH_WORD_1 or SEARCH_WORD_2, but not SEARCH_WORD_1%
$filteredResponse = ($response.value.remoteUrl | Where-Object {(($_ -like "*/SEARCH_WORD_1*") -or ($_ -like "*/SEARCH_WORD_2*")) -and ($_ -notlike "*SEARCH_WORD_1%*") })
 
foreach ($url in $filteredResponse) {
    # Replace the "https://<org>@" prefix with "https://<PAT>@" so git authenticates with the PAT
    $gitURL = $url -replace '^https://([^@/]+@)?', "https://$($PAT)@"
    git clone $gitURL 2> $null
}
                            
                        
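On Mac or Linux, the same idea can be sketched in bash with curl and jq, which are assumed to be installed. The function names are hypothetical. YOUR_ORG, the token and the SEARCH_WORD_* values are the same placeholders as in the PowerShell script. Note that bash case patterns are case-sensitive, whereas PowerShell's -like is not.

```shell
#!/usr/bin/env bash
# Sketch: Mac/Linux counterpart of the PowerShell clone script above.
# Assumes curl and jq are installed; all function names are hypothetical.

# Same rule as the Where-Object filter, but case-sensitive.
matches_filter() {
  case "$1" in
    *SEARCH_WORD_1%*) return 1 ;;                   # explicitly excluded
    */SEARCH_WORD_1*|*/SEARCH_WORD_2*) return 0 ;;  # wanted
    *) return 1 ;;
  esac
}

# Turn "https://<org>@dev.azure.com/..." into "https://<PAT>@dev.azure.com/...".
authenticated_url() {
  local pat="$1" rest="${2#https://}"
  rest="${rest#*@}"   # drop the "<org>@" user part if present
  printf 'https://%s@%s\n' "$pat" "$rest"
}

# List the organization's repositories and clone the matching ones into the current folder.
clone_all() {
  local org="$1" pat="$2" url
  # -u ":PAT" sends the same Basic header as the PowerShell script (empty user, PAT as password).
  curl -fsS -u ":$pat" \
    "https://dev.azure.com/$org/_apis/git/repositories?api-version=6.0" |
    jq -r '.value[].remoteUrl' |
    while IFS= read -r url; do
      if matches_filter "$url"; then
        git clone "$(authenticated_url "$pat" "$url")"
      fi
    done
}
```

For example: clone_all YOUR_ORG "YOUR TOKEN". As with the PowerShell script, git clone records the authenticated URL, including the PAT, as the origin remote in each repository's .git/config.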

Updating Repositories on Windows from Azure DevOps

Here’s a PowerShell script to update the repositories that were initially cloned using the first script. This script assumes that each repository was cloned into a separate folder named after the repository, and that you run it from the folder that contains those repository folders.

                            
# Set PAT token
$PAT = "YOUR TOKEN"

$uri = "https://dev.azure.com/YOUR_ORG/_apis/git/repositories?api-version=6.0"

$headers = @{
    Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$($PAT)"))
}

# Make the API request to get the list of repositories
$response = (Invoke-RestMethod -Uri $uri -Method Get -Headers $headers)

$filteredResponse = ($response.value.remoteUrl | Where-Object {(($_ -like "*/SEARCH_WORD_1*") -or ($_ -like "*/SEARCH_WORD_2*")) -and ($_ -notlike "*SEARCH_WORD_1%*") })

foreach ($url in $filteredResponse) {
    # Extract the repository name: the last URL segment, without an optional .git suffix
    # (Azure DevOps remote URLs usually end in /_git/<repo> with no .git suffix)
    $repoName = ($url -split '/')[-1] -replace '\.git$', ''
    
    # Define the local path of the repository
    $localPath = ".\$repoName"
    
    if (Test-Path $localPath) {
        Write-Host "Updating repository: $repoName"
        # Run git pull inside the repository without changing the current directory
        git -C $localPath pull 2> $null
    } else {
        Write-Host "Repository '$repoName' not found locally. Skipping update."
    }
}
                                
                            
                        

Explanation

  • Authentication: The script uses a PAT to authenticate its call to the Azure DevOps REST API. git pull itself reuses the PAT that the clone script embedded in each repository's remote URL.
  • Filtering Repositories: Filters repositories to include only those with "SEARCH_WORD_1" or "SEARCH_WORD_2" in their URL, excluding those containing "SEARCH_WORD_1%".
  • Updating Repositories: For each filtered repository, it runs git pull in the respective local folder. If the local folder doesn’t exist, it skips that repository.
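A bash counterpart for Mac or Linux is sketched below. It assumes two things:

  • the repository URLs are supplied one per line on standard input
  • it runs in the folder that contains the cloned repositories

repo_name and update_repos are hypothetical helper names. Using git -C runs git pull inside each folder without changing the shell's current directory.

```shell
#!/usr/bin/env bash
# Sketch: Mac/Linux counterpart of the PowerShell update script.
# Reads remote URLs (one per line) on stdin; repo_name and update_repos are hypothetical helpers.

# Last URL segment without an optional ".git" suffix; this normally matches
# the folder name that git clone creates by default.
repo_name() {
  local name="${1##*/}"
  printf '%s\n' "${name%.git}"
}

# Pull each repository that exists as a folder in the current directory; skip the rest.
update_repos() {
  local url name
  while IFS= read -r url; do
    name="$(repo_name "$url")"
    if [ -d "$name" ]; then
      echo "Updating repository: $name"
      git -C "$name" pull
    else
      echo "Repository '$name' not found locally. Skipping update."
    fi
  done
}
```

For example, with a hypothetical repos.txt that lists one remote URL per line: update_repos < repos.txt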

Getting started