Identifying Outliers

A Guide to Detecting and Excluding Suspected Templates

Overview

In software development, a "template" refers to a section of source code that bears striking similarity to code found elsewhere, often resulting from copy-pasting practices. These templates can exist as standalone text files or snippets of code copied and inserted into text files. To learn more about templates, please refer to the following explanation.

KEDEHub automatically detects and excludes such templates when calculating KEDE. However, some suspected templates need manual inspection to verify that they are indeed duplicates. For this purpose, KEDEHub provides a dedicated tool called "Outliers," which streamlines the identification and handling of suspected templates. You can reach it from the application menu, as presented in the image below.

Below the tool description, you will find a search box for locating projects within your company. The search matches on the project's long name (the project name) rather than on the project ID, and it returns up to 10 results. By default, the list displays the ten most active projects in the company, based on activity in the 30 days leading up to the latest commit date.
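The default project list described above can be sketched roughly as follows. This is only an illustration, not KEDEHub's actual implementation: the function name `most_active_projects`, the input shape, and the use of commit counts as the measure of "activity" are all assumptions, as is taking the latest commit date across all projects as the end of the window.

```python
from collections import Counter
from datetime import date, timedelta


def most_active_projects(commits, window_days=30, limit=10):
    """Return up to `limit` project names ranked by recent activity.

    `commits` is an iterable of (project_name, commit_date) pairs.
    Activity is assumed here to mean the number of commits; the guide
    itself does not define it.
    """
    commits = list(commits)
    if not commits:
        return []

    # The window ends at the latest commit date, not at today's date.
    latest = max(day for _, day in commits)
    cutoff = latest - timedelta(days=window_days)

    # Count only commits that fall inside the window.
    counts = Counter(name for name, day in commits if day > cutoff)

    # Most commits first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]
```

Anchoring the window to the latest commit date rather than to the current date means that a company with no recent commits would still see a populated list.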

When configuring a report using the "Outliers" tool, you can select the following parameters:

  • KEDE Frequency: Choose between daily or weekly intervals.
  • KEDE Limit: Select from 10, 20, 30, or 50 as the limit for KEDE values.
  • The start and end dates for the report.

Typically, we recommend using the weekly KEDE frequency, as daily reports may contain excessive noise. However, in cases where a deeper analysis of suspected templates is necessary, opting for the daily KEDE frequency is more suitable.
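If you keep report settings in a script or a note for repeatable checks, the parameters above can be captured in a small validated structure. The sketch below is hypothetical: `OutliersReportParams` and its field names are not part of KEDEHub, and the "weekly" default only mirrors the recommendation in this guide. The allowed values come from the parameter list above.

```python
from dataclasses import dataclass
from datetime import date

# Allowed values as listed for the Outliers tool in this guide.
ALLOWED_FREQUENCIES = ("daily", "weekly")
ALLOWED_LIMITS = (10, 20, 30, 50)


@dataclass(frozen=True)
class OutliersReportParams:
    """Hypothetical container for one Outliers report configuration."""

    start: date
    end: date
    kede_limit: int
    frequency: str = "weekly"  # assumed default, following the recommendation

    def __post_init__(self):
        if self.frequency not in ALLOWED_FREQUENCIES:
            raise ValueError(f"frequency must be one of {ALLOWED_FREQUENCIES}")
        if self.kede_limit not in ALLOWED_LIMITS:
            raise ValueError(f"kede_limit must be one of {ALLOWED_LIMITS}")
        if self.start > self.end:
            raise ValueError("start date must not be after end date")
```

For example, `OutliersReportParams(start=date(2024, 1, 1), end=date(2024, 1, 31), kede_limit=30)` describes a weekly report for January 2024, while a limit of 25 would be rejected.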

Checking Commits

To access the commit checking functionality, navigate to the menu and select the desired parameters. In this section, we will focus on Weekly KEDE, but the same actions can be applied to find Daily KEDE templates. Upon clicking the Search button, a list of weeks will be displayed. Each item in the list includes the week's date, the name of the contributor who committed the code, and the total KEDE value for that week. An example output is shown in the figure below.

To investigate a suspected template further, click on the corresponding commit number. In the example above, the commit number is highlighted in blue. Clicking on it opens the commit on GitHub in a new browser window. Please note that if the repository is private, you will need to provide credentials to access it. If your code is not hosted on GitHub, you will need to locate the commit by its number in your Version Control System (VCS).

To determine if a commit has already been flagged as a template, click the "Template" button. In the figure below, there are two commits displayed. The first commit has already been marked as a template, while the second one has not and may require further examination, as it has 53K net added characters.

(Note: To safeguard privacy, email addresses have been obfuscated.)

Setting a Template

Once you have confirmed that a particular commit contains template code, it is important to exclude it from KEDE calculations. To do this, simply click on the "Nullify" button, which is visible in the figure below.

Upon clicking the "Nullify" button, the fields for "Template chars added" and "Template chars deleted" will be automatically populated, as shown in the figure below.

You can adjust these numbers to change how many characters are excluded. This also applies to a commit that has previously been designated as a template.

After making any necessary changes, click the "Submit" button. The modified numbers are saved in the KEDEHub database, and from that point forward the commit is excluded from KEDE calculations. KEDE values for the week that contains the commit are recalculated automatically.
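To make the bookkeeping behind "Template chars added" and "Template chars deleted" more concrete, here is a minimal sketch. It is not KEDEHub's implementation: the `Commit` class, its field names, and the arithmetic are assumptions for illustration. In particular, it assumes that "Nullify" fills the template fields with the commit's full added and deleted totals, and that the characters counted toward KEDE are the remaining net characters.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical record of a commit's character counts."""

    chars_added: int
    chars_deleted: int
    template_chars_added: int = 0
    template_chars_deleted: int = 0

    def nullify(self):
        # Assumed behaviour: mark every character in the commit as template.
        self.template_chars_added = self.chars_added
        self.template_chars_deleted = self.chars_deleted

    def counted_net_chars(self):
        # Net characters that remain after template characters are excluded.
        added = self.chars_added - self.template_chars_added
        deleted = self.chars_deleted - self.template_chars_deleted
        return added - deleted
```

Under these assumptions, a commit with 53,000 added characters counts 53,000 net characters before nullification and 0 afterwards. Lowering `template_chars_added` to 50,000 by hand would leave 3,000 characters counted.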

Getting started