Understanding the details of Knowledge Discovery Efficiency (KEDE)
Abstract
Introducing KEDE, a powerful metric for optimizing software development. KEDE analyzes source code repositories and quantifies the knowledge gap developers face when starting tasks. This knowledge gap impacts their happiness and productivity.
With values ranging between 0 and 100, higher KEDE scores indicate better standing. Managers can use KEDE to compare projects, teams, and departments.
KEDE showcases the efficiency of developers in acquiring and applying missing knowledge. This insight helps teams improve processes, ultimately boosting happiness, productivity, and success.
Motivation
In 1999 Peter Drucker published “Knowledge Worker Productivity: The Biggest Challenge” where he claimed that:
“The most important, and indeed the truly unique, contribution of management in the 20th century was the fiftyfold increase in the productivity of the manual worker in manufacturing. The most important contribution management needs to make in the 21st century is similarly to increase the productivity of knowledge work and knowledge workers.”[2]
In order to achieve a fiftyfold increase in the productivity of knowledge workers we need to know two things: first, who are the knowledge workers and second, how to measure their productivity?
According to Peter Drucker, knowledge workers are those who possess, utilize and create valuable knowledge[3]. In his definition the knowledge worker is an executive who knows how to allocate knowledge to productive use, just as the capitalist knew how to allocate capital to productive use[4]. In knowledge work the workers' knowledge comprises the means of production. Knowledge workers do not convert materials from one form to another, but convert knowledge from one form to another[24].
Now we can try to define productivity. There are plenty of opinions on how to measure the productivity of knowledge workers, but no single approach is accepted as standard[24]. Some people measure the productivity of knowledge workers by the output per unit time. Others measure it by the effort put into producing the output, e.g. time spent in value-added activities. Yet others measure it by the financial outcome produced by the output. Each of those approaches accounts for some important facet of knowledge work.
As Drucker noted, productivity is the application of knowledge to work[4].
What is the problem with quantifying efficiency in the acquisition and application of knowledge in an organization? In one word: measurement. It is very difficult to measure this efficiency for two reasons. Firstly, knowledge is invisible. It is in people's brains, in books, in manuals, in working procedures, in files on hard disks; in short, everywhere. When the organization invests in acquiring new talent or in training its current people, it is a challenge to measure what it got for the money. Secondly, acquiring knowledge is not enough. Attention must also be paid to whether the knowledge is actually applied. If the knowledge is acquired but not fully utilized, then the knowledge workers leave potential untapped.
Knowledge and knowledge discovery
"All knowledge is in response to a question. If there were no question, there would be no scientific knowledge." ~ Gaston Bachelard, The Formation of the Scientific Mind
In general, “knowledge” and “information” are vague terms. Here we will use the definitions from information theory.
Let us consider a classic example: we have a coin hidden in one of 8 equal-sized boxes, as presented here. The probability of finding the coin in any specific box is 1/8. In order to find the coin we are allowed to ask questions with "Yes" or "No" answers (binary questions). If an answer to a binary question allows us to exclude some of the possible locations of the coin, then we say that we acquired some “information”. If the question allows us to exclude 50% of the possible locations, we say that we acquired 1 bit of information.
Knowledge of coin location means we know it with certainty i.e. with a probability of 100%. The amount of missing information H is zero, if we know that one specific event has occurred i.e. if we need zero questions to ask.
“Missing information” is defined by the Information Theory introduced by Claude Shannon[1].
We can express the amount of missing information H in terms of the distribution of probabilities. In this example, the distribution is: {1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8} because we have 8 boxes with equal sizes. For calculating H we use Shannon's formula[1]. In a formal mathematical notation, the Shannon entropy H(X) of a discrete random variable X with possible values {x_{1}, x_{2}, ..., x_{n}} and a probability mass function P(X) is:
$$H(X)=-\sum _{i=1}^{n}P({x}_{i}){\log}_{2}P({x}_{i})$$
where n is the number of boxes and P(x_{i}) is the probability of finding the coin in box i.
We will use the interpretation of H as the amount of missing information, as explained in detail here.
The amount of missing information H in a given distribution of probabilities (p_{1}, p_{2},...p_{n}) is an objective quantity. H is independent of the strategy we choose for asking questions, i.e. for acquiring it. In other words, the same amount of information is there to be obtained regardless of the way or the number of questions we use to acquire it.
H depends only on the number of boxes and their size. The bigger the number of boxes, the bigger the average number of questions we need to ask in order to find the coin. The choice of strategy allows us to acquire the same amount of information with fewer or more questions than the average. The optimal strategy guarantees that we will get all the missing information with the average number of questions.
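To make the role of strategy concrete, here is a minimal Python sketch, not taken from the original text, that counts the yes/no questions needed to find the coin in 8 equal-sized boxes under two strategies. The function names and the "one box at a time" strategy are assumptions chosen only for illustration.

```python
def questions_halving(n_boxes: int, coin: int) -> int:
    """Count yes/no questions when each question splits the candidates in half."""
    low, high = 0, n_boxes  # candidate boxes are low .. high - 1
    asked = 0
    while high - low > 1:
        mid = (low + high) // 2
        asked += 1  # "Is the coin in a box numbered below mid?"
        if coin < mid:
            high = mid
        else:
            low = mid
    return asked


def questions_one_by_one(n_boxes: int, coin: int) -> int:
    """Count yes/no questions when we ask about one box at a time."""
    asked = 0
    for box in range(n_boxes - 1):  # the last box never needs its own question
        asked += 1  # "Is the coin in this box?"
        if box == coin:
            break
    return asked


n = 8
halving = [questions_halving(n, c) for c in range(n)]
one_by_one = [questions_one_by_one(n, c) for c in range(n)]

avg_halving = sum(halving) / n        # 3.0 questions, for every coin position
avg_one_by_one = sum(one_by_one) / n  # 4.375 questions on average
print(avg_halving, avg_one_by_one)
```

Under these assumptions the halving strategy always needs 3 questions, while the one-by-one strategy sometimes gets lucky with a single question but needs more questions on average. The information obtained is the same in both cases.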
In the case of 2 boxes we will need to ask only one question. The value of the missing information H in this case is one, as calculated below:
$$H=-\left(\frac{1}{2}{\log}_{2}\frac{1}{2}+\frac{1}{2}{\log}_{2}\frac{1}{2}\right)=1$$
With p equal to 1/2 we acquire exactly 1 bit of information.
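The two box examples can be checked numerically. The sketch below is an illustration rather than part of the original article; it implements Shannon's formula H = -sum(p_i * log2(p_i)) in a helper named `shannon_entropy`, a name chosen here for convenience.

```python
import math


def shannon_entropy(probabilities):
    """Missing information H in bits for a discrete probability distribution."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


h_eight = shannon_entropy([1 / 8] * 8)  # coin hidden in one of 8 equal boxes
h_two = shannon_entropy([1 / 2] * 2)    # coin hidden in one of 2 equal boxes
print(h_eight, h_two)
```

For equal-sized boxes the result is simply log2(n): 3 bits for 8 boxes and 1 bit for 2 boxes.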
Knowledge discovery
So far, we have applied the Shannon formula to cases where the object to be found is known in advance. By asking binary questions of one bit each, we have been able to locate the hidden object.
However, in the reality of knowledge work, the "it" is not always known in advance. For example, in the popular “20 questions” game, the object is only revealed through a process of questioning. In the case of a software developer, the "it" to be delivered may be a software program. In this scenario, the developer must acquire missing information from a variety of sources, including the business's requirements and user manuals on the technology to be used. Ideally, each question brings back a bit of information, and the developer must maintain a coherent understanding of all the answers received.
John Wheeler coined the phrase "it from bit," which symbolizes the idea that every item in the physical world has knowledge as its immaterial source and explanation[10]. According to this concept, reality arises from the posing of binary "yes/no" questions, as elaborated on here.
Knowledge discovery takes some knowledge to be discovered as input, or in other words, "how much you need to know."
We consider the knowledge discovery part as a black box, and we don't know how it produces tangible output in response to input questions. We do not consider whether the black box is operated by senior or junior developers.
The result of knowledge discovery is the knowledge discovered.
This knowledge is always real, as we have seen in both the "20 questions" and "Surprise 20 questions" games.
However, the knowledge that needs to be discovered is not always known in advance. As we have seen from the concept of "it from bit," it is not always possible to know in advance the prior knowledge or the knowledge required for a task. What we can do is quantify the knowledge discovered after the task has been completed.
Knowledge discovery process
"The completion of a mental activity requires two types of input to the corresponding structure: an information input specific to that structure, and a nonspecific input, which may be variously labeled “effort,” “capacity,” or “attention.” To explain man's limited ability to carry out multiple activities at the same time, a capacity theory assumes that the total amount of attention which can be deployed at any time is limited." ~ Daniel Kahneman [19]
Knowledge is invisible, but as Drucker noted, a very large number of knowledge workers do both knowledge work and manual work[2]. He called those people “technologists” and stated they may be the single biggest group of knowledge workers. As examples of technologists Drucker provided surgeons, office clerks and dentists[2]. The first examples I could think of are IT jobs in general and software developers in particular. Manual work results in tangible output which is visible. That makes the manual part of knowledge work visible. Importantly, the tangible output shows that the knowledge was actually applied. That helps us to define what a knowledge worker is.
The key word here is 'knowledge,' particularly the knowledge workers lacked before starting a task—the knowledge they needed to gather to successfully complete it.
This distinguishes 'manual' workers from 'knowledge' workers. For manual workers, much of the required knowledge is built into the machines they operate and the detailed procedures they follow. Conversely, knowledge workers need to engage in extensive knowledge acquisition, as their tasks often demand a higher degree of problem-solving and innovation.
In the picture below to the right we have Margaret Hamilton, whose code saved the Moon landing space mission of Apollo 11 in 1969, being awarded the Presidential Medal of Freedom by President Obama.
In the picture to the left we have Margaret standing next to part of the computer code she and her team developed for the Moon landing. What we witness is an example of the manual part of knowledge work. The software developers had to type all of those pages manually. All the knowledge went from their brains, through their fingers, and ended up as symbols on sheets of paper. The difficult and time-consuming activity in creating the Moon landing software was not typing their already-available knowledge into source code. It was effectively acquiring knowledge they did not already have, i.e. getting answers to the questions they already had. Even more specifically, it was discovering the knowledge necessary to make the system work that they did not know they were missing. The symbols are the tangible output of their knowledge work. The symbols are not the knowledge itself, but the trace of the knowledge left on the paper.
We can turn to physics for an analogy. A quantum (plural: quanta) is the smallest discrete unit of a physical phenomenon. For example, a quantum of light is a photon, and a quantum of electricity is an electron.
The tangible output represents the knowledge discovered through the knowledge discovery process, encapsulating the new information that has been acquired. We do not address the question of the quality of the tangible output, assuming that it meets appropriate standards.
There is no requirement for the tangible output to be symbols of source code contributed via a keyboard. Let's take as an example the case when we have bought furniture and need to assemble it. We need to read the instructions and discover how to use the tools to assemble the parts into a tangible output in the form of a piece of furniture. Here the quantum of tangible output is a tightened screw. When people assemble furniture they do knowledge work.
A knowledge discovery process consists of asking questions in order to gain the missing information H about "What" tangible output to produce and "How" to produce it.
In the case of knowledge discovery we are not able to calculate the amount of missing information H the way we did with the gold coin, because there is no way to know the number of boxes n and the size of each box p_{i}. In real life knowledge discovery is a constant stream of questions asked and answers received. The number of boxes, their size and what's hidden in them constantly fluctuate based on the answers received. Each question may enlarge or shrink the set of possibilities. That is why in knowledge work we cannot apply Shannon's formula to calculate the missing information H.
For that we will turn to Psychology where they study mental effort and cognitive load. Fortunately, there is scientific research that can help!
Different mental activities impose different demands on the limited cognitive capacity. An easy task demands little mental effort, and a difficult task demands much. Because the total quantity of mental effort which can be exerted at any time is limited, concurrent activities which require attention tend to interfere with one another. Human capacity to perform concurrent perceptual tasks is limited.
Easy tasks can be performed together with little interference, but more difficult tasks cannot. An even distribution of attention among concurrent activities is possible only at a low level of total mental effort. When total effort is high, one of the activities typically draws most of the attention, leaving little for the others. This suggestion implies that attention is divisible at low levels and more nearly unitary at high levels of effort[19].
As humans become skilled in a task, its demand for energy diminishes. The pattern of activity associated with an action changes as skill increases, with fewer brain regions involved. Talent has similar effects. Highly intelligent individuals need less effort to solve the same problems. The knowledge is stored in memory and accessed without intention and without mental effort[18].
Since humans can either type or ask a question then we conclude:
Now we have a means to infer the number of questions asked during a knowledge discovery process.
Example of a Human Knowledge Discovery Process
Let's use an example to see how we can infer the number of questions in practice. For that we'll have myself executing the task of typing on a keyboard the word “Honorificabilitudinitatibus”. It means “the state of being able to achieve honours” and is mentioned by Costard in Act V, Scene I of William Shakespeare's “Love's Labour's Lost”. With its 27 letters “Honorificabilitudinitatibus” is the longest word in the English language featuring only alternating consonants and vowels.
The way I will execute this task is to go to the "play text" or "script" of “Love's Labour's Lost”, look up the word and type it down. The manual part of the task is to type 27 letters. The knowledge part of the task is to know which are those 27 letters.
In order to track the knowledge discovery process I will put "1" for each time interval when I have a letter typed and "0" for each time interval when I don't know what letter to type.
I start by taking a good look at the word “Honorificabilitudinitatibus” in the script of “Love's Labour's Lost”. That takes me two time intervals. Then I type the first letters “H”, “o”, and “n”. I continue typing letter after letter: “o”, “r”. At this point I cannot recall the next letter. What should I do? I am missing information, so I open up the script of “Love's Labour's Lost” and look up the word again. Now I know what the next letter to type is, but acquiring that information took me one time interval. This time I have remembered more letters, so I am able to type “i”, “f”, “i”, “c”, “a”, “b”, “i”. Then again I cannot continue because I have forgotten the next letters of the word, so I have to look it up again in the script. That takes two more time intervals. Now I can continue typing “l”, “i”, “t”. At this point I stop again because I am not sure what the next letters are, so I have to think about it. That takes one time interval. I continue typing with “u”, “d”, “i”. Then I stop again because I have again forgotten the next letters, so I have to look the word up again in the script of “Love's Labour's Lost”. That takes two more time intervals. Now I know what the next letter is, so I can continue typing “n”, “i”. At this point I cannot recall the next letter, so I have to look it up again in the script. That takes two more time intervals. Once I know the next letter, I can continue typing “t”, “a”, “t”, “i”, “b”, “u”, “s”. Eventually I am done!
At the end of the exercise I have the word “Honorificabilitudinitatibus” typed and along with it a sequence of zeros and ones.


-  -  H  o  n  o  r  -  i  f  i  c  a  b  i  -  -  l  i  t  -  u  d  i  -  -  n  i  -  -  t  a  t  i  b  u  s
0  0  1  1  1  1  1  0  1  1  1  1  1  1  1  0  0  1  1  1  0  1  1  1  0  0  1  1  0  0  1  1  1  1  1  1  1
In the table we have separated the manual work of typing from the knowledge work of thinking about what to type.
The first row of the table shows the knowledge I manually transformed into tangible output, in this case the longest English word. The second row of the table shows the way I discovered that knowledge. There is a "0" for each time interval when I was missing information about what to type next. There is a "1" for each time interval when I had prior knowledge about what to type next. Each "0" represents a question I needed to ask in order to acquire the missing information about what letter to type next. Each "1" represents prior knowledge.
In the exercise above we witnessed the discovery and transformation of invisible knowledge into visible tangible output.
We can also calculate the knowledge discovered by dividing the total number of questions asked by the total number of letters typed. Here H = 10/27 = 0.37 questions per symbol or 0.37 bits of information.
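As an illustration, the calculation can be reproduced from the recorded sequence of zeros and ones. The sketch below is not part of the original exercise; the variable names are assumptions, and the sequence is copied from the second row of the table.

```python
# Second row of the table: 0 = interval spent on a question, 1 = letter typed.
trace = [int(x) for x in (
    "0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 "
    "0 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1"
).split()]

questions = trace.count(0)  # intervals with missing information
symbols = trace.count(1)    # letters typed
h = questions / symbols     # average questions per symbol

print(questions, symbols, round(h, 2))
```

Counting this way gives 10 questions for 27 letters, i.e. about 0.37 questions per symbol.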
KEDE definition
Can we define a measure of how efficient a given knowledge discovery process is? It is reasonable to expect such a measure to have the following properties:
 It should be a function of the missing information H
 Its maximum value should correspond to H equals zero i.e. there is no need to ask questions, all knowledge is already discovered.
 Its minimum value should correspond to H equals Infinity i.e. we have no knowledge to start with.
 It should be continuous in the closed interval of [0,1]. This makes it very useful to be used as a percentage. This is because we need to be able to rank knowledge discovery processes by efficiency. The best ranked knowledge discovery process will have 100% and the worst 0%. That is practical and people are used to having such a scale.
 It should be anchored to a natural constraint and thus support comparisons across settings and applications
 It should be calculated per time period e.g. daily, weekly, quarterly, yearly
 It should infer H solely from observable quantities of the tangible output we can measure
To capture all of that we introduce a new metric named KEDE. KEDE is an acronym for KnowledgE Discovery Efficiency. It is pronounced [ki:d].
KEDE is inversely related to the knowledge discovered, i.e. to the difference between the knowledge required by a task and the prior knowledge of a person. In an extreme case, if the prior knowledge is a single unit, like the answer to a problem, the answer can be used to bypass cognitive processing of the problem. The net effect will be to allow the expert to reduce the use of cognitive capacity, as compared to a novice who does not have prior knowledge of the answer[30].
The more prior knowledge was applied the more efficient a Knowledge Discovery process is. Conversely, when a lot of required knowledge is missing then the knowledge discovery is less efficient.
Below is an animated example of calculating KEDE when we search for a gold coin hidden in 1 of 64 boxes.
In all six cases the required knowledge is 6 bits. That means we need to ask six binary questions on average to find the gold coin. You can see that the knowledge to be discovered depends on the prior knowledge. KEDE quantifies the knowledge discovered, i.e. the difference between required knowledge and prior knowledge. In order to increase the KEDE of an organization, the knowledge workers have to know "what" they are doing and "how" to do the "what". The more knowledge they apply, the greater KEDE is. If knowledge workers start a project in a business domain they know nothing of, with a technology they know nothing of, then KEDE will approach zero.
KEDE is high when the knowledge workers do possess the knowledge needed and apply it. The less knowledge discovered the better. The fewer questions the better. Fewer questions asked means a lower level of perplexity for the humans involved.
KEDE is a general form of a metric for all Knowledge Discovery processes. That means for each specific context we have to define a specific way to calculate KEDE. For instance, if we do the knowledge work of a surgeon then both the knowledge discovered and the maximum amount of knowledge that could possibly be discovered need to be measured in a specific way.
Due to its general definition of Knowledge Discovery processes KEDE can be used for comparisons between productivity of organizations in different contexts. For instance to compare hospitals with software development companies! That is possible as long as KEDE calculation is defined properly for each context. In what follows we will define KEDE calculation for the case of knowledge workers who produce textual content in general and computer source code in particular.
Measuring software development
"...my teaching is a method to experience reality and not reality itself, just as a finger pointing at the moon is not the moon itself. A thinking person makes use of the finger to see the moon. A person who only looks at the finger and mistakes it for the moon will never see the real moon."~ Buddha via Thich Nhat Hanh
Now that we have defined Knowledge Discovery Efficiency (KEDE), we need to be able to measure it in practice.
Unfortunately, in real settings the questions asked are invisible to us. They are asked by the knowledge workers either to themselves or to many other people. There is no way to get inside a human's head and count the questions asked while discovering knowledge. The process of discovering knowledge is a black box. We pragmatically adopt the positivist credo that science should be based on observable facts, and decide to infer the number of questions asked solely from the observable quantities we can measure. In reality the only thing we can measure is the tangible output.
We will simplify the reality that is knowledge work by accepting a few unproven assumptions or axioms. That is not the first time humans have simplified reality in order to gain practical benefits. Euclid, for instance, created his geometry on the foundation of twenty-three definitions, five unproven assumptions now known as axioms and five further unproven assumptions that he called common notions. Some of the definitions, such as “a point is that which has no part” and “a line is a length without breadth”, are gross simplifications and abstractions not to be found in reality. Nevertheless, Euclidean geometry has been of great practical value to humankind. It has been used from the ancient Greeks through to modern society to design buildings, predict the location of moving objects and survey land. However, Euclidean geometry is not always applicable. The surface of a sphere satisfies all the other Euclidean axioms, but not the parallel postulate. On a sphere, the sum of the angles of a triangle is not equal to 180°. The surface of a sphere is not a Euclidean space, but locally the laws of Euclidean geometry are good approximations. Later on Lobachevsky created the "hyperbolic" geometry that rejects the validity of Euclid's fifth, the “parallel,” axiom. A modern use of hyperbolic geometry is in Einstein's General theory of Relativity. From this historical review we can acknowledge that founding knowledge discovery on assumptions follows the way science has been applied for practical gains.
Here are the assumptions our approach is based on:
 For the knowledge discovery process there is some true value of missing information H.
 The knowledge discovery process happens linearly as a constant stream of questions asked by a knowledge worker.
 At n regular time intervals we sample from the knowledge discovery process. This way we divide the constant stream of questions asked into n equally sized boxes. A sample can contain either a contributed symbol or nothing at all. If we sample nothing, we assume there was a question asked by the knowledge worker, but no symbol was found. In other words, we assume that if there is no symbol contributed then there was a question asked.
 The samples are independent of one another, hence the order of the symbols and questions is irrelevant.
 Each symbol contributed was initially hidden in one of m equally sized boxes. The m equally sized boxes composed the search space M. There is no way to find the true value of m.
 Each symbol contributed was the result of a search done in the search space M by asking binary questions i.e. questions which are only answerable by "Yes" or "No".
 The knowledge worker always chooses the optimal strategy that allows for acquiring the missing information of the search space M with the average number of binary questions. In other words, we assume that with each question the knowledge worker removes half of the possibilities i.e. acquires one bit of information. That means that the number of questions asked per symbol contributed equals the missing information H_{M} of the search space.

At regular time intervals, a day or a week, we calculate the average number of questions per symbol by dividing the
total number of questions asked by the total number of symbols contributed during the interval.
It is a measure of the average uncertainty or "missing information" about each symbol,
assuming that each question contributes to reducing this uncertainty.
We know that the average number of questions per symbol equals the average missing information acquired by the underlying knowledge discovery process, because there is a mathematical theorem that supports this interpretation. The theorem states that the minimum average (expected) number of binary questions (also referred to as yes/no questions) required to determine an outcome of a random variable X lies between H(X) and H(X) + 1 [5]. This theorem provides a connection between Shannon entropy (H) and the expected number of binary questions needed to acquire missing information.
So, while the specifics of our model and the types of data it deals with may differ from many of the situations where information theory is traditionally applied, the concept of "missing information" as we've defined and used it does capture the same fundamental idea as entropy in information theory.

Efficiency means the smaller the average number of questions asked per symbol of source code contributed the better.
In other words, the less missing information per symbol of source code contributed, the more efficient the
knowledge discovery process is.
A more efficient knowledge discovery process doesn't always mean higher productivity!
Software development is a process of going from 0% knowledge about an idea to 100% knowledge of a finished product ready for delivery to a customer. In its simplest form, the job of software development is to reduce the number of unknowns to zero by the time the system is delivered. It is a process of knowledge acquisition and ignorance reduction[32]. Hence we model the software development process as a knowledge discovery process.
In software development the missing information H arrives through people: the developers, managers and clients who are part of the process. Software developers serve as a communication channel that, by asking questions, encodes information from the environment into symbols.
There are two major types of questions. First, looking for business knowledge about WHAT needs to be delivered. This information is expected from the people that requested the software developed. Second, looking for technical knowledge needed for the developers to know HOW to develop the WHAT.
Business knowledge usually is in the form of requirements. Technical knowledge is the expertise the software developers possess or need to acquire.
Unfortunately we don't have a way to capture the questions software developers ask. We only have the output of their work in the form of computer code. Fortunately computer code is composed of symbols, and now we will see how to calculate the average number of questions asked H using only the number of symbols of source code contributed per time unit.
Let's look at the example presented in the figure below.
We generalize the concept as follows. Both channels form a time series {x} = {x_{1} , . . . , x_{n} } of length n. We divide the time series {x} into S consecutive and non-overlapping subseries w_{i}, where S is the number of symbols in the time series {x} and i is the subseries index ranging from 1 to S. Each subseries w_{i} consists of one symbol s_{i} and zero or more questions {q_{i}} = {q_{1} , . . . , q_{j} }, where j is in the interval [0,n/S]. Note that each subseries w_{i} may have a different length. We count the number of questions q_{i} in each subseries w_{i}. As per our assumptions, the single symbol s_{i} in each subseries was the result of a search in a search space M_{i} composed of m_{i} equally sized boxes. Next, we calculate the missing information H_{i} for each subseries w_{i} as equal to the number of questions q_{i}. After calculating the missing information H_{i} for each subseries, we construct a missing information sequence {H_{i}} of length S.
Finally, the average missing information H of the time series {x} is defined as the average of the missing information sequence {H_{i}} in the form of
$$H=\frac{1}{S}\sum _{i=1}^{S}{H}_{i}$$
(1)
We define the total number of questions asked Q for time series {x} as the sum of the missing information sequence {H_{i}} in the form
$$Q=\sum _{i=1}^{S}{H}_{i}$$
(2)
Hence the average missing information H equals the average number of questions asked to type one symbol.
$$H=\frac{Q}{S}$$
(3)
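The procedure behind formulas (1) to (3) can be sketched in Python. This is a minimal illustration under stated assumptions: the time series is encoded as a string of 'q' (question) and 's' (symbol) samples, the questions of a subseries are assumed to precede its symbol, and the example series is hypothetical rather than the one from the figure.

```python
def missing_information_sequence(series):
    """Return the sequence {H_i}: questions counted before each symbol."""
    h_sequence = []
    pending_questions = 0
    for sample in series:
        if sample == "q":
            pending_questions += 1
        elif sample == "s":
            # A symbol closes the current subseries w_i.
            h_sequence.append(pending_questions)
            pending_questions = 0
        else:
            raise ValueError(f"unexpected sample: {sample!r}")
    # Assumption: questions after the last symbol belong to no subseries.
    return h_sequence


series = "qqsqsqqqs"  # hypothetical series of length n = 9
h_sequence = missing_information_sequence(series)

S = len(h_sequence)  # number of symbols
Q = sum(h_sequence)  # formula (2): total questions asked
H = Q / S            # formulas (1) and (3): average questions per symbol

print(h_sequence, Q, S, H)
```

For this hypothetical series the subseries contain 2, 1 and 3 questions, so Q = 6, S = 3 and H = 2 questions per symbol.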
That is the same as the Information Rate formula, where Q is the Information Rate and S is the Symbols Rate.
However, in our model we assume that the minimum symbol duration is one unit of time and is equal to the time it takes to ask one question. If that's the case, then in our time-series model each symbol (whether it's a question 'q' or a symbol 's') can be considered as occupying one unit of time t. Hence the sum of S and Q as measured in time units equals the length n of the time series {x}.
To apply the formula in practice we need to select a value for the length n of the time series {x}. The total available time T is implicitly contained in both Q and S. We add the total available time T explicitly and define r as the symbol rate (whether it's a question 'q' or a symbol 's') of the symbols channel S. r is measured in symbols per second and can be calculated using the symbol duration time t.
Now we have the length n of the time series {x} defined in terms of the total available time T and symbol rate r. We drop usage of n and instead from now on will be using N as maximum symbol rate.
We generalize for the sum of questions Q, symbols S and the maximum symbol rate N.
$$Q+S=N$$
(4)
When we insert that in the H formula we get:
$$H=\frac{Q}{S}=\frac{N-S}{S}=\frac{N}{S}-1$$
(5)
Now we have a formula that allows us to calculate the average missing information H for a time series {x} if we know the maximum symbol rate N, the total available time T and the actual symbol rate S.
Using the formula we can check if we'll get the same result as before. In our example the maximum symbol rate N is 9 symbols per 9 time units or 1 symbol per time unit and S is 3 symbols.
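Plugging the quoted numbers into the formula can be sketched as follows. The function name `missing_information` and its input checks are assumptions made for this illustration; the formula itself, H = N/S - 1, follows from Q + S = N and H = Q/S.

```python
def missing_information(n_max: float, s_actual: float) -> float:
    """Average missing information H = N/S - 1, in questions per symbol."""
    if s_actual <= 0:
        raise ValueError("at least one symbol must be contributed")
    if s_actual > n_max:
        raise ValueError("S cannot exceed the maximum symbol rate N")
    return n_max / s_actual - 1


N = 9  # maximum symbols in the interval (9 time units, 1 symbol per unit)
S = 3  # symbols actually contributed
H = missing_information(N, S)

# Cross-check against H = Q/S with Q = N - S questions.
Q = N - S
print(H, Q / S)
```

With N = 9 and S = 3 both routes give H = 2 questions per symbol.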
Let's also apply the formula to the typing of “Honorificabilitudinitatibus”:
We see that it took two questions for the first five letters, one question for the next seven letters, two questions for the next three letters, one question for the next three letters, two questions for the next two letters and two questions for the last seven letters. That is presented in the table below.
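| Questions (Q_{i}) | Letters (S_{i}) | Missing information (H_{i}) | Maximum symbol rate (N_{i}) |
| --- | --- | --- | --- |
| 2 | 5 | 2/5 | 2+5 |
| 1 | 7 | 1/7 | 1+7 |
| 2 | 3 | 2/3 | 2+3 |
| 1 | 3 | 1/3 | 1+3 |
| 2 | 2 | 2/2 | 2+2 |
| 2 | 7 | 2/7 | 2+7 |
| Sum(Q_{i}) = 10 | Sum(S_{i}) = 27 | | Sum(N_{i}) = 10+27 |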
The last row has the total number of questions asked and total number of answers received i.e. letters typed. Since the average number of questions per symbol is the information acquired H, then H is 0.37 bits of information.
We have six subsets. Each subset has its own missing information H_{i}. We can't just sum the missing information of the six subsets. Instead we compute the weighted sum of missing information, i.e. weighted by the size of the subsets:
If we use the H_{i} values from the table we get 0.37 bits of information:
The same can be calculated using the formula:
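The two calculations above can be sketched in a few lines of Python. This is only an illustration: it assumes that "weighted by the size of the subsets" means each H_{i} is weighted by S_{i}/S, and the function names are made up for this example.

```python
from fractions import Fraction

# (questions Q_i, letters S_i) for each subset, copied from the table above
subsets = [(2, 5), (1, 7), (2, 3), (1, 3), (2, 2), (2, 7)]

def weighted_missing_information(pairs):
    """Weighted sum of H_i = Q_i / S_i, each weighted by S_i / S (assumed weights)."""
    total_symbols = sum(s for _, s in pairs)
    return sum(Fraction(s, total_symbols) * Fraction(q, s) for q, s in pairs)

def missing_information_from_rate(pairs):
    """Formula (5): H = N / S - 1, with N = Q + S from formula (4)."""
    total_questions = sum(q for q, _ in pairs)
    total_symbols = sum(s for _, s in pairs)
    n = total_questions + total_symbols
    return Fraction(n, total_symbols) - 1

h_weighted = weighted_missing_information(subsets)
h_rate = missing_information_from_rate(subsets)
print(f"weighted sum: {float(h_weighted):.2f}, formula (5): {float(h_rate):.2f}")
```

Exact fractions are used so that the two routes can be compared without rounding noise; both reduce to Q/S = 10/27.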
In order to use this formula in practice we need to know both S and N. We can count the actual number of symbols of source code contributed straight from the source code files. For N we want to use some naturally constrained value.
We define an instance of the metric KEDE, the general metric that we introduced earlier, for the case of knowledge workers that produce tangible output in the form of textual content, to be:
$$KEDE=\frac{1}{1+H}=\frac{S}{N}$$
(6)
Anchoring N to natural constraints
N is the maximum number of symbols that could be contributed for a time interval by a single human being.
In the formula for N below we want to use some naturally constrained value. We pick T = 8 hours of work because that is the standard length of a work day for a software developer. Total working time consists of four components:
 Time spent typing (coding)
 Time spent figuring out WHAT to develop
 Time spent figuring out HOW to code the WHAT
 Time doing something else (NW)
We assume an ideal system where NW is zero.
To calculate the value of r we need to pick the symbol duration t.
The value of the symbol duration time t is determined by two natural constraints:
 the maximum typing speed of human beings
 the capacity of the cognitive control of the human brain
Typing speed has been subject to considerable research. One of the metrics used for analyzing typing speed is the inter-key interval (IKI), which is the difference in timestamps between two keypress events. IKI is thus defined in the same way as the symbol duration time t, so we can use the research on IKI to find t. Recent research covering 136,857,600 keystrokes from 168,960 participants, with on average about 810 key presses per participant, found that the average IKI is 0.238s [26]. There are many factors that affect IKI [6]. It has been found that proficient typing depends on the ability to view characters in advance of the one currently being typed. The median IKI was 0.101s for typing with unlimited preview and for typing with 8 characters visible to the right of the to-be-typed character, but was 0.446s with only 1 character visible prior to each keystroke [7]. Another well-documented finding is that familiar, meaningful material is typed faster than unfamiliar, nonsense material [8]. Another finding that may account for some of the IKI variability is what may be called the “word initiation effect”. If words are stored in memory as integral units, one may expect the latency of the first keystroke in the word to reflect the time required to retrieve the word from memory[9].
Cognitive control, also known as executive function, is a higher-level cognitive process that involves the ability to control and manage other cognitive processes that permit selection and prioritization of information processing in different cognitive domains to reach the capacity-limited conscious mind. Cognitive control coordinates thoughts and actions under uncertainty. It's like the "conductor" of the cognitive processes, orchestrating and managing how they work together. Information theory has been applied to cognitive control by studying the capacity of cognitive control in terms of the amount of information that can be processed or manipulated at any given time. Researchers found that the capacity of cognitive control is approximately 3 to 4 bits per second[29][36]. That means cognitive control as a higher-level function has a remarkably low capacity.
Based on the above cited research we get:
 Maximum typing speed of human beings to be r=1/t=1/0.238≈4.2 symbols per second
 Capacity of the cognitive control of the human brain to be approximately 3 to 4 bits per second. Since we assume one question equals one bit of information we get 3 to 4 questions per second.
 Asking questions is an effortful task and humans cannot type at the same time. If there was a symbol NOT typed then there was a question asked. That means the question rate equals the symbol rate, as explained here.
Since the question rate needs to equal the symbol rate we consider that 4.2 symbols per second is a rate higher than 3 to 4 bits per second. We need to get a symbol rate between 3 and 4 symbols per second. In order to get a round value of maximum symbol rate N of 100 000 symbols per 8 hours of work we pick symbol duration time t to be 0.288 seconds. That is a bit larger than what the IKI research found but makes sense when we think of 8 hours of typing. Having t of 0.288 seconds makes a symbol rate r of 3.47 symbols per second. That is between 3 and 4 and matches the capacity of the cognitive control of the human brain.
We define CPH as the maximum rate of characters that could be contributed per hour. Since r is 3.47 symbols per second we get CPH of 12 500 symbols per hour. We substitute T = h and r=CPH and the formula for N becomes:
where h is the number of working hours in a day and CPH is the maximum number of characters that could be contributed per hour.
We define h to be eight hours and get N to be 100 000 symbols per eight hours of work.
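The arithmetic behind these values can be reproduced with a short sketch. The constants come from the text (t = 0.288 seconds, h = 8 hours), and N = h × CPH is the relation used in equations (7) and (8); the function names are chosen only for this illustration.

```python
# Values taken from the text: symbol duration t and working hours h.
SYMBOL_DURATION_S = 0.288
WORK_HOURS = 8

def symbol_rate(t_seconds: float) -> float:
    """Symbol rate r = 1 / t, in symbols per second."""
    return 1.0 / t_seconds

def chars_per_hour(t_seconds: float) -> float:
    """CPH: symbols per hour at one symbol every t seconds."""
    return 3600.0 / t_seconds

def max_symbols(hours: float, cph: float) -> float:
    """N = h * CPH, the maximum number of symbols in h hours."""
    return hours * cph

r = symbol_rate(SYMBOL_DURATION_S)
cph = chars_per_hour(SYMBOL_DURATION_S)
n = max_symbols(WORK_HOURS, cph)
print(f"r = {r:.2f} symbols/s, CPH = {cph:.0f}, N = {n:.0f}")
```

Changing `SYMBOL_DURATION_S` to the measured average IKI of 0.238 seconds shows how much higher N would be without the cognitive-control constraint.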
Anchoring KEDE to natural constraints
Using the new formula for N the formula for H becomes
Please note that, since N is calculated per hour, S also needs to be counted per hour.
We see that the more symbols of source code contributed during a time interval the less missing information was there to be acquired. We want to compare the performance of different software development processes in terms of the efficiency of their knowledge discovery processes. Hence we rearrange the formula to emphasize that.
$$\frac{S}{h\times CPH}=\frac{1}{1+H}$$
(7)
The right-hand side is the KEDE we defined earlier. We define an instance of the metric KEDE, the general metric that we introduced earlier, for the case of knowledge workers that produce tangible output in the form of textual content, to be:
$$KEDE\mathit{=}\frac{S}{h\times CPH}$$
(8)
Now KEDE contains only quantities we can measure in practice. KEDE also satisfies all properties we defined earlier.
If we convert the KEDE formula into percentages then it becomes:
$$KEDE\mathit{=}\frac{S}{h\times CPH}\times 100\%$$
(9)
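Equation (9) translates into a one-line function. The sketch below assumes the defaults h = 8 and CPH = 12 500 derived above, and the sample value of 20 000 symbols is purely illustrative.

```python
def kede_percent(symbols: int, hours: float = 8, cph: float = 12_500) -> float:
    """Equation (9): KEDE = S / (h * CPH) * 100%."""
    if symbols < 0:
        raise ValueError("symbols must be non-negative")
    max_symbols = hours * cph          # N, the natural upper bound
    return 100.0 * symbols / max_symbols

# Illustrative input: 20 000 symbols contributed in an 8-hour day.
example_kede = kede_percent(20_000)
print(f"KEDE = {example_kede:.1f}")
```

Note that the function does not clamp the result to 100: a value above 100 would indicate that S was counted over a longer interval than the one used for N.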
We can use KEDE to compare the knowledge discovery efficiency of software development organizations.
 Minimum value of 0 and maximum value of 100.
 KEDE approaches 0 when the missing information is infinite. That is the case of humans creating new knowledge. Examples are intellectuals such as Albert Einstein and startups developing new technologies such as Paypal.
 KEDE approaches 100 when the missing information is zero. That is the case of an omniscient being...like God!
 KEDE will be higher if knowledge workers don't spend time discovering new knowledge, but just applying prior knowledge.
 KEDE is anchored to the natural constraints of maximum possible typing speed and the capacity of the cognitive control of the human brain. Thus it supports comparisons across contexts, programming languages and applications.
We would expect KEDE of 20 for an expert full-time software developer, who mostly applies prior knowledge, but also creates new knowledge when needed.
If software developers don't have the knowledge needed they have to discover it. The knowledge they have to discover is the total of what they don't know they don't know and what they know they don't know. Prior knowledge is the easiest and the fastest to discover: it is there, one just applies it. In other words, when prior knowledge is applied, knowledge discovery is at its most efficient. Conversely, when a lot of knowledge is missing, knowledge discovery is less efficient.
KEDE is the only metric that allows for quantifying the human capital of an organization, as well as the happiness and productivity of, and collaboration between, knowledge workers.
Now we can see why we have the Buddha's quote in the beginning of this section. In our case the finger is the number of symbols of source code contributed and the moon is the number of questions asked during a time period. We are not interested in the symbols and the meaning they carry. We are interested in the number of questions developers ask themselves and others to discover the missing information.
Waste
"The general root of superstition : namely, that men observe when things hit, and not when they miss; and commit to memory the one, and forget and pass over the other."" ~ Francis Bacon
If we open the Linux Foundation projects web page we will see they proudly show 31.2M lines of code added weekly and 20M lines of code deleted weekly.
How should we evaluate that? Clearly adding something is good. Deleting may be good, but may be bad, depending on what was deleted. We need a way to quantitatively evaluate if deleting code is good or bad. For that we introduce the new metric named Waste (W).
Let's look at an example presented in the figure below. We have a developer asking questions about what symbol to type next.
On the horizontal axis we have the progression of time. At any given moment the developer can put something either in the questions channel or into symbols channel.
In our case we have three symbols for 9 units of time. The symbol X is wrong and represents an error in the transmission of questions into symbols. The arrow represents the fact that the error was detected and after that X was deleted and then replaced by the proper symbol s_{2}. Symbol X converted into question X.
Since the questions channel produces errors, the symbol rate of the symbols channel is reduced by those errors. This is visible in our example, where there are three symbols in total (X, s_{2}, s_{3}). Since X is an error, only the correct symbols (s_{2}, s_{3}) count as successfully transmitted. Having S=3, Q=6 and D=1 we calculate the missing information H:
Which means the developer needs to ask 3.5 questions on average in order to type one symbol.
That is visible on the figure as well. H is also the average length of the messages w. In this case we have two messages w_{1}=“q_{1}q_{2}Xq_{3}q_{4}” and w_{2}=“q_{5}q_{6}”, where "X" is now a question. Their average length H is (5+2)/2=3.5. If X was not an error then we would have three messages “q_{1}q_{2}”, “q_{3}q_{4}” and “q_{5}q_{6}”, and their average length H would have been 2. Hence the false symbol X increases H by 75%.
With some algebra we express H with S, D and N only.
where N is the maximum symbol rate for a time interval T, D is the number of symbols deleted during the same time interval, and S is the number of added symbols for the same time interval.
Let's examine how errors effectively reduce the maximum information that a channel can communicate.
Symbol Error Rate (SER) is the probability of receiving a symbol in error for a given time interval. Accordingly, the probability of receiving a symbol with no error is (1-SER). Here is the formula for SER where D stands for the rate of symbols in error and S is the symbol rate of the total number of transmitted symbols.
SER is the name used in information theory. We need a name that would better represent the same quantity in the context of knowledge work. What actually happens is that with erroneous questions we waste knowledge discovery capacity.
Hence we will call this quantity Waste (W).
where D stands for the number of symbols deleted during a time interval T, and S is the number of added symbols for the same time interval.
In the example we have 1 error symbol X out of a total of 3 symbols, hence W is 33%.
Now we can amend the formula for calculating H and account for the probability of receiving a symbol without error.
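A minimal sketch of Waste and the amended missing information, using the example values N = 9, S = 3 and D = 1. It takes W = D/S from the definition above and H = N/(S(1-W)) - 1 from the form that appears inside equation (11); the function names are illustrative only.

```python
from fractions import Fraction

def waste(deleted: int, added: int) -> Fraction:
    """W = D / S: share of the added symbols that were later deleted."""
    if added <= 0 or not 0 <= deleted < added:
        raise ValueError("need added > 0 and 0 <= deleted < added")
    return Fraction(deleted, added)

def missing_information(n: int, added: int, deleted: int = 0) -> Fraction:
    """H = N / (S * (1 - W)) - 1, which is the same as N / (S - D) - 1."""
    w = waste(deleted, added)
    return Fraction(n) / (added * (1 - w)) - 1

w_example = waste(1, 3)                      # one wrong symbol out of three
h_with_error = missing_information(9, 3, 1)  # X counted as an error
h_no_error = missing_information(9, 3, 0)    # X counted as a correct symbol
print(w_example, float(h_with_error), float(h_no_error))
```

With these inputs the error raises H from 2 to 3.5, which matches the average message length computed from the figure.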
If we have no error symbols then W is 0 and we've wasted no questions. This case H_{W=0} is also called actual missing information of the system H(X), which in our case equals the mutual information, as explained here.
In the context of information theory, if X and Y are both binary variables and the mutual information I(X;Y) equals the missing information H(X), it means that knowing Y removes all uncertainty about X. In other words, the conditional missing information H(X|Y) is zero.
The perceived missing information (H) can be considered as the sum of the information conveyed H_{W=0} (mutual information, or the information that was intended to be transmitted) and the information lost due to errors (which could be thought of as noise or interference)
$H=\text{Mutual Information}+\text{Information Lost}$
If we consider our example scenario where X is the correct symbol and Y is the answers to the developer's questions, perceived missing information is the sum of the actual missing information of a system H(X) and the additional missing information perceived due to incorrect or misleading information H_{lost}. It represents the total amount of uncertainty perceived about a system, taking into account both the actual uncertainty and the additional uncertainty introduced by incorrect or misleading information. H_{lost} represents the amount of information that was thought to be gained but was actually not, due to incorrect answers.
Mathematically, you could express this as:
$${H}_{perceived}\mathit{=}H\mathit{\left(}X\mathit{\right)}+{H}_{lost}$$
In our context:
$${H}_{W=0}=H\mathit{\left(}X\mathit{\right)}$$
$$H={H}_{perceived}$$
So, in this context, we could express this as:
Now we can calculate the amount of information lost due to errors for the case where W = 33%. If we have no error symbols out of 3 then W = 0 and we've wasted no questions.
It took one erroneous symbol X so W = 1/3.
Then we calculate the lost information due to waste:
As another example, let's have 2 error symbols out of 3. Then W = 2/3 = 67%.
Which shows that we've wasted 6 bits:
We simplify the expression for H_{lost} as:
$${H}_{lost}=H-{H}_{W=0}=\frac{NW}{S\left(1-W\right)}$$
(10)
This equation represents the difference in knowledge when there is no waste (W=0) versus when there is some amount of waste (W>0). As W increases, the amount of lost knowledge H_{lost} also increases.
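Equation (10) can be checked against the two worked cases (W = 1/3 and W = 2/3, with N = 9 and S = 3). The sketch below is illustrative; the helper names are not part of the KEDE model.

```python
from fractions import Fraction

def perceived_missing_information(n, added, w):
    """H = N / (S * (1 - W)) - 1, the form used inside equation (11)."""
    w = Fraction(w)
    return Fraction(n) / (added * (1 - w)) - 1

def lost_information(n, added, w):
    """Equation (10): H_lost = N * W / (S * (1 - W))."""
    w = Fraction(w)
    return n * w / (added * (1 - w))

for w in (Fraction(1, 3), Fraction(2, 3)):
    h_lost = lost_information(9, 3, w)
    # Cross-check against the definition H_lost = H - H_{W=0}.
    difference = (perceived_missing_information(9, 3, w)
                  - perceived_missing_information(9, 3, 0))
    print(f"W={w}: H_lost={float(h_lost)}, H - H_W0={float(difference)}")
```

Both routes give 1.5 bits for one erroneous symbol and 6 bits for two, so the closed form and the difference of the two H values agree.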
Since KEDE is a function of H we get:
$$KEDE=\frac{1}{1+H}=\frac{1}{1+\frac{N}{S\left(1-W\right)}-1}=\frac{S\left(1-W\right)}{N}$$
(11)
Substituting N the KEDE formula becomes:
$$KEDE=\frac{S\left(1-W\right)}{h\times CPH}\times 100\%$$
(12)
where S is the actual symbols of source code contributed per hour, h is the number of working hours in a day, CPH is the maximum number of symbols of source code that could be contributed per hour, and W stands for the waste probability.
Below is presented how W affects KEDE and H:
where for illustration KEDE equals 10. We see that when W is bigger than 0.6 then H grows nonlinearly beyond 20.
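The relationship can also be tabulated numerically. The sketch below rests on an assumption that is not stated in the text: "KEDE equals 10" is read as the waste-free ratio S/N = 0.10, and W is then varied while S and N stay fixed.

```python
def kede_with_waste(base_ratio: float, w: float) -> float:
    """Equation (11) with S/N = base_ratio: KEDE = (S / N) * (1 - W)."""
    if not 0.0 <= w < 1.0:
        raise ValueError("W must satisfy 0 <= W < 1")
    return base_ratio * (1.0 - w)

def missing_information(kede: float) -> float:
    """Invert KEDE = 1 / (1 + H) to get H = 1 / KEDE - 1."""
    return 1.0 / kede - 1.0

BASE_RATIO = 0.10  # ASSUMPTION: S/N when there is no waste

for w in (0.0, 0.2, 0.4, 0.6, 0.8):
    k = kede_with_waste(BASE_RATIO, w)
    print(f"W={w:.1f}  KEDE={100 * k:5.1f}%  H={missing_information(k):6.1f}")
```

Under this reading KEDE falls linearly with W, while H grows as 1/(1-W), which is why the curve for H steepens sharply at high waste.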
Lost time
Quite often knowledge workers complain about much smaller things that limit their productivity. For instance, many people complain that meetings usually waste people's time. The claim is that if there are fewer meetings or at least shorter ones, their "lost time" will be saved and used for more productive work.
Why do we call the "lost time" hidden? That is because when calculating KEDE we assume that all the time knowledge workers spent consisted of asking questions and typing symbols. That means there is no non-working time during a work day. Hence if there is non-working time greater than zero, it was not taken into account, i.e. it was hidden from the formula. The factor we add to the KEDE formula should change the result in case we find the hidden "lost hours". The intuition here is that the calculated KEDE should increase in case we manage to eliminate or decrease the hidden non-working "lost time".
We call the factor the Efficiency Multiplier (EM) and calculate it using the formula:
where h is the number of working hours (8) and l is the number of hours lost in unproductive activities.
The Efficiency Multiplier allows for estimating how much knowledge discovery efficiency should increase if we cut some of the hidden time that is lost in non-productive activities. The relationship between "lost time" and the Efficiency Multiplier (EM) is visualized below.
It is a nonlinear relationship. When there are no hours lost, EM=1 and KEDE stays the same as previously calculated. If the lost time is up to four hours then the Efficiency Multiplier and KEDE would increase in an approximately linear way. That means each minute saved would increase KEDE proportionately. If the lost time is more than six hours then the increase in KEDE would explode: one minute saved would increase KEDE disproportionately. That would mean the real KEDE was very high but hidden inside a work day.
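The behaviour described above can be explored with a small sketch. The exact EM formula is not reproduced in this text, so the code assumes EM = h / (h - l). That form is only a guess that is consistent with the description (EM = 1 when l = 0, and rapid growth as l approaches h), and it should not be read as the definitive definition.

```python
def efficiency_multiplier(lost_hours: float, work_hours: float = 8.0) -> float:
    """ASSUMED form EM = h / (h - l); the source formula is not shown here."""
    if not 0 <= lost_hours < work_hours:
        raise ValueError("lost_hours must satisfy 0 <= l < h")
    return work_hours / (work_hours - lost_hours)

for lost in (0, 2, 4, 6, 7):
    print(f"l={lost}h  EM={efficiency_multiplier(lost):.2f}")
```

Under this assumption a measured KEDE would be multiplied by EM to estimate the efficiency during the hours that were actually available for knowledge discovery.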
We can now quantify the effect of saved time by adding to the KEDE formula the Efficiency Multiplier to account for the hidden "lost time".
KEDE formula
Where:
 S is the number of added symbols in a day;
 h is the number of working hours in a day;
 CPH is the maximum number of characters that could be contributed per hour;
 l are the hours wasted on nonproductive activities;
 W stands for the waste probability.
The above formula means that individual KEDE is:
 Inversely proportional to the difference between the level of knowledge the individual has and the level of knowledge required to do a job;
 Inversely proportional to the amount of waste the individual removes while doing the job;
 Inversely proportional to the amount of time wasted on nonproductive activities;
Conclusion
In the past doctors' determination of whether fever was present was qualitative. Physicians regularly touched the foreheads of their patients and gauged what kind and how bad the fever was. Not only the degree of heat was important but also its kind: healthy body heat and febrile heat. The latter was also characterized by qualitative aspects and could, for example, be “sharp” or “biting”. Scientists, however, wanted to discover reproducible laws in medicine, and the verbal descriptions were not working. Words are idiosyncratic; they vary from doctor to doctor and even for the same doctor from day to day. Numbers never waver. While today we trust numbers and measurements regarding medical issues, in ancient times it was not thought that phenomena like health could be recorded by numbers, let alone be measured. Then Sanctorio Sanctorius, a member of Galileo Galilei's circle of friends, developed the first fever thermometers to determine his patients' body heat. This was a paradigm shift.
The present day attitude toward measuring knowledge work very much resembles how temperature was understood 400 years ago. It is time to get rid of qualitative description of knowledge work and start describing it in numbers. It is time to stop measuring knowledge work by the tangible output i.e. working software with the requested functionality. It is time to stop measuring knowledge work by the effort put in producing the tangible output. It is time to start measuring the knowledge work by its essence which is the knowledge discovery itself. The new paradigm is represented by the knowledge discovery efficiency KEDE. KEDE can help us quantitatively understand what is going on inside the black box that is an organization developing software.
Additional quantities based on the KEDE model
Calculating Knowledge Discovered
Knowledge discovered is the cumulative sum of missing information (or questions, in our model) over a specific time period. It signifies the total amount of information contained in a set of messages, which is derived from the sum of the information in each individual message.
In our model, knowledge discovered sums the number of questions associated with each symbol in the time series, effectively quantifying the total amount of missing information or uncertainty across the entire series.
In (2), we defined the total number of questions asked Q for the time series {x} as the sum of the missing information sequence {H_{i}}. Now, we aim to calculate it using only quantities that we can measure in practice. Utilizing equations (4) and (11), we can do so as follows:
$$Q=N-S=N-S+D-D=N-\left(S-D\right)-D=N-N\times KEDE-D$$
$$Q=N\left(1-KEDE\right)-D$$
Here, N represents the maximum symbol rate for a time interval T, D denotes the number of symbols deleted during the same time interval, and KEDE is the Knowledge Discovery Efficiency for the same time interval.
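As a sanity check, the expression can be evaluated for the Waste example (N = 9, S = 3, D = 1) and compared with Q = N - S from equation (4). The sketch takes KEDE as a fraction, (S - D)/N, per equation (11) with W = D/S; the function name is illustrative.

```python
from fractions import Fraction

def knowledge_discovered(n: int, added: int, deleted: int) -> Fraction:
    """Q = N * (1 - KEDE) - D, with KEDE = (S - D) / N from equation (11)."""
    kede = Fraction(added - deleted, n)
    return n * (1 - kede) - deleted

q = knowledge_discovered(9, 3, 1)
print(f"Q = {q} questions")
```

The result equals N - S = 6, the number of question slots in the example time series.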
This calculation is based on the assumption that the messages are independent and identically distributed, meaning each message has the same probability and the receipt of one message doesn't affect the probabilities of the others.
This independence allows us to simply add the information content of each message to determine the total information. If the messages weren't independent, we would need to consider the joint probabilities of the messages to calculate the total information.
In standard information theory, there isn't a direct equivalent for 'knowledge discovered'. However, it can be loosely associated with the sum of the information content of all the messages. It can be viewed as a raw count of the total number of bits, not taking into account any probabilities or statistical dependencies that might exist among different bits. This perspective stems from the basic concept of a bit as a measure of information, and the fact that the total amount of information in a set of messages is the sum of the information in each individual message. For instance, if we have three messages and each message carries 2 bits of information, then the three messages will collectively carry a total of 2 * 3 = 6 bits of information.
However, it's crucial to underline that the concept of 'knowledge discovered', as defined in our model, doesn't perfectly map onto any single concept in standard information theory. It's a distinct measure, specific to our model and the KEDE framework.
Measuring the balance between individual capabilities and work complexity
KEDE is a ratio between the knowledge discovered and the maximum knowledge that could be discovered for a time period, as explained in detail here:
$$KEDE\mathit{=}\frac{\mathit{1}}{\mathit{1}\mathit{+}H}=\frac{S}{Q+S}$$
where Q is the total number of questions asked in a time interval, S is the total number of answers acquired for the same time interval.
KEDE is continuous in the half-open interval (0,1]. KEDE is inversely proportional to the questions asked, i.e. to the difference between the knowledge required by a task and the prior knowledge of a person, and proportional to the answers acquired.
We have only two possible outcomes, Answer and Question, with probabilities KEDE and (1-KEDE), respectively. For calculating the balance between questions and answers we use Shannon's formula[1]
$$Balance\left({p}_{1},{p}_{2}\right)=-\sum _{i=1}^{2}{p}_{i}{\log}_{2}{p}_{i}$$
In this context, p_{1} = KEDE and p_{2} = (1-KEDE) and the Balance function of one variable is:
$$Balance\left(KEDE\right)=-KEDE\times {\log}_{2}KEDE-\left(1-KEDE\right){\log}_{2}\left(1-KEDE\right)$$
Figure below shows the function Balance(KEDE).
This function exhibits positive values throughout, presents a concave or downward curve, and peaks at 1 when KEDE equals 1/2. Importantly, the function returns zero at both KEDE = 0 and KEDE = 1, denoting maximum imbalance.
This property is consistent with what we intuitively expect from a quantity that measures the balance between questions and answers. If KEDE = 1, then we know for certain that no questions were asked. If KEDE = 0, then we know for certain that no answers were acquired. In both cases, there was no balance between questions and answers.
When KEDE is equal to 0, the developer may be in a state of anxiety, as the challenges are too great. On the other hand, when KEDE is equal to 1, the developer may be in a state of boredom, as the challenges are too low.
The optimal state is when KEDE is equal to 1/2, as this indicates a balance between questions and answers. For this case we get:
$$Balance\left(0.5\right)=-0.5{\log}_{2}0.5-0.5{\log}_{2}0.5={\log}_{2}2=1$$
The numerical value of balance in this case is one. This state of KEDE = 1/2 is referred to as flow, and it is characterized by a balance between the challenges of software development and the individual's capabilities. Flow, as defined by Csikszentmihalyi, occurs at the boundary between boredom and anxiety, and is an optimal experience[33][34].
Flow is by definition an optimal experience, and is described as occurring rarely in regular life[34]. Many psychological constructs, such as happiness, anxiety, and self-efficacy, represent continuous (i.e., spectrum and dimensional) constructs. Despite the fact that Csikszentmihalyi and his colleagues have conceptualized and operationalized flow as a discrete construct, a significant majority of studies have operationalized it as a continuous construct, which can be applied to the full range of participants' experience across the spectrum of conscious experience[35]. That makes sense looking at the diagram of balance as a function of KEDE.
It is clear that for any value of 0 < KEDE < 1/2, we have less balance than in the case KEDE = 1/2. For the case of KEDE=0.1 we get:
$$Balance\left(0.1\right)=-0.1{\log}_{2}0.1-0.9{\log}_{2}0.9=0.47$$
This is the case of individual experience tending to anxiety, as its limiting case from the left.
It is also clear that for any value of 1/2 < KEDE < 1, we again have less balance than in the case KEDE = 1/2. For the case of KEDE=0.9 we get:
$$Balance\left(0.9\right)=-0.9{\log}_{2}0.9-0.1{\log}_{2}0.1=0.47$$
This is the case of individual experience tending to boredom, as its limiting case from the right.
In general, values of KEDE less than 1/2 indicate a lack of balance and a tendency towards anxiety, while values greater than 1/2 indicate a lack of balance and a tendency towards boredom. In both cases, the level of balance is less than in the case of KEDE=1/2.
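The Balance function is the binary entropy of KEDE, so it is straightforward to evaluate. The sketch below takes KEDE as a fraction between 0 and 1 and uses the usual convention that 0 × log2(0) is treated as 0 at the endpoints; the function name is chosen for this illustration.

```python
import math

def balance(kede: float) -> float:
    """Binary entropy of KEDE: -k*log2(k) - (1-k)*log2(1-k)."""
    if not 0.0 <= kede <= 1.0:
        raise ValueError("KEDE must be a fraction between 0 and 1")
    if kede in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) is taken as 0
    return -kede * math.log2(kede) - (1.0 - kede) * math.log2(1.0 - kede)

for k in (0.1, 0.5, 0.9):
    print(f"Balance({k}) = {balance(k):.2f}")
```

The function is symmetric around 1/2, which is why KEDE = 0.1 and KEDE = 0.9 give the same balance even though they correspond to opposite ends of the anxiety and boredom spectrum.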
Comparing lost information between developers
We can use H_{lost} as a metric to compare two different software developers.
H_{lost}, as we've defined it here, represents the amount of information that was thought to be gained but was actually not due to incorrect answers. A lower H_{lost} would indicate a more efficient or accurate software development process, because it means less information was lost due to incorrect answers.
Here's how we might use it:
 Calculate H_{lost} for each process: for each process, calculate H_{lost} using equation (10), based on the number of incorrect answers and the apparent reduction in missing information that was initially thought to have been achieved with each incorrect answer.
 Calculate the Information Loss Rate (L): divide H_{lost} by the total perceived missing information for each process. By normalizing by H_{perceived}, we're essentially measuring the proportion of the total perceived missing information that was due to incorrect or misleading information. A lower value would indicate a more efficient process, because it means a smaller proportion of the perceived missing information was due to incorrect or misleading information. The formula would be:
$$L=\frac{{H}_{lost}}{{H}_{perceived}}=\frac{\frac{NW}{S\left(1-W\right)}}{\frac{N}{S\left(1-W\right)}-1}=\frac{\frac{NW}{S\left(1-W\right)}}{\frac{N-S\left(1-W\right)}{S\left(1-W\right)}}=\frac{NW}{N-S\left(1-W\right)}$$
(13)
This normalized metric allows us to compare the efficiency of different processes even if they involve different numbers of questions or different initial entropies. The initial missing information (H_{initial}) for each process would typically be H(X), which is the actual (objective) missing information of the system representing the average minimum number of binary questions needed to reduce uncertainty to zero.
Let's consider our example where we have two different levels of Waste: W=33% and W=67%. First we calculate the Information Loss Rate (L) for the case where W = 33%.
$$L=\frac{NW}{N-S\left(1-W\right)}=\frac{9\times \frac{1}{3}}{9-3\left(1-\frac{1}{3}\right)}=\frac{3}{9-3+1}=\frac{3}{7}=0.43$$
This means that, on average, about 0.43 bits of information are lost per question asked due to incorrect answers.
Next we calculate the Information Loss Rate (L) for the case where W = 67%.
$$L=\frac{NW}{N-S\left(1-W\right)}=\frac{9\times \frac{2}{3}}{9-3\left(1-\frac{2}{3}\right)}=\frac{6}{9-3+2}=\frac{6}{8}=0.75$$
This means that, on average, about 0.75 bits of information are lost per question asked due to incorrect answers.
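Both calculations follow directly from equation (13) and can be reproduced with exact fractions. The function name below is illustrative only.

```python
from fractions import Fraction

def information_loss_rate(n: int, added: int, w) -> Fraction:
    """Equation (13): L = N * W / (N - S * (1 - W))."""
    w = Fraction(w)
    return n * w / (n - added * (1 - w))

loss_one_error = information_loss_rate(9, 3, Fraction(1, 3))
loss_two_errors = information_loss_rate(9, 3, Fraction(2, 3))
print(f"{float(loss_one_error):.2f} {float(loss_two_errors):.2f}")
```

To compare two developers, the same function would be called with each developer's own N, S and W for the same time interval.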
Appendix
Why KEDE and not H?
Let's see how KEDE and H change based on the same symbols typed S:
For instance when we have KEDE=0.2 which means 20 000 symbols typed per work day then H is:
When we have KEDE=0.01 which means 1000 symbols typed per work day then H is:
When KEDE is 0.001 which means 100 symbols typed per work day then H is:
When KEDE is 0.00001 which means 1 symbol typed per work day then H is:
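The four cases can be computed from formula (5), H = N/S - 1, with N = 100 000 symbols per work day. This is a short sketch with an illustrative function name.

```python
MAX_SYMBOLS_PER_DAY = 100_000  # N for an 8-hour work day, as derived earlier

def missing_information(symbols: int, n: int = MAX_SYMBOLS_PER_DAY) -> float:
    """Formula (5): H = N / S - 1."""
    if symbols <= 0:
        raise ValueError("symbols must be positive")
    return n / symbols - 1

for s in (20_000, 1_000, 100, 1):
    kede = s / MAX_SYMBOLS_PER_DAY
    print(f"S={s:>6}  KEDE={kede:.5f}  H={missing_information(s):.0f}")
```

KEDE stays within (0, 1] over the whole range, while H spans several orders of magnitude, which makes KEDE the more convenient quantity to report.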
It is better to use KEDE instead of H for quantifying efficiency of software knowledge discovery because H is a mathematical quantity that has many usages across sciences. KEDE, on the other hand, is constrained by the maximum typing speed of a human being. That makes it specific to knowledge discovery processes that have typed symbols as output.
Information Rate
The Information Rate, also known as Entropy Rate, is a measure of the average missing information or uncertainty per unit of time. It is represented by R and is calculated using the rate at which messages are generated and the average missing information of a message. Given that each message in our model contains only one symbol, we can simplify the formula as follows:
$$R=rH$$
In this equation, the key concepts involved are:
 R is the Information Rate (or Source rate), which is the rate at which information is generated by the source, typically measured in the number of bits of information per second. In this context, a bit is not just a binary digit (0 or 1); it's a measure of information content. It's based on the choice between two equally likely alternatives (like the flip of a fair coin). So, when we say "bits per second," we're talking about how much information is being conveyed, not just the number of binary digits. For example, an information rate of 10 Mbps (megabits per second) means that 10 million bits of information are transmitted each second.
 r is the Symbol Rate (or Baud Rate), which is the rate at which symbols are transmitted over a channel, typically measured in symbols per second (baud). A symbol in this context is a distinct state or condition of the communication channel that can represent a certain amount of information.
 H is the Information per Symbol, which is the amount of information that can be represented by a single symbol. It is typically measured in the number of bits of information per symbol. When all symbols are equally likely, the information per symbol is log_{2}(N), where N is the number of distinct symbols or states the channel can represent. For example, in a binary system, each symbol represents 1 bit of information, but in more complex systems, a symbol can represent more bits.
Information Rate is calculated as follows:
$$R=\left(r\ \mathrm{in}\ \frac{\mathrm{symbols}}{\mathrm{second}}\right)\times \left(H\ \mathrm{in}\ \frac{\mathrm{bits}}{\mathrm{symbol}}\right)=\frac{\mathrm{bits}}{\mathrm{second}}$$
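As a small illustration of the relationship R = rH, the sketch below computes the information per symbol for equally likely symbols and multiplies it by a symbol rate. The function names and the sample values (4 states, 3 symbols per second) are assumptions chosen for the example.

```python
import math

def bits_per_symbol(n_states):
    """Information per symbol for n_states equally likely symbols: log2(N)."""
    return math.log2(n_states)

def information_rate(symbol_rate, info_per_symbol):
    """R = r * H, in bits per second when r is in symbols per second."""
    return symbol_rate * info_per_symbol

H = bits_per_symbol(4)       # 4 equally likely symbols
R = information_rate(3, H)   # 3 symbols per second
print(H, R)  # 2.0 6.0
```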
In Information Theory, while there isn't a specific theorem named for the direct relationship between symbol rate, information per symbol, and information rate, the concepts are fundamentally tied to the principles laid out by Claude Shannon, particularly in his seminal work, "A Mathematical Theory of Communication"[1]. The relationship between these quantities is a foundational aspect of information theory and digital communication.
To calculate the Information Rate in our model, we first define what constitutes a "unit of time". In our model, each symbol (either a question 'q' or a symbol 's') occupies one unit of time t. The total time T is nt units. We divide the time series into S consecutive and non-overlapping subseries w_{i}, each containing one symbol and zero or more questions.
The rate r at which messages are generated is calculated as follows:
$$r=\frac{S}{T}=\frac{S}{nt}$$
Here, t is the duration of each symbol, n is the number of time units, S is the number of messages (symbols), and T is the total time period (total number of time units in the time series).
The average missing information H of a message is given by formula (1). Using this and formula (2), the Information Rate can be calculated as:
$$R=rH=\frac{S}{nt}\cdot \frac{1}{S}\sum _{i=1}^{S}{H}_{i}=\frac{1}{nt}\sum _{i=1}^{S}{H}_{i}=\frac{Q}{nt}$$
Here, the Knowledge discovered Q is the sum of the missing information of all messages in the time series. The terms t, n, S, and T have the same definitions as previously mentioned.
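The derivation can be traced numerically. The following sketch assumes a hypothetical series of four symbols spread over ten time units, with made-up values for the missing information H_i of each symbol; the function name `rates` is illustrative.

```python
def rates(h_per_symbol, n, t=1.0):
    """Return (r, H, R) for S symbols spread over n time units of length t.

    h_per_symbol: missing information H_i (in bits) of each of the S symbols.
    """
    S = len(h_per_symbol)
    T = n * t               # total time
    Q = sum(h_per_symbol)   # knowledge discovered: sum of all H_i
    r = S / T               # symbol rate
    H = Q / S               # average missing information per symbol
    R = Q / T               # information rate, equal to r * H
    return r, H, R

# Hypothetical series: 4 symbols, 10 time units, H_i in bits.
r, H, R = rates([2, 0, 3, 1], n=10)
print(r, H, R)  # 0.4 1.5 0.6
```

The number of symbols S cancels out, so R depends only on the knowledge discovered Q and the total time nt.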
If we know the Information Rate (R) and the Symbol Rate (r), we can calculate the Information per Symbol (H), which is often related to the concept of entropy or average information in a symbol. For example, if the Information Rate is 6 bits per second and the Symbol Rate is 3 symbols per second, then:
$$H=\frac{R\ \left(\mathrm{bits}/T\right)}{r\ \left(\mathrm{symbols}/T\right)}=\frac{6\ \mathrm{bits}/T}{3\ \mathrm{symbols}/T}=2\ \mathrm{bits}/\mathrm{symbol}$$
This means each symbol carries 2 bits of information. This calculation assumes that the time series is stationary and ergodic, meaning the statistical properties of the time series do not change over time.
Example Information Rate calculation
Let's say a human can accept information from the external world at a rate of 10 bits per second. They write down symbols at a rate of 5 symbols per second. What would be the Information Rate (R), Symbol Rate (r), and Information per Symbol (H)?
Based on the information provided, we can define the following parameters:
 Human Information Acceptance Rate: This is the rate at which a human can accept information from the external world, given as 10 bits per second. It serves as the Information Rate (R) in this context.
 Symbol Writing Rate: This is the rate at which the human writes down symbols, given as 5 symbols per second. This is the Symbol Rate (or Baud Rate, r).
Now, let's define and calculate each term:
 Information Rate (R): This is already given as 10 bits per second. It's the rate at which the human can process information.
 Symbol Rate (r): This is also given as 5 symbols per second. It's the rate at which symbols are written down.

 Information per Symbol (H): We can calculate the average amount of information per symbol, often referred to as "bits per symbol," using the information rate and the symbol rate. This calculation assumes that the information is evenly distributed across the symbols.
$$H=\frac{R\ \left(\mathrm{bits}/\mathrm{second}\right)}{r\ \left(\mathrm{symbols}/\mathrm{second}\right)}=\frac{10}{5}=2\ \mathrm{bits}/\mathrm{symbol}$$
Information Rate calculation for the longest English word
Let's calculate Information Rate (R), Symbol Rate (r) and Information per symbol (H) for the knowledge discovery process of the word “Honorificabilitudinitatibus” from here.
Let's first understand the data and assumptions provided:
 The word "Honorificabilitudinitatibus" has been typed, and the typing process is represented by a sequence of 0s and 1s, where "0" represents a time interval of uncertainty about what letter to type next, and "1" represents a time interval with knowledge of the next letter to type.
 The total time to generate the sequence is assumed to be 1 second.
 Each "0" (uncertainty) is equivalent to searching among a number of equally sized boxes for the letter to type, representing 1 bit of information gained when the letter is identified.
 Each '1' represents 0 bits of information gained.
Now, let's calculate:
 Total Symbols: Count the total number of symbols ("1"s) in the sequence. By counting, we find there are 27 symbols in total.
 Symbol Rate (r): Given the total time is 1 second, the symbol rate is the total number of symbols divided by time (in seconds), or 27.0 symbols per second.
 Information Bits: We consider the information content of both "0"s and "1"s. Since each "0" represents 1 bit and each "1" represents 0 bits, and there are 10 instances of "0" and 27 instances of "1" in the sequence, we get (10 × 1 bit + 27 × 0 bits) = 10 bits of information needed throughout the typing process.
 Information Rate (R): The total information bits divided by total time (in seconds). The information rate is 10 bits per second, given the total time to type the sequence is 1 second.
 Information per Symbol (H): The total information bits divided by the total number of symbols, which is H = 10/27 ≈ 0.37 bits of information for each symbol in the sequence.
This analysis provides a quantitative view of the typing process for the word "Honorificabilitudinitatibus," framing it within the context of information theory concepts like information rate and symbol rate.
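The counting steps above can be sketched in a few lines. The sequence below is a hypothetical stand-in that only reproduces the counts used in the example (10 zeros and 27 ones); the real ordering of the 0s and 1s is not assumed here.

```python
# Hypothetical stand-in with the same counts as the example:
# 10 intervals of uncertainty ("0") and 27 typed letters ("1").
sequence = "01" * 10 + "1" * 17
total_time_s = 1.0                # total duration assumed in the example

symbols = sequence.count("1")     # typed letters
questions = sequence.count("0")   # each contributes 1 bit

bits = questions * 1 + symbols * 0
r = symbols / total_time_s        # symbol rate, symbols per second
R = bits / total_time_s           # information rate, bits per second
H = bits / symbols                # information per symbol

print(symbols, questions, r, R, round(H, 2))  # 27 10 27.0 10.0 0.37
```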
Bit Rate and Information Rate
There is a distinction between these two ideas. The bit rate is the advertised bit rate of a communications channel. When we measure the rate of data transmission or processing, we use "bits per second." This unit inherently refers to the transmission of binary digits per second. For example, a data rate of 1 Mbps (megabit per second) means that 1 million binary digits (bits) are transmitted each second. In the case of an ISDN B channel in Europe it is 64 000 bits per second.
Information rate is the rate at which information is passed over the channel. The information rate can never be higher than the bit rate, but it might be lower.
An ISDN telephone transmits and receives 64 000 bits per second, even when there is total silence.
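To make the distinction concrete, the sketch below estimates the information carried by two idealised one-second bit streams on a 64 000 bit/s channel. It uses the first-order Shannon entropy of the observed 0/1 frequencies, which ignores patterns between digits and is therefore only an upper-bound estimate. Modelling silence as a constant stream of zeros is a simplifying assumption, and `empirical_entropy` is an illustrative helper.

```python
import math
from collections import Counter

BIT_RATE = 64_000  # bits per second, the ISDN B channel figure from the text

def empirical_entropy(bits):
    """First-order Shannon entropy (bits per binary digit) of the observed 0/1 frequencies."""
    counts = Counter(bits)
    total = len(bits)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

silence = "0" * BIT_RATE         # one second of a constant signal (idealised silence)
mixed = "01" * (BIT_RATE // 2)   # equal share of 0s and 1s

for name, stream in (("silence", silence), ("mixed", mixed)):
    h = empirical_entropy(stream)
    # Bit rate stays fixed; the estimated information rate varies with the content.
    print(name, BIT_RATE, h * BIT_RATE)
```

Both streams occupy the full bit rate, but the constant stream yields an estimated information rate of zero.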
Total Information
In information theory, the total information of a sequence of symbols is derived from the fundamental concept of entropy, as defined by Claude Shannon. It represents the total quantity of information or the number of bits required to uniquely encode a sequence of symbols.
For a source that emits symbols from an alphabet of size $N$ (that is, there are $N$ possible symbols it can emit), each with equal probability, the entropy per symbol (average information content per symbol) is ${\mathrm{log}}_{2}(N)$.
If the source emits $L$ symbols, the total entropy or total information content of that sequence of symbols is $L{\mathrm{log}}_{2}(N)$. This represents the number of bits required to uniquely encode and store the sequence of symbols, assuming an optimal encoding.
This is a direct application of the definition of Shannon entropy for a discrete uniform distribution.
For a uniform distribution, $P(x_i) = 1/N$ for all $i$, so the Shannon entropy $H(X) = -\sum_{i} P(x_i)\,{\mathrm{log}}_{2}P(x_i)$ simplifies to $H(X) = {\mathrm{log}}_{2}(N)$. The total entropy of a sequence of $L$ independent samples from $X$ is therefore $L \cdot H(X) = L{\mathrm{log}}_{2}(N)$, which is the total information content of that sequence of symbols.
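A short numerical check of this simplification, as a sketch: the entropy of a uniform distribution computed from the definition matches log2(N), and the total information scales linearly with the sequence length. The alphabet size 8 and length 10 are arbitrary example values, and the function names are illustrative.

```python
import math

def uniform_entropy(n):
    """Shannon entropy of n equally likely symbols, computed from the definition."""
    p = 1 / n
    return -sum(p * math.log2(p) for _ in range(n))

def total_information(length, n):
    """Total bits for `length` independent symbols from an alphabet of size n."""
    return length * math.log2(n)

print(uniform_entropy(8), math.log2(8))  # 3.0 3.0
print(total_information(10, 8))          # 30.0
```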
How to cite:
Bakardzhiev D.V. (2022) Understanding the details of Knowledge Discovery Efficiency (KEDE). https://docs.kedehub.io/kede/whatiskedederivation.html
Works Cited
1. Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x
2. Drucker, Peter F., “Knowledge-Worker Productivity: The Biggest Challenge,” California Management Review, vol. 41, no. 2, pp. 79–94, Jan. 1999, doi: 10.2307/41165987
3. Drucker, Peter F. (1959). The Landmarks of Tomorrow. New York: Harper and Row, p. 93.
4. Drucker, Peter F. (1993), Post-Capitalist Society, Butterworth-Heinemann, Oxford.
5. Cover, T. M. and Thomas, J. A. (1991), Elements of Information Theory, John Wiley and Sons, New York, p. 95, Section 5.7 “Some Comments on Huffman Codes”.
6. Ostry, D. J. (1983). Determinants of interkey times in typing. In W. E. Cooper (Ed.), Cognitive aspects of skilled typewriting (pp. 225–246). New York: Springer-Verlag.
7. Shaffer, L. H. (1976). Intention and performance. Psychological Review, 83, 375–393.
8. Shaffer, L. H. (1973). Latency mechanisms in transcription. In S. Kornblum (Ed.), Attention and performance IV (pp. 435–446). New York: Academic Press.
9. Ostry, D. J. (1980). Execution-time movement control. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 457–468). Amsterdam: North-Holland.
10. Wheeler, J. A. (1990). Information, physics, quantum: The search for links. In W. H. Zurek (Ed.), Complexity, entropy, and the physics of information (Vol. 8, pp. 3–28). Taylor & Francis.
11. Oral history interview with John Archibald Wheeler, 1967 April 5. by Wheeler, John Archibald, 1911–2008 [Online]. Available: Transcript
12. [Online]. Available: Do Our Questions Create the World?
13. P.C.W. Davies and J.R. Brown, The ghost in the atom, Cambridge University Press, 1986.
14. Garvin DA. Building a learning organization. Harv Bus Rev. 1993 Jul–Aug;71(4):78–91.
15. Guide on Measuring Human Capital, UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE, 2016 [Online]. Available: https://unstats.un.org/unsd/nationalaccount/consultationDocs/HumanCapitalGuide.web.pdf
16. Kaplan DM (2012) How to demarcate the boundaries of cognition. Biol Philos 27(4):545–570
17. Deming, W. Edwards. 2018. Out of the Crisis. The MIT Press. Cambridge, Mass
18. Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
19. Kahneman D. (1973). Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
20. Senge, P.M. 1990: The Fifth Discipline, Random House, 1990
21. Keeley, Brian, 2007, “Human Capital, How What You Know Shapes Your Life,” OECD Insights, Paris
22. Baron, A. (2011), "Measuring human capital", Strategic HR Review, Vol. 10 No. 2, pp. 30–35. https://doi.org/10.1108/14754391111108338
23. Carroll, Lewis, 2009, Through the Looking-Glass and What Alice Found There, Evertype
24. Ramírez, Y.W. and Nembhard, D.A. (2004), "Measuring knowledge worker productivity: A taxonomy", Journal of Intellectual Capital, Vol. 5 No. 4, pp. 602–628. https://doi.org/10.1108/14691930410567040
25. Alves, R. A., Castro, S. L., & Olive, T. (2008). Execution and pauses in writing narratives: Processing time, cognitive effort and typing skill. International Journal of Psychology.
26. Dhakal, V., Feit, A. M., Kristensson, P. O., & Oulasvirta, A. (2018). Observations on Typing from 136 Million Keystrokes. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (pp. 1–12). Association for Computing Machinery. https://doi.org/10.1145/3173574.3174220
27. Holsbeeke, L., Ketelaar, M., Schoemaker, M. M., & Gorter, J. W. (2009). Capacity, Capability, and Performance: Different Constructs or Three of a Kind? Archives of Physical Medicine and Rehabilitation, 90(5), 849–855. https://doi.org/10.1016/j.apmr.2008.11.015
28. Levitin, D. J. (2015). The Organized Mind: Thinking Straight in the Age of Information Overload (Illustrated edition). Dutton.
29. Wu, T., Dufford, A. J., Mackie, M. A., Egan, L. J., & Fan, J. (2016). The Capacity of Cognitive Control Estimated from a Perceptual Decision Making Task. Scientific Reports, 6, 34025.
30. Bruce K. Britton, Abraham Tesser, Effects of prior knowledge on use of cognitive capacity in three complex cognitive tasks, Journal of Verbal Learning and Verbal Behavior, Volume 21, Issue 4, 1982, Pages 421–436, ISSN 0022-5371, https://doi.org/10.1016/S0022-5371(82)90709-5.
31. Phillip G. Armour. 2004. Beware of counting LOC. Commun. ACM 47, 3 (March 2004), 21–24. https://doi.org/10.1145/971617.971635
32. Armour, P.G. (2003). The Laws of Software Process, Auerbach
33. Csikszentmihalyi, M. 1990. Flow: the psychology of optimal experience. United States of America: Harper & Row.
34. Csikszentmihalyi, M. 1975. Beyond Boredom and Anxiety: The Experience of Play in Work and Games. San Francisco: Jossey-Bass
35. Abuhamdeh S (2020) Investigating the “Flow” Experience: Key Conceptual and Operational Issues. Front. Psychol. 11:158. doi: 10.3389/fpsyg.2020.00158
36. Guan, Q., Wang, J., Chen, Y., Liu, Y., & He, H. (2021). Beyond information rate, the capacity of cognitive control predicts response criteria in perceptual decision-making. Brain and Cognition, 154, 105788. https://doi.org/10.1016/j.bandc.2021.105788