The Nature of Work
KEDE is independent of the nature of knowledge work.
All knowledge is qualitatively the same
"Knowledge is of no value unless you put it into practice." ~ Anton Chekhov
"An expert is not someone that gives you the answer, it is someone that asks you the right question." ~ Eliyahu M. Goldratt
Some may say that it is much more difficult to achieve a higher KEDE when working on a complex problem than on a simple one. For instance, developing embedded software to manage avionics on an aircraft, compared with developing a web site. The implication is that a complex problem requires more knowledge than a simple one. That is a point we all agree with. However, it is also implied that the knowledge needed for developing a web site is somehow qualitatively different from the knowledge needed for developing avionics software, where qualitatively different means one is better than the other.
If that were true, then KEDE could not be used to compare developers, teams, projects and companies that work on problems of different complexity!
A body of knowledge (BOK) is the complete set of concepts, terms and skills required to work in a particular domain or industry. Many bodies of knowledge have been defined. For instance, the Software Engineering Body of Knowledge (SWEBOK) describes generally accepted knowledge about software engineering. We acquire knowledge by asking questions. Different domains require different numbers of questions to be answered in order to learn their body of knowledge. We may say that the more questions that need to be answered, the larger a body of knowledge is. In this regard some domains have larger bodies of knowledge than others.
A larger body of knowledge takes more time to learn than a smaller one. The market usually places a higher value on professions that require a larger body of knowledge. The market also puts a higher value on knowledge that is in demand. However, a higher market valuation doesn't make a body of knowledge better, i.e. qualitatively different.
We argue that all knowledge is qualitatively the same. KEDE doesn't know which body of knowledge takes longer to acquire and which takes less time.
A senior web developer should not by default have a higher KEDE than a junior avionics developer. With KEDE we don't quantify their level of expertise, but the difference between their individual capability and the complexity of the work. Hence a web developer with expert knowledge and an avionics developer with expert knowledge could achieve similar KEDE.
It is important to emphasize that "similar KEDE" doesn't mean both developers share the same knowledge. The body of knowledge needed for web development is not the same as the body of knowledge needed for avionics development. They are two distinct domains and as such need different skills and experience. What we say is that even though the two bodies of knowledge are different, the level of mastery should be the same for two senior developers! Put another way, the senior web developer would be a junior avionics developer, simply because the avionics domain is different from the web domain. However, for two experts working in their own domains, we would expect the level of application of their expertise to be equal. With KEDE we quantify the level of application of their expertise in their own domains.
Let's take the Master of Science designation as another example. If Joe has an MSc in Electrical Engineering and Don has an MSc in Mechanical Engineering, that doesn't mean Joe will be as good as Don when working as a mechanical engineer. However, an MSc assures potential employers that both Joe and Don have mastered their distinct subjects at a similar level! We use KEDE to measure how efficiently people discover and apply knowledge, similarly to how an MSc measures the educational level of a person.
Verbose languages
"A little knowledge that acts is worth infinitely more than much knowledge that is idle." ~ Kahlil Gibran
Programming languages are often described as verbose or terse. A verbose language uses many symbols; a terse one uses fewer. When using a programming language considered verbose, the developer must type relatively more symbols to develop the same functionality. When calculating KEDE we use the number of symbols of source code contributed in a time interval. How will verbosity affect the KEDE calculation?
Let's look at mathematical notation, the writing system used for recording concepts in mathematics. It makes mathematical concepts easier to write and remember. Take as an example the mathematical symbol nabla, ∇. The symbol ∇, pronounced "del", denotes the vector differential operator used particularly in vector calculus.
The symbol ∇ is available in standard HTML as &nabla and in LaTeX as \nabla. In Unicode, it is the character at code point U+2207, or 8711 in decimal notation. Let's calculate KEDE for each of those representations.
| Symbols typed | N | S | KEDE |
|---|---|---|---|
| ∇ | 100 000 | 1 | 0.001% |
| &nabla | 100 000 | 6 | 0.006% |
| \nabla | 100 000 | 6 | 0.006% |
| U+2207 | 100 000 | 6 | 0.006% |
| 8711 | 100 000 | 4 | 0.004% |
| vector differential operator | 100 000 | 28 | 0.028% |
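The KEDE values in the table are simply the ratio of symbols contributed (S) to the reference maximum (N), expressed as a percentage. A minimal sketch in Python (the function name and the dictionary of notations are illustrative, not part of any official KEDE tooling):

```python
# KEDE per the table: symbols contributed (S) divided by the
# reference maximum for the interval (N), shown as a percentage.
def kede_percent(symbols_contributed: int, reference_max: int = 100_000) -> float:
    return 100.0 * symbols_contributed / reference_max

# Each notation for nabla, with the number of symbols it takes to type.
notations = {
    "∇": 1,
    "&nabla": 6,
    "\\nabla": 6,
    "U+2207": 6,
    "8711": 4,
    "vector differential operator": 28,
}

for notation, symbols in notations.items():
    print(f"{notation}: {kede_percent(symbols):.3f}%")
```

Running it reproduces the last column of the table, e.g. 0.001% for the single ∇ and 0.028% for the spelled-out phrase.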
KEDE doesn't know and is not interested in languages. KEDE takes the number of symbols as input but actually calculates the number of questions asked. The fewer questions asked per symbol typed, the greater the KEDE.
When calculating KEDE we assume that if no symbol was typed then a question was asked. Accordingly, for the single ∇ typed there were 99 999 questions asked. When "vector differential operator" was typed there were 99 972 questions asked, i.e. 27 fewer questions than for ∇. Hence KEDE for ∇ should be 28 times lower, exactly as we have.
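The questions-asked numbers can be checked directly; a small sketch under the same assumption (one question per reference symbol that was not typed):

```python
N = 100_000  # reference maximum symbols for the interval

# Assumption from the text: every reference symbol that was not typed
# corresponds to one question asked, so questions = N - S.
def questions_asked(symbols_typed: int, n: int = N) -> int:
    return n - symbols_typed

q_symbol = questions_asked(1)    # typing the single symbol ∇
q_phrase = questions_asked(28)   # typing "vector differential operator"

print(q_symbol)             # 99999 questions
print(q_phrase)             # 99972 questions
print(q_symbol - q_phrase)  # 27 fewer questions for the spelled-out phrase
```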
If those extra 27 questions were not needed, then instead of the single symbol ∇ there could have been the 28 symbols of ∇ defined in terms of partial derivative operators: ∇ = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z.
That is actually the reason mathematical notation was developed: to save time for knowledge discovery, i.e. thinking. Thinking, however, requires asking questions while producing nothing, and hence should be detected by lower KEDE values.
Let's run a thought experiment with two teams. Team J works in Java. Team P works in Python. Apart from the programming language, all other variables are equal for the two teams: they work in the same business domain, in the same company, with the same client. Neither team uses any automation tools; both type their code in simple text editors. Most importantly, their developers possess the same average typing speed. Let's say that both teams delivered in one day a program with the same functionality. Since Java is a verbose programming language, Team J delivered a finished program of 10 000 symbols. Python is not verbose, so Team P delivered a finished program of 1 000 symbols. When we calculate daily KEDE, Team J will have 0.1 and Team P will have 0.01. That means Team J is ten times more efficient in their knowledge discovery than Team P.
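These two daily KEDE values follow from the reference maximum of 100,000 symbols per working day implied by the figures above; a quick sketch (the helper name is illustrative):

```python
DAILY_MAX = 100_000  # implied reference maximum of symbols per working day

# Daily KEDE: symbols of source code contributed over the reference maximum.
def daily_kede(symbols_contributed: int) -> float:
    return symbols_contributed / DAILY_MAX

team_j = daily_kede(10_000)  # Java: 10,000 symbols delivered in one day
team_p = daily_kede(1_000)   # Python: 1,000 symbols delivered in one day

print(team_j)  # 0.1
print(team_p)  # 0.01
```

The ratio of the two values is 10, which is the "ten times more efficient" claim in the text.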
Wait a minute: in the same time Team P produced the same amount of work as Team J did, but we say Team P is ten times less efficient than Team J? Is this correct? Is this fair to Team P? It all depends on what we mean by "work". If by "work" we mean the tangible output, which in this case is working software with the requested functionality, then yes, Team P did the same amount of work. However, with KEDE we analyze not the tangible output but the knowledge work. Knowledge work involves discovering knowledge by asking questions. KEDE shows that Team P needed more questions to produce the same tangible output.
With KEDE we don't measure the tangible output i.e. working software with the requested functionality. With KEDE we measure how efficiently knowledge workers found the knowledge they needed to produce that tangible output.
That Team P, despite typing fewer symbols in the same time frame, reached the same functional outcome as Team J implies that Team P faced more knowledge gaps, or questions, during the development process. This is evident in their lower KEDE score, showing Team J's superior efficiency in knowledge discovery. In essence, Team J either had a better grasp of what needed development and how to approach it, or they were better at bridging their knowledge gaps swiftly.
Pushing the envelope further: if Team P exhibited the same skill in knowledge discovery as Team J, then, coupled with Python's inherent advantage of requiring fewer symbols, they could have rolled out more features or functionality in the same time frame. This scenario underscores that KEDE reflects the true state of a knowledge discovery process, spotlighting any dormant potential within teams. In this case, KEDE unveiled the latent potential that Team P could harness.
Minimizing code size
"I have made this letter longer than usual because I have not had time to make it shorter." ~ Blaise Pascal, "Lettres Provinciales", 1657
When developing embedded software, developers need to optimize for the length of the source code: the fewer symbols of source code created, the better. Some may say this shows KEDE is not applicable to embedded software, because KEDE depends on the number of symbols of source code created. Fewer symbols will yield a lower KEDE, even though achieving that small code size may be very time-consuming.
Indeed, embedded real-time systems typically require much more effort, and usually more time, than business systems. Embedded systems developers need to optimize not only for source code size. The key embedded system characteristics that in many cases require optimization also include system timing, driver size, RAM usage and energy consumption. Optimizing each characteristic typically requires knowledge of different methods and techniques. To improve speed, developers should acquire knowledge about compiler attributes and #pragma directives, but also know how to write optimizations that remain portable between compilers. On top of that, in applications that are extremely resource constrained, there comes a time when developers just need to dig into the compiler-generated instructions. While languages such as C are standardized, each compiler optimizes and generates machine instructions slightly differently. The only real way to know what the compiler is doing is to review the assembly. So far we have discussed only the software part of embedded systems development, but the job often requires knowledge of the hardware of embedded systems as well.
Is it possible for developers to possess the knowledge and still get a low KEDE just because of the short source code size? Let's perform a thought experiment for the extreme case of an omniscient artificial intelligence called AI. Since AI is omniscient, it possesses all the knowledge required to develop embedded software optimized for all needed system characteristics, including the smallest possible source code length. AI can develop an embedded program 1 000 symbols in length in 4.8 minutes. This is the extreme upper limit of knowledge discovery efficiency any human could achieve. However, if over an 8-hour day AI produces only one such embedded program, it will have a daily KEDE of 0.01. If, on the other hand, over an 8-hour day AI produces 100 such programs, it will have the maximum KEDE, i.e. perfect knowledge.
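The arithmetic of this thought experiment can be sketched as follows, again assuming the reference maximum of 100,000 symbols per 8-hour day implied by the earlier examples:

```python
# Omniscient-AI thought experiment: perfect knowledge means symbols
# flow at the maximum rate, with no questions asked in between.
WORKDAY_MINUTES = 8 * 60       # 480 minutes in an 8-hour day
PROGRAM_SYMBOLS = 1_000        # one optimized embedded program
MINUTES_PER_PROGRAM = 4.8      # time AI needs per program
DAILY_MAX = 100_000            # assumed reference maximum per day

programs_per_day = WORKDAY_MINUTES / MINUTES_PER_PROGRAM   # 100 programs
one_program_kede = PROGRAM_SYMBOLS / DAILY_MAX             # 0.01
full_day_kede = programs_per_day * PROGRAM_SYMBOLS / DAILY_MAX

print(round(programs_per_day))  # 100 programs fit in a day
print(one_program_kede)         # 0.01 if only one program is produced
print(full_day_kede)            # 1.0, i.e. perfect knowledge
```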
We see that in the case of perfect knowledge the length of the program by itself does not affect KEDE. What affects KEDE is how much knowledge needs to be acquired. KEDE postulates that if a team is more efficient in knowledge discovery, they should produce more embedded software in the same time.
Auto-generated code
“Once a programmer knows what to build, the act of writing code can be thought of as (1) breaking a problem down into simpler problems, and (2) mapping those simple problems to existing code (libraries, APIs, or functions) that already exist. The latter activity is probably the least fun part of programming (and the highest barrier to entry).” ~ OpenAI's blog
Auto-generated code is code that wasn't written by a person but produced by an automated process. There are three major sources of auto-generated code: code completion, boilerplate code generation and AI.
Code completion is a context-aware feature in which an integrated development environment (IDE) predicts the rest of the word a developer is typing. Developers can typically press the Tab key to accept a suggestion, or the down arrow key to choose one of several.
Many IDEs offer class creation wizards which auto-generate not only methods, but also class files, abstract or implemented methods, the package statement at the beginning of a class and the correct imports. When using programming languages that are considered verbose, the developer must write a lot of code to accomplish only minor functionality. Such code is called boilerplate and is better generated automatically than typed by hand. For instance, in Java, to provide encapsulation, classes need methods for getting and setting instance variables. (If the instance variables were declared public, the accessor and mutator methods would not be needed.) The definitions of these methods can be regarded as boilerplate. On top of that, many IDEs offer templates. A template serves as shorthand for a snippet of code such as a main method or a for loop: we type a word and it is transformed into the code snippet. We can also create custom templates which reflect a particular domain or project. Many frameworks offer wizards for generating a whole new application: one answers a few questions and gets a skeleton of an application.
Coding tools like OpenAI's Codex, GitHub Copilot and CodeT5, which generate code using machine learning algorithms, have attracted increasing attention. An AI-powered tool takes natural language prompts as input (e.g., "Say hello world") and generates code for the task it is given. Such tools should reduce the time developers spend looking up reference manuals, browsing programming forums, and poring over code samples. On top of that, AI can help developers with code defect detection, e.g. predicting whether code is vulnerable to exploits, and clone detection, i.e. predicting whether two code snippets have the same functionality. In short, AI-powered code generation tools, having learned from vast amounts of code, will definitely increase knowledge discovery efficiency.
As AI-assisted programming becomes more prevalent, it is becoming increasingly clear that the "hard" part of the job is now knowing what question to ask or what task to initiate. This has always been the case, but AI simply makes it more evident. In order to ask the right question, you must already know half the answer. In other words, if you are experienced and a good thinker, AI programming will make your work easier and better.
If we put on the "knowledge discovery" lens we will see that such AI tools help answer questions about "How" to develop what is needed. The questions about "What" to develop are left to the humans who will have to accurately describe what they need. Yes, humans will have to accurately describe not what they want, but what they need. That is something AI tools cannot do...yet. AI tools that would actually help people know what they need and to communicate it effectively would be a true miracle.
When calculating KEDE we refer to the maximum possible typing speed for a human being. Auto-generated code, on the other hand, is produced much faster than any human can type. How will this affect the KEDE calculation?
Let's run a thought experiment with Team J from the previous section. As we know, Team J makes no use of any automation tools; its developers type their Java code in simple text editors. We add Team G, which also uses Java, but with IDEs and all of their code-generation features. Apart from the difference between IDEs and simple text editors, all other variables are equal for both teams: they work in the same business domain, in the same company, with the same client, and most importantly, their developers possess the same average typing speed. Let's say that both teams delivered in one day a program with the same functionality, 10 000 symbols in length. Team G used an IDE to auto-generate 90% of their code. That means Team G typed 1 000 symbols and generated 9 000 symbols of Java code. As before, Team J typed all 10 000 symbols and generated none. When we calculate daily KEDE, Team J would have 0.1 and Team G would also have 0.1. No difference in KEDE despite the auto-generated code. Wait a minute: in the same time Team G did 9 times less work than Team J did, but we say Team G is as efficient as Team J? Is this correct? Is this fair to Team J?
Again, it all depends on what we mean by "work". If by "work" we mean the manual physical effort of typing symbols, then yes, Team G did 9 times less work. However, with KEDE we analyze not the manual effort but the knowledge work. Knowledge work involves discovering knowledge by asking questions. When we generate code, we use readily available condensed knowledge, and there is no need to discover it. Hence, in terms of knowledge work, Team J was equal to Team G.
Generating code is much faster than typing it. Hence the 9 000 symbols of code that Team G generated should have saved them a lot of questions and consequently a lot of time. In that saved time more symbols of source code could have been contributed and more tangible output, i.e. working software, produced. It looks like Team G saved time but didn't return it to the company.
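How much time did the generated code save? A rough sketch, assuming the reference rate of 100,000 symbols per 8-hour day implied by the earlier Team J calculation (the rate itself is an assumption of this sketch, not stated explicitly in the text):

```python
# Typing time saved by Team G's 9,000 generated symbols, at the
# assumed reference rate of 100,000 symbols per 8-hour working day.
DAILY_MAX = 100_000
WORKDAY_MINUTES = 8 * 60

symbols_per_minute = DAILY_MAX / WORKDAY_MINUTES   # ~208.3 symbols/min
generated_symbols = 9_000

minutes_saved = generated_symbols / symbols_per_minute
print(f"{minutes_saved:.1f} minutes of typing saved")  # 43.2 minutes
```

Under this assumption, Team G had roughly three quarters of an hour in which more symbols could have been contributed.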
Refactoring
"The challenge of modifying the existing structure of a software system to cleanly represent the total knowledge base is the philosopher's stone of software maintenance." ~ Phillip Glen Armour[2]
Some may say that KEDE implies the Refactoring technique should be considered wasteful. That KEDE would nudge developers not to refactor.
Let's first define what refactoring is and what it is not. According to Martin Fowler, who popularized the technique, refactoring, when used as a noun, is: a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior[1]. The important part here is "without changing its observable behavior". Hence, not every change to source code is refactoring. Changing observable functionality is not refactoring. Fixing defects is not refactoring.
We can take a broader view of refactoring to include changing the observable behavior of a product as well. As a product becomes more populated with knowledge, and as that knowledge becomes more testable, the opportunity to really test it increases, and we start to find out if we really did know what we thought we did[2]. As we transition the product from the relatively controlled environment of the test system to the uncertain outside world, we start to detect necessary knowledge that exists outside. It is at this point that users gain visibility into end-to-end processing, which allows them to qualify the system's knowledge content[2].
From a knowledge discovery perspective, refactoring shows that the proper internal structure was not known in advance: the questions developers asked themselves or their architects were not answered properly in the first place. More questioning had to take place, and after new knowledge (technical, or in the form of new requirements) was discovered, the structure of the software was changed accordingly. That is clearly waste, but it does not mean it is bad to do refactoring.
The nature of software development requires software developers to constantly improve the knowledge content of the software produced. To do this, they often need to refactor. Therefore, refactoring is not inherently good or bad; it is simply a method software developers employ to adapt to their ever-changing work environment. And no, KEDE does not say refactoring is bad. It can merely quantify how inefficient a knowledge discovery process is. Here again comes the goal to learn as cheaply as possible: refactoring, too, can be done more efficiently.
Works Cited
1. Fowler, M. "DefinitionOfRefactoring", martinfowler.com
2. Armour, P.G. (2003). The Laws of Software Process, Auerbach