Knowledge Discovery Patterns

A new perspective on the terms "iterative" and "incremental".

Incremental

"Skill to do comes of doing; knowledge comes by eyes always open and working hands." ~ Ralph Waldo Emerson, essay Old Age, included in the collection Society and Solitude (1870)

In software development, iterative and incremental are usually defined from the perspective of observable functionality. Incremental is said to mean adding new features to a software application, and iterative to mean refining that application.

When analyzing source code from a knowledge discovery perspective, we lack the meta-information about which functionality a piece of code belongs to. Hence we need to look at the words iterative and incremental from a new angle and redefine them.

In incremental mode a part is finished before a second part is added. It is like laying bricks: a new brick doesn't remove an existing one. In knowledge discovery language, "a part" is a piece of knowledge. Hence, when we add knowledge without deleting or dropping any knowledge we have already found, we are working in incremental mode.

Incrementally is how the long English word “Honorificabilitudinitatibus” was typed earlier. I never deleted a letter, only added new ones.



Letter: _ _ H o n o r _ i f i c a b i _ _ l i t _ u d i _ _ n i _ _ t a t i b u s
Symbol: 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1

In this case Waste is zero: none of the 27 letters typed was ever replaced.
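The counts behind that statement can be tallied directly from the symbol row of the table. The following is a minimal sketch, not the author's calculation: it assumes that a 1 stands for a letter typed and a 0 for a question asked, and it treats any typed letter beyond the length of the word as a replaced one. The variable names are hypothetical.

```python
WORD = "Honorificabilitudinitatibus"

# Symbol row copied from the table.
# Assumption: 1 = a letter typed, 0 = a question asked.
SYMBOLS = (
    "0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 "
    "0 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1"
).split()

letters_typed = SYMBOLS.count("1")
questions_asked = SYMBOLS.count("0")
total_symbols = len(SYMBOLS)  # the N referred to in the text

# Assumption: every typed letter beyond the length of the word
# was typed in error and later replaced.
replaced = letters_typed - len(WORD)

print(letters_typed, questions_asked, total_symbols, replaced)
# prints: 27 10 37 0
```

With no replaced letters, any measure of waste based on discarded knowledge comes out as zero, which matches the incremental case described here.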

All of this is best understood with an illustration. For that we'll use the Incrementing Mona Lisa (Source: Jeff Patton).

Here we can say that the color of each pixel is knowledge we are discovering. Whenever we find some knowledge we never drop it and never change it, i.e. we never waste it.

Iterative

In iterative mode a part is removed and a new part is added in its place. In knowledge discovery language, "a part" is a piece of knowledge. Hence, when we drop prior knowledge and replace it with new knowledge, we are working in iterative mode.

If I typed “Honorificabilitudinitatibus” iteratively, it would look like this:



Letter: _ _ H o n o r X X _ i f i c a b i X _ _ l i t X _ u d i _ _ n i _ _ t a t i b u s
Symbol: 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1

After typing “Honor” I typed two letters which were wrong. They are marked with “X” in the table. I typed them because I thought they were the proper ones, i.e. I thought I had the knowledge but I did not. Later on, after asking the question “0”, I found I had made a mistake and had to replace them. After typing the second sequence “ificabi” I again made a mistake, with one wrong letter marked with “X”. After asking the questions “00” I corrected it with “l”. After typing “lit” I again made a mistake, with one letter marked with “X”. I corrected it to “u” after asking the question “0”.

The result is that I typed four wrong letters, which were later replaced with the proper ones. Counting those four “X” letters, the maximum symbol rate N rose from 37 to 41, and Waste is close to 15%.
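The figures in this paragraph can be reproduced with a short tally. This is only a sketch under stated assumptions: the Waste formula itself is not shown in this section, so the ratio used below (replaced letters divided by the letters of the finished word) is a hypothetical stand-in, chosen because 4/27 is consistent with the "close to 15%" figure. The names `CHUNKS` and `tally` are illustrative.

```python
# Each chunk of the table, read left to right:
# (questions asked, correct letters typed, wrong letters typed and later replaced)
CHUNKS = [
    (2, "Honor", 2),
    (1, "ificabi", 1),
    (2, "lit", 1),
    (1, "udi", 0),
    (2, "ni", 0),
    (2, "tatibus", 0),
]


def tally(chunks):
    """Count questions, kept letters, replaced letters and total symbols N."""
    questions = sum(q for q, _, _ in chunks)
    kept = sum(len(letters) for _, letters, _ in chunks)
    replaced = sum(wrong for _, _, wrong in chunks)
    return {
        "questions": questions,
        "kept": kept,
        "replaced": replaced,
        "N": questions + kept + replaced,
        # Hypothetical ratio, NOT the source's formula.
        "waste_ratio": replaced / kept,
    }


result = tally(CHUNKS)
print(result["N"], result["replaced"], f"{result['waste_ratio']:.1%}")
# prints: 41 4 14.8%
```

Setting the third element of every chunk to zero gives the incremental case: N falls back to 37 and the ratio to zero.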

Again, this is best understood with an illustration. For that we'll use the Iterating Mona Lisa (Source: Jeff Patton).

Here again the color of each pixel is knowledge we are discovering. The knowledge discovered is later on replaced with new knowledge. Hence the color of each pixel changes while we work on the painting. We can say we "refine" our knowledge based on the new information we receive.

This way of working resembles the painting technique called "sfumato," in which many translucent layers (glazes), mainly composed of an organic medium, are applied in very thin films, down to a micrometer scale. For instance, the flesh tones were usually obtained by superimposing four layers: 1) the priming layer made of lead white, 2) a pink layer based on a mixture of lead white, vermilion, and earth, 3) a shadow layer made with a translucent glaze or an opaque paint (with dark pigments), and 4) varnish. Specialists from the Center for Research and Restoration of the Museums of France found that da Vinci painted up to 30 layers of paint on the Mona Lisa[1].

Such an approach is not limited only to painting. Leo Tolstoy's epic War and Peace is another case in point. Scholars note that Tolstoy's progress on War and Peace frequently stalled as the author reworked portions of the book again and again. The novel took six years to write. During that time, Tolstoy is said to have rewritten the entire manuscript by hand at least 8 times, and individual scenes up to 26 times. Even after the six volumes of War and Peace were completed, Tolstoy went back and revised. He cut out pages and pages of commentary, eventually whittling the work down to four volumes.

That is very much what software developers do on a daily basis. We constantly refine the code in order to match the constantly changing knowledge about what to build and how to build it.

An example of that is Scrum, which is both iterative and incremental. The entire Scrum team is accountable for creating a valuable, useful Increment every Sprint. The development team takes a first cut at a system, knowing it is incomplete or weak in perhaps many areas. The team then iteratively refines those areas until the result is satisfactory. With each iteration, the software is improved through the addition of the knowledge discovered.

Another example of an iterative and incremental process is the Rational Unified Process (RUP). RUP defines a project life cycle consisting of four phases. Each phase has one key objective and a milestone at the end that marks the objective as accomplished. Inside each of the phases the development process is iterative.

Below is an illustration of drawing the Mona Lisa iteratively and incrementally (Source: Steven Thomas).

The Increment adds completely new features, i.e. it expands the scope of the functionality offered. Each increment is refined iteratively, and the refinement is likely to touch existing functionality as well. Inside each increment, prior knowledge is dropped and replaced by new knowledge.

Yet another example is the MVP, aka Minimum Viable Product. It differs from the three approaches above in that the goal is to start with the smallest thing that can be put in the hands of real users and that they can actually use!

Below we have a drawing of the context of MVP (Source: Henrik Kniberg)

The top scenario (crossed with red lines) is the incremental approach we discussed above. In the context of a new startup business it fails us, because we keep delivering stuff that the customer can't use at all. However, from a knowledge discovery efficiency perspective it is the most efficient use of software development time. Some may say that this applies only to the mass production of goods. That's not correct. Most luxury cars, such as Rolls-Royce, the Ferrari F40 and the McLaren F1 to name a few, are produced this way. Those cars feature numerous proprietary designs and technologies, which means they embody a lot of knowledge.

The bottom scenario describes what usually happens when the company lacks a lot of knowledge about "What" needs to be delivered, about "How" to deliver the "What", or both. Such a context is the natural habitat of technology startups. We start with a skateboard and end up with a convertible. Between them we have a scooter, a bike and a motorcycle. Let's not forget that here we look not from a functionality perspective but from a knowledge discovery perspective. Hence the question is: what can we say about the knowledge discovery process?

We can safely say that acquiring the knowledge to build a bike requires far fewer questions than acquiring the knowledge to build a car. That is simply because a car has some 59,999 parts and a bike fewer than 20. Since a skateboard probably has no parts in common with a bike, we can say that the knowledge acquired to build the skateboard is lost when we decide to build a bike. Hence we could conclude that we have 100% waste on each transformation on the way from a skateboard to a car.

However, let's not forget that not all of the knowledge is in the tangible output, be it a car or source code. Some of it, and people may say most of it, is located in the heads of the knowledge workers. Hence the percentage of waste is never 100%.

This is just another example of the need to be able to learn as cheaply as possible. Approaches like Lean Startup can provide qualitative arguments for that. Now, with the ability to measure waste in knowledge work, we can back their claims quantitatively.

Works Cited

1. L. de Viguerie, P. Walter, E. Laval, B. Mottin, and V. A. Solé, “Revealing the sfumato Technique of Leonardo da Vinci by X-Ray Fluorescence Spectroscopy,” Angewandte Chemie International Edition, vol. 49, no. 35, pp. 6125–6128, 2010, doi: 10.1002/anie.201001116.
