Knowledge Discovery
The internal workings of a Knowledge Discovery Process
What is Knowledge Discovery?
In the context of software development, "knowledge discovery" refers to the set of activities and methods that a software development organization uses to acquire, understand, and apply the necessary knowledge throughout the software development lifecycle.
It encompasses the learning, problem-solving, research, and collaboration activities that help the team with the acquisition, understanding, and application of new information, concepts, and techniques needed to deliver a solution successfully. This process is essential for the software development organization to adapt and respond to various challenges of new or changed requirements that arise during the development.
Knowledge discovery may include:
- Identifying knowledge gaps: Determine areas where the team may lack expertise or where knowledge gaps may exist. This will allow you to plan for additional training, research, or external support to bridge these gaps.
- Acquiring new technical skills: Team members may need to learn new programming languages, frameworks, or tools to deliver the solution.
- Gaining domain knowledge: Developers may need to understand the specific industry or domain in which the software will be used, including its terminology, processes, and regulations.
- Research and analysis: The team may need to conduct research and gather information to make informed decisions or to solve complex problems that arise during development.
- Problem-solving: As developers encounter issues, they must apply their knowledge and skills to troubleshoot and find solutions.
- Collaboration and communication: Sharing knowledge and ideas among team members can help identify potential issues, improve understanding, and foster innovation.
- Adapting to changes: Knowledge discovery also involves adapting to changes in project requirements, technologies, or other factors that may impact the project's success.
Acquiring knowledge from the world
"Lack of knowledge…that is the problem. You should not ask questions without knowledge. If you do not know how to ask the right question, you discover nothing." ~ W. Edwards Deming
It has always been the case that the most difficult task is to find the right questions to ask. To ask the right question you need to know half the answer.
Missing information quantifies how much we do not know about something. However, to quantify how much we do not know, we must already have some knowledge about the thing we do not know. The number of things we could potentially know, or ask questions about, for any object is infinite.
An epistemic action is any action taken to gather information from the world. This could include any act of active perception, monitoring, checking, testing, verifying, experimenting, exploring, enquiring, or looking[1].
The value of new information has been extensively studied in economics, where it is referred to as the "value of information," or "how much an agent is willing to pay for obtaining that information?" An example of this would be how much you would pay for the information "Each of the houses in this city costs one million dollars."
Every epistemic action seeks information from a source. Sources of information include:
- direct experience (such as perceptual evidence);
- information provided by other people;
- reasoning (about other beliefs);
- categorization (reasoning about classes and similarities)[4].
We can view observing the world as the equivalent of receiving a message[9]. Observation has the same property of changing the observer's probability distribution over the observable states of the world and creating meaning in the observer's mind as receiving a message. The key difference is that the world has no intention of being observed, so the receiver of the message must proactively create it.
For example, before looking out the window, you might say there is a 50% chance it is raining. After looking out the window, you know whether or not it is raining: the probability of rain is now either 100% or 0%. Regardless of the outcome, the observation has given you one bit of information about the thing observed. What matters is how the observation changes the observer's understanding of the thing observed.
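The rain example can be checked numerically. The following is a minimal sketch, assuming Shannon entropy in bits as the measure of missing information; the function name `entropy_bits` and the probability lists are illustrative and do not come from the original text.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Before looking out the window: rain and no rain are equally likely.
before = entropy_bits([0.5, 0.5])

# After looking: the outcome is certain, so no information is missing.
after = entropy_bits([1.0, 0.0])

# Information gained by the observation = reduction in missing information.
gained = before - after
print(gained)  # 1.0
```

If the prior had been less balanced, say a 90% chance of rain, `entropy_bits([0.9, 0.1])` would be about 0.47 bits, so the same look out the window would on average resolve less missing information.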
The game of twenty questions is a useful example of acquiring knowledge[3].
The game of twenty questions
"Thus, twenty skillful hypotheses will ascertain what two hundred thousand stupid ones might fail to do. The secret of the business lies in the caution which breaks a hypothesis up into its smallest logical components, and only risks one of them at a time. What a world of futile controversy and of confused experimentation might have been saved if this principle had guided investigations into the theory of light!" ~ Charles Sanders Peirce [3]
The game "20 questions" is an old game that gained popularity in the late 1940s when it was used as the format for a successful weekly radio quiz program. In the traditional game, the inquirer leaves the room while the remaining people agree on an object - a person, place, or thing. The inquirer then comes back and has to guess what the object is by asking successive questions that can be answered with a simple "yes" or "no". If the inquirer cannot guess the object after asking 20 questions, the respondents have stumped the inquirer.
The traditional version of the 20 questions game has a deterministic solution. Using Shannon's formula, we can calculate that with 20 yes/no questions, one should be able to screen a multitude of 2^20 ≈ 10^6 alternative words.
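The arithmetic behind this claim can be reproduced in a few lines. This is a small sketch that assumes all alternatives are equally likely; the helper names `questions_needed` and `alternatives_covered` are hypothetical and introduced only for illustration.

```python
import math

def alternatives_covered(questions):
    """Number of equally likely candidates that `questions` yes/no answers can distinguish."""
    return 2 ** questions

def questions_needed(alternatives):
    """Smallest number of yes/no questions that always suffices to single out
    one of `alternatives` equally likely candidates."""
    return math.ceil(math.log2(alternatives))

print(alternatives_covered(20))     # 1048576
print(questions_needed(1_000_000))  # 20
```

Twenty answers of one bit each distinguish 2 to the power 20 alternatives, which is slightly more than one million.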
Let's suppose the object to be guessed is "Abraham Lincoln's stovepipe hat"[2]. The initial clue is "sugarloaf with animal associations".
For two teams of competent players presented with the initial clue the game might go as follows:
First team | Second team |
---|---|
1) Are the animal associations human? Yes. | 1) Is the object useful? Yes. |
2) Male or female? Male. | 2) Is it an item of dress? Yes. |
3) Famous or not? Famous. | 3) Male or female? Male. |
4) Connected with the arts? No. | 4) Worn below or above the belt? Above. |
5) Politician? Yes. | 5) Worn on the head or not? Head. |
6) USA or other? USA. | 6) Is it a famous hat? Yes. |
7) This century or not? Not. | 7) Winston Churchill's hat? No. |
8) Twentieth or Nineteenth century? Nineteenth. | 8) Abraham Lincoln's hat? Yes. |
9) Connected with the Civil War? Yes. | |
10) Lincoln? Yes. | |
11) Is the object Lincoln's hat? Yes. | |
To consistently excel at 20 Questions or knowledge work, you need a good mix of both:
- Prior knowledge (experience, instincts, and expertise about a specific subject matter)
- Strategy (for learning as much as possible with each question you ask in the game)
The choice of strategy allows us to acquire the same amount of information with more or fewer questions than the average number. The optimal strategy is to ask each question in a way that divides the remaining objects into approximately equal probability halves. For example: "male or female?", "worn below or above the belt?". The optimal strategy ensures that we will get all the missing information in the average number of questions.
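The effect of strategy on the number of questions can be illustrated with a small simulation. This is a sketch under stated assumptions: the 1024 numbered objects are hypothetical, every object is equally likely, and the two functions are illustrative names rather than anything defined in the original text.

```python
def ask_by_halving(candidates, secret):
    """Count the yes/no questions needed when each question splits the
    remaining candidates into two (nearly) equal halves."""
    remaining = list(candidates)
    questions = 0
    while len(remaining) > 1:
        middle = len(remaining) // 2
        first_half = remaining[:middle]
        questions += 1  # "Is it in the first half?"
        if secret in first_half:
            remaining = first_half
        else:
            remaining = remaining[middle:]
    return questions

def ask_one_by_one(candidates, secret):
    """Count the questions needed when each question names a single candidate."""
    questions = 0
    for candidate in candidates:
        questions += 1  # "Is it this one?"
        if candidate == secret:
            break
    return questions

objects = list(range(1024))  # 1024 equally likely hypothetical objects
halving = [ask_by_halving(objects, s) for s in objects]
naive = [ask_one_by_one(objects, s) for s in objects]

print(max(halving), sum(halving) / len(halving))  # 10 10.0
print(max(naive), sum(naive) / len(naive))        # 1024 512.5
```

Both strategies eventually recover the same 10 bits of missing information, but halving does so in 10 questions every time, while naming candidates one at a time takes 512.5 questions on average.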
It's worth noting that you can still play the game without the optimal strategy, but you will need more questions. Without subject matter knowledge, however, the game cannot be played at all. For example, if your opponent is thinking about Abraham Lincoln's stovepipe hat but you have never heard of Abraham Lincoln, you cannot arrive at the answer.
Expertise in the relevant subject matter is critical[2].
This is also true for success in knowledge work.
It from bit
"Not until you start asking a question, do you get something. The situation cannot declare itself until you've asked your question. But the asking of one question prevents and excludes the asking of another." ~ John Archibald Wheeler [7]
So far, we have applied the Shannon formula to cases where the object to be found is known in advance. By asking binary questions of one bit each, we looked for and found "it" - the hidden object. If the object is "it" and the result is "bit," we can say that we've got "bit from it".
However, in the reality of knowledge work, the "it" is not known in advance. If the knowledge worker is a software developer and the "it" to be delivered is a software program, the software developer needs to acquire missing information from various sources. At a minimum, the business should answer questions about the requirements, and user manuals should answer questions about the technology to be used. Ideally, each question brings back one bit of information. The software developer needs to maintain a coherent frame of all the answers received.
We see that the "it" emerges from many bits of information. The software itself is constructed from the answers provided by participants at all levels. We can say that we've got "it from bit" - a phrase coined by John Wheeler. "It from bit" symbolizes the idea that every item in the physical world has knowledge as an immaterial source and explanation at its core. Reality arises from the posing of binary "yes"/"no" questions[5].
John Archibald Wheeler, who coined the term 'black hole', drew attention to the connections between physics and information theory. He likened the job of a physicist to someone playing the "surprise" version of the game "20 questions".
"Surprise Twenty Questions" game
In the "Surprise Twenty Questions" game, the inquirer leaves the room as usual, but the respondents play a trick: unbeknownst to the inquirer, they decide not to agree on any object. When the inquirer re-enters the room, they try to guess the object by asking a series of questions that can only be answered with a "yes" or "no". The first person to be questioned thinks of an object only after hearing the question, and answers accordingly. Each person after that does the same, making sure their response is consistent with the immediate question and all previous answers. A complex vortex of decision making is set up, a logical but unpredictable chain of ifs and thens. Yet this steady improvisation leads, though not always, to a final answer that everyone can agree on.
"...they had agreed in advance that this would be one where no word was agreed upon to start with. Every answer, however, would have to be consistent with all the answers that had gone before. So it was really harder for the people playing the game than it was for me. The point was the word “cloud” that had been produced really came more out of the questions that were asked than out of anything that they had agreed upon before the thing started."[6]
"However, the power I had in bringing the particular word "cloud" into being was partial only. A major part of the selection lay in the 'yes' or 'no' replies of the colleagues around the room. ... In the game, no word is a word until that word is promoted to reality by the choice of questions asked and answers given."[8]
At any given moment, there are many possible objects that are compatible with the answers already given. The set of possible objects is in a state of messy coherence. The inquirer can find things, but often finds things they didn't know they needed. It is like a partially constructed spider's web of connections that becomes visible during the questioning. Each successive question selects a subset among the possible objects, but the possible answers to the question are determined by the possible objects that remain.
Changes in the inquirer's cognitive state will alter the respondents, and changes in the respondents will likewise ripple into the inquirer's cognitive state. The inquirer and the respondents are working together as a larger cognitive system because they are able to affect each other and therefore satisfy the 'mutual manipulability criterion', which specifies that two entities that can reciprocally alter each other's state belong to the same system[16].
To make this work, the respondents have to ensure that their combined answers still define at least one possible real object. Their answers must be coherent with each other, requiring logic, context, vision, and cognition that is common to all respondents. If this requirement is met, then gradually, over a varying number of questions, an object finally emerges. This object is discovered together by all persons present during the questioning process - an object that was not selected ahead of time and could not have been predicted. The result is truly deterministic, but only in hindsight.
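The coherence requirement can be made concrete with a toy simulation. This is only a sketch under strong assumptions: the six objects and their four yes/no attributes are invented for illustration, each question asks about one attribute, and the respondents are modeled as picking at random among the answers that keep at least one object possible. None of these names or rules come from Wheeler's account.

```python
import random

# Hypothetical universe of objects described by yes/no attributes (illustrative only).
OBJECTS = {
    "cloud": {"natural": True,  "alive": False, "in_the_sky": True,  "solid": False},
    "bird":  {"natural": True,  "alive": True,  "in_the_sky": True,  "solid": True},
    "rock":  {"natural": True,  "alive": False, "in_the_sky": False, "solid": True},
    "kite":  {"natural": False, "alive": False, "in_the_sky": True,  "solid": True},
    "fish":  {"natural": True,  "alive": True,  "in_the_sky": False, "solid": True},
    "hat":   {"natural": False, "alive": False, "in_the_sky": False, "solid": True},
}

def play_surprise_game(questions, rng):
    """No object is chosen in advance. Each respondent gives any answer that
    keeps at least one object consistent with every answer given so far."""
    remaining = set(OBJECTS)
    transcript = []
    for attribute in questions:
        # Only answers that some remaining object satisfies are coherent.
        coherent = sorted({OBJECTS[name][attribute] for name in remaining})
        answer = rng.choice(coherent)
        remaining = {name for name in remaining if OBJECTS[name][attribute] == answer}
        transcript.append((attribute, answer))
    return transcript, remaining

transcript, remaining = play_surprise_game(
    ["natural", "alive", "in_the_sky", "solid"], random.Random(7)
)
print(transcript)
print(remaining)  # exactly one object, which nobody selected beforehand
```

Because the four attributes distinguish all six hypothetical objects, asking all four always leaves exactly one object. Which one emerges depends on the order of the questions and on the answers improvised along the way, and different seeds can lead to different objects.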
The "Surprise Twenty Questions" game is a beautiful example of how humans discover knowledge.
How to cite:
Bakardzhiev D.V. (2022) Knowledge discovery. https://docs.kedehub.io/kede/kede-knowledge-discovery.html
Works Cited
1. Castelfranchi, C., Lorini, E. (2003). Cognitive Anatomy and Functions of Expectations. IJCAI03 Workshop on Cognitive modeling of agents and multi-agent interaction, Acapulco, Mexico.
2. Box, G. E. P., Hunter, J. S., & Hunter, W. G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Wiley-Interscience.
3. Peirce, C. S. (1998). The Essential Peirce, Volume 2: Selected Philosophical Writings (Peirce Edition Project, Ed.). Indiana University Press.
4. Pezzulo, G., Lorini, E., & Calvi, G. (2004). How do I Know how much I don't Know? A cognitive approach about Uncertainty and Ignorance. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 26, No. 26).
5. Wheeler, J. A. (1990). Information, physics, quantum: The search for links. In W. H. Zurek (Ed.), Complexity, entropy, and the physics of information (Vol. 8, pp. 3–28). Taylor & Francis.
6. Oral history interview with John Archibald Wheeler, 1967 April 5. by Wheeler, John Archibald, 1911-2008 [Online]. Available: Transcript
7. [Online]. Available: Do Our Questions Create the World?
8. P.C.W. Davies and J.R. Brown, The ghost in the atom, Cambridge University Press, 1986.
9. Garner, W.R. (1962). Uncertainty and Structure as Psychological Concepts, New York, Wiley.