Knowledge Discovery Efficiency (KEDE) and Ashby's Law of Requisite Variety

Abstract

We address real-world applications of Ashby's Law by adopting Ashby's strict black-box perspective: only external behaviour is observable. First, we define the multi-staged selection process of narrowing down and selecting the appropriate response from the set of alternative responses as the Knowledge Discovery Process. We then establish H(X|Y) as the knowledge to be discovered, i.e. the gap in internal variety that had to be compensated by selection. This quantifies how much disorder the regulator still permits and, conversely, how close the system comes to meeting Ashby's requisite-variety condition. In information-theoretic terms, perfect regulation requires H(X|Y) = 0. We then quantify the knowledge to be discovered, H(X|Y), based on the observable outcomes. Building on this result, we generalize Knowledge Discovery Efficiency (KEDE), a scalar metric that quantifies how efficiently a system closes the gap between the variety demanded by its environment and the variety embodied in its prior knowledge. KEDE operationalizes requisite variety when internal mechanisms remain opaque, offering a diagnostic tool for evaluating whether biological, artificial, or organizational systems absorb environmental complexity at a rate sufficient for effective regulation. Finally, we present applications of KEDE in diverse domains, including typing the longest English word, measuring software development, testing intelligence, a basketball game, assembling furniture, and the speed of light in a medium.

Introduction

The Law of Requisite Variety, formulated by W. Ross Ashby, states that for a system to effectively regulate its environment, it must have at least as much variety/complexity as its environment[1]. This principle is foundational in disciplines such as cybernetics, control theory, and machine learning.

The concept of requisite variety has since been applied across diverse domains, including organizational theory, ecology, and information systems. It underscores the necessity for systems to adapt to environmental complexity in order to maintain stability and achieve intended outcomes.

Real-world attempts to apply Ashby's Law of Requisite Variety face three persistent obstacles. (i) Combinatorial explosion: enumerating all relevant states of a system and its environment quickly becomes intractable, especially when hidden or unmeasured variables are present. (ii) Dual control dilemma: a regulator must simultaneously amplify its own control variety and attenuate external variety—an optimization that is delicate in multiscale, hierarchical, and time-varying settings such as digital ecosystems or military command structures. (iii) Resource constraints: limited data, computational power, and organisational capacity often preclude sophisticated control architectures. Existing remedies—markup-language state catalogues, iterative multidimensional sampling, and distributed self-organising controllers—mitigate but do not eliminate these limitations.

In section 2, we provide a detailed overview of Ashby's Law of Requisite Variety, including its mathematical formulation and implications for system regulation. We also discuss the present-day understanding of residual variety and its significance in the context of Ashby's Law. In section 3, we discuss the challenges of applying Ashby's Law to real-world systems, including combinatorial explosion, the dual control dilemma, and resource constraints. We propose a solution to these challenges: treating the system as a black box, observing the probability of successful outcomes to disturbances, and estimating the gap in its internal variety based on that. In section 4, we begin establishing the solution by introducing the Knowledge Discovery Process, which is the process of narrowing down and selecting the appropriate response from the set of alternative responses. We then establish H(X|Y) as the knowledge to be discovered, which is the gap in internal variety that had to be compensated by selection. In section 5, we show how to quantify the knowledge to be discovered, H(X|Y), based on the observable outcomes. In section 6, we generalize Knowledge Discovery Efficiency (KEDE), a scalar metric that quantifies how efficiently a system closes the gap between the variety demanded by its environment and the variety embodied in its prior knowledge. Finally, in section 7, we explore applications of KEDE in various domains, demonstrating its utility as a diagnostic tool for evaluating system performance and adaptability.

The Law of Requisite Variety

For a system to effectively regulate its environment, it must have at least as much variety as its environment.

Set-based formulation

Regulation achieves a “goal” using a set of responses against a set of disturbances. It is assumed that the goal has already been determined, i.e. that the acceptable values of the essential variables are known. The Table of Outcomes T has cells indexed by disturbance-response pairs (d, r). The cell entries are outcome-values z ∈ Z.

T            R
             r₁     r₂     r₃     ...
D     d₁     z₁₁    z₁₂    z₁₃    ...
      d₂     z₂₁    z₂₂    z₂₃    ...
      d₃     z₃₁    z₃₂    z₃₃    ...
      d₄     z₄₁    z₄₂    z₄₃    ...
      ...    ...    ...    ...    ...

A disturbance-response pair determines an outcome-value, and that outcome-value is then evaluated at the level of the essential variables:

T : D × R → Z,   φ : Z → E

Where:

  • D is the set of disturbances.
  • R is the set of regulatory responses.
  • T(d,r) is the entry in the cell indexed by row d and column r .
  • Z is the set of all possible outcome-values that can appear anywhere in the table T.
  • E is the set of values of the essential variables that matter for maintaining the system's goal.
  • φ maps each outcome-value to its corresponding essential-variable value.
  • η ⊆ E is the acceptable subset / region in the essential-variable space. These are the values of the essential variables compatible with survival or otherwise acceptable. Successful regulation means that the realized outcomes map into this subset.

Correspondence between outcome-values and essential-variable values. The “outcomes” are simply events, without any implication of desirability. An “outcome” is a concrete or functional result; the “value of E” is the formal essential-variable value associated with that outcome[1].

Multiple cells in the Table of Outcomes T may contain the same outcome-value z ∈ Z, i.e. one outcome-value may appear in several different cells. This is distinct from the case in which multiple distinct outcome-values map to the same essential-variable value under the function φ, and it does not by itself imply that different outcome-values must map to the same value of E.

The special case is the reduced representation in which the table T directly contains values of the essential variables, i.e. Z = E and φ = idE, so that the table entries are already values of the essential variables.[1] More generally, however, φ may be bijective or many-to-one, depending on the level of description adopted.

Bijective correspondence between outcome-values and essential-variable values. This is the case in which each relevant outcome-value maps to a unique essential-variable value, and each relevant essential-variable value is represented by exactly one outcome-value. Then Z and E are conceptually distinct but informationally equivalent: φ : Z → E with z₁ ≠ z₂ ⇒ φ(z₁) ≠ φ(z₂), and every relevant e ∈ E has exactly one corresponding z ∈ Z. On this reading, Z may still be understood as the outcome-value space and E as the essential-variable space, but every relevant distinction in realized outcomes is mirrored one-for-one at the level of the essential variables. This is the form used in later reformulations such as D → R → Y → E with an explicit bijection between the outcome variable and the essential variable E.[31]

Illustrative example. Let Z be the set of fine-grained thermoregulatory outcome-states, and let E be the set of essential-variable values relevant to regulation. Say zA = “body temperature 36.4°C”, zB = “body temperature 36.6°C”, and zC = “body temperature 36.8°C”. Let the corresponding essential-variable values be eA, eB, and eC, where each of these values is distinct. Then φ(zA) = eA, φ(zB) = eB, and φ(zC) = eC, with no collapse of distinctions. In that case the distinctions present in Z are preserved exactly in E.

Many-to-one correspondence between outcome-values and essential-variable values. This is the case in which multiple distinct outcome-values map to the same essential-variable value. Ashby says that a particular outcome can be treated “as unit with unit,” but “in another context, [it] may be analysed more finely”[1]. That means an outcome can be used as a single table-entry in the regulatory schema, even though in a more detailed analysis it may unfold into a whole trajectory, microstate, or process[1]. Then different outcome-values can collapse to the same essential-variable value: z₁ ≠ z₂ with φ(z₁) = φ(z₂) = e.

On this reading, Z contains finer-grained realized outcomes, while E contains the coarser essential-variable values that matter for survival or goal-attainment. The many-to-one relation therefore expresses that several distinct realized outcome-states may be equivalent from the standpoint of regulation. Successful regulation is still judged at the level of the essential variables: distinct realized histories may all count as equally successful if they yield the same relevant value of E and hence fall within the acceptable region η ⊆ E.

This is a refinement of Ashby's basic regulatory schema, but it is an added modeling layer[1]. In Ashby's basic formulation, an outcome may be treated as a unit. In this article we use only the bijective correspondence between outcome-values and essential-variable values.

Illustrative example. Let Z be the set of fine-grained thermoregulatory outcome-states, and let E be the set of coarser essential-variable values relevant to regulation. Say zA = “a realized thermoregulatory state whose coarse temperature is 36.6°C, with oscillation pattern A” and zB = “a realized thermoregulatory state whose coarse temperature is 36.6°C, with oscillation pattern B”. Then zA and zB are distinct elements of Z, because they differ as fine-grained realized outcomes, even though both map to the same coarse essential-variable value e = “body temperature 36.6°C”. In symbols, φ(zA) = φ(zB) = e. If e ∈ η, then both fine-grained outcomes count as acceptable. There is no requirement that the regulator distinguish between oscillation pattern A and oscillation pattern B unless that finer difference matters to the essential variables.

Requisite variety

The regulator does not choose outcome-values directly. It selects a response in the presence of a disturbance, and thereby determines the outcome Y that is actually realized. The regulatory chain is D → R → Y → E, where Y is the outcome variable and E is the essential variable[31]. The task of regulation is to keep the realized outcomes within the goal subset, or equivalently to keep the corresponding values of the essential variables within the acceptable region.

Convention on variety. In this subsection, V denotes count-variety: for any finite set A, we define V(A) := |A|, the number of distinguishable values/states in A under the adopted classification. Thus all formulas written with V are cardinality statements, not entropy statements. If one instead wishes to use logarithmic variety, define H(A) := log₂ V(A); then the corresponding additive formulas are obtained by taking base-2 logarithms of the count-variety formulas below.
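
As a minimal illustration of this convention (with hypothetical state names, not drawn from Ashby), the following Python sketch computes the count-variety of a small set and its logarithmic counterpart.

```python
import math

# Count-variety: the number of distinguishable states under the adopted classification.
states = {"too_cold", "acceptable", "too_hot"}   # hypothetical classification
V = len(states)                                   # count-variety V(A) = |A|
H = math.log2(V)                                  # logarithmic variety H(A) = log2 V(A)

print(f"V = {V} states, H = {H:.3f} bits")        # V = 3 states, H = 1.585 bits
```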

If the regulator uses a response rule ρ : D → R, then the realized outcome variable under that rule is

Yρ : D → Z,   Yρ(d) := T(d, ρ(d))

The realized outcome-set under ρ is therefore Yρ(D) = { Yρ(d) : d ∈ D } ⊆ Z. Its corresponding essential-variable image is φ(Yρ(D)) ⊆ E.

Let η ⊆ E be the acceptable subset / region in essential-variable space. Then successful regulation requires that the realized outcome-set Yρ(D) generated under ρ map into the acceptable region: φ(Yρ(D)) ⊆ η

Equivalently, successful regulation requires that the outcome-set Yρ(D) is a subset of the preimage of the acceptable region: Yρ(D) ⊆ φ⁻¹[η]. Here φ⁻¹[η] denotes the preimage of the set η, not a functional inverse.

This subset relation is the primary success condition. It says that every realized outcome produced under the response rule ρ must lie in the acceptable outcome-set. In Ashby-style terms, regulation succeeds when the realized outcomes remain within the goal subset, and in later entropy formulations this is expressed as the requirement that the variety of outcome be no greater than the variety of the survival region[1].

Therefore, at the level of count-variety, success implies the following necessary numerical bound:

V( φ(Yρ(D)) ) ≤ V(η)

Similarly, from Yρ(D) ⊆ φ⁻¹[η] we obtain the necessary bound

V( Yρ(D) ) ≤ V( φ⁻¹[η] )

These count-variety inequalities are consequences of the subset condition. They are not, by themselves, equivalent to it: a realized outcome-set Yρ (D) may have cardinality no greater than that of an acceptable set and still fail to be a subset of that set. So success is fundamentally a matter of set inclusion, while the variety inequalities are derived necessary conditions.

Under the reduced representation in which Z = E and φ = idE, the outcome-values are already values of the essential variables, so the success condition reduces to

Yρ(D) ⊆ η

and therefore implies the numerical bound

V( Yρ(D) ) ≤ V(η)

The law of requisite variety is then stated at the same count-variety level. In a finite count-variety idealization, the regulator's variety constrains how small the residual realized outcome-set Yρ(D) can be, giving the familiar Ashby-style lower-bound form:

V( Yρ(D) ) ≥ V(D) / V(R)

Where:

  • V(D) is the count-variety in the disturbances.
  • V(R) is the count-variety in the regulatory responses.
  • V(Yρ(D)) is the residual count-variety in the realized outcome-set under the rule ρ.

This quotient form is a finite count-variety idealization of the more standard logarithmic form[1]. In this form, the law gives a lower bound on residual outcome-variety, while successful regulation imposes an upper bound through the acceptable set. Hence a necessary numerical compatibility condition for success is

V(D) / V(R) ≤ V( φ⁻¹[η] )

and, in the reduced representation Z = E,

V(D) / V(R) ≤ V(η)

This numerical compatibility condition is still only necessary, not sufficient. The decisive success criterion remains: φ(Yρ(D)) ⊆ η.

Thus the law is stated at the level of the realized outcome-set Yρ(D), not at the level of a single outcome token. A single realized outcome yt is used to state success at a particular time, whereas requisite variety concerns the residual variety across the realized outcome-set Yρ(D) under disturbance and regulation.
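
The following Python sketch illustrates the count-variety lower bound on a hypothetical toy Table of Outcomes. All disturbances, responses, and outcome-values are made up for illustration, and the sketch assumes, as Ashby's derivation does, that within each column distinct disturbances lead to distinct outcome-values; it brute-forces every response rule ρ and checks that no rule achieves residual variety below V(D)/V(R).

```python
from itertools import product

# Hypothetical toy Table of Outcomes T: D x R -> Z.
# Assumption: within each column (fixed response r), distinct disturbances
# lead to distinct outcome-values.
D = ["d1", "d2", "d3", "d4"]
R = ["r1", "r2"]
T = {
    ("d1", "r1"): "z1", ("d1", "r2"): "z2",
    ("d2", "r1"): "z2", ("d2", "r2"): "z3",
    ("d3", "r1"): "z3", ("d3", "r2"): "z4",
    ("d4", "r1"): "z4", ("d4", "r2"): "z1",
}

best = None
for choices in product(R, repeat=len(D)):           # every response rule rho: D -> R
    rho = dict(zip(D, choices))
    realized = {T[(d, rho[d])] for d in D}           # realized outcome-set Y_rho(D)
    best = len(realized) if best is None else min(best, len(realized))

lower_bound = len(D) / len(R)                        # V(D) / V(R)
print(f"smallest residual variety = {best}, lower bound = {lower_bound}")
assert best >= lower_bound                           # Ashby-style count-variety bound
```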

Outcome map induced by a response rule

At time t, let the regulator use a response rule ρt : D → R. Given the fixed Table of Outcomes T : D × R → Z, this rule induces an outcome map

Yt : D → Z,   Yt(d) := T(d, ρt(d)).

Thus Yt assigns to each disturbance d the outcome-value that results when the regulator responds according to the rule ρt. The image of the induced outcome map across the disturbance set is the outcome-set Yt(D) defined by

Yt(D) = { Yt(d) : d ∈ D } ⊆ Z.

At the same time, let ηt ⊆ E be the acceptable region in the essential-variable space, and let the corresponding acceptable subset of outcome-values be

Oacc,t := φ⁻¹(ηt) = { z ∈ Z : φ(z) ∈ ηt }.

Successful regulation at time t means that all outcomes generated under the current response rule lie in the acceptable outcome-set:

Yt(D) ⊆ Oacc,t.

Equivalently, successful regulation is the condition

φ(Yt(D)) ⊆ ηt.

This formulation keeps distinct three levels: the fixed outcome table T, the induced outcome map Yt determined by the current response rule, and the time-indexed acceptability criterion determined by ηt.
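
A minimal Python sketch of these three levels is given below, with hypothetical table entries, a hypothetical mapping φ, and a hypothetical acceptable region ηt; it builds the induced outcome-set Yt(D), the pullback Oacc,t, and performs the subset test for successful regulation.

```python
# Minimal sketch (hypothetical values): the fixed table T, the outcome map Y_t induced
# by a response rule rho_t, the pullback O_acc,t of the acceptable region eta_t under
# phi, and the success test Y_t(D) subset of O_acc,t.
D = ["d1", "d2", "d3"]
R = ["r1", "r2"]
T = {
    ("d1", "r1"): "z1", ("d1", "r2"): "z2",
    ("d2", "r1"): "z3", ("d2", "r2"): "z1",
    ("d3", "r1"): "z2", ("d3", "r2"): "z3",
}
phi = {"z1": "e_ok", "z2": "e_ok", "z3": "e_bad"}    # outcome-value -> essential-variable value
eta_t = {"e_ok"}                                      # acceptable region at time t

rho_t = {"d1": "r1", "d2": "r2", "d3": "r1"}          # current response rule

Y_t_D = {T[(d, rho_t[d])] for d in D}                 # realized outcome-set Y_t(D)
O_acc_t = {z for z, e in phi.items() if e in eta_t}   # preimage phi^{-1}(eta_t)

print("Y_t(D)  =", Y_t_D)
print("O_acc,t =", O_acc_t)
print("successful regulation:", Y_t_D <= O_acc_t)     # subset test
```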

Goal revision with fixed table and time-indexed acceptable outcomes

We consider a regulator acting against disturbances within a fixed system structure. The aim is to model the simple case in which the structure of the system does not change from time t1 to time t2, but the criterion of success does.

1. Fixed outcome table

Let the Table of Outcomes be the mapping

T : D × R → Z

where:

  • D is the set of possible disturbances,
  • R is the fixed set of available regulatory responses,
  • Z is the set of possible outcome-values that can occur in the cells of the table.

The crucial assumption is that, from time t1 to time t2, the system structure does not change. Therefore the table T does not change either. The same disturbance-response pairs still lead to the same outcome-values. The values in the cells of T remain available throughout.

2. Essential variables and acceptability

Let

φ : Z → E

map each outcome-value to its corresponding essential-variable value. At each time t, let

ηt ⊆ E

be the acceptable region in essential-variable space at that time. Then the corresponding acceptable subset of outcome-values is

Oacc,t := φ⁻¹(ηt) = { z ∈ Z : φ(z) ∈ ηt }.

Thus Oacc,t is the set of all outcome-values that are acceptable at time t. We are therefore tracking acceptability at the level of outcome-values in Z, via the pullback along φ. Acceptability-status is time-indexed, while outcome-identity in Z is not.

3. What changes and what does not

When the goal changes from t1 to t2:

  • the table T remains fixed,
  • the outcome space Z remains fixed,
  • the mapping φ remains fixed,
  • but the acceptable region changes from ηt1 to ηt2,
  • and therefore the acceptable outcome-set changes from Oacc,t1 to Oacc,t2.

So the change is not that outcome-values disappear from the table. The change is that some fixed outcome-values in Z may cease to be acceptable, while others may become acceptable. That is why it is correct to time-index Oacc,t.

4. Deletion, addition, and substitution

With this interpretation, the notions of deletion, addition, and substitution are legitimate, provided they are understood as operations on the acceptable subset Oacc,t, not on the table T itself.

Deletion

An outcome-value z is deleted from the acceptable set between t1 and t2 iff

z ∈ Oacc,t1 and z ∉ Oacc,t2.

This does not mean that z disappears from the table. It means only that z, though still a possible outcome-value in T, is no longer acceptable under the revised goal.

Addition

An outcome-value z is added to the acceptable set between t1 and t2 iff

z ∉ Oacc,t1 and z ∈ Oacc,t2.

Substitution

In the minimal set-theoretic sense, a substitution occurs when one acceptable outcome loses acceptability and another gains acceptability between t1 and t2. That is, there exist b, c ∈ Z such that

b ∈ Oacc,t1 \ Oacc,t2,   c ∈ Oacc,t2 \ Oacc,t1.

Then one may say that b is removed from the acceptable set and c is admitted into it. This expresses substitution as coexistence of loss and gain; a stronger one-for-one replacement notion would require an additional correspondence between lost and gained elements.

5. Persistent, lost, and gained acceptability

It is useful to decompose the transition from t1 to t2 into three parts.

The persistently acceptable outcomes are

P12 := Oacc,t1 ∩ Oacc,t2.

The outcomes that lose acceptability are

L12 := Oacc,t1 \ Oacc,t2.

The outcomes that gain acceptability are

G12 := Oacc,t2 \ Oacc,t1.

Then

Oacc,t2 = ( Oacc,t1 \ L12 ) ∪ G12.

This equation states precisely how the acceptable outcome-set changes over time.
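
A small Python sketch (with hypothetical outcome-value names) shows that this decomposition is just elementary set arithmetic on the acceptable sets.

```python
# Hypothetical acceptable outcome-sets at two times; the decomposition
# O_acc,t2 = (O_acc,t1 \ L12) | G12 is verified with plain set operations.
O_acc_t1 = {"z1", "z2", "z3"}
O_acc_t2 = {"z2", "z3", "z4"}

P12 = O_acc_t1 & O_acc_t2      # persistently acceptable
L12 = O_acc_t1 - O_acc_t2      # lost acceptability ("deleted")
G12 = O_acc_t2 - O_acc_t1      # gained acceptability ("added")

assert O_acc_t2 == (O_acc_t1 - L12) | G12
print(P12, L12, G12)           # {'z2', 'z3'} {'z1'} {'z4'}
```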

6. Realized outcomes under a response rule

Now let the regulator use a response rule at time t:

ρt : D → R.

Since T is fixed, changing the regulator's behaviour means changing the response rule ρt, not changing the table. The set of outcomes realizable under that rule across the disturbance set is

Yt(D) := { T(d, ρt(d)) : d ∈ D } ⊆ Z.

This is different from Oacc,t:

  • Oacc,t is the set of outcomes that would count as acceptable at time t,
  • Yt(D) is the set of outcomes actually realizable under the regulator's current policy across disturbances.

This set-based representation tracks which outcome-values can occur under the policy, but it does not track multiplicity, frequency, or probability. Its role here is to support a universal success condition over the disturbance set.

7. Successful regulation at time t

Regulation is successful at time t iff all outcomes realizable under the current rule lie in the acceptable outcome-set at that time:

Yt(D) ⊆ Oacc,t.

Equivalently,

φ(Yt(D)) ⊆ ηt.

In pointwise form, this is

∀ d ∈ D:   φ( T(d, ρt(d)) ) ∈ ηt.

This expresses the regulatory task exactly: given the current acceptable region, choose responses so that the resulting outcomes remain within it for every disturbance under consideration.

8. Goal revision without structural adaptation

This model describes the simple case in which:

  • the system structure is unchanged,
  • the available response repertoire R is unchanged,
  • the outcome table T is unchanged,
  • but the goal changes, hence the acceptable region changes,
  • and the regulator responds by using a different response rule from the same repertoire.

That is regulation under an externally revised criterion of success, achieved within a fixed repertoire. It is not structural adaptation in the stronger sense, because the regulator does not alter its own structure or acquire a new repertoire. Stronger adaptation enters only when the fixed repertoire is no longer sufficient and the system must reorganize itself.

9. Clean interpretation of deletion and substitution

We can now state the interpretation precisely. When time passes from t1 to t2:

  • values in T are not deleted,
  • values in Z are not deleted,
  • rather, some values are deleted from the acceptable subset Oacc,t,
  • and others may be added to it.

So deletion and substitution are acceptable notions, provided they are understood as applying to acceptability-status, not to existence in the fixed outcome table. That is the key distinction.

10. Compact proposition

The whole formulation can be summarized as follows.

Proposition. Let T : D × R → Z and φ : Z → E be fixed. For each time t, let ηt ⊆ E be the acceptable region and let Oacc,t = φ⁻¹(ηt) ⊆ Z. Let ρt : D → R be the regulator's response rule at time t, and let

Yt(D) = { T(d, ρt(d)) : d ∈ D }.

Then successful regulation at time t is the condition

Yt(D) ⊆ Oacc,t.

If the acceptable region changes from ηt1 to ηt2 while T remains fixed, then the induced change in acceptable outcomes is

Oacc,t2 = ( Oacc,t1 \ L12 ) ∪ G12,

where

L12 = Oacc,t1 \ Oacc,t2,   G12 = Oacc,t2 \ Oacc,t1.

Thus deletion and substitution under goal revision are operations on the acceptable subset of the fixed outcome space Z, not on the fixed table T itself.

Information-Theoretic Formulation

All acts of regulation can be related to the concepts of communication theory by noticing that the “disturbances” correspond to noise, and the “goal” is a message of zero entropy, because the target value E is constant. Thus, the law of Requisite Variety says that R's capacity as a regulator cannot exceed R's capacity as a channel of communication.

Variety may be measured by the logarithm of the number of distinguishable states. If the logarithm is taken to base 2, the unit is the bit of information. Applying this to Ashby's Law we get:

log₂( V(D) / V(R) ) = log₂ V(D) − log₂ V(R)

Ashby's law of requisite variety can be expressed in Shannon-style notation by representing the relevant environmental distinctions and the system distinctions as random variables. In this formulation, the environment is not the whole world indiscriminately, but only the set of environmental behaviors that require distinct responses from the system[56].

In practice, we use the Shannon information entropy, denoted by H. For a quantifiable variable, entropy serves as a measure of dispersion. If we assume equiprobable disturbances/responses and treat variety as cardinality, then H reduces to log₂ V, yielding:

H(E) ≥ H(D) − H(R)

under the assumption of equiprobable (uniform) prior probabilities.
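
As a minimal numeric illustration under the equiprobable assumption, with hypothetical counts of disturbances and responses, the bound can be evaluated directly in Python:

```python
import math

# Under equiprobable disturbances and responses, entropy reduces to log2 of the count.
# Hypothetical counts:
V_D, V_R = 16, 8
H_D, H_R = math.log2(V_D), math.log2(V_R)

residual_lower_bound = H_D - H_R                      # bits of outcome variety that must remain
print(f"H(E) >= {residual_lower_bound:.1f} bit(s)")   # H(E) >= 1.0 bit(s)
```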

Ashby's Law can be interpreted as a cybernetic analogue of Shannon's "Noisy channel coding theorem" which states that communication through a channel that is corrupted by noise may be restored by adding a correction channel with a capacity equal to or larger than the noise corrupting that channel. The disturbance D, which threatens to get through to the outcome E, clearly corresponds to the noise; and the correction channel is the system R, which is supposed to restore the outcome E[8].

Ashby's law can thus be reformulated clearly:

The information-processing capacity (entropy) of a control system must be at least as large as the information (entropy) in the system it regulates.

It has been shown that the law of requisite variety can be extended to include knowledge or ignorance by simply adding a conditional uncertainty term[31]. When buffering is present, part of the environmental variety is absorbed passively before it reaches the regulator. This reduces the effective disturbance entropy by an amount K, which is the buffering capacity:

H(E) ≥ H(D) + H(R|D) − H(R) − K

Where:

  • H(E) is the residual variety, i.e. the entropy of the realized essential-variable distribution, not the size of the set E.
  • H(R) is the entropy of the regulator, representing its information-processing capacity.
  • H(D) is the entropy of the disturbances, representing the complexity of the environment.
  • H(R|D) is the conditional entropy of the regulator given disturbances, representing the lack of requisite knowledge i.e. the ignorance of the regulator about how to react correctly to each appearance of a disturbance D. Only a regulator that knows how to use the available regulatory variety H(R) to react correctly to each disturbance D will reach the optimal result of regulation[31],
  • K is the buffering capacity measured in bits of disturbance variety absorbed before reaching the regulator. Buffering is the passive absorption or damping of disturbances i.e. the amount of noise that a system can absorb without requiring an active regulatory response.

A necessary condition for effective control is that the regulator have enough effective capacity to offset disturbance variety up to the buffering term: H(R) ≥ H(D) + H(R|D) − K. But this condition alone is not sufficient. In the general case it gives only a lower bound on the residual outcome variety. Near-optimal regulation additionally requires that the regulator's ignorance about which act to use be negligible, H(R|D) ≈ 0, and that the other idealizing assumptions of the Ashby–Aulin setup hold, in particular that the regulator can use its available acts appropriately for the realized disturbance and that no extra regulator–outcome interaction term is degrading performance.

Successful (essential) outcomes E do not depend solely on the variety of responses H(R) available to a regulator R; the system must also know which response to select for a given disturbance. Effective compensation of disturbances requires that the system possess the ability to map each disturbance to an appropriate response from its repertoire. The absence or incompleteness of such knowledge can be quantified using the conditional entropy H(R|D)[31]. In other words, H(R|D) measures how much the regulator R lacks the requisite knowledge to match responses to disturbances. In the absence of such requisite knowledge, the system would have to keep selecting responses until all disturbances are eliminated. Thus, merely increasing the response variety H(R) is not sufficient; it must be complemented by a corresponding increase in selectivity, that is, a reduction in H(R|D), i.e. an increase in knowledge. H(R|D) = 0 represents the case of no uncertainty or complete knowledge, where the action is completely determined by the disturbance. This requirement may be called the law of requisite knowledge[29].

H(R|D) reminds us that response alone is not sufficient: if the regulator does not know which response is appropriate for the given disturbance, it can only try out regulatory actions at random, in the hope that one of them will be effective and that none of them will make the situation worse. The larger the H(R|D), the larger the probability that the regulator will choose a wrong regulatory response, and thus fail to reduce the variety in the outcomes H(E). Therefore, this term H(R|D) has a “+” sign in the inequality: more uncertainty (less knowledge) produces more variation in the essential variables E[54].

In the ideal best-opportunity case of regulation, effective control requires both sufficient regulatory variety and negligible uncertainty about how to use it for the disturbance encountered. In that limiting case, the regulator has enough available response variety to counter the relevant disturbance variety, H(R) ≥ H(D), and the residual ignorance about which response to apply is driven to zero, H(R|D) → 0, so that the regulator can use its available acts optimally against the disturbance[7].

In other words, under these idealizing assumptions, the regulator has enough effective variety to match the disturbance distinctions that matter for control, and it knows how to deploy that variety appropriately. This should be read as the optimal-control limit, not as a blanket claim that every control problem is solved whenever H(R) ≥ H(D) and H(R|D) = 0 hold in isolation.

Regulator's learned law of action

The regulator's accumulated structure M is its learned law of action, i.e. its prior knowledge. In the deterministic case, M : D → R. In general (under learning or uncertainty), M is treated as a policy P(R | D), i.e. a specification of how probability mass is allocated across responses for each disturbance.

If the Table of Outcomes T is the fixed space of possibilities, then M is the mechanism that induces a measure over possible paths through that space. Across time, the object M is not fixed. Changes in M alter which disturbance-action pairs become more or less probable and therefore how the system's realized trajectories are distributed over the Table of Outcomes T. The probability mass is shifted away from previously selected or ineffective responses and reassigned to alternative responses for the same disturbance.

Since H(R) − H(R|D) = I(R:D), the law simplifies to: H(E) ≥ H(D) − I(R:D) − K

The mutual information I(R:D) represents the requisite knowledge of the regulator R about how to react correctly to each disturbance D, i.e. the amount of regulatory variety that is effectively correlated with and therefore absorbs the variety in the disturbances. Such knowledge may be realized structurally as the regulator's learned law of action M represented by a mapping M: D → R , by which disturbances are mapped to regulatory responses[54]. The mutual information I(R:D) quantifies how much of the regulator's learned law of action M: D → R effectively couples disturbances to responses, while the remaining uncertainty H(R|D) quantifies the lack of requisite knowledge[29].
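
The following Python sketch illustrates these quantities for a hypothetical disturbance distribution P(D) and a hypothetical policy P(R|D); it computes H(R|D), I(R;D), and the resulting lower bound on the residual variety H(E), assuming for simplicity that the buffering capacity K is zero.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example: two equiprobable disturbances and a stochastic policy P(R|D),
# i.e. the regulator's law of action.
P_D = {"d1": 0.5, "d2": 0.5}
P_R_given_D = {
    "d1": {"r1": 0.9, "r2": 0.1},
    "d2": {"r1": 0.2, "r2": 0.8},
}

# Joint and marginal distributions.
P_joint = {(d, r): P_D[d] * p for d, row in P_R_given_D.items() for r, p in row.items()}
P_R = {}
for (d, r), p in P_joint.items():
    P_R[r] = P_R.get(r, 0.0) + p

H_D = entropy(P_D.values())
H_R = entropy(P_R.values())
H_R_given_D = sum(P_D[d] * entropy(P_R_given_D[d].values()) for d in P_D)
I_RD = H_R - H_R_given_D            # stored requisite knowledge (mutual information)

K = 0.0                             # assume no passive buffering in this toy case
residual_lower_bound = H_D - I_RD - K
print(f"H(R|D) = {H_R_given_D:.3f} bits, I(R;D) = {I_RD:.3f} bits")
print(f"H(E) >= {residual_lower_bound:.3f} bits")
```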

Core challenges in applying Ashby's Law to real systems

We conducted a literature review aimed at identifying the primary challenges and limitations associated with applying Ashby's Law in real-world systems.

A central challenge that emerges is the measurement of variety. In most of the reviewed literature, the concept of variety is either poorly defined or not explicitly measured, resulting in ambiguity and potential misinterpretation of the law's implications. Key obstacles to effective measurement include:

  • The direct measurement of variety is fundamentally incomputable for all but the simplest systems [14].
  • Hidden variables introduce uncertainty and complicate measurement efforts [15].
  • Trade-offs often arise between variety at different scales [16].
  • A combinatorial explosion occurs when attempting to enumerate all possible system states [15,16].
  • Resource limitations constrain the feasibility of comprehensive measurement [20].
  • Environmental complexity is frequently “unknowable,” preventing complete assessment [25].
  • Most studies lack explicit or standardized methods for quantifying variety [14,17-20,25,27].
  • Existing approaches often lack rigorous quantitative validation [17].

Several measurement methods have been proposed, including:

  • Markup language-based variety estimation [18],
  • Iterative sampling techniques [21],
  • Entropy and determinism metrics to evaluate communication complexity, where greater variety was correlated with improved effectiveness [22],
  • Social network and cluster analysis to assess resilience [23], and
  • Multiple Correspondence Analysis (MCA) for capturing organizational complexity [24].

In addition, a subset of studies estimate variety through observed performance rather than structural attributes. Notable examples include:

  • Communication-based performance measures, employing determinism metrics to evaluate repeatable patterns in team behavior [22];
  • Team performance assessments, using task-based surveys to evaluate an organization's risk-handling capabilities [23];
  • Leadership behavior analysis, based on actual behavioral responses to simulated scenarios [26]; and
  • Relative performance comparisons, assessing organizational effectiveness across contexts using perception-based rather than absolute metrics [14].

While these performance-based approaches provide practical insights, they often rely on subjective or indirect indicators of variety, which may introduce biases and limit their generalizability. For example, performance outcomes may fail to account for hidden variables or the underlying complexity of the system [15]. Moreover, these approaches remain underrepresented in the literature, where structural and theoretical analyses still dominate.

In summary, although numerous methods for measuring variety have been proposed, no single comprehensive or universally accepted solution has emerged. Quantification remains a persistent challenge in the application of Ashby's Law to complex real-world systems.

Solution

These challenges significantly hinder the practical application of Ashby's Law. Whether considering a human, an AI model, or an organization, we are typically limited to observing external behavior rather than internal mechanisms—unless we are able to "open the box."

Ashby himself emphasized that all real systems can be considered black boxes. He argued that while black boxes mimic the behavior of real objects, in practice, real objects are black boxes: we have always interacted with systems whose internal workings are, to some extent, unknown.

This leads to what Ashby termed the black box identification approach [2], which involves:

  1. Perturbing the system by applying external disturbances,
  2. Measuring the system's responses to these perturbations, and
  3. Inferring the internal variety or capacity from the observed input-outcome relationships.

In most practical scenarios, we are only able to observe the outcomes of a system. These observable outcomes can be used to infer bounds on the system's internal variety—specifically, the extent of variety it must possess or lack in order to exhibit the observed behavior.

We propose such an approach: to treat the system as a black box, observe the probability of successful outcomes to disturbances, and estimate the gap in its internal variety based on that. Let E denote the event that the system produces a successful outcome in response to disturbance D, and let R be the regulator's action. In information-theoretic terms, perfect regulation requires H(R|D) = 0[31]. Using our novel information-theoretic estimator, empirical estimates of P(E=1) are used to quantify H(R|D) in bits of information. This quantifies how much disorder the regulator still permits and, conversely, how close the system comes to meeting Ashby's requisite-variety condition.

Knowledge Discovery Process

The process of selection may be either more or less spread out in time. In particular, it may take place in discrete stages. What is fundamental quantitatively is that the overall selection achieved cannot be more than the sum (if measured logarithmically) of the separate selections. (Selection is measured by the fall in variety.) 13/17[2]

Ashby's selection in design and regulation via requisite variety are structurally identical: they describe how constraints (or regulation) reduce the variety of possible outcomes from an initial space. In Ashby, constraints, tests, feedback, rules, and observations are all selection mechanisms that reduce variety.

Ashby[13/15 [2]] measures the amount of selection in bits as:

σ = log₂( Vbefore / Vafter )

Where
  • σ is the amount of selection (the amount by which the variety is reduced), or equivalently the information gained, i.e. how much the uncertainty has been reduced,
  • Vbefore is the variety before the selection, i.e. before a constraint (filter, decision, control action) is applied, and
  • Vafter is the variety after the selection, i.e. after the constraint is applied.

From here on, we treat "variety in bits" as Shannon entropy i.e., using the distribution over possible outcomes. If possible outcomes are equiprobable, this reduces to Ashby's log2|V| counting form.

Thus every time we introduce a rule or a constraint we throw away some of the possibilities and gain information equal to the logarithm of that reduction:

  • “By what factor has the set of possibilities been reduced?” That is Vbefore / Vafter.
  • “How many bits of information does this represent?” That is σ = log₂(Vbefore / Vafter).

Rather than a single act, selection is often a multi-stage process consisting of k successive selections from a range of possibilities[2][4]. Each selection stage reduces the set of admissible alternatives, progressively transforming an initial space of variety into a more constrained set of alternatives with the goal to produce an acceptable outcome. Formally, this process can be understood as a sequence of k uncertainty-reducing operations, where each selection narrows the possibility space and thereby decreases entropy. Mathematically we have:

V₀ →[σ₁] V₁ →[σ₂] V₂ →[σ₃] … →[σₖ] Vₖ

We denote the amount of selection achieved at each stage i as σ1, σ2, …, σk. The total selection is the sum of the amounts of selection achieved at the individual stages, because logarithms turn multiplications of ratios into additions:

σtotal = σ1 + σ2 + … + σk

where k is the number of selections preceding an acceptable outcome.

The total process therefore consists of k such selections, each conditioned on the result of prior selections. Reductions add only for nested/refining partitions (each stage refines the previous stage's partition of possibilities). If two stages constrain the same dimension in overlapping ways, we must count the second stage's reduction relative to the first, not from the original variety. The number of selections k thus characterizes the depth of the selection process and corresponds to the number of distinct uncertainty-reducing decisions required to reach the final state.
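
A short Python sketch (with a hypothetical, strictly nested sequence of varieties) illustrates that the per-stage selections, measured in bits, add up to the total selection log₂(V₀/Vₖ):

```python
import math

# Hypothetical nested selection: the variety shrinks 64 -> 16 -> 4 -> 1 over k = 3 stages.
varieties = [64, 16, 4, 1]                     # V0, V1, ..., Vk

stage_bits = [math.log2(varieties[i] / varieties[i + 1]) for i in range(len(varieties) - 1)]
total_bits = sum(stage_bits)

print("per-stage selection:", stage_bits)      # [2.0, 2.0, 2.0]
print("total selection    :", total_bits)      # 6.0 == log2(V0 / Vk)
assert math.isclose(total_bits, math.log2(varieties[0] / varieties[-1]))
```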

We refer to the multi-stage process of narrowing down and selecting the response from its set of alternative responses to produce an acceptable outcome as a Knowledge Discovery Process.

We can say that we've got "it from bit" - a phrase coined by John Wheeler. "It from bit" symbolizes the idea that every item in the physical world has knowledge as an immaterial source and explanation at its core[6].

Information-Theoretic Formulation of Staged Selection

Aulin-Ahmavaara and Heylighen characterize the ignorance term H(R|D) at the level of disturbance, regulator, action, appropriate response, and quality of regulation. They describe it as uncertainty about how to react correctly to a disturbance and how to use the available regulatory acts appropriately or optimally. However, their prose does not fully fix the granularity of the response variable. It leaves open whether the response denotes: (i) an equivalence class of acts that achieve the same acceptable outcome, (ii) the exact concrete response emitted, or (iii) the optimal response among several acceptable ones.

In the present article, we operationalize that idea at the level of the success-relevant response class to be fixed. Accordingly, X denotes the random variable ranging over success-relevant response classes for the observed disturbance Y. A realized value x of X is the particular success-relevant response class selected by the regulator in an episode. Concrete responses are treated as belonging to the same response class whenever they are equivalent at the acceptable-outcome / goal layer for the task class under study.

Under this modeling choice, H(X|Y) is the residual uncertainty about which success-relevant response class X must be selected to achieve an acceptable outcome, given the observed disturbance Y. This is an operationalization of lack of requisite knowledge, not a claim that the source texts uniquely force this exact response granularity.

Illustrative example. Suppose the disturbance is “pay $10,” and both “use one $10 bill” and “use ten $1 bills” are equally acceptable responses from the valuation layer's point of view. If both bill combinations are in the same success-relevant class, then they should not count as different X-values. In that case the regulator may still spend time selecting one concrete act, and there may still be uncertainty about which concrete response will be fixed, even if there is no uncertainty about whether payment can succeed. That residual uncertainty is not necessarily ignorance in Aulin's strong sense. It may just be unresolved choice among equally acceptable responses. In Ashby's own setup, acceptability is defined by the goal relation between disturbance and response yielding an acceptable outcome; multiple different responses can belong to that acceptable set. If the two bill combinations are in different classes because the environment cares about speed, convention, change preservation, or policy, then they can still be distinct X-values. In that case, uncertainty over which bill combination to use is genuine lack of requisite knowledge about how to use the available acts optimally.

Following the information-and-selection reading of Ashby's law, which says that the amount of rational selection is limited by the information available[57], we introduce a stochastic formulation, where X is the response to be fixed, Y is the observed disturbance, and Zi is the information-bearing signal available at stage i. The correspondences below align Ashby's language of regulation with Shannon's information-theoretic quantities by showing that both describe the same process: the progressive reduction of uncertainty about which action will succeed.

Ashby term (symbol) ↔ Shannon / information-theoretic term (symbol):

  • Disturbance (observed at decision time), D ↔ conditioning variable (given side-information), Y.
  • Regulator response selected, R ↔ the success-relevant response class that will be committed for the observed disturbance Y, X.
  • Requisite knowledge of the regulator R about how to react correctly to each disturbance D, I(R:D) ↔ stored requisite coupling between the regulator's responses X and the disturbance Y, I(X:Y).
  • Lack of requisite knowledge of the regulator about which response will produce an acceptable outcome given a disturbance, H(R|D) ↔ the regulator's residual uncertainty about which success-relevant response class X will be fixed, given the disturbance Y, H(X|Y).
  • Selection signals (tests, observations, feedback, constraints, rules, partial executions), Z1, Z2, …, Zk ↔ auxiliary information sources that reduce uncertainty about which response X is acceptable for a given Y, Z1, Z2, …, Zk.
  • Residual variety after the i-th selection stage, V(R | D, Z1, …, Zi) ↔ conditional entropy remaining after the i-th selection stage, H(X | Y, Z1, …, Zi).
  • Selection achieved at stage i (reduction in variety due to one constraint), log₂(Vbefore / Vafter,i) ↔ conditional mutual information acquired at stage i, I(X ; Zi | Y, Z<i).
  • Residual variety after k selection stages, V(R | D, Z1, …, Zk) ↔ conditional entropy of the regulator's selected success-relevant response class, given the observed disturbance and k selection signals, H(X | Y, Z1, …, Zk).
  • Total selection achieved (successful adaptation), log₂(Vbefore / Vafter) ↔ total mutual information acquired through all selection stages, I(X ; Z1, …, Zk | Y).

In that formulation, when mapping one stage of selection to one selection signal Zi, the expected selection achieved at stage i is represented by conditional mutual information:

σi = H(X | Y, Z<i) − H(X | Y, Z≤i) = I(X ; Zi | Y, Z<i)

The selection Ashby counts in bits at stage i is the conditional mutual information I(X ; Zi | Y, Z<i), and the expected remaining uncertainty is H(X | Y, Z≤i). In the stochastic restatement of Ashby's staged selection, the amount of selection achieved at each stage is represented in expectation by the conditional mutual information contributed by that stage's signal. Thus Ashby's staged selection is not replaced by mutual information; rather, it is re-expressed probabilistically as expected uncertainty reduction under successive information-bearing constraints. This is a modeling restatement of Ashby's staged-selection idea, equivalent to it in expectation, not a claim that Ashby originally formulated conditional mutual information.

As selections accumulate, the remaining uncertainty H(X | Y, Z1, …, Zi) shrinks. The total amount of selection Ashby describes corresponds in expectation to the total mutual information accumulated across stages.

σtotal = H(X | Y) − H(X | Y, Z1, …, Zk) = I(X ; Z1, …, Zk | Y)

Shannon's chain rule for mutual information is:

I(X ; Z1, …, Zk | Y) = I(X ; Z1 | Y) [stage 1] + I(X ; Z2 | Y, Z1) [stage 2] + … + I(X ; Zk | Y, Z<k) [stage k]

Those k summands are exactly the k amounts of selection Ashby would add up when he says “the total selection is the sum of the separate selections.”

If stages share information or impose overlapping constraints, summing their marginal “reductions” overcounts. The correct decomposition credits each stage only for the reduction of residual uncertainty left by previous stages. Therefore the per-stage contributions must be conditional (incremental) to avoid double counting, because stages may share information or constrain overlapping parts of X. The correct staged accounting is the mutual-information chain rule over the selection signals Z:

I(X ; Z1, …, Zk | Y) = Σᵢ₌₁ᵏ I(X ; Zi | Y, Z<i)

i.e., each stage is credited only for the information it contributes beyond what earlier stages already provided.

After k selections we have:

H(X | Y, Z1, …, Zk) = H(X | Y) [initial variety] − I(X ; Z1, …, Zk | Y) [total selection]

The principle is independent of how each rule or constraint is expressed; all that matters is the fraction of the search-space that each selection stage discards. But whatever mix we choose, the grand total must still cover H(X|Y) if we want the essential variable E to reach (or stay at) zero entropy with respect to its target value.
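
The following Python sketch illustrates the chain-rule accounting on a hypothetical episode in which the disturbance Y is already fixed, X is uniform over four success-relevant response classes, and two selection signals each reveal one bit; the per-stage conditional mutual informations sum to the initial H(X|Y).

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a {value: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint, target_idx, given_idx):
    """H(target | given) computed from a joint distribution over tuples."""
    by_given = defaultdict(lambda: defaultdict(float))
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given_idx)
        t = tuple(outcome[i] for i in target_idx)
        by_given[g][t] += p
    h = 0.0
    for cond in by_given.values():
        pg = sum(cond.values())
        h += pg * entropy({t: p / pg for t, p in cond.items()})
    return h

# Hypothetical episode with the disturbance Y already fixed: X is uniform over four
# success-relevant response classes; Z1 reveals the "high bit" and Z2 the "low bit".
joint = {}
for x in range(4):
    z1, z2 = x >> 1, x & 1
    joint[(x, z1, z2)] = 0.25               # P(X=x, Z1=z1, Z2=z2 | Y=y)

H_X    = conditional_entropy(joint, target_idx=[0], given_idx=[])      # H(X|Y)
H_X_1  = conditional_entropy(joint, target_idx=[0], given_idx=[1])     # H(X|Y,Z1)
H_X_12 = conditional_entropy(joint, target_idx=[0], given_idx=[1, 2])  # H(X|Y,Z1,Z2)

sigma_1 = H_X - H_X_1                       # I(X;Z1|Y)
sigma_2 = H_X_1 - H_X_12                    # I(X;Z2|Y,Z1)

print(f"stage 1: {sigma_1:.1f} bits, stage 2: {sigma_2:.1f} bits")
print(f"total selection {sigma_1 + sigma_2:.1f} bits == initial H(X|Y) {H_X:.1f} bits")
```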

Episodes and windows

An episode is a staged selection process ending with an acceptable outcome. It is characterized by a basic structure:

  • An observed disturbance realization Y = y : During the episode, y is treated as fixed and fully observed. The within-episode uncertainty is not about which disturbance occurred, but about which response must be selected.
  • A selected success-relevant response class X.
  • A sequence of k selection signals (Z1, Z2, …, Zk).
  • An outcome o ∈ O is an externally visible episode-closing commitment to an entry value in the Table of Outcomes (T).

An acceptable closure is an episode-closing commitment whose realized outcome o belongs to the acceptable-outcome set induced by the goal on E, i.e. φ(o) lies within the admissible region of essential-variable states. In Ashby's terms, this is the slice of the broader outcome table that is acceptable relative to the goal criterion, not the whole table. A closure event also records that the regulator has committed some success-relevant response class for that episode, but it does not imply uniqueness of the selected success-relevant response class. Multiple distinct responses may still yield acceptable outcomes. Therefore, an acceptable closure records that the episode ended with a committed success-relevant response class whose realized outcome is acceptable, but it implies H(X|Y) = 0 only under the additional assumption that the regulator has resolved its uncertainty over the success-relevant response class.

Episode closure occurs only after the regulator has already fixed the episode's selected success-relevant response class X . Therefore a counted closure certifies that the search for that episode has terminated with a finalized response selection.

A counted closure means that the episode has ended with an acceptable closure under the operational ledger. Thus every realized episode contributes exactly one realized closure event.

A window is an externally chosen observation interval, which contains consecutive non-overlapping episodes. It should be understood in two simultaneously valid ways. First, it contains a time-ordered history of realized closure events. Second, at any evaluation time t, that history induces a current ledger state consisting of those closures that still survive as operative accepted closures. These are not competing views. They are two mathematically linked aspects of the same process. The first records everything that happened. The second records what still counts.

Regulation (within-episode)

Following Ashby's idea that selection may be distributed across stages and that total selection is the sum of separate selections, we model within-episode response fixation as a staged uncertainty-reduction process. In this model, complete response fixation occurs when the regulator has resolved uncertainty over the success-relevant response class: H(X | Y, Z1, …, Zk) = 0, or equivalently I(X ; Z1, …, Zk | Y) = H(X | Y). Equivalently again, the sum of the bits removed by selection at every stage must at least equal the bits of uncertainty injected by the original range of possibilities or by disturbances. This is a specialization of Ashby's staged-selection idea to the response-fixation model, not a restatement of the general law of requisite variety.

A disturbance Y is fixed and fully observed at episode start and is treated as given. All remaining uncertainty is only about which response class X will be selected; subsequent Z is evidence about X, not new disturbance information. This initial uncertainty is the lack of requisite knowledge, measured as the initial conditional entropy H(X|Y). The system then applies a sequence Z1, Z2, …, Zk of constraints, tests, feedback signals, or rules, each of which removes some possibilities and therefore reduces the lack of requisite knowledge. In Ashby's terms this is “selection”; in Shannon's terms each stage i contributes conditional mutual information I(X ; Zi | Y, Z<i). “Non-outcome time” is entirely spent on discriminating information about X for the current episode, and outcome emissions are atomic and do not hide extra selection. The “episode closes with an acceptable outcome” event occurs only after the regulator has already fixed the success-relevant response class X; that closure event therefore certifies that the episode's X has been identified.

Regulation uses staged evidence Z1:k to eliminate uncertainty within an episode t, where k is the number of selection signals within the episode t. During an episode, staged selection supplies Z1:k until the response class X is effectively determined:

H(X | Y, Z1:k) ≈ 0   ⟺   I(X ; Z1:k | Y) = H(X | Y) − H(X | Y, Z1:k) ≈ H(X | Y)

This is “regulation”: success happens because H(X|Y,Z=z) becomes small after receiving a particular selection signal z.

Regulation can use within-episode evidence Z to determine the response class X for a given disturbance Y during an episode t. Learning makes that success persistent by updating the stored structural coupling (mapping) M, so that future episodes start with less residual uncertainty about which response will be selected for the same class of disturbances.

Learning (across episodes in a window)

Let Mt denote the regulator's internal stored structural coupling (mapping) at the start of episode t. In Ashby's terms, M is not a separate object but the regulator's law of action, i.e. the learned functional relation by which disturbances are mapped to responses to produce acceptable outcomes[54]. By contrast, the Table of Outcomes remains the fixed environmental relation T : Y × X → O, which specifies what outcome would result from each disturbance-response pair. Learning does not alter that table. Learning alters the regulator's stored coupling Mt, and therefore alters the response law induced at the start of later episodes.

In the across-episode analysis that follows, we treat X as the same success-relevant response class variable for the same task class across episodes: its response alphabet and coding granularity are held fixed. Thus, although the stored structural coupling Mt may be updated from one episode to the next, the semantic definition of X is unchanged across the comparison.

For episode t, the stored coupling Mt induces a conditional response law Pt(X | Y) := P(X | Y ; Mt).

For each episode t in a window, Mt is treated as a parameter (not a random variable) fixed at the start of that episode. Therefore we do not write entropies or mutual informations “conditioned on Mt”. Instead, we write the information-theoretic quantities induced by the distribution generated from the mapping Mt:

Ht := Ht(X|Y) := HPt(X|Y),   It := It(X;Y) := IPt(X;Y)

Here It is the stored requisite coupling: the amount of response-selection structure that is already aligned with the disturbance distinctions relevant to the task. Dually, Ht is the residual lack of requisite knowledge at episode start: after the disturbance is known, it measures how uncertain the regulator still is about which response it will select under its current stored structure.

What changes across episodes is not the semantic definition of the task, but the regulator's stored coupling for that task. Thus, the learning question is: after one episode's within-episode discoveries have been incorporated into Mt, does the next comparable episode begin with stronger stored coupling Mt+1 and less residual ignorance than before?

For the specific purpose of reading changes in stored requisite knowledge and changes in lack of requisite knowledge as exact duals, we impose the additional comparability assumption that the marginal response entropy is invariant across the comparison:

Ht(X) = Ht+1(X)

Under this assumption, the standard identity It(X;Y) = Ht(X) − Ht(X|Y) implies

ΔtI = It+1(X;Y) − It(X;Y) = −( Ht+1(X|Y) − Ht(X|Y) )

Therefore, within this across-episode comparison regime, an increase in stored requisite knowledge is exactly equivalent to an equal decrease in lack of requisite knowledge: ΔIt = −ΔHt(X|Y)

If this comparability assumption is relaxed, then ΔIt and ΔHt(X|Y) need not coincide, because part of the change in mutual information may come from drift in the marginal entropy Ht(X), not only from the conditional ignorance term Ht(X|Y).

We do not relax this comparability assumption in what follows.
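
A minimal numeric check of this duality, using hypothetical symmetric policies constructed so that the marginal entropy H(X) stays at one bit across the two episodes, can be written as:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical symmetric policies across two episodes: Y is uniform over two disturbance
# classes and X is binary, so the marginal entropy H(X) = 1 bit in both episodes
# (the comparability assumption holds by construction).
accuracy_t, accuracy_t1 = 0.7, 0.9     # probability of selecting the matching response class

H_X = 1.0
H_cond_t, H_cond_t1 = h2(accuracy_t), h2(accuracy_t1)     # H_t(X|Y), H_{t+1}(X|Y)
I_t, I_t1 = H_X - H_cond_t, H_X - H_cond_t1               # I_t(X;Y), I_{t+1}(X;Y)

delta_I = I_t1 - I_t
delta_H = H_cond_t1 - H_cond_t
print(f"delta I = {delta_I:.3f} bits, -delta H(X|Y) = {-delta_H:.3f} bits")
assert math.isclose(delta_I, -delta_H)
```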

Learning Axiom (Structural Knowledge Accumulation). A system is said to learn, in the structural-coupling sense, if and only if the within-episode evidence stream Zt = (Zt,1, …, Zt,kt) is incorporated into an updated mapping Mt+1 = Update(Mt, Zt), such that for subsequent encounters with the same class of disturbances Y, the stored requisite coupling increases:

It+1(X;Y) > It(X;Y)

Under the comparability assumption stated above, this is equivalently expressed as a decrease in lack of requisite knowledge:

Ht+1(X|Y) < Ht(X|Y)

In words: learning means that after the update of stored structure, the regulator begins the next comparable episode with stronger disturbance-response coupling Mt+1 and therefore less residual ignorance about which response to select. We do not assume any particular memory mechanism (overwrite, patching, versioning, or full replacement), but only track the effect of the update on the regulator's induced coupling measures.

Strong Learning Assumption (Posterior-Becomes-Prior Rule). A stronger form of learning is obtained when the regulator stores and reuses, without loss, the uncertainty reduction achieved within episode t. For a stable task class, we then assume that the posterior coupling achieved by the end of episode t becomes the prior stored coupling at the start of episode t+1.

Under the ignorance view, this may be written as the following Posterior-Becomes-Prior Rule:

Ht+1(X|Y) := Ht(X|Y, Zt)
Under this assumption, the posterior uncertainty achieved by the end of episode t becomes the prior uncertainty at the start of episode t+1.

Under the dual knowledge view, and under the comparability assumption on Ht(X), this is equivalently:

It+1(X;Y) = Ht(X) - Ht(X|Y, Zt).

This Posterior-Becomes-Prior Rule is a strong additional assumption, not a general consequence of conditioning alone. It requires, at minimum:

  • the same task class or disturbance semantics across episodes,
  • successful retention of the within-episode discoveries,
  • reuse of that stored structure in later episodes,
  • no intervening forgetting or context drift that would invalidate the stored coupling.

Corollary (Complete Adaptation under Strong Learning). Under the Posterior-Becomes-Prior Rule, repeated successful adaptation drives the stored requisite coupling toward its task-class ceiling and drives the residual lack of requisite knowledge toward zero for the task class: It(X;Y) → H(X) and Ht(X|Y) → 0.

In that limit, the response becomes effectively determined by the disturbance for the task class under study: Ht(X|Y) = 0. This is the state of complete adaptation for the task class: no further within-episode discovery is required in order to determine the response.

Clarification. The weak learning axiom is sufficient to define learning. The strong form is only needed because in what follows our model will assume that within-episode discoveries are fully carried forward as next-episode prior structure.
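As a purely illustrative sketch of the Posterior-Becomes-Prior Rule, assume an idealized regulator whose within-episode evidence halves the surviving candidate response set for each disturbance and is retained without loss across episodes; the candidate counts are assumptions, not the paper's mechanism.

```python
# Minimal sketch (idealized assumption, not the paper's mechanism): the
# Posterior-Becomes-Prior Rule for a regulator whose within-episode evidence
# halves the surviving candidate response set and is retained without loss.
import math

candidates = 16                      # uniform prior over responses: H(X|Y) = 4 bits
H_trace = []
for episode in range(6):
    H_trace.append(math.log2(candidates))   # episode-start lack of requisite knowledge
    candidates = max(1, candidates // 2)    # Ht+1(X|Y) := Ht(X|Y, Zt): one bit retained

print(H_trace)   # [4.0, 3.0, 2.0, 1.0, 0.0, 0.0] -> complete adaptation for the task class
```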

Temporal scope of windows and operative outcomes

The present model combines two distinct structures that must be kept separate. First, there is Ashby's fixed two-dimensional Table of Outcomes T : Y × X → O, which specifies what realized outcome would result from each disturbance-response pair. Second, there is the one-dimensional temporal trace of completed episodes observed across time. The table of outcomes is a fixed space of possibilities. The temporal trace records only which episodes actually completed, and in what order.

A window is a bounded temporal slice of that one-dimensional trace. It contains the non-overlapping episodes that complete within the chosen start and end times, ordered by completion time. An episode remains defined as one temporally bounded search-and-fixation process ending in exactly one externally visible closure. Thus windows are temporal objects, while Ashby's table T is not temporal: it is the fixed outcome space within which temporally ordered episodes select and revise realized outcomes.

For a given decision target τ , let the operative outcome at the start of a time interval mean the outcome in O that is currently in force for that target immediately before the next episode acts on it. That operative outcome may have been established earlier in the same window or in prior history before the window began. Accordingly, a current episode may act upon a target whose operative outcome was not produced in the current window, but is nevertheless the currently operative point in the fixed outcome space.

This means that temporal correction is not restricted to outcomes first closed in the same window. A later episode in the current window may revise an already-operative outcome inherited from earlier time. For example, a later review episode may overturn an outcome that was previously accepted and currently remains operative at window start. In that case, the corrective episode occurs in the present window, but the outcome being corrected belongs to the same fixed Table of Outcomes T .

Accordingly, throughout what follows, “final” always means final relative to the currently observed bounded temporal window, not necessarily final for all future time. A closure may survive to the end of one window and still be revised by a later episode in a subsequent window. This does not imply that the earlier window was defined incorrectly. It means only that windows are local temporal observations of an ongoing process of regulation and correction unfolding over time.

Rework

Having distinguished the fixed two-dimensional Table of Outcomes from the one-dimensional temporal trace of completed episodes, we can now define rework precisely.

Rework is any externally observable behavior in which previously produced artifacts are revised, undone, deleted, replaced, or corrected due to new evidence (e.g. failing tests, defects, requirement changes, reversals, rollbacks). Rework is observable through external change signals such as deletions, reversions, churn, reopened items, or corrective follow-up actions. Cybernetically: rework is capacity spent on revising prior episode-closing commitments, which reduces marginal episode-closure per unit capacity because the system revisits and repairs earlier choices instead of closing new episodes.

In Ashby's terms, if appropriate effects appear before the corresponding causes/questions have been fully resolved, one must look for the missing channel that carried the needed information. Ashby explicitly says that apparent overstepping of the limitation leads us to search for the additional communication channel that accounted for it[8]. That is very close to our model: the overrule reveals that the earlier fixing of response was not actually final.

Rework should be understood temporally but not structurally. It is not a change in the structure of Ashby's table T : Y × X → O, nor a change in the valuation mapping v : O → E. What rework changes is which realized outcome, and therefore which induced value, is currently operative for the same target within that fixed outcome space. Thus rework is a temporally later revision of an operative outcome-value assignment inside a fixed two-dimensional space of possible outcomes.

Accordingly, a rework episode is a later non-overlapping episode that fixes a different operative outcome-value assignment for a target that already had an operative outcome in force when that episode began. The prior operative outcome may have been established earlier in the same window or in prior history before the window opened. Thus rework is counted in the window in which the corrective episode occurs, even when the outcome being corrected was already operative at window start. What matters is not where the earlier operative outcome was first produced in time, but that an additional corrective selection had to occur after a prior selection had already fixed the matter operationally. Under the information-selection reading of Ashby's law, this matters because the amount of selection that can be performed is limited by the information available; hence each rework episode is evidence of additional selection burden, and therefore of additional informational burden, beyond the initial fixing of response.

Illustrative example. Consider a football game. In Episode 1, the referee performs a staged selection process and closes the episode with response x1 to disturbance y1, yielding realized outcome o1 = T(y1, x1) with value v(o1) = goal allowed. Now suppose that in Episode 2, VAR reviews the same target τ. The earlier outcome o1 is already operative for that target when the new episode begins. VAR then performs an additional staged selection process, selects response x2, and closes with outcome o2 = T(y2, x2), where v(o2) = no goal ≠ v(o1). This closure is counted as rework because it changes the operative outcome-value assignment of a target that had already been fixed operationally. In the VAR example, the final accepted state "no goal" is not obtained by one selection process but by the accumulated effect of two temporally separated staged selections acting on the same target: the referee's initial fixation and VAR's later corrective overrule.

Temporal ledger formulation of episodes, rework, and net closures

A window Ω is observed up to an evaluation time t. Let the realized closure events in window Ω up to evaluation time t be indexed by u ≤ t.

For each such closure event define:

  • τu : the decision target fixed by closure u,
  • vu : the value fixed for that target by closure u.

For each evaluation time t, let Sgross(t) denote the cumulative number of realized closure events up to t.

A realized closure event may later be superseded by a later corrective closure on the same target. To represent what is still counted in the ledger at evaluation time t, define the survival indicator

Au(t) = 1 if closure u is still the currently operative accepted closure for its target at time t, and Au(t) = 0 otherwise.

Thus Au(t) does not ask whether closure u once happened. It asks whether closure u still survives in the current ledger at time t.

The net surviving closure count at evaluation time t is therefore Snet(t) = Σ_{u ≤ t, u ∈ Ω} Au(t)

This is also the number of targets that currently possess an operative accepted closure in the ledger at time t . This quantity is time-indexed. It is not the cumulative number of all realized episodes in history. It is the number of episodes that still count as operative accepted closures in the ledger at evaluation time t .

Rework as superseded closure history

Define the cumulative rework count up to evaluation time t by W(t) = Σ_{u ≤ t, u ∈ Ω} ( 1 - Au(t) )

This counts those realized closures that did occur, consumed capacity, and were accepted when made, but no longer survive as the currently operative accepted closure for their target at evaluation time t .

Because every realized closure event up to time t either survives in the current ledger or has been superseded, we have the exact identity Snet(t) = Sgross(t) - W(t)

Effect of a corrective closure. Suppose that at time t1 the ledger contains n operative accepted closures: Sgross(t1) = n, W(t1) = 0.

Now suppose that by a later time t2, one already-operative target is revised by a new corrective closure. Then:

  • one new realized closure has occurred, so Sgross(t2) = n + 1,
  • one earlier closure on that target has become superseded, so W(t2) = 1,
  • the corrective closure itself is now the surviving operative closure for that same target, so Snet(t2) = Sgross(t2) - W(t2) = (n + 1) - 1 = n.

Thus a corrective closure increases temporal selection burden while leaving the number of currently surviving operative closures unchanged. This is the crucial point: after correction, the number of old unchanged surviving closures becomes n - 1, but the new corrective closure becomes the new operative closure for that same target, so the total number of surviving operative closures remains n, not n - 1.

This distinction makes the role of rework precise. At the level of realized history, a corrective act is a genuine later episode: it consumes bounded capacity and contributes additional staged selection. At the level of the current ledger, that corrective act does not add a net-new operative target closure. It replaces an earlier one on the same target. Therefore rework increases the amount of selection expended in the window without necessarily increasing the number of ledger-counted operative episodes at evaluation time t .

For a window I = [ta, tb], define the increments ΔSgross(I) = Sgross(tb) - Sgross(ta), ΔW(I) = W(tb) - W(ta), and ΔSnet(I) = Snet(tb) - Snet(ta).

Then automatically: ΔSnet(I) = ΔSgross(I) - ΔW(I).
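A minimal sketch of this temporal ledger follows, assuming a toy trace of (target, value) gross closure events; the function re-reads the ledger at an evaluation time and returns Sgross(t), W(t), and Snet(t).

```python
# Minimal sketch (toy targets/values are assumptions): re-reading the temporal
# ledger at an evaluation time to obtain Sgross(t), W(t) and Snet(t).
def ledger(closures):
    """closures: time-ordered list of (target, value) gross closure events."""
    operative = {}                              # target -> index of surviving closure
    for u, (target, value) in enumerate(closures):
        operative[target] = u                   # a later closure supersedes earlier ones
    s_gross = len(closures)
    s_net = len(set(operative.values()))        # targets with an operative accepted closure
    return s_gross, s_gross - s_net, s_net      # Sgross(t), W(t), Snet(t)

at_t1 = [("tau1", "accepted"), ("tau2", "accepted")]
at_t2 = at_t1 + [("tau1", "rejected")]          # corrective closure on tau1
print(ledger(at_t1))                            # (2, 0, 2)
print(ledger(at_t2))                            # (3, 1, 2): gross rises, net unchanged
```

Re-running the function on the trace truncated before the corrective closure reproduces the t1 versus t2 comparison above.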

Fixed-capacity comparison across windows

The previous formulation is dynamic: evaluation time advances and the ledger is re-read. A second, complementary comparison holds the total window capacity fixed and asks how observable rework changes what can be achieved under that fixed budget.

Here we instead use a fixed-capacity comparison across windows in which N is anchored and held constant; rework then shows up by lowering the achievable Snet relative to a no-rework baseline.

Thus the no-rework baseline is the maximum achievable S net under the fixed capacity constraint.

Rework, retraction, and net learning

A mapping update may include both the addition of improved structure and withdrawal of previously stored structure. These internal components must be distinguished from externally observable rework and from the net epistemic effect of the update as a whole.

Observable rework. Rework is any externally visible correction, undo, replacement, deletion, or repair of prior commitments. It is defined at the level of artifacts and execution behavior, not at the level of the regulator's internal coupling ledger.

Internal retraction and accretion. A mapping update of M may contain both a retraction component in which previously stored structure is withdrawn or weakened, and an accretion component, in which new or improved structure is added or strengthened.

These internal components should not individually be identified with negative learning or positive learning. In particular, withdrawing false or obsolete structure is often part of successful learning.

Net learning. Learning is evaluated by the net epistemic effect of the whole update after it is complete. The relevant question is not whether some fragment of prior structure was removed, but whether the completed update leaves the regulator with stronger or weaker stored disturbance-response coupling than before.

Net learning criterion

We evaluate learning by the net change in stored requisite coupling induced by the whole update:

ΔtI := It+1 - It
This is the primary across-episode learning quantity.

Under the comparability assumption Ht(X) = Ht+1(X), the same net change may be written dually as a change in lack of requisite knowledge

ΔtH := Ht+1 - Ht,    so that ΔtI = -ΔtH

Accordingly:

  • Net positive learning iff ΔtI > 0 (equivalently, under the comparability assumption, a net decrease in lack of requisite knowledge: ΔtH < 0).
  • Zero net learning iff ΔtI = 0 (equivalently, under the comparability assumption, ΔtH = 0).
  • Net negative learning iff ΔtI < 0, i.e. the final mapping is worse than the initial one with respect to stored requisite coupling (equivalently, under the comparability assumption, a net increase in lack of requisite knowledge: ΔtH > 0).

Thus, negative learning is reserved for a net worsening of stored requisite coupling after the whole update is complete. It should not be inferred merely from the fact that part of the update involved deletion, withdrawal, or correction of previously stored structure.

Exact knowledge ledger of net change

For knowledge accounting purposes, define the positive and negative parts of the net coupling change:

Gt := max(ΔtI, 0),    Lt := max(-ΔtI, 0)

Then the stored requisite coupling satisfies the exact ledger identity:

It+1 = It + Gt - Lt

Under the comparability assumption, the same net update may be read dually in ignorance form:

Ht+1 = Ht - Gt + Lt

This ledger records only the net epistemic effect of the update. It does not assert that the internal update process itself consisted of a pure gain or a pure loss. A single episode may contain both correction of prior structure and acquisition of improved structure, yet still end with Gt > 0 .
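A minimal sketch of this ledger identity, with illustrative coupling values in bits; Gt and Lt are simply the positive and negative parts of the net change.

```python
# Minimal sketch (coupling values in bits are illustrative assumptions):
# the exact ledger identity It+1 = It + Gt - Lt.
def knowledge_ledger(I_t, I_t1):
    delta = I_t1 - I_t
    G_t, L_t = max(delta, 0.0), max(-delta, 0.0)     # positive / negative parts
    assert abs((I_t + G_t - L_t) - I_t1) < 1e-12      # ledger identity
    return G_t, L_t

print(knowledge_ledger(0.19, 0.53))   # net positive learning: G ≈ 0.34, L = 0
print(knowledge_ledger(0.53, 0.41))   # net negative learning: G = 0, L ≈ 0.12
```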

Why rework is not the same as negative learning

An episode may contain observable rework and still yield net positive learning. For example, a software developer may correct a previously committed line of code by deleting a wrong symbol and replacing it with the correct one. This is clearly rework at the artifact level. But if the completed update leaves the developer with stronger stored disturbance-response coupling for future comparable episodes, then the episode yields net positive learning, not negative learning.

Accordingly, all three combinations are possible:

  • Rework + net positive learning
  • Rework + zero net learning
  • Rework + net negative learning.

Therefore, observable rework does not by itself imply Δ I t < 0 . Rework is an execution-level phenomenon; negative learning is a coupling-ledger judgment about the net epistemic effect of the completed update.

Quantifying the Knowledge Discovery Process

How is the desired regulator to be brought into being? With whatever variety the components were initially available, and with whatever variety the designs (i.e. input values) might have varied from the final appropriate form, the maker Q acted in relation to the goal so as to achieve it. He therefore acted as a regulator. Thus the making of a machine of desired properties (in the sense of getting it rather than one with undesired properties) is an act of regulation[2].

We now turn from the mathematical formalization of Ashby's staged selection process to its operationalization. The quantity of theoretical interest is the regulator's Ht, which is latent. In practice, we do not observe the disturbance Y, the response class X, the evidence Z1, Z2, …, Zk, or the internal selection process directly. What we do observe is the externally visible execution stream: bounded action capacity, episode-closing commitments, and observable rework.

The goal therefore is not to recover Ht directly, but to numerically approximate it in bits, by constructing an operational count-based identity and, under stated assumptions, interpreting it as an estimator up to the standard one-bit ideal-coding gap. The two quantities are different kinds of objects, but they will be connected by an explicit measurement model. For example, temperature is latent at the molecular level; a mercury column is a different physical object; yet the column serves as an estimator of temperature because a calibration model connects them.

Accordingly, the structure of the argument in what follows is: (1) define the latent window-average ignorance term, (2) relate episode-level binary discrimination depth to conditional entropy, and (3) construct a black-box observable operational count-based identity, and under the idealized counting model interpret it as an operational estimator for that average ignorance up to the standard one-bit ideal-coding gap.

Defining Knowledge To Be Discovered

Ashby's Law of Requisite Variety provides the general control frame: disturbances must be countered by adequate regulatory variety if essential variables are to be kept within acceptable bounds; the regulator must possess sufficient variety relative to the relevant variety of the environment[1]. In modern entropy-based restatements, this requirement can be expressed as a matching condition between relevant environmental distinctions and the system distinctions available to answer them[56]. That control frame tells us what successful regulation requires, but it does not by itself define the regulator's residual uncertainty inside a concrete episode of action.

Heylighen's Law of Requisite Knowledge adds the missing condition for effective regulation: it is not enough for a regulator to possess a repertoire of possible responses; it must also possess the requisite knowledge needed to select the appropriate one for the disturbance encountered. Otherwise, increased action variety increases the chance of choosing the wrong action, forcing trial-and-error selection[29]. In information-theoretic form, that residual lack of requisite knowledge is expressed by the conditional entropy H(R|D): in Heylighen's formulation, H(R|D) = 0 means complete knowledge, while H(R|D) = H(R) means complete ignorance[29].

In the present model, we retain that same object, but write it using the episode notation Y for the observed disturbance and X for the selected success-relevant response class. Thus the regulator's lack of requisite knowledge is represented by H(X|Y). This is not introduced here as a new replacement for Ashby's law. It is the same kind of ignorance term, now expressed in the symbols of the operational model.

Crucially, conditional entropy is itself an average quantity: it is the expected value of the entropies of the conditional response distributions, averaged over the conditioning variable[5]. Therefore the primary object of interest in this article is not a one-off episode quantity taken in isolation, but the regulator's average lack of requisite knowledge over a window of episodes.

Window-level estimand

Let a window W contain m episodes. For episode t , let Mt be the regulator's stored structure at episode start, inducing a response law Pt(X|Y) . If the regulator structure changes during the window, then each episode may have a different induced law Pt(X|Y) . The corresponding episode-start ignorance term is

Ht(X|Y) := HPt(X|Y)

Knowledge To Be Discovered is the regulator's average episode-start uncertainty about which success-relevant response class will be fixed for the observed disturbance:

H̄W := (1 / SnetW) Σ_{t=1}^{SnetW} Ht(X|Y)
averaged over the net surviving closure count SnetW, i.e. the cardinality of the ledger-counted episode population whose closures survive as accepted by the end of the bounded execution window W. H̄W may also be read as the latent estimand of the regulator's average lack of requisite knowledge H(X|Y).

Thus the estimand is explicitly relative to the window-end ledger of surviving accepted closures. This has important consequences:

  • the estimand is not global over the full Ashby table,
  • it is not window-invariant,
  • it may change if the same work is sliced into different windows,
  • and it is non-retroactive across already-closed windows, even if it is retroactive within the window.

Episode-level interpretation

Although the estimand is window-level, each episode still has a natural local interpretation. For a particular episode t with observed disturbance realization Y = yt, the regulator begins with a latent response-selection uncertainty Ht(X|Y = yt). Within the episode, staged evidence Z1:k reduces that uncertainty until a response is sufficiently fixed for closure. Thus the episode-level search process is the local mechanism, while H̄W is the window-level quantity summarizing those episodes on average.

Relation to a pooled window distribution

We treat the whole window as generating one empirical joint distribution P̂W(X, Y), and define the pooled window conditional entropy as:

HW(X|Y) = - Σ_y P̂W(y) Σ_x P̂W(x|y) log2 P̂W(x|y)

If the regulator's structure is approximately stable throughout the window, so that the induced response law does not materially drift across episodes, then the average ignorance term may be represented by the pooled window distribution:

HW(X|Y) ≈ H̄W

But in general they should be kept distinct. If learning, forgetting, or context drift changes Mt within the window, then the window-average ignorance term is more faithful than a single pooled conditional entropy.
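A minimal sketch of the pooled window quantity follows, assuming an illustrative table of episode counts per (disturbance, response) pair; under drift of Mt within the window, the episode-wise average H̄W would be computed instead.

```python
# Minimal sketch (episode counts are illustrative assumptions): the pooled
# window conditional entropy HW(X|Y) from an empirical joint table.
import numpy as np

def pooled_conditional_entropy(counts):
    """counts[y][x]: number of window episodes with disturbance y and response x."""
    joint = np.asarray(counts, dtype=float)
    joint /= joint.sum()                        # empirical joint distribution
    h = 0.0
    for row in joint:
        p_y = row.sum()
        p_x_given_y = row / p_y
        h -= p_y * sum(p * np.log2(p) for p in p_x_given_y if p > 0)
    return h

window_counts = [[6, 2],                        # disturbance y1: responses x1, x2
                 [1, 7]]                        # disturbance y2: responses x1, x2
print(f"HW(X|Y) = {pooled_conditional_entropy(window_counts):.3f} bits")
```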

What this definition does and does not mean

  • It does preserve the cybernetic meaning of the ignorance term: uncertainty about which response to select for a given disturbance[31].
  • It does not redefine Ashby's law as a theorem about one isolated episode.
  • It does not identify Knowledge To Be Discovered with arbitrary outcome uncertainty, with success probability, or with the full entropy of the environment's outcome table.
  • It does provide the latent quantity that the operational execution model will later try to estimate from window-level counts.

Ideal decision-depth layer

The quantity H̄W is the latent estimand of Knowledge To Be Discovered H(X|Y). To connect this latent uncertainty to decision depth, we first introduce the latent ideal binary question depth qt, which, inside an episode t, denotes the ideal minimum expected number of binary questions required to identify the selected success-relevant response class X after the disturbance Y has been observed.

The earlier quantity k denotes the number of staged selections in Ashby's general selection process. qt is not the same as Ashby's general stage count k in all cases, but only in the specialized optimal binary-questioning version of that staged-selection process. Thus qt is a restricted special case of the earlier k, not the general case itself.

Under the assumption that the regulator follows an optimal binary questioning strategy for the posterior P(X|Y = y), for each Y = y the latent ideal binary question depth satisfies the Shannon +1 bound[5]:

Ht(X|Y = y) ≤ E[qt | Y = y] < Ht(X|Y = y) + 1

Averaging over Y yields:

Ht(X|Y) ≤ E[qt] < Ht(X|Y) + 1

Equivalently, averaging over the number of episodes mW that survive as accepted closures in the window yields:

H̄W ≤ ĒW[q] < H̄W + 1

(1)

where ĒW[q] := (1/mW) Σ_{t=1}^{mW} E[qt].

Thus the quantity ĒW[q] is the ideal minimum expected binary question depth associated with the latent Knowledge To Be Discovered for a window. It is an information-theoretic benchmark, not yet an operational count from the execution trace.

This distinction matters. The bound above applies only to the ideal minimum expected number of binary questions under an optimal questioning procedure. It does not by itself characterize the regulator's realized execution process, because the actual within-episode search may be redundant, overlapping, delayed, non-optimal, or only approximately binary in its information yield. Accordingly, the operational counting model introduced next should be understood as a separate approximation layer, not as a direct Shannon identity.
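The ideal-depth bound can be illustrated with a small sketch: for an assumed posterior P(X|Y = y), an optimal binary questioning tree (built here by standard Huffman merging) has an expected depth that sits between H(X|Y = y) and H(X|Y = y) + 1. The posterior values are assumptions chosen for illustration.

```python
# Minimal sketch (the posterior is an assumption): for P(X|Y=y), the expected
# depth of an optimal binary questioning tree (built by Huffman merging)
# satisfies H(X|Y=y) <= E[q | Y=y] < H(X|Y=y) + 1.
import heapq, math

def optimal_expected_depth(posterior):
    """Expected number of binary questions under optimal questioning."""
    heap = [(p, 0.0) for p in posterior if p > 0]   # (probability, accumulated depth weight)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, d1 = heapq.heappop(heap)
        p2, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, d1 + d2 + p1 + p2))  # one question above both subtrees
    return heap[0][1] if heap else 0.0

posterior = [0.4, 0.3, 0.2, 0.1]                    # P(X|Y=y) over four candidate responses
H = -sum(p * math.log2(p) for p in posterior)
Eq = optimal_expected_depth(posterior)
print(f"H = {H:.3f} bits, E[q] = {Eq:.3f} questions")
assert H <= Eq + 1e-9 and Eq < H + 1
```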

Operational layer

Consider a regulator that interacts with an environment in episodes. In each episode: a disturbance Y = y is observed at episode start and is treated as fixed for the episode. The regulator ultimately emits an episode-closing commitment oi (an externally visible acceptable outcome) by selecting responses. Let 𝒳 denote the available response set (Ashby's R; the action alphabet). Let X denote the regulator's selected success-relevant response class random variable, with values in 𝒳, distributed according to the regulator-induced policy P(X|Y) (induced by its current coupling/structure Mt).

The Knowledge To Be Discovered quantity H̄W is the regulator's average uncertainty about which success-relevant response class will be fixed for an observed disturbance, operationalized through the execution-trace model. It is not read directly from the execution trace. What the trace provides is an operational count of atomic action units expended per net surviving closure. The remainder of this section introduces a measurement model that uses those observable counts to construct a calibrated proxy for the latent window-average Knowledge To Be Discovered.

The present model does not estimate a property of Ashby's full two-dimensional outcome table T. It estimates a window-relative, ledger-relative quantity induced by the one-dimensional execution trace of completed episodes observed in that window. Accordingly, both the episode population and the corresponding average lack of requisite knowledge are defined relative to the closures that survive in the observer's ledger at window end. Thus, “final” in this model means final-within-the-window-ledger, not necessarily final for all future time.

That follows from three modeling choices we have already made.

  • First, the object we observe is not Ashby's full space of possibilities T : D × R → O. It is only the temporal trace of what actually completed in the window. So we are no longer estimating a property of the full outcome table. We are estimating a property of the regulator as revealed through the bounded execution channel in that window.
  • Second, episode identity is tied to observable closure in the trace, not to an abstract disturbance-response pair considered independently of observation. So whether something counts as an episode depends on whether it survives as a counted closure in that window's ledger.
  • Third, rework is defined relative to the history visible to the observer in or before that window. So finality is not metaphysical finality. It is finality relative to the ledger available at window end.

Accordingly, the structure of the argument in what follows is: (1) define the latent window-average ignorance term, (2) relate episode-level binary discrimination depth to conditional entropy, and (3) construct a black-box observable operational count-based proxy, and under the idealized counting model interpret it as an operational estimator for that average ignorance up to bounded approximation error.

Operational counting assumptions. We work with a deliberately coded execution model. The primitive unit is not raw clock time. Instead, we operationally partition the execution stream into counted atomic units chosen fine-grained enough that each counted non-closure unit contributes at most one response-relevant binary discrimination. If an observed act yields more than one such binary discrimination, it is represented as multiple counted units. Once we do that, the unit is no longer just a time interval. It becomes a measurement unit or accounting quantum.

We are not discovering that real time naturally comes in one-bit chunks. We are defining an operational coding scheme under which observed work is decomposed into one-bit-equivalent units. Under this coding convention:

  • Atomic action types. Each counted atomic action unit is classified as exactly one of the following:
    • a non-closure discrimination unit, contributing at most one response-relevant binary discrimination, or
    • a closure unit, recording one externally visible episode-closing commitment.
  • Scope of the observed outcome stream. The operational model does not track the full Ashby outcome space. It tracks only those episode-closing commitments whose realized outcomes are counted, by the operational ledger, as acceptable closures.
  • Shared execution channel.
    There is a single shared execution channel with finite window capacity N , measured in counted atomic action units. No parallel channels are assumed in the base model.
  • One closure per counted episode. Each counted episode contains exactly one closure unit.
  • Gross closure-event. Any externally observable episode-closing fixation commitment that the ledger counts as an acceptable closure for one episode. Some survive as counted episodes; some are later overruled and become rework.
  • Episode is defined at the net ledger level. In this operational model, an episode is defined not by any provisional closure-event, but by a closure that survives as an accepted observable closure relative to the ledger available at window end, not by eternal finality.
  • Binary selection units.
    A counted atomic action unit that does not contain an episode-closing commitment is treated as one binary discriminating selection step. One such step contributes at most one bit of information relevant to fixing X given Y . If a real-world test yields more than one bit, it is represented as multiple counted binary units. This is the operational bridge from staged selection to question-depth.
  • No hidden selection inside closure.
    A closure unit is atomic at the observation level. It records that the episode has closed and does not hide additional uncounted discrimination inside the same counted unit.
  • Gross closure indicator.
    For each counted atomic action unit t ∈ { 1,...,n} in the observation window define the binary event-type indicator:

    Bt = 1 if counted unit t is a gross closure unit, and Bt = 0 otherwise.

    Bt is a purely operational gross channel marker, not a net-of-window-end marker. It is therefore an execution-level marker, not the response variable X itself, not the goal variable / acceptability criterion G, not a success probability, and not a value of the essential variable E.
  • Gross and net accounting. Gross closures record everything that closed in the window. Later, when rework is introduced, some gross closures may fail to survive as net accepted closures in the window ledger. This affects the accounting of surviving closures, but not the meaning of the counted atomic action unit itself.

Let T : D × R → O be Ashby's two-dimensional outcome table: disturbances D, responses R, and outcomes O. In Ashby's framing, regulation works by selection in this table, and the law of requisite variety can be read as a law relating information and selection: the amount of selection that can be performed is limited by the information available.

Let Ω be an externally chosen observation window, and let EΩ = (e1, e2, …, en) be the non-overlapping episodes completed in that window, ordered by completion time. The sequence EΩ is the one-dimensional temporal trace observed in the window. This trace is distinct from Ashby's two-dimensional table T because it only records the episodes that completed in the window, not the entire table of possible outcomes. We refer to this trace as the execution channel divided into atomic time units.

We work with a deliberately narrow operational model. The regulator is a black box. We do not observe its staged selection process. Each episode begins with an observed disturbance Y=y , which is treated as fixed for that episode. The regulator then performs a sequence of internal or externally mediated discriminations until the response class X is sufficiently fixed for the episode to close acceptably.

Let {et} = {e1, …, en} denote the complete time-ordered sequence of events over a fixed window of n discrete time units, where each unit is fully utilized by exactly one atomic action on a single shared execution channel. Each counted atomic action unit is classified by the binary event-type indicator Bt, which is 1 if the counted atomic action unit contains an episode-closing commitment and 0 otherwise.

In this base section, Bt marks a closure that the ledger presently counts as an acceptable episode closure. Later, when rework is introduced, the accounting will be refined into gross closures, wasted closures, and net closures, without changing the basic meaning of Bt as an event-type marker.

Partition the event stream {et} into S consecutive non-overlapping episodes {wi}, where episode wi corresponds to responding to one disturbance realization Y = yi, consists of exactly one outcome oi that closes the episode, and contains exactly ki actual counted discriminating action units (selections) preceding that outcome.

We now introduce a distinct operational quantity qt, which denotes the actual counted number of discriminating action units expended in episode t under the black-box execution model: qt = Σ_{u ∈ wt} ( 1 - Bu ), where wt is the set of counted atomic action units belonging to episode t. Since each episode has exactly one closure unit, this just counts the non-closure units inside that episode.

Operational counting identity and decision-depth interpretation

We now introduce the first exact quantity in the operational layer.

The total number of counted gross closure-events in the window is

S = Σ_{t=1}^{n} Bt
Because the ledger counts only acceptable episode-closing commitments, S is not a count of arbitrary activities or arbitrary realized outcomes but a count of acceptable closures.

Since each counted episode contains exactly one closure unit, summing over all counted episodes gives Q = Σ_{i=1}^{S} qi = n - S. This quantity is interpreted, under our coding convention, as the total number of binary-discrimination-equivalent units in the same window.

If the minimum outcome duration is one unit of time t, equal to the time it takes to make one selection, then each action occupies one time unit t. Hence the sum of S and Q equals the length n = T/t of the time series, and the total available time T satisfies Qt + St = T. We define r = t⁻¹ as the execution rate of the channel, and since n = T/t = T·t⁻¹ = T·r, we have Q + S = T·r. We now drop the usage of n and instead use N for the maximum action capacity (selections + outcomes) per window: N = T·r.

Under the single-channel accounting model, every counted atomic unit is used either for one non-closure discrimination unit or for one closure unit. Therefore, Q = N - S

Accordingly, the window-average counted non-closure depth per closure is

q̂ = (1/S) Σ_{i=1}^{S} qi = Q/S = N/S - 1

(2)

This is an exact counting identity, not an approximation. It states only that q ^ is the observed sample mean number of counted non-closure units per counted closure in the window.

Under the coding convention above, this exact identity receives a decision-depth interpretation. Because the execution stream has been partitioned into counted atomic units such that each counted non-closure unit contributes at most one response-relevant binary discrimination, the quantity Q is measured directly in binary-discrimination-equivalent counted units. Therefore q ^ is the observed average actual counted decision depth per closure in the execution trace.

This is not yet a Shannon identity for entropy. It is an exact estimator of the window-average realized counted decision depth under the operational coding scheme.

To connect this operational quantity to the latent ignorance term, we must distinguish actual counted decision depth from ideal decision depth.

If, in addition, the counted execution process is assumed to approximate ideal binary questioning closely enough that the observed counted depth is a good estimator of the corresponding ideal depth as per Equation (1) above, then Equation (2) provides the operational bridge from observable execution counts to the latent information-theoretic quantity.

Therefore, q ^ should be read in two stages. First, exactly and operationally, it is the observed average counted decision depth per closure. Second, under the additional ideal-binary-questioning interpretation, it serves as an estimator of the latent Knowledge To Be Discovered up to the standard one-bit ideal-coding gap and any remaining mismatch between realized counted search and ideal questioning.
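A minimal sketch of the counting identity, assuming a toy event stream of Bt markers on a single shared channel; the episode structure (four, two, and zero selections before closure) is illustrative.

```python
# Minimal sketch (the event stream is an assumption): the exact counting
# identity q_hat = Q/S = N/S - 1 on a single shared execution channel.
def average_decision_depth(event_stream):
    """event_stream: one Bt per counted atomic action unit (1 = closure unit)."""
    N = len(event_stream)                  # window capacity in counted atomic units
    S = sum(event_stream)                  # counted closure units
    Q = N - S                              # non-closure discrimination units
    return Q / S, N / S - 1                # both forms of the identity

stream = [0, 0, 0, 0, 1,                   # episode 1: 4 selections, then closure
          0, 0, 1,                         # episode 2: 2 selections, then closure
          1]                               # episode 3: closure with 0 selections
print(average_decision_depth(stream))      # (2.0, 2.0)
```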

Example of the Knowledge Discovery Process

We illustrate the knowledge discovery process in Fig. 1.

Fig. 1 A Knowledge Discovery Process. A regulator faces a sequence of disturbances Y and selects responses X over counted atomic action units t. The total available time T is divided into n units, each of which is used either to perform one optimal binary selection (one internal node traversal in a decision tree) or to emit one realized outcome in O. Three realized outcomes o1, o2, and o3 are produced. For disturbance y1, the response is already determined by the stored structural coupling between X and Y (i.e., H(X|Y = y1) = 0), so outcome o1 is emitted with zero selections. For disturbance y2, the regulator must search within the candidate response set Ω2, traversing four binary decision nodes (z1 → z2 → z3 → z4) before identifying the response and emitting o2. For disturbance y3, the candidate set Ω3 requires two binary selections (z5 → z6) before outcome o3 is produced. Accordingly, the per-disturbance ideal binary question depths illustrated in the figure are: q1 = 0, q2 = 4, and q3 = 2. These are decision-tree depths, not entropies. For each disturbance realization, Shannon's optimal binary questioning bound gives Ht(X|Y = yt) ≤ E[qt | Y = yt] < Ht(X|Y = yt) + 1. Thus, for y2, the four-step path implies H2(X|Y = y2) ≤ 4, with equality only in the special dyadic case where the posterior over terminal alternatives is exactly equiprobable over 2^4 possibilities. Similarly, for y3, the two-step path implies H3(X|Y = y3) ≤ 2, with equality only in the special dyadic case of four equiprobable terminal alternatives. The average illustrated binary question depth is therefore (0 + 4 + 2)/3 = 2. Under the ideal optimal-binary-questioning interpretation, this average decision depth is an operational benchmark for the latent conditional entropy H(X|Y), not an identity with it. This average decision depth quantifies the Knowledge To Be Discovered as a question-depth benchmark revealed by the figure.

Observable rework

For gross closure et, define the prior operative value for its target as the most recent already-operative value for the same target, whether that value came from earlier in the window or from before the window. An episode et is an additional corrective selection iff it fixes a different value vt for a target τt that already had a prior operative value in force, whether that prior value came from earlier in the window or from before the window.

Gross closure-events are externally visible fixation commitments. If a later corrective selection overrules an earlier closure for the same decision target within the same window, the earlier closure is not counted as a separate episode in the window's net ledger. It is counted as observable rework.

Target and operative value. For every counted atomic action unit t with Bt = 1 , let τt denote the decision target fixed by the gross closure at t and vt denote the operative value fixed for that target by that gross closure. These quantities are defined only when Bt = 1 .

Additional corrective selection / supersession indicator. A gross closure in window Ω is counted in Wt iff, when it occurs, the same target already has an operative outcome-value assignment, and the current closure changes that assignment.

Define the retrospective supersession indicator:

ACSt = 1 if Bt = 1, target τt already had an operative value before episode t, and the closure at t changes that operative outcome-value assignment; ACSt = 0 otherwise.

The binary indicator Bt is defined as a gross closure marker, not a net marker. This is necessary because rework must remain visible in the execution trace. A closure can only later be classified as rework if it was first recorded as a gross closure event in the channel. Accordingly, rework classification is applied after gross closure marking. The indicator ACSt is therefore retrospective: it identifies those gross closures that are later superseded within the same bounded window by a later gross closure for the same target fixing a different operative value.

Rework count. Define rework operationally as the observable count W = Σ_{t=1}^{n} ACSt. A current-window gross closure is counted in W iff it fixes, for a target that already had an operative realized outcome in Ashby's fixed outcome space, a different operative outcome-value assignment than the one previously in force.

Here Wt is an observable execution-level count. It does not directly reveal the sign of the regulator's internal coupling change, and it should not be identified with net negative learning Lt = max(-ΔtI, 0). A nonzero Wt is evidence that corrective activity occurred, but it does not by itself imply ΔtI < 0. A window may contain observable rework and still end with net positive learning in the regulator's stored coupling.

Including W yields an operational equivocation rate per net outcome under non-retroactive windowing; i.e. the accounting is retroactive within the window, even if it is not retroactive across already-closed windows. W increases the effective number of binary discriminations required to identify the finally accepted outcomes.

What Wt shows is that some previously counted closure-events were not yet final relative to the accepted-outcome ledger. Their correction required additional discrimination before the window's net accepted closures were fixed. Therefore, in the present model, observable rework is not treated as free. It is evidence that the effective number of binary discriminations required to reach the net accepted closures in the window was larger than the gross closure count alone would suggest.

The significance of W is not merely that an outcome changed. Its significance is that an additional corrective selection had to be performed after a prior selection had already fixed the matter operationally. Under the information-selection reading of Ashby's law, the amount of selection that can be performed is limited by the information available. Hence each counted rework episode is evidence of additional selection burden, and therefore of additional informational burden, beyond the initial fixing of response. In the VAR example, the final accepted state “no goal” is not obtained by one selection process but by the accumulated effect of two temporally separated staged selections acting on the same target: the referee's initial fixation and VAR's later corrective overrule. That is why the rework count W must be included in the operational model. Thus W counts those gross closures in the current window that consume bounded selection capacity not to add a net-new accepted fixation, but to revise a target that had already been fixed earlier.

Net closures. The number of net new accepted closures for the window is: Snet = Σ_{t=1}^{N} Bt ( 1 - ACSt ) = Sgross - W.

Where:

  • Bt records everything that closed in the window,
  • ACSt classifies which of those gross closures were later superseded,
  • Sgross counts all gross closures in the window.
  • Wt counts the number of corrective closures,
  • Snet is the net new accepted closures for the window.

In this model, Snet counts net new accepted closures, not merely closures that remain operative at window end. A corrective closure may survive as the currently operative value and still be subtracted from Snet , because it revises a matter that had already been fixed operationally.

Therefore Snet should not be interpreted as the latent episode count mW . It is an observable ledger quantity that enters the measurement model as the count of net-new accepted closures.

Under this convention, observable rework does not merely increase burden while leaving the episode population unchanged; it reduces the net number of surviving episodes and thereby increases the effective average binary selection depth per net accepted closure.

If Snet = 0 , the window may still contain real episodes. It just delivered zero net-new progress. So the metric becomes undefined or divergent as a progress ratio, not as an episode count.

Illustrative example. Consider again the VAR example of a football game. In Episode 1, the referee performs a staged selection process and closes the episode with response x1 to disturbance y1, yielding realized outcome o1 = T(y1, x1) with value v(o1) = goal allowed. Since this closure is the first operative closure for target τ in the current accounting horizon, Window 1 records: Sgross,1 = 1, W1 = 0, Snet,1 = 1. Now suppose that in Episode 2, VAR reviews the same target τ. The earlier outcome o1 is already operative for that target when the new episode begins. VAR then performs an additional staged selection process, selects response x2, and closes with outcome o2 = T(y2, x2), where v(o2) = no goal ≠ v(o1). This closure is counted as rework because it changes the operative outcome-value assignment of a target that had already been fixed operationally. Therefore Window 2 records: Sgross,2 = Sgross,1 + 1 = 1 + 1 = 2, W2 = 1, Snet,2 = Sgross,2 - W2 = 2 - 1 = 1.

Every episode ends with a realized closure. Gross counting records all such completed closures. Net counting records only those closures that remain accepted in the current ledger after target-level overwriting by later episodes. Therefore, the net number of accepted closures Snet is always less than or equal to the number of gross closures Sgross .
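A minimal sketch of this gross/net accounting, assuming a toy (target, value) closure trace; the retrospective indicator ACSt is evaluated against whatever value is operative when each gross closure occurs, including values inherited from before the window.

```python
# Minimal sketch (the closure trace is an assumption): gross closures, the
# retrospective supersession indicator ACSt, W, and Snet = Sgross - W.
def net_ledger(closures, prior_operative=None):
    """closures: time-ordered (target, value) gross closures in the window.
    prior_operative: values already in force for targets before window start."""
    operative = dict(prior_operative or {})
    s_gross = w = 0
    for target, value in closures:
        s_gross += 1
        acs = target in operative and operative[target] != value   # ACSt
        w += int(acs)
        operative[target] = value          # this closure is now the operative value
    return s_gross, w, s_gross - w         # Sgross, W, Snet

# VAR example: the referee allows the goal, VAR later overrules the same target.
print(net_ledger([("tau", "goal allowed")]))                       # (1, 0, 1)
print(net_ledger([("tau", "goal allowed"), ("tau", "no goal")]))   # (2, 1, 1)
```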

Selection-equivalent debit under fixed capacity

Per window t, let:

  • Nt be anchored atomic action capacity in the window (all counted atomic action units in the window: non-closure discriminations plus all gross closure emissions, whether they survive or not).
  • Sgross,t be the number of accepted episode-closing outcomes observed in the window.
  • Wt be the number of those gross closures that are later corrected, overruled, withdrawn, replaced, or otherwise fail to survive as net accepted closures within the same window, with 0 ≤ Wt ≤ Sgross,t.
  • Snet,t be the number of net new accepted closures, defined by: Snet,t = Sgross,t - Wt .
  • Qt be the number of binary selections (non-closure atomic units) in the same window: Qt = Nt - Sgross,t .

Because observable rework outcomes consume the same constrained channel capacity while not contributing to net new closures, we treat them operationally as a selection-equivalent debit. This does not mean that rework outcomes are literally binary questions about X. Rather, it means that under fixed channel capacity each rework outcome occupies one counted atomic action unit that could otherwise have been used either:

  • to perform one further discriminating selection, or
  • to emit an accepted closure that survives correction.

Since wasted outcomes consume the same constrained channel capacity while not increasing Snet, define the effective non-net-progress selection-equivalent debit:

Qeff,t = Qt + Wt

Using Qt = Nt - Sgross,t and Snet,t = Sgross,t - Wt , we obtain:

Qeff,t = ( Nt - Sgross,t ) + Wt = Nt - Snet,t

Thus every observed corrective closure increases the effective debit against the net accepted closures that survive correction within the window.

For each net accepted closure i, let q i denote the counted number of non-closure atomic units effectively attributable to that surviving closure under the single-channel counting model. Then the window-average counted decision depth per net accepted closure is:

q̂eff = (1/Snet) Σ_{i=1}^{Snet} qi = Qeff / Snet = (Q + W) / (Sgross - W) = N / (Sgross - W) - 1 = N / Snet - 1

(3)

This is an exact accounting identity, not an approximation.
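A minimal sketch of the identity, taking the window counts (N, Sgross, W) as given; the numbers are illustrative.

```python
# Minimal sketch (window counts are illustrative): Equation (3) as an exact
# accounting identity, q_eff = (Q + W)/Snet = N/(Sgross - W) - 1 = N/Snet - 1.
def effective_depth(N, S_gross, W):
    S_net = S_gross - W
    if S_net == 0:
        return float("inf")                # zero net-new progress in the window
    Q = N - S_gross                        # non-closure discrimination units
    return (Q + W) / S_net                 # selection-equivalent debit per net closure

N, S_gross, W = 40, 10, 2
print(effective_depth(N, S_gross, W), N / (S_gross - W) - 1)   # 4.0 4.0
```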

Knowledge To Be Discovered Conservative Estimator

Accordingly, define the operational Knowledge To Be Discovered estimator for the window by

ĤW = (1/Snet) Σ_{i=1}^{Snet} qi

(4)

Then for a bounded observation window with maximum atomic action capacity N, gross closure count Sgross, and observable rework count W, Equation (3) gives the exact estimator

ĤW := q̂eff = N / (Sgross - W) - 1

(5)

Under the counting model, this estimator is exact in the following sense: it is precisely the observed sample mean number of effective non-closure atomic units per net accepted closure in the window.

ĤW is an operational estimator for the latent window-average Knowledge To Be Discovered H̄W in the specific sense that it aims to approximate the latent average decision depth ĒW[q], which in turn upper-bounds H̄W within one bit under optimal binary questioning.

The result is operational: it links an abstract information-theoretic quantity H̄W to directly observable execution counts (N, S, W), without assuming access to the true distribution. It provides an operational upper bound on conditional entropy under idealized assumptions (optimal binary selection, single shared channel, no idle capacity). It does not claim to recover the true entropy exactly in arbitrary systems.

Operational interpretation

Under the model's execution semantics, each counted non-closure atomic unit is treated as one binary discriminating selection act relevant to fixing the selected success-relevant response, while each closure unit is treated as a pure closure act containing no hidden uncounted discrimination.

Under those assumptions, ĤW is exactly the observed average actual binary decision depth per net accepted closure in the execution trace.

This is the first and strongest estimator claim in the model. It does not depend on Shannon's source-coding theorem. It depends only on the operational accounting axioms.

Entropy interpretation under additional axioms

We now introduce the second, stronger reading.

Equation (1) applies to the ideal minimum expected number of binary questions, not directly to arbitrary realized work counts.

Therefore, to read the count-based estimator in Equation (5) as an estimator of latent information-theoretic Knowledge To Be Discovered, we must add the following ideality axioms: (i) each counted non-closure atomic unit contributes exactly one response-relevant binary discrimination, (ii) closure units contain no hidden selection, (iii) the realized search process is effectively ideal, so that actual counted decision depth matches ideal binary question depth at the level of the window average, (iv) the counted population of net accepted closures is aligned with the episode population over which H̄W is averaged, and (v) observable rework W records gross closures that do not survive as net accepted closures within the window.

Under these additional axioms, the operational estimator coincides with the ideal minimum expected number of binary questions at the window level, and therefore

H̄W ≤ ĤW < H̄W + 1

In that idealized regime, the count-based estimator is not merely an operational proxy. It is an estimator of the latent window-average Knowledge To Be Discovered up to the standard one-bit ideal-coding gap.

What is exact and what is assumed

The logic should therefore be read in the following order.

  • Equation (3) is exact by accounting identity.
  • Its reading as average actual binary decision depth is exact within the operational model.
  • If those axioms are weakened, then the estimator remains exact for the operational quantity ĤW, but becomes only an upper-side operational proxy for the latent information-theoretic quantity H̄W.

This is the precise sense in which the approximation problem has not disappeared. It has not been removed by algebra. It has been localized: exactness belongs to the operational accounting theorem, while the entropy interpretation enters only through explicit additional axioms.

Conservative meaning

The estimator is conservative in a narrow operational sense.

This quantity q̂eff is an operational upper bound relative to the no-rework baseline, because Snet = Sgross - W ≤ Sgross, and therefore ĤW = N / Snet - 1 ≥ N / Sgross - 1

Thus, under fixed channel capacity, observable rework can only weakly increase the observed average effective decision depth per net accepted closure relative to the corresponding no-rework baseline. This is an exact monotonicity property of the operational estimator. It does not by itself prove that the latent conditional entropy increased by the same amount; it proves only that the observed execution burden per surviving accepted closure increased.

Increase in effective question depth due to observable rework

For decomposition purposes, define the corresponding no-observable-rework baseline while holding the gross observed closure count Sgross,t fixed:

H̄W base,t := Nt / Sgross,t - 1

This baseline is a conditional comparison only. It asks: how much larger is the observed effective question depth when some gross closures fail to survive as net accepted closures within the same window? It is not a universal causal counterfactual about what would have happened in a different world.

Accordingly, define the increase in effective question depth attributable to observable rework, under the fixed-gross comparison, by:

ΔĤW,t := H̄W net,t - H̄W base,t = Nt ( 1/Snet,t - 1/Sgross,t ) = Nt Wt / ( Sgross,t ( Sgross,t - Wt ) )

This expression is always nonnegative, and for fixed Nt and Sgross,t, it increases monotonically with Wt. Thus, when more gross closures fail to survive correction, the observed effective question depth per net accepted closure must increase.

The interpretation is operational and information-theoretic:

  • Sgross,t counts all observed closure-events in the window,
  • Wt counts those that do not survive as net accepted closures,
Therefore the surviving accepted outcomes are supported by fewer net closures on the same bounded channel, which increases the average number of binary-discrimination-equivalent action units associated with each surviving accepted closure.

So the role of Wt here is not to represent internal ignorance directly, nor to determine whether learning was positive or negative. Its role is narrower and explicit: observable rework increases the effective question depth revealed by the execution stream, because some previously counted closures did not remain accepted and therefore additional discrimination was effectively required before the window's net accepted closures were fixed.

It should not be read as an identity between:

  • artifact-level observable rework Wt, and
  • internal net coupling loss Lt.

A window with Wt > 0 may still exhibit net positive learning in the internal ledger, and a window with ΔtI < 0 need not expose all of that loss through visible repair outcomes in the same window.

In summary, observable rework affects the estimator ĤW implicitly through the reduction of Snet,t relative to Sgross,t. The quantity ΔĤW,t isolates exactly that increase, under the fixed-gross comparison, as the increment in effective question depth attributable to observable rework.
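A minimal sketch of the fixed-gross decomposition, with illustrative window counts; it verifies the closed form Nt·Wt / (Sgross,t·(Sgross,t - Wt)) and its monotone growth in Wt.

```python
# Minimal sketch (counts are illustrative): the fixed-gross decomposition
# Delta_H = Nt * Wt / (Sgross_t * (Sgross_t - Wt)), monotone in Wt.
def rework_depth_increase(N_t, S_gross_t, W_t):
    base = N_t / S_gross_t - 1                  # no-observable-rework baseline
    net = N_t / (S_gross_t - W_t) - 1           # observed effective depth
    increase = net - base
    closed_form = N_t * W_t / (S_gross_t * (S_gross_t - W_t))
    assert abs(increase - closed_form) < 1e-9
    return increase

for W_t in range(4):                            # more rework -> larger increase
    print(W_t, round(rework_depth_increase(40, 10, W_t), 3))
```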

Knowledge-Discovery Efficiency (KEDE) Metric

Now we generalize the Knowledge-Discovery Efficiency (KEDE), a scalar metric that quantifies how efficiently a system closes the gap between the variety demanded by its environment and the variety embodied in its prior knowledge[28].

We rearrange the formula (5) and insted of H ^ X | Y we use HX|Y for notation simplicity. and get the formula for Knowledge-Discovery Efficiency (KEDE) metric[28]:

$$\mathrm{KEDE} = \frac{1}{1 + H_{X|Y}} = \frac{S - W}{N}$$

(6)

KEDE is a scalar metric that quantifies how efficiently a system closes the gap between the variety demanded by its environment and the variety embodied in its prior knowledge[28]. KEDE is an acronym for KnowledgE Discovery Efficiency. It is pronounced [ki:d].

Efficiency here means that the smaller the average number of selections made per outcome, the better. In other words, the less knowledge that remains to be discovered per outcome, the more efficient the knowledge discovery process is.

KEDE has the properties:

  • It is a function of the missing information H.
  • Its maximum value corresponds to H equal to zero, i.e. there is no need to make selections because all knowledge is already discovered.
  • Its minimum value corresponds to H approaching infinity, i.e. we have no prior knowledge to start with.
  • It is continuous on the closed interval [0,1], which makes it convenient to express as a percentage. We need to be able to rank knowledge discovery processes by efficiency: the best-ranked process scores 100% and the worst 0%, a practical scale that people are used to (see the sketch below).
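These properties can be checked directly. Below is a minimal Python sketch of formula (6); the function names and the example counts are ours and purely illustrative:

```python
# Illustrative sketch of formula (6); names are ours, values are hypothetical.
def kede_from_H(H: float) -> float:
    """KEDE = 1 / (1 + H), mapping H in [0, inf) onto (0, 1]."""
    return 1.0 / (1.0 + H)

def kede_from_counts(S: int, W: int, N: int) -> float:
    """KEDE = (S - W) / N: net accepted outcomes per unit of maximum action rate."""
    return (S - W) / N

def H_from_kede(kede: float) -> float:
    """Inverse mapping: knowledge to be discovered per outcome, in bits."""
    return 1.0 / kede - 1.0

assert abs(kede_from_H(0.0) - 1.0) < 1e-12   # no missing knowledge -> KEDE = 1
assert kede_from_H(10**9) < 1e-8             # H -> infinity -> KEDE -> 0
print(kede_from_counts(S=30, W=3, N=100))    # 0.27
print(round(H_from_kede(0.27), 2))           # 2.7 bits per outcome
```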

What does KEDE measure?

  • Regulation consumes execution capacity to cope with missing knowledge.
  • Knowledge discovery converts that consumed capacity into persistent internal variety, reducing future consumption of the execution capacity.
  • KEDE measures the efficiency of this conversion: how much execution capacity is spent on discovery versus production, under a successful-adaptation regime.

KEDE effectively converts the knowledge to be discovered H(X|Y), which can range from 0 to infinity, into a bounded scale between 0 and 1.

KEDE is a measure of how much of the required knowledge for completing tasks is covered by the prior knowledge.

Due to its general definition, KEDE can be used for comparisons between organizations in different contexts. For instance, to compare hospitals with software development companies! That is possible as long as the KEDE calculation is defined properly for each context. In what follows we will define the KEDE calculation for the case of knowledge workers who produce textual content in general and computer source code in particular.

Anchoring KEDE to Natural Constraints

In our model, N is always the theoretical maximum action rate (selections + outcomes) in an unconstrained environment, and S is the observed outcome rate under specific conditions over a given interval.

A key question is how to assign a natural constraint to N. That is, what constitutes an appropriate reference value for the maximum action rate (selections + outcomes)?

We may turn to physics for an instructive analogy. A quantum (plural: quanta) represents the smallest discrete unit of a physical phenomenon. For instance, a quantum of light is a photon, and a quantum of electricity is an electron. In this context, the speed of light in a vacuum serves as a fundamental upper bound for N. However, identifying an analogous natural constraint for human activity—particularly knowledge work—presents greater challenges.

Consider the example of typing. Here, the quantum can reasonably be defined as a symbol, since it is the smallest discrete unit of text. A symbol may be a letter, number, punctuation mark, or whitespace character. To determine the appropriate bin width Δt, we refer to empirical data on the minimum time required to produce a single symbol. Typing speed has been subject to considerable research. One of the metrics used for analyzing typing speed is the inter-key interval (IKI), which is the difference in timestamps between two keypress events. The IKI equals the symbol duration time t, so we can use IKI research to find t. Studies have reported an average IKI of 0.238 seconds [26], yielding a maximum human typing rate of approximately N = 1/t = 1/0.238 ≈ 4.2 symbols per second.

A similar approach can be applied to tasks such as furniture assembly. In this case, a plausible quantum is a single screw tightened, since it represents a minimal, repeatable unit of outcome. We then identify Δt as the average time required to tighten one screw. Empirical studies report that this task typically takes between 5 and 10 seconds[34]. Using the upper bound, we estimate the maximum screw-tightening rate as N=1/t=1/10=0.1 screws per second.

This methodology offers a principled way to estimate N using domain-specific quanta and empirically grounded time durations, enabling the application of our model to a broad range of human tasks.
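As a sketch of this recipe, the following Python fragment derives N from a domain quantum and its empirically grounded duration; the helper name is ours, while the durations are the ones quoted above:

```python
# Pick a domain quantum, measure its minimum duration, and derive the
# maximum action rate N for a chosen reference interval.
def max_action_rate(quantum_duration_s: float, interval_s: float) -> float:
    """N = interval length / quantum duration (actions per interval)."""
    return interval_s / quantum_duration_s

# Typing: one symbol every 0.238 s  ->  ~4.2 symbols per second
print(max_action_rate(0.238, 1.0))   # ≈ 4.2

# Screw tightening: one screw every 10 s  ->  0.1 screws per second
print(max_action_rate(10.0, 1.0))    # 0.1
```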

The next question concerns the appropriate definition of outcome for measuring S and N.

Both N and S can always be discretized—or “binned”—in a way that preserves the total information rate, regardless of whether the outcome arises from natural processes, human behavior, or machines. By choosing a bin width Δt small enough (e.g., milliseconds), the range of possible tangible outcomes within each bin shrinks dramatically. This reduced range leads to less uncertainty in each bin, which compensates for the smaller time interval. Yet the ratio

total outcome in bin / Δt

remains an accurate measure of information rate.

As Δt becomes smaller, the measurements of S and N become more precise, as they reflect outcomes over finer time intervals. But how small should Δt be? This dilemma is resolved by considering the granularity of the outcomes themselves. The set E of outcomes can be thought of as the effects of the regulation process — the resulting states after the regulator responds to disturbances. In our model E is a sequence of {0,1}, where 0 = wrong outcome (failure to regulate) and 1 = acceptable outcome. The presence of a concrete outcome thus leads to a natural binning of the outcomes. It also enables a clear distinction between signal (the entropy associated with producing the outcome) and noise (the residual variability unrelated to success or failure).

For example, two distinct symbols typed (e.g., ‘a' vs. ‘b') are clearly different outcomes. However, if one symbol is typed in 91 milliseconds and another in 92 milliseconds, this minute variation is inconsequential to the outcome. Such timing fluctuations are typically unintentional, irrelevant to task performance, and should not be considered part of the outcome. In practical terms, if the theoretical upper bound N is known (for instance, 4.2 symbols per second as derived from human typing speed) and the observed rate is S = 1 symbol per second, then time should be partitioned into one-second bins. Each bin then yields a single outcome: either 1 (a symbol was successfully typed) or 0 (no symbol typed or incorrect input).

This binning principle generalizes beyond typing. Whether analyzing foot strikes in trail running (where negligible spatial change occurs over milliseconds) or the discrete moves in solving a Rubik's cube (where each turn resolves multiple potential states into a single action), binning ensures that no intermediate state need be modeled explicitly.
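A minimal sketch of the binning step is given below, assuming we only have timestamps of successful outcomes; the function name, bin width, and event times are hypothetical:

```python
# Turn timestamped successful outcomes into a 0/1 sequence of bins.
def bin_outcomes(event_times_s, total_time_s, bin_width_s):
    """Return a 0/1 list: 1 if at least one successful outcome fell in the bin."""
    n_bins = int(total_time_s / bin_width_s)
    bins = [0] * n_bins
    for t in event_times_s:
        i = min(int(t / bin_width_s), n_bins - 1)
        bins[i] = 1
    return bins

E = bin_outcomes(event_times_s=[0.3, 1.8, 2.1, 4.9], total_time_s=6.0, bin_width_s=1.0)
print(E)              # [1, 1, 1, 0, 1, 0]
print(sum(E), len(E)) # S = 4 successful outcomes observed over N = 6 bins
```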

Physical applicability claim. For any isolated physical system to which a finite entropy bound applies, the number of physically distinguishable states is finite. Therefore the system admits a binary encoding whose length is bounded by the corresponding entropy bound expressed in bits. In holographic settings, this gives an upper bound of $N = \frac{A}{4 l_p^2 \ln 2}$ binary discriminations, where $l_p = (\hbar G / c^3)^{1/2}$ is the Planck length (in meters) and $l_p^2 = \hbar G / c^3$ is the associated Planck area. Hence the Knowledge To Be Discovered Estimator applies to such physical systems after representing admissible states by a bounded sequence of binary discriminations.

Applications

The knowledge-centric perspective builds on Ashby's Law of Requisite Variety by emphasizing that successful outcomes depend not only on a system's range of possible responses, but also on its ability to select the right response for each disturbance. This requires internal “system knowledge” that maps disturbances to appropriate actions. As Francis Heylighen proposed in his “Law of Requisite Knowledge,” effective regulation demands more than variety—it demands informed selection[29]. This knowledge-centric lens provides a foundation for analyzing how systems—biological, technical, or organizational—achieve control not just through options, but through understanding. The model we present operationalizes this perspective by estimating the informational requirements a system must satisfy to achieve its observed level of regulatory performance.

In what follows, we apply this knowledge-centric perspective to a range of domains, including motor tasks and manual assembly, industrial assembly lines, software development processes, speed of light in a medium, intelligence testing and sports performance. In each case, the model enables us to estimate, in bits of information, the amount of knowledge a system must lack to produce its observed level of performance. By quantifying the knowledge to be discovered H(X|Y), we assess how much uncertainty was there in the system's ability to select appropriate responses. This allows us to compare systems not by tangible outcomes, but by the hidden knowledge structures required to achieve them, offering a unified lens for analyzing adaptation, skill, and control across diverse contexts.

Tightening screws

We can apply our model to motor tasks such as furniture assembly. In this context, a natural unit of outcome — or “quantum” — is the tightening of a single screw.

Skilled workers engaged in manual assembly tasks can typically insert and tighten standard screws at a rate of 6–12 screws per minute under optimal, repetitive conditions — such as those found in furniture construction or industrial assembly lines. In contrast, automated screw-tightening machines can achieve significantly higher rates, often between 30 and 60 screws per minute [34]. More complex manual tasks, such as high-torque applications involving ratchets or Allen keys, typically reduce the rate to 2–4 screws per minute due to the increased effort and precision required. In surgical or medical contexts, such as orthopedic screw insertion, accuracy and the avoidance of overtightening are paramount; here, rates often fall to 1–2 screws per minute, or approximately one screw every 30–60 seconds [46].

Context Typical Rate (screws/minute) Notes
Automated (machine) 30–60 For comparison, not manual
Fast, repetitive tasks 6–12 Assembly line, minimal torque required
High-torque/manual 2–4 Metalwork, ratchets, Allen keys
Surgical/precision 1–2 Orthopedic, high accuracy, low speed

The key observation is that rates decrease as torque, task complexity, or required precision increases. If we take the machine rate as the maximum possible outcome N and the observed human rate as S, we can estimate the average number of bits of information H(X|Y) that the human operator must process per action.

KEDE = S/N;  H(X|Y) = 1/KEDE - 1 = N/S - 1 = 60/12 - 1 = 4 bits/screw

This implies that the human must absorb approximately 4 bits of information, on average, to tighten a single screw under typical conditions.
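As a quick numerical check of the figures above (values taken from the table, code structure ours):

```python
# Worked check of the screw-tightening example quoted in the text.
N = 60            # machine rate, screws per minute (taken as the maximum)
S = 12            # human rate, screws per minute (fast repetitive work)
KEDE = S / N
H = 1 / KEDE - 1
print(KEDE, H)    # 0.2, 4.0 bits per screw
```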

The rate at which a person tightens screws depends on various factors, including:

  • Screw type and size
  • Material being fastened
  • Required torque
  • Tool used (screwdriver, ratchet, etc.)
  • Operator skill and fatigue
These constitute the disturbance variety D faced by the human operator. The operator, acting as a regulator, responds with selections from their internal repertoire of skills — the regulatory variety R.

This interpretation aligns with existing research, which suggests that task difficulty directly influences the amount of information a task imparts [47, 48]. When difficulty is appropriately matched to the individual's skill level, the task yields maximal informational value [49], and the time required reflects the interaction between task complexity and the individual's regulatory capacity [50].

Using our model, we transform a sequence of real-world actions in furniture assembly into a granular, time-based measure of regulatory capacity. This enables us to quantify — in bits — how much variety the individual must absorb in order to successfully complete the task.

Typing the longest English word

Let's use an example scenario to see Ashby's law applied to human cognition and knowledge work.

For that we'll have myself executing the task of typing on a keyboard the word “Honorificabilitudinitatibus”. It means “the state of being able to achieve honours” and is mentioned by Costard in Act V, Scene I of William Shakespeare's “Love's Labour's Lost”. With its 27 letters “Honorificabilitudinitatibus” is the longest word in the English language featuring only alternating consonants and vowels.

The way I will execute this task is to go to the "play text" or "script" of “Love's Labour's Lost”, look up the word and type it down. The manual part of the task is to type 27 letters. The knowledge part of the task is to know which are those 27 letters.

In order to track the knowledge discovery process I will put "1" for each time interval when I have a letter typed and "0" for each time interval when I don't know what letter to type.

I start by taking a good look at the word “Honorificabilitudinitatibus” in the script of “Love's Labour's Lost”. That takes me two time intervals. Then I type the first letters “H”, “o”, and “n”. I continue typing letter after letter: “o”, “r”. At this point I cannot recall the next letter. What should I do? I am missing information, so I go and open up the script of “Love's Labour's Lost” and look up the word again. Now I know what the next letter to type is, but acquiring that information took me one time interval. This time I have remembered more letters, so I am able to type “i”, “f”, “i”, “c”, “a”, “b”, “i”. Then again I cannot continue because I have forgotten what the next letters of the word were, so I have to look it up again in the script. That takes two more time intervals. Now I can continue my typing of “l”, “i”, “t”. At this point I stop again because I am not sure what the next letters to type were, so I have to think about it. That takes one time interval. I continue my typing with “u”, “d”, “i”. Then I stop again because I have again forgotten what the next letters to type were, so I have to look them up again in the script of “Love's Labour's Lost”. That takes two more time intervals. Now I know what the next letter to type is, so I can continue typing “n”, “i”. At this point I cannot recall the next letter, so I have to look it up again in the script. That takes two more time intervals. After I know what the next letter to type is, I can continue typing “t”, “a”, “t”, “i”, “b”, “u”, “s”. Eventually I am done!

At the end of the exercise I have the word “Honorificabilitudinitatibus” typed and along with it a sequence of zeros and ones.



Letters: · · H o n o r · i f i c a b i · · l i t · u d i · · n i · · t a t i b u s
Outcome: 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 1

(A “·” marks a time interval spent discovering the next letter rather than typing it.)

In the table we have separated the manual work of typing from the knowledge work of thinking about what to type.

We made visible both the manual work and the knowledge discovery parts of a Knowledge Discovery process.

The first row of the table shows the knowledge I manually transformed into tangible outcome - in this case the longest English word. The second row of the table shows the way I discovered that knowledge. There is a "0" for each time interval when I was missing information about what to type next. There is a "1" for each time interval when I had prior knowledge about what to type next. Each "0" represents a selection I needed to make in order to acquire the missing information about what letter to type next. Each "1" represents prior knowledge.

We know that there is knowledge applied when we see the tangible outcome of the process. We know there was knowledge discovered when we see there was at least one selection made.

In the exercise above we witnessed the discovery and transformation of invisible knowledge into visible tangible outcome.

KEDE calculation

We can calculate the KEDE for this sequence of outcomes.

KEDE = S/N = 27/37 ≈ 0.73

We can also calculate the knowledge discovered H(X|Y) in bits of information.

H(X|Y) = N/S - 1 = 37/27 - 1 ≈ 0.37 bits per symbol

We've turned a real-world sequence of action and hesitation into a fine-grained, time-based measurement of regulatory capacity — effectively measuring how much variety I needed to absorb with external help i.e. my knowledge discovered.
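The same numbers can be recomputed directly from the recorded 0/1 sequence; the short Python fragment below is ours, while the sequence is the one tabulated above:

```python
# Recompute the typing example from the recorded 0/1 outcome sequence.
E = "0011111011111110011101110011001111111"

S = E.count("1")      # symbols typed (prior knowledge applied)
N = len(E)            # total time intervals (selections + outcomes)

print(S, N)               # 27 37
print(round(S / N, 2))    # KEDE = 0.73
print(round(N / S - 1, 2))  # H(X|Y) = 0.37 bits per symbol
```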

Measuring software development

In order to use the KEDE formula (6) in practice we need to know both S and N. We can count the actual number of symbols of source code contributed straight from the source code files. For N we want to use some naturally constrained value.

N is the maximum number of symbols that could be contributed for a time interval by a single human being.

For the formula for N we want to use some naturally constrained value:

N = T × r

where T is the length of the work interval and r is the maximum symbol rate. To achieve this, the following estimation is performed. We pick T = 8 hours of work because that is the standard length of a work day for a software developer.

To calculate the value of r we need to pick the symbol duration t.

The value of the symbol duration time t is determined by two natural constraints:

  1. the maximum typing speed of human beings
  2. the capacity of the cognitive control of the human brain

Typing speed has been subject to considerable research. One of the metrics used for analyzing typing speed is inter-key interval (IKI), which is the difference in timestamps between two keypress events. We see that IKI is defined equal to the symbol duration time t. Hence we can use the research of IKI to find the symbol duration time t. It was found that the average IKI is 0.238s [26]. There are many factors that affect IKI [6]. It was also found that proficient typing is dependent on the ability to view characters in advance of the one currently being typed. The median IKI was 0.101s for typing with unlimited preview and for typing with 8 characters visible to the right of the to-be-typed character but was 0.446s with only 1 character visible prior to each keystroke [7]. Another well-documented finding is that familiar, meaningful material is typed faster than unfamiliar, nonsense material[8]. Another finding that may account for some of the IKI variability is what may be called the “word initiation effect”. If words are stored in memory as integral units, one may expect the latency of the first keystroke in the word to reflect the time required to retrieve the word from memory[55].

Cognitive control, also known as executive function, is a higher-level cognitive process that involves the ability to control and manage other cognitive processes: it permits the selection and prioritization of information processing in different cognitive domains to reach the capacity-limited conscious mind. Cognitive control coordinates thoughts and actions under uncertainty. It is like the "conductor" of the cognitive processes, orchestrating and managing how they work together. Information theory has been applied to cognitive control by studying its capacity in terms of the amount of information that can be processed or manipulated at any given time. Researchers found that the capacity of cognitive control is approximately 3 to 4 bits per second[32][33]. That means cognitive control, as a higher-level function, has a remarkably low capacity.

Based on the above research we get:

  1. Maximum typing speed of human beings: r = 1/t = 1/0.238 ≈ 4.2 symbols per second
  2. Capacity of the cognitive control of the human brain: approximately 3 to 4 bits per second. Since we assume one question equals one bit of information, we get 3 to 4 questions per second.
  3. Asking questions is an effortful task, and humans cannot type at the same time. If a symbol was NOT typed, then a question was asked. That means the question rate equals the symbol rate.
Since the question rate needs to equal the symbol rate, and 4.2 symbols per second exceeds the cognitive-control capacity of 3 to 4 bits per second, we need a symbol rate between 3 and 4 symbols per second.

In order to get a round value of maximum symbol rate N of 100 000 symbols per 8 hours of work we pick symbol duration time t to be 0.288 seconds. That is a bit larger than what the IKI research found but makes sense when we think of 8 hours of typing. Having t of 0.288 seconds makes a symbol rate r of 3.47 symbols per second. That is between 3 and 4 and matches the capacity of the cognitive control of the human brain.

We define CPH as the maximum rate of characters that could be contributed per hour. Since r is 3.47 symbols per second, we get a CPH of 12 500 symbols per hour. We substitute T = h and r = CPH, and the formula for N becomes:

N = h × CPH

where h is the number of working hours in a day and CPH is the maximum number of characters that could be contributed per hour. We define h to be eight hours and get N to be 100 000 symbols per eight hours of work.

Total working time consists of four components:

  • Time spent typing (coding)
  • Time spent figuring out WHAT to develop
  • Time spent figuring out HOW to code the WHAT
  • Time doing something else (NW)

Let us assume an ideal system where the time spent doing something else, TNW, is zero. Using the new formula for N, the formula for H becomes:

H = (h × CPH) / S - 1

Note that N and S must be counted over the same time interval.

We see that the more symbols of source code contributed during a time interval the less missing information was there to be acquired. We want to compare the performance of different software development processes in terms of the efficiency of their knowledge discovery processes. Hence we rearrange the formula to emphasize that.

S / (h × CPH) = 1 / (1 + H)

(7)

The right-hand side is the KEDE we defined earlier. Thus, we define an instance of the general KEDE metric introduced above. This version of KEDE is for the case of knowledge workers who produce tangible outcomes in the form of textual content:

KEDE = S / (h × CPH)

(8)

KEDE from (8) contains only quantities we can measure in practice. KEDE also satisfies all the properties we defined earlier: it has a maximum value of 1 and a minimum value of 0; it equals 0 when H is infinite; it equals 1 when H is zero; and it is anchored on a natural constraint, the maximum typing speed of a human being.

If we convert the KEDE formula into percentages then it becomes:

KEDE = S / (h × CPH) × 100%

(9)

We can use KEDE to compare the knowledge discovery efficiency of software development organizations.
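A minimal sketch of formula (9) in code follows; the constants h and CPH are the ones derived above, while the daily symbol count S is a hypothetical example:

```python
# Sketch of formula (9) for a software team.
CPH = 12_500    # maximum symbols per hour (r = 3.47 symbols/s)
h = 8           # working hours per day, so N = h * CPH = 100_000 symbols

def kede_percent(S_symbols_per_day: int, hours: int = h, cph: int = CPH) -> float:
    return S_symbols_per_day / (hours * cph) * 100.0

S = 9_000                            # symbols of source code contributed in a day (hypothetical)
print(round(kede_percent(S), 1))     # 9.0 (%)
print(round((h * CPH) / S - 1, 1))   # H ≈ 10.1 bits per symbol still to be discovered
```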

Testing Intelligence

Today all measure intelligence by the power of appropriate selection (of the right answers from the wrong). The tests thus use the same operation as is used in the theorem on requisite variety, and must therefore be subject to the same limitation. (D, of course, is here the set of possible questions, and R is the set of all possible answers). Thus what we understand as a man's “intelligence” is subject to the fundamental limitation: it cannot exceed his capacity as a transducer. (To be exact, “capacity” must here be defined on a per-second or a per-question basis, according to the type of test.)[3]

We can also apply our model to the testing of human and AI intelligence. We infer this capacity from performance under variety — i.e., how many different problems a system or a person can solve correctly.

The dominant mathematical models for testing intelligence by the number of answered problems are benchmark datasets like MMLU, GSM8K, MATH, and FrontierMath. These models measure intelligence by the raw count or percentage of correctly solved problems, with more advanced benchmarks designed to minimize guessing and require deep reasoning.

From the knowledge-centric perspective:

  • The disturbances are the questions Q = {q1, q2, q3, ..., qn}
  • The person gives responses R = {r1, r2, r3, ..., rn}
  • The outcomes are E = {e1, e2, e3, ..., en}, with each ei ∈ {0, 1}
So: intelligence is the capacity to consistently produce 1s in E, despite the variety in D.

Several mathematical models and benchmark datasets are used to evaluate intelligence—especially artificial intelligence (AI)—by measuring the number and complexity of math problems answered correctly. These models serve as standardized tests for both AI and, by analogy, human intelligence[52].

Massive Multitask Language Understanding (MMLU):

  • MMLU is a widely used benchmark that tests AI models on a broad range of subjects, including mathematics at various levels (high school, college, abstract algebra, formal logic).
  • The test is typically formatted as multiple-choice questions, and performance is measured by the percentage of correct answers out of the total number of questions
  • For example, advanced AI models have achieved up to 98% accuracy on math sections of MMLU, indicating high proficiency in standard math tasks but not necessarily deep reasoning

Grade School Math 8K (GSM8K)

  • GSM8K is a dataset of 8,500 high-quality, grade school-level word problems designed to test logical reasoning and basic arithmetic skills.
  • Evaluation is based on exact match accuracy: the number of problems answered exactly correctly divided by the total number attempted
  • This benchmark is used to assess step-by-step reasoning and the ability to handle linguistic diversity in problem statements.

MATH (Mathematics Competitions Dataset)

  • MATH consists of problems from high-level math competitions (e.g., AMC 10, AMC 12, AIME), focusing on advanced reasoning rather than rote computation.
  • Performance is measured by the percentage of correct answers, with human experts (e.g., IMO medalists) providing a reference for top-level performance
  • The dataset is challenging for both humans and AI, with LLMs typically scoring much lower than expert humans.

FrontierMath[53]

  • FrontierMath is a new benchmark featuring hundreds of original, expert-level math problems spanning major branches of modern mathematics.
  • Problems are designed to be "guessproof" and require genuine mathematical understanding, with automatic verification of answers
  • The benchmark is used to assess how well AI models can understand and solve complex mathematical problems, similar to human performance.

In human intelligence testing, psychometric models such as IQ tests also use the number of correctly answered problems as a key metric. These tests are standardized, and the raw score (number of correct answers) is often converted into a scaled score or percentile.

As an example we will use the Exact Match metric as the evaluation method[52]. Given that each question in our benchmark dataset has a single correct answer and the model produces a response per query, Exact Match ensures a rigorous evaluation by comparing the extracted answer to the ground truth.

Let ŷi represent the extracted answer from the model's outcome for the ith question, and let yi be the corresponding ground truth answer. The Exact Match accuracy is computed as:

$$\text{Exact Match (\%)} = \frac{\sum_{i=1}^{N} \mathbb{1}\big(\mathrm{normalize}(\hat{y}_i) = \mathrm{normalize}(y_i)\big)}{N} \times 100$$

where:

  • N is the total number of evaluated questions.
  • 𝟙() is the indicator function, returning 1 if the extracted model response matches the ground truth after preprocessing, and 0 otherwise.
  • normalize() is a function that standardizes formatting, trims spaces, and normalizes numerical values.

The knowledge discovery efficiency of an LLM can be calculated as:

Exact Match accuracy = S/N = KEDE
where S is the number of correct answers and N is the total number of evaluated questions.

Let's pick the case of the performance of GPT-4o on the MATH benchmark, which achieved a significantly lower accuracy of 64.88%, lagging behind its peer models[52]. Now, we can calculate the average knowledge discovered H(X|Y).

H(X|Y) = 1/KEDE - 1 = 100/64.88 - 1 ≈ 0.54 bits/problem
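The pipeline from Exact Match to H(X|Y) can be sketched as follows; normalize() here is a simplified stand-in for the benchmark's own normalization step, the toy answers are hypothetical, and the final figure uses the GPT-4o accuracy quoted above:

```python
# Toy Exact Match evaluation and its KEDE reading.
def normalize(ans: str) -> str:
    return ans.strip().lower().replace(",", "")

def exact_match(preds, truths) -> float:
    hits = sum(normalize(p) == normalize(t) for p, t in zip(preds, truths))
    return hits / len(truths)

preds  = ["12", " 3.5 ", "1,000", "7"]
truths = ["12", "3.5",   "1000",  "9"]
print(exact_match(preds, truths))   # 0.75 on this toy set (KEDE)

# GPT-4o on MATH (accuracy from the text): KEDE = 0.6488
print(round(1 / 0.6488 - 1, 2))     # ≈ 0.54 bits per problem
```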

Basketball Game

We can also use this model to assess the performance of a basketball player.

  • Timeframe is a basketball game.
  • We observe N total shot attempts.
  • S of them are successful (shot made).
  • We record a binary outcome sequence
    E ∈ {0, 1}^N
  • The empirical success rate:
    θ = S/N
    is our observed probability of success.

Interpretation using Ashby's Law

The basketball shot is a regulation problem: the player must control their body and respond to the game environment to produce the desired outcome. The player is faced with a series of disturbances (D) in the form of different shots to make under different conditions. The player responds with a selection, drawn from their internal skills (regulatory variety R) in the form of different shooting techniques. Each shot is uncertain whether it will be successful. The outcome E is whether the shot is made (1) or missed (0).

Over N shots, the success rate

θ = S/N
reflects how often the player's internal variety is sufficient to absorb the variety in the environment — an operational measure of regulatory success.

In this case, θ becomes a practical proxy for how often the regulator (player) has sufficient internal variety to absorb the disturbance presented by the game. However, it is important to note that this is a simplified model and does not account for all the complexities of basketball performance. For example, the player may have different success rates depending on the type of shot, the position on the court, or the level of defense. These factors can all affect the player's ability to regulate their performance and should be considered when interpreting the results. Thus, θ is a useful heuristic for P(E = 1), but the full picture includes the quality of the mapping, not just the quantity.

Applying the Model

The NBA keeps track of field goal attempts and makes for each player. The most field goal attempts by a player in a single NBA game is 63, achieved by Wilt Chamberlain during his legendary 100-point game against the New York Knicks on March 2, 1962. We take this as the natural constraint, so N = 63. We can also take the number of successful shots S = 36, which is the most field goals made in a single game by a player[13].

We can calculate the KEDE for this sequence of outcomes.

KEDE = S/N = 36/63 ≈ 0.571

We can also calculate the knowledge discovered H(X|Y) in bits of information.

H(X|Y) = N/S - 1 = 63/36 - 1 = 0.75

That means that the player needed to absorb 0.75 bits of information on average to make the shot.

We've turned a real-world sequence of basketball shots into a fine-grained, time-based measurement of a regulatory capacity — effectively measuring how much variety the player needed to absorb.

We can also use this model to assess the performance of a basketball team. In this case the success rate coincides with the field goal percentage (FG%) of the team which is the percentage proportion of made shots over total shots that a player or a team takes in games. There is a statistical distribution for NBA field goal percentage (FG%) [10]. Analysts and researchers often study the distribution of FG% across players or teams to understand scoring efficiency and trends[11]. The NBA record for the highest FG% in a single game by a team is 69.3%, set by the Los Angeles Clippers on March 13, 1998, when they made 61 of 88 shots[12].

For example, in the 2023-24 season, team FG% ranged from about 43.5% (lowest) to 50.6% (highest), with the league average typically falling in the mid-to-high 40% range[11]. If we take an average FG% of 45%, we can calculate the average knowledge discovered H(X|Y).

H(X|Y) = 1/KEDE - 1 = 1/FG% - 1 = 1/0.45 - 1 ≈ 1.22

That means that a team needed to absorb 1.22 bits of information on average to make a shot.
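For completeness, here is a short numerical check of the basketball figures quoted above (values from the text, code ours):

```python
# Player: most field goals made / attempted in a single game (values from the text).
S, N = 36, 63
print(round(S / N, 3))        # KEDE = 0.571
print(round(N / S - 1, 2))    # H(X|Y) = 0.75 bits per shot

# Team: league-average field goal percentage as KEDE.
FG = 0.45
print(round(1 / FG - 1, 2))   # 1.22 bits per team shot
```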

Assembly Line

We can also use this model to assess the knowledge discovery efficiency of an assembly line.

The assembly line is a system that transforms raw materials into finished products. The assembly line has a set of disturbances (D) in the form of different raw materials, machines, and processes. The assembly line responds with a selection, drawn from its internal structure (R) in the form of different machines, processes, and workers.

From a knowledge-centric perspective, most of the knowledge discovery happens in the design phase of the assembly line. This is the planning for design, fabrication and assembly. This activity has also been called design for manufacturing and assembly (DFM/A) or sometimes predictive engineering. It is essentially the selection of design features and options that promote cost-competitive manufacturing, assembly, and test practices[51]. Thus most of the disturbances D are already absorbed by the design of the assembly line. That means the workers have most of the knowledge built into the assembly line and the operational procedures.

Assembly line efficiency (AE) is the ratio of the outcome to the maximum possible outcome, often expressed as a percentage.

The efficiency of the assembly line can be calculated as:

AE = S/N = KEDE
where S is the actual outcome and N is the maximum possible outcome.

We can assume that an assembly line is designed to produce a certain number of successful products (S) with a maximum rate of N products per hour. So for example, a shoe manufacturer has an actual outcome of 100 shoes per day, and a maximum potential outcome of 120 shoes per day. Their production line efficiency would be 83%. Now, we can calculate the average knowledge discovered H(X|Y).

H(X|Y) = 1/KEDE - 1 = 1/AE - 1 = 1/0.83 - 1 ≈ 0.2 bits/shoe

To optimize the AE, companies can apply DFA guidelines, such as minimizing the number and variety of parts, standardizing the fasteners and connectors, and simplifying the assembly sequence and orientation[51].

Interpreting the results involves a comprehensive analysis of the data to understand where and why inefficiencies occur. In general, the higher the AE, the better the design. On the other hand, AE close to 100% might indicate under-utilised capacity. It's essential to compare high efficiency with industry capacity standards to determine if an increase in production is feasible and beneficial.

If AE is consistently below industry benchmarks, this could highlight several potential issues:

  • Machinery: It may indicate that machines are outdated, malfunctioning, or not suitable for the required tasks.
  • Labour Skills: Low efficiency might be due to workforce training gaps.
  • Process Design: Sometimes, the workflow or layout of the production line itself causes inefficiencies.

Speed of Light in Medium

We can also use this model, via an interpretation of Ashby's Law of Requisite Variety, to assess the speed of light in a medium, where the medium acts as a disturbance to photon flow. Here is how this perspective aligns with the physics of light-matter interactions:

  • Disturbance: The medium's atomic/molecular structure introduces spatial and electromagnetic inhomogeneities (e.g., refractive index variations, turbulence).
  • Control Mechanism: Photons' ability to "counteract" disturbances through wavelength compression and phase synchronization.
  • Requisite Variety: Photons require sufficient adaptability (e.g., frequency range, polarization states) to navigate the medium's complexity without scattering or losing coherence.

The speed of light in a vacuum is 299,792,458 m/s. In a medium, the speed of light is reduced by a factor n, called the refractive index defined as:

n = c/v
where c is the speed of light in vacuum and v is the speed of light in the medium.

The refractive index is a measure of how much the speed of light is reduced in the medium. The higher the refractive index, the more the speed of light is reduced.

For example, the refractive index of water is 1.33, which means that the speed of light in water is:

v = 299,792,458 / 1.33 ≈ 225,000,000 m/s

The knowledge discovery efficiency of the speed of light in a medium can be calculated as:

KEDE = v/c = 1/n
where v is the actual speed of light in the medium and c is the maximum possible speed of light in vacuum.

Now, we can calculate the average knowledge discovered H(X|Y) by a photon in water:

H(X|Y) = 1/KEDE - 1 = n - 1 = 1.33 - 1 = 0.33 bits/photon

Appendix

What learning could also do (but we are explicitly excluding)

Not every form of learning improves regulation H(E|Y) in Ashby's sense. Other possibilities include:

  1. Expanding action variety without selectivity

    Learning might increase H(X) (more possible actions, tools, behaviors) without reducing H(X|Y).

    • The system becomes more capable in principle
    • But still does not know which action to take
    • Regulation does not improve

    This violates Ashby's requirement that variety must be constrained, not merely expanded.

  2. Improving buffering instead of knowledge

    Learning might increase buffering capacity q (delay, slack, tolerance), so disturbances are absorbed without better action selection.

    • Outcomes may improve
    • But I(X:Y) does not increase
    • Regulation improves without learning the mapping

    This is explicitly separated from knowledge in Ashby's extended formulation.

  3. Changing goals or success criteria

    Learning could redefine what counts as success E.

    • Apparent performance improves
    • But the structural coupling (mapping) is unchanged
    • Information-theoretically, nothing about H(X|Y) need change

    This is semantic drift, not cybernetic learning.

  4. One-off adaptation without structural retention

    The system may succeed through exploration Z without storing the result.

    • Regulation succeeds this time
    • Next encounter repeats the same uncertainty
    • No accumulation of I(X:Y)

    This is regulation, not learning.

Cumulative Knowledge To Be Discovered

Using H(S) = N/S - 1 from (5) with constant N, the cumulative residual variety C as a function of performance level S has a clean closed form.

Cumulative w.r.t. S
Choose a baseline S0 > 0.
Define:

$$C(S; S_0) = \int_{S_0}^{S} H(u)\,du = \int_{S_0}^{S}\left(\frac{N}{u} - 1\right)du = N \ln\frac{S}{S_0} - (S - S_0)$$

Key properties:

  • dC/dS = H(S) = N/S - 1
  • d²C/dS² = -N/S² < 0, so C is concave in S.

Domain: S ∈ (0, N]. Since H(S) > 0 for S < N, C(S; S0) increases with S (for S ≥ S0) and is finite as long as S0 > 0.

Useful normalizations:
Dimensionless form with Ŝ = S/N: C(Ŝ; Ŝ0)/N = ln(Ŝ/Ŝ0) - (Ŝ - Ŝ0).

Total cumulative up to completion S = N: C(N; S0) = N ln(N/S0) - (N - S0).

This can be thought of as the total knowledge-effort curve or “cumulative residual variety as a function of performance level” i.e. how much “knowledge work” has been consumed to reach performance level S.
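A small numerical check of the closed form is sketched below; the function names and the example values of N, S0, and S are ours:

```python
# Compare the closed form C(S; S0) = N ln(S/S0) - (S - S0) against a direct
# numerical integration of H(u) = N/u - 1 (midpoint rule).
import math

def H(u, N):
    return N / u - 1.0

def C(S, S0, N):
    return N * math.log(S / S0) - (S - S0)

def C_numeric(S, S0, N, steps=100_000):
    du = (S - S0) / steps
    return sum(H(S0 + (k + 0.5) * du, N) * du for k in range(steps))

N, S0, S = 100.0, 10.0, 80.0
print(round(C(S, S0, N), 3))          # ≈ 137.944
print(round(C_numeric(S, S0, N), 3))  # ≈ 137.944
```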

Fig. 2: Total Knowledge-Effort Curve. Here we see the cumulative residual variety as a function of performance level.

  • Blue curve: instantaneous H(S) = N/S - 1 (residual variety ratio).
  • Green dashed curve: cumulative residual variety C(S), accumulating uncertainty over a growing performance level S.
Each point on the curve says: “At performance level S, there are H(S) bits of uncertainty to be eliminated for perfect regulation.” We can see how H(S) declines hyperbolically, while C(S) rises concavely.

How to cite:

Bakardzhiev D.V. (2025) Knowledge Discovery Efficiency (KEDE) and Ashby's Law https://docs.kedehub.io/knowledge-centric-research/kede-ashbys-law.html

Works Cited

1. Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal. 1948;27(3):379-423. doi:10.1002/j.1538-7305.1948.tb01338.x

2. Ashby, W.R. (1956). An Introduction to Cybernetics; Chapman & Hall,

3. Ashby, W. R. (2011). Variety, Constraint, And The Law Of Requisite Variety. 13, 18.

4. MacKay, D. M. (1950). Quantal aspects of scientific information. Philosophical Magazine, 41, 289–311.

5. Cover, T. M. and Thomas, J. A. (1991), Elements of Information Theory, John Wiley and Sons, New York. page.95 in 5.7 SOME COMMENTS ON HUFFMAN CODES

6. Wheeler, J. A. (1990). Information, physics, quantum: The search for links. In W. H. Zurek (Ed.), Complexity, entropy, and the physics of information (Vol. 8, pp. 3–28). Taylor & Francis.

7. Yaneer Bar-Yam. (2004). Multiscale variety in complex systems. Complexity, 9(4), 37–45.

8. Ashby, W.R. (1991). Requisite Variety and Its Implications for the Control of Complex Systems. In: Facets of Systems Science. International Federation for Systems Research International Series on Systems Science and Engineering, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-0718-9_28

9. Shannon, C. E. Communication theory of secrecy systems. Bell System technical Journal, 28, 656-715, 1949

10. Kubatko J, Oliver D, Pelton K, et al. A starting point for analyzing basketball statistics. J Quant Anal Sports 2007; 3: 1–22.

11. Sports Reference LLC. "NBA League Averages." Basketball-Reference.com - Basketball Statistics and History. https://www.basketball-reference.com/leagues/NBA_stats_totals.html.

12. Bucks post highest single-game field-goal percentage by any team in 21st century https://sports.yahoo.com/article/bucks-post-highest-single-game-040313061.html

13. https://www.statmuse.com/nba/ask/most-field-goals-made-record-in-a-game-nba-player

14. Lewis, G. J., & Stewart, N. (2003). The measurement of environmental performance: an application of Ashby's law. Systems Research and Behavioral Science, 20(1), 31–52. https://doi.org/10.1002/sres.524

15. Norman, J., & Bar-Yam, Y. (2018). Special Operations Forces: A Global Immune System? In Springer Unifying Themes in Complex Systems IX (pp. 486–498). Springer International Publishing. https://doi.org/10.1007/978-3-319-96661-8_50

16. Norman, J., & Bar-Yam, Y. (2019). Special Operations Forces as a Global Immune System. In Springer Evolution, Development and Complexity (pp. 367–379). Springer International Publishing. https://doi.org/10.1007/978-3-030-00075-2_16

17. O'Grady, W., Morlidge, S., & Rouse, P. (2014). Management Control Systems: A Variety Engineering Perspective. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2351099

18. Love, T., & Cooper, T. (2007). Digital Eco-systems Pre-Design: Variety Analyses, System Viability and Tacit System Control Mechanisms. 2007 Inaugural IEEE-IES Digital EcoSystems and Technologies Conference, 452–457. https://doi.org/10.1109/dest.2007.372013

19. Love, T., & Cooper, T. (2007). Complex built‐environment design: four extensions to Ashby. Kybernetes, 36(9/10), 1422–1435. https://doi.org/10.1108/03684920710827391

20. Bushey, D. B., & Nissen, M. E. (1999). A Systematic Approach to Prioritizing Weapon System Requirements and Military Operations Through Requisite Variety. Defense Technical Information Center. https://doi.org/10.21236/ada371943

21. Jones, H. P. (2018). Evolutionary stakeholder discovery: requisite system sampling for co-creation.

22. Grimm, D. A. P., Gorman, J. C., Robinson, E., & Winner, J. (2022). Measuring Adaptive Team Coordination in an Enroute Care Training Scenario. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 66(1), 50–54. https://doi.org/10.1177/1071181322661074

23. Becker Bertoni, V., Abreu Saurin, T., & Sanson Fogliatto, F. (2022). Law of requisite variety in practice: Assessing the match between risk and actors' contribution to resilient performance. Safety Science, 155, 105895. https://doi.org/10.1016/j.ssci.2022.105895

24. Tworek, K., Walecka-Jankowska, K., & Zgrzywa-Ziemak, A. (2019). Towards organisational simplexity — a simple structure in a complex environment. Engineering Management in Production and Services, 11(4), 43–53. https://doi.org/10.2478/emj-2019-0032

25. Chester, M. V., & Allenby, B. (2022). Infrastructure autopoiesis: requisite variety to engage complexity. Environmental Research: Infrastructure and Sustainability, 2(1), 012001. https://doi.org/10.1088/2634-4505/ac4b48

26. van der Hoek, M., Beerkens, M., & Groeneveld, S. (2021). Matching leadership to circumstances? A vignette study of leadership behavior adaptation in an ambiguous context. International Public Management Journal, 24(3), 394–417. https://doi.org/10.1080/10967494.2021.1887017

27. Ulrik, S., & Isabella, A. (2023). Variety versus speed: how variety in competence within teams may affect performance in a dynamic decision-making task.

28. Bakardzhiev, D., Vitanov, N.K. (2025). KEDE (KnowledgE Discovery Efficiency): A Measure for Quantification of the Productivity of Knowledge Workers. In: Georgiev, I., Kostadinov, H., Lilkova, E. (eds) Advanced Computing in Industrial Mathematics. BGSIAM 2022. Studies in Computational Intelligence, vol 641. Springer, Cham. https://doi.org/10.1007/978-3-031-76786-9_3

29. Heylighen, F., & Joslyn, C. (2001). Cybernetics and Second Order Cybernetics. In R. A. Meyers (Ed.), Encyclopedia of Physical Science and Technology, Eighteen-Volume Set, Third Edition (pp. 155-170). Academia Press. http://pespmc1.vub.ac.be/Papers/Cybernetics-EPST.pdf

30. Schwaninger, M., & Ott, S. (2024). What is variety engineering and why do we need it? Systems Research and Behavioral Science, 41(2), 235–246. https://doi.org/10.1002/sres.2964

31. AULIN‐AHMAVAARA, A.Y. (1979), "THE LAW OF REQUISITE HIERARCHY", Kybernetes, Vol. 8 No. 4, pp. 259-266. https://doi.org/10.1108/eb005528

32. Wu, T., Dufford, A. J., Mackie, M. A., Egan, L. J., & Fan, J. (2016). The Capacity of Cognitive Control Estimated from a Perceptual Decision Making Task. Scientific Reports, 6, 34025.

33. Abuhamdeh, S. (2020). Investigating the “Flow” Experience: Key Conceptual and Operational Issues. Front. Psychol. 11:158. doi: 10.3389/fpsyg.2020.00158

34. Automatic Screw Tightening Machine and Its Hidden Features

35. Keating, C. B., Katina, P. F., Jaradat, R., Bradley, J. M., & Hodge, R. (2019). Framework for improving complex system performance. INCOSE International Symposium, 29(1), 1218-1232. https://doi.org/10.1002/j.2334-5837.2019.00664.x

36. S. Engell (1985). An information-theoretical approach to regulation.

37. K. Kijima, Y. Takahara, B. Nakano (1986). ALGEBRAIC FORMULATION OF RELATIONSHIP BETWEEN A GOAL SEEKING SYSTEM AND ITS ENVIRONMENT.

38. W. Kickert, J. Bertrand, J. Praagman (1978). Some Comments on Cybernetics and Control. IEEE Transactions on Systems, Man and Cybernetics.

39. S. Engell (1985). Information-theoretical bounds for regulation accuracy. IEEE Conference on Decision and Control.

40. Hui Zhang, Youxian Sun (2003). Bode integrals and laws of variety in linear control systems. Proceedings of the 2003 American Control Conference, 2003.

41. R. Conant (1969). The Information Transfer Required in Regulatory Processes. IEEE Transactions on Systems Science and Cybernetics.

42. S. Engell (1987). Analysis of Regulation Problems based on Real-Time Rate-Distortion Theory. American Control Conference.

43. Hui Zhang, Youxian Sun (2003). Information theoretic limit and bound of disturbance rejection in LTI systems: Shannon entropy and H/sub /spl infin// entropy. SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483).

44. N. C. Martins, M. Dahleh (2008). Feedback Control in the Presence of Noisy Channels: “Bode-Like” Fundamental Limitations of Performance. IEEE Transactions on Automatic Control.

45. Hui Zhang, Youxian Sun (2003). H/sub /spl infin// entropy and the law of requisite variety. 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).

46. Tsuji, M., Crookshank, M., Olsen, M., Schemitsch, E. H., & Zdero, R. (2013). The biomechanical effect of artificial and human bone density on stopping and stripping torque during screw insertion. Journal of the mechanical behavior of biomedical materials, 22, 146–156. https://doi.org/10.1016/j.jmbbm.2013.03.006

47. Akizuki, K., & Ohashi, Y. (2015). Measurement of functional task difficulty during motor learning: What level of difficulty corresponds to the optimal challenge point? Human movement science, 43, 107–117. https://doi.org/10.1016/j.humov.2015.07.007

48. Bootsma, J. M., Hortobágyi, T., Rothwell, J. C., & Caljouw, S. R. (2018). The Role of Task Difficulty in Learning a Visuomotor Skill. Medicine and science in sports and exercise, 50(9), 1842–1849. https://doi.org/10.1249/MSS.0000000000001635

49. Akizuki, K., & Ohashi, Y. (2013). Changes in practice schedule and functional task difficulty: a study using the probe reaction time technique. Journal of physical therapy science, 25(7), 827–831. https://doi.org/10.1589/jpts.25.827

50. Goldhammer, F.; Naumann, J.; Stelter, A.; Tóth, K.; Rölke, H.; Klieme, E.: The time on task effect in reading and problem solving is moderated by task difficulty and skill. Insights from a computer-based large-scale assessment - In: The Journal of educational psychology 106 (2014) 3, S. 608-626 - URN: urn:nbn:de:0111-pedocs-179679 - DOI: 10.25656/01:17967; 10.1037/a0034716

51. Boothroyd, G., and P. Dewhurst, "DESIGN FOR ASSEMBLY", Dept. of Mechanical Engineering, University of Massachusetts, Amherst, Massachusetts, 1983.

52. Jahin, A., Zidan, A. H., Bao, Y., Liang, S., Liu, T., & Zhang, W. (2025). Unveiling the mathematical reasoning in deepseek models: A comparative study of large language models. arXiv preprint arXiv:2503.10573.

53. FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

54. Francis Heylighen. Cybernetic Principles of Aging and Rejuvenation: The Buffering-Challenging Strategy for Life Extension. Current Aging Science, Volume 7, Issue 1, 2014. DOI: 10.2174/1874609807666140521095925

55. Ostry, D. J. (1980). Execution-time movement control. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 457-468). Amsterdam: North-Holland.

56. Siegenfeld, A. F., & Bar-Yam, Y. (2025). A Formal Definition of Scale-Dependent Complexity and the Multi-Scale Law of Requisite Variety. Entropy, 27(8), 835. https://doi.org/10.3390/e27080835

57. Umpleby, S. A. (2009). Ross Ashby's general theory of adaptive systems. International Journal of General Systems, 38(2), 231–238. https://doi.org/10.1080/03081070802601509
