Association measure (entropy)

Uncertainty Coefficient Calculator — Theil's U

Theil's U (the uncertainty coefficient) is an entropy-based measure of how much knowing one categorical variable reduces uncertainty about another. It ranges from 0 (no association) to 1 (one variable completely determines the other).

Reviewed by the crosstabs.com methods team · Last updated

Run this on your own data — free, no signup

Upload a CSV or XLSX. Everything runs in your browser; your file never leaves your device.

Open the workspace →

What is the uncertainty coefficient?

The uncertainty coefficient comes from information theory. Each categorical variable has an entropy — a measure of how uncertain you are about its value before you look. When two variables are associated, learning one of them shrinks the uncertainty about the other; that shared information is the mutual information between them.

Theil's U expresses that reduction as a proportion: it divides the mutual information by the entropy of the variable you are predicting. The result is a clean 0-to-1 score that reads like “the fraction of uncertainty removed.” Unlike chi-square-based measures, it is directional, so U(X|Y) and U(Y|X) can differ.

Formula

Definition

U(X|Y) = ( H(X) − H(X|Y) ) / H(X) = I(X;Y) / H(X)

H(X) = the entropy of X (uncertainty about X on its own)
H(X|Y) = the conditional entropy of X given Y
I(X;Y) = the mutual information shared by X and Y
symmetric U = combines both directions, U(X|Y) and U(Y|X), into one figure; commonly defined as the entropy-weighted average 2·I(X;Y) / (H(X) + H(Y))
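
The definition can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own implementation: the function names `entropy` and `uncertainty_coefficient` are hypothetical, rows are assumed to hold X and columns Y, and returning 0 when H(X) is zero is a convention chosen here. Natural logarithms are used, but any base gives the same U because the coefficient is a ratio.

```python
import math

def entropy(counts):
    """Shannon entropy (in nats) of a collection of non-negative counts."""
    total = sum(counts)
    # Empty cells contribute nothing, so they are skipped to avoid log(0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def uncertainty_coefficient(table):
    """Return U(X|Y) for a contingency table given as a list of rows.

    Assumption: rows are the categories of X (the variable being
    predicted) and columns are the categories of Y (the predictor).
    """
    h_x = entropy([sum(row) for row in table])           # H(X) from row totals
    h_y = entropy([sum(col) for col in zip(*table)])     # H(Y) from column totals
    h_xy = entropy([c for row in table for c in row])    # joint entropy H(X,Y)
    h_x_given_y = h_xy - h_y                             # H(X|Y) = H(X,Y) - H(Y)
    if h_x == 0:
        return 0.0  # convention assumed here: X is constant, nothing to explain
    return (h_x - h_x_given_y) / h_x

# Y fully determines X, so all uncertainty is removed.
print(uncertainty_coefficient([[10, 0], [0, 10]]))
# X and Y are independent, so almost exactly zero is returned.
print(uncertainty_coefficient([[10, 10], [10, 10]]))
```

To get U(Y|X) from the same function, pass the transposed table, for example `[list(col) for col in zip(*table)]`.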

Worked example

Take this 3×3 contingency table of two nominal variables:

[[20, 10, 5], [10, 20, 10], [5, 10, 20]]

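Compute the entropy of each variable and their conditional entropies, combine them into the mutual information, then divide by the relevant entropy. Here N = 110 and the row and column totals are both (35, 40, 35), so H(X) = H(Y) ≈ 1.0966 nats. The joint entropy is H(X,Y) ≈ 2.0828 nats, which gives I(X;Y) = H(X) + H(Y) − H(X,Y) ≈ 0.1103 nats. Because the table is symmetric, U(X|Y) and U(Y|X) are equal, and combining both directions gives the same value for the symmetric uncertainty coefficient: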

U = 0.101 (≈ 0.10).

In plain terms, knowing one of these variables reduces uncertainty about the other by about 10% — a modest association.
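
The steps of this worked example can be reproduced with a short script. It is a sketch under stated assumptions: rows are treated as X and columns as Y, the helper name `entropy` is illustrative, and the symmetric figure uses the entropy-weighted form 2·I(X;Y) / (H(X) + H(Y)), which coincides with a plain average here because the two directions are equal.

```python
import math

# Contingency table from the worked example: rows are X, columns are Y.
table = [[20, 10, 5],
         [10, 20, 10],
         [5, 10, 20]]

def entropy(counts):
    """Shannon entropy (in nats) of a collection of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

h_x = entropy([sum(row) for row in table])          # row totals: 35, 40, 35
h_y = entropy([sum(col) for col in zip(*table)])    # column totals: 35, 40, 35
h_xy = entropy([c for row in table for c in row])   # joint entropy over all 9 cells

mutual_info = h_x + h_y - h_xy                      # I(X;Y)
u_x_given_y = mutual_info / h_x                     # U(X|Y)
u_y_given_x = mutual_info / h_y                     # U(Y|X)
u_symmetric = 2 * mutual_info / (h_x + h_y)         # entropy-weighted combination

print(f"H(X) = {h_x:.4f}, H(Y) = {h_y:.4f}, H(X,Y) = {h_xy:.4f}")
print(f"I(X;Y) = {mutual_info:.4f}")
print(f"U(X|Y) = {u_x_given_y:.3f}, U(Y|X) = {u_y_given_x:.3f}, "
      f"symmetric U = {u_symmetric:.3f}")
```

All three coefficients on the last line round to 0.101, matching the value quoted above.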

When to use it

Use it when

  • Both variables are nominal (unordered categories).
  • You want an information-theoretic, prediction-style measure of association.
  • You need an asymmetric, directional measure — how much one variable tells you about the other.

Not the right tool when

  • The categories have a meaningful order — use an ordinal measure like gamma, Kendall's tau, or Somers' d instead.
  • You specifically want a chi-square-based effect size — use Cramér's V.

How to interpret it

Rule of thumb

0 means the variables are independent; 1 means one fully determines the other. For example, U = 0.10 means about a 10% reduction in uncertainty about one variable once you know the other.

Frequently asked questions

Theil's U vs Cramér's V?
Both measure association between nominal variables on a 0-to-1 scale, but from different foundations. Theil's U is information-theoretic: it reports the fraction of entropy (uncertainty) removed and is directional by default, with a symmetric version available. Cramér's V is chi-square-based and always symmetric. Use Theil's U for a prediction-style interpretation, and Cramér's V when you want an effect size tied to a chi-square test.
Is the uncertainty coefficient symmetric?
Not by default. The basic uncertainty coefficient is asymmetric: U(X|Y), the share of X's uncertainty explained by Y, generally differs from U(Y|X). A symmetric version exists that averages the two directions into a single figure when you do not need a direction.
What does an uncertainty coefficient of 0.1 mean?
An uncertainty coefficient of about 0.1 means knowing one variable reduces uncertainty about the other by roughly 10%, a weak-to-modest association. It is above 0 (independence) but well short of 1 (one variable fully determining the other).
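
To see the asymmetry concretely, the sketch below uses a made-up table in which every category of Y falls into exactly one category of X, while each category of X is split evenly across two categories of Y. The table, the function name `theils_u`, and the entropy-weighted symmetric form are assumptions for illustration only. The code also assumes each variable has at least two observed categories, so neither entropy is zero.

```python
import math

def entropy(counts):
    """Shannon entropy (in nats) of a collection of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def theils_u(table):
    """Return (U(X|Y), U(Y|X), symmetric U) with rows = X and columns = Y."""
    h_x = entropy([sum(row) for row in table])
    h_y = entropy([sum(col) for col in zip(*table)])
    h_xy = entropy([c for row in table for c in row])
    mutual_info = h_x + h_y - h_xy
    return (mutual_info / h_x,
            mutual_info / h_y,
            2 * mutual_info / (h_x + h_y))

# Hypothetical data: X has 2 categories (rows), Y has 4 (columns).
# Knowing Y pins down X exactly; knowing X only narrows Y to two options.
table = [[10, 10, 0, 0],
         [0, 0, 10, 10]]

u_x_given_y, u_y_given_x, u_symmetric = theils_u(table)
print(f"U(X|Y) = {u_x_given_y:.3f}")   # about 1.000: Y removes all uncertainty about X
print(f"U(Y|X) = {u_y_given_x:.3f}")   # about 0.500: X removes half the uncertainty about Y
print(f"symmetric U = {u_symmetric:.3f}")  # about 0.667
```

Here the symmetric value (about 0.667) sits between the two directional values but is not their simple mean (0.75), because the entropy-weighted form gives more weight to the variable with higher entropy.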
