Association measure (entropy)
Uncertainty Coefficient Calculator — Theil's U
Theil's U (the uncertainty coefficient) is an entropy-based measure of how much knowing one categorical variable reduces uncertainty about another. It ranges from 0 (no association) to 1 (one variable completely determines the other).
Reviewed by the crosstabs.com methods team · Last updated
Run this on your own data — free, no signup
Upload a CSV or XLSX. Everything runs in your browser; your file never leaves your device.
Open the workspace →
What is the uncertainty coefficient?
The uncertainty coefficient comes from information theory. Each categorical variable has an entropy — a measure of how uncertain you are about its value before you look. When two variables are associated, learning one of them shrinks the uncertainty about the other; that shared information is the mutual information between them.
Theil's U expresses that reduction as a proportion: it divides the mutual information by the entropy of the variable you are predicting. The result is a clean 0-to-1 score that reads like “the fraction of uncertainty removed.” Unlike chi-square-based measures, it is directional, so U(X|Y) and U(Y|X) can differ.
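To make the idea of entropy concrete, here is a minimal Python sketch. The function name `entropy` and the example distributions are illustrative assumptions, not part of this page's calculator. Natural logarithms are used; the choice of base does not affect Theil's U, because U is a ratio of two entropies.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability categories contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Illustrative distributions (made up for this sketch).
fair_coin = entropy([0.5, 0.5])   # two equally likely categories: highest uncertainty
skewed = entropy([0.9, 0.1])      # one category dominates: lower uncertainty
certain = entropy([1.0])          # only one possible value: no uncertainty

print(fair_coin, skewed, certain)
```

The more evenly spread the categories, the higher the entropy, and the more room there is for a second variable to reduce it.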
Formula
Definition
U(X|Y) = ( H(X) − H(X|Y) ) / H(X) = I(X;Y) / H(X)
- H(X) = the entropy of X (uncertainty about X on its own)
- H(X|Y) = the conditional entropy of X given Y
- I(X;Y) = the mutual information shared by X and Y
- symmetric U = averages both directions, U(X|Y) and U(Y|X), into one figure
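For reference, the quantities in the definition can be written out in terms of proportions from the contingency table. The notation p(x, y) for a cell proportion and p(x), p(y) for the marginal proportions is introduced here for illustration. The symmetric form shown is the conventional entropy-weighted average of the two directions; this page does not state which averaging its calculator uses, so treat that line as an assumption.

```latex
\begin{aligned}
H(X) &= -\sum_{x} p(x)\,\ln p(x) \\
H(X \mid Y) &= -\sum_{x,y} p(x,y)\,\ln \frac{p(x,y)}{p(y)} \\
I(X;Y) &= H(X) - H(X \mid Y) = \sum_{x,y} p(x,y)\,\ln \frac{p(x,y)}{p(x)\,p(y)} \\
U_{\text{sym}} &= \frac{H(X)\,U(X \mid Y) + H(Y)\,U(Y \mid X)}{H(X) + H(Y)} = \frac{2\,I(X;Y)}{H(X) + H(Y)}
\end{aligned}
```

When H(X) = H(Y), the weighted and the simple average of the two directions coincide.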
Worked example
Take this 3×3 contingency table of two nominal variables:
[[20, 10, 5], [10, 20, 10], [5, 10, 20]]
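Compute the entropy of each variable and their conditional entropies, combine them into the mutual information, then divide by the relevant entropy. Here both sets of marginal totals are 35, 40, 35 (N = 110), so with natural logarithms H(X) = H(Y) ≈ 1.0966 and H(X|Y) = H(Y|X) ≈ 0.9863, which gives I(X;Y) ≈ 1.0966 − 0.9863 = 0.1103 and U(X|Y) = U(Y|X) ≈ 0.1103 / 1.0966. Because the two directions are equal for this table, averaging them gives the same symmetric uncertainty coefficient:
U = 0.101 (≈ 0.10).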
In plain terms, knowing one of these variables reduces uncertainty about the other by about 10% — a modest association.
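The calculation above can be reproduced with a short Python sketch. The variable and function names are illustrative, and the symmetric figure uses the entropy-weighted form 2·I / (H(X) + H(Y)), which is an assumption about the averaging; for this table it equals the simple average because both directions agree.

```python
import math

def entropy(counts):
    """Shannon entropy (nats) of the distribution implied by a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

table = [[20, 10, 5], [10, 20, 10], [5, 10, 20]]

row_totals = [sum(row) for row in table]          # marginal counts of the row variable
col_totals = [sum(col) for col in zip(*table)]    # marginal counts of the column variable
cells = [c for row in table for c in row]         # joint counts, flattened

h_rows = entropy(row_totals)
h_cols = entropy(col_totals)
h_joint = entropy(cells)
mutual_info = h_rows + h_cols - h_joint           # I(X;Y) = H(X) + H(Y) - H(X,Y)

u_rows_given_cols = mutual_info / h_rows          # U(rows | columns)
u_cols_given_rows = mutual_info / h_cols          # U(columns | rows)
u_symmetric = 2 * mutual_info / (h_rows + h_cols)

print(round(u_symmetric, 3))  # 0.101
```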
When to use it
Use it when
- Both variables are nominal (unordered categories).
- You want an information-theoretic, prediction-style measure of association.
- You need an asymmetric, directional measure — how much one variable tells you about the other.
Not the right tool when
- The categories have a meaningful order — use an ordinal measure like gamma, Kendall's tau, or Somers' d instead.
- You specifically want a chi-square-based effect size — use Cramér's V.
How to interpret it
Rule of thumb
0 means the variables are independent; 1 means one fully determines the other. For example, U = 0.10 means about a 10% reduction in uncertainty about one variable once you know the other.
Frequently asked questions
- Theil's U vs Cramér's V?
- Both measure association between nominal variables on a 0-to-1 scale, but from different foundations. Theil's U is information-theoretic: it reports the fraction of entropy (uncertainty) removed and is directional by default. Cramér's V is chi-square-based and always symmetric. Use Theil's U for a prediction-style interpretation, and Cramér's V when you want an effect size tied to a chi-square test.
- Is the uncertainty coefficient symmetric?
- Not by default. The basic uncertainty coefficient is asymmetric: U(X|Y), the share of X's uncertainty explained by Y, generally differs from U(Y|X). A symmetric version exists that averages the two directions into a single figure when you do not need a direction.
- What does an uncertainty coefficient of 0.1 mean?
- An uncertainty coefficient of about 0.1 means knowing one variable reduces uncertainty about the other by roughly 10%, a weak-to-modest association. It is above 0 (independence) but far short of 1 (one variable fully determining the other).
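The asymmetry described in the FAQ can be seen on a small made-up table. In this sketch (hypothetical data and illustrative function names, not the page's calculator), the column variable has four categories that each fall entirely within one of the two row categories, so the columns fully determine the rows but the rows only narrow the columns down to two options.

```python
import math

def entropy(counts):
    """Shannon entropy (nats) of the distribution implied by a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def theils_u(table):
    """Return (U(rows|cols), U(cols|rows)) for a table of counts.

    Assumes neither variable is constant; a zero entropy would divide by zero.
    """
    h_rows = entropy([sum(row) for row in table])
    h_cols = entropy([sum(col) for col in zip(*table)])
    h_joint = entropy([c for row in table for c in row])
    mutual_info = h_rows + h_cols - h_joint
    return mutual_info / h_rows, mutual_info / h_cols

# Hypothetical 2x4 table: each column category occurs in exactly one row category.
table = [[10, 10, 0, 0],
         [0, 0, 10, 10]]

u_rows_given_cols, u_cols_given_rows = theils_u(table)
print(u_rows_given_cols, u_cols_given_rows)
```

Here U(rows|columns) comes out at 1 (up to floating-point rounding) while U(columns|rows) is 0.5, which is why it matters to state which variable is being predicted.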
References & further reading
- Theil, H. (1970). On the Estimation of Relationships Involving Qualitative Variables. American Journal of Sociology.
- Uncertainty coefficient — Wikipedia