In the English language, the probability of encountering the $r$th most common word is given roughly by
$$P(r) = \frac{0.1}{r}$$
for $r$ up to 1000 or so. The law breaks down for less frequent words, since the Harmonic Series diverges and the probabilities would eventually sum to more than 1.
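The breakdown can be checked numerically. The following is a minimal sketch, assuming the form $P(r) = 0.1/r$ holds at every rank; the function names are hypothetical and chosen only for illustration. It sums the probabilities rank by rank and reports where the running total first exceeds 1.

```python
def zipf_probability(rank, scale=0.1):
    """Approximate probability of the word at `rank` (1 = most common).

    Assumes the form P(r) = scale / r at every rank.
    """
    return scale / rank


def cumulative_probability(max_rank, scale=0.1):
    """Sum of P(r) over ranks 1..max_rank."""
    return sum(zipf_probability(r, scale) for r in range(1, max_rank + 1))


def first_rank_exceeding(limit=1.0, scale=0.1):
    """Smallest rank at which the running sum of P(r) exceeds `limit`."""
    total = 0.0
    rank = 0
    while total <= limit:
        rank += 1
        total += zipf_probability(rank, scale)
    return rank


if __name__ == "__main__":
    print(cumulative_probability(1000))   # still well below 1
    print(first_rank_exceeding())         # rank where the total passes 1
```

Under this assumption the running total passes 1 at rank 12367, so the formula cannot describe a vocabulary larger than that without assigning more than the whole probability mass.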
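Pierce's (1980, p. 87) statement that $\sum_{k=1}^{r} P(k) > 1$ for $r = 8727$ is incorrect: with $P(k) = 0.1/k$, the sum at that rank is only about 0.97. Goetz states the law as follows:
The frequency of a word is inversely proportional to its Rank $r$ such that
$$P(r) \approx \frac{1}{r \ln(1.78 R)},$$
where $R$ is the number of different words.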
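A normalization of this kind can be sanity-checked numerically. The sketch below assumes the form $P(r) \approx 1/(r \ln(1.78 R))$ for a vocabulary of $R$ different words; the function names are hypothetical. The constant 1.78 is close to $e^{\gamma}$, where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant, so $\ln(1.78 R)$ approximates the $R$th harmonic number and the probabilities should sum to roughly 1.

```python
import math


def goetz_probability(rank, vocabulary_size):
    """P(r) ~ 1 / (r * ln(1.78 * R)); assumed form, with R = vocabulary_size."""
    return 1.0 / (rank * math.log(1.78 * vocabulary_size))


def total_probability(vocabulary_size):
    """Sum of P(r) over every rank 1..R in the vocabulary."""
    return sum(
        goetz_probability(r, vocabulary_size)
        for r in range(1, vocabulary_size + 1)
    )


if __name__ == "__main__":
    for size in (100, 1000, 10000):
        # The total should approach 1 as the vocabulary grows.
        print(size, total_probability(size))
```

Unlike the fixed constant 0.1, the factor $1/\ln(1.78 R)$ shrinks as the vocabulary grows, which keeps the total near 1 for any $R$.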
See also Harmonic Series, Rank (Statistics)
References
Goetz, P. "Phil's Good Enough Complexity Dictionary."
http://www.cs.buffalo.edu/~goetz/dict.html.
Pierce, J. R. Introduction to Information Theory: Symbols, Signals, and Noise, 2nd rev. ed. New York: Dover,
pp. 86-87 and 238-239, 1980.