Introduction to the Mathematical and Statistical Foundations of Econometrics available in Hardcover
Introduction to the Mathematical and Statistical Foundations of Econometrics
- ISBN-10:
- 0521834317
- ISBN-13:
- 9780521834315
- Pub. Date:
- 12/20/2004
- Publisher:
- Cambridge University Press
Product Details
| ISBN-13: | 9780521834315 |
| --- | --- |
| Publisher: | Cambridge University Press |
| Publication date: | 12/20/2004 |
| Series: | Themes in Modern Econometrics |
| Pages: | 344 |
| Product dimensions: | 5.98(w) x 9.02(h) x 0.94(d) |
Cambridge University Press
0521834317 - Introduction to the Mathematical and Statistical Foundations of Econometrics - by Herman J. Bierens
Excerpt
1 Probability and Measure
1.1. The Texas Lotto
1.1.1. Introduction
Texans used to play the lotto by selecting six different numbers between 1 and 50, which cost $1 for each combination. Twice a week, on Wednesday and Saturday at 10 P.M., six ping-pong balls were released without replacement from a rotating plastic ball containing 50 ping-pong balls numbered 1 through 50. The winner of the jackpot (which occasionally accumulated to 60 or more million dollars!) was the one who had all six drawn numbers correct, where the order in which the numbers were drawn did not matter. If these conditions were still being observed, what would be the odds of winning by playing only one set of six numbers?
To answer this question, suppose first that the order of the numbers does matter. Then the number of ordered sets of 6 out of 50 numbers is 50 possibilities for the first drawn number, times 49 possibilities for the second, times 48 for the third, times 47 for the fourth, times 46 for the fifth, times 45 for the sixth:

50 × 49 × 48 × 47 × 46 × 45 = 50!/44! = 11,441,304,000.

The notation n!, read "n factorial," stands for the product of the natural numbers 1 through n:

n! = 1 × 2 × · · · × (n − 1) × n, with 0! = 1.

The reason for defining 0! = 1 will be explained in the next section. Because a set of six given numbers can be permuted in 6! = 720 ways, we need to correct the preceding number for the 6! replications of each unordered set of six given numbers. Therefore, the number of sets of six unordered numbers out of 50 is

(50!/44!)/6! = 50!/(6! × 44!) = 15,890,700.

Thus, the probability of winning such a lotto by playing only one combination of six numbers is 1/15,890,700.

1.1.2. Binomial Numbers

In general, the number of ways we can draw a set of k unordered objects out of a set of n objects without replacement is

(n choose k) = n!/(k!(n − k)!).   (1.1)

These (binomial) numbers, read as "n choose k," also appear as coefficients in the binomial expansion

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^{n−k}.   (1.2)

The reason for defining 0! = 1 is now that the first and last coefficients in this binomial expansion are always equal to 1:

(n choose 0) = (n choose n) = n!/(0! × n!) = 1/0! = 1.

For not too large an n, the binomial numbers (1.1) can be computed recursively by hand using the Triangle of Pascal:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
. . . . . . . . . . . .   (1.3)

Except for the 1's on the legs and top of the triangle in (1.3), the entries are the sum of the adjacent numbers on the previous line, which results from the following easy equality:

(n choose k) = (n − 1 choose k − 1) + (n − 1 choose k) for n ≥ 2, k = 1, . . . , n − 1.

Thus, the top 1 corresponds to n = 0, the second row corresponds to n = 1, the third row corresponds to n = 2, and so on, and for each row n + 1, the entries are the binomial numbers (1.1) for k = 0, . . . , n.
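As a numerical check on the counts above, Python's standard library provides the relevant combinatorial functions directly (a minimal sketch; `math.perm` counts ordered draws and `math.comb` counts unordered draws):

```python
import math

# Ordered draws: 50 * 49 * 48 * 47 * 46 * 45 = 50!/44!
ordered = math.perm(50, 6)

# Each unordered set of six numbers appears among the ordered draws 6! = 720 times.
unordered = ordered // math.factorial(6)

print(ordered)    # 11441304000
print(unordered)  # 15890700 -> winning odds are 1/15,890,700
print(math.comb(50, 6) == unordered)  # True: "50 choose 6" gives the same count
```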
For example, for n = 4 the coefficients of a^k b^{n−k} in the binomial expansion (1.2) can be found on row 5 of (1.3):

(a + b)^4 = 1 × a^4 + 4 × a^3 b + 6 × a^2 b^2 + 4 × a b^3 + 1 × b^4.

1.1.3. Sample Space

The Texas lotto is an example of a statistical experiment. The set of possible outcomes of this statistical experiment is called the sample space and is usually denoted by Ω. In the Texas lotto case, Ω contains N = 15,890,700 elements: Ω = {ω1, . . . , ωN}, where each element ωj is a set itself consisting of six different numbers ranging from 1 to 50 such that for any pair ωi, ωj with i ≠ j, ωi ≠ ωj. Because in this case the elements ωj of Ω are sets themselves, the condition ωi ≠ ωj for i ≠ j is equivalent to the condition that ωi ∩ ωj ∉ Ω.

1.1.4. Algebras and Sigma-Algebras of Events

A set {ωj1, . . . , ωjk} of different number combinations you can bet on is called an event. The collection of all these events, denoted by ℱ, is a "family" of subsets of the sample space Ω. In the Texas lotto case the collection ℱ consists of all subsets of Ω, including Ω itself and the empty set ∅. In principle, you could bet on all number combinations if you were rich enough (it would cost you $15,890,700). Therefore, the sample space Ω itself is included in ℱ. You could also decide not to play at all. This event can be identified as the empty set ∅, and for the sake of completeness it is included in ℱ as well. Because, in the Texas lotto case, the collection ℱ contains all subsets of Ω, it automatically satisfies the conditions

If A ∈ ℱ then Ã = Ω \ A ∈ ℱ,   (1.5)

where Ã = Ω \ A is the complement of the set A (relative to the set Ω), that is, the set of all elements of Ω that are not contained in A, and

If A, B ∈ ℱ then A ∪ B ∈ ℱ.   (1.6)

By induction, the latter condition extends to any finite union of sets in ℱ: if Aj ∈ ℱ for j = 1, 2, . . . , n, then ∪_{j=1}^{n} Aj ∈ ℱ.
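The row-by-row construction of the Triangle of Pascal (each entry is the sum of the two adjacent entries in the previous row) is easy to express in code; a small sketch that reproduces the n = 4 coefficients quoted above:

```python
def pascal_row(n):
    """Row n of Pascal's triangle: the binomial coefficients of (a + b)**n."""
    row = [1]  # row 0, corresponding to n = 0
    for _ in range(n):
        # Each interior entry is the sum of adjacent entries in the previous row.
        row = [1] + [row[k] + row[k + 1] for k in range(len(row) - 1)] + [1]
    return row

print(pascal_row(4))  # [1, 4, 6, 4, 1] -> (a+b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4
```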
Definition 1.1: A collection ℱ of subsets of a nonempty set Ω satisfying the conditions (1.5) and (1.6) is called an algebra.

In the Texas lotto example, the sample space Ω is finite, and therefore the collection ℱ of subsets of Ω is finite as well. Consequently, in this case the condition (1.6) extends to

If Aj ∈ ℱ for j = 1, 2, 3, . . . , then ∪_{j=1}^{∞} Aj ∈ ℱ.   (1.7)

However, because in this case the collection ℱ of subsets of Ω is finite, there are only a finite number of distinct sets Aj ∈ ℱ. Therefore, in the Texas lotto case the countably infinite union ∪_{j=1}^{∞} Aj in (1.7) involves only a finite number of distinct sets Aj; the other sets are replications of these distinct sets. Thus, condition (1.7) does not require that all the sets Aj ∈ ℱ be different.

Definition 1.2: A collection ℱ of subsets of a nonempty set Ω satisfying the conditions (1.5) and (1.7) is called a σ-algebra.

1.1.5. Probability Measure

Let us return to the Texas lotto example. The odds, or probability, of winning are 1/N for each valid combination ωj of six numbers; hence, if you play n different valid number combinations {ωj1, . . . , ωjn}, the probability of winning is n/N: P({ωj1, . . . , ωjn}) = n/N. Thus, in the Texas lotto case the probability P(A), A ∈ ℱ, is given by the number n of elements in the set A divided by the total number N of elements in Ω. In particular, we have P(Ω) = 1, and if you do not play at all the probability of winning is zero: P(∅) = 0.

The function P(A), A ∈ ℱ, is called a probability measure. It assigns a number P(A) ∈ [0, 1] to each set A ∈ ℱ. However, not every function that assigns numbers in [0, 1] to the sets in ℱ is a probability measure; a probability measure must satisfy the conditions in the following definition:

Definition 1.3: A mapping P: ℱ → [0, 1] from a σ-algebra ℱ of subsets of a set Ω into the unit interval is a probability measure on {Ω, ℱ} if it satisfies the following three conditions:

For all A ∈ ℱ, P(A) ≥ 0;   (1.8)

P(Ω) = 1;   (1.9)

For disjoint sets Aj ∈ ℱ, P(∪_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P(Aj).   (1.10)

Recall that sets are disjoint if they have no elements in common: their intersections are the empty set.
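For a small finite Ω, the closure conditions in Definitions 1.1 and 1.2 can be verified mechanically. A sketch (with an illustrative three-element Ω) checking that the power set is closed under complementation and pairwise union, and hence is an algebra:

```python
from itertools import combinations

omega = frozenset({1, 2, 3})

# The power set of omega: every subset, including omega itself and the empty set.
F = {frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)}

# Closure under complementation: A in F implies omega \ A in F.
closed_complement = all(omega - A in F for A in F)
# Closure under pairwise union (by induction, any finite union).
closed_union = all(A | B in F for A in F for B in F)

print(len(F), closed_complement, closed_union)  # 8 True True
```

Because Ω here is finite, closure under finite unions already implies closure under countable unions, mirroring the remark that a finite algebra is automatically a σ-algebra.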
The conditions (1.8) and (1.9) are clearly satisfied for the case of the Texas lotto. On the other hand, in the case under review the collection ℱ of events contains only a finite number of sets, and thus any countably infinite sequence of sets in ℱ must contain sets that are the same. At first sight this seems to conflict with the implicit assumption that countably infinite sequences of disjoint sets always exist for which (1.10) holds. It is true indeed that any countably infinite sequence of disjoint sets in a finite collection ℱ of sets can contain only a finite number of nonempty sets. This is no problem, though, because all the other sets are then equal to the empty set ∅. The empty set is disjoint with itself, ∅ ∩ ∅ = ∅, and with any other set, A ∩ ∅ = ∅. Therefore, if ℱ is finite, then any countably infinite sequence of disjoint sets consists of a finite number of nonempty sets and an infinite number of replications of the empty set. Consequently, if ℱ is finite, it is sufficient to verify condition (1.10) for any pair of disjoint sets A1, A2 in ℱ: P(A1 ∪ A2) = P(A1) + P(A2). Because, in the Texas lotto case, P(A1 ∪ A2) = (n1 + n2)/N, P(A1) = n1/N, and P(A2) = n2/N, where n1 is the number of elements of A1 and n2 is the number of elements of A2, the latter condition is satisfied, and so is condition (1.10).

The statistical experiment is now completely described by the triple {Ω, ℱ, P}, called the probability space, consisting of the sample space Ω (i.e., the set of all possible outcomes of the statistical experiment involved), a σ-algebra ℱ of events (i.e., a collection of subsets of the sample space Ω such that the conditions (1.5) and (1.7) are satisfied), and a probability measure P: ℱ → [0, 1] satisfying the conditions (1.8)–(1.10). In the Texas lotto case the collection ℱ of events is an algebra, but because ℱ is finite it is automatically a σ-algebra.

1.2. Quality Control
1.2.1. Sampling without Replacement

As a second example, consider the following case. Suppose you are in charge of quality control in a light bulb factory. Each day N light bulbs are produced. But before they are shipped out to the retailers, the bulbs need to meet a minimum quality standard, such as no more than R out of N bulbs being defective. The only way to verify this exactly is to test all N bulbs, but that would be too costly. Therefore, the way quality control is conducted in practice is to draw randomly n bulbs without replacement and to check how many bulbs in this sample are defective.

As in the Texas lotto case, the number M of different samples sj of size n you can draw out of a set of N elements without replacement is

M = (N choose n).

Each sample sj is characterized by a number kj of defective bulbs in the sample involved. Let K be the actual number of defective bulbs. Then kj ∈ {0, 1, . . . , min(n, K)}. Let Ω = {0, 1, . . . , n} and let the σ-algebra ℱ be the collection of all subsets of Ω. The number of samples sj with kj = k ≤ min(n, K) defective bulbs is

(K choose k) × (N − K choose n − k)

because there are "K choose k" ways to draw k unordered numbers out of K numbers without replacement and "N − K choose n − k" ways to draw n − k unordered numbers out of N − K numbers without replacement. Of course, in the case that n > K the number of samples sj with kj = k > min(n, K) defective bulbs is zero. Therefore, let

P({k}) = (K choose k) × (N − K choose n − k) / (N choose n) for 0 ≤ k ≤ min(n, K), and P({k}) = 0 elsewhere,   (1.11)

and for each set A = {k1, . . . , km} ∈ ℱ, let P(A) = Σ_{j=1}^{m} P({kj}). (Exercise: Verify that this function P satisfies all the requirements of a probability measure.) The triple {Ω, ℱ, P} is now the probability space corresponding to this statistical experiment. The probabilities (1.11) are known as the hypergeometric(N, K, n) probabilities.

1.2.2. Quality Control in Practice

The problem in applying this result in quality control is that K is unknown. Therefore, in practice the following decision rule as to whether K ≤ R or not is followed.
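The exercise above can be checked numerically. A sketch of the hypergeometric(N, K, n) probabilities using only the standard library (the parameter values N = 100, K = 10, n = 15 are illustrative, not from the text):

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P({k}): probability that a size-n sample drawn without replacement from
    N bulbs, K of which are defective, contains exactly k defective bulbs."""
    if k < 0 or k > min(n, K) or n - k > N - K:
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

N, K, n = 100, 10, 15  # illustrative values
probs = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]

# A probability measure on Omega = {0, ..., n} must be nonnegative and sum to 1.
print(all(p >= 0 for p in probs), abs(sum(probs) - 1.0) < 1e-12)
```

That the probabilities sum to one is Vandermonde's identity: Σ_k (K choose k)(N − K choose n − k) = (N choose n).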
Given a particular number r ≤ n, to be determined at the end of this subsection, assume that the set of N bulbs meets the minimum quality requirement K ≤ R if the number k of defective bulbs in the sample is less than or equal to r. Then the set A(r) = {0, 1, . . . , r} corresponds to the assumption that the set of N bulbs meets the minimum quality requirement K ≤ R, hereafter indicated by "accept," with probability

P(A(r)) = Σ_{k=0}^{r} P({k}) = p_r(n, K),   (1.12)

say, whereas its complement Ã(r) = {r + 1, . . . , n} corresponds to the assumption that this set of N bulbs does not meet this quality requirement, hereafter indicated by "reject," with corresponding probability

P(Ã(r)) = 1 − p_r(n, K).

Given r, this decision rule yields two types of errors: a Type I error with probability 1 − p_r(n, K) if you reject whereas in reality K ≤ R, and a Type II error with probability p_r(n, K) if you accept whereas in reality K > R. The probability of a Type I error has upper bound

p1(r, n) = max over K ≤ R of (1 − p_r(n, K)),   (1.13)

and the probability of a Type II error has upper bound

p2(r, n) = max over K > R of p_r(n, K).   (1.14)

To be able to choose r, one has to restrict either p1(r, n) or p2(r, n), or both. Usually it is the former option that is restricted because a Type I error may cause the whole stock of N bulbs to be trashed. Thus, allow the probability of a Type I error to be at most α, such as α = 0.05. Then r should be chosen such that p1(r, n) ≤ α. Because p1(r, n) is decreasing in r, due to the fact that (1.12) is increasing in r, we could in principle choose r arbitrarily large. But because p2(r, n) is increasing in r, we should not choose r unnecessarily large. Therefore, choose r = r(n|α), where r(n|α) is the minimum value of r for which p1(r, n) ≤ α. Moreover, if we allow the Type II error to be at most β, we have to choose the sample size n such that p2(r(n|α), n) ≤ β. As we will see in Chapters 5 and 6, this decision rule is an example of a statistical test, where H0: K ≤ R is called the null hypothesis to be tested at the α × 100% significance level against the alternative hypothesis H1: K > R.
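The search for the minimal r with p1(r, n) ≤ α can be sketched in code. This sketch assumes the Type I bound is attained at K = R (a plausible simplification, not stated in the text), so p1(r, n) is computed as the hypergeometric probability of rejecting when K = R; all parameter values are illustrative:

```python
import math

def hypergeom_pmf(k, N, K, n):
    if k < 0 or k > min(n, K) or n - k > N - K:
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def critical_value(N, R, n, alpha):
    """Smallest r with p1(r, n) <= alpha, taking p1(r, n) = P(k > r | K = R)
    as the Type I error bound (assumption: the maximum over K <= R is at K = R)."""
    for r in range(n + 1):
        p1 = sum(hypergeom_pmf(k, N, R, n) for k in range(r + 1, n + 1))
        if p1 <= alpha:
            return r
    return n

r = critical_value(N=1000, R=50, n=30, alpha=0.05)
print(r)  # accept the lot iff at most r defective bulbs appear in the sample
```

Because p1(r, n) is decreasing in r, the first r that passes the check is the minimum, exactly as in the definition of r(n|α).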
The number r(n|α) is called the critical value of the test, and the number k of defective bulbs in the sample is called the test statistic.

1.2.3. Sampling with Replacement

As a third example, consider the quality control example in the previous section, except that now the light bulbs are sampled with replacement: after a bulb is tested, it is put back in the stock of N bulbs even if the bulb involved proves to be defective. The rationale for this behavior may be that the customers will at most accept a fraction R/N of defective bulbs and thus will not complain as long as the actual fraction K/N of defective bulbs does not exceed R/N. In other words, why not sell defective light bulbs if doing so is acceptable to the customers?

The sample space Ω and the σ-algebra ℱ are the same as in the case of sampling without replacement, but the probability measure P is different. Consider again a sample sj of size n containing k defective light bulbs. Because the light bulbs are put back in the stock after being tested, there are K^k ways of drawing an ordered set of k defective bulbs and (N − K)^{n−k} ways of drawing an ordered set of n − k working bulbs. Thus, the number of ways we can draw, with replacement, an ordered set of n light bulbs containing k defective bulbs is K^k (N − K)^{n−k}. Moreover, as in the Texas lotto case, it follows that the number of unordered sets of k defective bulbs and n − k working bulbs is "n choose k." Thus, the total number of ways we can choose a sample with replacement containing k defective bulbs and n − k working bulbs in any order is

(n choose k) × K^k (N − K)^{n−k}.

Moreover, the number of ways we can choose a sample of size n with replacement is N^n. Therefore,

P({k}) = (n choose k) × K^k (N − K)^{n−k} / N^n = (n choose k) × p^k (1 − p)^{n−k}, k = 0, 1, . . . , n,   (1.15)

where p = K/N, and again for each set A = {k1, . . . , km} ∈ ℱ, P(A) = Σ_{j=1}^{m} P({kj}). Of course, if we replace P({k}) in (1.11) by (1.15), the argument in Section 1.2.2 still applies. The probabilities (1.15) are known as the binomial(n, p) probabilities.
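The binomial(n, p) probabilities (1.15) can likewise be written down directly; this sketch also confirms that they sum to 1, as a probability measure on Ω = {0, 1, . . . , n} must (parameter values illustrative):

```python
import math

def binom_pmf(k, n, p):
    """P({k}) = (n choose k) p^k (1 - p)^(n - k): sampling with replacement."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 15, 0.1  # p = K/N, e.g. K = 10 defective bulbs out of N = 100
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
print(abs(sum(probs) - 1.0) < 1e-12)  # True: the probabilities sum to 1
```

The sum being 1 is just the binomial expansion (p + (1 − p))^n = 1.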
1.2.4. Limits of the Hypergeometric and Binomial Probabilities

Note that if N and K are large relative to n, the hypergeometric probability (1.11) and the binomial probability (1.15) will be almost the same. This follows from the fact that, for fixed k and n,

(K choose k) × (N − K choose n − k) / (N choose n) → (n choose k) × p^k (1 − p)^{n−k} as N → ∞ and K → ∞ with K/N → p.

Thus, the binomial probabilities also arise as limits of the hypergeometric probabilities. Moreover, if in the case of the binomial probability (1.15) p is very small and n is very large, the probability (1.15) can be approximated quite well by the Poisson(λ) probability

P({k}) = e^{−λ} λ^k / k!, k = 0, 1, 2, . . . ,   (1.16)

where λ = n × p. This follows from (1.15) by choosing p = λ/n for n > λ, with λ > 0 fixed, and letting n → ∞ while keeping k fixed:

P({k}) = (n choose k) × (λ/n)^k (1 − λ/n)^{n−k} = [n!/((n − k)! × n^k)] × (λ^k/k!) × (1 − λ/n)^{n−k} → e^{−λ} λ^k / k!

because, for n → ∞,

n!/[(n − k)! × n^k] → 1 and (1 − λ/n)^{n−k} → e^{−λ}.

Due to the fact that (1.16) is the limit of (1.15) for p = λ/n ↓ 0 as n → ∞, the Poisson probabilities (1.16) are often used to model the occurrence of rare events.

Note that the sample space corresponding to the Poisson probabilities is Ω = {0, 1, 2, . . .} and that the σ-algebra ℱ of events involved can be chosen to be the collection of all subsets of Ω because any nonempty subset A of Ω is either countably infinite or finite. If such a subset A is countably infinite, it takes the form A = {k1, k2, k3, . . .}, where the kj's are distinct nonnegative integers; hence, P(A) = Σ_{j=1}^{∞} P({kj}) is well defined. The same applies of course if A is finite: if A = {k1, . . . , km}, then P(A) = Σ_{j=1}^{m} P({kj}). This probability measure clearly satisfies the conditions (1.8)–(1.10).

1.3. Why Do We Need Sigma-Algebras of Events?

In principle, we could define a probability measure on an algebra ℱ of subsets of the sample space rather than on a σ-algebra. We would only need to change condition (1.10) as follows: for disjoint sets Aj ∈ ℱ such that ∪_{j=1}^{∞} Aj ∈ ℱ, P(∪_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P(Aj). By letting all but a finite number of these sets be equal to the empty set, this condition then reads as follows: for disjoint sets Aj ∈ ℱ, j = 1, 2, . . . , n < ∞, P(∪_{j=1}^{n} Aj) = Σ_{j=1}^{n} P(Aj).
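The Poisson limit can be illustrated numerically: for large n and p = λ/n, the binomial probabilities are close to the Poisson(λ) probabilities. A sketch with illustrative values (λ = 2, n = 10,000):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P({k}) = e^(-lam) lam^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, n = 2.0, 10_000
# With p = lam/n small and n large, binomial(n, p) ~ Poisson(lam) for each fixed k.
max_gap = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(20))
print(max_gap < 1e-3)  # True: the pointwise approximation error is tiny
```

The approximation error shrinks like λ²/n, which is why the Poisson model works well for rare events (small p) observed over many trials (large n).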
However, if we confined a probability measure to an algebra, all kinds of useful results would no longer apply. One of these results is the so-called strong law of large numbers (see Chapter 6).

As an example, consider the following game. Toss a fair coin infinitely many times and assume that after each toss you will get one dollar if the outcome is heads and nothing if the outcome is tails. The sample space Ω in this case can be expressed in terms of the winnings; that is, each element ω of Ω takes the form of a string of infinitely many zeros and ones, for example, ω = (1, 1, 0, 1, 0, 1, . . .). Now consider the event "After n tosses the winning is k dollars." This event corresponds to the set A_{k,n} of elements ω of Ω for which the sum of the first n elements in the string involved is equal to k. For example, the set A_{1,2} consists of all ω of the type (1, 0, . . .) and (0, 1, . . .). As in the example in Section 1.2.3, it can be shown that

P(A_{k,n}) = (n choose k) × (1/2)^n for k = 0, 1, . . . , n.

Next, for q = 1, 2, . . . , consider the events "After n tosses the average winning k/n is contained in the interval [0.5 − 1/q, 0.5 + 1/q]." These events correspond to the sets

B_{q,n} = ∪ over k with 0.5 − 1/q ≤ k/n ≤ 0.5 + 1/q of A_{k,n}.

Then the set ∩_{m=n}^{∞} B_{q,m} corresponds to the event "From the nth toss onwards the average winning will stay in the interval [0.5 − 1/q, 0.5 + 1/q]"; the set ∪_{n=1}^{∞} ∩_{m=n}^{∞} B_{q,m} corresponds to the event "There exists an n (possibly depending on ω) such that from the nth toss onwards the average winning will stay in the interval [0.5 − 1/q, 0.5 + 1/q]." Finally, the set ∩_{q=1}^{∞} ∪_{n=1}^{∞} ∩_{m=n}^{∞} B_{q,m} corresponds to the event "The average winning converges to 1/2 as n converges to infinity." Now the strong law of large numbers states that the latter event has probability 1:

P[∩_{q=1}^{∞} ∪_{n=1}^{∞} ∩_{m=n}^{∞} B_{q,m}] = 1.

However, this probability is only defined if ∩_{q=1}^{∞} ∪_{n=1}^{∞} ∩_{m=n}^{∞} B_{q,m} ∈ ℱ. To guarantee this, we need to require that ℱ be a σ-algebra.
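The strong law of large numbers can be illustrated (not proved) by simulation: over a long run of fair-coin tosses, the average winning settles near 1/2. A sketch; the seed and number of tosses are arbitrary choices:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000
winnings = sum(random.randint(0, 1) for _ in range(n))  # one dollar per head
average = winnings / n
print(average)  # close to 0.5 for a run this long
```

For this sample size the standard deviation of the average is about 0.0016, so a deviation from 1/2 larger than 0.01 would be extraordinarily unlikely.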
© Cambridge University Press