Introduction to the Mathematical and Statistical Foundations of Econometrics

by Herman J. Bierens

Hardcover, $147.00

Overview

The focus of this book is on clarifying the mathematical and statistical foundations of econometrics. Therefore, the text provides all the proofs, or at least motivations if proofs are too complicated, of the mathematical and statistical results necessary for understanding modern econometric theory. In this respect, it differs from other econometrics textbooks.

Product Details

ISBN-13: 9780521834315
Publisher: Cambridge University Press
Publication date: 12/20/2004
Series: Themes in Modern Econometrics
Pages: 344
Product dimensions: 5.98(w) x 9.02(h) x 0.94(d)

About the Author

Herman J. Bierens is Professor of Economics at the Pennsylvania State University and part-time Professor of Econometrics at Tilburg University, The Netherlands. He is Associate Editor of the Journal of Econometrics and Econometric Reviews, and has been an Associate Editor of Econometrica. Professor Bierens has written two monographs, Robust Methods and Asymptotic Theory in Nonlinear Econometrics and Topics in Advanced Econometrics (Cambridge University Press, 1994), as well as numerous journal articles. His current research interests are model (mis)specification analysis in econometrics and its application in empirical research, time series econometrics, and the econometric analysis of dynamic stochastic general equilibrium models.

Read an Excerpt




1 Probability and Measure


1.1. The Texas Lotto

1.1.1. Introduction

Texans used to play the lotto by selecting six different numbers between 1 and 50, at a cost of $1 per combination.1 Twice a week, on Wednesday and Saturday at 10 P.M., six ping-pong balls were released without replacement from a rotating plastic ball containing 50 ping-pong balls numbered 1 through 50. The winner of the jackpot (which occasionally accumulated to 60 or more million dollars!) was the one who had all six drawn numbers correct, where the order in which the numbers were drawn did not matter. If these conditions were still being observed, what would the odds of winning be if you play only one set of six numbers?

To answer this question, suppose first that the order of the numbers does matter. Then the number of ordered sets of 6 out of 50 numbers is 50 possibilities for the first drawn number times 49 possibilities for the second drawn number, times 48 possibilities for the third drawn number, times 47 possibilities for the fourth drawn number, times 46 possibilities for the fifth drawn number, times 45 possibilities for the sixth drawn number:

$$50 \times 49 \times 48 \times 47 \times 46 \times 45 = \frac{50!}{(50-6)!} = 11{,}441{,}304{,}000.$$

The notation n!, read "n factorial," stands for the product of the natural numbers 1 through n:

$$n! = 1 \times 2 \times \cdots \times (n-1) \times n, \qquad \text{with the convention } 0! = 1.$$

The reason for defining 0! = 1 will be explained in the next section.

Because a set of six given numbers can be permuted in 6! ways, we need to correct the preceding number for the 6! replications of each unordered set of six given numbers. Therefore, the number of sets of six unordered numbers out of 50 is

$$\frac{50!}{6!\,(50-6)!} = 15{,}890{,}700.$$

Thus, the probability of winning such a lotto by playing only one combination of six numbers is 1⁄15,890,700.2
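
The arithmetic above is easy to verify by machine. A minimal Python sketch (illustrative only, not part of the original text):

```python
import math

# Ordered draws of 6 out of 50 without replacement: 50!/44!
ordered = math.factorial(50) // math.factorial(44)

# Each unordered set of six numbers appears in 6! = 720 orderings,
# so divide by 6! to count unordered sets ("50 choose 6").
unordered = ordered // math.factorial(6)

print(ordered)            # 11441304000
print(unordered)          # 15890700
print(math.comb(50, 6))   # 15890700, the same result via the built-in
print(1 / unordered)      # probability of winning with one combination
```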

1.1.2. Binomial Numbers

In general, the number of ways we can draw a set of k unordered objects out of a set of n objects without replacement is

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \tag{1.1}$$

These (binomial) numbers,3 read as "n choose k," also appear as coefficients in the binomial expansion

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \tag{1.2}$$

The reason for defining 0! = 1 is now that the first and last coefficients in this binomial expansion are always equal to 1:

$$\binom{n}{0} = \binom{n}{n} = \frac{n!}{0!\,n!} = \frac{1}{0!} = 1.$$

For not too large an n, the binomial numbers (1.1) can be computed recursively by hand using the Triangle of Pascal:

                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      . . . . . . . . . . .        (1.3)

Except for the 1's on the legs and top of the triangle in (1.3), the entries are the sum of the adjacent numbers on the previous line, which results from the following easy equality:

$$\binom{n-1}{k-1} + \binom{n-1}{k} = \binom{n}{k} \qquad \text{for } n \geq 2 \text{ and } k = 1, \ldots, n-1. \tag{1.4}$$

Thus, the top 1 corresponds to n = 0, the second row corresponds to n = 1, the third row corresponds to n = 2, and so on, and for each row n + 1, the entries are the binomial numbers (1.1) for k = 0, . . . , n. For example, for n = 4 the coefficients of $a^k b^{n-k}$ in the binomial expansion (1.2) can be found on row 5 of (1.3): $(a+b)^4 = 1 \times a^4 + 4 \times a^3 b + 6 \times a^2 b^2 + 4 \times a b^3 + 1 \times b^4$.
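
The recursion (1.4) translates directly into code. The following sketch (an illustration, not from the text) builds each row of the Triangle of Pascal from the previous one:

```python
def pascal_rows(n_max):
    """Yield rows n = 0, ..., n_max of the Triangle of Pascal using
    the recursion C(n, k) = C(n-1, k-1) + C(n-1, k) from (1.4)."""
    row = [1]  # row for n = 0
    for n in range(n_max + 1):
        yield row
        # Next row: 1 on each leg; interior entries are sums of
        # adjacent entries on the current row.
        row = [1] + [row[k - 1] + row[k] for k in range(1, n + 1)] + [1]

for r in pascal_rows(4):
    print(r)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
```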

1.1.3. Sample Space

The Texas lotto is an example of a statistical experiment. The set of possible outcomes of this statistical experiment is called the sample space and is usually denoted by Ω. In the Texas lotto case, Ω contains N = 15,890,700 elements: $\Omega = \{\omega_1, \ldots, \omega_N\}$, where each element $\omega_j$ is a set itself consisting of six different numbers ranging from 1 to 50 such that for any pair $\omega_i$, $\omega_j$ with $i \neq j$, $\omega_i \neq \omega_j$. Because in this case the elements $\omega_j$ of Ω are sets themselves, the condition $\omega_i \neq \omega_j$ for $i \neq j$ is equivalent to the condition that $\omega_i \cap \omega_j \notin \Omega$.

1.1.4. Algebras and Sigma-Algebras of Events

A set $\{\omega_{j_1}, \ldots, \omega_{j_k}\}$ of different number combinations you can bet on is called an event. The collection of all these events, denoted by ℱ, is a "family" of subsets of the sample space Ω. In the Texas lotto case the collection ℱ consists of all subsets of Ω, including Ω itself and the empty set ∅.4 In principle, you could bet on all number combinations if you were rich enough (it would cost you $15,890,700). Therefore, the sample space Ω itself is included in ℱ. You could also decide not to play at all. This event can be identified as the empty set ∅. For the sake of completeness, it is included in ℱ as well.

Because, in the Texas lotto case, the collection ℱ contains all subsets of Ω, it automatically satisfies the conditions

$$A \in \mathscr{F} \;\Rightarrow\; \tilde{A} \in \mathscr{F}, \tag{1.5}$$

where $\tilde{A} = \Omega \setminus A$ is the complement of the set A (relative to the set Ω), that is, the set of all elements of Ω that are not contained in A, and

$$A, B \in \mathscr{F} \;\Rightarrow\; A \cup B \in \mathscr{F}. \tag{1.6}$$

By induction, the latter condition extends to any finite union of sets in ℱ: If $A_j \in \mathscr{F}$ for j = 1, 2, . . . , n, then $\bigcup_{j=1}^{n} A_j \in \mathscr{F}$.

Definition 1.1: A collection ℱ of subsets of a nonempty set Ω satisfying the conditions (1.5) and (1.6) is called an algebra.5

In the Texas lotto example, the sample space Ω is finite, and therefore the collection ℱ of subsets of Ω is finite as well. Consequently, in this case the condition (1.6) extends to

$$A_j \in \mathscr{F} \text{ for } j = 1, 2, 3, \ldots \;\Rightarrow\; \bigcup_{j=1}^{\infty} A_j \in \mathscr{F}. \tag{1.7}$$

However, because in this case the collection ℱ of subsets of Ω is finite, there are only a finite number of distinct sets $A_j \in \mathscr{F}$. Therefore, in the Texas lotto case the countably infinite union $\bigcup_{j=1}^{\infty} A_j$ in (1.7) involves only a finite number of distinct sets $A_j$; the other sets are replications of these distinct sets. Thus, condition (1.7) does not require that all the sets $A_j \in \mathscr{F}$ be different.

Definition 1.2: A collection ℱ of subsets of a nonempty set Ω satisfying the conditions (1.5) and (1.7) is called a σ-algebra.6

1.1.5. Probability Measure

Let us return to the Texas lotto example. The odds, or probability, of winning are 1/N for each valid combination $\omega_j$ of six numbers; hence, if you play n different valid number combinations $\{\omega_{j_1}, \ldots, \omega_{j_n}\}$, the probability of winning is n/N: $P(\{\omega_{j_1}, \ldots, \omega_{j_n}\}) = n/N$. Thus, in the Texas lotto case the probability P(A), A ∈ ℱ, is given by the number n of elements in the set A divided by the total number N of elements in Ω. In particular we have P(Ω) = 1, and if you do not play at all the probability of winning is zero: P(∅) = 0.

The function P(A), A ∈ ℱ, is called a probability measure. It assigns a number P(A) ∈ [0, 1] to each set A ∈ ℱ. Not every function that assigns numbers in [0, 1] to the sets in ℱ is a probability measure, however; the requirements are set forth in the following definition:

Definition 1.3: A mapping P: ℱ → [0, 1] from a σ-algebra ℱ of subsets of a set Ω into the unit interval is a probability measure on {Ω, ℱ} if it satisfies the following three conditions:

$$\text{For all } A \in \mathscr{F}, \quad P(A) \geq 0; \tag{1.8}$$

$$P(\Omega) = 1; \tag{1.9}$$

$$\text{For disjoint sets } A_j \in \mathscr{F}, \quad P\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j). \tag{1.10}$$

Recall that sets are disjoint if they have no elements in common: their intersections are the empty set.

The conditions (1.8) and (1.9) are clearly satisfied for the case of the Texas lotto. On the other hand, in the case under review the collection ℱ of events contains only a finite number of sets, and thus any countably infinite sequence of sets in ℱ must contain sets that are the same. At first sight this seems to conflict with the implicit assumption that countably infinite sequences of disjoint sets always exist for which (1.10) holds. It is true indeed that any countably infinite sequence of disjoint sets in a finite collection ℱ of sets can contain only a finite number of nonempty sets. This is no problem, though, because all the other sets are then equal to the empty set ∅. The empty set is disjoint with itself, ∅ ∩ ∅ = ∅, and with any other set, A ∩ ∅ = ∅. Therefore, if ℱ is finite, then any countably infinite sequence of disjoint sets consists of a finite number of nonempty sets and an infinite number of replications of the empty set. Consequently, if ℱ is finite, then it is sufficient to verify condition (1.10) for any pair of disjoint sets $A_1, A_2$ in ℱ: $P(A_1 \cup A_2) = P(A_1) + P(A_2)$. Because, in the Texas lotto case, $P(A_1 \cup A_2) = (n_1 + n_2)/N$, $P(A_1) = n_1/N$, and $P(A_2) = n_2/N$, where $n_1$ is the number of elements of $A_1$ and $n_2$ is the number of elements of $A_2$, the latter condition is satisfied and so is condition (1.10).

The statistical experiment is now completely described by the triple {Ω, ℱ, P}, called the probability space, consisting of the sample space Ω (i.e., the set of all possible outcomes of the statistical experiment involved), a σ-algebra ℱ of events (i.e., a collection of subsets of the sample space Ω such that the conditions (1.5) and (1.7) are satisfied), and a probability measure P: ℱ → [0, 1] satisfying the conditions (1.8)–(1.10).

In the Texas lotto case the collection ℱ of events is an algebra, but because ℱ is finite it is automatically a σ-algebra.

1.2. Quality Control

1.2.1. Sampling without Replacement

As a second example, consider the following case. Suppose you are in charge of quality control in a light bulb factory. Each day N light bulbs are produced. But before they are shipped out to the retailers, the bulbs need to meet a minimum quality standard such as not allowing more than R out of N bulbs to be defective. The only way to verify this exactly is to try all the N bulbs out, but that will be too costly. Therefore, the way quality control is conducted in practice is to randomly draw n bulbs without replacement and to check how many bulbs in this sample are defective.

As in the Texas lotto case, the number M of different samples $s_j$ of size n you can draw out of a set of N elements without replacement is

$$M = \binom{N}{n}.$$

Each sample $s_j$ is characterized by a number $k_j$ of defective bulbs in the sample involved. Let K be the actual number of defective bulbs. Then $k_j \in \{0, 1, \ldots, \min(n, K)\}$.

Let Ω = {0, 1, . . . , n} and let the σ-algebra ℱ be the collection of all subsets of Ω. The number of samples $s_j$ with $k_j = k \leq \min(n, K)$ defective bulbs is

$$\binom{K}{k}\binom{N-K}{n-k}$$

because there are "K choose k" ways to draw k unordered numbers out of K numbers without replacement and "N − K choose n − k" ways to draw n − k unordered numbers out of N − K numbers without replacement. Of course, in the case that n > K the number of samples $s_j$ with $k_j = k > \min(n, K)$ defective bulbs is zero. Therefore, let

$$P(\{k\}) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \;\text{ if } 0 \leq k \leq \min(n, K), \qquad P(\{k\}) = 0 \;\text{ elsewhere}, \tag{1.11}$$

and for each set $A = \{k_1, \ldots, k_m\} \in \mathscr{F}$, let $P(A) = \sum_{j=1}^{m} P(\{k_j\})$. (Exercise: Verify that this function P satisfies all the requirements of a probability measure.) The triple {Ω, ℱ, P} is now the probability space corresponding to this statistical experiment.

The probabilities (1.11) are known as the hypergeometric (N, K, n) probabilities.
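
The exercise above can also be checked numerically. The sketch below (with arbitrary, hypothetical values of N, K, and n) computes the probabilities (1.11) and confirms that they sum to 1 over Ω = {0, 1, . . . , n}, as conditions (1.8)–(1.10) require:

```python
from math import comb

def hypergeometric_pmf(N, K, n, k):
    """P({k}) from (1.11): probability of exactly k defective bulbs in a
    sample of size n drawn without replacement from N bulbs, K defective."""
    if 0 <= k <= min(n, K) and n - k <= N - K:
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)
    return 0.0

# Hypothetical numbers: N = 1000 bulbs, K = 40 defective, sample size n = 20.
N, K, n = 1000, 40, 20
probs = [hypergeometric_pmf(N, K, n, k) for k in range(n + 1)]
print(sum(probs))   # 1.0 up to floating-point rounding
print(probs[0])     # probability that the sample contains no defective bulbs
```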

1.2.2. Quality Control in Practice7

The problem in applying this result in quality control is that K is unknown. Therefore, in practice the following decision rule as to whether K ≤ R or not is followed. Given a particular number r ≤ n, to be determined at the end of this subsection, assume that the set of N bulbs meets the minimum quality requirement K ≤ R if the number k of defective bulbs in the sample is less than or equal to r. Then the set A(r) = {0, 1, . . . , r} corresponds to the assumption that the set of N bulbs meets the minimum quality requirement K ≤ R, hereafter indicated by "accept," with probability

$$P(A(r)) = \sum_{k=0}^{r} P(\{k\}) = p_r(n, K), \tag{1.12}$$

say, whereas its complement Ã(r) = {r + 1, . . . , n} corresponds to the assumption that this set of N bulbs does not meet this quality requirement, hereafter indicated by "reject," with corresponding probability

$$P(\tilde{A}(r)) = 1 - p_r(n, K).$$

Given r, this decision rule yields two types of errors: a Type I error with probability $1 - p_r(n, K)$ if you reject whereas in reality K ≤ R, and a Type II error with probability $p_r(n, K)$ if you accept whereas in reality K > R. The probability of a Type I error has upper bound

$$p_1(r, n) = 1 - \min_{K \leq R} p_r(n, K), \tag{1.13}$$

and the probability of a Type II error has upper bound

$$p_2(r, n) = \max_{K > R} p_r(n, K). \tag{1.14}$$

To be able to choose r, one has to restrict either $p_1(r, n)$ or $p_2(r, n)$, or both. Usually it is the former option that is restricted because a Type I error may cause the whole stock of N bulbs to be trashed. Thus, allow the probability of a Type I error to be maximally α, for example α = 0.05. Then r should be chosen such that $p_1(r, n) \leq \alpha$. Because $p_1(r, n)$ is decreasing in r, due to the fact that (1.12) is increasing in r, we could in principle choose r arbitrarily large. But because $p_2(r, n)$ is increasing in r, we should not choose r unnecessarily large. Therefore, choose r = r(n|α), where r(n|α) is the minimum value of r for which $p_1(r, n) \leq \alpha$. Moreover, if we allow the Type II error to be maximally β, we have to choose the sample size n such that $p_2(r(n|\alpha), n) \leq \beta$.

As we will see in Chapters 5 and 6, this decision rule is an example of a statistical test, where $H_0$: K ≤ R is called the null hypothesis to be tested at the α × 100% significance level against the alternative hypothesis $H_1$: K > R. The number r(n|α) is called the critical value of the test, and the number k of defective bulbs in the sample is called the test statistic.
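
The critical value r(n|α) can be found by direct search. The sketch below (hypothetical numbers again, reusing hypergeometric_pmf from the earlier sketch) computes the Type I error bound (1.13) and picks the smallest r for which it does not exceed α; it exploits the fact that $p_r(n, K)$ is decreasing in K, so the minimum in (1.13) is attained at K = R:

```python
def type1_bound(N, R, n, r):
    """p1(r, n) from (1.13). Because p_r(n, K) is decreasing in K,
    the minimum over K <= R is attained at K = R."""
    p_r = sum(hypergeometric_pmf(N, R, n, k) for k in range(r + 1))
    return 1.0 - p_r

def critical_value(N, R, n, alpha):
    """r(n|alpha): the minimum r with p1(r, n) <= alpha."""
    for r in range(n + 1):
        if type1_bound(N, R, n, r) <= alpha:
            return r
    return None  # cannot happen: at r = n the bound is 0

print(critical_value(N=1000, R=40, n=20, alpha=0.05))
```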

1.2.3. Sampling with Replacement

As a third example, consider the quality control example in the previous section except that now the light bulbs are sampled with replacement: After a bulb is tested, it is put back in the stock of N bulbs even if the bulb involved proves to be defective. The rationale for this behavior may be that the customers will at most accept a fraction R/N of defective bulbs and thus will not complain as long as the actual fraction K/N of defective bulbs does not exceed R/N. In other words, why not sell defective light bulbs if doing so is acceptable to the customers?

The sample space Ω and the σ-algebra ℱ are the same as in the case of sampling without replacement, but the probability measure P is different. Consider again a sample $s_j$ of size n containing k defective light bulbs. Because the light bulbs are put back in the stock after being tested, there are $K^k$ ways of drawing an ordered set of k defective bulbs and $(N-K)^{n-k}$ ways of drawing an ordered set of n − k working bulbs. Thus, the number of ways we can draw, with replacement, an ordered set of n light bulbs containing k defective bulbs is $K^k (N-K)^{n-k}$. Moreover, as in the Texas lotto case, it follows that the number of unordered sets of k defective bulbs and n − k working bulbs is "n choose k." Thus, the total number of ways we can choose a sample with replacement containing k defective bulbs and n − k working bulbs in any order is

$$\binom{n}{k} K^k (N-K)^{n-k}.$$

Moreover, the number of ways we can choose a sample of size n with replacement is $N^n$. Therefore,

$$P(\{k\}) = \binom{n}{k} \frac{K^k (N-K)^{n-k}}{N^n} = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n, \tag{1.15}$$

where p = K/N, and again for each set $A = \{k_1, \ldots, k_m\} \in \mathscr{F}$, $P(A) = \sum_{j=1}^{m} P(\{k_j\})$. Of course, if we replace P({k}) in (1.11) by (1.15), the argument in Section 1.2.2 still applies.

The probabilities (1.15) are known as the binomial (n, p) probabilities.

1.2.4. Limits of the Hypergeometric and Binomial Probabilities

Note that if N and K are large relative to n, the hypergeometric probability (1.11) and the binomial probability (1.15) will be almost the same. This follows from the fact that, for fixed k and n,

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \;\to\; \binom{n}{k} p^k (1-p)^{n-k} \qquad \text{if } N \to \infty \text{ and } K/N \to p.$$

Thus, the binomial probabilities also arise as limits of the hypergeometric probabilities.
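
The convergence is visible numerically. In the sketch below (made-up numbers, reusing hypergeometric_pmf from the sketch above), N grows while the fraction of defectives K/N is held at p = 0.04; the hypergeometric column approaches the binomial value:

```python
from math import comb

def binomial_pmf(n, p, k):
    """P({k}) from (1.15)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, k, p = 20, 2, 0.04
for N in (100, 1_000, 10_000, 100_000):
    K = int(p * N)  # keep the fraction of defectives fixed at p
    print(N, hypergeometric_pmf(N, K, n, k), binomial_pmf(n, p, k))
```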

Moreover, if in the case of the binomial probability (1.15) p is very small and n is very large, the probability (1.15) can be approximated quite well by the Poisson(λ) probability:

$$P(\{k\}) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots, \tag{1.16}$$

where λ = np. This follows from (1.15) by choosing p = λ/n for n > λ, with λ > 0 fixed, and letting n → ∞ while keeping k fixed:

$$P(\{k\}) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{n!}{(n-k)!\,n^k} \cdot \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n-k} \;\to\; e^{-\lambda} \frac{\lambda^k}{k!} \quad \text{for } n \to \infty,$$

because for n → ∞,

$$\frac{n!}{(n-k)!\,n^k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$$

and

$$\left(1 - \frac{\lambda}{n}\right)^{n-k} \to e^{-\lambda}.$$

Due to the fact that (1.16) is the limit of (1.15) for p = λ/n ↓ 0 as n → ∞, the Poisson probabilities (1.16) are often used to model the occurrence of rare events.
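
A small numerical sketch (with an arbitrary λ and growing n) shows how quickly the binomial probabilities approach the Poisson limit:

```python
from math import comb, exp, factorial

def poisson_pmf(lam, k):
    """P({k}) from (1.16)."""
    return exp(-lam) * lam**k / factorial(k)

lam, k = 2.0, 3
for n in (10, 100, 1_000, 10_000):
    p = lam / n  # p = lambda/n, so lambda = n*p stays fixed
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, poisson_pmf(lam, k))
# The binomial column converges to exp(-2) * 2**3 / 3! ≈ 0.1804.
```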

Note that the sample space corresponding to the Poisson probabilities is Ω = {0, 1, 2, . . .} and that the σ-algebra ℱ of events involved can be chosen to be the collection of all subsets of Ω because any nonempty subset A of Ω is either countably infinite or finite. If such a subset A is countably infinite, it takes the form $A = \{k_1, k_2, k_3, \ldots\}$, where the $k_j$'s are distinct nonnegative integers; hence, $P(A) = \sum_{j=1}^{\infty} P(\{k_j\})$ is well defined. The same applies of course if A is finite: if $A = \{k_1, \ldots, k_m\}$, then $P(A) = \sum_{j=1}^{m} P(\{k_j\})$. This probability measure clearly satisfies the conditions (1.8)–(1.10).

1.3. Why Do We Need Sigma-Algebras of Events?

In principle we could define a probability measure on an algebra ℱ of subsets of the sample space rather than on a σ-algebra. We only need to change condition (1.10) as follows: For disjoint sets $A_j \in \mathscr{F}$ such that $\bigcup_{j=1}^{\infty} A_j \in \mathscr{F}$, $P(\bigcup_{j=1}^{\infty} A_j) = \sum_{j=1}^{\infty} P(A_j)$. By letting all but a finite number of these sets be equal to the empty set, this condition then reads as follows: For disjoint sets $A_j \in \mathscr{F}$, j = 1, 2, . . . , n < ∞, $P(\bigcup_{j=1}^{n} A_j) = \sum_{j=1}^{n} P(A_j)$. However, if we confined a probability measure to an algebra, all kinds of useful results would no longer apply. One of these results is the so-called strong law of large numbers (see Chapter 6).

As an example, consider the following game. Toss a fair coin infinitely many times and assume that after each tossing you will get one dollar if the outcome is heads and nothing if the outcome is tails. The sample space Ω in this case can be expressed in terms of the winnings, that is, each element ω of Ω takes the form of a string of infinitely many zeros and ones, for example, ω = (1, 1, 0, 1, 0, 1, . . .). Now consider the event: "After n tosses the winning is k dollars." This event corresponds to the set $A_{k,n}$ of elements ω of Ω for which the sum of the first n elements in the string involved is equal to k. For example, the set $A_{1,2}$ consists of all ω of the type (1, 0, . . .) and (0, 1, . . .). As in the example in Section 1.2.3, it can be shown that

$$P(A_{k,n}) = \binom{n}{k} \left(\frac{1}{2}\right)^{n} \qquad \text{for } k = 0, 1, \ldots, n.$$

Next, for q = 1, 2, . . . , consider the events: "After n tosses the average winning k/n is contained in the interval [0.5 − 1/q, 0.5 + 1/q]." These events correspond to the sets

$$B_{q,n} = \bigcup_{k=[n(0.5-1/q)]}^{[n(0.5+1/q)]} A_{k,n},$$

where [x] denotes the smallest integer ≥ x. Then the set $\bigcap_{m=n}^{\infty} B_{q,m}$ corresponds to the event: "From the nth tossing onwards the average winning will stay in the interval [0.5 − 1/q, 0.5 + 1/q]"; the set $\bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} B_{q,m}$ corresponds to the event: "There exists an n (possibly depending on ω) such that from the nth tossing onwards the average winning will stay in the interval [0.5 − 1/q, 0.5 + 1/q]." Finally, the set $\bigcap_{q=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} B_{q,m}$ corresponds to the event: "The average winning converges to 1/2 as n converges to infinity." Now the strong law of large numbers states that the latter event has probability 1: $P\big[\bigcap_{q=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} B_{q,m}\big] = 1$. However, this probability is only defined if $\bigcap_{q=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} B_{q,m} \in \mathscr{F}$. To guarantee this, we need to require that ℱ be a σ-algebra.
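
Although the measure-theoretic point is about which events belong to ℱ, the convergence itself is easy to see by simulation. This sketch (illustrative only) tracks the average winning over n tosses of a fair coin:

```python
import random

random.seed(42)  # reproducible illustration

total = 0
for n in range(1, 100_001):
    total += random.randint(0, 1)  # one dollar for heads, nothing for tails
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, total / n)
# The averages cluster ever closer to 0.5, as the strong law of large
# numbers predicts for almost every infinite string of outcomes.
```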



© Cambridge University Press

Table of Contents

Part I. Probability and Measure: 1. The Texas lotto; 2. Quality control; 3. Why do we need sigma-algebras of events?; 4. Properties of algebras and sigma-algebras; 5. Properties of probability measures; 6. The uniform probability measures; 7. Lebesgue measure and Lebesgue integral; 8. Random variables and their distributions; 9. Density functions; 10. Conditional probability, Bayes's rule, and independence; 11. Exercises: A. Common structure of the proofs of Theorems 6 and 10, B. Extension of an outer measure to a probability measure; Part II. Borel Measurability, Integration and Mathematical Expectations: 12. Introduction; 13. Borel measurability; 14. Integral of Borel measurable functions with respect to a probability measure; 15. General measurability and integrals of random variables with respect to probability measures; 16. Mathematical expectation; 17. Some useful inequalities involving mathematical expectations; 18. Expectations of products of independent random variables; 19. Moment generating functions and characteristic functions; 20. Exercises: A. Uniqueness of characteristic functions; Part III. Conditional Expectations: 21. Introduction; 22. Properties of conditional expectations; 23. Conditional probability measures and conditional independence; 24. Conditioning on increasing sigma-algebras; 25. Conditional expectations as the best forecast schemes; 26. Exercises: A. Proof of Theorem 22; Part IV. Distributions and Transformations: 27. Discrete distributions; 28. Transformations of discrete random vectors; 29. Transformations of absolutely continuous random variables; 30. Transformations of absolutely continuous random vectors; 31. The normal distribution; 32. Distributions related to the normal distribution; 33. The uniform distribution and its relation to the standard normal distribution; 34. The gamma distribution; 35. Exercises: A. Tedious derivations; B. Proof of Theorem 29; Part V. The Multivariate Normal Distribution and its Application to Statistical Inference: 36. Expectation and variance of random vectors; 37. The multivariate normal distribution; 38. Conditional distributions of multivariate normal random variables; 39. Independence of linear and quadratic transformations of multivariate normal random variables; 40. Distribution of quadratic forms of multivariate normal random variables; 41. Applications to statistical inference under normality; 42. Applications to regression analysis; 43. Exercises: A. Proof of Theorem 43; Part VI. Modes of Convergence: 44. Introduction; 45. Convergence in probability and the weak law of large numbers; 46. Almost sure convergence, and the strong law of large numbers; 47. The uniform law of large numbers and its applications; 48. Convergence in distribution; 49. Convergence of characteristic functions; 50. The central limit theorem; 51. Stochastic boundedness, tightness, and the Op and op-notations; 52. Asymptotic normality of M-estimators; 53. Hypotheses testing; 54. Exercises: A. Proof of the uniform weak law of large numbers; B. Almost sure convergence and strong laws of large numbers; C. Convergence of characteristic functions and distributions; Part VII. Dependent Laws of Large Numbers and Central Limit Theorems: 55. Stationarity and the Wold decomposition; 56. Weak laws of large numbers for stationary processes; 57. Mixing conditions; 58. Uniform weak laws of large numbers; 59. Dependent central limit theorems; 60. Exercises: A. Hilbert spaces; Part VIII. Maximum Likelihood Theory: 61. Introduction; 62. Likelihood functions; 63. Examples; 64. Asymptotic properties of ML estimators; 65. Testing parameter restrictions; 66. Exercises.