A Poll at the Forum

Posted: 27th February 2014 by seanmathmodelguy in Blog

Often I see polls saying that Rob Ford has a political base that is either made of concrete or completely at odds with the reality of what it means to be a responsible leader that represents the people. The most recent example is a tweet in #TOpoli today citing an article in the Toronto Star showing Rob Ford and Olivia Chow in a near dead heat.

One of the issues that has been bothering me recently is how many of these stories simply quote the numbers without question and appeal to the forums that generated them as being accurate. In fact the latest poll referred to above states:

The Forum Research automated voice response telephone poll of 1310 residents is considered accurate within 3 percentage points, 19 times out of 20.

This statement on its own should not cause much concern but at the end of the article is the following statement.

Ford has fared worse in polls conducted by Ipsos Reid than polls conducted by Forum. A mid-November Ipsos Reid poll had Chow at 36 per cent, Tory 28 per cent, Ford 20 per cent, Stintz 13 per cent, Soknacki 3 per cent. In a mid-December Ipsos Reid poll, 61 per cent said they would not consider voting for Ford.

At this point I became curious and wanted to see just how reliable the numbers being quoted are and how it is possible that a poll that claims to be so accurate (within 3 percent) could be so at odds with other polling firms. Let’s find out!

Question #1: Where does this 3% come from?
For any poll, a sample of the population is asked a question. For example: “Will you vote for Rob Ford in the next Toronto mayoral election?” Say that you asked this question of 1310 people and 406 of them said “Yes, yes I will vote for Mr Ford”. If this was the case then in the particular poll that you conducted, the proportion of responses that said “Yes” is 406/1310 or 0.3099. Let’s just call that 31 percent (percent is just a fancy way to say “divide by 100”). This is not really the true proportion since there is no possible way that every voter can be asked this question and so we must estimate the true proportion with a smaller sample and hope that this sample is large enough to give a reasonable estimate of the true proportion. Now suppose you do the poll three more times and get 32 percent, then 30 percent and finally 25 percent. Each time you poll, the randomness in selecting who is asked the question is reflected in the variability of the observed proportion for that particular poll. However, all is not lost, and in fact doing the poll over and over will reveal a structure in the observed percentages. They will be centred at the true proportion but have some spread about that centre value. As it turns out, most of the variability in the observed 406 is between 406 - 36 = 370 and 406 + 36 = 442, where the number 36 is (approximately) the square root of the sample size. So the proportion can be anything from 370/1310 = 0.2824 to 442/1310 = 0.3374 or about plus and minus 3% = 0.03 from the reported value of 31%.

For a poll with \(n\) people being sampled, the percentage error is simply \(100/\sqrt{n}\) so for a required accuracy of 1% one would need \(n = 10000\).
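As a quick check of these numbers, here is a minimal Python sketch. The function names are my own, and `normal_margin` uses the standard textbook normal-approximation formula \(z\sqrt{p(1-p)/n}\) with \(z = 1.96\) and the worst case \(p = 0.5\), which is an assumption on my part and not something taken from the article; the \(100/\sqrt{n}\) rule is a slightly more conservative rounding of it.

```python
import math

def rule_of_thumb_margin(n):
    """Percentage error 100/sqrt(n), the rule quoted in the text."""
    return 100.0 / math.sqrt(n)

def normal_margin(n, p=0.5, z=1.96):
    """Margin of error in percent from the normal approximation
    (assumed here: 19 times out of 20 corresponds to z = 1.96)."""
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

def sample_size_for(margin_percent):
    """Smallest n whose rule-of-thumb error is at most margin_percent."""
    return math.ceil((100.0 / margin_percent) ** 2)

print(round(rule_of_thumb_margin(1310), 2))  # a little under 3 percent
print(round(normal_margin(1310), 2))         # also a little under 3 percent
print(sample_size_for(1))                    # 10000
```

Both estimates land just below the 3 percentage points quoted for the 1310-person poll.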

Question #2: How can polls differ by more than this “accuracy”?
There are a number of ways that variability in polling results can occur and primarily these are due to sampling from a group that is not a faithful representation of the population as a whole. This is becoming an increasing concern with telephone polls that only sample from homes with landlines. In fact there is an alarming trend of individuals giving up their landline in favour of a cell phone. A bit of digging reveals that in 2000 about 97% of all households in Canada had a landline and this has been steadily decreasing (2006: 91%, 2010: 67%). Households that only had a cell phone have correspondingly been on the increase (2008: 8%, 2010: 13%). An even more pronounced shift has been observed in the US and there has been some effort to account for this effect. What is of concern is that this opting out of having a landline is falling along demographic lines with data from 2010 indicating that only 7% of households with a landline were in the age group 18-29. For those who consider numbers amongst their close and personal friends, the full report of Secondary Research into Cell Phones and Telephone Surveys is available.

Consider a toy model first with just six people that are polled: \(Y_1, Y_2, Y_3\) are young people, only one of them having a landline (say \(Y_1\)) and \(O_1, O_2, O_3\) are older, all of them having a landline. Consider a candidate for an election that is preferential to the older voters with support from \(Y_1, O_1\) and \(O_2\). Some other candidate is supported by \(Y_2, Y_3\) and \(O_3\). The table summarizes this situation.

Individual | Has a landline | Supports candidate
\(Y_1\) | Yes | Yes
\(Y_2\) | No | No
\(Y_3\) | No | No
\(O_1\) | Yes | Yes
\(O_2\) | Yes | Yes
\(O_3\) | Yes | No

The question is, how much support does that candidate have? If you were able to ask each of the six then three of them support the candidate and this gives 3/6 = 50% support. What would a landline based telephone survey reveal? Only four of the six individuals can be polled. And of these four ‘poll-able’ individuals, three would support the candidate, giving 3/4 = 75% support. The difference here lies in the coverage of the sampling and the fact that those not being sampled may have significantly different views than those being sampled.
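The toy model is small enough to write out directly. A minimal sketch in Python, with the six rows copied from the table above (the variable and function names are just illustrative):

```python
# Each entry: (name, has_landline, supports_candidate), copied from the table.
people = [
    ("Y1", True, True),
    ("Y2", False, False),
    ("Y3", False, False),
    ("O1", True, True),
    ("O2", True, True),
    ("O3", True, False),
]

def support(rows):
    """Fraction of the given rows that support the candidate."""
    return sum(1 for _, _, supports in rows if supports) / len(rows)

# Asking everyone versus asking only those reachable by landline.
true_support = support(people)
polled_support = support([p for p in people if p[1]])

print(true_support)    # 0.5
print(polled_support)  # 0.75
```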

What is really being measured is not the support of the candidate by the whole population, but rather the support of the candidate by that part of the population that has a landline. We could correct for this effect if two extra pieces of data are known. First, the proportion of people that do not have a landline and second, among these non-landline individuals, what proportion would support the candidate.

The probability of the candidate being supported comes from two sources, those that support them and have a landline and those that support them without a landline, each weighted by the probability of having a landline and not having a landline respectively. Suppose we use the 2010 estimate of 67% of households having a landline (it is most likely less than this in 2014) and let \(\alpha\) denote the probability of a candidate being supported by the individuals without a landline. In the most recent survey mentioned at the start of the blog, Rob Ford was said to command 31% of the support of the people polled. How does all this combine to find the true support of the candidate? By conditioning on having a landline,
\[
P(\textrm{candidate}) = P(\textrm{candidate}|\textrm{has landline})P(\textrm{landline})+P(\textrm{candidate}|\textrm{no landline})P(\textrm{no landline})
\] or in English: “The probability of support for a candidate is the probability they are supported by an individual with a landline weighted by the probability of having a landline together with the support by an individual without a landline weighted by the probability of not having a landline.”

The ingredients of this recipe, on the right hand side of the expression, are:

  • \(P(\textrm{candidate}|\textrm{has landline}) = 0.31\) in this case and represents the proportion observed in its unfiltered form;
  • \(P(\textrm{candidate}|\textrm{no landline}) = \alpha\) is most likely unknown but could be significantly different than the 0.31 in the previous item;
  • \(P(\textrm{landline}) = 0.67\) is taken as 67% and is most likely less than this in 2014;
  • \(P(\textrm{no landline}) = 1-0.67 = 0.33\) is rising as individuals opt-out of having a landline.

Putting all of this information together gives the expression
\[
P(\textrm{candidate}) = 0.2077 + 0.33\alpha.
\]

There are three extreme cases that allow one to see just how much the reported numbers could vary.

  1. If everyone without a landline does not support the candidate then \(\alpha = 0\). In this case, \(P(\textrm{candidate}) = 0.2077\) or about 21% support.
  2. If everyone without a landline supports the candidate then \(\alpha = 1\). For this case, \(P(\textrm{candidate}) = 0.2077 + 0.33\) or about 54% support.
  3. If those without a landline are just as likely to support the candidate as those with one then \(\alpha = 0.31\) and \(P(\textrm{candidate}) = 0.2077 + 0.1023 = 0.31\) reproducing the reported 31%.
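The three cases can be reproduced with a few lines of Python. This is only a sketch of the conditioning formula above; the function name `total_support` and its argument names are my own labels.

```python
def total_support(landline_support, p_landline, alpha):
    """Law of total probability:
    P(candidate) = P(candidate | landline) * P(landline)
                 + P(candidate | no landline) * P(no landline)."""
    return landline_support * p_landline + alpha * (1.0 - p_landline)

# The three extreme cases: alpha = 0, alpha = 1 and alpha = 0.31.
for alpha in (0.0, 1.0, 0.31):
    print(alpha, round(total_support(0.31, 0.67, alpha), 4))
```

Lowering `p_landline` below 0.67, as is likely for 2014, widens the gap between the two extreme cases.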

What are the takeaways here? If you believe that those individuals without a landline have the same average viewpoint as those that have a landline then the raw telephone poll surveys will suffice. If you think that perhaps there may be a difference, then this effect can swamp the quoted 3% accuracy. The simple example here gives a range of 21% to 54% depending on the unobserved probability \(\alpha\). With probability it is not only what you can measure that is important, but recognizing what you are not measuring that yields accurate results.

If you got this far then I have a surprise. Since we know what was reported in the Ipsos Reid poll we can compare it to the Forum Research values to extract the most likely voting demographic of this unobserved group. That in itself will prove to be quite interesting and moreover, it can be used to help compensate future polls from Forum Research. Stay tuned, and hey, isn’t math fun?

Confirming a New Year’s resolution

Posted: 9th February 2014 by admin in Blog

Over the last year I have been taking note of some of the stories in the various science/tech news feeds with the hope of eventually finding the time to expound on them in a format such as this. There is really no perfect time to commence such an activity and over the winter break I made a promise to myself that I’d try to blog at least twice a month. This is the first of that series so let me take you on a short tour of some of the stuff over the last year that I found fascinating.

Our first stop is the mathematics of zero and a radio episode by Alex Bellos who travels to India in search of absolutely nothing, well the origins of zero in actual fact. I continue to find it fascinating that the ancient religions of Jainism, Hinduism and Buddhism contributed so much to mathematics and yet much of this history remains unattributed and not taught in mathematics classes in the western world.

If I’ve still held your interest then you may also consider looking at the talk given by Robin Wilson of Gresham College entitled ‘Early Mathematics’ which reviews the time period from 2700BCE to 1100CE. This covers results from ancient Egypt, Mesopotamia, Greece, China, India, the Mayans, Islam and early results in Europe leading into the Renaissance period.

As I mentioned above, this is to be the first in a series of blogs that provide a window into what I find fascinating and where I see interesting cross-overs between mathematics and other disciplines. There are many blog entries under construction with topics as varied as “the mathematics of kid toys and carnival rides”, “predicting elections”, and “an insider view on what it is like as a mathematician to work with industry”.

I look forward to their imminent posting with anticipation.

Update: In the same theme of mathematics that has been lost and rediscovered, A Prayer for Archimedes describes a long-lost text by the ancient Greek mathematician showing that he had begun to discover the principles of calculus long before it was developed by Leibniz and Newton many centuries later.

Least squares and pseudo-inverses

Posted: 5th February 2014 by seanmathmodelguy in Lectures

To appreciate the connections between solutions of the system \(Ax=b\) and least squares, we begin with two illustrative examples.

Overdetermined systems:

\(A\) is \(m \times n\) with \(m > n\). In this case there are more equations than unknowns, \(A^{\top}A\) is \(n\times n\) and \(AA^{\top}\) is \(m\times m\). The connection with the pseudo-inverse is that
\[
x = (A^{\top}A)^{-1}A^{\top}b = A^+b
\] is the particular \(x\) that minimizes \(\|Ax-b\|\).
Example: Solve the system \((2\ 3\ 4\ 6)^\top(x_1) = (4\ 6\ 8\ 10)^\top.\)
Using \(A^{\top}A = (2\ 3\ 4\ 6)(2\ 3\ 4\ 6)^\top=65\),
\[
x = (A^{\top}A)^{-1}A^{\top}b = \frac{1}{65}\begin{pmatrix}2 &3 &4 &6\end{pmatrix}\begin{pmatrix}4\\6\\8\\10\end{pmatrix} = \frac{118}{65}.
\] Considering the least squares problem, \(\|Ax - b\|^2 = (2x-4)^2+(3x-6)^2+(4x-8)^2+(6x-10)^2\) which has its minimum where the derivative with respect to \(x\) vanishes, \(2(2x-4)2+2(3x-6)3+2(4x-8)4+2(6x-10)6 = 0\), so that
\[
x = \frac{8+18+32+60}{4+9+16+36} = \frac{118}{65}.
\]
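The same number falls out of a short exact computation. A minimal sketch using Python's `fractions` module (the names `a`, `b` and `residual_sq` are just illustrative labels for the column \(A\), the right hand side and \(\|Ax-b\|^2\)):

```python
from fractions import Fraction

a = [2, 3, 4, 6]    # the single column of A
b = [4, 6, 8, 10]   # right hand side

ata = sum(ai * ai for ai in a)               # A^T A = 65
atb = sum(ai * bi for ai, bi in zip(a, b))   # A^T b = 118
x = Fraction(atb, ata)                       # (A^T A)^{-1} A^T b

def residual_sq(x):
    """Squared residual |Ax - b|^2 for a trial value of x."""
    return sum((ai * x - bi) ** 2 for ai, bi in zip(a, b))

print(x)  # 118/65
# Nudging x in either direction increases the residual.
print(residual_sq(x) < residual_sq(x + Fraction(1, 100)))
print(residual_sq(x) < residual_sq(x - Fraction(1, 100)))
```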

Underdetermined systems:

\(A\) is \(m \times n\) with \(m < n\). In this case there are more unknowns than equations, and the connection with the pseudo-inverse is that
\[
x = A^{\top}(AA^{\top})^{-1}b = A^+b
\] is the particular \(x\) that minimizes \(\|x\|\) amongst all of the possible solutions.
Example: Solve the system \((1\ -1\ 0)(x_1\ x_2\ x_3)^\top = (2).\)
In this case we find \(AA^{\top} = (1\ -1\ 0)(1\ -1\ 0)^\top = 2\) and
\[x = A^{\top}(AA^{\top})^{-1}b = \begin{pmatrix}1\\-1\\0\end{pmatrix}\frac{1}{2}2 = \begin{pmatrix}1\\-1\\0\end{pmatrix}.
\]

To see the connection with the least squares problem, the system is row reduced (it is already row reduced) and we can choose \(x_1\) as the pivot with \(x_2, x_3\) as free parameters. Letting \(x_2 = s\) and \(x_3 = t\) we have \(x_1 = 2+s\) so that the general solution is
\[x = \begin{pmatrix}2+s\\ s\\ t\end{pmatrix} = \begin{pmatrix}2\\0\\0\end{pmatrix} + s\begin{pmatrix}1\\1\\0\end{pmatrix} + t\begin{pmatrix}0\\0\\1\end{pmatrix}
\] for any \(s,t\in\mathbb{R}\). Considering \(\|x\|\), we have \(\|x\|^2 = (2+s)^2 + s^2 + t^2\) which has a minimum at \(2(2+s)+2s = 0\) and \(2t = 0\) or \(t = 0, s = -1\) giving \(x = (1\ -1\ 0)^\top\) as before.
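Here is a small sketch in Python that compares the pseudo-inverse formula with the general solution. The helper names are mine, and the grid of \((s,t)\) values is only a spot check, not a proof of minimality.

```python
from fractions import Fraction

a = [1, -1, 0]   # the single row of A
b = 2            # right hand side

aat = sum(ai * ai for ai in a)                 # A A^T = 2
x_min = [Fraction(ai * b, aat) for ai in a]    # A^T (A A^T)^{-1} b

def general_solution(s, t):
    """General solution x = (2+s, s, t) found by row reduction."""
    return [2 + s, s, t]

def norm_sq(x):
    return sum(xi * xi for xi in x)

print(x_min == general_solution(-1, 0))  # True
# No solution on this grid of (s, t) values is shorter than x_min.
print(all(norm_sq(general_solution(s, t)) >= norm_sq(x_min)
          for s in range(-5, 6) for t in range(-5, 6)))  # True
```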

Symmetric matrices:

Decomposing the structure of the matrix \(A\) can help understand the resulting solutions and in the case of a symmetric matrix, the eigenvectors form an orthogonal set which allows one to expand \(A = UDU^\top\) where \(U\) is the orthogonal matrix \((U^{-1}=U^\top)\) with the eigenvectors as columns and \(D\) a diagonal matrix with the corresponding eigenvalues as the diagonal elements. Also, the eigenvalues could be anything, but if we specify that we want a non-negative definite matrix then the eigenvalues must be greater than or equal to zero.

Example: Write \(A_1 = \begin{pmatrix}2 &1\\ 1 &2\end{pmatrix}\) in the form \(A_1 = UDU^\top\).
A quick calculation gives \(\lambda_1 = 3\) with corresponding eigenvector \(\mathbf{\xi}^{(1)} = \frac{1}{\sqrt{2}}(1\ 1)^\top\) and a second pair \(\lambda_2 = 1, \mathbf{\xi}^{(2)} = \frac{1}{\sqrt{2}}(1\ -1)^\top.\) This gives the decomposition \[A_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}\begin{pmatrix}3 &0\\ 0 &1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}^\top.\] Using this decomposition the inverse of a matrix is easily computed by replacing the diagonal elements of \(D\) with their reciprocals so that for example \[A_1^{-1} = \frac{1}{3}\begin{pmatrix}2& -1\\ -1& 2\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}\begin{pmatrix}{\small\frac{1}{3}} &0\\ 0 &1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}^\top.\]

What happens if one of the eigenvalues is zero? This will not affect the decomposition of \(A\), in fact if we decompose \(A_2 = \begin{pmatrix}1&-1\\-1&1\end{pmatrix}\) with eigenvalues \(\lambda_1 = 2, \lambda_2 = 0\) and corresponding eigenvectors \(\xi^{(1)} = \frac{1}{\sqrt{2}}(1\ -1)^\top\), \(\xi^{(2)} = \frac{1}{\sqrt{2}}(1\ 1)^\top\) then \[ A_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}\begin{pmatrix}2 &0\\ 0 &0\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}^\top.\] Notice that this matrix is not invertible since one of the eigenvalues is zero. But what if we took the reciprocal of all the nonzero diagonal elements to form \[A_3 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}\begin{pmatrix}{\small\frac{1}{2}} &0\\ 0 &0\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}^\top = \frac{1}{4}\begin{pmatrix}1& -1\\ -1 &1\end{pmatrix}.\] Sadly, \(A_3\) is not the inverse of \(A_2\). It would be very surprising if it were, since \(A_2\) is not invertible. So what then is \(A_3\)? Well, \(A_2\) and \(A_3\) are pseudo-inverses of each other. This can be generalized later to non-square matrices by constructing the SVD of a matrix.
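The construction of \(A_3\) from the nonzero eigenvalues can be checked exactly. The sketch below is my own illustration, not a library routine: it builds \(UDU^\top\) as a sum of \(\lambda\,\xi\xi^\top\) terms, working with unnormalised eigenvectors so that the \(1/\sqrt{2}\) factors never appear, and then tests two of the defining properties of a pseudo-inverse, \(A_2A_3A_2 = A_2\) and \(A_3A_2A_3 = A_3\).

```python
from fractions import Fraction

def outer_normalised(v):
    """Return xi xi^T for xi = v / |v|, kept exact with Fractions."""
    vv = sum(c * c for c in v)
    return [[Fraction(a * b, vv) for b in v] for a in v]

def from_spectrum(pairs):
    """Sum of lambda * xi xi^T over (lambda, v) pairs, i.e. U D U^T."""
    n = len(pairs[0][1])
    total = [[Fraction(0)] * n for _ in range(n)]
    for lam, v in pairs:
        p = outer_normalised(v)
        for i in range(n):
            for j in range(n):
                total[i][j] += lam * p[i][j]
    return total

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# Eigenpairs of A_2 from the text (eigenvectors left unnormalised).
spectrum = [(2, (1, -1)), (0, (1, 1))]
A2 = from_spectrum(spectrum)
# Take reciprocals of the nonzero eigenvalues only; zeros stay zero.
A3 = from_spectrum([(Fraction(1, lam) if lam != 0 else 0, v)
                    for lam, v in spectrum])

print(A3 == [[Fraction(1, 4), Fraction(-1, 4)],
             [Fraction(-1, 4), Fraction(1, 4)]])   # True
print(matmul(matmul(A2, A3), A2) == A2)            # True
print(matmul(matmul(A3, A2), A3) == A3)            # True
print(matmul(A2, A3) == [[1, 0], [0, 1]])          # False: not an inverse
```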

Here is how the pseudo-inverse is connected to the solution of a least-squares problem. If the linear system \(Ax = b\) has any solutions, then they will have the form\[x = A^+ b + \left(I - A^+ A\right)\xi\] for some arbitrary vector \(\xi\). Multiplying by \(A\) on the left gives the condition (using \(A = A A^+ A\), a property of \(A^+\))\[Ax = A A^+ b + \left(A - A A^+ A\right)\xi = A A^+ b = b.\] So for any solution to exist we need the admissibility condition \(AA^+ b = b\). For linear systems \(A x = b\) with non-unique solutions as in the underdetermined case, the pseudo-inverse may be used to construct the solution of minimum Euclidean norm \(\|x\|\) among all solutions. If \(A x = b\) is admissible (\(AA^+ b = b\)), the vector \(y = A^+b\) is a solution, and satisfies \(\|y\| \le \|x\|\) for all solutions \(x\).

One final example should tie this all together.
Example: Consider finding the solution to \(A_2x = b\) that minimizes \(\|x\|\).
Row reducing \(A_2x = b, b = (b_1\ b_2)^\top\) reveals that \(b_2=-b_1\) is required to ensure a solution, and
\[
\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}b_1\\0\end{pmatrix} + s\begin{pmatrix}1\\1\end{pmatrix}
\] for any \(s\in\mathbb{R}\). Continuing, \(\|x\|^2 = (b_1+s)^2+s^2\) which is minimized when \(2(b_1+s)+2s=0\) or when \(s = -\frac{b_1}{2}\) so that the solution is
\[
\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}b_1\\0\end{pmatrix} - \frac{b_1}{2}\begin{pmatrix}1\\1\end{pmatrix}=\frac{b_1}{2}\begin{pmatrix}1\\-1\end{pmatrix}.
\] Using the pseudo-inverse of \(A_2\), \(A_3 = A_2^+\) gives the admissibility condition \(A_2A_2^+b = b\) which simplifies to \(b_2 = -b_1\) and the solution
\[
\begin{pmatrix}x_1\\x_2\end{pmatrix} = A_2^+b = \frac{1}{4}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}\begin{pmatrix}b_1\\b_2\end{pmatrix}=\frac{1}{4}\begin{pmatrix}b_1-b_2\\-b_1+b_2\end{pmatrix}=\frac{b_1}{2}\begin{pmatrix}1\\-1\end{pmatrix}.
\]
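As a sanity check, the admissibility test \(A_2A_2^+b = b\) and the minimum norm solution can be coded in a few lines. This is only a sketch for this particular \(2\times 2\) example; the function names are my own.

```python
from fractions import Fraction

A2 = [[1, -1], [-1, 1]]
A2_plus = [[Fraction(1, 4), Fraction(-1, 4)],
           [Fraction(-1, 4), Fraction(1, 4)]]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def is_admissible(b):
    """Check the admissibility condition A A^+ b = b."""
    return matvec(A2, matvec(A2_plus, b)) == list(b)

def min_norm_solution(b):
    """Return A^+ b, the solution of A x = b with the smallest norm."""
    if not is_admissible(b):
        raise ValueError("A2 x = b has no solution for this b")
    return matvec(A2_plus, b)

print(is_admissible([3, -3]))      # True, since b2 = -b1
print(is_admissible([1, 1]))       # False
print(min_norm_solution([3, -3]))  # (b1/2) * (1, -1) with b1 = 3
```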

Summary

At the beginning of this post, the Moore-Penrose pseudo-inverse generalized the idea of an inverse to non-square matrices and another notion of pseudo-inverse arose for symmetric matrices that have at least one zero eigenvalue. This second notion can be generalized (using the SVD) to non-square matrices and matrices that are not symmetric where the eigenvectors are not guaranteed to form an orthonormal set. In all cases, the pseudo-inverse is implicitly tied to the notion of finding solutions with minimal norm.

Intermediate Value Theorem – Limits and Continuity

Posted: 12th February 2013 by seanmathmodelguy in Lectures

Intermediate Value Theorem

To begin with, let’s start with the basic statement of the theorem.

Theorem

If \(f(x)\) is continuous on a closed interval \([a,b]\) and \(N\) is any number with \(f(a) < N < f(b)\), then there exists a value \(c \in (a,b)\) such that \(f(c) = N\).

The illustration corresponding to the theorem is to the right and indicates that there may be more than one possible value for \(c\). The important restrictions are that

    • \(f(x)\) be continuous and
    • the interval \([a,b]\) is closed.

The primary purpose of this theorem is to indicate when numbers with various properties exist.

Steps

1. Make sure the function, \(f(x)\) is continuous.
2. Create a new function \(g(x)=f(x)-N\). Replacing the function in this manner always makes the \(N\) in the theorem, applied to \(g\), equal to zero, so that \(g(c)=0\) when a correct value for \(c\) is determined.
3. Using 0 rather than the general \(N\), we need to find an \(a\) and a \(b\) so that either \(g(a)>0\) and \(g(b)<0\) or \(g(a)<0\) and \(g(b)>0\). The point is that the signs need to change.
4. Finding a change of sign confirms that there is a number \(c \in (a,b)\) that allows \(g(c)=0\) or \(f(c)=N\).

Examples

A. Suppose we have the function \(f(x) = x^2 - 4x\) and we wish to show there is a number \(x_*\) such that \(f(x_*) = 1\).
1. Notice that since \(f(x)\) is continuous, the intermediate value theorem can be used.
2. Let \(g(x) = f(x) - 1 = x^2 - 4x - 1\) so that \(g(x) = 0\) when the correct \(x_*\) is determined.
3. Choosing \(a = 4\) gives \(g(a) = -1 < 0\) and choosing \(b=5\) gives \(g(b) = 4 > 0\). There are of course many other possible values of \(a\) and \(b\). Note that \(a<b\).
4. Since a change in sign was found, there is a number \(c \in (4,5)\) such that \(g(c) = 0\) or equivalently, \(f(c) = 1\).
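The theorem only guarantees that \(c\) exists, but the same sign-change idea can be repeated to close in on it. The sketch below uses bisection, a standard technique that is not part of the steps above; the function name and tolerance are my own choices.

```python
def bisect(g, a, b, tol=1e-10):
    """Halve [a, b] repeatedly, always keeping the half on which g
    changes sign; the intermediate value theorem guarantees a root there."""
    ga, gb = g(a), g(b)
    if ga == 0:
        return a
    if gb == 0:
        return b
    if ga * gb > 0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    while b - a > tol:
        c = (a + b) / 2
        gc = g(c)
        if gc == 0:
            return c
        if ga * gc < 0:
            b = c            # sign change is in [a, c]
        else:
            a, ga = c, gc    # sign change is in [c, b]
    return (a + b) / 2

def g(x):
    return x * x - 4 * x - 1

root = bisect(g, 4.0, 5.0)
print(root)  # close to 2 + sqrt(5), which lies in (4, 5)
```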

B. If \(f(x) = x^3-8x+10\), show there is at least one value of \(c\) for which \(f(c) = -\sqrt{3}\).

Since \(f(x)\) is continuous we just need to redefine the function (to make \(N = 0\)) and find values for \(a\) and \(b\). The new function is
\[
g(x) = f(x) + \sqrt{3} = x^3-8x+10+\sqrt{3}.
\] We need to find \(a\) and \(b\) so that \(g(x)\) changes sign. Let \(a=-4\) so that \(g(a) = -22+\sqrt{3} < 0\) and \(b=-3\) so that \(g(b) = 7+\sqrt{3} >0\). These choices for \(a\) and \(b\) are found by just trying different values in the function.

At any rate, using the intermediate value theorem we can conclude that there is a value \(c \in (-4,-3)\) such that \(g(c) = 0\) or \(f(c) = -\sqrt{3}\).
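The "just trying different values" step is easy to automate. A minimal sketch in Python that scans consecutive integers for a change of sign in \(g\); the helper `find_sign_change` and the search range of \(-10\) to \(10\) are arbitrary choices of mine.

```python
import math

def g(x):
    return x ** 3 - 8 * x + 10 + math.sqrt(3)

def find_sign_change(func, start, stop):
    """Return the first pair of consecutive integers (a, a + 1) between
    start and stop where func changes sign, or None if there is none."""
    for a in range(start, stop):
        if func(a) * func(a + 1) < 0:
            return a, a + 1
    return None

print(find_sign_change(g, -10, 10))  # (-4, -3)
```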