The Voter’s Dilemma

Posted: 20th September 2014 by seanmathmodelguy in Blog

For the last few years the city of Toronto has been a remarkably complex political arena. Attempting to predict election outcomes while quantifying the overlapping federal, provincial and municipal mandates has proven virtually impossible. Seemingly contradictory polls and the rhythmic beat of political rhetoric only succeed in obfuscating the issues for voters.

As an applied mathematician, I found the issue of contradictory polls fascinating, and by using the variation between polls I was able to extract the most likely voting distribution of those not being polled. A key point in that analysis is the realization that many rapid polls can skew the demographic of those polled, giving the false impression that views held by a precious few are applicable throughout Toronto.

The recent successful prediction of the US presidential election, contrary to many pundits, is a testament to the predictive power of a well-posed model and the luxury of a vast number of polls that can be systematically weighted according to their historically proven reliability. Unfortunately, the mathematical theory behind this approach falters when applied to the mayoral race in the city of Toronto due to a lack of data. With significantly fewer polls, reconstruction of the true voting distribution is still possible, but it must be done in a smarter way.

In my attempt to build a prediction model for the mayoral race, I have made some progress and gained some insight into the components that would be required. With respect to municipal politics in Toronto, one must contend with 44 virtually independent wards, each with its own unique set of issues. Prediction schemes that do not take this into account will simply not capture the multifaceted viewpoints presented at city council. If we also assume that voters are reasonable and only change their allegiance at isolated times, then we nearly have a well-posed problem. What remains is to model the mechanism that drives changes in voting patterns. For this, inferring voter agency is key; it hinges on how well individuals believe their issues are represented at city council, set against the bloc voting patterns of city councillors. The challenge is to treat the voting prediction as a hidden distribution that optimally recaptures polling results while remaining faithful to the socio-political reality of Toronto.

Direct evidence of just how contentious the voting public can be was made abundantly clear to me when I watched a poll about policies get subverted into a conspiracy theory. It was a textbook example of the politics of paranoia.

Searching for solutions that optimally resolve seemingly contradictory information, rather than focussing on the contradictions directly, is a common theme in mathematics. As with all mathematical models, the results are only as good as the quality of the data being modelled. By being well-informed of the issues and open to all sides of the debate, the true voting distribution of Toronto can be revealed.

Please comment, I’d like to hear your thoughts on these issues.

Putting the Zero in P\(\emptyset\)LL

Posted: 29th August 2014 by seanmathmodelguy in Blog

Toronto voters are confused. Debate in polling circles about the reliability of landline versus cell phone samples, and questions about just what these numbers mean, permeate the media. Strategic voting is being seriously discussed, and individuals are literally twisting their logic into a variety of fascinating shapes to simultaneously make their vote count and stomach policies that do not resonate with their internal values.

The question I initially set out to explore this morning was how polling firms can unintentionally influence voters through their proprietary weighting schemes. From a mathematical standpoint, once strategic voting is modelled, the added complexity destabilizes the voting process. One ends up in a situation where “the tail of polling” begins to “wag the dog of the voter”. By this analogy, it is safe to infer that the polling infrastructure itself is in crisis.

What I tried to do this morning was to get a sense of how people felt about the policies being developed by each of the four front-running candidates for mayor. Participants were asked to consider the following question:


Ignoring all polls and simply based on the content of the candidates platform, who would you vote for in Toronto’s mayoral election?


Candidates were listed in alphabetical order by last name, and I asked everyone I could to spread the word and express what they thought. The poll was posted to Facebook, Reddit and Twitter, and each posting asked readers to repost, share, and re-tweet it. The poll remained open from 11:15 am to 1:15 pm and used only a simple safeguard based on IP address to prevent repeat voting.

What was exceedingly interesting to me was not the final result but how the voting distribution developed, and I learned that it is not analysis or quantification of results that is driving voters. It is sociology.

In the first fifteen minutes of the vote, Chow and Soknacki garnered most of the votes, but with only 33 votes cast in total (Chow 6, Ford 3, Tory 2, Soknacki 21, Undecided 1), this was not statistically significant (roughly accurate to 1 in 5). It took about 30 minutes for the internet to discover the poll and for people to start taking notice. I was struck by how stable the distribution was up to this point, with Soknacki’s platform the clear leader, followed distantly by a smoothly decreasing sequence: Chow, then Ford, then Tory and finally Undecided. This is when things started to get interesting.
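The “1 in 5” figure can be checked against the textbook worst-case margin of error for a proportion at 95% confidence, \(1.96\sqrt{0.25/n}\). Below is a minimal Python sketch. The function name is hypothetical, and the formula assumes a simple random sample. A self-selected online poll is not one, so the number is at best an optimistic bound.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Worst-case (share = 0.5) margin of error for a proportion from n responses.

    Assumes a simple random sample; a self-selected online poll is not one,
    so this is an optimistic bound at best.
    """
    return z * math.sqrt(0.25 / n)


# Tally at 11:30 am, 15 minutes after the poll went live (from the text).
tally = {"Chow": 6, "Ford": 3, "Tory": 2, "Soknacki": 21, "Undecided": 1}
n = sum(tally.values())      # 33 votes in total
moe = worst_case_margin(n)   # about 0.17, i.e. roughly 1 in 5
print(f"n = {n}, margin of error = ±{moe:.1%}")  # prints: n = 33, margin of error = ±17.1%
```

With 33 votes this gives about ±17 percentage points, which is close to the rough 1 in 5 quoted above.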

Voting distribution at 11:30 am, 15 minutes after going live.

After posting my poll on various websites, I was contacted by an ardent and publicly well-known anti-Ford activist. His concern was that other candidates were not included in my poll. Keep in mind that I had not yet revealed the purpose of the poll, and I disregarded the question. This tweeter, who was the leader of the anti-Ford “Shirtless Horde,” then went on to attack my poll as Soknacki’s numbers rose. After about 90 minutes, when it was apparent that Soknacki was maintaining this lead, something truly fascinating occurred.

The next tweet from this person sent a link to the poll to a vocal and active pro-Ford group, and at that very point Ford’s numbers began to rise. Let me repeat this for emphasis.

An anti-Ford leader joined forces with a pro-Ford organization!
What I was witnessing was a voter who first felt disenfranchised by a process whose purpose I had deliberately left unclear, and who then reacted by ensuring, at any cost, that another candidate would not benefit from this process.

The fear of one candidate being in the lead, regardless of the fact that the numbers in my poll were inconsequential, was all it took for this individual to work toward the retention of the incumbent mayor and, in doing so, to inadvertently unravel months of their own hard work to prevent exactly this situation!

This single reactionary tweet caused a cascade in the polling in which fear and a sense of disenfranchisement replaced a reflective comparison of the various platforms. To put this into context, the candidates I had not included are also left out of mainstream polling and most debates, so what I had done was not dissimilar to recent polling efforts.


Rather than have another candidate this person does not hold in high esteem lead a pointless poll, this person was willing to turn to a candidate whose politics, judging by this person’s own activism, are vastly different from their own. That reaction, that path to a strategic voting plan, changed the outcome of the poll.

The rest of the voting distribution remained essentially unchanged while Ford’s numbers started to rise. Concurrently, word of the poll started to gain traction on the Twitter stream I was monitoring, and as the tone became more divisive, Ford slowly closed the substantial lead that Soknacki had built up over the previous hour. By 12:45 pm (90 minutes elapsed) it was essentially a dead heat. This time coincided with a link to the poll being posted to a pro-Rob Ford site. Soon after this, Ford overtook Soknacki and never looked back. At 12:57 pm the counts were Chow 36, Ford 106, Tory 23, Soknacki 81 and Undecided 8, and the final distribution at 1:15 pm is displayed below.
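Turning the 12:57 pm counts into shares is simple arithmetic. The Python sketch below does this and, for scale, attaches a worst-case 95% margin of error, \(1.96\sqrt{0.25/n}\). That margin assumes a simple random sample, which this poll certainly was not, so treat it as illustrative only.

```python
import math

# Tallies at 12:57 pm, as reported in the post.
counts = {"Chow": 36, "Ford": 106, "Tory": 23, "Soknacki": 81, "Undecided": 8}
n = sum(counts.values())  # 254 votes in total
shares = {name: c / n for name, c in counts.items()}

# Worst-case (share = 0.5) 95% margin of error; this assumes a simple
# random sample, which a self-selected online poll is not.
moe = 1.96 * math.sqrt(0.25 / n)

# List candidates from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10}{share:7.1%}")
print(f"n = {n}, worst-case margin of error = ±{moe:.1%}")
```

Ford comes out at about 41.7% and Soknacki at about 31.9% of the 254 votes, against a nominal margin of roughly ±6 points.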

Voting distribution at 12:45 pm, after 90 minutes had elapsed.

The link found at 12:57 pm, 12 minutes after it had been posted.

Final voter distribution.

At the end of the day, what my experiment revealed is that for democracy to truly work, we need to allow ourselves to be active participants in it. This poll was presented to voters via social media with no description of its purpose. Once the link was posted, it triggered a strategic voting response to counteract fear or a sense of loss.

Democracy is not about reacting to fear. Democracy is not a reaction to a supposed outcome as it is promoted. What democracy is, is an opportunity for you to ask elected leaders to represent your principles, to represent your vision of the future. To vote simply in reaction to polls, which are open to various interpretations, requires you to step away from your democratic voice.

There is a very important historical reference here. Rousseau, the last of the great classical social contract theorists, argued that representative government, such as that of England in his time, led the English people to believe they were free. He felt they were greatly mistaken: they were truly free only during the election of members of Parliament. Once the members were elected, the populace was effectively enslaved by its choice. The English people made only paltry use of their moral civil freedom, exercising it only in the brief moments of elections, and Rousseau believed this squandering of liberty warranted its ultimate loss.

Rousseau is asking us to use our democratic freedom not simply in the act of voting but through being civically engaged. The person you vote for is not simply an \(X\) you put on a ballot; they are the person with whom you will work for the next four years to build the city, the province, and the nation in which you live. That person must be someone you can work with, and someone who will represent your voice in those instances when you cannot actively participate yourself.

Perhaps Rousseau is right, and ours is a citizenry that invokes its voice in the general assembly only during elections and is then enslaved through a loss of moral civil freedoms. Perhaps, by making such little use of our liberties, we have effectively lost them. Personally, I would like to believe that with a resurgence of activism and protest, both physically and in the virtual realm, the voice of our moral freedom is on the rise. Through civil action, and by bringing a multitude of voices into the public sphere, we are finding our liberty and moral freedom as we find our voice.

In a June 29, 2013 article in the Globe and Mail, Naheed Nenshi, mayor of Calgary, summed this up succinctly:

We as citizens have the power to take people from devastation to hope.


The story does not really end here for this afternoon’s poll. It seems that in the wake of the tipping point experienced in the polling exercise, and in the afterglow of slaying imaginary dragons, a conspiracy theory has taken root. As this is evolving by the minute, I leave it as an amusing homework exercise (I’m a prof, I can’t help myself) for you to go check it out for yourself.

If I could just take one final moment of your attention: reflect on my initial question in the poll. This is important; this will be on the test…


Ignoring all polls and simply based on the content of the candidates platform, who would you vote for in Toronto’s mayoral election?


Repeat it, repeat it to your kids, repeat it to your significant others, repeat it to yourself. When this is the question you ask yourself while staring at that blank ballot, then and only then will you be practicing democracy. Think on that.


For more on the social and political analysis of what transpired this afternoon, refer to the Philosopher of Write.

I have attempted here to hide the underlying mathematics since, let’s be honest, I’m in the minority in the joy I feel for it. At any rate, there is a rich mathematical structure at play here that predicts this poll would have ended up as a two-way race no matter how the participants behaved. If you would like to read more about this, consider the following extra readings.

1) David P. Myatt (2001) Strategic Voting Incentives in a Three Party System
2) Mark Fey (2007) Duverger’s Law Without Strategic Voting
3) Ken Kawai and Yasutora Watanabe (2012) Inferring Strategic Voting

Poll at the Forum (part 2): under the hood

Posted: 24th March 2014 by admin in Blog

In a previous blog post I suggested that the disparity between the Ipsos Reid and Forum Research results may be due to the methodology used to conduct the polls. In this post I will detail how I arrived at the final result.

When we attempt to find a solution that matches the polling results exactly, a solution is possible only if \(p=0\). To include other values of \(p\), we have to relax this notion and instead consider solutions that match as closely as possible in some sense. That is, we find \(\vec{\alpha}\) such that some measure of distance between the Ipsos Reid results and the blend of the Forum Research results with the unpolled distribution is minimized. Mathematically, we are looking for\[\vec{\alpha}^* = \underset{\vec{\alpha}}{\operatorname{argmin}}\| \vec{b}_{\textrm{I}}-(p\vec{b}_{\textrm{F}}+(1-p)\vec{\alpha})\|,\]where \(\vec{b}_{\textrm{I}}=(0.36,0.20,0.28,0.13,0.03,0)^\top\) and \(\vec{b}_{\textrm{F}}=(0.31,0.31,0.27,0.06,0.02,0.03)^\top\) are the Ipsos Reid and Forum Research data from the table. The double vertical bars mean that we are taking a norm, and there are a number of choices that could be made. For convenience we take the Euclidean norm. Other notions of distance, such as the Manhattan norm or the max norm, could be used. In a finite-dimensional vector space (in our case 6-dimensional, one dimension for each of the 6 lines in the table) all these norms are equivalent. However, the Manhattan and max norms are not differentiable everywhere, which typically makes them more computationally intensive and may require sub-differential algorithms (quantifying change at points of non-differentiability). Finally, the notation \(\operatorname{argmin}\) indicates that we want to minimize something, but rather than the value of this minimum, we are interested in where the minimum occurs.
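To make the choice of norm concrete, here is a small Python sketch that evaluates the residual \(\vec{b}_{\textrm{I}}-(p\vec{b}_{\textrm{F}}+(1-p)\vec{\alpha})\) under the Euclidean, Manhattan and max norms for a candidate \(\vec{\alpha}\). The function names are hypothetical. The sketch takes the Stintz entry of \(\vec{b}_{\textrm{F}}\) as 0.06, the Forum Research value in the table.

```python
import math

# Poll vectors ordered Chow, Ford, Tory, Stintz, Soknacki, Undecided.
B_I = [0.36, 0.20, 0.28, 0.13, 0.03, 0.00]   # Ipsos Reid
B_F = [0.31, 0.31, 0.27, 0.06, 0.02, 0.03]   # Forum Research (Stintz = 0.06)


def residual(alpha, p):
    """Componentwise b_I - (p*b_F + (1-p)*alpha)."""
    return [bi - (p * bf + (1 - p) * a) for bi, bf, a in zip(B_I, B_F, alpha)]


def distances(alpha, p):
    """Euclidean, Manhattan and max norms of the residual."""
    r = residual(alpha, p)
    return {
        "euclidean": math.sqrt(sum(x * x for x in r)),
        "manhattan": sum(abs(x) for x in r),
        "max": max(abs(x) for x in r),
    }


# With p = 0 and alpha equal to the Ipsos Reid data, every distance is zero.
print(distances(B_I, 0.0))
# A uniform guess at p = 0.5 gives three different, but comparable, distances.
print(distances([1 / 6] * 6, 0.5))
```

At \(p=0\) with \(\vec{\alpha}=\vec{b}_{\textrm{I}}\), all three distances vanish, which is the exact match mentioned above. Elsewhere the three norms disagree in value but, in six dimensions, always bound one another within constant factors.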

However, there are also a number of constraints. In particular, each \(\alpha_i\) is a proportion and, taken together, they should sum to 100%. This translates into the constraints \(0 \le \alpha_i\le 1 \ \forall i\) and \(\alpha_1+\alpha_2+\alpha_3+\alpha_4+\alpha_5+\alpha_6=1\). Each of these 13 constraints gives a separate condition. Defining \begin{align*}f(\vec{\alpha};p) &= \| \vec{b}_{\textrm{I}}-(p\vec{b}_{\textrm{F}}+(1-p)\vec{\alpha})\|_2^2,\end{align*}and\begin{align*}g_{1,i}(\vec{\alpha}) &= -\alpha_i,&g_{2,i}(\vec{\alpha}) &= \alpha_i - 1,&
h(\vec{\alpha}) &= \sum_{i=1}^6\alpha_i - 1
\end{align*}allows us to state the problem in a standard form.

For any \(0\le p\le 1\) find \[\vec{\alpha}^* = \underset{\vec{\alpha}}{\operatorname{argmin}}f(\vec{\alpha};p)\] subject to\begin{align*}g_{1,i}(\vec{\alpha})&\le 0, & g_{2,i}(\vec{\alpha})&\le 0, & h(\vec{\alpha})&=0.
\end{align*}This is known as a nonlinear programming problem in the optimization literature (more specifically, a convex quadratic program), and there are a number of algorithms to solve it efficiently numerically. In our case, for \(p<1\) the function \(f\) is strictly convex (it is the squared Euclidean norm of an affine function of \(\vec{\alpha}\) whose coefficient \(1-p\) is nonzero), each \(g_{i,j}\) is affine and therefore convex, and \(h\) is affine. These conditions ensure that for each value of \(p<1\) there is a unique optimal distribution that can be found.
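A short derivation shows where the uniqueness comes from and what shape the answer takes. Assume \(p<1\), and let \(\vec{c}(p)\) be shorthand introduced here rather than part of the original notation. Factoring \(1-p\) out of the residual gives
\begin{align*}\vec{c}(p) &= \frac{\vec{b}_{\textrm{I}}-p\vec{b}_{\textrm{F}}}{1-p}, & f(\vec{\alpha};p) &= (1-p)^2\|\vec{c}(p)-\vec{\alpha}\|_2^2.\end{align*}
Minimizing \(f\) under the constraints is therefore the same as finding the point of the probability simplex \(\{\vec{\alpha} : \alpha_i\ge 0,\ \sum_i\alpha_i=1\}\) closest to \(\vec{c}(p)\); the upper bounds \(\alpha_i\le 1\) then hold automatically. That closest point is unique and has the form \[\alpha_i^* = \max\bigl(c_i(p)-\tau,\,0\bigr),\] with a single threshold \(\tau\) fixed by \(\sum_i\alpha_i^*=1\). As \(p\) varies, the set of indices with \(\alpha_i^*>0\) changes only at isolated values of \(p\). This is why the solution can be assembled from a few patches.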

The necessary conditions for optimality are known as the Karush-Kuhn-Tucker (KKT) conditions, and they take the following form. If \(\vec{\alpha}\) is an optimal solution of our problem, then there exist multipliers \(\mu_{1,i}, \mu_{2,i}, \lambda\) such that \begin{align*}\nabla_{\vec{\alpha}}f(\vec{\alpha};p) + \sum_{i=1}^6\mu_{1,i}\nabla_{\vec{\alpha}}g_{1,i}(\vec{\alpha}) + \sum_{i=1}^6\mu_{2,i}\nabla_{\vec{\alpha}}g_{2,i}(\vec{\alpha}) + \lambda\nabla_{\vec{\alpha}}h(\vec{\alpha})&=0,\\ \textrm{for} \ i = 1,2,\ldots,6: \quad \mu_{1,i}g_{1,i}(\vec{\alpha})=0,\quad \mu_{2,i}g_{2,i}(\vec{\alpha})=0,\\ \textrm{for} \ i = 1,2 \ \textrm{and} \ j = 1,2,\ldots,6: \quad \mu_{i,j}\ge 0, \quad g_{i,j}(\vec{\alpha}) \le 0, \quad h(\vec{\alpha})&=0.\end{align*}The first line (stationarity) will pick up any minimum and maximum values, the second line (complementary slackness) chooses which of the constraints are active, and the third line (primal and dual feasibility) filters out only those that are feasible. Because every constraint is affine, no further regularity assumption is needed for these conditions to be necessary. Because the problem is convex, they are also sufficient: any point satisfying them is the optimum. One of the roadblocks to a solution is the number of possible combinations of active constraints. In this case there are \(2^{12} = 4096\) possibilities, although this can be reduced by using the structure of the problem. For example, if one of the \(\alpha_i = 1\) then all the other \(\alpha_i\) must be zero. A naive method would be to try all 4096 possibilities and then patch them together as \(p\) increases from 0 to 1.
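As a sanity check on any candidate patch, the KKT conditions can be tested numerically. The Python sketch below uses a hypothetical function `check_kkt`. It assumes no \(\alpha_i\) equals 1, so every \(\mu_{2,i}\) vanishes. Stationarity then reads \(-2(1-p)r_i-\mu_{1,i}+\lambda=0\) with \(\vec{r}=\vec{b}_{\textrm{I}}-p\vec{b}_{\textrm{F}}-(1-p)\vec{\alpha}\). The sketch recovers \(\lambda\) from the indices with \(\alpha_i>0\), then computes \(\mu_{1,i}\) for the rest, and reports whether every multiplier is nonnegative.

```python
# Poll vectors ordered Chow, Ford, Tory, Stintz, Soknacki, Undecided.
B_I = [0.36, 0.20, 0.28, 0.13, 0.03, 0.00]   # Ipsos Reid
B_F = [0.31, 0.31, 0.27, 0.06, 0.02, 0.03]   # Forum Research


def check_kkt(alpha, p, tol=1e-9):
    """Return (ok, lam, mu1) for a candidate alpha at a fixed 0 <= p < 1.

    Assumes no alpha_i equals 1, so every mu_{2,i} is zero. With
    r = b_I - p*b_F - (1-p)*alpha, stationarity reads
    -2(1-p) r_i - mu_{1,i} + lam = 0 for each i.
    """
    if abs(sum(alpha) - 1.0) > tol or min(alpha) < -tol:
        return False, None, None                       # primal infeasible
    r = [bi - p * bf - (1 - p) * a for bi, bf, a in zip(B_I, B_F, alpha)]
    g = [2 * (1 - p) * ri for ri in r]                 # g_i is minus the gradient of f
    free = [i for i, a in enumerate(alpha) if a > tol]  # mu_{1,i} = 0 on these
    lam = sum(g[i] for i in free) / len(free)
    if any(abs(g[i] - lam) > tol for i in free):
        return False, lam, None                        # stationarity fails
    mu1 = [0.0 if i in free else lam - g[i] for i in range(len(alpha))]
    return all(m >= -tol for m in mu1), lam, mu1       # dual feasibility


p = 0.85
q = 1 - p
# Patch valid for 0.7619 <= p <= 0.9145 (alpha_2 = alpha_5 = alpha_6 = 0).
alpha_patch = [(1.31 - 1.29 * p) / (3 * q), 0.0, (1.07 - 1.17 * p) / (3 * q),
               (0.62 - 0.54 * p) / (3 * q), 0.0, 0.0]
print(check_kkt(alpha_patch, p)[0])  # expected True
print(check_kkt(B_I, p)[0])          # expected False: not optimal at this p
```

Because the problem is convex, passing this check is enough to certify a patch as optimal at that \(p\). That is what makes the patch-by-patch approach below safe.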

Rather than sacrificing myself on this altar of 4096 possibilities, I first attempted to find a solution where \(0 < \alpha_i < 1\) (no equalities), so that \(\mu_{1,i}=\mu_{2,i}=0\ \forall i\), but as was found earlier, this solution is only feasible if \(p=0\). It also revealed that for small positive \(p\), it was \(\alpha_6\) that became negative. I therefore made the constraint \(g_{1,6}\) active (imposing \(\alpha_6=0\) while allowing its multiplier \(\mu_{1,6}\ge 0\) to be nonzero) and deflated the problem to finding the remaining 5 \(\alpha_i\) values. This yields the partial solution \begin{align*}\vec{\alpha}(p) &= \frac{1}{1-p}\begin{pmatrix}0.36-0.316p\\0.20-0.316p\\0.28-0.276p\\0.13-0.066p\\0.03-0.026p\\0\end{pmatrix}, & 0\le p&\le \frac{0.20}{0.316}\simeq 0.6329,\end{align*} with \(\alpha_2=0\) being the terminating condition. This behaviour implied that the next patch should result from also making \(g_{1,2}\) active, forcing \(\alpha_2=\alpha_6=0\), and continuing the deflation process. Continuing,\begin{align*}\vec{\alpha}(p) &= \frac{1}{1-p}\begin{pmatrix}0.41-0.395p\\0\\0.33-0.355p\\0.18-0.145p\\0.08-0.105p\\0\end{pmatrix}, & 0.6329\le p&\le \frac{0.08}{0.105}\simeq 0.7619,\end{align*} with \(\alpha_5=0\) defining the upper extent of the domain,\begin{align*}\vec{\alpha}(p) &= \frac{1}{3(1-p)}\begin{pmatrix}1.31-1.29p\\0\\1.07-1.17p\\0.62-0.54p\\0\\0\end{pmatrix}, & 0.7619\le p&\le \frac{1.07}{1.17}\simeq 0.9145,\end{align*} with \(\alpha_3=0\) at the upper limit, \begin{align*}\vec{\alpha}(p) &= \frac{1}{1-p}\begin{pmatrix}0.615-0.625p\\0\\0\\0.385-0.375p\\0\\0\end{pmatrix}, & 0.9145\le p&\le \frac{0.615}{0.625}= 0.984,\end{align*} terminated by \(\alpha_1 \to 0\), and finally\begin{align*}\vec{\alpha}(p) &= \begin{pmatrix}0\\0\\0\\1\\0\\0\end{pmatrix}, & 0.984\le p&\le 1.\end{align*}
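The patches above can be collected into one function. This Python sketch uses a hypothetical name, `alpha_closed_form`, and simply transcribes the five pieces. The coefficient of \(p\) in the third component of the third patch is taken as 1.17, so that the component vanishes at the stated endpoint \(p=1.07/1.17\). Evaluating the function at \(p=0.633\) and \(p=0.762\) should reproduce, after rounding, the two non-polled columns in the table of “A Poll at the Forum (part 2)”.

```python
def alpha_closed_form(p):
    """Piecewise optimal non-polled distribution alpha(p) for 0 <= p <= 1.

    Order: Chow, Ford, Tory, Stintz, Soknacki, Undecided. Each breakpoint
    is where one component of the previous patch reaches zero.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p >= 0.615 / 0.625:                       # only Stintz remains
        return [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    q = 1.0 - p
    if p <= 0.20 / 0.316:                        # alpha_6 = 0
        v = [0.36 - 0.316 * p, 0.20 - 0.316 * p, 0.28 - 0.276 * p,
             0.13 - 0.066 * p, 0.03 - 0.026 * p, 0.0]
        return [x / q for x in v]
    if p <= 0.08 / 0.105:                        # alpha_2 = alpha_6 = 0
        v = [0.41 - 0.395 * p, 0.0, 0.33 - 0.355 * p,
             0.18 - 0.145 * p, 0.08 - 0.105 * p, 0.0]
        return [x / q for x in v]
    if p <= 1.07 / 1.17:                         # alpha_2 = alpha_5 = alpha_6 = 0
        v = [1.31 - 1.29 * p, 0.0, 1.07 - 1.17 * p, 0.62 - 0.54 * p, 0.0, 0.0]
        return [x / (3 * q) for x in v]
    v = [0.615 - 0.625 * p, 0.0, 0.0, 0.385 - 0.375 * p, 0.0, 0.0]  # Chow and Stintz
    return [x / q for x in v]


for p in (0.633, 0.762):
    print(p, [round(100 * a, 1) for a in alpha_closed_form(p)])
```

Every patch sums to one identically, and adjacent patches agree at their shared endpoint. Both properties are quick to confirm numerically and catch transcription slips.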

Concatenating all these cases results in the figure displayed below. This mathematical technique is commonly used in inverse problems such as deblurring, tomography and super-resolution. Essentially, we can think of this as taking an x-ray of the whole voting public and not just those who appear on the surface through their landline.
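There is a cross-check that avoids enumerating active sets. For \(p<1\), \(f(\vec{\alpha};p)=(1-p)^2\|\vec{c}-\vec{\alpha}\|_2^2\) with \(\vec{c}=(\vec{b}_{\textrm{I}}-p\vec{b}_{\textrm{F}})/(1-p)\), so \(\vec{\alpha}^*\) is the Euclidean projection of \(\vec{c}\) onto the probability simplex. The Python sketch below uses the standard sort-and-threshold projection, which is a different route from the KKT patching described above. It sweeps \(p\) and reports where the set of nonzero components changes. All function names are hypothetical.

```python
B_I = [0.36, 0.20, 0.28, 0.13, 0.03, 0.00]   # Ipsos Reid
B_F = [0.31, 0.31, 0.27, 0.06, 0.02, 0.03]   # Forum Research
NAMES = ["Chow", "Ford", "Tory", "Stintz", "Soknacki", "Undecided"]


def project_to_simplex(c):
    """Euclidean projection of c onto {a : a_i >= 0, sum(a) = 1}.

    Sort in decreasing order and find the threshold tau with
    sum(max(c_i - tau, 0)) = 1; the last k passing the test fixes tau.
    """
    u = sorted(c, reverse=True)
    running, tau = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        running += uk
        t = (running - 1.0) / k
        if uk - t > 0:
            tau = t
    return [max(ci - tau, 0.0) for ci in c]


def optimal_alpha(p):
    """Minimizer of ||b_I - (p b_F + (1-p) alpha)||_2 over the simplex, 0 <= p < 1."""
    c = [(bi - p * bf) / (1.0 - p) for bi, bf in zip(B_I, B_F)]
    return project_to_simplex(c)


# Sweep p on a fine grid and print each change in the set of nonzero shares.
previous = None
for j in range(10_000):
    p = j / 10_000
    support = tuple(n for n, a in zip(NAMES, optimal_alpha(p)) if a > 1e-12)
    if support != previous:
        print(f"p = {p:.4f}: {', '.join(support)}")
        previous = support
```

The support should shrink at roughly \(p\approx 0.633\), \(0.762\), \(0.915\) and \(0.984\), where Ford, Soknacki, Tory and finally Chow drop out, in line with the patches above.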


A Poll at the Forum (part 2)

Posted: 2nd March 2014 by seanmathmodelguy in Blog

In part one of this mini-series I talked about where statements like “this poll is considered accurate within 3 percentage points” come from and how it is possible that this is not at odds with the observed variability in the many polling results. For convenience let’s reconsider just how variable these polls are.

A mid-November Ipsos Reid poll has Chow at 36 percent, Tory 28 percent, Ford 20 percent, Stintz 13 percent, Soknacki 3 percent and undecided 0 percent, while a late-February Forum Research poll has Chow at 31 percent, Tory 27 percent, Ford 31 percent, Stintz 6 percent, Soknacki 2 percent and undecided 3 percent. Summarized in a table, we have the following data.

Candidate | Ipsos Reid (mid-November) | Forum Research (late February)
Olivia Chow | 36% | 31%
Rob Ford | 20% | 31%
John Tory | 28% | 27%
Karen Stintz | 13% | 6%
David Soknacki | 3% | 2%
Undecided | 0% | 3%

In part 1, I mentioned that what is really being measured in each of these polls is not the support for a candidate across the whole population, but rather the support among those who have been polled. If we suppose that the Ipsos Reid values represent the true distribution and that the Forum Research values are uncorrected due to an insufficient representation of individuals without a landline, then a very interesting question to pose is

what is the voting distribution of this unpolled population?

Although much has happened in the intervening period, from mid-November to late February, I will assume that the underlying voting distribution of the population has remained essentially the same. Recall the recipe from part 1 for the true proportion of support for a given candidate:
\[P(\textrm{candidate}) = P(\textrm{candidate}|\textrm{has landline})P(\textrm{landline})+P(\textrm{candidate}|\textrm{no landline})P(\textrm{no landline}).
\] (English translation: The probability of support for a candidate is the probability they are supported by an individual with a landline, weighted by the probability of having a landline, plus the probability they are supported by an individual without a landline, weighted by the probability of not having a landline.) With our suppositions, the ingredients of the recipe differ from those in part 1. For part 2 they are as follows:

  • \(P(\textrm{candidate}|\textrm{has landline})\) is the uncorrected (Forum Research) value in the table for a given candidate;
  • \(P(\textrm{candidate}|\textrm{no landline})\) is the unknown proportion that we are attempting to extract;
  • \(P(\textrm{landline}) = p\) is taken as a variable \( 0 \le p \le 1\) and from the discussion in part 1, \(p\) is becoming smaller over time and is expected to be less than 0.67 in 2014;
  • \(P(\textrm{no landline}) = 1-p\) depends on the value of \(p\).

This gives 6 equations (one for each candidate) in seven unknowns (the six proportions for those without a landline, plus \(p\)), so the solution is a one-parameter family, which we parametrize by \(p\). Denoting by \(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6\) the probabilities of support, among those without a landline, for Chow, Ford, Tory, Stintz, Soknacki, and undecided respectively, a naive attempt is to simultaneously solve
\begin{align*}0.36 &= 0.31p + \alpha_1(1-p)\\
0.20 &= 0.31p + \alpha_2(1-p)\\
0.28 &= 0.27p + \alpha_3(1-p)\\
0.13 &= 0.06p + \alpha_4(1-p)\\
0.03 &= 0.02p + \alpha_5(1-p)\\
0 &= 0.03p + \alpha_6(1-p)
\end{align*}for each \(\alpha_i\). Not all solutions are valid since we require \(0\le p\le 1\) and \(0 \le \alpha_i\le 1\) for \(i = 1,2,\ldots,6\). With these constraints in place, the only solution is \(p=0\), corresponding to everyone having only a cell phone. In this degenerate case the unobserved distribution is simply what the Ipsos Reid data indicate, and the Forum Research results play no role. Essentially, this degenerate solution is a result of asking for an exact match between the two polls.
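Each line of the naive system can be solved on its own: \(\alpha_i\) equals the Ipsos Reid value minus \(p\) times the Forum Research value, all divided by \(1-p\). This short Python sketch (the names `exact_alpha` and `is_feasible` are hypothetical) shows the undecided share \(\alpha_6=-0.03p/(1-p)\) turning negative for every \(p>0\). That is why only \(p=0\) survives the constraints.

```python
IPSOS = [0.36, 0.20, 0.28, 0.13, 0.03, 0.00]   # Chow, Ford, Tory, Stintz, Soknacki, Undecided
FORUM = [0.31, 0.31, 0.27, 0.06, 0.02, 0.03]


def exact_alpha(p):
    """Solve Ipsos_i = Forum_i * p + alpha_i * (1 - p) for each alpha_i (needs p < 1)."""
    return [(ip - p * fr) / (1.0 - p) for ip, fr in zip(IPSOS, FORUM)]


def is_feasible(alpha):
    """True when every alpha_i is a valid proportion in [0, 1]."""
    return all(0.0 <= a <= 1.0 for a in alpha)


for p in (0.0, 0.1, 0.5):
    alpha = exact_alpha(p)
    print(f"p = {p}: alpha_6 = {alpha[5]:+.4f}, feasible = {is_feasible(alpha)}")
```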

The details of how I solved this problem (for the mathies) can be found elsewhere. For everyone else, let me tell you the solution. First we move from looking for an exact result to looking for a result that most closely matches the two polls while still observing all the constraints. The figure summarizes the results and there are two very interesting scenarios that show up.

Candidate | Ipsos Reid | Forum Research | Non-polled, \(p=76.2\%\) | Non-polled, \(p=63.3\%\)
Olivia Chow | 36% | 31% | 45.8% | 43.6%
Rob Ford | 20% | 31% | 0% | 0%
John Tory | 28% | 27% | 25.0% | 28.7%
Karen Stintz | 13% | 6% | 29.2% | 24.0%
David Soknacki | 3% | 2% | 0% | 3.7%
Undecided | 0% | 3% | 0% | 0%

If we suppose that \(p=76.2\%\) of people are represented by a landline, then the voting distribution of the non-polled that best approximates the Ipsos Reid results when combined with the Forum Research data is: Chow 45.8 percent, Tory 25.0 percent, Ford 0 percent, Stintz 29.2 percent, Soknacki 0 percent, Undecided 0 percent. We suspect, though, that \(p\) is actually lower than this, and taking \(p=63.3\%\) gives a slightly different result: Chow 43.6 percent, Tory 28.7 percent, Ford 0 percent, Stintz 24.0 percent, Soknacki 3.7 percent, Undecided 0 percent.
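Plugging a column back into the recipe is a quick way to see what “best approximates” means here. This Python sketch mixes the Forum Research numbers with each non-polled column at its value of \(p\) and reports the largest gap from the Ipsos Reid figures. The names are hypothetical, and the non-polled shares are the rounded table values.

```python
IPSOS = [0.36, 0.20, 0.28, 0.13, 0.03, 0.00]   # treated as the true distribution
FORUM = [0.31, 0.31, 0.27, 0.06, 0.02, 0.03]   # landline (polled) sample

# Non-polled columns from the table, as fractions (rounded values).
NON_POLLED = {
    0.762: [0.458, 0.0, 0.250, 0.292, 0.0, 0.0],
    0.633: [0.436, 0.0, 0.287, 0.240, 0.037, 0.0],
}


def total_support(p, with_landline, without_landline):
    """P(candidate) = P(candidate|landline) p + P(candidate|no landline) (1 - p)."""
    return [a * p + b * (1 - p) for a, b in zip(with_landline, without_landline)]


largest_gap = {}
for p, alpha in NON_POLLED.items():
    mixed = total_support(p, FORUM, alpha)
    largest_gap[p] = max(abs(m - t) for m, t in zip(mixed, IPSOS))
    print(f"p = {p:.1%}: reconstructed {[round(100 * m, 1) for m in mixed]}, "
          f"largest gap {largest_gap[p]:.1%}")
```

Neither column reproduces Ipsos Reid exactly, since an exact match is only possible at \(p=0\), but the reconstructed totals stay within about four percentage points of it.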

I was personally very surprised that there is no support at all for Rob Ford from the non-polled provided that \(p \ge 63.3\%\). I would have expected some small residual, but this is simply not borne out by the analysis. Of course, for lower values of \(p\) (\(p < 63.3\%\)) support is found for Rob Ford. Still, it is curious that this support is systematically included in the Forum Research data and not found at all within the optimal distribution of the non-polled until \(p\) is quite low. Furthermore, a low value of \(p\) simply confirms an inappropriate bias towards those with landlines in the Forum Research values. What does this mean? Well, first, take the polling results with a grain of salt. Second, it’s fairly clear that the voting distributions of those with and without a landline are significantly different, especially concerning Rob Ford, Karen Stintz and Olivia Chow. The reduction in the Ford proportion among the non-polled is absorbed mainly by Stintz and Chow. Rob Ford in particular may have a Karl Rove moment where the numbers simply do not support the hype within the campaign bubbles. To those in the non-polled: get out and vote. Your voice is clearly not being represented and is an integral part of the future of Toronto. Now let’s see how this optimal distribution could be found.

Fair warning, there be mathematics that lies below.