A simple election model that anyone(*) can use
GeorgiaModerate
Moderator
Atlas Superstar
*****
Posts: 33,264


« on: October 10, 2018, 05:43:53 PM »

(*) Anyone who can run Python code, that is.

Below is a short program I cobbled together to calculate the combined probability of a given number of events happening, where each event has its own probability.  If they all had the same probability, this would be a simple binomial expansion, but with each having its own probability it becomes much less straightforward.  The algorithm is very brute force: for N events, it runs through every one of the 2^N possible outcomes, calculates its probability, and accumulates the total probability for each possible number of events being true.  If anyone has a better algorithm, let me know.

For a practical example, this allows a very simplistic model of multiple Senate races, although you could use it for any other elections, football games, or whatever.  The events are defined in an input file called (imaginatively enough) "data".  This has one line per event, each with two fields.  The first is a free-form text label, and the second is a probability expressed as a decimal between 0 and 1, with as many decimal places as you like.  For example, to model the 12 closest (IMO) Senate races, my data file looks like this:

MO-McCaskill .55
TN-Bredesen .35
NV-Rosen .65
AZ-Sinema .65
FL-Nelson .60
IN-Donnelly .60
MT-Tester .75
WV-Manchin .75
TX-O'Rourke .40
ND-Heitkamp .35
WI-Baldwin .80
MS-Espy .30

I assigned McCaskill a 55% probability of winning, Bredesen 35%, etc.  Note that you should make your probabilities all be for a consistent outcome (i.e. all for the D winning, or all for the R winning) in order to see the likelihood of a minimum threshold toward that outcome.  In this example, the Democrats need to win 9 of the 12 races for a majority.  With this input file, running the model yields:

There are 12 events assigned the following probabilities:

MO-McCaskill: 55.0%
TN-Bredesen: 35.0%
NV-Rosen: 65.0%
AZ-Sinema: 65.0%
FL-Nelson: 60.0%
IN-Donnelly: 60.0%
MT-Tester: 75.0%
WV-Manchin: 75.0%
TX-O'Rourke: 40.0%
ND-Heitkamp: 35.0%
WI-Baldwin: 80.0%
MS-Espy: 30.0%

Results:

Exactly 0: 0.0%, at least 0: 100.0%
Exactly 1: 0.0%, at least 1: 100.0%
Exactly 2: 0.3%, at least 2: 100.0%
Exactly 3: 1.8%, at least 3: 99.6%
Exactly 4: 6.0%, at least 4: 97.8%
Exactly 5: 13.6%, at least 5: 91.8%
Exactly 6: 21.6%, at least 6: 78.2%
Exactly 7: 24.1%, at least 7: 56.6%
Exactly 8: 18.7%, at least 8: 32.5%
Exactly 9: 9.8%, at least 9: 13.8%
Exactly 10: 3.3%, at least 10: 4.0%
Exactly 11: 0.6%, at least 11: 0.7%
Exactly 12: 0.1%, at least 12: 0.1%

So using these input values, the probability of a Democratic majority (they win at least 9 of the 12) is only 13.8%.  The nice thing about this is that you can tweak the probabilities and generate your own results.  Keep in mind that this is a very simplistic model!  For one thing, it treats all the events as independent with no correlation between them, which is somewhat unrealistic.

Also note that this algorithm is computationally intensive; it requires exponential time, specifically O(N*2^N).  So this example with 12 events requires about 50,000 steps.  This is no problem; it completes in under a second on my test system (a fast Linux box).  But increasing it to 15 would require about 500,000 steps, and to 20 would be about 20 million.  All 35 Senate races would require about 1 trillion steps!
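
On the request for a better algorithm: the same distribution can be built one event at a time instead of enumerating every outcome. The following is a minimal sketch of that idea, not the code from this post. It assumes the events are independent (as the model does), and the function name count_distribution is made up for illustration. For N events the inner loop runs N(N+1)/2 times in total, so 35 races would take about 630 steps.

```python
def count_distribution(probs):
    """Return dist where dist[k] = P(exactly k events happen), assuming independence."""
    dist = [1.0]  # before any event is considered, "exactly 0" is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this event does not happen
            nxt[k + 1] += mass * p      # this event happens
        dist = nxt
    return dist


# The 12 probabilities from the example data file, in the same order.
probs = [0.55, 0.35, 0.65, 0.65, 0.60, 0.60,
         0.75, 0.75, 0.40, 0.35, 0.80, 0.30]

dist = count_distribution(probs)
at_least_9 = sum(dist[9:])
print("At least 9: {:.1%}".format(at_least_9))
```

With the example inputs this should agree with the "at least 9" line in the results above, since it computes the same quantity by a different route.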

The Python code is below.  To play with it, save to a file called "model.py" or whatever (excluding the "begin code" and "end code" lines), create your input file "data" as defined above, and run it with "python model.py" (at least on Linux, I'm not sure on Windows.)  Constructive comments are welcome; nitpicks on my coding style are not. ;)

==== begin code
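import sys

label = []    # event names
ptrue = []    # probability that each event happens
pfalse = []   # probability that each event does not happen

with open("data", "r") as f:
    for line in f:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            prob = float(fields[1])
        except (IndexError, ValueError):
            # missing or non-numeric probability: stop instead of carrying on
            print("Invalid data line: {}".format(line))
            sys.exit(1)
        if (prob <= 0.0) or (prob >= 1.0):
            print("Invalid probability at line: {}".format(line))
            sys.exit(1)
        label.append(fields[0])
        ptrue.append(prob)
        pfalse.append(1 - prob)

n = len(label)
print("There are {} events assigned the following probabilities:".format(n))
print("")

for lbl, p in zip(label, ptrue):
    print("{}: {:.1%}".format(lbl, p))

bitmask = [1 << i for i in range(n)]

# pn[k] accumulates the probability that exactly k events are true
pn = [0.0] * (n + 1)

# each integer from 0 to 2^n - 1 encodes one outcome: bit i set means event i is true
for index in range(1 << n):
    nset = bin(index).count("1")
    p = 1.0
    for bit in range(n):
        if index & bitmask[bit]:
            p *= ptrue[bit]
        else:
            p *= pfalse[bit]
    pn[nset] += p

print("")
print("Results:")
print("")

# pcu is the probability of "at least k": start at 1 and peel off each "exactly k"
pcu = 1.0
for k in range(n + 1):
    print("Exactly {}: {:.1%}, at least {}: {:.1%}".format(k, pn[k], k, pcu))
    pcu -= pn[k]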
==== end code
Logged
Virginiá
Virginia
Administratrix
Atlas Icon
*****
Posts: 18,924
Ukraine


Political Matrix
E: -6.97, S: -5.91

« Reply #1 on: October 10, 2018, 05:54:09 PM »

Are you a programmer, GM?
Logged
GeorgiaModerate
Moderator
Atlas Superstar
*****
Posts: 33,264


« Reply #2 on: October 10, 2018, 06:00:28 PM »


Not as such, but I do some programming ancillary to my job.
Logged
DataGuy
Rookie
**
Posts: 217


« Reply #3 on: October 10, 2018, 06:31:55 PM »

Interesting. I happen to be working on my own model that I will post shortly before Election Day. It would have predicted with almost frightening accuracy some of the 2016 Senate surprises, but that is of course no guarantee of similar success this year. Midterms can be strange.

My predictions will include probabilities, vote percentages, raw vote totals, and a range of possible outcomes.
Logged
Antonio the Sixth
Antonio V
Atlas Institution
*****
Posts: 58,526
United States


Political Matrix
E: -7.87, S: -3.83

« Reply #4 on: October 10, 2018, 07:03:44 PM »

One serious issue with your model is that it assumes Senate races are independent of each other. In fact, the past few cycles have shown that they tend to be highly correlated - if Democrats outperform the polls in one race, they are likely to outperform in others too (and vice versa). This makes the probability of extreme outcomes (either Democrats winning the Senate or Republicans picking up seats) a lot higher than what your numbers suggest.
Logged
GeorgiaModerate
Moderator
Atlas Superstar
*****
Posts: 33,264


« Reply #5 on: October 10, 2018, 07:06:46 PM »

One serious issue with your model is that it assumes Senate races are independent of each other. In fact, the past few cycles have shown that they tend to be highly correlated - if Democrats outperform the polls in one race, they are likely to outperform in others too (and vice versa). This makes the probability of extreme outcomes (either Democrats winning the Senate or Republicans picking up seats) a lot higher than what your numbers suggest.

Oh, absolutely (and I noted that as a shortcoming in the OP).  But that would be much more difficult to account for, and all I was really aiming for was a simple bit of code that anyone could play with by tweaking the inputs and getting a quick result.
Logged
Antonio the Sixth
Antonio V
Atlas Institution
*****
Posts: 58,526
United States


Political Matrix
E: -7.87, S: -3.83

« Reply #6 on: October 10, 2018, 07:16:56 PM »

Fair enough. I just thought it'd be really cool if you could also add a correction variable for us to fiddle with. :P
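
One possible shape for such a correction variable, as a rough sketch and not something anyone in this thread has proposed or tested: simulate the races many times, giving every race a shared "swing" term plus its own independent noise. The knob rho (a hypothetical name, between 0 and 1) sets how much of the uncertainty is shared. Each race keeps its input probability on its own; only the joint behaviour changes. This is a standard one-factor Gaussian approach, and simulate and its parameters are illustrative names only. It needs Python 3.8 or later for statistics.NormalDist.

```python
import random
from statistics import NormalDist


def simulate(probs, rho, threshold, trials=20000, seed=1):
    """Estimate P(at least `threshold` events) when events share a common swing.

    rho = 0 reproduces the independent model; rho near 1 makes the races
    move almost in lockstep. Results are Monte Carlo estimates, not exact.
    """
    rng = random.Random(seed)
    # A race is won when its standard-normal draw falls below this cutoff,
    # which keeps each race's individual probability equal to its input.
    cutoffs = [NormalDist().inv_cdf(p) for p in probs]
    shared_w = rho ** 0.5
    own_w = (1.0 - rho) ** 0.5
    hits = 0
    for _ in range(trials):
        swing = rng.gauss(0.0, 1.0)  # one draw shared by every race
        wins = 0
        for c in cutoffs:
            if shared_w * swing + own_w * rng.gauss(0.0, 1.0) < c:
                wins += 1
        if wins >= threshold:
            hits += 1
    return hits / trials


# The 12 probabilities from the example data file in the opening post.
probs = [0.55, 0.35, 0.65, 0.65, 0.60, 0.60,
         0.75, 0.75, 0.40, 0.35, 0.80, 0.30]

for rho in (0.0, 0.25, 0.5):
    print("rho = {}: at least 9 in about {:.1%} of trials".format(
        rho, simulate(probs, rho, 9)))
```

With rho = 0 the estimate should land close to the exact figure from the opening post, and raising rho should fatten both tails. What value of rho is realistic is a separate question that this sketch does not answer.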
Logged
America Needs a 13-6 Progressive SCOTUS
Solid4096
Junior Chimp
*****
Posts: 8,781


Political Matrix
E: -8.88, S: -8.51

« Reply #7 on: October 10, 2018, 09:55:10 PM »

What are we supposed to use as the file extension for the data file?
Logged
America Needs a 13-6 Progressive SCOTUS
Solid4096
Junior Chimp
*****
Posts: 8,781


Political Matrix
E: -8.88, S: -8.51

« Reply #8 on: October 10, 2018, 10:07:38 PM »

This is the first Python file I have ever managed to get working. I am generally good with computer programming, but Python is a blind spot.
Logged
💥💥 brandon bro (he/him/his)
peenie_weenie
Junior Chimp
*****
Posts: 5,545
United States


« Reply #9 on: October 10, 2018, 10:29:06 PM »

Fair enough. I just thought it'd be really cool if you could also add a correction variable for us to fiddle with. :P

I don't follow poli-sci literature, so maybe this is super basic, but is there any consensus correlation structure between the states? I.e., have people found a way to cluster states together into patterns of similar voting? I'm sure this could be done by looking at underlying demographic data (e.g., percentage of the population white and college educated) but this could probably also be done with raw vote percentages/vote counts (I bet an off-the-shelf k-means cluster could do a serviceable job of finding clusters of co-occurring states).

Which reminds me... you posted a model a while back, and you responded to a question I asked and I meant to follow up. I will dig up that thread one of these days and try to re-engage if you are still working on that model at all.



Is there an Atlas data science group? Could be interesting to get a super group together to build some serious models. I'd pitch in some... I'm too busy with my own stats and modeling (for my own studies, which are not politics) to go rummage for data and look into other people's models very closely, but with several people working together I'm curious what people on this board could put together. First serious debate would have to be whether to use R or Python though.
Logged
Antonio the Sixth
Antonio V
Atlas Institution
*****
Posts: 58,526
United States


Political Matrix
E: -7.87, S: -3.83

« Reply #10 on: October 10, 2018, 10:33:45 PM »

Feel free to bump that thread, yeah. I'm happy to answer more questions about my model, although it might take me more time now that classes have started.

I'm actually not sure myself how to model correlations. That's why I didn't include a specific probability of Dems winning the Senate in that thread - only an expected number of seats, which is much easier to calculate and doesn't depend on correlations. I'm still out of my depth there. Definitely happy to pool resources, though.
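
To illustrate why the expected number of seats is the easy part: the expected count is just the sum of the individual probabilities, whether or not the races are correlated. A small sketch, borrowing the 12 example probabilities from the opening post purely for illustration (they are not the inputs of the model mentioned here):

```python
# The 12 probabilities from the example data file in the opening post.
probs = [0.55, 0.35, 0.65, 0.65, 0.60, 0.60,
         0.75, 0.75, 0.40, 0.35, 0.80, 0.30]

# Linearity of expectation: no independence assumption is needed here.
expected_wins = sum(probs)
print("Expected wins among these 12 races: {:.2f}".format(expected_wins))
```

For those inputs the sum is 6.75. The probability of clearing a threshold such as 9 wins is what depends on how the races move together.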
Logged
adrac
adracman42
Jr. Member
***
Posts: 722


Political Matrix
E: -9.99, S: -9.99

« Reply #11 on: October 11, 2018, 12:35:40 AM »

Fair enough. I just thought it'd be really cool if you could also add a correction variable for us to fiddle with. :P

I don't follow poli-sci literature, so maybe this is super basic, but is there any consensus correlation structure between the states? I.e., have people found a way to cluster states together into patterns of similar voting? I'm sure this could be done by looking at underlying demographic data (e.g., percentage of the population white and college educated) but this could probably also be done with raw vote percentages/vote counts (I bet an off-the-shelf k-means cluster could do a serviceable job of finding clusters of co-occurring states).

I'm not informed as to the statistics, but Nate Silver discussed such a thing in their 2016 model.

This is one article where they discussed it: https://fivethirtyeight.com/features/election-update-north-carolina-is-becoming-a-backstop-for-clinton/

There's also some discussion of it in their 2016 model documentation: https://fivethirtyeight.com/features/election-update-north-carolina-is-becoming-a-backstop-for-clinton/.
Logged
💥💥 brandon bro (he/him/his)
peenie_weenie
Junior Chimp
*****
Posts: 5,545
United States


« Reply #12 on: October 11, 2018, 08:20:32 AM »

Fair enough. I just thought it'd be really cool if you could also add a correction variable for us to fiddle with. :P

I don't follow poli-sci literature, so maybe this is super basic, but is there any consensus correlation structure between the states? I.e., have people found a way to cluster states together into patterns of similar voting? I'm sure this could be done by looking at underlying demographic data (e.g., percentage of the population white and college educated) but this could probably also be done with raw vote percentages/vote counts (I bet an off-the-shelf k-means cluster could do a serviceable job of finding clusters of co-occurring states).

I'm not informed as to the statistics, but Nate Silver discussed such a thing in their 2016 model.

This is one article where they discussed it: https://fivethirtyeight.com/features/election-update-north-carolina-is-becoming-a-backstop-for-clinton/

There's also some discussion of it in their 2016 model documentation: https://fivethirtyeight.com/features/election-update-north-carolina-is-becoming-a-backstop-for-clinton/.

This is exactly what I had in mind, thank you. I don't have time to read it now but will try to read it some time later this week. :)

Feel free to bump that thread, yeah. I'm happy to answer more questions about my model, although it might take me more time now that classes have started.

I'm actually not sure myself how to model correlations. That's why I didn't include a specific probability of Dems winning the Senate in that thread - only an expected number of seats, which is much easier to calculate and doesn't depend on correlations. I'm still out of my depth there. Definitely happy to pool resources, though.

Yes, I may not bump it until after the election because I too am swamped with classes and paper revisions. :P

The more I thought about it last night, the more I think the correlations I was talking about were different from what you were pointing out: I was pointing out correlations in how states vote on a macro level, but you were talking about correlations in polling error/how undecideds break in a single election, no? I bet correlations in polling error are well studied (although they seem so dependent on the context of a given election that it may be hard to pull out any meaningful results).

Off the top of my head I don't know a good parametric way to model correlations either, but I do think something that does k-means clustering (dividing states into k groups, each defined by a mean over some space, so that the states in a group are close together) with some temporal autocorrelation (like an autoregressive model) that allows states to drift in and out of the groups over time seems like a good place to start. I don't know if those techniques exist, but it seems simple enough that someone would have come up with something like it in the past.
Logged