(*) Anyone who can run Python code, that is.
Below is a short program I cobbled together to calculate the combined probability of a given number of events happening, where each event has its own probability. If they all had the same probability, this would be a simple binomial expansion, but with each having its own probability it becomes much less straightforward. The algorithm is very brute force: for N events, it runs through every one of the 2
N possible outcomes, calculates its probability, and accumulates the total probability for each possible number of events being true. If anyone has a better algorithm, let me know.
For a practical example, this allows a very simplistic model of multiple Senate races, although you could use it for any other elections, football games, or whatever. The events are defined in an input file called (imaginatively enough) "data". This has one line per event, each with two fields. The first is a free-form text label, and the second is a probability expressed as a decimal between 0 and 1, with as many decimal places as you like. For example, to model the 12 closest (IMO) Senate races, my data file looks like this:
MO-McCaskill .55
TN-Bredesen .35
NV-Rosen .65
AZ-Sinema .65
FL-Nelson .60
IN-Donnelly .60
MT-Tester .75
WV-Manchin .75
TX-O'Rourke .40
ND-Heitkamp .35
WI-Baldwin .80
MS-Espy .30
I assigned McCaskill a 55% probably of winning, Bredesen 35%, etc. Note that you should make your probabilities all be for a consistent outcome (i.e. all for the D winning, or all for the R winning,) in order to see the likelihood of a minimum threshold toward that outcome. In this example, the Democrats need to win 9 of the 12 races for a majority. With this input file, running the model yields:
There are 12 events assigned the following probabilities:
MO-McCaskill: 55.0%
TN-Bredesen: 35.0%
NV-Rosen: 65.0%
AZ-Sinema: 65.0%
FL-Nelson: 60.0%
IN-Donnelly: 60.0%
MT-Tester: 75.0%
WV-Manchin: 75.0%
TX-O'Rourke: 40.0%
ND-Heitkamp: 35.0%
WI-Baldwin: 80.0%
MS-Espy: 30.0%
Results:
Exactly 0: 0.0%, at least 0: 100.0%
Exactly 1: 0.0%, at least 1: 100.0%
Exactly 2: 0.3%, at least 2: 100.0%
Exactly 3: 1.8%, at least 3: 99.6%
Exactly 4: 6.0%, at least 4: 97.8%
Exactly 5: 13.6%, at least 5: 91.8%
Exactly 6: 21.6%, at least 6: 78.2%
Exactly 7: 24.1%, at least 7: 56.6%
Exactly 8: 18.7%, at least 8: 32.5%
Exactly 9: 9.8%, at least 9: 13.8%Exactly 10: 3.3%, at least 10: 4.0%
Exactly 11: 0.6%, at least 11: 0.7%
Exactly 12: 0.1%, at least 12: 0.1%
So using these input values, the probability of a Democratic majority (they win at least 9 of the 12) is only 13.8%. The nice thing about this is that you can tweak the probabilities and generate your own results. Keep in mind that this is a very simplistic model! For one thing, it treats all the events as independent with no correlation between them, which is somewhat unrealistic.
Also note that this algorithm is computationally intensive; it requires exponential time, specifically O(N*2
N). So this example with 12 events requires about 50,000 steps. This is no problem; it completes in under a second on my test system (a fast Linux box). But increasing it to 15 would require about 500,000 steps, and to 20 would be about 20 million. All 35 Senate races would require about 1 trillion steps!
The Python code is below. To play with it, save to a file called "model.py" or whatever (excluding the "begin code" and "end code" lines), create your input file "data" as defined above, and run it with "python model.py" (at least on Linux, I'm not sure on Windows.) Constructive comments are welcome; nitpicks on my coding style are not.
![Wink](https://talkelections.org/FORUM/Smileys/classic/wink.gif)
==== begin code
label = []
ptrue = []
pfalse = []
index = 0
with open("data","r") as f:
data = list(f)
for line in data:
fields = line.split()
label.append(fields[0])
try:
prob = float(fields[1])
except:
print("Invalid data line: {}".format(line))
if (prob <= 0.0) or (prob >= 1.0):
print("Invalid probability at line: {}".format(line))
exit(1)
ptrue.append(prob)
pfalse.append(1-prob)
n = len(data)
print("There are {} events assigned the following probabilities:".format(n))
print("")
for lbl,p in zip(label,ptrue):
print("{}: {:.1%}".format(lbl,p))
bitmask = [1<<i for i in range (n)]
pn = [0.0 for i in range (n+1)]
for index in range(1<<n):
nset = bin(index).count("1")
p = 1.0
for bit in range(n):
if index & bitmask[bit]:
p *= ptrue[bit]
else:
p *= pfalse[bit]
pn[nset] += p
print("")
print("Results:")
print("")
pcu = 1.0
for n in range(n+1):
print("Exactly {}: {:.1%}, at least {}: {:.1%}".format(n,pn[n],n,pcu))
pcu -= pn[n]
==== end code