[EM] Identifying ranked ballot sets

Discussion:

John

2018-10-01 23:18:03 UTC

[Note: I'm not on-list; please CC me on replies]

Somebody suggested a two-ballot set to me with three candidates, and
it turns out you can mutate this set and achieve the same pairwise
votes.

BAC
CAB

LOSE
A B C
W A _ 1 1
I B 1 _ 1
N C 1 1 _

You'll note this set also derives from BCA ACB.

This expands to larger sets:

BACD = BDCA
DCAB = ACDB

LOSE
A B C D
W A _ 1 1 1
I B 1 _ 1 1
N C 1 1 _ 1
D 1 1 1 _

If we clear the first votes:

LOSE
A B C D
W A _ . . .
I B . _ . .
N C 1 1 _ 1
D 1 1 1 _

(B)DCA = (B)CDA
(A)CDB = (A)DCB

I'm trying to identify a concise representation of a set of ranked
ballots. At this time, I believe this may be:

1. The pairwise race matrix (Win-Lose)
2. The matrix of appearances at each ordinal rank

For BACD and DCAB, #2 is as such:

1 2 3 4
A 0 1 1 0
B 1 0 0 1
C 0 1 1 0
D 1 0 0 1

It probably doesn't matter if you rank the truncated ballots by
counting all unranked as ties at that rank or by not counting them at
that rank.
----
Election-Methods mailing list - see http://electorama.com/em for list info

Kristofer Munsterhjelm

2018-10-04 10:56:05 UTC

Permalink

Post by John
[Note: I'm not on-list; please CC me on replies]
Somebody suggested a two-ballot set to me with three candidates, and
it turns out you can mutate this set and achieve the same pairwise
votes.

[...]

Post by John
I'm trying to identify a concise representation of a set of ranked
1. The pairwise race matrix (Win-Lose)
2. The matrix of appearances at each ordinal rank
1 2 3 4
A 0 1 1 0
B 1 0 0 1
C 0 1 1 0
D 1 0 0 1
It probably doesn't matter if you rank the truncated ballots by
counting all unranked as ties at that rank or by not counting them at
that rank.

If I understand correctly, you want to find a way of representing
ballots that takes less space than just listing the ballots directly.

Matrices like the positional matrix (how many ballots have A as first, B
as first, ..., D as fourth) and the pairwise matrix have the property
that the number of bits you need to store the data is a polynomial of
the number of candidates and the logarithm of the number of voters.

However, a pigeonhole argument shows that a data structure that
unambiguously defines a ballot set can't be polynomial in both the
number of candidates and log(number of voters). There must thus exist
two ballot sets that are different but look the same by pairwise and
positional matrices.

If we assume no truncation or equal-rank ballots and that there are c
candidates, then there is a mapping from an individual ranked ballot to
a number between 0 and c!, since there are c! possible ways to list c
candidates in order. (If we allow truncation and equal-rank, then there
will only be more ballots, so this is a conservative assumption.)

Recording a single ballot then requires log(c!) = O(c log c) bits. (All
logarithms from this point on are base-2 logarithms.) If the order of
the candidates and the voters matter, and there are v voters, then the
best you can do is to record each ballot, which will require v * c *
log(c) bits (up to some constant factor). This is exponential in log(c),
and thus neither positional nor pairwise matrices can accomplish this.

Another possibility is to have a c!-tuple, where each element counts how
many voters voted that way. Such a structure would require c! * log(v)
bits; so it's possible to get something that is polynomial in log(v),
but at the cost of no longer being polynomial in c. (Strictly speaking,
the log(v) term can be reduced by noting that not all the elements need
the full bit length because the sum over all of them is supposed to be v.)

If the order of the candidates don't matter, then that's equivalent to
assuming that the first ballot is A>B>C>D>... (you can always relabel
the candidates so that this is the case). That makes the first ballot
redundant and so the required space is (v-1) * c log c, but that's still
not polynomial in log(v).

If the order of the voters don't matter, then we have a stars-and-bars
problem. We have a c!-tuple (one nonnegative integer for each possible
ordering) whose elements sum up to v, so the total possible number of
ballot sets is v+c!-1 choose v. The number of bits required is the
base-2 logarithm of this. As far as I recall, the number of bits is
still exponential in log(v), although I don't have the exact
calculations available at the moment.

Here's a quick attempt to prove what I said. My logic or calculations
might be wrong; check them if you want to be sure:

First use Stirling's approximation: log(N choose K) ~= N log(N/K) +
(N-K) log(N/(N-K)), or N * (log2(N) - log2(K)) + (N-K) * (log2(N) -
log2(N-K)).

Inserting v+c!-1 for N and v for K, and letting x=c!-1 gives

f(x) = (v+x) * (log(v+x) - log(v)) + x * (log(v+x) - log(x))

Taking the limit as x goes to infinity of f(x)/(x * log(x)) gives a
result of 1. This means that f(x) is O(x log x). Thus there's a way of
storing the data that requires O(c! log c!) space, which is not very
useful since it's not polynomial in c.

Alternatively, v+c!-1 choose v = v+c!-1 choose c!-1. Inserting v+c!-1
for N and c!-1 for K, and letting x = c!-1 as before gives

g(v) = (v+x) * (log(v+x) - log(x)) + v * (log(v+x) - log(v))

Since this is the same as f(x) only with x and v flipped, g(v) is O(v
log v).

So we can either have a data structure that is exponential in c (being
O(c! log c!)) or one that's exponential in log(v) (being O(v log v)),
but we can't have one that's polynomial in both c and log(v).
----
Election-Methods mailing list - see http://electorama.com/em for list info

John

2018-10-04 14:15:36 UTC

Permalink

On Thu, Oct 4, 2018 at 6:56 AM Kristofer Munsterhjelm

Post by Kristofer Munsterhjelm

[...]

Post by Kristofer Munsterhjelm
Recording a single ballot then requires log(c!) = O(c log c) bits. (All
logarithms from this point on are base-2 logarithms.) If the order of
the candidates and the voters matter, and there are v voters, then the
best you can do is to record each ballot, which will require v * c *
log(c) bits (up to some constant factor). This is exponential in log(c),
and thus neither positional nor pairwise matrices can accomplish this.

Order of voters doesn't matter; only that the set of ballots is
mathematically-identical.

Post by Kristofer Munsterhjelm
Another possibility is to have a c!-tuple, where each element counts how
many voters voted that way. Such a structure would require c! * log(v)
bits; so it's possible to get something that is polynomial in log(v),
but at the cost of no longer being polynomial in c. (Strictly speaking,
the log(v) term can be reduced by noting that not all the elements need
the full bit length because the sum over all of them is supposed to be v.)

[...]

Post by Kristofer Munsterhjelm
So we can either have a data structure that is exponential in c (being
O(c! log c!)) or one that's exponential in log(v) (being O(v log v)),
but we can't have one that's polynomial in both c and log(v).

Damn, okay, the math folks were right: I can only get to NP-hard and
unlikely, not perfect and recoverable.

There's one other saving throw then.

We can generate an SHA-512 hash from a representation constructed by
strict, consistent rules. Some states of this representation would be
illegal, so many collisions are eliminated.

Take a simple list of positional ballots (c! variations). You might
indicate 23 voters A>B>C and 17 C>D>B. If your representation as such
includes more than 64 bits of information, then you have SHA512
collisions.

If all bit combinations are possible, then any collision is valid. If
vote counts are 48 bits, then there are obviously a lot of 0 bits
because there aren't 300 trillion voters; likewise, if candidates are
8 bits, there obviously aren't 255 different candidates. The
non-modifiable bits impact the hash, reducing the likelihood of a
state having a collision with a valid state.

You can further reduce collisions by specifying the data:

* Pairwise results (polynomial);
* Number of times each candidate appears in each position (polynomial);
* Number of times each candidate is truncated (linear);
* Total number of ballots (non-scaling);
* SHA512 hash (non-scaling)

That gives you 2c^2+c+1+1 values to store, although the values aren't
all the same size (the hash is obviously bigger than the total number
of ballots).

2c^2 + c. If you want, you can use the number of times each candidate
is in a position by truncation (A>B>C, candidates D and on are in
position 4 by truncation), which is polynomial. 3c^2.

The likelihood of having similar mathematical relationships rapidly diminishes.

Of course, you could also just provide SHA2, SHA256, SHA512, and MD5,
since at this point we're relying on simultaneous collision as a
defense.

Still, for 10 candidates, with 65,000 voters per polling place, in
binary mode, a 10x10 matrix is 200 bytes. You're looking at 306 bytes
(assuming you don't eliminate the 20 bytes for the X vs. X positions
in the matrix, always zero) to store all five of those things, or QR
18 error correction level H (30% loss recoverable), 16 Q (25%), 13 M
(15%). Dense enough to display MANY on a 55-inch 4K display for a few
minutes while people take cell phone pictures, even with the text
SHA512 hash below.

Thanks for clarifying. I'll have to make some concessions, but this is fine.
----
Election-Methods mailing list - see http://electorama.com/em for list info