*Post by John*[Note: I'm not on-list; please CC me on replies]

Somebody suggested a two-ballot set to me with three candidates, and

it turns out you can mutate this set and achieve the same pairwise

votes.

[...]

*Post by John*I'm trying to identify a concise representation of a set of ranked

1. The pairwise race matrix (Win-Lose)

2. The matrix of appearances at each ordinal rank

1 2 3 4

A 0 1 1 0

B 1 0 0 1

C 0 1 1 0

D 1 0 0 1

It probably doesn't matter if you rank the truncated ballots by

counting all unranked as ties at that rank or by not counting them at

that rank.

If I understand correctly, you want to find a way of representing

ballots that takes less space than just listing the ballots directly.

Matrices like the positional matrix (how many ballots have A as first, B

as first, ..., D as fourth) and the pairwise matrix have the property

that the number of bits you need to store the data is a polynomial of

the number of candidates and the logarithm of the number of voters.

However, a pigeonhole argument shows that a data structure that

unambiguously defines a ballot set can't be polynomial in both the

number of candidates and log(number of voters). There must thus exist

two ballot sets that are different but look the same by pairwise and

positional matrices.

If we assume no truncation or equal-rank ballots and that there are c

candidates, then there is a mapping from an individual ranked ballot to

a number between 0 and c!, since there are c! possible ways to list c

candidates in order. (If we allow truncation and equal-rank, then there

will only be more ballots, so this is a conservative assumption.)

Recording a single ballot then requires log(c!) = O(c log c) bits. (All

logarithms from this point on are base-2 logarithms.) If the order of

the candidates and the voters matter, and there are v voters, then the

best you can do is to record each ballot, which will require v * c *

log(c) bits (up to some constant factor). This is exponential in log(c),

and thus neither positional nor pairwise matrices can accomplish this.

Another possibility is to have a c!-tuple, where each element counts how

many voters voted that way. Such a structure would require c! * log(v)

bits; so it's possible to get something that is polynomial in log(v),

but at the cost of no longer being polynomial in c. (Strictly speaking,

the log(v) term can be reduced by noting that not all the elements need

the full bit length because the sum over all of them is supposed to be v.)

If the order of the candidates don't matter, then that's equivalent to

assuming that the first ballot is A>B>C>D>... (you can always relabel

the candidates so that this is the case). That makes the first ballot

redundant and so the required space is (v-1) * c log c, but that's still

not polynomial in log(v).

If the order of the voters don't matter, then we have a stars-and-bars

problem. We have a c!-tuple (one nonnegative integer for each possible

ordering) whose elements sum up to v, so the total possible number of

ballot sets is v+c!-1 choose v. The number of bits required is the

base-2 logarithm of this. As far as I recall, the number of bits is

still exponential in log(v), although I don't have the exact

calculations available at the moment.

Here's a quick attempt to prove what I said. My logic or calculations

might be wrong; check them if you want to be sure:

First use Stirling's approximation: log(N choose K) ~= N log(N/K) +

(N-K) log(N/(N-K)), or N * (log2(N) - log2(K)) + (N-K) * (log2(N) -

log2(N-K)).

Inserting v+c!-1 for N and v for K, and letting x=c!-1 gives

f(x) = (v+x) * (log(v+x) - log(v)) + x * (log(v+x) - log(x))

Taking the limit as x goes to infinity of f(x)/(x * log(x)) gives a

result of 1. This means that f(x) is O(x log x). Thus there's a way of

storing the data that requires O(c! log c!) space, which is not very

useful since it's not polynomial in c.

Alternatively, v+c!-1 choose v = v+c!-1 choose c!-1. Inserting v+c!-1

for N and c!-1 for K, and letting x = c!-1 as before gives

g(v) = (v+x) * (log(v+x) - log(x)) + v * (log(v+x) - log(v))

Since this is the same as f(x) only with x and v flipped, g(v) is O(v

log v).

So we can either have a data structure that is exponential in c (being

O(c! log c!)) or one that's exponential in log(v) (being O(v log v)),

but we can't have one that's polynomial in both c and log(v).

----

Election-Methods mailing list - see http://electorama.com/em for list info