
Should we add a warning about entropy? #40

Open
JensenPaul opened this issue Jan 24, 2020 · 1 comment

Comments

@JensenPaul

In many cases entropy is a useful way to measure how identifying a fingerprintable surface is, but it can be misleading when applied to certain distributions. For example, in a population of a billion people where half are in one bucket and half are in singleton buckets, half are uniquely identifiable even though the distribution has only about 15.4 bits of entropy (roughly 16), well short of the roughly 30 bits it would take to single out everyone in a population of that size.
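The arithmetic behind that example can be checked with a short sketch. The function name `entropy_bits` and the `(bucket_size, bucket_count)` representation are assumptions made here for illustration; they do not come from the document under discussion.

```python
import math

def entropy_bits(buckets, population):
    """Shannon entropy in bits.

    `buckets` is a list of (bucket_size, bucket_count) pairs, a hypothetical
    compact representation that avoids materializing 500 million singletons.
    """
    total = 0.0
    for size, count in buckets:
        p = size / population          # probability of landing in one such bucket
        total -= count * p * math.log2(p)
    return total

POPULATION = 1_000_000_000

# Half the population shares one bucket; the other half are alone in theirs.
distribution = [(500_000_000, 1), (1, 500_000_000)]

h = entropy_bits(distribution, POPULATION)

# Fraction of people who sit in a bucket of size 1, i.e. are uniquely identifiable.
unique_fraction = sum(s * c for s, c in distribution if s == 1) / POPULATION

# Bucket size a reader might infer if they assumed 2**h equally sized buckets.
naive_bucket_size = POPULATION / 2 ** h

print(f"entropy: {h:.2f} bits")
print(f"uniquely identifiable: {unique_fraction:.0%}")
print(f"bucket size implied by a uniform reading: {naive_bucket_size:,.0f}")
```

Read as if the buckets were uniform, about 15.4 bits suggests anonymity sets of tens of thousands of people, while in this distribution half the population is in an anonymity set of one.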

I'd recommend adding a warning about this and mentioning that, when using entropy, it's always good to also check the percentage of the population in buckets of size less than n. You could also mention that differential privacy can be used to offer stronger guarantees of anonymity.
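A minimal sketch of that check, assuming the observed fingerprint values are available as a plain list. The name `fraction_in_small_buckets` and the sample data are hypothetical, chosen only to show the shape of the calculation.

```python
from collections import Counter

def fraction_in_small_buckets(values, n):
    """Fraction of the population whose value is shared by fewer than n people."""
    if not values:
        return 0.0
    counts = Counter(values)           # bucket size for each distinct value
    small = sum(c for c in counts.values() if c < n)
    return small / len(values)

# Toy population of 10: one bucket of 6, one bucket of 2, two singletons.
observed = ["a"] * 6 + ["b"] * 2 + ["c", "d"]

unique_share = fraction_in_small_buckets(observed, 2)   # buckets of size 1
small_share = fraction_in_small_buckets(observed, 3)    # buckets of size 1 or 2

print(unique_share, small_share)
```

Reporting this fraction alongside the entropy figure makes skewed distributions visible, since a few very large buckets can no longer hide a long tail of small ones.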

@npdoty
Collaborator

npdoty commented Aug 5, 2020

Absolutely, we don't want a single "number of bits" to be the only consideration when assessing entropy, as that can be very misleading. Currently the document notes this about entropy:

Consider both the possible variations and the likely distribution of values.

Could you suggest something with more detail about how we should think about entropy?

2 participants