How do we define anonymity?

For statistics, anonymity means an observer can't pinpoint who the individuals are. External data can compromise supposedly anonymous (statistical) databases. How do you formalize external data?

How do we prove anything is actually semantically secure? If an observer can't tell the difference between a real protocol run and garbage message passing (except within a small ε), then the protocol is semantically secure. Can we do something like that for anonymity?

k-anonymity: We know "Andy" is in a set of k records but can't figure out which of the k records is Andy's. If the anonymized data is not diverse, an attacker can use external knowledge to figure out more about the data set. For example, if all k records share the same diagnosis, the attacker learns Andy's diagnosis without knowing which record is his.

l-diversity: Data can only be published if more than l records have a given term. E.g., given healthy, healthy, healthy, flu, flu, AIDS and l = 2, we could publish the healthy records but not the flu or AIDS records. (As usually formalized, l-diversity requires every group of indistinguishable records to contain at least l distinct sensitive values.)

m-invariance: Add counterfeit information, i.e. dummy records that confuse the attacker. This allows you to publish more.

Newest approach: e.g., Molly is 2 inches shorter than the average height of the ____ population. Whether or not she participates, this knowledge does not compromise her anonymity. If the distribution does not change before and after a subject's participation, then her participation does not endanger her anonymity. This is the idea behind differential privacy.

Whenever two datasets are statistically correlated, combine them to provide better anonymity.
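
The k-anonymity and l-diversity rules in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout, and the field names ("zip", "age", "diagnosis") are hypothetical, and publishable_values implements the rule exactly as stated in the notes (more than l records per term), not the formal l-diversity definition.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records (hypothetical helper)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


def publishable_values(sensitive_values, l):
    """Rule from the notes: a term may be published only if
    more than l records carry it (hypothetical helper)."""
    counts = Counter(sensitive_values)
    return {value for value, count in counts.items() if count > l}


# Made-up records; "zip" and "age" play the role of external knowledge
# an attacker might already have about "Andy".
records = [
    {"zip": "021**", "age": "20-29", "diagnosis": "healthy"},
    {"zip": "021**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "healthy"},
    {"zip": "021**", "age": "30-39", "diagnosis": "AIDS"},
]

# The example from the notes.
diagnoses = ["healthy", "healthy", "healthy", "flu", "flu", "AIDS"]

print(is_k_anonymous(records, ["zip", "age"], k=2))  # each group has 2 records
print(is_k_anonymous(records, ["zip", "age"], k=3))  # no group has 3 records
print(publishable_values(diagnoses, l=2))            # only "healthy" occurs more than twice
```

With l = 2, only "healthy" (3 records) passes the count test, matching the example in the notes; "flu" (2 records) and "AIDS" (1 record) are withheld.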