How to make money without putting data at risk
Vast stores of personal data (records of Internet browsing, credit card purchases, or information shared through social networks) are becoming an increasingly valuable asset for companies. These data can be analyzed to identify trends that guide business strategies, or sold to other companies for a profit. But as personal data are analyzed and exchanged more widely, the risk grows that they could reveal who we are and lead to an unwanted invasion of privacy.
A new mathematical technique developed at Cornell University (USA) could allow large sets of personal data to be shared and analyzed while ensuring that no individual's privacy is compromised.
“We want to enable Facebook or the U.S. Census Bureau to analyze data without leaking sensitive information about people,” says Michael Hay, an assistant professor at Colgate University (USA), who developed the technique while a researcher at Cornell, together with his colleagues John Gehrke, Edward Lui and Rafael Pass. “We also aim to help: we want the analyst to learn something.”
Companies often try to mitigate the risk that the personal data they hold could be used to identify individuals, but these measures are not always effective. Both Netflix and AOL learned this firsthand when they published supposedly “anonymized” data sets for anyone to analyze. Researchers showed that both data sets could be de-anonymized by cross-referencing them with other publicly available data.
“The techniques being used in practice are fairly ad hoc” when it comes to protecting the privacy of users included in these data sets, says Hay. They include removing names, social security numbers, and other identifying fields. “People want to provide real protection,” says Hay, adding that data custodians at some government agencies fear being sued for failing to protect private information. “From talking with people at statistical agencies, I know there is demand that goes unmet for fear of privacy violations.”
In recent years, several researchers have worked on developing mathematical guarantees of privacy. However, the most promising approach, known as differential privacy, has proved difficult to put into practice and usually requires adding noise to a data set, which makes the data less useful.
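To give a sense of how differential privacy trades accuracy for protection, here is a minimal illustrative sketch (not the researchers' code) of the standard Laplace mechanism applied to a simple count query. The epsilon value and the toy data are assumptions chosen purely for illustration:

```python
import numpy as np

def dp_count(data, predicate, epsilon=0.1, rng=None):
    """Differentially private count: the true count plus Laplace noise.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so noise drawn from
    Laplace(1/epsilon) yields epsilon-differential privacy.
    Smaller epsilon means more privacy, more noise, and a less
    useful answer.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for record in data if predicate(record))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy example: how many users in the data set are over 40?
ages = [23, 45, 31, 52, 38, 61, 29, 47]
print(dp_count(ages, lambda age: age > 40, epsilon=0.1))
```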
The Cornell group proposes an alternative approach called crowd-blending privacy. The idea is to restrict how a data set can be analyzed so that every individual record is guaranteed to be indistinguishable from a crowd of other records, and to drop records from the analysis when that guarantee cannot be met.
This makes it unnecessary to add noise to the data set, and the group showed that, as long as the data set is large enough, crowd-blending approaches the statistical power of differential privacy. “Since crowd-blending privacy is a less stringent standard, we expect it to be possible to write algorithms that satisfy it,” says Hay. “This could open up new uses for the data,” he adds.
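As a rough illustration of the blending requirement (a sketch of the general idea, not the Cornell algorithm; the quasi-identifier fields and the threshold k are assumptions made for this example), an analysis might suppress any record whose visible attributes fail to blend with at least k others before releasing results:

```python
from collections import Counter

def blend_or_suppress(records, quasi_id, k=10):
    """Keep only records that blend with at least k-1 others.

    quasi_id maps a record to the attributes an analyst could see
    (e.g., a coarsened ZIP code and an age bracket). Records whose
    attribute combination appears fewer than k times are dropped,
    so every released record is indistinguishable from a crowd of
    at least k records with identical visible attributes.
    """
    counts = Counter(quasi_id(r) for r in records)
    return [r for r in records if counts[quasi_id(r)] >= k]

# Hypothetical records: (ZIP prefix, age bracket, purchase amount)
records = [
    ("148**", "30-39", 12.50),
    ("148**", "30-39", 80.00),
    ("148**", "30-39", 5.25),
    ("102**", "70-79", 310.00),  # rare combination: suppressed
]
released = blend_or_suppress(records, quasi_id=lambda r: r[:2], k=3)
print(released)  # only the three blending records survive
```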
The new technique “provides an alternative definition of privacy that is interesting and potentially very useful,” says Elaine Shi, an assistant professor at the University of Maryland, College Park (USA), who is also investigating ways to protect privacy in data sets. “Compared with differential privacy, crowd-blending privacy can sometimes extract a lot more utility from the data by introducing little or no noise,” she explains.
Shi adds that, in the future, privacy research should take the responsibility for protecting user data out of the hands of software developers and their managers. “The underlying system architecture should itself protect privacy, even when the code supplied by application developers cannot be trusted,” she says. Shi's research group is working on a cloud computing system built on that principle: it houses sensitive personal data and grants access to it, but carefully monitors the software that makes use of it.
Benjamin Fung, an associate professor at Concordia University (Canada), says that crowd-blending is a useful idea, but believes that differential privacy can still be viable. His group has worked with a Montreal transportation company to implement a version of differential privacy on a data set of geolocation traces. Fung suggests that research in this area needs to move into an implementation phase, so that approaches like crowd-blending and others can be compared directly and finally put into practice.
Hay agrees that it is time to take action, but also notes that privacy protections will not prevent other practices that some might find objectionable. “You can meet these constraints and still get predictive correlations,” he says. That could mean, for example, car insurance premiums being set based on information about a person with no apparent relationship to their driving. “As the techniques to ensure privacy are adopted, other concerns could emerge,” says Hay.