Cloud Computing Experts Detail Big Data Security and Privacy Risks
The information security practitioners at the Cloud Security Alliance know that big data and analytics systems are here to stay. They also agree on the big questions that come next: How can we make the systems that store and compute the data secure? And, how can we ensure private data stays private as it moves through different stages of analysis, input and output?
It’s the answers to those questions that prompted the group’s latest 39-page report detailing 10 major security and privacy challenges facing infrastructure providers and customers. By outlining the issues involved, along with analysis of internal and external threats and summaries of current approaches to mitigating those risks, the alliance’s members hope to prod technology vendors, academic researchers and practitioners to collaborate on computing techniques and business practices that reduce the risks associated with analyzing massive datasets using innovative data analytics.
“People are working on these challenges, the technical people and the academics. But they haven’t talked to each other” as much as they should, said Arnab Roy, one of the report’s contributors, who works as a research staff member at Fujitsu Laboratories of America in Sunnyvale, Calif. For example, Roy said, until recently, data encryption experts have not been communicating with experts at infrastructure companies. “People are realizing now that new solutions are needed, solutions that integrate the aspects of big data, to come up with comprehensive solutions,” he added.
“Comprehensive” is the operative word here. “As big data expands through streaming cloud technology, traditional security mechanisms tailored to securing small-scale, static data on firewalled and semi-isolated networks are inadequate,” the report states. Important changes in the computing environment include:
Multiple infrastructure tiers, both storage and computing, required to process big data.
New elements such as NoSQL databases that speed up performance but “have not been thoroughly vetted for security issues.”
Existing encryption technologies that don’t scale well to large datasets.
Real-time system monitoring techniques that work well on smaller volumes of data but not on very large datasets.
The growing number of devices, from smartphones to sensors, producing data for analysis.
General confusion “surrounding the diverse legal and policy restrictions that lead to ad hoc approaches for ensuring security and privacy.”
The report calls for securing big data systems at every level: the infrastructure where computing and data storage occur, the data itself, and the applications that access large, distributed datasets, which must maintain proper access controls and preserve the privacy of the data. It also calls for ongoing system monitoring. (See chart at the top of this article for more. See “Spelling Out Privacy Risks in Data Analytics,” at the end of this article, for an excerpt from the report.)
Wilco van Ginkel, a senior strategist at Verizon based in Amherst, Nova Scotia, is a co-chairman of the Cloud Security Alliance Big Data Working Group. He said the report released June 17 builds on work done last year to identify the top 10 concerns and is designed to spur action.
“What we hope for is that the vendors out there will step up to the plate,” he said. “We see encryption is difficult on a large scale. How can we change that for big data?”
There is a tension for practitioners, he added. Before the big data movement, most of the datasets companies used were siloed. The business owner of each dataset was compliant with data management and regulatory policies. The fact that new big data systems open up those silos and create connections among different datasets creates a new dynamic.
“The combination of all the data puts it in a different perspective. The fact that you have access to all that data does not mean you are entitled to or must use all the data” for sentiment analysis or another use case, he said. “The way we can access the data, and correlate the data. That is really the ticking time bomb.”