Big data has become an increasingly scary phrase for all stakeholders in data protection. For privacy advocates, it often means loss of control, asymmetrical power and hidden discrimination. For regulators, it often means regulatory round pegs in operational holes of different sizes, in constantly moving locations, with mismatches that begin with vocabulary. For companies, it often means uncertainties in every dimension, with regulatory uncertainty just one of the most visible.
Over the past two weeks, I had the honour of participating in a roundtable discussion on big data and enforcement co-hosted by the European Data Protection Supervisor (EDPS) and the Bureau Européen des Unions de Consommateurs (BEUC), the European-wide consumer organisation, and in the Privacy Commissioner of Canada’s first stakeholder meeting on consent and its alternatives. In both cases, the room was full of very smart people, trying to figure out how to enforce consumer and human rights in an environment of more touch points, data, math and applications. After spending nine years trying to figure out how to legitimately utilise the knowledge and productivity that is associated with big data, while still maintaining a space where individuals are not completely defined by their digital tracks, I am increasingly frustrated by the confusion of data with processing. Clarity is necessary since big data processes are needed to foster the full range of fundamental rights as well as economic growth. Big data done badly raises issues for all parties. Progress requires breaking the issues apart so they may be fully understood. Thinking about big data as an ecosystem has been helpful to me.
First, big data and humongous amounts of data are not the same thing. Nor is big data the same as faster, more detailed and more diverse data. Rather, big data is the process by which humongous amounts of data, coming from various, often unstructured sources, are brought together to generate insights and increase the efficiency of processes. When I say efficiency, I am not making a value judgement. The efficiency may improve healthcare or facilitate unfair practices. Either way, efficiency constitutes the goal.
Second, big data is advanced math conducted on very powerful systems. Some of that math and processing is world class while some processing is mediocre at best. The quality of the knowledge created is directly proportional to the skills of the data scientists and programmers.
Lastly, knowledge may be applied in a manner that is legal, fair and just, or it may not be applied in such a manner.
I suggest that for all parties to have a sense of how to approach control of a big data ecosystem, big data might be looked at in the following manner:
- A Means to Observe: Big data is fed by observational data. Observational data is data that comes from all circumstances where behaviour may be observed. The speed has picked up with smart phones and will continue to accelerate with the Internet of Things, smart cars and even smart medicine. The observation itself is becoming increasingly necessary for things and systems to work. For example, mobile phones cannot operate unless their locations are constantly observed, and smart brakes are not effective if their use is not monitored. Unlike traditional collection concepts, observation requires a governance layer that goes beyond individual control.
- A Means to Gain Knowledge: Data must be brought together, be prepared for processing and then run against advanced analytic processes in order to generate knowledge. Often this means interim steps where questions are refined by the correlations from the last step. Data must be matched again and again. Questions of obscurity to protect individuals come up against accuracy to, in the end, protect individuals.
- A Means to Apply Knowledge: Once one has knowledge, one needs to decide whether the knowledge makes sense, is applicable, and is legal, fair and just. None of these decisions should be made in a vacuum. They are based on the law, commitments made, and societal standards. What makes sense in one context may not make sense in another context.
Too often, when discussing big data, participants jump from one part of the ecosystem to the next and the next without considering the analysis which is necessary for each part. The IAF has separated “thinking with data” – the means to gain knowledge – and “acting with data” – the means to apply knowledge in developing assessment processes. (For more information, see overview of IAF Big Data Initiative.) Also, the IAF made the case that policy governance needs to be tied to data’s origin in “The Origins of Personal Data and its Implications for Governance” written for an OECD 2014 discussion on data taxonomy. I believe the data taxonomy work, particularly as it relates to the IoT, needs to be revisited and better integrated with ongoing work on big data governance.
In late September, I blogged about the re-establishment of the accountability dialog. (Link to the blog post.) I hope that dialog will encourage discussion on how the three-means view of a big data ecosystem might be taken forward.