This text is being provided in a rough draft format. Communication Access Realtime Translation (CART) is provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings.
>> Good morning everyone.
Thank you for coming.
Apologize for the slight delay.
We're having a little difficulty getting the remote participants hooked into the actual remote system, though I can see them here on my computer.
So my name is not Carolyn Nguyen. Unfortunately she was unable to be here.
My name is Paul Mitchell. I work at Microsoft and run the policy part of the technology policy group.
We have had some other travel delays and challenges as well. So our panel is a little smaller than was originally identified.
But I'm delighted to have some real experts here on a very exciting topic.
To my left we have Amparo Ballivian, and Linnet Taylor, and finally Alan Markus.
I'll have each of them give their own biographical introduction.
We have two microphones and we'll do some microphone dances.
Also on the remote participation, when we get it connected, we have William Hoffman, who I think at the moment is limited to typing to me.
We'll see what happens.
So, Amparo, go ahead.
>> Just my bio or get into the -- okay, I work at the statistics department of the World Bank.
My job is basically about helping developing countries with open data initiatives.
Because open data is a multidimensional subject, we have a working group that includes several units within the bank, including technology, training, et cetera.
Prior to that, I am an economist by training, so I worked in several regions of the bank, especially Latin America and Africa.
But I have also worked outside the bank as a government official, in the private sector, in academia and in diplomacy.
That is my background.
>> My name is Linnet Taylor.
I work at the Oxford Internet Institute, where we have a project funded by the Sloan Foundation to look at the uses of big data in the social sciences, basically how it is being used to study human behavior.
We have a year left to go, having been working for a year.
It's basically an ethnography of big data. We have conducted about 120 interviews worldwide on the way people see big data and how it's being used.
Within that I have a stream on developing countries and big data in development. Within that we're doing a project to bring together people at the Bellagio Center next year around big data and social change in the developing world.
If people have thoughts on that, please come talk to me.
>> Hi, Alan Markus. I work at the World Economic Forum, responsible for information and communication technologies, kind of through two lenses, I suppose.
One looking at the largest information communication technology companies and governments that are focused on these challenges, and getting them engaged in global dialogue.
On the other side, looking at the key issues represented by those stakeholders and how they affect kind of bigger global concerns.
At the moment we're particularly working on what we call hyperconnectivity. We call it that because this is essentially moving away from just the Internet as we understand it today, toward looking at everything from machine to machine, to new kinds of sensor networks, to looking at, I guess, the difference between online and offline and how essentially the world becomes digitized.
We see a number of particular areas of concern, one clearly being data, which is what this topic is about.
So really looking at how data will be used in the new digital future and the global governance gaps and policy issues that go around that.
>> PAUL MITCHELL: Thanks. Just how we are going to work the session today.
I'll give just a little scene setting in a moment. I hope for this to be a conversation, which means I really welcome your input and involvement and your questions to any of the panelists.
Just to set the stage, we're talking about data. The world is awash in it, and increasingly so.
There's a digital deluge estimated to grow at about 50 percent a year conservatively. The availability of this data holds extraordinary potential for societal benefits and economic growth.
But at the same time it creates growing concerns about individuals' loss of control and privacy, potentially impacting their human rights.
Balancing these needs will be essential and require thoughtful policy processes that can approach these issues holistically.
So in our panel today we have a variety of experts from different perspectives in the discussion.
And interestingly, data is one of those things that, as I just mentioned, can be tremendously beneficial, and it can create the challenges.
It's recognized, though, as one of the fastest growing economic drivers in the world.
Especially in developing countries.
And similar to the democratization effect of the Internet, data has the ability to unleash lots of innovation.
I think we can identify many examples where access to data has spurred development, spurred solutions for social challenges or for health challenges or for economic challenges.
Data analytics is being integrated into governments, global agencies and other development organizations around the world, as tools to enable and improve evidence-based policy.
City planning, academic tracking, disaster preparedness, economic forecasting, for example.
But these areas can introduce new risks for individuals, as their data flows across global networks, people are increasingly concerned about a loss of control.
There is a growing reliance on technology that impacts their lives in ways they don't understand.
It's a small wonder that regulators are concerned about an imbalance between industry and individuals, and they are moving to protect citizens from risks posed by a data driven economy.
Just yesterday the European parliament approved amendments to their data protection directive, which I'm sure some of you will have some thoughtful comments on.
These discussions about data use are frequently carried out in parallel and separate forums, and this is a problem with the policy makers not talking to the technologists and often vice versa.
So today we wanted to tackle the challenge of big data and a user centric data ecosystem.
How can we enable the economic value from the data while protecting the users?
So I'm going to open this up with a few questions, key questions, just to kick it off.
We'll go just straight down the panel.
And the first question I'm going to ask the panel to address is for some specific examples of how big data and open data deliver social benefits and economic growth.
And as part of answering this question, I'd actually like each of them to define both big data and open data from their perspective.
So I'll start with Amparo.
>> Thank you very much, Paul.
I'm going to talk mostly about open data because in the bank we have some experience with it, and just a little bit about big data at the end.
I think that you wanted me to talk for about three minutes, so I'll try to stick to that.
The bank itself was the first multilateral institution to have an open data portal and open data policy.
So as a result of the success of that open data initiative, now some of the developing countries that are clients of the World Bank have requested our technical assistance because they want to have their own open data initiatives.
In the developing world, this is something that is just starting to emerge. I wouldn't even call it a baby. It's almost like a fetus still.
There are very very few countries in the developing world that are tackling open data seriously.
But I have to say that we're very proud that we have supported most of them. Not all of them. There are some doing it on their own.
We have a series of technical assistance and training tools to help them.
I brought the brochure that is a very quick summary of what we do, and so I ask you to take one at the end of the presentation.
One of the ways that we try to sell open data to our client countries is precisely by showing them examples of how they can benefit.
I personally think that we are in the wrong business when we try to advocate for open data.
I think that rather than advocating for open data, it's better to advocate for the applications that open data allows.
So if you show a minister of health the apps being used for health in other, more developed countries where open data is more mature, the minister of health of any country would be interested in that, and then the open data is the consequence of that.
From that point of view, what we're trying to do now is to identify a series of applications that can be transferred across borders and used in different places.
That will also improve the efficiency of the efforts to create applications. Instead of devoting resources to recreating them in different places, we try to find a very good application, with all its technical specifications and data needs, so that it can be reused in other countries.
So we're trying to move from a data reuse model to an app reuse model.
About big data, we don't have a big data program yet in the bank.
I think that the territory about using big data for development is still very very new.
Linnet here said she was going to tell me what is going on in other places, and I'm eager to hear her.
We just dipped our toe in the pool a few months ago by doing a big data event at the bank. But it's very experimental still.
The other thing that I wanted to mention, because it could be considered big data, is that we have also, on an experimental basis, engaged in a program to collect price data worldwide using mobile phones.
I can tell you more about that. It's I think the only thing the bank is doing really in putting its resources into the big data scene.
>> Thanks, okay, definition of big data.
The one that we're using for our project is that it's data which is of unprecedented size and proportions in relation to a given phenomenon. It depends on who you are and your standpoint.
And open data, I guess it's data where metadata is freely available and you can know as much about the data as possible.
I would add that to Amparo's definition rather than replacing it.
In terms of potential benefits, I have mainly been looking at benefits in and about developing countries.
I'm talking about low and middle income countries, but mainly low income countries, at the bottom end of the income scale and educational scale: what big data is being produced, what is being used, and by whom.
I see a way to divide it: data about developing countries, and data for and within countries.
About countries, there's a huge amount of policy research that can be done using mobile data in particular. Traces, activity, and financial activity conducted through mobile phones, like the program in Kenya, can tell us a lot about economic dynamics in developing countries, about population growth, population dynamics, mobility and migration, which is very important.
All of these are areas where we don't currently have good statistics on the poorest countries. Nobody currently does, not just us, whoever we are.
So it's important in developing a useful statistical perspective on issues like urbanization in low and middle income countries in particular, the need for services, and where to place different infrastructure.
So it's about developing statistical capacity, and it's about understanding scenarios of development in new ways.
There's also the emergency and humanitarian category, again with mobile data and social media data. I would give the example of the Haiti earthquake, where it was possible to do epidemiological work using mobile traces, and to work out what was needed and where based on SMS communications. Both of those were, in the context of Haiti, big data.
Third, I would say there's a very important dimension, currently very much underresearched, around the potential of big data to promote rights, voice, participation and citizenship in developing countries.
Again, like producing good statistics on places where few are available, producing information on who is doing what, where and why can be very important for rights and participation.
I would give the examples of, for instance, the election violence apps being used in Kenya and Uganda, and also electoral registration apps used recently around elections.
Global Pulse is going to be using social media to help define resilience in places, actually in Indonesia among others.
I would say there's huge potential there if local government can be involved. If national government can be involved, and the flow doesn't go simply from the technology user to the international community, there's potential to inform rights and participation on a local level.
I will pass it on at that point.
>> Keep going, that's okay.
>> Interesting this definition, there's so many variations, I'm not going to add to the confusion, and just call it all data.
A couple of observations. We certainly heard from the panelists the notion of mobile data.
That is how we got into this conversation at the economic forum. It started with talking to telecom mobile operators around the world and recognizing the phone, though it's funny because more and more people don't actually make calls with it, but this device has changed the entire world and gave us access to individuals in ways that no one ever dreamt possible.
Now that there's more phones than human beings on the planet, more sensor data is being collected in a movement of activities, more mobile kinds of activities, and this is changing the way we can look at how the world works.
So if we want to call that big data, that is fine.
Certainly the notion of publishing data, I think that is an important point. So often it is businesses that collect data, maybe much more so than even the government. And just recently the U.S. government shut down.
During that time no statistics were being collected; there was no one to do it. But there are very large organizations, one I know because I use it being Concur for expenses, that tracked exactly how the expense rates changed during the shutdown.
This is data that takes the government a year to publish, but they already have it. Do they have the right to hold on to it? Should they be publishing it? I think these are excellent questions, and I'm not sure we actually have a collective answer.
I think these are really important kind of dynamics on the notion of publishing data and looking from a social good.
The other observation I want to put out there is the notion of how data used to be collected before the advent of machines and memory and all the highly powerful stuff, or the orders of magnitude difference in size that lead us to call it big data.
I think this is an important point. The scientific method always talked about statistical sampling: you had a theory, collected enough data, and either proved or disproved your theory, and I think that was an important part of history.
Today sampling is not something we need to do. Complete data sets exist in ways no one ever thought about. They are proving in fact that many of our theories no longer hold. That is a good thing. But it also means complete data sets matter. And complete data sets unfortunately create the challenge of, can I opt out?
This notion of what is good for the many versus the good of the one, and do complete data sets matter.
So there's loads of great examples out there.
I like what is going on now in the United Arab Emirates. Diabetes has run rampant there, with the fastest growth in terms of diabetic people, which is obviously not good.
Through the notion of big data, open data, recognizing trends and patterns, they have been able to absolutely increase the effectiveness of treatment and how people are using it. Unfortunately it wasn't an opt in or out thing but definitely gave them a data set that allowed individuals to have a much more customized treatment and opportunities for themselves and it's improved dramatically in terms of diabetic control in large populations like that.
>> PAUL MITCHELL: All three of you have proposed basically very positive benefits from this explosion of data.
I wonder if you could talk a little bit about the flip side. Maybe we'll start with Linnet in the middle. What is the flip side? What are some of the risks and blind spots that industry and society may be running into?
>> Well, I think about this mainly from the perspective of the poorest and most marginalized people.
I guess function creep is a very big deal, as it is with all sources, all very powerful sources of information about human activities and behavior.
There's sort of a continuum going on with the use of big data, from emergency, crisis and humanitarian uses through to surveillance.
And at some point along that continuum, some sort of auditing, permission or consent has to kick in.
At the moment we don't have a good ethical or practical framework for where that kicks in, how it kicks in, and who manages it.
I find that very problematic, but I think that we're edging our way towards thinking more clearly about that.
I don't think that anybody is trying to prevent it kicking in necessarily.
I think we just don't have the structures in place in terms of governance yet to do that.
The example is what I mentioned before, the Haiti earthquake.
At one end of the spectrum you have Flowminder collecting information on people's mobile traces to figure out where people are and how to prevent them catching cholera.
This for me is not problematic. By all means use my personal information, no need for consent.
Suppose I move into a refugee camp and you're still getting a lot of my mobile data, my whereabouts and calling data.
You can tell my social networks, who I'm talking to, who those people are talking to, the patterns in my movement and communication and possibly economic activity within the camp.
Suddenly it turns into a state where if my government had that information and saw me as a political adversary or a dissident or somebody causing trouble in other ways, then that could be highly problematic.
Alternatively if somebody saw me as somebody who needed to be quarantined and came and took me away, that would be very problematic too.
Very quickly you go from a situation where data should be free and openly available, and there's no threat to the individual, to a situation where the data is out there, the horse has left the stable, there's no way to get it back, and you don't know who has access, or whether a potentially untrustworthy actor has access.
A lot of the data I'm interested in is in fragile and state-building developmental contexts, where not all actors or uses are trustworthy, and yet it's important to have the data be available under certain circumstances.
It's a real tension. It's not a drawback, it's a profound tension, I think.
>> I will answer your question, but then I also want to refer to two points that were made by Alan a moment ago.
So the drawbacks.
I'm not sure if these are drawbacks, but I can tell you what people in developing country governments tell us about why they don't want to do open data.
And some of the usual concerns are the ones that you mentioned about privacy, and also some mentioned national security, many of them mentioned the quality of the data.
Quality is bad and we don't want to lose face, therefore we don't open up.
Some of them mentioned loss of project revenue from sale of data.
So we have a standard list of excuses and a list of how we reply to the excuses.
And privacy is a theme that permeates through this conference and in the discussions about open data and big data.
What is surprising to me is this: I work in the statistics department, and though I'm not a statistician myself, I'm surrounded by them. They have known about anonymization techniques for decades, and they use them.
But the open data movement doesn't include many statisticians.
I think it would be good to rely more on those techniques. They are not perfect, because there are ways of de-anonymizing data.
But it's like hackers, right? You go through more security in a software and then somebody cracks it and you go to the next iteration and so forth.
I want to take issue with Alan's point that statistical sampling is no longer needed
I beg to differ.
Let me give you a concrete example.
How many of you know how the world calculates poverty rates in countries?
I'll summarize it for you.
The statistical agency in a country goes out every X number of years and does a household survey of consumption or of income.
These are surveys done by enumerators going physically to the house with a questionnaire.
In many countries the questionnaire is on paper, some are moving into PDAs.
They collect the data. It's a long questionnaire, sometimes a repeated questionnaire, depending on how sophisticated you want to get.
They bring back the data and compile it, and we all use it, including the bank, to calculate poverty rates.
The problem is this survey is very expensive and therefore not done very often.
Why do you need to do sampling for this survey?
Because you need to make statistical inferences about the population. And without a sample that is representative of the entire population, you cannot do it.
Now, in developed countries you really have a permeation of sources. But it's erroneous to say the same of cell phone ownership in developing countries, even though in some countries there are more cell phone subscriptions than population.
Yes, but that does not mean everybody has a cell phone.
I mean there are many people that have two or three.
And data about coverage of cell phone signals is very difficult to use.
That is one issue.
The other issue is that most of this data we're talking about is data that is collected through smart phones.
If you talk about the distribution of smart phones in developing countries, it's a very tiny percentage of the population because Internet signals are not there yet.
So for all of these reasons, I think that we still have several years to go with statistical sampling, and it's still going to be needed.
And the third very brief point is about using other sources of data.
That I concur with Alan.
We have to explore using other sources of data than the usual methods. That is part of the difficulty in this cross dialogue with the statistical community.
I think these are examples that we are just starting to see, and I'm sure that we will hear more about them. What has happened with statistics during the U.S. shutdown is very interesting and perhaps a blessing in disguise, because people are looking at sources of data other than the government, and that is going to lead to a very interesting discussion.
>> I concede my point on statistics.
She is from the department of statistics.
I should have known.
To the point, obviously fair point.
As a technology focused person, I'm always thinking of where we should be, and you are quite right on where we actually are. That gap needs to be closed.
But you did point out something that I think is important.
Aspirationally, survey data is complex, it's expensive, and certainly through technology, particularly mobile technology, those prices, those costs can come down and we can actually collect and understand things in ways that might become more important in the future.
But indeed, that could be way in the future.
I certainly don't want to repeat what was said because that is silly, but I do want to pick up on one point, and that is this notion of trust.
That is something we talk a lot about with our communities.
Trust. It's a very strange word.
Just a personal anecdote, because I think it kind of makes the point about trust.
I was tickling my five‑year‑old son, and he loves it. He thinks it's really funny.
And every time I go up to him, any time I go up to him, he starts to immediately become protective.
And I said, what are you doing?
He said well, I'm afraid.
What are you afraid of?
I'm afraid you're going to tickle me.
I said, I'm not going to tickle you.
But I don't trust you.
I said, what does that mean?
He says, you say you're not going to, but then you do.
I think that is really important. Because trust is paramount to the use of data.
We trust people with our data to do what they say they are going to do.
The problem is they don't.
They do something different.
This latest PRISM thing, we might as well bring it up. Edward Snowden clearly showed what is happening.
Not so much they are surveilling, and maybe they shouldn't, but they said they weren't and they were. That is a trust problem.
How do we trust institutions and hold people accountable?
One of the real challenges I see is even where really good law is put into place, good policy where accountability is well defined, enforcement is not there. In fact enforcement becomes very difficult.
We certainly see this in criminal issues around data theft, around use of data for criminal activity.
Who enforces it? Who is responsible?
Even though there's accountability, in that it is against the law in many places, there's no enforcement.
So we don't have trust in the system.
Which means a lot of bad things could happen. Not just a few, but a lot of bad things.
And we're seeing that, I think, every day.
Certainly surveillance, certainly the notion of developing countries wanting to use the data in unproductive ways, I think, are great examples.
But we see it even in developed countries, where insurance companies get access to data that maybe they shouldn't, or marketing companies do. We can call those minor harms perhaps, but nonetheless when people start to knock on your door to sell you something, you get a bit annoyed.
A lot of those types of harms. I think it's the ability to enforce accountability in a systemic way, and the recognition that trust is paramount to the system, and we need better ways to develop a trust framework.
>> PAUL MITCHELL: Great point. Let's pivot in that direction.
You mentioned trust and accountability.
And to date our means of managing accountability has basically been legal frameworks, with regulators trying to apply the legal frameworks pretty much as fast as they can in a world in which the data is coming far more quickly than they can adapt, and in which the new uses that are imagined and then suddenly put into place happen faster than they can react and faster than there can be a public hearing on them.
So can we use what we know now to both inform our policy making and, more importantly, to try to create systems and put systems in place that actually enhance the accountability and the trust?
Anyone who wants to, go first.
>> All right, it's only fair.
Fairness, I think, is another important principle.
I think for me the challenge starts with the way I view regulatory regimes policy and legal frameworks, they tend to be very black and white.
And that is probably a good thing.
I'm not a legal expert, so I'll start with that.
That is probably a good thing.
But in the area of data usage, I think there's a lot of gray.
And gray means you need more adaptability.
Making it just sort of black and white, I think, becomes problematic.
If we say, you know, as a very primitive example, that data inferred about you can never be used for anything other than what you stipulate up front, we might miss loads of great opportunities that open data and big data have been shown to create.
And so how do we become kind of adaptive? How do we look at this not so much as a hard rule, black and white issue, but think more in terms of adaptation?
Now, one of the things we have noticed is in the financial sort of environment and financial networks.
This notion of kind of adapting contract law.
So although there are some clear black and white regulatory regimes under which you need to operate on a global basis in the movement of financial assets, you can also stipulate between parties, including multi-party stipulations, on certain aspects of how that would get used under a jurisdiction that exists someplace in the world.
So you leverage the notion of contract law. That gives you a little bit more flexibility on kind of how to use this, and a system or a scheme on how to enforce that. Because you stipulate exactly what enforcement starts to look like.
I'm not suggesting this is a scalable solution on a global basis, but again, maybe it is. Not being a legal scholar, I couldn't say.
I think the notion of adaptation is really important as we think through this.
>> I guess I have been thinking a lot about country participation, and where it's important and where it's not.
This goes back to what I said earlier about how data can sometimes flow from the user directly to the international level without passing through any authorities on the way.
I think we need participatory structures for consent and for data sharing that work across national contexts.
I think that is incredibly hard to figure out.
I think the country level governments have to have a say in the enforcement of data protection, which is not always happening right now.
They are often not even aware that their citizens' data is being used in ways which may really impact the national well-being.
I think that we also need an independent authority to govern data sharing.
I would offer the example of a search warrant.
As I was saying to Alan earlier, if someone wants to search my house, they have to go to a judge and say, I have reason to believe Linnet has done something wrong.
Then they get the right to come in and have a look around.
This is not true of my personal data. I would appreciate it if there were some independent international authority which could look at large data sharing exercises like the ones starting to go on around developing countries. My particular example right now is the Data for Development initiative which happened this year.
There, data about Côte d'Ivoire was released to many teams of international researchers without the consent of anyone in the country at all. It was arguably for very good purposes, but you get problems at the country level because the country was never involved in the first place in the use or release of the data.
If you have an independent authority, it can connect countries, even at the local level, to international data users and sharers.
I think that is an important part of governance that is currently completely missing.
>> This is very difficult.
It's very difficult because, partly because we are seeing the tip of the iceberg.
Institutions collect data even in spite of themselves.
And to try to foresee all the possible uses of that data by the collecting institution or other institutions, I think it's impossible because it's such a large spectrum.
I do think one of the guiding principles should be where does the burden of the proof lie.
Do I have to prove that I'm not invading your privacy? Or do you have to prove that I am?
I think that is a way to go.
And we, and by we I mean societies, are going to be learning about things that can be legislated because they can be identified.
But that is going to change as we move because it's impossible to identify all the ways that usage and collection of data could be deemed to invade privacy.
So I think that that is part of the answer.
And the other part of the answer, and I'm sorry because this is an economist's bias, I think markets work. And the markets are also going to play a discriminatory role. Let me give an example.
I get lots of mail offering insurance and offering me credit cards and things. I live in the United States.
And I'm probably being targeted because the credit card company thinks I have good credit record or whatever.
But I simply throw the envelopes in the garbage. I don't even open them. I just tear them and throw them in the garbage.
At some point they are going to get tired of sending me these things. If they don't, and they want to spend some of their money on a wasted potential client, fine. But there is going to be a learning process and markets are going to have to play a role.
>> Thank you. I'm going to ask one more question, then I'm going to open it up to the room here.
Linnet, you in your points here a moment ago, you sort of talked about data and the sort of issues around it from a national perspective.
And on the Internet here we are dealing with something that is globally interconnected and where data is more or less freely flowing from country to country, not really caring too much about the borders.
And we have in that global universe a number of governments at all levels and societies, with a wide range of views on what privacy is or should be.
I just wonder how you think about this challenge of globally free-flowing data in a global ecosystem and, at the same time, what sounded like local management of the data itself.
>> (Internet connection problem with Skype.
Trying to get the call back).
>> (Answer in progress).
Linnet: I am, however, advocating an international authority for data sharing. I think that is important.
I think part of it should be encouraging countries to pass, enforce and understand laws better, but I think there's really a need for international governance on this.
Precisely because companies are acting in places where the laws of their own countries are not enforceable.
When you look at major data releases in developing countries, they are usually conducted by companies situated in industrialized countries who have no explicit legal responsibility towards the citizens of the countries where they are operating, just like the U.S. has no responsibility not to invade the privacy of EU citizens, and we have seen some problems with that recently.
Something similar is going to happen between large corporations based in the industrialized countries in general and developing country citizens.
>> Ms. Ballivian: Okay, I want to contest the basic premise of the entire session about there being a data deluge.
In the developed world there may be, but in developing countries, which is our concern, you have to see also the flip side of that.
There isn't enough data, basic data.
I mean we have trouble, by we I mean not just the World Bank, but the U.N. system and all the international development agencies, collecting the data for the development goals.
We have lots of trouble getting poverty data. Haiti is supposed to be a poor country. You know when was the last time Haiti did a household survey to calculate poverty? Twelve years ago. Twelve years ago.
So how do we know what poverty is in Haiti? How do we know all these indicators?
So what I'm saying is nothing new. If some of you have read the report of the high level panel that was put together by the Secretary-General of the U.N., a report chaired by Homi Kharas, they are talking about a data revolution, the need to collect more data in developing countries.
And the Bank in particular is of the view that if it's going to be a revolution, it means doing something different. A revolution is not an evolution. It has to be using new techniques, new sources of data.
That is where I agree with the title of the session, the multi stakeholder: we need to go beyond government, to data collected by civil society actors, education, new ways to get the developing countries at least two stages above where they are right now. Believe me, they are very far from a data revolution.
>> PAUL MITCHELL: Thank you. Is there anyone in the audience here who would like to ask a question?
Don't be shy.
All right, if no audience takers, I have another one following along there.
No, I see remote on my screen.
Following on this sort of dichotomy between too much or the deluge of data, and not enough data, one of the thoughts that your comments sparked is that there's perhaps a taxonomy of data types that we haven't addressed.
You addressed data on poverty, for example, on data that would help us achieve millennium goals or what kind of programs would achieve those goals, and on the other hand we have data about your shopping habits and credit history and perhaps your preferences at the grocery store.
I think perhaps those are two very different categories of data problem or opportunity.
I wonder if anyone would like to sort of take a stab at the idea of a data taxonomy applied to data management.
>> Mr. Markus: First, I think in any kind of research, any kind of trying to understand something, there are still data gaps.
I think there are many things where we may have more than enough data, but of course there are many things where there are data gaps.
I think we do just need to recognize that.
I think it's the same point that Amparo made before: there's the aspiration of forward thinking that says in the future maybe we can have complete data sets, but there's the reality of where we are today.
I think that is important.
Now, with that said, I think we should also recognize that there are huge amounts of data that many organizations are sitting on that are not available to a lot of people. Not available to any people in some cases.
Part of that is just out of fear due to exactly the challenges I think we have been discussing. Who is accountable? How do I know I'm not harming someone? Do I have rights to use the data in this way?
So these might actually be good actors sitting on huge amounts of data in fear that by using it in some way, they are going to violate something.
I'll talk about the Telecom community which I'm very familiar with.
They are sitting on huge amounts of data that they just don't share at all.
And not because they are trying to be stingy, but because they really are afraid of what it might do: reputational harm, legal issues, international concerns.
So I think part of solving these problems actually is to get access to data that could be used in a positive sense.
I think that is just something to kind of think about.
I think, sorry, the rest of the question?
So one of the things I think that could be useful here, there could be a technology solution to some of this.
Linnet talked about kind of metadata and actually understanding not just the data but everything about it.
If we could create a framework, a technical solution around metadata use, where the data itself also includes its taxonomy, its ownership rights, which might be multi-tenant, and its usage rights, and then the system, the technical system that we might call the Internet or something even greater, actually knows how to enforce that through the appropriate regulatory regimes, then we might solve some of this stuff.
So do we need a taxonomy to get there? Not necessarily, but taxonomy is going to be important.
To me it goes back to the big data versus open data versus regular data, whatever that means.
There are so many opinions that I don't know how productive or fruitful it would be to have a taxonomy too soon unless we have a framework that can leverage that taxonomy.
>> Ms. Taylor:
I'd like to pick up on something that Amparo said, that academics may have a role in analyzing this data, particularly about the developing world.
To me a distinction is between data that is undisturbed and where the meaning is considered relatively stable, as in Internet of Things type data, and data that relates to human behavior, where data might be unstable, where we don't understand what it means, and where we need social contextualization to understand and use the data.
I don't think that divide is being observed well enough by anyone thinking about data infrastructure, data sharing capacity, data sharing frameworks.
Right now I don't think we have a stable definition of what constitutes personal data.
I like the World Economic Forum's definition that it's as broad as possible.
I think that is the best way to go because it makes us think the hardest.
I think the instability of meaning is really important, particularly when we come to complementing or possibly replacing things like household surveys.
I think statistical stability and representativeness are hugely important, and we do not understand these new sources of data yet. It's not that they are not useful. We don't have enough information about them.
I would say almost metaphorically, the metadata doesn't exist yet. We have to build the metadata around them.
It's partly social, partly invisible unless you have country understanding and country knowledge. That is why I say it's important data shouldn't skip the country level.
My mobile data shouldn't go directly from me to Alan, I don't think he's going to tickle me right now, but still I don't know.
>> Ms. Ballivian: I'm going to be very simple minded if I haven't been already.
I think there's a very basic taxonomy in my mind.
One is data that is supposed to be statistically representative of the entire population, and that is basically surveys and censuses, all types of surveys, not just household but also enterprise, et cetera, et cetera. Then we have records.
Part of the problem in my view is that administrative records that exist in every developing country are not being used.
Countries, even in the poorest village, record the students' exam results, the teachers' attendance; they have records on the kal issues.
So administrative records are a humongous potential source of data.
The problem is they are not being used, basically for two reasons.
One is in many cases it's not digitalized.
You would be surprised how even middle income countries don't digitalize all their data.
So there are new companies and models to digitalize data more efficiently, and I think that is a good thing.
The other thing is the metadata. You don't have metadata on administrative records. Creating it is relatively easy.
I hate to blow my own horn, but I think that I have contributed to that. A few years ago, we were asked by the state of Yucatan in Mexico to help them with statistical collection. We realized that at the national level there is a very strong institution. It has more employees than the World Bank.
It's very very large, very powerful.
But at the state level, they are sitting on loads of records they are not using.
We searched for ways to compile metadata on administrative records around the world. We searched in the United States, in Europe; there was nothing to be found.
We had to create it. We created something maybe very precarious, very unsophisticated, but it's something. And it's something to build on.
I am completely convinced of the power of administrative records in doing things, even, thinking aspirationally, in doing away with the other type of data. For instance, Finland is the only country today that doesn't do a census in the way that the United States or all other countries do it. It's the only country in the world whose national census is completely based on administrative records.
I think that is the way to go completely.
>> PAUL MITCHELL: Interesting both of those are in Europe.
So switching to Europe for just a moment.
It has been said that Europe is the leading edge of privacy regulation and privacy rights management, and just yesterday the European Parliament adopted some substantial changes to the privacy directive from many years ago.
I wonder if any of you would like to comment both on the European approach and specifically what has been going on there over the last year or so, and its applicability to the rest of the world.
For those who don't know, the Europeans have had a data protection directive that goes back many many years.
For the last few years there's been a process underway to revise the data protection directive to try to bring it up to 2013, to try to account for in many ways the issues that we have been talking about.
It is still a legal framework, and it had lots of controversy along the way with over 4000 proposed changes, amendments, you know, quirks and edits. In the run-up they have managed to have public comment periods, they have published various versions of the documents, they have had rapporteurs go through the documents and issue reports and recommendations.
In the European context this all has to happen between the council and the parliament and the commission. So it's a complicated structure.
It would appear that it's moving towards some conclusion, at least for this round.
If nobody has specifics, we can move on.
>> Ms. Taylor: I actually was involved in a discussion about this. I thought you meant they actually passed something.
I was interested in the fact the EU is trying to define harms to privacy proactively. And the debate this is causing, I was just talking about this, between them and the U.S., where U.S. companies are saying they are uncomfortable because in the U.S. you sue when your constitutional freedoms are impinged upon.
The EU is saying you justify your use, and companies are saying they are not doing this when they are collecting.
I would say the more paternalistic EU approach is probably better because it forces people to think out in advance what they object to and what they don't.
There has to be a framework within which companies are operating. In the U.S. it seems to be completely retrospective and reactive.
That for me is problematic.
>> Mr. Markus: What is interesting, let me take a step back.
First I agree with Linnet, certainly on the differences between the U.S. and Europe.
I would also say that most of the world, if not all the world, certainly most of the world is watching quite intently on this U.S. European debate.
Even to Linnet's point, a lot of developing countries really don't know what to do and they are looking to the big Internet giants, known as Europe and the U.S., to get a sense of what that means, and maybe adapt from there. Which is not necessarily a bad strategy.
This debate matters not just to Europe and the U.S.
I also agree there are still a lot of differences, though I think a lot of that is closing despite the fact that U.S. companies may not like it exactly.
I think that even the American administration, certainly the last FTC chair was recognizing that there might be some benefits in the longer term play by being more aligned rather than not aligned.
But it is interesting because when the commissioner put forward the proposed changes, the first, let's not call it the first draft, let's call it the first finalized draft submitted to parliament, was at the end of 2010, if I'm not mistaken.
What was interesting, they spent two years really bringing in stakeholders, business leaders, American business, European business, international discussions.
They brought in all the stakeholders.
They did it all the right way.
But imagine what happened during those two years in terms of data and what we know.
The world changed dramatically.
From 2010 to now, while it's still being debated in parliament, it changes daily.
So it just kind of shows that the way regulatory regimes work, it's hard to keep pace with the change.
I think that itself is something to be feared: to think I'm going to draw the line, and say tomorrow, to Amparo's point, we learn about a whole new set of things we need to allow for, and now we have created a law for the next 10-20 years that is impossible to change and disrupts innovation and opportunity.
I think that is one of the things Europe is struggling with, as even Europeans stand up and say, okay, but does it have to be so black and white in order to do things that are different.
There are some important points they are making, one, that you should have a say. I think this is a really good thing. People should have a say. They should have the tools necessary to make informed decisions.
They may not like the choice of decisions, and we should argue whether there should be more, but the directive points to the decision that there should be tools, people should be informed, and they should be able to make choices based upon that.
I think it's also definitely focused more on usage and not just collection.
I know there's a big fear in Europe on collection, and Prism is not helping in recent term.
But the real issue with collection is the trust and security. If we have trust and security, then collection isn't a problem.
As Linnet points out, Europeans are looking at the harm. So usage is the model. There is a lot about the directive that has some positive output we could learn from.
This opportunity for innovation, looking beyond what we know today, the unknowns, unforeseeable, things we don't know yet, if we stifle that because of fear, we really are removing innovation.
In a study we worked on with BCG recently, just reading the numbers, by 2020 the value created through digital identity in Europe is estimated to be one trillion Euro.
Now, where this directive goes will have a dramatic effect on that economic opportunity, and we know Europe is struggling to bring back innovation.
I think there's going to be a lot more debate around this issue.
>> Ms. Ballivian: I think there are two issues.
One is, instead of trying to understand the ways that this unlawful or criminal activity is happening or could happen, why don't we concentrate the efforts on defining what privacy is?
It's as if you were trying to legislate against theft by trying to foresee all the possible ways people may steal, instead of defining theft.
Why don't we try and concentrate on defining privacy and defining breaches of privacy?
I think that is a much better way to go about it.
Second thing is about where the burden of the proof lies.
It's not about companies proving that they are not; it's that you're innocent until proven guilty.
The burden of proof is on those who feel their privacy has been breached.
>> PAUL MITCHELL: Thank you, questions or comments?
>> Thank you. In the heading of the workshop, you use the term data commons, but none of you have addressed that explicitly.
I think, if you talk about Creative Commons, the idea of a data commons is a way to have a better discourse on what actually is public, open to anyone, and what is not, instead of discussing open data and big data.
And regarding this international authority, I was just wondering, I mean, there are so many countries in the world who just don't bother about international standards and regulations.
And they would always say, why do you interfere in our secret data in this country.
I'm working a lot in Southeast Asia, and we have, for instance, this recent very interesting case between Indonesia, Malaysia, and Singapore about the haze.
I don't know how many of you know about this case.
But it's a typical multilateral problem which would not exist if there were open data policies.
Because some companies in Indonesia were burning down the forest, causing Singapore and Malaysia to be in the dark for weeks, causing enormous health problems, and actually the data was there, but it was not disclosed.
So how would the international agency say to the Singapore government, please give us the company data to see who is investing in Indonesia, or to Indonesia, please give us the information about the land usage?
There are so many cases where we can show that if information were open, there would be a lot fewer problems and costs, particularly in the environmental field. And corporate fraud of course.
>> I was wondering what are your thoughts on the possibility of fragmentation of the Internet following the NSA revelations.
If you think that, to what extent do you think the multi stakeholder approach to Internet governance is at stake, and how you think that will affect the use of data by businesses as well as for development, big data especially.
>> PAUL MITCHELL: One more.
>> Yes, I was interested to know what exactly the World Bank did with respect to administrative data in Yucatan, Mexico. I think the sentence was finished.
>> PAUL MITCHELL: Data commons, international regulation, Prism, et cetera.
Who wants to go first?
>> Ms. Ballivian: Let me take them in reverse order.
The administrative data tool: what we did is create a way to collect the metadata about administrative records. Data about the frequency, who collects it, with what delay it is published, if it is at all, and so on.
And so that is to our knowledge the only metadata tool for administrative records that we could find.
Except there is a Eurostat methodology that exists as well, but it's extremely sophisticated and impractical for very poor countries to try to use.
I can share more details with you after the event if you are interested.
I'm not sure I understand everything about the fragmentation of the Internet. So I'll leave that to my colleagues.
I do want to talk about the data commons.
I have to humbly acknowledge, I'm not sure what is meant by it. If it means creating common licenses for data, that is, in the definition that I gave at the beginning of the session, a given.
I think, more importantly, on substance, data that is produced by the public sector, using public resources of the taxpayer, should, with some exceptions, be public.
I also want to say that going from public to open is such an easy and cheap step. It is a very misunderstood concept in developing countries, and it does not involve any additional risk to privacy or national security.
So there's a lot of data that is already public. Making it open is really low hanging fruit, so we're trying to go for that.
Data that is not produced by the public sector, there's a lot of room for debate about the ownership of that data.
The private sector collects data about me; do they own it or do I own it?
The government collects data about me; do they own it or do I own it?
I don't have a clear view on that.
>> Ms. Taylor: I'm going to be quick because none of this is my focus of expertise.
My response to the first two questions is what we have been talking about already: we need to have a better taxonomy, a better understanding of privacy, but that is contextual unfortunately, so you can't apply a single rule or definition of privacy.
I think we need to allow data sharing to be complicated, because it is.
I think there are way too many searches for a single terminology, a single way of treating data.
And data is not unitary.
I think it's very dangerous to treat it as if it is.
And so I think there needs to be a dialogue about it.
I think we need a framework which can react contextually, with contextual knowledge, to the challenges of sharing different types of data in different data sets.
Within that you can have a creative commons principle, a data commons principle.
I think you need contextuality and to recognize it's complicated.
>> For someone not in her field, that was very good. It was perfect actually.
Let's just start with that.
It matters all the time.
I think the notion of "or", you know, even the point Amparo made, who owns the data: we have to be rid of these exclusivities.
Most data is not created by an individual. It's created by a collection.
If we start to think about that, that is important.
The notion of data commons, certainly the way we look at it, which is very much like a creative commons thinking, just says let's agree that there's data that if everyone uses it in a certain way, feel free.
If you want to use it in a different way, pay a fee.
And start to look at it again, not as exclusively completely open and free for all, but for certain aspects it's open and free, and for others it's about a commercialized opportunity.
It's okay, it can be "and".
I agree with Linnet, to do that, taxonomy matters and we don't have an answer to that question.
I think that is definitely an area that I think requires a lot more effort and we need to focus on.
The fragmentation, as I like to call it the Balkanization, I think others like that term as well, I will say simply this, data is worthless unless it is traded.
Just that simple.
If you put a bunch of data, you write stuff, create stuff, put a whole bunch of brain power in a bunch of work and stick it on your hard drive and it sits in your home or business, it's absolutely worthless. It has no value whatsoever. Except perhaps to yourself.
It only becomes valuable when it gets used, traded, combined. What happens if we Balkanize the Internet is that we're removing the most valuable part, the tradability.
On the example of Singapore, having lived there many years a long time ago, we had the problem even then. There could be some data opportunities, but I look at it this way. I say it doesn't matter how much Singapore puts good environmental air pollution control in if Indonesia is going to light fires. Just doesn't matter.
It doesn't stop at the border.
I think the Internet is the same.
It doesn't stop at the border.
Doesn't matter what controls you want to put on in your own nation.
There's leakage, unless you completely cut off.
In which case then you have the opposite challenge of nontradeable data, and that is a problem.
Now Europe has an interesting proposal with the Schengen kind of trade of data. Something interesting about that. At an ASEAN ministerial meeting in Singapore there was an interesting call out to have a Bretton Woods of the Internet: can we think about a tradeable security asset around data, and maybe that helps in creating some taxonomy.
There's a lot of opportunity still left for discussion.
But on Balkanization: the whole value the Internet has created in the world is due to the trading of information.
It was built so researchers can exchange ideas.
It's been expanded so that business can create value.
The trillion Euro figure in terms of digital identity doesn't come from Balkanization, it comes from an open Internet.
I think we can't lose sight of that.
Countries that do will find out the hard way they are going to be challenged with keeping up with economic progress.
>> PAUL MITCHELL: Thank you.
We have covered a lot of ground today, from what is open data and big data, through to critical issues like trust and accountability, touching on the borderlessness of the Internet versus the localism of culture and societies and the potential for controlling at the local level.
We have talked about citizenship.
We have provided examples of economic value and social value, as well as discussed some of the harms, the idea of defining the crime proactively, and different regulatory environments, the United States contrasted with Europe. And we brought in the idea of technology being part of the solution potentially through the idea of having a system that can respect the policies that a user might select.
Hopefully that has given you lots of things to think about as you explore the rest of the week.
I thank you for coming and being our guinea pig first session of IGF 2013. If you are tweeting, hash tags are IGF 2013 and 4 D for this particular session.
I'm going to give the panelists each one minute to give a final thought or statement, then it's coffee time.
>> Ms. Ballivian: My choice of final statement is going to be about big data, which is something we don't do yet.
My concern is big data analytics.
Rather than defining big data, what is interesting is new ways of analyzing that data that are coming up.
I have a concern because most of this is about finding correlations in data, which can be very useful for some purposes, but for public policy we also need to acknowledge the limitations of big data analytics.
Let me give you a concrete example.
You find extremely close correlation between the number of times the word Malaria appears in millions of twitter messages and the actual cases in a country or region.
Now you are the minister of health. What are you going to do with that? How are you going to prevent malaria?
Are you going to say don't tweet about it? Not likely to do anything, right?
So I think that that doesn't mean it's useless.
I think that the minister of health of that country will find it useful in foreseeing where the trend is in certain parts or certain groups of the population, et cetera, and that is useful knowledge.
I think it's important also to know the limitations, at least as far as we know today, and I acknowledge that some people may know more than I do. But this analytical method, circumscribed to correlations, can be dangerous.
>> Ms. Taylor: I agree about the correlation problem.
We'll talk about that separately. A fight somewhere in the corner.
I think there's something really important from a development field perspective, which is that data science is conducted remotely at global centers of industrialization and technology. And development happens locally on the ground in poor places with poor information generally.
The two have not necessarily connected. They are not necessarily going to connect. The problems that we're seeing with data in development are the problems we already see with development, economic growth in poor countries.
The statistical problems are not necessarily going to change, as Amparo has pointed out, just because we have digital data. We're digitizing the same problems.
How do we avoid the mistakes of ICT4D, for instance, where we had people in developed countries and international organizations injecting technology into developing country situations and saying now everything will be better?
How do we get past determinism and make it a tool for enriching communication between the citizen and the government, and also for international citizenship? The commons of the Internet, the commons of data and data science, should be connected to the international commons of discussion and freedom that there is online.
I don't see that happening. I see us talking about us as data possessors and analysts and them as the subjects of development.
I think it is important to consider that as we have this conversation.
>> Mr. Markus: My final thoughts: data is an asset and it can be leveraged to create both social and economic value. And the value is growing at unprecedented levels, both at the micro and the macro level.
We need a balanced ecosystem, greater transparency, greater trust, and better control than we have today.
And that needs to be both flexible and adaptable.
We need to understand there are some differences in the way we will look at this.
The objective should be really focused on the trusted flow of data and not just locking down the data itself.
So this in many ways may require more principle-based governance rather than an absolute kind, so that locally we can look at the norms that are there and let them apply, while at the international or macro level a principle-based governance system might be the way to go, which means focusing policies on how data is used rather than on the data itself.
>> PAUL MITCHELL: Thank you very much for your attendance.
And thank you to Bill Hoffman, who was another panelist, who has been able to listen online but unable to talk, which is why we didn't actually pass the mike to him.
Thanks very much.
I hope you enjoy the rest of IGF.
(Session ended at 10:34 a.m.)