Open Data and Data Publishing Governance in Big Data Age

4 September 2014 - A Workshop on Other in Istanbul, Turkey


***
This is the output of the real-time captioning taken during the IGF 2014 Istanbul, Turkey, meetings.  Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the session, but should not be treated as an authoritative record.
***
 

>> XINMIN GAO:  Ladies and gentlemen, we are now starting our workshop.  Welcome to workshop number 70, Open Data and Data Publishing Governance in Big Data Age.  My name is Gao Xinmin, and I am Vice Chairman of the China Internet Society.

This workshop is hosted by the Consultative Committee of United Nations Information Technology under the China Association for Science and Technology.

As everyone knows, the Internet has now entered the era of Big Data.  What is the meaning of Big Data?  It has different definitions.  But we all know that since the 1990s the data volume has increased tremendously.  We also pay attention to many applications of Big Data technology.  It is a new technology, and it is developing very fast.  Nowadays, about 2.5 exabytes of data are created almost every day.

But Big Data does not refer only to the size of the data, but also to the types of data.  It is not only text data, but also audio, video and other structured and unstructured data.  Big Data also refers to the speed at which data changes:  data flowing in and data flowing out.

So Big Data is a very important technology, with many applications nowadays, particularly in Internet circles.

Open data.  This concept is not new, but the formal definition of open data is a new one.  Open data follows the open trends:  open source, open hardware, and now open data.

In recent years, many governments have launched open government data initiatives, starting from 2008.  Many governments in both Developed Countries and Developing Countries have launched open data initiatives, the so-called data.gov portals.

But in the Big Data era and the open data movement, we still face some problems, particularly on the policy and governance side.  In this workshop we will focus on this side, not on Big Data technology or open data technology, but on the application aspects, particularly how to design policy that promotes open data applications in the Big Data era.

For our workshop we have invited many distinguished panelists, so let me first introduce them to everyone.  On my left is Professor Zhou Xiang.  He is a researcher at the Chinese Academy of Sciences, Institute of Remote Sensing and Digital Earth.  He will make a presentation on the use of open data and Big Data technology in the area of remote sensing data.

Next to Mr. Zhou Xiang is the distinguished panelist Ms. Ana Neves.  She comes from Portugal.  She is a very well-known specialist in policy making in Internet Governance.  We have cooperated a lot with her, and she has joined us at almost every workshop in the IGF process.  Thank you.

Then on my right side is Professor Chuang Liu.  She comes from the Chinese Academy of Sciences.  She is a researcher in the Institute of Geographic Sciences and Natural Resources Research.  Ms. Chuang Liu is also involved in CODATA, an important organisation in the area of open data and the promotion of Big Data applications.

Then on my right hand is Professor Svetlana ... it is difficult for me to pronounce Russian names.  She comes from the National Research University Higher School of Economics in Russia.  She also does a lot of research in the Big Data and open data areas.  She will tell us about applications of Big Data in education and the business sector in Russia.

I suggest that our workshop be divided into two parts.  First, the panelists will make their presentations.  Then we will open the floor for discussion and comments from the audience and remote participants.

If you agree with me, I would like to invite Ms. Ana to make the first presentation.  You have the floor, and you can start.

>> ANA NEVES:  Thank you very much.  Good afternoon.  Well, I see my PowerPoint is a bit far away.  So I think I am going to see... well, I have a PowerPoint.  Okay.  Thank you.

As I was saying, I'm a bit far from the screen, so it is a bit difficult to read what is there.  I am going to read my notes because I don't know my presentation by heart.  So, sorry about that.

So the vision of my presentation is the following:  a scientific infrastructure that supports seamless access, use, reuse, and trust of data.  In a sense, the physical and technical infrastructure becomes invisible, and the data themselves become the infrastructure.  So this is key.

A valuable asset on which science, technology, the economy and society can advance.  Here I am quoting the European Commission publication "Riding the Wave."

I will tackle some issues of open data and data publishing governance in the Big Data age.  I am going to talk about what the Big Data age means and what it implies; the change of paradigm, in which science will be organized differently, and what that means; the benefits of scientific open data; the cross-cutting obstacles; what a new model with open data in Europe could be; and, to wrap up, the case of Portugal and what is going on there on open access to data.

So here you have what Big Data means, because this is our reality now.  As you know, Big Data refers to large amounts of data produced very quickly by a high number of diverse sources.  Data can either be created by people or generated by machines.  Whether it is geographical information, statistics, weather data, research data, sport data, energy consumption data or health data, the need to make sense of Big Data is leading to innovations in technology and the development of new tools and new skills.

So you can see here that we now live in a Big Data age, which means the digitisation of media; the migration of social and economic activities to the Internet; the increasing deployment and interconnection of sensors through mobile and fixed networks, that is, the Internet of Things; the widespread availability of infrastructure; and the convergence of the functionality and capacity of devices and networks, combined with mobility and the participatory Web.  What I have just described is huge.  This new Big Data age really means a lot.  It makes a big difference.

So here, what I want to present is what this Big Data age means for the data chain and lifecycle.  You see here the generation of data, the collection of data, and its storage, processing, distribution and analytics.

So it means that innovation will be further driven by open access to data, which also enables consumers to be better informed.  It is not only for researchers and companies.

In addition, social benefits are also expected from the collection and analysis of data, for example when addressing an aging society and natural disasters.  However, the use of data and analytics comes with policy challenges, including but not limited to promoting trust among individuals and consumers, and developing data analytics skills, which, if not supplied, could lead to missed opportunities.

Now I am going to talk about the change of paradigm.  With all these changes we are facing a new paradigm, because the exponential growth of open data affects the whole research cycle and its stakeholders.  For example, the globalization and growth of the scientific community imply an ongoing transition to new ways of performing research.  Research is increasingly collaborative, and knowledge is collected and shared much more easily.  Therefore, science will be organized differently.  Successful use of digital technologies demands interoperable infrastructure.  Digital and mobile technologies bring great benefits from networking and scale.  But data-driven research and innovation technology must be efficient to be effective.

So, towards more openness, interoperability and sharing of scientific information:  the importance of open data.  What does open data mean?  It means data that can be freely used, shared, and built on by anyone, anywhere, for any purpose.  They are digital, they are online, free of charge, and free of most copyright and licensing restrictions.  It will change the way research is undertaken and communicated globally.

So this is something that is not happening yet.  What I described doesn't exist yet, but that is what open data means.

Access to research data will radically transform research disciplines, but we have to find a common understanding not only of what the application is, but of what data is as well.  In this Big Data age, we have to encourage open data and data publishing governance across society, which implies access to and reuse of data to maximize the economic and social value of the data that we have.  The public sector becomes a source of data, an important national data stock which can be better exploited; this already exists with eGovernment policies.  We should also promote access to private sector data, with the increasing number of private and public initiatives; promote cross-disciplinary infrastructures, projects and repositories; and meet the need to protect privacy, without forgetting the future utility of Big Data to society.

So protect but not in a bad way.

So now about the benefits.  Well, maybe you cannot read anything.  The point here is the benefits for citizens, funders and policymakers, researchers, and enterprise and industry.  Citizens will mainly appreciate the results and benefits arising from research and feel more confident in how their money is spent; find their own answers to important questions based on real evidence; pass on knowledge and experience to others; and make a contribution to the knowledge society beyond their immediate circle and lifespans.  For funders and policymakers, what are the benefits?  They will make evidence-based decisions, eliminate unnecessary duplication of work, and get a greater return on investment.  For researchers, the benefits are having all data and tools easily available, increasing productivity, crossing disciplinary boundaries, gaining new insights, producing new solutions, and standing on the shoulders of giants.  That is a phrase that I like.

Enterprise and industry will use the best available information from research and development.  They will create new knowledge markets and job opportunities, and provide a strong industrial and economic base for European prosperity.  Of course, I'm talking about Europe because I am European and I am really working on the European perspective and vision.  But this is really a worldwide perspective.

It will also increase opportunities for knowledge exchange.  What I was saying about Europe is really true, because we are trying to follow best practices all around the world.  We are trying to work with Asia, with North America, with all countries, to see which best practices they have in open data.

And I can see that we are more or less all on the same page.

So the cross-cutting obstacles, which are in the other part of the slides that you have there:  lack of long-term investment in critical components such as persistent identification; lack of preparation; lack of willingness to cooperate across disciplines and funders; lack of published data; lack of trust; not enough data experts; infrastructure that is not used; systems too complex to work with; and lack of coherent data description allowing reuse of data.

So this is a problem not only for the publishers of data, but also for the researchers, who don't want to open their data.

So what are the barriers to changing data publishing governance?  That is the main obstacle here.  The current reality is that scientific results are shared through the literature without access to the data.  You have access to scientific publications, but you don't have access to the data used.  There are no uniform mechanisms to publish data.  Some communities are advanced in making data available, but others are the opposite.  Not for all data.  Not for all disciplines.  I don't know; we are discussing this.  It raises several issues about privacy, security, and lack of property rights.

The barriers to long-term preservation are another obstacle.  And finally, the last obstacle that I have here is the different legal and regulatory frameworks in different countries and across continents.

What is happening?  Nothing is there.

Ahh, okay.  Everything is there now.  Ahh, more.  No, too much.  Oh, my God.

Okay.  I've got it.  Again, it is a pity you cannot really see what is there.  It is a new thing coming from the European Commission, which I thought could be very interesting for this workshop.  The European Commission is proposing Science 2.0, "Science in Transition":  areas and issues.  What we have there is a possible model, if we use and have access to the data.  If we have a new data policy, we will have a different data publishing governance in Europe in the Big Data age.  What we have there in the blue part is infrastructure:  we need to invest a lot in infrastructure for Big Data and data flows, of course.  Then we need citizen engagement, citizen science and crowdsourcing; open access to results and processes; evidence-based policy making; and global systems science.

So the point here is the accompanying change in culture.  Now, to finalize my presentation, I will say something about what is happening in Portugal.  It has not yet been possible to launch a policy on open access to data, but there is an incentive for openness and sharing of research data from scientific publications resulting from publicly funded research.  We recently adopted open access to publications, so we have a new policy.  At the core of the Portuguese regulations is the Foundation for Science and Technology, FCT, which funds research:  all publications of research output subject to peer review or another form of scientific review should be deposited in one of the open access repositories of RCAAP, as soon as possible, preferably immediately upon acceptance for publication.  This is really good for open access to publications, but open access to data was not possible.  The researchers didn't allow us.

So the policy on management and sharing of data and other results arising from research funded by the Foundation for Science and Technology is now to encourage researchers to share primary data and other data within the scientific community by placing the data in open access databases within the shortest time possible.  And finally, an example of an open access repository where we could deposit data.  We operate this open access through the national research and education network, which has ten gigabits and is scalable.  We have b-on, the Online Knowledge Library, which deals with the publishers.  And we have RCAAP, the Open Access Scientific Repository of Portugal, which presently serves all the universities and all the researchers in Portugal.

And we have cooperation with Brazil in this area, which we plan to expand to all Portuguese-speaking countries.

These are the main lines of where we are on open access to data in the Big Data age.  I wanted to give you an idea of the difficulty and of the long way we still have to go to achieve open data access, and of what a new model could be if data were open.  Thank you.

(Applause.)

>> XINMIN GAO:  Thank you, Ms. Ana, for your interesting presentation, particularly for your very clear definitions of the Big Data and open data concepts.  You also gave us the example of Europe's Science 2.0, a framework based on the open data concept.  And you shared with us the practical experience of Portugal, particularly open access to libraries and other scientific data.  Thank you very much, Ms. Ana, for your contribution.

Now I want to invite Mr. Zhou Xiang to make a presentation.  I think he may give us some ideas and advice on Big Data applications in science and research, particularly with remote sensing data, and on how to use Big Data technology.  Mr. Zhou Xiang, you have the floor, please.

>> XIANG ZHOU:  Thanks, Mr. Gao.  Good afternoon, ladies and gentlemen.  Glad to meet you here.  Today I am going to talk about research on Big Data:  some issues about data in cyberspace, data science, and the challenges and opportunities of research Big Data, especially data sharing.

So as we know, cyberspace is filled with data, just as space is filled with matter.  It has been an artificial domain since it was born, and there are various data-driven activities in cyberspace every moment.  As you can see from the graph on the right, there has been a steep increase in data volume in recent years.  In 2011, the amount of data created and replicated worldwide was about 1.8 zettabytes, and the global amount of data is expected to reach 40 zettabytes in 2020.  So it is a very, very large increase.
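The two figures quoted for 2011 and 2020 imply a compound annual growth rate. The following is a minimal Python sketch of that arithmetic, assuming both figures are in zettabytes and that growth compounds evenly over the nine years; the variable names are illustrative only.

```python
# Implied compound annual growth rate (CAGR) of the global data volume,
# using the two figures quoted in the talk (assumed to be in zettabytes).
volume_2011_zb = 1.8   # data created and replicated worldwide in 2011
volume_2020_zb = 40.0  # projected global data volume for 2020
years = 2020 - 2011    # nine years of growth

# Total growth over the period, then the constant yearly rate that produces it.
growth_factor = volume_2020_zb / volume_2011_zb
cagr = growth_factor ** (1 / years) - 1

print(f"Overall growth factor: {growth_factor:.1f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

With these inputs the volume grows roughly 22-fold, which corresponds to about 41% growth per year, sustained for nine years.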

So computers, switches, routers, fiber optic cables, wireless links, even satellites:  all these components are connected in the network.  And all of us, researchers, commercial users, individuals and decision makers, are important contributors and beneficiaries in the Big Data age.  So it is not only emerging networks and technology, but also open data:  all of us are contributors and beneficiaries.  Based on my research background, let me take Earth observation as an example.  Satellites provide views of the planet which are essential to enable wide public services, accurate weather forecasts and satellite navigation, and they also benefit scientific research.  That is the point I am going to talk about.  More and more satellites with diverse features will be launched.  You may know some open data services provided by the United States, and also Chinese data.  There are also special satellites designed and launched for scientific experiments.  Skybox is also a very fantastic project.  It will launch 48 small satellites to acquire tons of data which will be used in different fields.

So satellite data not only supports routine public services, but also benefits scientific research.  In a simple way, the data cycle includes data creation and transmission, data processing and data application.  All these are important stages of open data.  Here you see the Chinese meteorological satellites.  There are many, many satellites from different countries.  These satellites circle the earth continuously and accumulate massive data with wide spatial and temporal coverage.  The data from these satellites will be important input for climate change research, weather forecasts, and so on.  This is the satellite for 2013.  This means we have accumulated data from different sources.  For some of this data, such as MODIS data, the providers offer an open data service, so researchers and the public can download and use it freely.

This is an example captured by the 3A satellite.  You can see the river and its various streams.  In 2013, northeast China experienced severe flooding.  This is the image after the disaster, where you can see the effects of the flooding.  If you compare it with the images from before the flooding, it is very different.  And this data has a resolution of only 250 meters.  If we get higher-resolution data, we get much more information and much more useful knowledge for decision making, and also for the public.

Generally speaking, there will be more open data with higher resolution in cyberspace.  It has become an indispensable strategic resource to support scientific research and various application services.

Big Data research and data science are popular topics.  Scientists need techniques to convert data into knowledge, drawing on many fields we are familiar with, such as data warehousing and high performance computing.  Data science is emerging to meet the challenges of processing very large data sizes and also unstructured or semi-structured Big Data.  Why am I talking about Big Data especially in scientific processing?  Because of computing capacity, and not only technology:  the open data we get from websites can also support this kind of research.  You can see many publications and reports from governments and research organisations like CODATA, which has data science and forecast reports, and Nature and Science have already issued special issues on Big Data.  This diagram is very famous.  In 2012, the United Nations released a white paper on Big Data for development and proposed a new definition.  As we can see, data analysis has become a foundation of science alongside theory, experiment and computation.

So as Ana said, there is a definition of Big Data in terms of volume, velocity, variety and value.  We also have some special considerations on scientific Big Data:  it is not only a description of a large data size or data volume, nor only data that cannot be processed with existing technology.  Scientific Big Data is growing rapidly, but it is not yet mature.  It is built socially, by the community, and it is still at an initial stage.  We are at the very beginning.  What causes this?  Some very special characteristics.  As we know, scientific research requires different theories, and its data practice is unlike the ordinary data we are familiar with, such as daily-life information on websites or Internet data.  There are also very strong differences between disciplines.  Different disciplines have different requirements on data research and applications.

So the first characteristic of scientific Big Data we call high dimensionality.  Generally, scientific Big Data is high-dimensional and has to reflect complex processes and relationships of natural and social phenomena.  It has very special, very high requirements on data representation.  It is quite complex.

Take an example:  if we want to do spatial and temporal analysis of socioeconomic phenomena, we have to use a lot of data.  This data can be natural and geographic data with different spatial coordinates, observation data from different sensors with different spatial and temporal resolutions, and socioeconomic data with various physical meanings.  Some of the data are very small, but most of them are very large, like remote sensing images; a single image can reach one gigabyte in data volume.  This not only requires data sharing but also places high requirements on computing capacity.

So here you can see the diverse means of detection for geoscience research.  We have so many sensors and so much data to process.  This diagram shows data sharing for scientific research.  We collect data from different sources (this is an example for satellite data), include data from shared resources, produce the semantics, and finally convert it into data products.  It is a cycle from data to information and knowledge.  Another characteristic of scientific Big Data is high complexity.  Most computations related to scientific research are conducted in large systems with highly complex data models.  This requires a combination of complex systems and estimation theory, and disciplinary or cross-disciplinary approaches, to explore the data solution.  Here is another example, for ecosystem and environmental studies.  As you see, all of this data can be used for this kind of research.

So this is an example of global meteorological observation data networks, which include GEONETCast developed by the United States, EUMETCast from the European Union and CMACast from China.  You can see that this system covers almost all the continents and ocean areas.  The next slide.  Science is mostly conducted in large complex systems with complex data models, so there is high uncertainty.  This diagram shows this kind of uncertainty, caused by the model and the artificial model in the system.  And scientific data from different sources may contain errors, which leads to a high degree of uncertainty in research.

So there are also some innovation issues and governance issues when we promote scientific Big Data research.  On data modelling and storage, the issues are data integration, distributed storage, modelling and dynamic change.  We need research on intelligent analysis of complex data, which means analysis of massive data and incremental data analysis, and also on computing capacity, that is, parallel processing and computing.

It also raises some governance issues about data quality, because data come from different sources, normally from different scientists.  If we want to promote data sharing over the network, there must be a unified framework and standards to accelerate this kind of data sharing.  It is also difficult to validate data from different sources, so there must be methods and principles for data validation and calibration.

There are also issues about Big Data security.  And because different scientists are doing their own work, the scientific result will no longer belong to an individual researcher.  So there is an intellectual property protection issue that we need to think about.

So in conclusion, Big Data for scientific research requires advances not only in innovative technologies like data modelling and storage, but also in data sharing and governance.  In the Big Data age, scientists have a strong desire for access to open data.  As we know, the National Institutes of Health of the United States, NIH, has already issued a data sharing policy for genomic data, and all projects funded by the NIH will provide data sets during their project.  So it is a very great step for data sharing.  For earth sciences, it will play a more important role in the future.  In cyberspace, in almost all areas, more openness means fewer obstacles and is more beneficial to the leaders in high technology.

So we hope that multi-stakeholder cooperation will cover not only data sharing but also model sharing.  I have news here:  in the implementation plan for the next ten years of the Global Earth Observation System of Systems, the Model Web was proposed for GEO in 2025.  So, having analysed this important issue for research Big Data, we hope the Internet will expand knowledge, reduce the technology gap between countries, and also promote balanced development that benefits human beings through scientific advances.

Okay.  These are the reference links for my presentation.  Okay.  Thanks.

(Applause.)

>> XINMIN GAO:  Thank you, Mr. Zhou Xiang.  He clearly explained how Big Data in the science and research area differs from Big Data on the Internet.  I think it is more complex, with more requirements and more diverse requirements from different disciplines and sectors of science.  He also gave us a very good idea of applications in the remote sensing area.  So I think your presentation is very useful for us in considering policy design.  Thank you again, Mr. Zhou Xiang.

Now I would like to invite the next speaker, Professor Svetlana from Russia.  She is the Dean of the Faculty of Business Informatics, National Research University Higher School of Economics, Russia.

Her topic is Big Data and new opportunities for education and business.  Please, you have the floor.

>> SVETLANA MALTSEVA:  Thank you very much.  First of all, I want to thank you for the invitation to participate in this very interesting discussion.  I will try first to build a bridge between the Big Data age and open data, because when I prepared my presentation I thought about those two issues.  In the age of Big Data, open data becomes big.  They share the same principles.  I must say that only one feature is critical to the difference between those two paradigms:  veracity.  When we talk about open data, we assume that the data are proven, so people and organisations who use them need not think about their veracity.  But in the Big Data age we must understand that open data will include not only data but also knowledge, including the results of Big Data analytics.  In this case we must think about confidence in the analytical methods, about the source of the data, and about the results.  I think those two paradigms are useful to each other.

Maybe Big Data services will be more correct if we think of them as producing data, information and knowledge for the open space, for open information.

That is the first part of my presentation.  Now I want to discuss Big Data implementation in its different aspects.  When we think about this implementation, we must take into account several important aspects.  First of all, the special characteristics of the area:  economic or social.

The second aspect is the environment.  Then, of course, the maturity phase of the technology, because today many organisations and many people say that they use Big Data analytics, but what is the level of this analytics?  What are the tasks, the problems for which analytics is used?  And then, of course, what are the expected effects?

First of all I want to consider business applications, because I'm Dean of the Business Informatics faculty, and our faculty works with many companies, not only Russian IT companies but also such famous companies as IBM, Microsoft and EMC.  We think first of all about applications of Big Data.  So first I will look at business applications.  This part is very interesting for discussing the differences in Big Data projects, because Big Data may not be open data:  it may be data and results that are not available to everyone.

For this short overview, I take a very interesting survey from the site of Tableau.  I must say that Tableau is one of the leading companies in Big Data analytics today.  This is the result of a survey of companies from different sectors of the economy.  You can see on this slide, I think you can all see it well, that there was a question:  in which areas is it important for you to use Big Data analytics instruments, and how important?  We can see that the most important areas of activity are decision making, accounting, and marketing and communications.  I must say that in Russia we do similar investigations, and we saw that marketing and finance, first of all risk analytics for the finance sector, are very important among the sectors actually using Big Data analytics in Russia.

If we look to the social sphere, first of all I want to emphasize such service as health, education, services in housing and of course Big Data has now a significant influence on this sphere of science and culture.  And just now we heard very impressive presentation about its influence on science.

But I must say that despite the interests of technology, really not a lot of companies are able to use it in practice.  And also we tried to use current research to the understanding of this phenomena.  And if we will look at Big Data as an innovation, and this is innovation.  It is innovation for each enterprise, for each sphere because this innovation demands new organisational structure.  It demands financing and so it is innovation.  You can see that only big companies with well developed infrastructure, not only IT infrastructure but infrastructure maybe in the environment, in human resources can implement it.  It is also important to note that companies may be on a different level of maturity of the technology.

To show this, I used a very interesting paper and chart by Bill Smarts.  He shows that there are five stages, five levels, and I must say that different companies and different spheres are now at different levels in this chart.

But as Professor of health economics, for me maybe the most interesting sphere of Big Data and open data is education.  So let me tell you more about Big Data in education, in education processes.  I must say that the education process involves the creation and processing of large amounts of diverse data.  Common Big Data technology drives new changes in education.  These technologies may become necessary for developing new approaches to education processes, mostly due to new analytical tools.  On this slide you see that today, first of all, educational organisations use academic analytics.  These analytics are based first of all on certain indicators.  And there is a gap between those indicators and the real learning processes.  It is very important for educational organisations to build a bridge between learning‑process data and academic analytics.  Then we can see real analytics of educational organisations.

And Big Data gives us those opportunities.  What further opportunities does it offer for learning analytics?  Maybe it is not easy to read the text on this slide, so I will say that they include improving the classroom experience through personalization, identification of at‑risk learners, self and peer grading, predicting student performance, smart curriculum, discourse analysis, estimation, openness, and so on.  Of course, they also include improving the attractiveness of education services through consulting and advisory services on the choice of post‑graduate education, and on work patterns and success or failure.

What is critical for implementing all these opportunities in the existing education system?  First of all, data collection procedures and tools.  I must say that in this part there are many problems.  Not only technical problems; maybe those are not the most important ones.  The most important problems are problems of human rights, because some students or some lecturers may not be ready to have data collected about their behavior.

Next.  I think this is a very interesting topic from different aspects, for different investigations.  Of course, education must have Big Data skills.  And not only lecturers and professors must have those Big Data skills, but students, too.  It is a problem to turn students, professors and teachers into data scientists in some area, the educational area.  As I said, we must build a bridge between learning analytics, learning processes, and academic processes, academic analytics.  We need new quality metrics for the educational process and for educational institutions.  Of course, we need integration of Big Data analytics with modeling tools: models of learners, teachers and all education processes.  We must have those models because we need feedback.  And feedback is critical for realizing new educational paradigms based on data‑driven processes and data‑driven education.

I must emphasize the critical areas: areas that we must take into account and that may need not decisions but, first of all, a philosophy behind those decisions.  Of course, there is the transformation of information management, traditional information management.  Maybe it must become information governance, and maybe not only information governance but knowledge governance.  So I think we need new models.  We need new rules.  We need new ideas for that.

Of course, we need a so‑called Big Data culture.  Big Data culture may be very important, because these tools, the tools and paradigms of Big Data, can transform the information space into a space of knowledge.  And that knowledge will be open under the open data paradigm.

This can be a source of more correct decisions, new decisions, very good decisions.  At the same time, it may be a source of false decisions if we get incorrect knowledge.  And I think that special roles here belong to representatives of science, culture and education.

Thank you very much.

(Applause.)

>> XINMIN GAO:  Thank you, Professor Svetlana, for your presentation.  You made a very clear analysis of Big Data applications in the social and economic areas, particularly in business and education.  I think this was a very interesting presentation.  Thank you very much.

Now I would like to invite the last speaker, Professor Chuang Liu.  Her topic is CODATA: publicly funded data sharing for science and in Developing Countries.  Professor Chuang Liu has been involved in a lot of the activities here, and I think her presentation will be very useful for us.  Please.

>> CHUANG LIU:  Thank you.  Today we talk about data.  And before "data" there is "co": there is CODATA.  It has two meanings.  One is an international organisation under the International Council for Science, which is 46 years old.  The other is co‑data in the Big Data era: we need coordination on data together.  So let me spend a little more time on CODATA.  This year is ten years on from WSIS Phase I, in Geneva in 2003, and WSIS Phase II, in Tunis in 2005; this year falls between those two anniversaries.  Next year is very important for the IGF, its tenth year.  It is also ten years since WSIS Phase II.  So I would like to summarize what we have already done, so that from that experience we can understand open data and data publishing.  You will see that in 2003 in Geneva two very important documents were published.  One of them is the Geneva Declaration.

In this Geneva Declaration, there is a very important sentence on the development of the Information Society.  And in the second phase, the Tunis phase, CODATA, on behalf of the International Council for Science, made commitments and presented an action agenda.  Professor (Yvada), who was president of CODATA, gave a three‑minute speech in the Plenary.  So CODATA committed to help with open data and data sharing, and also to reduce the digital divide.

And so what are our actions?  We have seven action lines.  First of all, we have two focuses.  One is publicly funded research data sharing for the benefit of all.  There is so much data: private data, joint venture data and different kinds, but as scientific communities we focus on publicly funded data.  How do we deal with this kind of data?  How can this kind of data benefit all of society?  This is one focus.  The second one is bridging the digital divide through efforts at the international, national and case levels.

We have several action lines.  First, the team: establishment of the CODATA task group.  This was before WSIS; it was established in 2002.  So that was our focus even before WSIS.

Every two years there is a review to evaluate whether this task group is good or not, whether it is better to end it or keep going.  Every two years we pass an exam: 2004, 2006, 2008, 2010, 2012, and this year we have another evaluation.  We are very confident we will pass it.

And then, for the team, we have Co‑chairs from China, the U.S., Germany and Kenya, serving different terms, up to ten years.  Then we have members from 17 countries; some of them are in their home countries, but most of them work in Developed Countries.  And that is how we work together.

Next, we cooperate with stakeholders worldwide, because Developing Countries have common difficulties.  We need money.  We need support.  We need most things in these countries.

Our stakeholders are worldwide.  We work with China, the U.S., Germany, and also international organisations: Asia‑Pacific, international academies, GEO Systems, OECD, UNESCO, and whoever is interested in these issues; we work together.

Action line three: we participate in WSIS and the follow‑up activities.  In 2003 we participated in two events.  In March, at UNESCO in Paris, we participated in the pre‑workshop, and then in Geneva.  In Tunis, the same thing: we had a pre‑workshop and then Tunis itself.  We not only participate; we also follow up each time.  After WSIS, the U.N. initially had two programmes.  One is the IGF, with one forum each year.  The other is U.N. GAID, the Global Alliance for ICT and Development.  This programme closed in 2010, but we participated in each of them.  As you can see, here with Ana and Professor Gao, and with remote participation, we work together each year.  We participate in the IGF.  We identify what the issues are and where we need to work together, what the challenges are and how to deal with them.  What are our opportunities?  How can we solve all these difficulties?  And then what could our outputs be?

We discussed principles, guidelines, best practices and so on, all related.  One by one, each year, we follow this.  We keep going.  Keep talking.  Keep raising our voice to the U.N.  We say this is our voice; please pay more attention to it.

Another action: we have a country series and a regional series of international workshops focused on the challenges and needs of the countries and regions, and on data sharing strategies and policies.  We have had workshops in China (Beijing and Shanghai), in South Africa (Pretoria), in Latin America (Cuba, and Colombia last year), and this year also in Kenya.

So the country series focuses on what developing countries need with respect to open data.

Not only that; we also work together with UNDESA.  We have eScience and eGovernment.  How about making both eGovernment data and research data open?  How can we work together?  I'm very glad we have a representative from UNDESA here.  We work together.

Another action is capacity building: training programmes.  We have one‑month training programmes every two years in China.  We also have short training workshops of two to seven days.  We are mostly training trainers.  These have been held in Mongolia, Colombia, China, Kenya, and several other places.

We also have one‑ to three‑year university degree training programmes for Developing Countries at universities in China, as well as three‑ to six‑month exchange scholar programmes.  All this is for Developing Countries, so that we can reduce the digital divide through capacity building as Developing Countries grow gradually.

We also held this training programme last month at the UNCD in every technology in Kenya.  The Kenyan Minister of Information Technology gave very high praise, and at the end they launched a national programme to establish a data center at (Jinquot) in Kenya.  This is very, very good news.

Another action item is cases and best practices.  Because of time, I have only listed several of them.  One is the progress of national science data sharing strategy and policy in China, and not only policy and strategy but also infrastructure progress.  Another is Mongolia and Kenya, which formally joined CODATA as national members so they can get knowledge from the international organisation.  Another is South Africa establishing a World Data Center on public health and environment to provide data sharing services; this is supported by ICSU.  Another case is Kenya establishing a data center providing services not only in Kenya but in East Africa.

Another best practice: we have support not only for strategy and policy but also for action, for instance a very quick response in the case of earthquakes.  We have collaboration and so on, and in the end we received very good comments from the U.N. that this is really something the U.N. needs.

And another best practice is the open knowledge environment.  We have the museum, which is a joint effort of the International Geographical Union, CODATA and the Information Society of China.  It is fully open and available.

So this is how we collect all this worldwide geographic, regional and environmental knowledge and link it into a knowledge base.  It is an inclusive mechanism, an open knowledge environment.  There are more than 200 contributors worldwide, including a Nobel Prize winner and the President of the Indian Academy of Sciences, many professors, and even kids and students.  This is the Geneva tradition fully in practice: inclusive, and everybody is free to join and share in it.

Another best practice is global change research data publishing and its repository.  This covers most of the data.  When we talked about data and Big Data issues in June this year in Beijing, the scientists concluded that everybody wants to share somebody else's data, but very few people want to put their own data online to share it with others.  So why is this?  We needed to find the key, where the problem is: they are not recognized for it.  Scientists are very active in publishing their papers.  The goal is for them to publish their data too.  This is the problem.  So we needed to find the key, and in the end we solved it with data publishing: peer‑reviewed data publishing.  Not only publishing the data, but publishing the data with its related information: the data, the paper and also the metadata.

This is our platform.  It works not only for China.  It is operated by my Institute, the geographic sciences institute of the Chinese Academy of Sciences, and others, for example (Jinquot), have joined.  This platform is part of the infrastructure for Developing Countries to publish their data.

Another line, the last part and a very important one: last month we passed data sharing principles and guidelines, which we called the Nairobi data sharing guidelines.  The workshop was organised by CODATA, the Kenyan Ministry of Information and Technology, UNESCO, and others.  These organisations all agreed that we need to publish and practise the Nairobi data‑sharing principles: freely, openly, with safety in mind.

So after nearly ten years, we have several conclusions.  One is that efficient coordination and cooperation on data is needed.  So CODATA should continue to play a leading role in the world, together with other stakeholders, to reduce the digital divide and enhance data sharing in Developing Countries.  Also, openness of publicly funded research data and information has great benefits, and we should continue to pursue it for science and sustainability.

Also, the openness of publicly funded research data should go together with the publishing of government data.  So in the coming years we should work together with UNDESA.  Keeping openness should also be one of the critical issues of the IGF in the coming years.  It is necessary.  It is one of the Key Issues for reaching the post‑2015 sustainable development goals.  Thank you.

(Applause.)

>> XINMIN GAO:  Professor Liu, I think we all recognize that CODATA is very open, follows the open data principles, and also follows the Big Data culture.  Thank you very much for your presentation.

Okay.  Unfortunately, the time for our workshop has run out.  But I still want to leave some time for the audience to raise questions and make comments.  I would also like to ask the remote participants if they have any questions.  Please tell us.

Now, open for the audience.  Okay, please.

>> AUDIENCE:  Louise Bennett, BCS, U.K.  Just two points I would like to make very quickly.  One is, I think in your definitions of open data you have left out something very important.  That is that Big Data and open data sources nowadays are largely unstructured.  That requires a very different analytical technique from old statistical data sets.

To take the meteorological example: I'm a meteorologist.  When I first started, I plotted the observations from each station as they came in each day and drew a map.  Then we had satellites.  I was trying to forecast floods.  Now I would use things like YouTube, which is completely unstructured.  I don't know where it is going to come from.  That requires a very different analytical technique.  I think it's very important to consider that.

The other issue, which Ana only touched on vaguely in her presentation, is the set of problems to do with the ethics of using big open data sources if they include personal data.  I would recommend to you a publication that has just come out in the U.K. from the Office for National Statistics and the Medical Research Council on how to analyse big open data sources of everyone's medical records for medical research, while reassuring the public that their data will be properly protected and ethically scrutinized.

I think that's vitally important in open data analysis that is using personal data.

>> XINMIN GAO:  Could you summarize your two questions briefly?

>> AUDIENCE:  My questions would be:  Do you agree that a lot of open big data is unstructured and therefore requires new analytical techniques?

The second one: if the open data you're using includes personal data, you need to consider ethical issues.  I suggested a publication that I think might help you, if you agree with that.

>> XINMIN GAO:  I would like to ask Mr. Zhou Xiang to answer the first question, and then Ms. Ana, can you answer the second one?

>> ANA NEVES:  Well, I can only say that I totally agree with the comments.

>> XIANG ZHOU:  Yes, I have a quick comment.  I totally agree with you.  As you know, information taken with remote sensors is definitely unstructured.  To some extent it requires very specific technical data modeling and data processing.  And especially for sensitive applications, it needs different data from different sources, like socioeconomic data.  Otherwise it will not play a very important role for decision makers and public users.  Thank you.

>> XINMIN GAO:  Okay.  Ana?

>> ANA NEVES:  No, I already said it.  I totally agree with the comments made.

>> XINMIN GAO:  Okay, okay.

Are there any questions or comments from the remote participants?

>> REMOTE MODERATOR:  There is one here.  To Professor Liu: it seems that sharing is a little bit difficult.  What is the main problem?  Thank you.

>> CHUANG LIU:  The problem is that data contributors are not credited or rewarded.  That is why they don't like to share their data.  But just as with publications, we can properly solve this problem.  Now DOI is the international standard, and DOIs give technical help.  I think many publishers will work on this.  I think that's great progress.

>> XINMIN GAO:  Are there any other questions or comments?  I see none.  So I want to conclude our workshop.  I think this workshop was successful, and all the panelists made very useful and interesting presentations for us.  They raised the importance of the Big Data and open data concepts, and they raised a lot of ideas for improving and encouraging Big Data applications in different areas, including education and certain research areas.  They also suggested that we should consider the differences between Internet applications of Big Data and those in the research and education fields.  I think the education and scientific research areas are more complex.  I agree with Ms. Liu's viewpoint.

I would like to thank all the panelists for your contributions, and also thank all the participants here and the remote participants for joining us.  Thank you very much.  Now I announce that the workshop has concluded.  Thank you very much.

(The workshop concluded.)

(CART provider signing off.)
