Showing posts with the label PSI reuse.

Wednesday, August 31, 2016

What if we could calculate our own real-time customized “official indicators”?

Note: This article is a translation of what I wrote in Spanish for the International Open Data Conference 2016 blog. You can read the original post in Spanish: ¿Y si todos pudiésemos calcular nuestros propios “indicadores oficiales” personalizados en tiempo real, and in English: What if we could calculate our own real-time customized “official indicators”?.
Almost all governments worldwide, and multilateral institutions such as the OECD, the UN, the World Bank or the European Commission, began their open data policies by releasing the statistical datasets they produce. As a result, we have a large number of indicators in reasonably accessible formats that we can use to study almost any issue, whether environmental, social, economic or a combination of all of these.

Besides providing us with the datasets, in some cases they have created tools to access the data easily (APIs), and even applications that help us work with the indicators (visualizations).
These indicators follow periodic production cycles, which can be monthly, yearly or even multi-year, due to the high cost of producing them. In general, the methodologies used to calculate the indicators are not available to citizens. In the best-case scenario, they are documented only superficially in a fact sheet.
Photo by William Iven
Now let’s imagine for a moment that the national social security systems, the company registers, the customs registers, the environmental agencies, etc., released the data they hold as open datasets in real time. One effect we can easily imagine is that many indicators that are currently released periodically could be known and, even better, explored in real time.

Besides, this would remove the possibility of anyone obtaining privileged information, since we would all have the same ability to analyze the evolution of the indicators to make our own decisions. Even more, we could customize the calculations to our own particular situation by working on the methodologies.
The fact is that in many cases the production cycle of some indicators could be shortened until it gets close to ‘real time’, and the cost of production could also be greatly reduced thanks to open government data.
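To make the idea concrete, here is a minimal sketch of what a personalized indicator computed on demand could look like. The records, field names and figures below are invented for illustration; in practice they would come from a real-time open data feed.

```python
from datetime import date

# Hypothetical real-time open records (e.g. new employment contracts
# registered with social security); invented sample data.
records = [
    {"region": "Madrid", "date": date(2016, 8, 30), "new_contracts": 120},
    {"region": "Madrid", "date": date(2016, 8, 31), "new_contracts": 95},
    {"region": "Galicia", "date": date(2016, 8, 31), "new_contracts": 40},
]

def custom_indicator(records, region):
    """A personalized 'official indicator': total new contracts for one
    region, recomputed on demand instead of waiting for a monthly release."""
    return sum(r["new_contracts"] for r in records if r["region"] == region)

print(custom_indicator(records, "Madrid"))  # 215
```

The point is not the arithmetic, which is trivial, but that with raw data anyone could slice the calculation by their own region, sector or time window instead of waiting for the published aggregate.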

Even though this is a big step forward, I don’t think we should settle for having the indicators as open data; we should aspire to examine the open datasets and methodologies used to calculate these indicators, and even customize them, because, if properly anonymized, there is no reason for them not to be released as open data.

Saturday, August 20, 2016

Some very simple practices to help with the reuse of open datasets

Note: This article is a translation of what I wrote in Spanish for the International Open Data Conference 2016 blog. You can read the original post in Spanish: Algunas prácticas muy sencillas que facilitan la reutilización de conjuntos de datos abiertos, and in English: "Some very simple practices to help with the reuse of open datasets".
In the past few years, a significant number of materials have been published to help data holders with the release of open government data. In his article “The good, the bad…and the best practices”, Martin Alvarez gathered a total of 19 documents, including guides, manuals, good practices, toolkits, cookbooks, etc. Many kinds of authors have invested resources in developing these materials: national governments, regional governments, foundations, standardization bodies, the European Commission, etc.; hence the variety of perspectives.
On the other hand, a great deal of effort is still being put into developing standards for the release of open government datasets, either for general purposes or for specific domains.
Photo by: Barn Images
Too often, however, very simple rules that facilitate sustainable reuse of open datasets are forgotten when datasets are published. Here are just some of the obstacles we often find when we explore a new dataset and assess whether it is worth incorporating into our service:
  1. Records do not include a field with a unique identifier, which makes it very difficult to monitor changes when the dataset is updated.
  2. Records do not contain a field with the date of the last update, which also makes it hard to monitor which records have changed from one published version to the next.
  3. Records do not contain a field with the date of creation, which makes it difficult to know when each one was incorporated into the dataset.
  4. Fields do not use commonly agreed standards for the type of data they contain. This often occurs in fields with dates and times, or economic values, etc., but it is also common in other fields.
  5. Inconsistencies between the content of the dataset and its equivalent published on HTML web pages. Inconsistencies can be of many types, from records published on the website but not exported to the dataset, to differences in fields between one format and the other.
  6. Records are published in the dataset much later than on the website. This can make a dataset useless for reuse if the service requires immediacy.
  7. Service Level Agreements on the publication of datasets are not stated explicitly. It is not so important to judge those agreements as good or bad; what really matters is that they are known, as it is very hard to plan data reuse ahead when you do not know what to expect.
  8. These elements are not provided: a simple description of the content of the fields and the structure of the dataset, together with the relevant criteria used to manage that content (lists of values for factor variables, update criteria, meaning of the different states, etc.).
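A reuser’s side of practices 1 to 4 can be sketched in a few lines of Python. This is an illustration only; the field names (`id`, `created_at`, `updated_at`) are assumptions, not a standard, but they show why identifiers and ISO 8601 dates make change monitoring trivial.

```python
from datetime import datetime

REQUIRED_FIELDS = ("id", "created_at", "updated_at")

def check_record(record):
    """Return the problems found in one record: missing unique id,
    missing creation/update dates, or dates not in ISO 8601."""
    problems = [f for f in REQUIRED_FIELDS if f not in record]
    for field in ("created_at", "updated_at"):
        if field in record:
            try:
                datetime.fromisoformat(record[field])
            except ValueError:
                problems.append(f"{field} is not ISO 8601")
    return problems

def changed_since(records, last_seen):
    """With an updated_at field, finding which records changed between
    two publications is a simple filter (ISO 8601 strings sort correctly)."""
    return [r for r in records if r.get("updated_at", "") > last_seen]

# A well-formed record passes; a record with a local date format does not.
print(check_record({"id": 1, "created_at": "2016-08-20", "updated_at": "2016-08-21"}))
print(check_record({"id": 2, "created_at": "2016-08-20", "updated_at": "20/08/2016"}))
```

Without the `id` and `updated_at` fields, the only way to detect changes is to diff the entire dataset against the previous download, which is exactly the kind of cost that drives reusers away.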
As you can see, these practices are not specific to open data work; they come rather from experience in software development projects, or simply from common sense.

Even though most of them are very easy to implement, they are of great importance when convincing somebody to invest their time in an open dataset. As you may know, dealing with web scraping can sometimes be more convenient than reusing open datasets, and these few simple practices make the difference.

Saturday, August 6, 2016

How far should a public administration go with regard to the provision of value-added services based on open data?

Note: This article is a translation of what I wrote in Spanish for the International Open Data Conference 2016 blog. You can read the original post in Spanish: ¿Hasta dónde debe llegar la administración en la prestación de servicios sobre datos abiertos?, and in English: How far should a public administration go with regard to the provision of value-added services based on open data?
Last Monday I took part in the panel “Reuse of EU open data: challenges and opportunities” during the Reuse of EU legal data Workshop, organized by the Publications Office of the European Union. One of the interesting issues that came up during the panel (you can watch it here) focused on the well-known question: How far should a public administration go with regard to the provision of services based on open government datasets?

The discussion, in the context of fostering open government data reuse, arises from the difficulty of finding a balance between the services that every public administration must provide to citizens for free and the space that should be left for private initiatives to create value from open government datasets. In many cases, that unstable balance creates tensions that do not contribute to innovation.

In the past few years, I have heard numerous arguments from both the supply and the demand side. They range from one extreme: “public administrations should only provide raw data and no services,” to the opposite: “the public administration should move as far forward in the value chain as possible when providing services to citizens.”

My position on this matter, which I had the chance to defend during the debate, is that it is not useful to try to draw a red line between what should be considered a basic/core service and a premium/value-added service. On the contrary, we should work on defining the minimum incentives needed for open-data-driven innovation to flourish and deliver wealth creation.
Photo by: Rodion Kutsaev

For that reason, I used the panel to make the following statement, which could be a starting point for clearly defining the minimum conditions that a reuser needs in order to create value-added services:

“Open government datasets should be released in such a condition that a reuser can build the same services that are provided for free by the data holder.”

This is basically because, in many cases, value creation starts from that baseline; that is, from simply improving a service that already exists. If an existing public service cannot be reproduced, for example due to a few hours’ delay in the release of the underlying dataset or because of the limited quality of the released data, then it will not be possible to innovate by launching an improved product or service to the market.

In my opinion, this approach to the issue can help us make some progress in this debate. I hope this first statement can be improved and better worded by contributions from the community, or otherwise proved wrong by evidence or better arguments than my own.

Tuesday, March 15, 2016

Let’s open more datasets, because what could go wrong?

Note: This article is a translation of what I wrote in Spanish for the International Open Data Conference 2016 blog. You can read the original post in Spanish: Abramos más conjuntos de datos, ¿qué puede salir mal?, and in English: "Let’s open more datasets. What could go wrong?".
In conversations between members of the open data community, especially those responsible for providing data, one often overhears statements such as “it’s necessary to stimulate the demand for open data,” “we can’t reach the reusers,” “it would be interesting if data providers and reusers talked more.” I am sure you have heard such statements on many occasions.
Most probably, this uneasiness is not unknown to the IODC organizers, who should be aware that previous editions of the event have mostly focused on what is usually called the “supply side,” that is, the public organizations in charge of holding and providing open datasets. What is true is that in Spain, possibly because it is the Ministry of Industry that promotes open data policies, reuse companies have always been encouraged to be very present at open data events. And this will surely be noticeable in the program of the 4th IODC next October.


However, I would like to tell you a secret that might help explain why, apparently, the long-awaited demand for open data has not materialized: it turns out that for reuse companies, it is often more productive to obtain data from the web than from open data portals. Unfortunately, technologies for extracting data from documents have advanced much faster in recent years than the datasets available in portals.
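As a minimal illustration of that secret, the following sketch, using only the Python standard library and an invented HTML fragment (not any real portal page), shows how little code it takes to scrape a data table straight out of a web page:

```python
from html.parser import HTMLParser

# Invented example page: a data table published as HTML instead of a dataset.
PAGE = """
<table>
  <tr><td>2016-03-01</td><td>42</td></tr>
  <tr><td>2016-03-02</td><td>17</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)
print(scraper.rows)  # [['2016-03-01', '42'], ['2016-03-02', '17']]
```

When a few dozen lines like these recover the data faster than navigating a portal, it is no mystery where companies put their effort.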

Even though it is quite inefficient and we may not like it, in many sectors it is currently the only possible way for companies to generate value from data. In other sectors, where no data are published, neither in documents nor in datasets, there is no demand to stimulate. Companies, especially small ones, survive on the value they can create and sell today, not on future promises.

If you were a company, where would you put your resources? Into an open source library that improves a data-extraction algorithm for PDFs, or into taking part in circular arguments about the best way of opening data?
In my opinion, speaking from the “demand side,” I would like IODC 2016 to be a turning point, not so much to define more standards, more indexes, policies and laws, but to reach an agreement on publishing more useful datasets.

If we actually aim to encourage innovation and the creation of value from open data, I suggest we flood the portals with useful datasets. What could go wrong? Much of this data is already inside documents published on the web, and a great deal of effort is being spent on extracting and cleaning it that could instead be spent on creating value from data.

Sunday, November 7, 2010

Quick guide to the Opendata EC Public Consultation, the online survey on the PSI Directive

The Digital Agenda for Europe lists the revision of Directive 2003/98/EC on the re-use of public sector information (the PSI Directive) among its first key actions. It is worth remembering the key role of the Spanish Presidency in including PSI reuse in the Granada Declaration, showing a commitment to the promotion of open data that we all hope will be followed by some action.

In September, the European Commission opened a public consultation to gather views from as many sources as possible on the review of the PSI Directive. The consultation will feed into the debate on the policy options that should be considered for the review, and will contribute to the impact assessment that will be carried out subsequently, associated with proposals for possible legislative or other measures.

As you all know, a public consultation is a regulatory tool used to seek the opinions of groups interested in or affected by a certain matter. By gathering their views, opinions and contributions over the Internet, both Member State administrations and EU institutions can better understand the needs of citizens and enterprises.

The PSI consultation document is published only in English, but responses are acceptable in all EU languages, as nothing is stated otherwise in the consultation documents themselves. So please do not think that not being a fluent English speaker is a barrier to participation.

Whether you are a government, a public sector content holder, a commercial or non-commercial re-user or another interested party, your contribution is really important because your views will feed into the review of the PSI Directive.

The consultation includes questions divided into several blocks:

  1. the PSI re-use context and possible action to consider,
  2. substantive issues regulated by the PSI Directive,
  3. practical measures,
  4. changes that have taken place and barriers that still exist,
  5. other issues to comment regarding the review of the PSI Directive.
It will take you some time, perhaps over 30 minutes, to make a good contribution, although it is stated that the survey takes only 15 minutes to complete. But it is worth the effort. You can answer the online survey on the PSI Directive, but I would recommend having a look at the PDF version of the consultation document first, in order to have a complete view before answering.

It is also worth noting the commitment that the Commission Vice-President for the Digital Agenda, Neelie Kroes, is showing to the open data community. As she pointed out, much of Europe's PSI is insufficiently exploited, or sometimes not exploited at all, which means losing out on a great opportunity to generate innovation.

The replies to this consultation will be published on the Commission's PSI web site. The consultation will run until 30 November 2010, and three weeks before closing the EC had gathered about 350 responses, which is not much compared with the importance of the matter. So please contribute your views to build a more innovative Europe.

Friday, May 21, 2010

Weirdest reason against open data in Spain

While I was working on my presentation for the PSI Meeting 2010, where I will be representing Euroalert on 8 June, I remembered the weirdest argument I have ever heard against Open Data. I'd like to share it with you because it is a really good (though disappointing) example of what many public organizations might be thinking about their data.

I was in a meeting with a public authority, trying to reach an agreement that would allow Euroalert to re-use public information that they manage and that is not open yet. Well, they really think it is open data because it is published on their website. So, once again, I had to explain that web pages and PDFs are not truly machine-readable formats. This misunderstanding is quite common, and surely you have heard it many times. What really knocked me down was what came after several other evasive and poorly founded arguments. It was something like this:

"If we unlock the data and you (and other companies) develop a service that improves the features we are providing for free, that will harm small companies because they might not be able to buy it"
I cannot be sure whether they really believe that by keeping data locked they are helping anyone, but I had the feeling that they were against the idea of companies making money with public data. Has anyone had this type of discussion? I would be very interested in sharing views on the subject.

By the way, in my presentation at the roundtable "Turning the reuse of public sector information into new business models and innovative services", I will be talking about the difficulties the open data movement is facing when it comes to enabling new and innovative business models. (More insights on the matter from the Chamber of Commerce of Stockholm at ePSIplatform.)

My point is that on the one hand we have the Linked Data dreamers, and on the other hand we have to deal with PDFs, HTML with partial data, and civil servants who fight against innovation. We will lose most of the potential of PSI reuse if more action is not taken. Otherwise, Open Data will be just a fancy playground with old datasets running over online maps. Beautiful, but not very useful for companies.

But this discussion is not today's objective. I will publish the slides in Open Economy, and the Euroalert team will probably provide good coverage of the event on the blog. In the meantime, say it with me: RAW DATA NOW!!! (I really recommend Sir Tim's TED talk on open data and the next web.)

Saturday, April 3, 2010

Open Data Movement: Free our Data

It has been several months since I first wanted to talk to you about the Open Data Movement, especially after my participation last November in FICOD 09, representing Euroalert.net in the roundtable about creating value through the reuse of public sector information.

During the past few months we have been living through a kind of Open Data Rush that started with Obama's promise of government transparency, which in terms of data openness resulted, among others, in the pioneering Data.gov initiative. Since then several countries, mainly English-speaking ones, have launched their own initiatives. You can check the New Zealand Open Data Catalogue or the Australia Catalogue, as well as individual cities like Open Toronto or the New York City open data websites. There is a good repository of governments that are opening up their data vaults around the world at The Guardian Open Data Platform and on the Fundacion CTIC website.

In the European Union, since 2003 we have had Directive 2003/98/EC on the Re-use of Public Sector Information (PSI), which has already been transposed into the national laws of all Member States. In simple words, that means the EU27 Member States are required by law to promote open data, although to date only the United Kingdom seems to be making a significant effort. Prime Minister Gordon Brown, who is advised by Sir Tim Berners-Lee and Professor Nigel Shadbolt, has talked in several speeches about his aim of turning the UK into a world leader in making government data more accessible to the public. This commitment to the Open Data Community, considered an important element of Building Britain’s Future, has led to the Data.gov.uk website for the promotion of the reuse of UK public data and to the announcement of the creation of "The Institute of Web Science", initially funded with £30M.

In Spain, we are quite far from leading anything on the web, although the effort made by Proyecto Aporta, a small initiative at the Ministry of Industry with very scarce resources that I presented to you a few months ago, is remarkable. For example, in March 2010 Proyecto Aporta launched a beta version of a catalogue of public information in Spain, which sadly contains very little raw data, and thus little that is useful for reuse by companies or individuals.

There is no strong political support from the Spanish authorities, and Spain is going to lose another opportunity to improve its performance in digital innovation and to participate in the major changes in economy and society we are living through. I cannot understand why an initiative that would have such a huge positive impact on innovation is not being pushed firmly in our country. Open Data is a very cheap investment for a government, and the main reasons pushing these initiatives forward around the world are purely economic. The European Commission estimated in 2006 that the overall market size for the reuse of Public Sector Information in the EU was €27 billion (0.25% of the total aggregated GDP of the EU).

To keep up to date, I recommend the EPSIplatform, Europe's One-Stop Shop on Public Sector Information (PSI) Re-use, where you will find, among other useful resources, the best tracking I know of all the news, announcements and moves in the Open Data world. You can also follow EPSIplatform on Twitter. If you want to get involved, I suggest you follow The Open Knowledge Foundation (OKFN) projects around any kind of information that can be freely used, reused and redistributed.

Open Data might be a drop in the ocean of the economy, but it is a really cheap move, it is easy for public authorities, there are no major shortcomings, it does not hurt any industry, and there is a lot of political return on top of the economic one, as transparency is a hot topic for citizens. So please move! Companies need raw data to develop new and innovative products! Free our data!