The Global Open Data Index collects and presents information on the current state of open data release around the world. The Global Open Data Index is run by Open Knowledge International with the assistance of volunteers from the Open Knowledge Network around the world. The first Open Data Index was released on October 28, 2013. This page explains the methodology behind the Global Open Data Index. If you have any further questions or comments about our methodology please reach out to the staff, community of volunteers, and Index reviewers on the Open Data Index forum.
The Global Open Data Index is not an official government representation of the open data offering in each country, but an independent assessment from a citizen’s perspective. It is a civil society audit of open data that enables citizens and governments to measure government progress on open data. The Index gives both parties a measurement tool and a baseline for discussion and analysis of the open data ecosystem in their country and internationally. The datasets that are taken into account seek to represent civil society’s preferences, and therefore measure open data publication from a key user’s perspective (for further details, see the datasets section below).
The Global Open Data Index is not only a benchmarking tool; it also plays a powerful role in sustaining momentum for open data around the world and in convening civil society networks to use and collaborate around this data. If, for example, the government of a country does publish an open dataset, but this is not clear to the public and the dataset cannot be found through a simple search, then the data can easily be overlooked and not put to good use. Governments and open data practitioners can review the Index results to see how accessible the open data they publish actually appears to their citizens, see where improvements are necessary to make open data truly open and useful, and track their progress year to year.
Like any other benchmarking tool, the Global Open Data Index tries to answer a question. In our case, the question is as follows:
“What is the state of open data around the world?”
From this question, other important questions emerge, such as:
Open data has two key aspects: legal and technical openness. Which of these two, and which specific requirement (e.g. an open license, machine readability, bulk access), is the most challenging for data publishers? For example, do governments find it easy to publish machine readable data but struggle to apply an open license?
According to the common open data assessment framework there are four different ways to evaluate data openness: context, data, use and impact. The Global Open Data Index intentionally limits its inquiry to the publication of datasets by national governments. It does not look at the broader societal context, for example the legal or policy framework (FOI, etc.), and it also does not seek to assess use or impact in a systematic way.
In contrast to past editions, the Index now also seeks to capture information on practical openness, i.e. data findability and usability. These questions are not currently scored, but the answers will be valuable to both governments and users.
The scored Open Data Index questions do not assess the quality of the data. This narrow focus on data publication enables the Index to provide a standardized, robust, comparable assessment of the state of the publication of key data by governments around the world. We are nevertheless aware that data quality is a key concern of the open data community and a significant barrier to reuse.
Different countries have different governance structures (Federal vs. National government, etc.) and different policies regarding open data. We set out here our key assumptions that inform our approach and that were taken into consideration while collecting and assessing the data.
Assumption 1: Open Data is defined by the Open Definition. We define open data according to the ‘Open Definition’, a set of principles that define openness in relation to data and content. It is the original, “gold-standard” definition for open data. It is also simple and easy to operationalise. We note one small deviation from the current v2.1 of the Open Definition: the only part of our methodology that is not aligned with it is the “open machine-readable” format requirement. We give a full score to machine-readable formats whose source code is not open, but which are usable with at least one free and open source software tool, in order to emphasise practical openness.
Assumption 2: The role of government in publishing data. In the past, there have been questions in the Index community about the role of the government in ensuring the publication of a specific dataset. In many fields, some government services are privatised, which means the data is owned and produced by a company and not the state. Our view and assumption is that for the key datasets we survey, the government has a responsibility to ensure the availability of such data even if it is held and managed by a third party.
Assumption 3: The Global Open Data Index is a national indicator. Countries differ in their governance structure and in the degree to which services are centralised. Some have a main government with municipalities; others have much more complicated structures with sub-governments (regions and states). Different governments may collect different data for different geographical regions. It is possible that not all of the sub-governments have to abide by the same laws, since they have some autonomy. In addition, whilst not strictly required, we expect that national governments also provide aggregation of the data from sub-governments so as to ensure users have an easy way to access and use the data (the best solution is one consolidated dataset, but at a minimum this could consist of a single point of access to all data subsets). The Global Open Data Index measures the publication of open data at the country level. “National” publication of open data can take two forms
Dataset definitions are crucial in enabling respondents to assess datasets accurately and to do so in a way that is comparable across countries. Each year we refine our definitions, and we have continued to do so this year.
National Statistics: Key national statistics such as demographic and economic indicators (GDP, unemployment, population, etc). To satisfy this category, the following minimum criteria must be met:
Government Budget: National government budget at a high level. This category is looking at budgets, or the planned government expenditure for the upcoming year, and not the actual expenditure. To satisfy this category, the following minimum criteria must be met:
Government Spending: Records of actual (past) national government spending at a detailed transactional level. A database of contracts awarded or similar will not be considered sufficient. This data category refers to detailed ongoing data on actual expenditure. Data submitted in this category should meet the following minimum criteria:
Draft Legislation: Data about the bills discussed within the national parliament as well as votes on bills (not to be confused with passed national laws). Data on bills must be available for the current legislative period.
National Laws: This data category requires all national laws and statutes to be available online, although it is not a requirement that information on legislative behaviour (e.g. voting records) is available. To satisfy this category, the following minimum criteria must be met:
Election Results: This data category requires results by constituency / district for all major national electoral contests. To satisfy this category, the following minimum criteria must be met:
National Map: This data category requires a high level national map. To satisfy this category, the following minimum criteria must be met:
Pollutant Emissions: Data about the daily mean concentration of air pollutants, especially those potentially harmful to human health. Data should be available for all air monitoring stations or air monitoring zones in a country. In order to satisfy the minimum requirements for this category, data must be available for the following pollutants and meet the following minimum criteria:
Company Register: List of registered (limited liability) companies. The submissions in this data category do not need to include detailed financial data such as balance sheet, etc. To satisfy this category, the following minimum criteria must be met:
Location datasets: A database of postcodes/zipcodes and the corresponding spatial locations in terms of a latitude and a longitude (or similar coordinates in an openly published coordinate system). The data has to be available for the entire country. Data submitted in this category must satisfy the following minimum conditions:
Administrative boundaries: Data on administrative units or areas defined for the purpose of administration by a (local) government.
Procurement: All tenders and awards of the national/federal government aggregated by office. Monitoring tenders can help new groups to participate in tenders and increase government compliance. Data submitted in this category must be aggregated by office, updated at least monthly and satisfy the following minimum criteria:
Water Quality: Data on the quality of water, measured at the water source, is essential for both the delivery of services and the prevention of diseases. In order to satisfy the minimum requirements for this category, data should be available on the levels of the following chemicals by water source and be updated at least weekly:
Weather Forecast: 5-day forecasts of temperature, precipitation and wind. Forecasts have to be provided for several regions in the country. In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria:
Land Ownership: Data should include maps of land with a parcel layer that displays boundaries, in addition to a land registry with information on registered parcels of land. The following characteristics must be included in the cadastral and registry information submitted:
* Parcel Boundaries
In a few cases, we have received submissions for places that are not officially recognised as independent countries; we have included these if they are complete and accurate submissions. Therefore, the Global Open Data Index 2016 ranks ‘Places’ and not ‘Countries’. Generally we seek to survey jurisdictions with sufficient autonomy to be responsible for data management and publication. Usually these are countries; however, there are cases where country jurisdiction is disputed and we generally seek to be flexible and inclusive where we can.
Each dataset in each place is evaluated using a set of questions that examine the openness of the datasets based on the Open Definition and the Open Data Charter.
In 2016, we introduced the new survey of the Global Open Data Index (GODI). The new scoring follows two major ideas:
We assume that each question of our survey measures a crucial characteristic of the legal, technical or practical ‘openness’ of data. Our scoring follows an assessment of the weighting (see below) in which we describe why a question is important for open data and how scoring can reflect this importance. We also explain the cases in which a question should not be scored. With this approach we aim to reduce the potential bias towards single aspects of openness.
The new scoring gives a total of 40 points to open licences/public domain status and to machine-readable and open file formats. These technical and legal aspects of openness are the core of the Open Definition 2.1 and we seek to maintain a strong emphasis on them. However, aspects like timely publication, data availability and accessibility are equally important for accessing and using open data. Questions around data accessibility receive a total of 60 points.
Section A: Background Information (Not Scored)
Section B: About the Data (Scored)
Question | Description | Scoring |
---|---|---|
Are the data available online without the need to register or request access to the data? | Answer “Yes”, if the data are made available by the government on a public website. Answer “No” if the data are NOT available online or are available online only after registering, requesting the data from a civil servant via email, completing a contact form or another similar administrative process. | Score: 15 |
Is the data available free of charge? | The data is free if you don’t have to pay for it. | Score: 15 |
Is the data downloadable all at once? | Answer “Yes”, if you can download all data at once from the URL at which you found them. If the downloadable data files are very large, their downloads may also be organised by month or year or broken down into subfiles. Answer “No” if you have to do many manual steps to download the data, or if you can only retrieve very few parts of a large dataset at a time (for instance through a search interface). | Score: 15 |
Data should be updated every [Time Interval]: Is the data up-to-date? | Please base your answer on the date at which you answer this question. Answer “No” if you cannot determine a date, or if the data are outdated. | Score: 15 |
Is the data openly licensed/in public domain? | This question measures if anyone is legally allowed to use, modify and redistribute data for any purpose. Only then is data considered truly "open" (see Open Definition). Answer “Yes” if the data are openly licensed. The Open Definition provides a list of conformant licenses. Also answer “Yes” if there is no open licence, but a statement that the dataset is in the “public domain”. To count as public domain the dataset must not be protected by copyright, patents or similar restrictions. If you are not sure whether an open licence or public domain disclaimer is compliant with the Open Definition 2.1, seek feedback on the Open Data Index discussion forum. | Score: 20 |
In which formats are the data? | Tell us the file formats of the data. We automatically compare them against a list of file formats that are considered machine-readable and open. A file format is called machine-readable if your computer can process, access, and modify single elements in a data file. The Index considers formats to be “open” if they can be fully processed with at least one free and open-source software tool. The source code of these formats does not have to be open. Potentially these formats allow more people to use the data, because people do not need to buy specific software to open it. | Score: 20 |
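
The scored questions above can be read as a simple additive scheme: each “Yes” contributes its points, for a maximum of 100 (60 for accessibility, 40 for licence and format). The following Python sketch illustrates that reading only; it is not the Index's actual implementation. The key names, the `ASSUMED_OPEN_FORMATS` list, the rule that one matching format earns the full 20 points, and the day-based update interval are all assumptions made for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta

# Points per scored question, taken from the table above.
# The dictionary keys are hypothetical labels, not official identifiers.
WEIGHTS = {
    "available_online": 15,
    "free_of_charge": 15,
    "downloadable_at_once": 15,
    "up_to_date": 15,
    "openly_licensed": 20,
    "open_machine_readable_format": 20,
}

# Illustrative placeholder only: NOT the format list the Index actually uses.
ASSUMED_OPEN_FORMATS = {"csv", "json", "xml", "geojson"}


def is_up_to_date(last_updated: date | None, interval_days: int, answered_on: date) -> bool:
    """Answer "No" if no date can be determined or the data are older than the interval."""
    if last_updated is None:
        return False
    return answered_on - last_updated <= timedelta(days=interval_days)


def has_open_format(formats: list[str]) -> bool:
    """Assumption: a single machine-readable, open format is enough for full points."""
    return any(f.strip().lower() in ASSUMED_OPEN_FORMATS for f in formats)


def score_dataset(answers: dict[str, bool]) -> int:
    """Sum the points of every question answered "Yes"; missing answers count as "No"."""
    return sum(points for key, points in WEIGHTS.items() if answers.get(key, False))


# Hypothetical submission: everything satisfied except bulk download.
answers = {
    "available_online": True,
    "free_of_charge": True,
    "downloadable_at_once": False,
    "up_to_date": is_up_to_date(date(2016, 9, 1), interval_days=365, answered_on=date(2016, 11, 1)),
    "openly_licensed": True,
    "open_machine_readable_format": has_open_format(["PDF", "CSV"]),
}
print(score_dataset(answers))  # 85
```

In this hypothetical submission the dataset loses only the 15 points for bulk download, so it scores 85 out of 100.
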
Section B: About the Data (Not Scored)
The Index uses a non-probability sampling technique known as a “snowball sample”. A snowball sample tries to locate study subjects in populations that are hard to reach. In our case, we work with contributors who are interested in open government data activity and who can assess the availability and quality of open datasets in their respective locations. We do so not only by using referrals, but also by reaching out on social media, through regular communications on our Open Government Data and Open Data Index forums, and by actively networking at conferences and events. This year, we also hired local coordinators, who reached out to their networks and assisted in soliciting new submissions. This means that anyone from any place can participate in the Global Open Data Index as a contributor and make submissions, which are then reviewed. We do not have a quota on the number of places that can participate. Rather, we aim to sample as many places around the world as we can. This also has an impact on the quality of the data we collect in the first stage of the Global Open Data Index. Contributors have diverse knowledge and backgrounds in open data and therefore they sometimes need help finding the data we are looking for. The following section explains how we tried to deal with this problem.