Frequently Asked Questions


About the Open Data Census

What is the India City Open Data Census?

The India City Open Data Census is an ongoing, crowdsourced measure of the current state of access to a selected group of datasets in municipalities across India. Any community member can contribute an assessment of these datasets in their municipality at any time. Census content will be peer-reviewed periodically by a volunteer team of Open Data Census Librarians led by Open Knowledge India.

How can the results of the India City Open Data Census be used?

The India City Open Data Census does not aim to create a comprehensive list of open datasets around India for data users, nor does it aim to define what datasets are the most important to open. Instead, the India City Open Data Census seeks to be a benchmarking tool, which people can use to ignite conversations with their government about open government data.

What's the current state of the India City Open Data Census?

The India City Open Data Census was conceptualized on Open Data Day (February 22, 2014).

Who created the India City Open Data Census?

The India City Open Data Census was initiated as a partnership between the Sunlight Foundation and Open Knowledge India (http://in.okfn.org/). It is maintained by Open Knowledge Foundation staff members and the Open Government Data working group, with contributions from many members of the wider community.

What is the history of the India City Open Data Census?

The international Open Data Census was created by the Open Knowledge Foundation in 2012, and provides a clear measure of available open data -- not what is claimed, but what data is actually available and how open it is. The original Census was designed by open data experts, including the Open Knowledge Foundation Open Government Working Group, and undergoes a process of peer review and evidence checking to ensure high quality results. In early 2014, the Open Knowledge Foundation announced the bespoke Local Open Data Census. The India City Open Data Census was launched on Open Data Day.

Provide feedback and suggestions for additional datasets to include in future censuses at Open Knowledge India.

Detailed discussions of the data categories, and of the issues and challenges that submitters and Open Data Census Librarians (reviewers) face during submission and review, take place on the Census discussion list. The full Census discussion list archive is also available.

How reliable is the India City Open Data Census?

The information in the Census is collected by open data experts and enthusiasts around the world including Open Knowledge India and the Open Knowledge Foundation Open Government Working Group. The Census data undergoes a process of peer review and evidence checking to improve the quality of results. That said, we rely on the contributions of local community users of government datasets, so if you see a problem please submit a comment. Contributors and Editors are also cited on each dataset submission.

Submitting information to the Census

At the moment we are only collecting information on data that is currently available in 2014. The India City Open Data Census is a survey of the state of open data in India, focusing on the availability and openness of a specific set of key datasets.

What is the India City Open Data Census data collection and review process?

It works like this:

  1. Contributors submit information about the availability (or not) of key datasets in their city (for example Environment data in Kolkata or Finances in New Delhi).

  2. For edits to submissions, contributors may Propose Revisions.

  3. Open Data Census Librarians either approve (with or without amendments) or reject the Proposed Revisions.

  4. If approved, these submissions become official entries in the Census and are displayed on the website.

How can I improve the Census information about an Indian City?

  1. If you have information about a dataset that isn't in the Census yet, you can add it! Anyone can submit new information to the Census.

  2. Click the blue “Submit Information” button on the right next to the appropriate category.

  3. Fill in the form based on the dataset you have found (there are detailed instructions on the page).

  4. Click Submit. Your submission is now waiting for review, and will be visible in the table as 'awaiting review' after a few minutes.

How can I correct an existing entry in the Census?

Please feel free to contact us at Open Knowledge India.

How can I add an Indian City to the Census?

Contact us at Open Knowledge India and we will work to include your city in the template.

How do I become my city’s Open Data Census Librarian?

Open Data Census Librarians are the reviewers and point persons for the Census assessment in their community. They are responsible for filling out a profile page, becoming familiar with this FAQ and the Data Explainers, and reviewing open data periodically. To become an Open Data Census Librarian for your city, or to nominate a community member for the role, please reach out to Open Knowledge India.

What do all the questions about the datasets mean?

When filling in information about a dataset, there is a list of questions to answer about the availability and openness of the dataset. The answers then appear on the city overview page of the Census.

Each question is listed below with an explanation. Questions that count towards a dataset's score show their weight in parentheses.

Does the data exist? (weight: 5)

Does the data exist at all? The data can be in any form (paper or digital, offline or online, etc.). If it does not exist, none of the other questions are answered.

Is data in digital form? (weight: 5)

This question addresses whether the data is in digital form (stored on computers or digital storage) or exists only in another form, such as paper.

Publicly available? (weight: 5)

This question addresses whether the data is "public". This does not require it to be freely available, but it does require that someone outside of the government can access it in some form. For example, if the data is available for purchase, exists as PDFs on a website that you can access, or can be obtained in paper form, then it is public. If a freedom of information request or similar is needed to access the data, it is not considered public.

Is the data available for free? (weight: 15)

This question addresses whether the data is available for free or whether there is a charge. If there is a charge, this is stated in the comments section.

Is the data available online? (weight: 5)

This question addresses whether the data is available online from an official source. If the answer is 'yes', the link is entered in the URL field below.

Is the data machine readable? (weight: 15)

Data is machine readable if it is in a format that can be easily processed by a computer. Data can be digital but not machine readable. For example, consider a PDF document containing tables of data. These are definitely digital but are not machine readable, because a computer would struggle to access the tabular information (even though they are very human readable!). The equivalent tables in a format such as a spreadsheet would be machine readable. Note: The appropriate machine readable format may vary by type of data – so, for example, machine readable formats for geographic data may be different from those for tabular data. In general, HTML and PDF are not machine readable.

Available in bulk? (weight: 10)

Data is available in bulk if the whole dataset can be downloaded or accessed easily. Conversely, it is considered non-bulk if citizens are limited to getting only parts of the dataset (for example, if they are restricted to querying a web form and retrieving a few results at a time from a very large database).

Openly licensed? (weight: 30)

This question addresses whether the dataset is open as per http://opendefinition.org. It needs to state the terms of use or license that allow anyone to freely use, reuse or redistribute the data (subject at most to attribution or share-alike requirements). It is vital that a licence is available (if there is no licence, the data is not openly licensed). Open licences that meet the requirements of the Open Definition are listed at http://opendefinition.org/licenses/.

Is the data provided on a timely and up to date basis? (weight: 10)

This question addresses whether the data is up to date and timely, or long delayed. For example, for election data: is it made available immediately or soon after the election, or only many years later? Any comments about uncertainty are put in the comments field.

URL of data online?

The link to the specific dataset, if possible. Otherwise, the link to the home page for the data. If that is not possible, then the link to the main page of the site on which the data is located. Only links to official sites are eligible, not third-party sites. When submitters need to provide third-party links, these are put in the comments section.

Format of data?

This question describes the format that the data is available in. For example, for tabular data it might be Excel, CSV, HTML or even PDF. For geodata it might be shapefiles, GeoJSON or something else. If the data is available in multiple formats, the formats are listed separated by commas. Any further information is put in the comments section.

URL to license or terms of use?

Please provide the URL of the license or terms of use governing access to and use of this data (if known). If there is more than one URL you would like to list, please list only the primary one in this field and add further information in the comments box below.

Date data first openly available?

This question describes when the data first became openly available (online, in digital form, openly licensed, etc.). Sometimes this is approximate, for example "2012" or "Jan 2012". If there is a precise date, it is entered in dd-mm-yyyy format. If the data is not open, this question instead describes the date the data first became available at all. (Note: Some open data was available in other forms previously, so the date specified here is the date it became openly available.)

Title and short description?

Please enter the title(s) and excerpted short description(s) of the dataset(s) as provided by the publisher. Keep the description to a few sentences (one paragraph at most).

Data Publisher?

If known, please enter the department / organisation responsible for publishing this dataset along with contact email (if known). If the specific person responsible for this is known please also list them.

Rate Quality of the Data (Content)?

Rate the quality of the data in terms of its actual content, ignoring structure: is the data accurate, is it provided at a detailed, granular level, etc.? (Ignore whether the data is in PDF or Excel, or whether it is just human-readable rather than machine-readable.) 1 is worst, 10 is best. Please justify your rating in the details and comments section.

Rate Quality of Data (Structure)?

Rate the quality of the data in terms of how well it is structured and how easy it is to use. For example, is the data provided in a good data format (CSV vs PDF), are the files well structured and easy to process programmatically (or do you have to clean them before use), is there an API? 1 is worst, 10 is best. Please justify your rating in the details and comments section.

Further Details and Comments (optional but strongly encouraged)?

Please add detail here to expand on and support your answers above. Information on data availability is especially useful: for example, is the data partially available? Are there plans to make it available in the future? Is the data available from an unofficial source? Markdown formatting is supported.
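The "Date data first openly available?" field accepts an approximate year ("2012"), an approximate month ("Jan 2012") or a precise date in dd-mm-yyyy format. As a sketch of how a reviewer might check which of these shapes an entry uses, the Python below tries each format in turn, most precise first. The function name `parse_first_available` and the precision labels are hypothetical and not part of the Census software; the sketch also assumes English month abbreviations (the default C locale).

```python
from datetime import date, datetime

# Accepted shapes, most precise first (assumed from the FAQ's examples).
# Month abbreviations such as "Jan" assume the default C/English locale.
FORMATS = [
    ("%d-%m-%Y", "day"),   # precise date, e.g. 22-02-2014
    ("%b %Y", "month"),    # approximate, e.g. Jan 2012
    ("%Y", "year"),        # approximate, e.g. 2012
]


def parse_first_available(text: str) -> tuple[date, str]:
    """Return (date, precision) for a hypothetical 'first available' entry.

    Approximate entries are mapped to the first day of the month or year.
    Raises ValueError if the entry matches none of the accepted shapes.
    """
    value = text.strip()
    for fmt, precision in FORMATS:
        try:
            return datetime.strptime(value, fmt).date(), precision
        except ValueError:
            continue  # try the next, less precise format
    raise ValueError(f"unrecognised date: {text!r}")


print(parse_first_available("Jan 2012"))  # (datetime.date(2012, 1, 1), 'month')
```

Keeping the precision alongside the date avoids treating "2012" as if it meant exactly 1 January 2012.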

How should I use the comments/details field when submitting and reviewing?

Comparing datasets between local governments is a complex and often difficult task. This is why the comments/details field is public: so that submitters and Open Data Census Librarians can explain the reasoning behind their choices. In other words, the comments/details field is your main tool for ensuring that your city's entries and scores can be compared with those of other cities. We therefore strongly encourage you to be thorough in your comments, as they will affect how your city is perceived and compared.

Tip: Look at the comments of cities with similar scores in the given category, or of cities whose data systems and governance structures may be similar to your city's.

Questions about the assessment of openness

Are data to be considered publicly available if an FOI request is needed to retrieve them?

No. "Publicly available" means available without having to file an FOI request, so the data should be accessible without further ado.

What about cities where there is no official mention of licensing attached to the data in question?

Licensing information for online datasets can be found in the datasets' metadata or, if the dataset is published through an online open data portal or database, sometimes in that portal's or database's Terms of Service. Licensing might also be articulated in your city's open data policy. For maximal legal re-use, open government data should have a worldwide public domain designation, such as the Creative Commons CC0 statement or the Open Data Commons Public Domain Dedication and License (PDDL).

What formats can generally be considered machine readable?

Since machine readability is not strictly a matter of data format, here are some further points to keep in mind. HTML, even when well structured, only sometimes counts as machine-readable and is not machine-readable by default, because it usually needs parsing and is therefore not directly reusable.

CSV, XML and XLS would usually count as machine readable, but not always. Consult Sunlight Foundation’s Open Data Guidelines if you’re in doubt, or if there is a dispute, reach out to the Census discussion list.

In general, we suggest treating machine readability as a combination of fact and objective judgement, rather than saying that a particular format is automatically machine-readable or not machine-readable. Machine-readable should be understood to mean that you could extract the data and directly reuse it.

This issue is discussed in more detail in the Sunlight Foundation’s Guideline on mandating data formats for maximal technical access and in this thread on the Census discussion list.
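To make the "extract the data and directly reuse it" test concrete, here is a small Python sketch using only the standard library. The sample table, its values and the `TableCellParser` class are made up for illustration. The CSV version goes straight into records through `csv.DictReader`, while the same table published as HTML needs a hand-written parser that only understands this particular markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample: the same small table published two ways.
CSV_TEXT = "ward,budget\nWard 1,1200\nWard 2,800\n"
HTML_TEXT = (
    "<table><tr><th>ward</th><th>budget</th></tr>"
    "<tr><td>Ward 1</td><td>1200</td></tr>"
    "<tr><td>Ward 2</td><td>800</td></tr></table>"
)

# CSV: a standard reader turns the text directly into records.
csv_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))


class TableCellParser(HTMLParser):
    """Collect the text of <th>/<td> cells, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # append text to the current cell


# HTML: custom parsing code is needed before the data can be reused.
parser = TableCellParser()
parser.feed(HTML_TEXT)
header, *body = parser.rows
html_rows = [dict(zip(header, row)) for row in body]

print(csv_rows == html_rows)  # True
```

Both routes end with the same records, but only the CSV route works without page-specific code. A real web page would add layout, links and inconsistent markup, so an HTML parser like this one would need adapting page by page, which is why HTML is not treated as machine-readable by default.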

I want to help, but I'm not sure where to start!

On the India City Open Data Census, you can read more about the 17 categories of data that we are focusing on in the About section of the site. Each entry for each city has been sent in by community members who used Google or other search engines to find out what datasets are available (simply finding the URL) and under which circumstances (are the data openly licensed, can they be downloaded in bulk, etc.), and then made a submission via the form on the site, where they fill out a handful of questions and put in the URL for the data. All in all, it is an easy (and fun) task that helps put a city on the open data map, and it's easy to get started!

You can do some research yourself! Pick a city where the Census shows there is data missing, where there are comments showing uncertainty (perhaps the licence hasn't been specified, for example), or a city that you know well. A targeted search, or working with others, is the most fun and helpful approach. Get together with friends, colleagues or your local open data community and dig into data on a given topic or for a given city together.

Where can I discuss the India City Open Data Census with others?

Join the Open Data Census discussion list.

I'm confused! How can I get help?

There are lots of people who can help on the Open Data Census discussion list, and there are no silly questions, so we encourage you to post there.

Understanding the India City Open Data Census results

How does the scoring system work?

The India City Open Data Census measures the state of openness of 17 datasets for each city. The overall score for a dataset is based on the response to specific questions with varying weightings -- the weighting for each question is listed in the question table above. The overall city score is then calculated from the score on each dataset.

The score algorithm is:

  1. For each dataset, if the answer to a question is "yes", add that question's weight to the dataset's score.

  2. Add up the scores of all datasets to get the city score.

As the weightings indicate, timeliness is now included with a weighting of 10.

One of the aims of the questions for each dataset is to provide an increasing set of requirements leading up to full openness (excluding 'timely', which is important but not a requirement for open data). This does not mean that each question directly builds on the previous one, since some are parallel (e.g. digital form and publicly available), but in general there is a progression, so "No" on an earlier question may well imply "No" on a later question.
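The scoring steps can be sketched in Python as follows. This is an illustrative reimplementation, not the Census software: the question keys in `WEIGHTS`, the function names and the example answers are hypothetical, while the weights are the ones listed for each question. Unanswered questions (for example, when the data does not exist and the rest of the form is left blank) are treated as "no".

```python
# Weights as listed for each question; the key names are hypothetical.
WEIGHTS = {
    "exists": 5,
    "digital": 5,
    "public": 5,
    "free": 15,
    "online": 5,
    "machine_readable": 15,
    "bulk": 10,
    "openly_licensed": 30,
    "timely": 10,
}


def dataset_score(answers: dict[str, bool]) -> int:
    """Step 1: add the weight of every question answered "yes".

    Missing questions count as "no".
    """
    return sum(w for q, w in WEIGHTS.items() if answers.get(q, False))


def city_score(datasets: dict[str, dict[str, bool]]) -> int:
    """Step 2: add up the dataset scores to get the city score."""
    return sum(dataset_score(answers) for answers in datasets.values())


# Made-up answers for an imaginary city, not real Census data.
example_city = {
    "Environment": {q: True for q in WEIGHTS},  # every answer "yes"
    "Finances": {"exists": True, "digital": True, "public": True, "online": True},
}

print(city_score(example_city))  # 120
```

With these weights a fully open, timely dataset scores 5+5+5+15+5+15+10+30+10 = 100, so the example city gets 100 for Environment plus 20 for Finances.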

If you are intrigued by Open Data...

Learn more

If the India City Open Data Census has caught your interest, there's lots more open data and open government to learn about.

You can learn more about the initiatives of Open Knowledge India.