Open data is an alluring solution to many problems in the developing world. It promises to improve transparency, accountability and governance, and all it usually requires is that governments and companies marshal their data into rows and columns, and publish it online for anyone to review, edit, use, or reuse.

But implementing open data policies requires much more than just formatting and publishing data. In the case of city governments, the most accurate information about basic civic services is often not in official databases but in the minds and memories of junior government officials, and it is disseminated through informal communications with researchers or journalists. Can open data serve the public in contexts that lack a formal data collection and management system?

Government data: where facts are fictions

In the South Indian city of Chennai, official data on very basic civic services such as public toilets, slums, water connections, and garbage is far off the mark from reality.

Transparent Chennai, an action research group where I worked, found that there was no reliable data on public toilets. For instance, within a short span of time, the official estimate of the number of toilets changed from an unreliable 714 in 2012 to 960 and then to 1,004, both in 2013. The real number of public toilets remains unknown.


The story is similar in the case of other civic amenities such as water, garbage and low-income housing. Estimates of the number of subsidized water connections, obtained through a Right to Information petition I filed in 2012, vary wildly and implausibly between different areas in the city, from 0 water connections to 850. This is in spite of the fact that water is scarce and the demand for these connections is high.

In the case of garbage, official data almost certainly undercounts the amount of garbage city residents generate per day. Siddharth Hande, founder of Kabadiwalla Connect, a social enterprise that promotes recycling, believes that at least 34% of the 4,500 tonnes of garbage we produce per day is missing from the official record books, because it is recycled informally by waste-pickers outside the system. “And this is a conservative estimate,” he says.

Even the number of slums in the city depends on whom you ask. The city government pegs it at 2,478, but the estimate from the government agency responsible for slums is 300 lower.

But why is official city data unreliable and riddled with so many inconsistencies?

On some occasions, the city government constructs infrastructure, say public toilets or public water fountains, but fails to record their existence in a central database. On others, officials are not computer-literate and prefer old-school data entry techniques such as noting down “particulars” in small pocket-sized books. To add to this, the city is divided into so many administrative units that data held by junior officials is often vastly different from data at the city’s headquarters. The end result is that when citywide data is compiled, it does not add up.

With such shambolic data management systems, one would expect the city and its services to be in a perpetual state of chaos. But they are not. Reasonably accurate data about the city does exist, but it cannot be found in city office files, drawn from an RTI response, or downloaded from a database.

Informal databases

Essentially, knowledge held by junior officials in their heads – and not available in any database – ends up being the most accurate information that the city has about its infrastructure.

Many junior officials have an encyclopedic understanding of their areas. In the case of public toilets, for instance, junior officials I met were able to pinpoint the precise location of all the public toilets they were in charge of on a map. They were also able to recall minute details such as the number of cubicles the toilets had and whether the toilet was usable.

Similarly, a junior official responsible for slums in the city was able to identify specific slums from a long list of similar-sounding names and take us to them. More impressively, the official was able to tell us whether each slum was on private land or government land, and whether it was eligible for government intervention, information that is absent from official documents.

In the case of garbage, senior local conservancy staff we met had an in-depth understanding of the lay of the trash-producing land they were in charge of. They had accurate figures for the number of households on a road and knew, on average, how much waste was collected from each road each day. This information was used by a coalition of community organizations to plan for a zero-waste system in the city.

And this – the fact that the most accurate data about Chennai is held intangibly, in the minds and memories of city officials – is probably what open data policies fail to take into account. India has already begun investing in open data, but for that investment to be effective, the government should formalize and record this informally maintained database.