(This is an ongoing draft/manifesto of thoughts that I have. I am a web developer at Cleveland Public Library, but these are my views and not those of my employer.) (Writing this out made me think of more questions than answers. I may be critical, but I’m critical because I care about libraries.)

As a participant in the open data and civic tech movement, both as a brigade captain of Open Cleveland and as a web developer at the Cleveland Public Library, I see the potential for libraries to play a much larger part in the open data movement.

Public libraries can be and should be (?) stewards of digital open data because they’ve historically been stewards of data, have public trust and neutrality, have subject domain experts on staff, and are connected with the community.

(What’s open data? https://opengovdata.org/ is a great resource.)


Public Libraries have historically been stewards of data:
We librarians already are open data stewards, and historically have been. The Cleveland Public Library’s Public Administration Library (https://cpl.org/locations/public-administration-library/) has been designated as “the most complete collection of material on Cleveland city government available anywhere,” including City Council legislation, budgets, and more. So far, this stewardship and sharing has been only on paper or microfilm.

CPL is also a Federal Depository Library. These libraries were the ‘data portals’ of their day: centralized, on-paper access points guaranteeing public access to federal data (Census, reports, contact information, and more). (With this data now being managed at the federal level on data.gov, how should Federal Depository Libraries continue their function? I don’t know.)

City data portals and the open data movement haven’t been focused on maintaining or sharing historical data, often sharing only the most current version of a data set. Who is archiving and saving the earlier versions? As archivists, libraries can fill this role too.

Public Libraries have subject-based experts and know how to find knowledge:
We know that just because there’s an open data portal doesn’t mean that people will use it. Cities hosting open data portals are realizing that a portal isn’t enough. For open data to have any effect for the public good, it needs to be used like any other resource: a tool, a means to an end, a source to answer a question, a source to analyze. Open data is just another source of knowledge that needs to be interpreted. Someone has to know how to filter the data, how to structure queries, and how to use technical tools to find and then analyze the information that a patron is looking for, so the patron gets their answer.
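To make "knowing how to filter the data" concrete, here is a minimal sketch in Python. The sample CSV, its column names, and the `filter_rows` helper are all made up for illustration; they stand in for whatever a patron might download from a city data portal, not for any real Cleveland data set.

```python
import csv
import io

# Hypothetical sample standing in for a CSV downloaded from a city data portal.
SAMPLE_CSV = """permit_id,ward,issue_date,permit_type
1001,3,2016-04-12,Demolition
1002,7,2017-01-30,New Construction
1003,3,2017-02-14,Demolition
"""

def filter_rows(csv_text, **criteria):
    """Return the rows whose columns equal every given criterion."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if all(row[col] == val for col, val in criteria.items())
    ]

# A patron's question: "Which demolition permits were issued in Ward 3?"
demolitions_ward_3 = filter_rows(SAMPLE_CSV, ward="3", permit_type="Demolition")
for row in demolitions_ward_3:
    print(row["permit_id"], row["issue_date"])
```

Even a question this small requires knowing the column names, the exact values to match, and a tool to run the filter, which is the kind of help a library could offer.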

Connected with the community:
Our public libraries are still in the community, and compared to other institutions they do a better job of working with all communities and being places that welcome everyone. They’re one of the few organizations that still have a wide reach and collaborate with entities across different sectors. They’re one of the only third places left for people to meet. They’re one of the few places where people who normally don’t interact with each other can.

Libraries including CPL have been teaching people how to use the then-new sources of information (the internet), the tools to make sense of it (Excel), and basic digital literacy. Carnegie Library in Pittsburgh is teaching data literacy courses and how to use data sources. These are good starts for libraries to help people and institutions, especially those from marginalized backgrounds, learn how to access the data. It’s just not enough to provide the raw material (books, databases) or open government data. Libraries can help people make sense of these materials and enhance patrons’ use of them, as they already do with book discussions, instruction on accessing and using databases, and genealogy clinics that help patrons use and understand those resources. The library would be the data intermediary, perhaps doing the data analysis and helping people and institutions understand the data.

Have neutrality, public trust:
(Perhaps the most contentious point and least fleshed out?) Libraries are luckily generally well funded in NE Ohio and generally have the public’s trust. Because library staff are not elected and have, at least so far, been relatively free of political influence, libraries could continue to share data if a government administration cuts off access to its data (just see what’s happening at the federal level). They could also present the material to patrons in ways that the government may find critical of it.

The challenges here and ahead:
Even from my limited experience at CPL, I can see that we’re limited by capacity. Librarians have the subject expertise but don’t generally have the technical expertise to do the extracting, transforming, and loading of raw data sets into forms that patrons can access. We need a combination of better technical training for staff members and developers making it easier to fulfill patrons’ common data requests. Perhaps it’s a bad analogy, but think of word processing and formatting: on one end of the spectrum there’s LaTeX, which is extremely powerful for custom, esoteric needs, and on the other there’s Word, which is suitable for common needs. Libraries should have staff members who know both kinds of tools to accommodate the variety of patrons’ needs.
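For readers who haven’t seen "extracting, transforming, and loading" in practice, here is a rough sketch of the three steps using only Python’s standard library. The raw budget export, the table name, and the function names are hypothetical, and real data sets will be much messier than this.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the stray spaces and quoted amounts are deliberate.
RAW = """Dept , Fiscal Year , Amount
 Parks,2016,"1,250,000"
Libraries ,2016,"2,400,500"
Parks,2017,"1,300,000"
"""

def extract(text):
    """Extract: read raw CSV text into a list of rows (header included)."""
    return list(csv.reader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert years and amounts to integers."""
    cleaned = []
    for dept, year, amount in rows[1:]:  # skip the header row
        cleaned.append((dept.strip(), int(year), int(amount.replace(",", ""))))
    return cleaned

def load(records, conn):
    """Load: store the cleaned records in a queryable SQLite table."""
    conn.execute(
        "CREATE TABLE budget (dept TEXT, fiscal_year INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO budget VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)

# Once loaded, a common patron request becomes a one-line query.
parks_total = conn.execute(
    "SELECT SUM(amount) FROM budget WHERE dept = ?", ("Parks",)
).fetchone()[0]
print(parks_total)
```

The point isn’t this particular script. It’s that the "transform" step is where subject knowledge (what counts as the same department, which year a figure belongs to) meets technical skill.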

Although libraries are already sharing some historical open data, the process of migrating the data from paper records into a digital format is laborious. The first part of the digitization process - creating digital images of these items (look at all of the digital collections!) - has been generally embraced by libraries, but the knowledge and tools to transform those images into open or structured data generally haven’t been developed (except perhaps for OCR’ing the text of some books). Budgets would need to increase to build the staff/institutional capacity to migrate the data on paper into a digital, structured, open format.

For example, CPL has plenty of digital maps available but no spatial data sets yet (for example: boundaries, building footprints) that could be created from these digital maps. I’m working on creating a geospatial data set of Cleveland’s annexations from these two maps.
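As a rough idea of what such a spatial data set could look like, here is a small Python sketch that builds a GeoJSON file by hand. The `annexation_feature` helper, the property names, and the coordinates are placeholders I made up for illustration; they don’t describe a real annexation or my actual workflow.

```python
import json

def annexation_feature(name, year, ring):
    """Build a GeoJSON Feature for one annexed area.

    `ring` is a list of (longitude, latitude) pairs. GeoJSON polygon rings
    must be closed, so the first point is repeated at the end if needed.
    """
    coords = [list(pt) for pt in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    return {
        "type": "Feature",
        "properties": {"name": name, "year_annexed": year},
        "geometry": {"type": "Polygon", "coordinates": [coords]},
    }

# Placeholder values only: not a real annexation or a real boundary.
example = annexation_feature(
    "Example Village",
    1900,
    [(-81.70, 41.48), (-81.68, 41.48), (-81.68, 41.50), (-81.70, 41.50)],
)
collection = {"type": "FeatureCollection", "features": [example]}
print(json.dumps(collection, indent=2))
```

A file like this can be opened directly in common mapping tools, which is what makes it more reusable than a scanned image of the same boundary.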

We sometimes go only halfway in preservation, and we need to make sure that we keep open data uses possible. For example, maps digitized at too low a resolution are difficult for someone to georeference and orthorectify; licensing is another obstacle. NYPL’s Space/Time Directory is creating tools to improve digitization processes/workflows to create these data sets. Perhaps we should offer our maps already georeferenced (we don’t).

Administrations also need to recognize the value (and limits) of open data in order to fund this work and, if they haven’t already, establish partnerships with the holders of the data, the local governments.

Libraries have historically been stewards of data, and I think they may have somewhat missed the initial curve of the growing open data sector/ecosystem. They are an established third party with a mission and history of maintaining and sharing knowledge for the broader community with minimal restrictions. They can also be the intermediary between the raw data and the patrons, helping patrons find the meaning and interpretation of the data. In the meantime, libraries should begin working with local governments and groups like Code for America brigades (volunteer groups using open data and civic tech to improve their communities and local governments) to learn how they can partner to serve the community needs that open data can help fulfill, and to be stewards of data.

(Thank you to everyone who’s inspired me to write this and laid some groundwork by writing, studying, or talking about this, notably Leila Slutz, Anastasia Diamond-Ortiz, and Mita Williams.)