Will Skora's blog

Oct 22, 2017 - The role of open data and libraries

(This is an ongoing draft/manifesto of thoughts that I have; I am a web developer at Cleveland Public Library, but these are my views and not those of my employer.) (Writing this out made me think of even more questions than answers. I may be critical, but I’m critical because I care about libraries.)

As a participant in the open data and civic tech movement, as a brigade captain of Open Cleveland, and as a web developer at the Cleveland Public Library, I see the potential for libraries to play a much larger part in the open data movement.

Public libraries can be and should be (?) stewards of digital open data because they have historically been stewards, have public trust and neutrality, have subject-domain experts, and are connected with the community.

What’s open data? (https://opengovdata.org/ is a great explainer.)

Why:

Public Libraries have historically been stewards of data:
Historically, we librarians are already open data stewards. The Cleveland Public Library’s Public Administration Library (https://cpl.org/locations/public-administration-library/) has been designated as “the most complete collection of material on Cleveland city government available anywhere,” including City Council legislation, budgets, and more. This stewardship and sharing has only been on paper or microfilm.

CPL is also a Federal Depository Library. Federal Depository Libraries were the ‘data portals’: a centralized, on-paper access point guaranteeing public access to federal data (Census data, reports, contact information, and more). (With this data now being managed at the federal level on data.gov, how should Federal Depository Libraries continue their function? I don’t know.)

City data portals and the open data movement haven’t been focused on maintaining or sharing historical data, often only sharing the most current version of a data set. Who is archiving and saving that? As archivists, libraries can fill this role too.
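One low-tech way a library could start filling this role is to save a dated copy of a portal’s dataset every time its contents change. A minimal sketch in Python, assuming the CSV has already been downloaded as bytes; the function name, the file-naming scheme, and the directory layout are hypothetical, not an existing CPL or portal workflow.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def archive_snapshot(data: bytes, archive_dir: Path, dataset: str, on: date) -> Optional[Path]:
    """Save a dated copy of a dataset unless it is identical to the most recent copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme "<dataset>_YYYY-MM-DD.csv": ISO dates sort chronologically,
    # so the last name in sorted order is the newest snapshot.
    snapshots = sorted(archive_dir.glob(f"{dataset}_????-??-??.csv"))
    if snapshots and snapshots[-1].read_bytes() == data:
        return None  # nothing changed since the last snapshot
    # A second call on the same day with changed data overwrites that day's file.
    path = archive_dir / f"{dataset}_{on.isoformat()}.csv"
    path.write_bytes(data)
    return path


# Example call (hypothetical dataset name):
# archive_snapshot(downloaded_bytes, Path("archive"), "vacant_lots", date.today())
```

Comparing against only the newest copy, rather than every copy, means a dataset that reverts to an earlier version still gets recorded, which is part of its history too.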

Public Libraries have subject-based experts and know how to find knowledge:
We know that just because there’s an open data portal doesn’t mean people will use it. Cities hosting open data portals are realizing that a portal isn’t enough. For open data to have any effect for the public good, it needs to be used like any other resource: a tool, a means to an end, a source to answer a question, a source to analyze. Open data is just another source of knowledge that needs to be interpreted (knowing how to filter the data, how to structure queries technically, how to use technical tools, etc.) to find and further analyze the information a patron is trying to access, so the patron gets their answer.
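As a small illustration of what “knowing how to filter the data” can mean, here is a sketch that answers a hypothetical patron question (how many demolition permits were issued in Ward 3 in 2015?) against a made-up CSV extract. The column names and rows are invented for the example; real portal exports use their own schemas.

```python
import csv
import io

# Invented sample rows standing in for a city open data export.
SAMPLE = """permit_id,ward,permit_type,issued
101,3,DEMOLITION,2015-04-02
102,3,ALTERATION,2015-05-11
103,7,DEMOLITION,2015-05-20
104,3,DEMOLITION,2016-01-08
"""


def count_permits(text: str, ward: int, permit_type: str, year: int) -> int:
    """Count rows matching a ward, a permit type, and a year of issue."""
    rows = csv.DictReader(io.StringIO(text))
    return sum(
        1
        for row in rows
        if row["ward"] == str(ward)
        and row["permit_type"] == permit_type
        and row["issued"].startswith(str(year))  # ISO dates begin with the year
    )


print(count_permits(SAMPLE, ward=3, permit_type="DEMOLITION", year=2015))  # 1
```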

Connected with the community:
Our public libraries are still in the community and do a relatively better job of working with all communities and being welcoming places for all. They’re one of the few organizations that still have a wide reach and collaborate with entities across different sectors. They’re one of the only third places left for people to meet. They’re one of the few places where people who normally don’t interact with each other can do so.

Libraries, including CPL, have been teaching people how to use the then-new sources of information (the internet) and the tools to make sense of them (Excel), along with basic digital literacy courses. The Carnegie Library of Pittsburgh is teaching data literacy courses and how to use data sources. These are good starts for libraries to help people and institutions, especially those from marginalized backgrounds, learn how to access data. It’s not enough to provide the raw material (books, databases) or open government data: libraries can help patrons make sense of these materials and get more out of them, as they already do with book discussions, instruction in accessing and using databases, and genealogy clinics that help people use and understand those resources. The library would be the data intermediary, perhaps doing the data analysis and helping people and institutions understand the data.

Have neutrality, public trust:
(Perhaps the most contentious point and the least fleshed out?) Libraries are, luckily, generally well funded in NE Ohio and generally have the public’s trust. Because library positions are not elected, and have so far been relatively free of political influence, libraries could continue to share data if a government administration cuts access to its data (just see what’s happening at the federal level). They could also help present the material to patrons in ways the government may find critical of it.

The challenges here and ahead:
Even from my limited experience at CPL, we’re limited by capacity. Librarians have the subject expertise but don’t have the general technical expertise to do the extracting, transforming, and loading of raw data sets into formats patrons can access. What’s needed is a combination of better technical training for staff members and developers who make it easier to fulfill patrons’ common data requests. Perhaps a bad analogy: for word processing and formatting, there’s LaTeX on one end of the spectrum, extremely powerful for custom, esoteric needs, and Word on the other, suitable for common needs. Libraries should have staff members who know both, to accommodate the variety of patrons’ needs.
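To make “extracting, transforming, and loading” concrete, here is a minimal sketch using only Python’s standard library: extract rows from a messy CSV export, transform them (tidy the headers, trim whitespace, parse currency), and load them into SQLite, where they can be queried. The sample data, column names, and table schema are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical messy export: stray spaces in headers and values, currency as text.
RAW = """Parcel Number ,Owner Name, Sale Price
101-01-001, SMITH JOHN ,"$45,000"
101-01-002,DOE JANE,
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by the raw headers."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize headers and values into (parcel, owner, price) tuples."""
    records = []
    for row in rows:
        tidy = {k.strip().lower().replace(" ", "_"): (v or "").strip() for k, v in row.items()}
        price = tidy["sale_price"].replace("$", "").replace(",", "")
        records.append((tidy["parcel_number"], tidy["owner_name"].title(), int(price) if price else None))
    return records


def load(records, conn):
    """Load: upsert the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (parcel TEXT PRIMARY KEY, owner TEXT, price INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM sales ORDER BY parcel").fetchall())
# [('101-01-001', 'Smith John', 45000), ('101-01-002', 'Doe Jane', None)]
```

Keeping the three steps as separate functions is the “Word” end of the analogy: a staff member can rerun the whole pipeline without touching the parsing details.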

Although libraries are already sharing some historical open data, migrating data from paper records into a digital format is laborious. The first part of the digitization process, creating digital images of these items (look at all of the digital collections!), has generally been embraced by libraries, but the knowledge and tools to transform those images into open or structured data generally haven’t been applied (except perhaps for OCRing the text of some books). Budgets would need to increase to build the staff and institutional capacity to migrate the data on paper into a digital, structured, open format.

For example, CPL has plenty of digital maps available but no spatial data sets yet (for example: boundaries, building footprints) that could be created from these digital maps. I’m working on creating a geospatial data set of Cleveland’s annexations from these two maps.
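Once a scanned map has been georeferenced, tracing a boundary yields a list of coordinates, and turning that into an open, structured format can be as simple as writing GeoJSON (RFC 7946). A sketch, assuming the boundary has already been traced as (longitude, latitude) pairs; the function, the property names, and the coordinates below are placeholders, not data from the annexation maps.

```python
import json


def annexation_feature(name, year, ring):
    """Build a GeoJSON Feature for one annexed area from a traced boundary of (lon, lat) pairs."""
    ring = [list(p) for p in ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # RFC 7946: a linear ring's first and last positions must match
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least four positions, including the closing one")
    return {
        "type": "Feature",
        "properties": {"name": name, "year_annexed": year},  # hypothetical attribute names
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Placeholder square, listed counterclockwise as RFC 7946 recommends for exterior rings.
collection = {
    "type": "FeatureCollection",
    "features": [
        annexation_feature("Example area", None, [(-81.70, 41.48), (-81.68, 41.48), (-81.68, 41.50), (-81.70, 41.50)]),
    ],
}
print(json.dumps(collection, indent=2))
```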

We sometimes go only halfway in preservation and need to make sure we keep these items usable as open data. For example, maps are sometimes digitized at a resolution low enough to make it difficult for someone to georeference and orthorectify them; licensing is another obstacle. NYPL’s Space/Time Directory is creating tools to improve digitization processes and workflows to create these data sets. Perhaps we should offer our maps already georeferenced (we don’t).

Administrations also need to recognize the value (and limits) of open data in order to fund this and, if they haven’t already, establish partnerships with the holders of the data: the local governments.

Libraries have historically been stewards of data, and I think they may have somewhat missed the initial curve of the growing open data sector/ecosystem. As an established 3rd party with a mission and history of maintaining and sharing knowledge for the broader community with minimal restrictions, they can also be the intermediary between the raw data and patrons, helping patrons find the meaning and interpretation of the data. In the meantime, libraries should begin working with local governments and groups like Code for America brigades (volunteer groups using open data and civic tech to improve their communities and local governments) to learn how they can partner to serve community needs that open data can help fulfill, and to act as stewards of data.

(Thank you to everyone who inspired me to write this and laid some groundwork by writing, studying, or talking about this, notably Leila Slutz, Anastasia Diamond-Ortiz, and Mita Williams.)

Sep 5, 2015 - Mapping your neighborhood in Cleveland and Akron

Where is your neighborhood? What is its name? Where is its boundary? Is this boundary fuzzy for you?

Neighborhoods and these answers change from person to person.

In Cleveland, neighborhoods’ boundaries are largely left to the imaginations of residents, visitors, realtors, businesses, and non-profits.

The City Planning Commission adopted its Statistical Planning Areas in the early 2000s. These names are largely ignored and not widely adopted, with good reason: some neighborhoods are missing, and many of the names used there aren’t used in everyday life.

Here’s your chance to say where your neighborhood is and view what others have shared.

Map Your Neighborhood in Cleveland and Akron at http://skorasaur.us/nh

You’re encouraged to map (that is, draw) not only the neighborhood where you live but also others you may not live in but spend a lot of time in or feel strongly about.

No neighborhood or city boundaries are shown on the map, to remove biases and to encourage boundaries that cross city lines.

With projects like http://openstreetmap.org and improved technology and software, “map” is not only a noun but also a verb: creating and modifying what is (or isn’t) on a map, the canvas representing a space.

After you’ve mapped a neighborhood, view what others have drawn.

I hope this sparks a conversation of neighborhood identity in each of you.

Thanks to the work of Nick Martinelli and of Andy Woodruff and Tim Wallace at Bostonography, I’ve been able to build upon their project and customize it for Cleveland.

Identifying neighborhoods has fascinated me for some time and inspired me to create my first map - my (incomplete) interpretation of Cleveland’s neighborhoods, in 2010-2011.

To make your own instance for a city, the source code and directions are available on GitHub. I’ve made a couple of adjustments (like the directions) that I’ll be adding shortly.

Jun 1, 2015 - Recently

What I’ve been up to (outside of my work):

Setting up tech (registration, website updating/maintenance, and writing the content) for the 8th annual Jake’s Invitational.

If you’re looking to golf for a great cause in Northeast Ohio, check out the 8th Jake’s Invitational on August 9th.

We fund children’s futures by giving them financial aid to attend Lawrence School, a great place for students with learning differences.

Spending more time with open data.

Obtaining the data (especially local data) to be used in maps has been time-consuming. When exploring or thinking about different topics to understand through maps, I am limited by the data that is available.

This has led me to spend more time advocating for and working with open data on a broader scale. I’ve been co-leading Open Cleveland, which, along with OpenNEO and Hack Cleveland, has been driving the open data movement in Cleveland.

We’re educating local politicians and city employees about how the civic data they work with and manage can be useful if it’s available to the public, for example, for creating a web form so someone can apply online to take formal stewardship of the vacant lot next door.

Data alone won’t solve anything but it will make a lot of others’ jobs easier.

I didn’t submit a talk to NACIS this year. Do I regret it? Not yet. I might later.

I’ll share some Carto thoughts on animated temporal maps very briefly:

I’ve been thinking a little about animated temporal maps, maps whose features change over time. One example:

Torque by CartoDB is one easy-to-use library built for temporal mapping. I haven’t seen much use of Torque (or many temporal maps) in recent months in cartotalk on Twitter.

I hadn’t thought of a use for Torque either until last week: visualizing Cleveland’s building demolitions over time.

For those outside Cleveland: yes, many of these were likely houses; it’s a visual representation of the housing crisis.

I was wondering how I could see the demolitions spread and which areas were hit hardest. I want to see different ways this can be visualized.

This first visualization is just a proof of concept I got up and running; I’ve fiddled with Torque’s API a little since then, though not enough to write it up for you just yet. I will soon. I am now sleepy.
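Torque animates time-stamped points by grouping them into frames. The same grouping can be sketched outside Torque to ask which areas were hit hardest, and when. A minimal Python sketch, assuming demolition records are available as (date, area) pairs; the records and area names are hypothetical, and this is not Torque’s API (Torque is configured through CartoCSS on a CartoDB table).

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_frames(records):
    """Bucket (demolition_date, area) records into one Counter of areas per calendar month."""
    frames = defaultdict(Counter)
    for demolished_on, area in records:
        frames[(demolished_on.year, demolished_on.month)][area] += 1
    return dict(sorted(frames.items()))  # frames in chronological order


def cumulative(frames):
    """Running totals per area, frame by frame, to show where demolitions accumulate."""
    running = Counter()
    totals = {}
    for month, counts in frames.items():
        running.update(counts)
        totals[month] = Counter(running)  # copy, so earlier frames are not mutated later
    return totals


# Hypothetical records.
records = [(date(2014, 1, 5), "Area A"), (date(2014, 1, 20), "Area B"), (date(2014, 2, 3), "Area A")]
for month, totals in cumulative(monthly_frames(records)).items():
    print(month, totals.most_common())
```

Per-month counts show when demolitions spiked; the cumulative view shows where they piled up, which is closer to the question of how the crisis spread.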

Listening to Jean-Christian Arod - Detour Nostalgique, from the movie CRAZY. I fell in love with the song back in ’05 or ’06 and just rediscovered it earlier tonight, listening to it a few times on loop.
