
Some things I’ve made

Data extraction

https://github.com/jheasly/homeless-cleanups

Using the pdfplumber Python package, wrote a Jupyter notebook script that goes through 6,590 pages in 2,814 documents FOIA'd from the City of Eugene, extracts work order data and writes it to a .csv file.
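A minimal sketch of that kind of page-by-page extraction, assuming each work order page carries labeled fields. The `FIELD_PATTERNS` labels, the column names and the helper names are hypothetical, not the actual City of Eugene form layout or the original notebook code. `pdfplumber.open`, `pdf.pages` and `page.extract_text()` are pdfplumber's own API; it is imported inside the function so the parsing logic can be tried without it installed.

```python
import csv
import re
from pathlib import Path

# Hypothetical field labels; the real work order layout would dictate these.
FIELD_PATTERNS = {
    "work_order": re.compile(r"Work Order\s*#?:?\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{1,2}/\d{1,2}/\d{4})"),
    "location": re.compile(r"Location:\s*(.+)"),  # runs to end of line
}


def parse_page_text(text):
    """Pull labeled fields out of one page's text; missing fields become ''."""
    row = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text or "")
        row[field] = match.group(1).strip() if match else ""
    return row


def extract_folder(pdf_dir, csv_path):
    """Write one CSV row per PDF page that has a work order number."""
    import pdfplumber  # third-party: pip install pdfplumber

    with open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["source", "page", *FIELD_PATTERNS])
        writer.writeheader()
        for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
            with pdfplumber.open(str(pdf_path)) as pdf:
                for number, page in enumerate(pdf.pages, start=1):
                    row = parse_page_text(page.extract_text())
                    if row["work_order"]:  # skip cover sheets, blank pages, etc.
                        writer.writerow({"source": pdf_path.name, "page": number, **row})
```

Keeping the regex parsing separate from the PDF loop makes it easy to test patterns against a few sample pages before running all 2,814 documents.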

Dataviz

https://www.gannett-cdn.com/west-hub-production/homeless-camps/index.html

Using data from the .csv file created above, made a map in Mapbox GL, hosted it in a Google Cloud Platform bucket and embedded it in this story presentation, which I built using Gannett's proprietary In Depth framework.
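Mapbox GL sources typically take GeoJSON, so a CSV of points has to be converted first. Here is a minimal sketch, assuming the CSV has `latitude` and `longitude` columns (hypothetical names). Note that GeoJSON orders coordinates as `[longitude, latitude]`.

```python
import csv
import json


def rows_to_geojson(lines, lat_field="latitude", lon_field="longitude"):
    """Turn CSV lines into a GeoJSON FeatureCollection of Point features."""
    features = []
    for row in csv.DictReader(lines):
        try:
            lat = float(row.pop(lat_field))
            lon = float(row.pop(lon_field))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # lon first
            "properties": row,  # every remaining column becomes a property
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(csv_path, geojson_path):
    """Read a CSV file and write the matching GeoJSON file."""
    with open(csv_path, newline="") as src:
        collection = rows_to_geojson(src)
    with open(geojson_path, "w") as dst:
        json.dump(collection, dst)
```

The resulting file is a static asset, which is what makes hosting it in a storage bucket alongside the map page workable.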

How Lane County voted for president

First attempt at a choropleth, using Mapbox GL & QGIS to join scraped county election results with a precinct shapefile. Exported GeoJSON from QGIS (and then learned about reducing GeoJSON file size).
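Two common ways to shrink an exported GeoJSON file are rounding coordinates (five decimal places is roughly one-meter precision) and dropping properties the map never uses, then writing compact JSON with no whitespace. A minimal sketch of both; the property names in any real file would differ, and `keep_props` is a hypothetical parameter.

```python
import json


def round_coords(coords, places):
    """Recursively round nested coordinate arrays (Point, LineString, Polygon, Multi*)."""
    if isinstance(coords, (int, float)):
        return round(coords, places)
    return [round_coords(c, places) for c in coords]


def shrink_geojson(collection, places=5, keep_props=None):
    """Return compact GeoJSON text with rounded coordinates and trimmed properties."""
    slim = {"type": "FeatureCollection", "features": []}
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        if keep_props is not None:
            props = {k: v for k, v in props.items() if k in keep_props}
        geom = feature.get("geometry")
        if geom is not None:  # null geometries pass through unchanged
            geom = {"type": geom["type"],
                    "coordinates": round_coords(geom["coordinates"], places)}
        slim["features"].append({"type": "Feature", "geometry": geom, "properties": props})
    # separators=(",", ":") drops the spaces json.dumps adds by default
    return json.dumps(slim, separators=(",", ":"))
```

This handles geometries that have a `coordinates` member; a `GeometryCollection` would need an extra branch. Simplifying polygon outlines (fewer vertices) usually saves even more, but that is better done in a GIS tool.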

Scrapers

USA TODAY national data & investigations team: ‘A national disgrace’: 40,600 deaths tied to US nursing homes

Pitched in on a distributed, three-person collaboration of Python developers building scrapers on deadline to supplement the manual collection of state nursing home data for a USA TODAY story detailing the national COVID-19 death toll at long-term care facilities.

Eugene, Ore., police call log

The first scraper I wrote (Dec. 2008). Scrapes the Eugene Police Department call log every 15 minutes. Currently more than 920K rows.

Springfield, Ore., police call log

Scrapes the Springfield police call log every 15 minutes. More than 230K rows since 2013.
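A scraper that runs every 15 minutes (cron `*/15 * * * *`) will see many of the same calls on consecutive runs, so each run has to store only what is new. A minimal sketch of that step with SQLite, assuming each call has a unique incident number; the table and column names are hypothetical.

```python
import sqlite3


def save_new_calls(conn, calls):
    """Insert scraped calls, skipping ones already stored; return the number of new rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS calls (
               incident_id TEXT PRIMARY KEY,  -- assumed unique per call
               received TEXT,
               nature TEXT,
               location TEXT
           )"""
    )
    before = conn.total_changes
    # INSERT OR IGNORE silently skips rows whose incident_id already exists
    conn.executemany(
        "INSERT OR IGNORE INTO calls (incident_id, received, nature, location) "
        "VALUES (:incident_id, :received, :nature, :location)",
        calls,
    )
    conn.commit()
    return conn.total_changes - before
```

Letting the primary key do the deduplication keeps each run idempotent: re-scraping an overlapping window adds nothing.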

Websites that reverse publish, APIs

http://local2.registerguard.com/civic/meetings/

A place for credentialed local entities to enter meeting information, as required by law. Password-protected posts publish immediately to the web (owners can create, read, update and delete their entries) and reverse publish daily into the print Civic Calendar item.
Public repo: github.com/registerguard/civic_calendar2
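Reverse publishing to print here means turning database rows into InDesign tagged text. A minimal sketch, assuming meetings arrive as dicts; the `CivicHead`/`CivicBody` paragraph style names are hypothetical, and the `<ASCII-WIN>` header and `<pstyle:...>` tags follow Adobe's tagged text format as I understand it (worth checking against Adobe's Tagged Text reference, especially for non-ASCII names, which would need a different header or encoding).

```python
from datetime import datetime


def escape_tagged(text):
    """Backslash-escape characters that tagged text treats as markup."""
    return text.replace("\\", "\\\\").replace("<", "\\<").replace(">", "\\>")


def civic_calendar_tagged_text(meetings):
    """Build one print-ready tagged text story from meeting dicts, earliest first."""
    lines = ["<ASCII-WIN>"]  # encoding/platform header
    for meeting in sorted(meetings, key=lambda m: m["start"]):
        when = meeting["start"].strftime("%B %d, %I:%M %p")
        lines.append("<pstyle:CivicHead>" + escape_tagged(meeting["body"]))
        details = f"{when}, {meeting['location']}. {meeting['agenda']}"
        lines.append("<pstyle:CivicBody>" + escape_tagged(details))
    return "\r\n".join(lines)  # one paragraph per line


if __name__ == "__main__":
    print(civic_calendar_tagged_text([{
        "body": "Eugene City Council",
        "start": datetime(2019, 1, 14, 17, 30),
        "location": "Harris Hall",
        "agenda": "Work session.",
    }]))
```

Because the styles live in the InDesign template, the script only has to name them; designers can change the look without touching code.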

http://vote.registerguard.com

No link; it's currently sad and moribund. (Perhaps resurrected in 2020.) A landing page for local election information, powered by JSON feeds from a Django backend that is fed by a Selenium-powered web scraper of the Oregon Secretary of State site. Also outputs results as InDesign tagged text for use in print. (Okay, if you must look, here's a link.)
Public repo: github.com/registerguard/ballot
Sample JSON API response: vote.registerguard.com/results/laneco.json
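A minimal sketch of how a results feed like this might be assembled from raw vote counts. The field names (`race`, `choices`, `pct` and so on) are hypothetical and are not the schema of the actual laneco.json response.

```python
import json


def results_feed(race_name, counts, precincts_reporting, precincts_total):
    """Serialize one race's vote counts, leader first, with percentages to one decimal."""
    total = sum(counts.values())
    choices = [
        {
            "name": name,
            "votes": votes,
            "pct": round(100 * votes / total, 1) if total else 0.0,  # avoid divide-by-zero
        }
        for name, votes in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    ]
    return json.dumps({
        "race": race_name,
        "precincts_reporting": precincts_reporting,
        "precincts_total": precincts_total,
        "total_votes": total,
        "choices": choices,
    })
```

In a Django view, the same dict could be returned with `django.http.JsonResponse` instead of calling `json.dumps` directly.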

http://go.registerguard.com/entertainment/

A now-superseded Django entertainment calendar app that allowed anonymous and trusted users to enter event information, which was available online and also produced the weekly Entertainment section listing via InDesign tagged text.
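Producing a weekly print listing from a calendar database mostly comes down to selecting one week's events and grouping them by day. A minimal sketch, assuming events are dicts with hypothetical `date` and `title` keys; the output could then feed a tagged text writer.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_listing(events, week_start):
    """Group events in the 7 days starting at week_start by date, titles alphabetized."""
    week_end = week_start + timedelta(days=7)  # exclusive upper bound
    by_day = defaultdict(list)
    for event in events:
        if week_start <= event["date"] < week_end:
            by_day[event["date"]].append(event)
    return [
        (day, sorted(by_day[day], key=lambda e: e["title"]))
        for day in sorted(by_day)
    ]
```

Using an exclusive end date avoids double-listing events that fall on the first day of the following week.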

https://cloud.registerguard.com/discovery/

Online adventure guide listing using Leaflet & OpenStreetMap, powered by Tarbell and Google Sheets, that also produces InDesign tagged text for print.
Public repo: github.com/registerguard/discovery

XML feed mungers, Twitter bots, RSS feeds

http://projects.registerguard.com/school-closings/

Parses a push FlashAlert.net XML feed every 15 minutes. When schools announce delayed openings or closures, the result is this index page, a home page widget and a tweet from @registerguard. (Note: If there isn't currently bad weather in Lane County, Ore., USA, there probably isn’t much to see here.)
Public repo: github.com/registerguard/django-flashnews
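A minimal sketch of the parse-then-announce step using the standard library's ElementTree. The `<alert>`, `<org>` and `<status>` element names are hypothetical stand-ins, not FlashAlert.net's actual schema, and the 280-character check ignores Twitter's special counting of links.

```python
import xml.etree.ElementTree as ET


def parse_alerts(xml_text):
    """Return a list of {org, status} dicts from an alerts XML document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "org": item.findtext("org", default="").strip(),
            "status": item.findtext("status", default="").strip(),
        }
        for item in root.iter("alert")
    ]


def tweet_text(alerts, url, limit=280):
    """Compose a tweet naming affected schools, or None when there is nothing to say."""
    if not alerts:
        return None  # quiet weather: no tweet
    names = ", ".join(a["org"] for a in alerts)
    text = f"School delays/closures: {names}. Details: {url}"
    if len(text) > limit:  # too many names: fall back to a count
        text = f"{len(alerts)} schools report delays or closures: {url}"
    return text
```

Returning `None` when the feed is empty is what keeps a 15-minute polling job from tweeting on clear days.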

http://projects.registerguard.com/school-closings/roads/

Ditto.
Public repo: github.com/registerguard/django-flashnews

Also, built an automated print archive.

Our previous CMS had no public-facing archive, so I took the initiative to build one. The only available database driver was written in Java, so I learned enough Jython to get a nightly cronjob export working.

The archive was useful for many things; for example, it powered story feeds used by The Associated Press, ProQuest and others. Here's a NewsBank Atom feed.
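A minimal sketch of building an Atom feed from archived stories with the standard library. The story dict keys and the `tag:example.com` IDs are hypothetical; the namespace and the required `id`, `title` and `updated` elements come from the Atom format (RFC 4287). The feed-level `updated` is taken as the newest entry's timestamp, which works by string comparison only if every timestamp uses the same UTC `...Z` format.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_atom_feed(feed_id, title, stories):
    """Return an Atom feed (as a string) with one entry per story dict."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    newest = max((s["updated"] for s in stories), default="1970-01-01T00:00:00Z")
    ET.SubElement(feed, atom("updated")).text = newest
    for story in stories:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = story["id"]
        ET.SubElement(entry, atom("title")).text = story["headline"]
        ET.SubElement(entry, atom("updated")).text = story["updated"]
        author = ET.SubElement(entry, atom("author"))  # per-entry author
        ET.SubElement(author, atom("name")).text = story["byline"]
        ET.SubElement(entry, atom("content"), type="text").text = story["body"]
    return ET.tostring(feed, encoding="unicode")
```

ElementTree handles escaping of characters such as `&` in story text, which is one less thing to get wrong in a nightly job.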

When it came time to transition to a new CMS, I used the archive app to quickly pull together a custom XML export of nine years' worth of stories (more than 250,000 locally produced items plus related assets), all of which were imported into the new CMS; no stories lost.