August 2015 - Present
Engineering data pipelines and data warehousing for technical analysts and data scientists.
New York, New York
Uses Python, Postgres, SQLAlchemy, Spark, HDFS, AWS, Docker
Data Science Engineer
Summer 2013 - August 2015
Consolidated data from around the business for cohesive analysis.
Built a dataflow extraction pipeline that parsed datasets out of our legacy
Python pipelines and loaded them into Neo4j to
analyze the graph of dependencies between pipelines.
Migrated legacy pipelines, written in an in-house framework based on cron,
to Apache Airflow using a reusable
Operator to minimize boilerplate between the legacy codebase and the new one.
Built a generic pipeline for replicating snapshots of vendor datasets as
type-2 dimensions, preserving versioned history for historical analysis
with SQLAlchemy's internal SQL
Wrote infrastructure to automatically scale up
an internal interactive computing environment
for analysts and data scientists to explore and
process data in Jupyter.
Wrote pytest fixtures
that house components in Docker
containers, facilitating automated testing of data transformations
and pipelines under the same conditions in which they would operate.
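
The type-2 dimension replication described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the SQLAlchemy-based pipeline itself. The names `DimRow` and `apply_snapshot`, the field layout, and the choice to close out rows that disappear from a snapshot are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical layout)."""
    key: str                   # natural key from the vendor dataset
    attrs: tuple               # tracked attributes as sorted (name, value) pairs
    valid_from: date
    valid_to: Optional[date]   # None marks the current version


def _freeze(attrs: dict) -> tuple:
    # Sort by attribute name so two dicts with equal content compare equal.
    return tuple(sorted(attrs.items()))


def apply_snapshot(history: List[DimRow],
                   snapshot: Dict[str, dict],
                   as_of: date) -> List[DimRow]:
    """Merge one full snapshot into a type-2 history and return the new history."""
    result: List[DimRow] = []
    current_keys = set()
    for row in history:
        if row.valid_to is not None:
            result.append(row)          # closed versions are never rewritten
            continue
        current_keys.add(row.key)
        incoming = snapshot.get(row.key)
        if incoming is not None and _freeze(incoming) == row.attrs:
            result.append(row)          # unchanged: keep the open version
            continue
        # Changed or missing from the snapshot: close the open version.
        result.append(replace(row, valid_to=as_of))
        if incoming is not None:
            result.append(DimRow(row.key, _freeze(incoming), as_of, None))
    # Keys with no open version (new or previously closed) start a new version.
    for key, attrs in snapshot.items():
        if key not in current_keys:
            result.append(DimRow(key, _freeze(attrs), as_of, None))
    return result
```

Because earlier versions are closed rather than overwritten, a query for any past date can select the rows whose validity interval contains that date.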
Data Science & Engineering - Dow Jones
Designed and maintained systems for collecting,
storing, and processing large quantities of data for
data science and analysis.
Minneapolis, Minnesota and New York, New York
Uses Python, Scala, Apache Hadoop, AWS Elastic Compute Cloud, AWS Elastic MapReduce
Junior Software Developer
Fall 2011 - Summer 2013
Ingested data from a menagerie of legacy
systems for centralized analysis.
Wrote a declarative system for ingesting data
from spreadsheet-based reports through
predicate coordinates. Non-technical departments
produced manually crafted reports whose formats
often changed, so the ingestion system had to
adapt to them.
Developed numerous dashboards that displayed metrics
from around the business, giving a cohesive,
consistent view of data that previously had to be
gathered from several different departments.
Designed a new approach to getting machine
learning models to production by rewriting them
to leverage Google BigQuery.
By converting common transformations
to raw SQL automatically with SQLAlchemy, a
process that took several hours would often complete in
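
One way to read the "predicate coordinates" idea from the spreadsheet ingestion work above is that cells are located by what they contain instead of by a fixed row and column. The sketch below illustrates that reading only. The helper names `find_cell`, `label_is`, and `read_column`, and the list-of-lists grid representation, are hypothetical and are not taken from the original system.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[object]]
Predicate = Callable[[object], bool]


def find_cell(grid: Grid, predicate: Predicate) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first matching cell, scanning row by row."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if predicate(value):
                return r, c
    return None


def label_is(text: str) -> Predicate:
    """Predicate matching a text cell, ignoring case and surrounding spaces."""
    wanted = text.strip().lower()
    return lambda value: isinstance(value, str) and value.strip().lower() == wanted


def read_column(grid: Grid, header: Predicate) -> list:
    """Read the values below the header cell until the first blank cell."""
    position = find_cell(grid, header)
    if position is None:
        raise LookupError("no cell matched the header predicate")
    r, c = position
    values = []
    for row in grid[r + 1:]:
        if c >= len(row) or row[c] in (None, ""):
            break
        values.append(row[c])
    return values


# The same declaration works for two layouts of a hand-made report.
layout_a = [["Region", "Revenue"], ["East", 10], ["West", 20]]
layout_b = [
    ["Monthly report", None, None],
    [None, None, None],
    [None, "region", " Revenue "],
    [None, "East", 10],
    [None, "West", 20],
    [None, None, None],
]
revenue_a = read_column(layout_a, label_is("revenue"))
revenue_b = read_column(layout_b, label_is("revenue"))
```

Anchoring on the header label instead of a cell address is what lets the same declaration survive inserted title rows or shifted columns.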
CPA Global, Minneapolis, Minnesota
Fully featured web app for managing intellectual property and coordinating between teams of attorneys, paralegals, docketers, and laypeople.
Uses Java, Struts2, Spring, Maven, Hibernate, Quartz, MS SQL Server
Implemented a toolset to feed the product's codebase from its numerous repositories to a central OpenGrok search engine.
Worked on a team of motivated developers fixing bugs, writing features, and refactoring legacy implementations into understandable, performant, new ones.