Data Report Automation
Lyft - Summer 2019

Overview

Lyft's Regulatory Reporting team builds and sends many data reports of varying size and complexity. Some of the smaller, more repetitive reports must be sent to regulators frequently, yet the process of gathering the data, formatting it, and sending the report is the same every time. My task was to create a template and documentation that my team could use to turn their manual processes for smaller reports into fully automated ones and gain back work time. This was an individual project that I worked on from July to August 2019. While I had ownership of the project, I worked cross-functionally with a Lyft software engineer, Sam Reese, and with an engineer at an external secure digital storage company.

The Code

I developed a script for Apache Airflow that covers all three parts of the report process:


  1. Pull raw data from Lyft's data tables using Hive.
  2. Validate and format the data in Pandas.
  3. Send the finished report to its storage location for regulators, using Python and the secure digital storage company's REST API.
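
Step 2 above is the part that benefits most from being written down once and reused. The following is a minimal sketch of that idea, not the script used at Lyft. It uses Python's standard `csv` module in place of Pandas so that it runs without dependencies, and the column names (`ride_id`, `city`, `fare_usd`) and the function `validate_and_format` are hypothetical.

```python
import csv
import io

# Hypothetical report schema; the real reports' columns are not described here.
REQUIRED_COLUMNS = ("ride_id", "city", "fare_usd")


def validate_and_format(rows):
    """Check each raw row and return the report as CSV text.

    `rows` is a list of dicts standing in for the result of the Hive query.
    Raises ValueError on the first row that fails validation, so that a bad
    report is never passed on to the sending step.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for i, row in enumerate(rows):
        # Validation: every required column must be present and non-empty.
        missing = [c for c in REQUIRED_COLUMNS if row.get(c) in (None, "")]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
        fare = float(row["fare_usd"])  # raises ValueError if not numeric
        if fare < 0:
            raise ValueError(f"row {i} has a negative fare")
        # Formatting: trim whitespace, normalize case, fix decimal places.
        writer.writerow({
            "ride_id": str(row["ride_id"]).strip(),
            "city": str(row["city"]).strip().title(),
            "fare_usd": f"{fare:.2f}",
        })
    return out.getvalue()
```

Failing loudly on the first bad row is a deliberate choice in this sketch: for a report that goes to regulators, stopping the run is likely safer than sending incomplete data.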

I used Apache Airflow and a directed acyclic graph (DAG), defined in Python, to build the steps of the workflow and schedule recurring runs of the script. This automated the script and eliminated the need for manual work! I went through the design process for one report, then built a general template and wrote documentation so that my team could continue my work and automate more reports in the future. My project saves the Regulatory Reporting team an average of 7 hours per automated report per quarter, and it furthered the team's goal of automating 5 data reports before the end of the year.
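
To illustrate what the DAG contributes, here is a small sketch of the ordering guarantee it provides. It uses Python's standard-library `graphlib` (Python 3.9+) rather than Airflow itself, so it is not Airflow's API, and the step names are hypothetical labels for the three parts of the report. In Airflow, each step would instead be a task in a DAG object, and the scheduler would handle the recurring runs.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
# The step names are hypothetical labels for the three parts of the report.
REPORT_STEPS = {
    "pull_data": set(),
    "validate_and_format": {"pull_data"},
    "send_report": {"validate_and_format"},
}


def run_order(steps):
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError (a ValueError) if the steps depend on each
    other in a loop, which is what "acyclic" rules out.
    """
    return list(TopologicalSorter(steps).static_order())


def run_pipeline(steps, tasks):
    """Call each task in dependency order and return the names that ran."""
    completed = []
    for name in run_order(steps):
        tasks[name]()  # an exception here stops all downstream steps
        completed.append(name)
    return completed
```

Because a failing task raises before later steps start, a report that fails validation is never sent, which is the same property a workflow scheduler gives a chain of dependent tasks.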

Languages and Tools Used

Python, Pandas, Hive, Airflow, Git

To learn more about my design, please reach out to me at gendelprete@gmail.com!
