2) Database & Pipeline Structure

1 minute read

1. Database

  • A database was built so that the tables needed for this dashboard could be automatically processed every week, and a pipeline was built to manage it.

  • The structure of the tables is shown in Figure 1. These tables are updated weekly and stored in the Google Bigquery database.
  • Because tables are stored separately for each season, the cost of processing data is very low. Therefore, rather than writing the query to update the tables in the most efficient way, I wrote the query in a very simple way to overwrite the tables every week.
database_structure
Figure 1: Database structure

2. Pipeline

  • In addition, I built a pipeline using Python and Google BigQuery API to automatically update the tables in the Google Bigquery database every week.
  • Before building the pipeline, I used about 30 minutes every week to manually update the database by running direct queries.
  • After building the pipeline, the time is reduced by about 83%, and it takes about 5 minutes to update the database every week.