GitHub Postgres ETL: 2 Easy Methods
PostgreSQL, also known as Postgres, is an RDBMS (this stands for advanced Relational Database Management System); it allows you to collect any data effortlessly and securely.
Table of Contents
It’s a mainstream choice for many projects regardless of the size and enjoys the benefit of being consistent and having great features such as reliable transaction and concurrency with no read locks, which makes it popular.
Some of the reasons PostgreSQL went mainstream are due to the proven Architecture, Data Integrity, Reliability, Extensibility, Robust feature set, and, not to forget, its dedication to the Open-Source core platform development community. This made it easy to deliver continuous performance development and innovative solutions.
GitHub is an App Development platform that allows you to host and review program Codes, manage Projects, and design Software alongside millions of Developers. Developers choose GitHub since it makes available the significant DevOps features that organizations need for their projects, regardless of the size.
This article teaches two easy methods to perform a GitHub Postgres ETL. So let’s start!
Table of Contents
- Methods to Set up GitHub Postgres ETL
Reliably integrate data with Hevo’s Fully Automated No Code Data Pipeline
If yours are anything like the 1000+ data-driven companies that use Hevo, more than 70% of the business apps you use are SaaS.
Integrating the data from these sources in a timely way is crucial to fuel analytics and the decisions it takes.
But given how fast API endpoints can change, creating and managing these pipelines can be a soul-sucking exercise.
Hevo’s no-code data pipeline platform lets you connect over 150+ sources in a matter of minutes to deliver data in near real-time to your warehouse.
Moreover, the in-built transformation capabilities and the intuitive UI means that even non-engineers can set up pipelines and achieve analytics-ready data in minutes.
Take our 14-day free trial to experience a better way to manage data pipelines.Get started for Free with Hevo!
Methods to Set up GitHub Postgres ETL
Let us start with the manual method of ETL
Method 1: Manual GitHub Postgres ETL
Step 1: PostgreSQL Installation
Before you begin setting up GitHub Postgres ETL, you will have to install it. Follow these steps to get started.
1) Installing PostgreSQL on MacOS
Step 1: There are several ways to install PostgreSQL. One of the easier and more popular ways among programmers to get started is with Postgres.app
Navigate to http://postgresapp.com/ and then click “Download“:
Step 2: Once you’ve downloaded it on your system, double-tap to unzip the file and then move the PostgreSQL elephant symbol into your app folder. Double-tap the symbol to launch the app.
Step 3: You should see another window launched that shows “Welcome to Postgres“. If it says that it can’t connect with the Postgres server, the DEFAULT port is presumably already being used. Ensure you don’t have some other instances of Postgres on your PC. Uninstall them if you do so, and continue with these steps. Click the “Open psql” button:
Step 4: Postgres.app will create a role and database by default that match your current macOS username. Then proceed to connect straight away by running psql.
Step 5: You will then see something in your terminal looking like this (with your macOS username in front of the prompt rather than ‘Postgres):
Step 6: You can start using PostgreSQL now. For documentation on command-line tools etc. visit http://postgresapp.com/documentation/
2) Installing PostgreSQL on Ubuntu
Ubuntu’s default repositories contain Postgres packages, so we can install these easily using the apt packaging system. Since this is our first time using apt in this session, you need to refresh your local package index. You can then install the Postgres package and a -contrib package that adds some additional utilities and functionalities:
sudo apt-get update sudo apt-get install PostgreSQL PostgreSQL-contrib
By default the only role created is the default ‘Postgres’, so PostgreSQL will only respond to connections from an Ubuntu user called ‘Postgres’.
Now that your software is installed, you can go over how it works and how it may be different from similar Database Management Systems you may have used.
You’ve come across both operating systems to integrate PostgreSQL and GitHub in this post. However, you may use any of the environments available. For continuous integration, you can use GitHub Actions (PowerShell (pwsh)). GitHub Actions are an excellent mechanism for continuous integration or other automated tasks on your repo and a perfect fit for integrating PostgreSQL.
Besides, you will be using the latest GitHub Actions Virtual Environments that already has PostgreSQL installed by default after installation.
Each of the GitHub Actions in this guide will:
- Start PostgreSQL and check if PostgreSQL GitHub is integrated and running.
- Create a special user scheduler.
- Create a test database timetable.
Of course, you may want to add more steps in real life, e.g., import test data, checkout, build, test, gather coverage, release, etc.
Step 2: Integrating GitHub and PostgreSQL
1) PostgreSQL GitHub Actions for MacOS
To integrate GitHub and PostgreSQL using GitHub Actions on MacOS, write the following lines of code on the PostgreSQL interface of the recently installed application.
Setup-PostgreSQL-macOS: If: true # false to skip job during debug name: Setup PostgreSQL on MacOS runs-on: macos-latest steps: - name: Start PostgreSQL on MacOS run: | brew services start postgresql echo "Check PostgreSQL service is running" i=10 COMMAND='pg_isready' while [ $i -gt 0 ]; do echo "Check PostgreSQL service status" eval $COMMAND && break ((I--)) If [$i == 0]; then echo "PostgreSQL service not ready, all attempts exhausted" exit 1 fi echo "PostgreSQL service not ready, wait 10 more sec, attempts left: $i" sleep 10 done # Homebrew creates an account with the same name as the installing user, but no password - name: Create scheduler user run: | psql --command="CREATE USER scheduler PASSWORD 'somestrong'" --command="du" postgres - name: Create timetable database run: | createdb --owner=scheduler timetable PGPASSWORD=somestrong psql --username=scheduler --host=localhost --list timetable
There are only a few differences between Ubuntu and macOS:
- Use brew services to start the server.
- The pg_is ready procedure is more complicated.
As you probably noticed, you may skip sudo -u Postgres prefixes since the current user has all the rights needed in this environment.
2) PostgreSQL GitHub Actions for Ubuntu
setup-postgresql-ubuntu: if: true # false to skip job during debug name: Setup PostgreSQL on Ubuntu runs-on: ubuntu-latest steps: - name: Start PostgreSQL on Ubuntu run: | sudo systemctl start postgresql.service pg_isready - name: Create scheduler user run: | sudo -u postgres psql --command="CREATE USER scheduler PASSWORD 'somestrong'" --command="du" - name: Create timetable database run: | sudo -u postgres createdb --owner=scheduler timetable PGPASSWORD=somestrong psql --username=scheduler --host=localhost --list timetable
Nothing unusual here for Ubuntu users. You can use systemctl to start PostgreSQL and the pg_is ready utility to check if the server is running. To create a scheduler user, you can use a psql client in non-interactive mode. You can then send two commands to it:
- CREATE USER …;
- du — list users.
First, create the user. Second, output the list of users for control.
To create a timetable database, you can use the createdb utility. Pay attention to the fact that sudo -u Postgres allows you to not specify connection credentials because a system user is allowed to connect locally without any restrictions. Then, just like in the previous step, list the databases with psql for control. You have successfully integrated PostgreSQL GitHub.
3) PostgreSQL GitHub Action for Windows
To integrate PostgreSQL GitHub on the Windows operating system, please write the following commands on the PostgreSQL Windows OS interface.
Setup-postgresql-windows: if: true # false to skip job during debug name: Setup PostgreSQL on Windows runs-on: windows-latest steps: - name: Start PostgreSQL on Windows run: | $pgService = Get-Service -Name postgresql* Set-Service -InputObject $pgService -Status running -StartupType automatic Start-Process -FilePath "$env:PGBINpg_isready" -Wait -PassThru - name: Create scheduler user on Windows run: | & $env:PGBINpsql --command="CREATE USER scheduler PASSWORD 'somestrong'" --command="du" - name: Create timetable database run: | & $env:PGBINcreatedb --owner=scheduler timetable $env:PGPASSWORD = 'somestrong' & $env:PGBINpsql --username=scheduler --host=localhost --list timetable
With Windows, everything is different, but not as scary as haters usually paint it to be:
- The Windows environment uses PowerShell (pwsh) by default.
- Windows administers services.
The code is straightforward. The only thing we want to highlight is the ampersand sign (& aka call operator) before some commands. The call operator (&) allows you to execute a command, script, or function.
Since the PostgreSQL utilities psql and createdb are not in the PATH, you need to specify their full path. You use the environmental variable PGBIN for that. But after the concatenation operation, you’ll have a regular string. Therefore, a call operator will allow you to execute a resulting command.
Challenges of Setting up GitHub Postgres ETL Manually
- For PostgreSQL, remembering psql commands can be challenging. You can try to decode them. For instance, dt – describe tables, du – describe users, etc.
- Make sure you are using the recent virtual environment for your PostgreSQL installation.
Method 2: Using Hevo to Set up PostgreSQL GitHub Integration
Hevo helps you directly transfer data from various sources to PostgreSQL, Business Intelligence tools, Data Warehouses, or a destination of your choice in a completely hassle-free & automated manner.
Hevo is fully managed and completely automates the process of loading data from your desired source and enriching the data, and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled securely and consistently with zero data loss.
Hevo takes care of all your data preprocessing needs required to set up GitHub Postgres Integration and lets you focus on key business activities and draw a much more powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always has analysis-ready data in your desired destination.
The following steps can be implemented to set up GitHub Postgres Integration using Hevo:
Configure Source: Hevo can bring data from your GitHub account into your Destination for Free. Hevo connects to GitHub through Webhooks. Add Webhook URL to your GitHub Account, copy the generated webhook URL, and add it to your GitHub account.
Integrate Data: Complete GitHub Postgres Integration by providing your PostgreSQL database credentials, such as your authorized Username and Password, along with information about your Host IP Address and Port Number value. You will also need to provide a name for your database and a unique name for this destination.
Deliver smarter, faster insights with your unified data
Using manual scripts and custom code to move data into the warehouse is cumbersome. Changing API endpoints and limits, ad-hoc data preparation, and inconsistent schema make maintaining such a system a nightmare. Hevo’s reliable no-code data pipeline platform enables you to set up zero-maintenance data pipelines that just work.
Wide Range of Connectors – Instantly connect and read data from 150+ sources, including SaaS apps and databases, and precisely control pipeline schedules down to the minute.
In-built Transformations – Format your data on the fly with Hevo’s preload transformations using either the drag-and-drop interface or our nifty python interface. Generate analysis-ready data in your warehouse using Hevo’s Postload Transformation
Near Real-Time Replication – Get access to near real-time replication for all database sources with log-based replication. For SaaS applications, near real-time replication is subject to API limits.
Auto-Schema Management – Correcting improper schema after the data is loaded into your warehouse is challenging. Hevo automatically maps source schema with the destination warehouse so you don’t face the pain of schema errors.
Transparent Pricing – Say goodbye to complex and hidden pricing models. Hevo’s Transparent Pricing brings complete visibility to your ELT spend. Choose a plan based on your business needs. Stay in control with spend alerts and configurable credit limits for unforeseen spikes in the data flow.
24×7 Customer Support – With Hevo, you get more than just a platform. You get a partner for your pipelines. Discover peace with round-the-clock “Live Chat” within the platform. What’s more, you get 24×7 support even during the 14-day free trial.
Security – Discover peace with end-to-end encryption and compliance with all major security certifications,, including HIPAA, GDPR, and SOC-2.Get started for Free with Hevo!
You have now carried out a continuous integration between PostgreSQL GitHub using GitHub (PowerShell (pwsh) actions on the latest GitHub Actions Virtual Environments, with the latest PostgreSQL installed by default. Be that as it may, there is still a lot more to learn with Postgres and there’s a much simpler method of integrating PostgreSQL GitHub using the Hevo platform.
Hevo helps you directly transfer data from a source of your choices such as GitHub to a Data Warehouse, Business Intelligence, or desired destination like PostgreSQL in a fully automated and secure manner without having to write the code. It supports 150+ Sources including 30+ free Data Sources like GitHub. It will make your life easier and make data integration hassle-free. It is User-Friendly, Reliable, and Secure.
Want to take Hevo for a spin?
SIGN UP and experience the feature-rich Hevo suite first hand.
You can also look at the unbeatable pricing that will help you choose the right plan for your business needs.
Share your experience of learning the methods of GitHub Postgres ETL in the comments!