PeterK
Xpert
In a recent user group, I described our solution to keeping track of Exasol metadata. This article is meant to provide more details of our implementation for those who may wish to adapt parts of it.

 

Background

First of all, by "Exasol metadata" I mean: table, view, function, script, and schema DDL, plus users, roles, permissions, connections, system parameters, etc.

The problems we are trying to solve are:

  1. Historicizing the metadata
    • Being able to easily see what any of the metadata was in the past
    • Being able to easily restore any portion of the metadata based on a previous snapshot
  2. Being aware of all metadata changes in Exasol so that we can evaluate their impact on:
    • Security
    • Performance
  3. Being able to give view authors feedback regarding:
    • Best practices. Guiding them in the use of native or custom functions/scripts.
    • SQL formatting
    • Performance

Our solution, in summary, is to dump all the desired Exasol metadata on a regular basis into text files that are then stored and tracked in a code-versioning system. The versioning system handles change reviews, provides a feedback mechanism, provides an audit trail, and can be used as the source for restoring metadata from a backup. 

 

Prerequisites

Our specific implementation requires familiarity with Golang and git/Bitbucket; however, these can generally be replaced by any coding/scripting language and code-versioning system that you are comfortable with.

 

How to track Exasol metadata in git

 

Step 1: Decide on the update interval and scheduler

You need to decide how often you want to capture a snapshot of Exasol's metadata. Keep in mind the performance impact on Exasol: depending on how much metadata you are capturing, a snapshot can take a minute or longer. We chose to capture it daily.

You'll also need a way to regularly run the job. This could be as simple as a cron job, or you can use a job-scheduling app.
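For example, with cron a single crontab entry is enough. A minimal sketch, assuming the job is packaged as a script at a hypothetical path:

# Capture the Exasol metadata snapshot daily at 02:00 (script path is a placeholder).
0 2 * * * /opt/exasol-meta/backup.sh >> /var/log/exasol-meta-backup.log 2>&1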

 

Step 2: Open a branch in git/Bitbucket

git uses "branches" to keep track of related changes in text files. Each branch will eventually be sent to one or more reviewers for approval, so you may want to open multiple git branches, each containing related Exasol metadata that pertains to a particular group of reviewers. For instance, changes to users, roles, and permissions could be sent to DBAs for review, while changes to views could be sent to whoever is responsible for that team's views.

At the beginning of the job run you'll need to open a git branch:

git clone <your-git/bitbucket-repo> <local-dir>

cd <local-dir>

git checkout -B <branch-name>

The branch name should probably contain today's date and some indication of the branch's purpose, e.g. "2021-11-17-USER-PERMISSIONS".
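In a shell-based job script, the date-stamped branch name can be derived automatically. A minimal sketch (the "USER-PERMISSIONS" suffix is just an example purpose label):

# Builds a name like "2021-11-17-USER-PERMISSIONS" from today's date.
branch_name="$(date +%Y-%m-%d)-USER-PERMISSIONS"
git checkout -B "$branch_name"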

 

Step 3: Export the Exasol metadata

Next you'll need to export whatever Exasol metadata you want to keep track of into text files within your local git branch. 

If you are familiar with Golang you can use this library to do so. You will need to write a Golang wrapper around the library in order to customize how Exasol connection credentials and command-line options are provided. This particular library stores the DDL for each database object in an individual file, in a directory structure that mirrors the Exasol schemas.

You can also use Exasol's database migration SQL as a starting point for writing a script, in the language of your choice, that converts Exasol metadata to SQL/DDL which can then be saved in files within your git checkout.

Either way, the end result of this step should be a set of text files in your local git directory containing the DDL that represents the current state of your Exasol instance.
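As a rough sketch, the export step of the job script might then be a single call to your wrapper. Everything below is illustrative: "exasol-ddl-export" stands in for whatever wrapper binary you build around the library, and the flags are placeholders for your own credential and output options:

# Hypothetical wrapper invocation; the binary name and flags are placeholders.
./exasol-ddl-export \
    --host <exasol-host> \
    --credentials-file ~/.exasol-credentials \
    --out-dir <local-dir>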

 

Step 4: Commit any changes to git

It's possible there have been no metadata updates since the last capture, so first check for changes so that you don't send out empty branches for review:

git status -s

If that returns no output then there have been no changes and you can exit the job.

If there is output then you'll need to commit the changes to git:

git add --all

git commit -m "Exasol metadata backup: <branch-name>"

git push origin <branch-name>

At this point you can delete your local git checkout directory if you wish:

cd ../

rm -rf <local-dir>
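Putting the step together, the check-commit-push logic in the job script might look like this (reusing the $branch_name variable from the Step 2 sketch):

# Exit early when the export produced no changes since the last capture.
if [ -z "$(git status -s)" ]; then
    exit 0
fi
git add --all
git commit -m "Exasol metadata backup: $branch_name"
git push origin "$branch_name"
cd ../ && rm -rf <local-dir>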

 

Step 5: Send out a Bitbucket pull request for review

Different code-versioning systems have different ways of triggering requests for review. In Bitbucket's case it can be done via its REST API:

curl "https://bitbucket-url/rest/api/1.0/projects/your-proj/repos/your-repo/pull-requests" \
    --user "$user:$password" \
    --header "Content-Type: application/json" \
    --data @- <<EOJ
{
   "title": "$title",
   "description": "$description",
   "fromRef": { "id": "refs/heads/$branch_name" },
   "toRef": { "id": "refs/heads/master" },
   "reviewers": [
      {"user":{"name":"..."}}
   ]
}
EOJ

This is where you decide who gets to review this portion of the metadata (multiple reviewers if so configured).
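The shell variables referenced in the request would be set earlier in the job script. A minimal sketch (the account name and wording below are illustrative, not prescribed):

# Illustrative values; use your own service account and PR wording.
user="svc-exasol-backup"
password="..."   # left elided; source this from your secrets mechanism
title="Exasol metadata backup: $branch_name"
description="Automated daily export of Exasol metadata"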

 

Step 6: Review the changes and provide feedback

At this point the various reviewers can look over the changes and provide feedback to the view authors (or whoever maintains that particular metadata).

Important: Note that the metadata changes highlighted in the pull request represent changes that have already been made in Exasol, so this is more of an "after-the-fact" review. If you want a process that requires review before the changes are made in Exasol, consider requiring that the changes first be made in a non-production Exasol instance, followed by a review/approval, and only then applied/pushed to the production instance.

 

Step 7: Merge the changes

Since the changes have already been made in Exasol, the concept of "failing/rejecting a pull request" doesn't really make sense. So regardless of whether the feedback is positive or negative, in the end the pull request needs to be merged into the git trunk so that the subsequent job run can be based on it. Bitbucket can be configured to either auto-merge pull requests upon approval or require a separate merge step.
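If you prefer the job (or a follow-up script) to perform the merge itself, Bitbucket Server's 1.0 REST API (the same API used in Step 5) exposes a merge endpoint. A hedged sketch, assuming $pr_id and $pr_version were captured from the create-pull-request response:

curl --request POST \
    "https://bitbucket-url/rest/api/1.0/projects/your-proj/repos/your-repo/pull-requests/$pr_id/merge?version=$pr_version" \
    --user "$user:$password" \
    --header "Content-Type: application/json"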

 

Additional Notes

If you wish, you can also choose to export small amounts of table or view data (the actual content, not the DDL) and store/track it in git. The Golang library referenced above includes an option to enable this. This is handy for important lookup tables that, for instance, control security access: you would then get an alert (pull request) whenever the data changes. This can be useful for catching mistakes or for making a wider group of people aware of changes.
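Continuing the hypothetical exporter invocation sketched in Step 3, enabling data export for a specific lookup table might look like this. The flag name is a placeholder, not the library's actual option; check your exporter's documentation for the real switch:

# "--export-table-data" is a hypothetical flag; substitute your exporter's real option.
./exasol-ddl-export \
    --host <exasol-host> \
    --credentials-file ~/.exasol-credentials \
    --out-dir <local-dir> \
    --export-table-data <schema>.<lookup-table>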

 


We're happy to hear your experiences and feedback on this article below!

Comments
littlekoi
Xpert

Keeping Exasol metadata / DDL in Git is a great idea.

However, it might be better to do it the other way around: change the DDL in Git first, review and test it, and apply the changes to production Exasol afterwards.

The full workflow can look like this:

  1. Developers implement changes in individual task branches and test them in individual DEV environments. Task branches are pushed to Git.
  2. Other developers review the changes and approve or reject them.
  3. All approved tickets/branches are combined into a single release branch, which is automatically tested in a STAGING environment.
  4. If everything is alright, the changes are deployed and applied to the PROD environment.

 

With this approach you will be able to:

  • trace every change not only to the specific day it was applied, but also to a specific person and a specific ticket;
  • track all changes, including everything that happened between "syncs";
  • review every change before it goes live, not after;
  • create a consistent, fresh DEV environment from scratch at any point in time, using only the files in Git.

 

Thank you!

PeterK
Xpert

Thank you @littlekoi - this is a good point that is worth explaining further. The workflow I described above is only one of many possible workflows that can be implemented using the same basic tools (metadata exporter + code-versioning tool). The warehouse admins should decide which workflow is best for their company. You could even have multiple workflows for different types of metadata (this is what we do).

Example 1:

We allow non-developers (e.g. business analysts) to create views directly in production Exasol, in sandboxed schemas, for their own data-mining purposes or for their own dashboards/visualizations. These are non-critical, internal-use-only views, so there is no need to require prior review. For this use case the workflow I described works well: it allows admins or team leads to review the non-developers' DDL changes on a daily basis and provide advice.

Example 2:

For non-developers (users without direct access to git) creating critical views that should be reviewed/approved before reaching production, the workflow is similar to the above but happens in a non-production Exasol environment. The users build their views, and on a daily (or other) basis the changes automatically generate a PR. The reviewers don't merge the PR until they are satisfied with the changes. Once it is merged, it can be deployed to production based on the DDL in git. Since the stored files are valid DDL, the deployment can even be automated if you integrate with a deployment tool (e.g. Jenkins + Bitbucket).

Example 3:

For developers who have access to git, the workflow could be that they create their own git branches and update them with the DDL-export tool on an ad-hoc basis. In this case there is no need to automate the daily PR: the developer can send the PR when they are ready, and the rest of the flow proceeds like Example 2.

Example 4:

There are other use cases, not involving views at all, where you want the review to be after-the-fact in production. For example, when there are multiple DBAs it is very handy for all of them to see a PR containing all of the previous day's changes in non-view metadata like users, roles, permissions, parameters, etc. This keeps everybody aware of the changes and is very handy for catching mistakes. (This has saved me multiple times when object permissions were silently dropped by a "CREATE OR REPLACE ..."!)

 

So I think the important takeaway here is that, by using existing metadata-export and code-versioning tools, we can get quite far in taming the wild beast that is warehouse metadata 🙂

ugamarkj
Xpert

My implementation is similar to Peter's. My developers have their own dev schemas in production. My managers are responsible for reviewing the dev-schema objects before migrating them to production. We use DataGrip to help with this: a change log and JIRA ticket references are included in a comment at the top of each view, and the DataGrip diff viewer lets my managers compare the code to what is in production.

So Git/Bitbucket is more of a historical record that can also be linked to our JIRA stories to improve forensic and recovery capabilities. This works well enough for us. Having it be part of the production migration workflow might be nice, but it adds complexity and the potential benefit isn't currently worth the overhead.