Low-code Automation for Google BigQuery

By Clive Bearman, Director of Product Marketing; and Damien Edwards, Analytics Data Architect

Google BigQuery is a cost-effective, highly scalable, serverless data warehouse designed for business agility. The platform manages all resources, scales automatically and is highly available. Consequently, warehouse administrators don’t have to spend time making typical implementation decisions like CPU sizing or maximum storage allocation.

Google BigQuery supports two different SQL dialects:

  1. Standard SQL: Standard SQL is compliant with the SQL 2011 standard and is the preferred dialect for querying data.
  2. Legacy SQL: This dialect was supported by the first version of BigQuery.

Click here for a complete explanation of the Google BigQuery Architecture.

Why Automate Google BigQuery With Qlik?

  1. Improve Initial data loading
  2. Reducing manual SQL coding warehouse administration
  3. Automating data warehouse updates
  4. Decreasing the time required to prep and provision data for downstream analytics

In fact, we can automate these areas and many more with the Qlik Data Integration (QDI) platform. The latest release of the QDI solution is optimized for Google BigQuery and automates many aspects of BigQuery operation. For example, we can use Qlik Replicate for near real-time change data capture (CDC) and Qlik Compose for Data Warehouses to build internal data warehouse structures and provision data mart datasets.

Qlik Data Integration For Google BigQuery

The following paragraphs describe features specifically created and optimized for data warehouse automation. They are as follows:

  • Data Warehouse Model Generation
  • Automated Mapping Generation
  • Data Warehouse ETL Generation
  • Data Mart ETL Generation
  • Workflow Generation and Orchestration

Data Warehouse Model Generation

Qlik Compose for Data Warehouses makes it easy to generate a logical model from source data by connecting to source and introspecting the metadata to produce a model that conforms to the third normal form(3NF)/Data Vault methodology. Qlik also provides support for assigning Type1/Type2 history attributes within your logical models, improving the data loading process for the BigQuery dataset tables.

Automated Mapping Generation

Qlik Compose for Data Warehouses will create the BigQuery dataset tables and generate mappings for the full and CDC ETL sets. In the ETL sets, Compose will automatically generate table mappings between the landing area columns and staging columns for the BigQuery Data Warehouse dataset tables. Ultimately, this reduces development lifecycle time.

Data Warehouse ETL Generation

Qlik automatically generates and orchestrates the SQL syntax required to execute the ETL process within BigQuery that complete the full load of the tables in the data warehouse. Qlik also automatically generates and orchestrates the SQL syntax necessary to load the CDC data into BigQuery datasets.

Data Mart ETL Generation

Qlik provides a data wizard that automatically creates a transactional data mart utilizing the BigQuery data warehouse data set. Qlik’s automation refreshes data in a transactional BigQuery data mart without complex manual transformation or scripting.

Workflow Generation and Orchestration

You can also use Qlik to generate a full and CDC workflow with tasks to load the BigQuery data warehouse and data mart tables. The workflows to load CDC data can be scheduled through Qlik or from a third-party scheduling tool.

Six Steps To Google BigQuery Automation

It just takes six steps to automate Google Big Query with Qlik, and they are as follows:


Step 1. Automate Data Ingest and Update

Configure Qlik Replicate Task to use Google BigQuery as an endpoint. Qlik Replicate will create the dataset in BigQuery from virtually any data source whether on-premises or in the cloud. Once Qlik Replicate completes the full data load, the tasks transitions to CDC mode to replicate source data changes in near real-time.

Step 2. Qlik Compose Source and Data Warehouse Configuration

Configure a Qlik Compose for Data Warehouses project landing and data warehouse connection. The data warehouse connection will provide details of the target data warehouse and data mart dataset. The landing connections specifies the BigQuery data set that will be used for integration within the data model.

Step 3. Automatically Generate the Warehouse Data Model

The model panel is used to intelligently discovered table metadata for the logical and physical data warehouse model. Manage Model controls which attribute changes and history types can be applied to the data warehouse model.

Step 4. Generate Data Warehouse Automation Instructions

The data warehouse panel selects tables that will be created as Google BigQuery tables in the data warehouse dataset. Table mappings will be created that manage the relationships between the landing and staging columns in the BigQuery tables. Once completed, the ETL process code is generated and ready for execution on BigQuery.

Step 5. Create Transactional Data Marts

The data mart panel provides a “New Star Schema” wizard to create the transactional star schema. The wizard starts by asking you to choose a fact table from the data warehouse dataset. You then select the parent dimension tables for the chosen fact table. You also select an attribute to serve as the transaction date for the schema. Once the schema wizard completes, BigQuery syntax is generated and executed to load create and load the data mart tables.

Step 6. Orchestrate the Warehouse Workflow

When the data warehouse CDC ETL are set, a task is created and added to the default CDC workflow in the Qlik Compose monitor screen. A data mart task is also added to the default CDC workflow.

Conclusion

Google BigQuery is a market-leading cloud data warehouse that provides exceptional performance when manipulating large data sets for analytic workloads. Although BigQuery offers virtually hands-free management, Qlik’s Data Warehouse automation solution will save you significant development time when used with the Google platform. Complete the following form to request a personalized demonstration of Qlik’s low-code data warehouse automation solution.

Automating Google #BigQuery w/ @Qlik's #data integration platform can save you significant development time, per @cbearman

 

In this article:

Keep up with the latest insights to drive the most value from your data.

Get ready to transform your entire business with data.

Follow Qlik