What we like about BigQuery


As a data person, your day-to-day work likely revolves around a data warehouse. We have worked with a plethora of cloud data warehouses over the years, and this is the second post in our series, covering what we like about Google BigQuery.

Find the other parts in this series here:

The Pricing Model

BigQuery employs a consumption-based pricing model, in line with the industry standard. However, its pricing intricacies warrant closer examination:

On-Demand Queries and Storage Usage

The core of BigQuery’s pricing revolves around on-demand queries and storage utilisation. Users are billed based on the amount of data processed by queries and the storage capacity occupied by their datasets.
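
To make the on-demand model concrete, here is a minimal sketch of how query cost relates to bytes scanned. The per-TiB price and the free tier figure are assumptions based on published list pricing at the time of writing; check current pricing for your region before relying on them.

```python
# Rough on-demand query cost estimator for BigQuery.
# ASSUMPTION: ~$6.25 per TiB scanned and a 1 TiB/month free tier,
# which match list pricing at the time of writing but may change.

ON_DEMAND_USD_PER_TIB = 6.25
FREE_TIB_PER_MONTH = 1.0

def estimate_query_cost_usd(bytes_processed: int, free_tib_remaining: float = 0.0) -> float:
    """Estimate the on-demand cost of a query from its bytes processed."""
    tib = bytes_processed / 2**40
    billable_tib = max(tib - free_tib_remaining, 0.0)
    return round(billable_tib * ON_DEMAND_USD_PER_TIB, 4)

# A query scanning 500 GiB with no free tier left:
print(estimate_query_cost_usd(500 * 2**30))
```

Note that you are billed on bytes *scanned*, not bytes *returned*, which is why partitioning and column selection matter so much for cost.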

Pricing Tiers and Flexibility

BigQuery offers various pricing tiers, providing flexibility to cater to diverse workloads. Users can choose between on-demand pricing for sporadic usage and flat-rate pricing for more predictable, consistent workloads. The latter option, though, involves a commitment to a fixed monthly cost — which can be appealing when a business is starting to scale (so you expect to always use the reserved amounts) or when you are working with a LOT of data.
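
A quick way to sanity-check that trade-off is a breakeven calculation: how much would you need to scan each month before the fixed commitment beats on-demand? Both prices below are illustrative assumptions, not quotes.

```python
# Back-of-the-envelope breakeven between on-demand and a flat-rate
# commitment. ASSUMPTION: both figures are illustrative placeholders;
# substitute your actual negotiated or list prices.

ON_DEMAND_USD_PER_TIB = 6.25
MONTHLY_COMMITMENT_USD = 2_000.0

def breakeven_tib_per_month(commitment_usd: float, usd_per_tib: float) -> float:
    """TiB scanned per month above which the commitment is cheaper."""
    return commitment_usd / usd_per_tib

print(breakeven_tib_per_month(MONTHLY_COMMITMENT_USD, ON_DEMAND_USD_PER_TIB))
```

If your monthly scan volume reliably sits above that figure, the fixed monthly cost wins; if usage is spiky or unpredictable, on-demand usually stays cheaper.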

Pro Tips for Cost Management

⚠️ Query Cost Monitoring: Due to the inherent complexity of BigQuery’s pricing structure, vigilant monitoring of query costs is essential. Efficient queries contribute directly to cost savings. Craft queries thoughtfully, minimising unnecessary processing. Leverage tools like the Query Execution Details page to analyse and optimise query performance. Use the Query Validator to identify potential optimisations before execution.

⚠️ Data Storage Monitoring: Regularly review and manage stored data to avoid unnecessary costs. Implement data lifecycle policies to automatically expire or move outdated data to cheaper storage options like BigQuery’s Long-Term Storage. This ensures you’re not paying for storage that isn’t actively contributing to analytics.
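
One useful detail here: BigQuery moves tables and partitions to the long-term storage rate automatically once they go roughly 90 days without modification, at about half the active price. A small sketch of that effect (prices are assumed list figures for illustration):

```python
from datetime import date, timedelta

# Sketch of BigQuery's automatic storage tiering. BigQuery itself
# demotes data to long-term storage after ~90 days without writes.
# ASSUMPTION: the two per-GiB prices below are illustrative.

ACTIVE_USD_PER_GIB_MONTH = 0.02
LONG_TERM_USD_PER_GIB_MONTH = 0.01
LONG_TERM_THRESHOLD_DAYS = 90

def monthly_storage_cost_usd(size_gib: float, last_modified: date, today: date) -> float:
    """Estimate monthly storage cost for a table or partition."""
    idle_days = (today - last_modified).days
    rate = (LONG_TERM_USD_PER_GIB_MONTH
            if idle_days >= LONG_TERM_THRESHOLD_DAYS
            else ACTIVE_USD_PER_GIB_MONTH)
    return round(size_gib * rate, 2)

today = date(2024, 6, 1)
print(monthly_storage_cost_usd(1000, today - timedelta(days=120), today))
print(monthly_storage_cost_usd(1000, today - timedelta(days=10), today))
```

The practical takeaway: frequent rewrites of old partitions reset the 90-day clock and keep data at the active rate, so append-only layouts are cheaper to keep around.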

⚠️ Leverage BigQuery’s cost controls: Set query processing limits to avoid unexpected spikes in usage. Establish budget alerts to receive notifications when costs approach predefined thresholds. This proactive approach allows users to manage and adjust usage based on budget considerations.
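
In the official Python client these controls map to QueryJobConfig options such as dry_run and maximum_bytes_billed. A minimal local sketch of the same idea, a pre-flight guard that refuses queries whose estimated scan exceeds a cap (the cap value is an arbitrary example):

```python
# Pre-flight cost guard in the spirit of BigQuery's maximum_bytes_billed
# setting: a query whose dry-run estimate exceeds the cap never runs.
# ASSUMPTION: the 100 GiB cap is an arbitrary example value.

MAX_BYTES_BILLED = 100 * 2**30  # hypothetical 100 GiB per-query cap

def within_budget(estimated_bytes: int, cap: int = MAX_BYTES_BILLED) -> bool:
    """Return True if the estimated scan fits under the per-query cap."""
    return estimated_bytes <= cap

print(within_budget(20 * 2**30))   # small query: allowed
print(within_budget(2 * 2**40))    # 2 TiB scan: rejected
```

With the real client, setting maximum_bytes_billed on the job config makes BigQuery itself fail the query before any bytes are billed, which is safer than a client-side check alone.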

⚠️ Use table decorators: When working with time-based data, use table decorators to query specific snapshots or ranges of a table. This minimises the amount of data processed, reducing costs. Take advantage of this feature, especially when dealing with large datasets with historical information. Note that decorators are a legacy SQL feature; in GoogleSQL, the equivalent point-in-time reads use the FOR SYSTEM_TIME AS OF clause.
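
A small sketch of the GoogleSQL form, building a point-in-time query as a string. The table name is a placeholder; FOR SYSTEM_TIME AS OF is the real GoogleSQL time-travel clause.

```python
from datetime import datetime, timezone

# Build a GoogleSQL time-travel query, the standard-SQL equivalent of
# legacy table decorators. ASSUMPTION: the table path is a placeholder.

def time_travel_query(table: str, as_of: datetime) -> str:
    """Query a table's contents as they existed at a past timestamp."""
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM `{table}` "
        f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts} UTC'"
    )

sql = time_travel_query("my_project.my_dataset.events",
                        datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
print(sql)
```

Keep in mind the requested timestamp must fall inside the time-travel window (see the drawbacks section below).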

Security

BigQuery employs a comprehensive set of security features and practices to ensure the protection of data and maintain a secure environment for users, including:

Authentication and Authorisation

  • Identity and Access Management (IAM): BigQuery integrates with Google Cloud’s IAM, enabling administrators to control access to BigQuery resources. IAM allows the assignment of roles (such as editor or viewer) to users, groups, or service accounts, ensuring fine-grained control over permissions. This is not always as easy as it sounds: for instance, you can grant a role on a BigQuery resource without granting visibility at the project or organisation level, so access appears to be granted but the user still cannot actually reach the data.
  • OAuth and Service Account Authentication: Users can authenticate through OAuth 2.0, and service accounts provide a way for applications and virtual machines to access BigQuery securely.
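
For reference, GCP APIs represent these grants as policy bindings of a role to a set of members. A minimal sketch of that shape (the member identities are examples; roles/bigquery.dataViewer is a real predefined role):

```python
# Minimal sketch of an IAM policy binding in the shape GCP APIs use.
# ASSUMPTION: the member identities are placeholder examples.

def make_binding(role: str, members: list) -> dict:
    """Build an IAM policy binding: one role granted to many members."""
    return {"role": role, "members": members}

binding = make_binding(
    "roles/bigquery.dataViewer",
    [
        "user:analyst@example.com",
        "serviceAccount:etl@my-project.iam.gserviceaccount.com",
    ],
)
print(binding["role"])
```

Note the two member prefixes: user: for human accounts authenticated via OAuth, serviceAccount: for workloads, mirroring the two authentication paths above.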

Global Data Redundancy

BigQuery provides global data replication, ensuring redundancy across multiple geographic locations. This not only enhances availability but also contributes to disaster recovery and data durability.

Integration with Google Cloud Ecosystem

Integration with the broader Google Cloud ecosystem provides numerous advantages for BigQuery users. This seamless integration offers a unified platform for cloud computing, allowing easy accessibility and storage of data through services like Cloud Storage. Users can efficiently ingest and transfer data using Cloud Pub/Sub, visualise and report data using Looker, and incorporate machine learning models through services like AI Platform and VertexAI. Google Analytics (and Google Ads) data is also very easily loaded into the system, as are Firebase events.

The integration extends to identity and access management policies, enabling consistent security measures. Users can also leverage serverless compute with Cloud Functions, orchestrate workflows using Cloud Composer, and collaborate seamlessly with Google Workspace.

Additionally, the integration includes monitoring and logging, data lifecycle management, global infrastructure benefits, and unified billing and cost management, creating a powerful and cohesive environment for comprehensive data solutions.

We find this super useful in cases where our clients might already have a significant presence on GCP. Beyond that, we love Dataform’s philosophy: we still prefer dbt for robust data model deployment, but if you want to get started quickly with proper modelling approaches, then GCP/BigQuery have you covered.

Real-time Analytics

BigQuery facilitates real-time analytics through features designed for continuous data ingestion and analysis. It supports streaming inserts, enabling the immediate processing of data as it arrives. The streaming buffer allows near real-time availability of data for queries. Time-partitioned tables aid in efficiently querying recent data, and continuous query processing ensures dynamic and up-to-date results. Event-triggered processing and materialised views enhance flexibility and performance.
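
One pattern that ties partitioning to real-time dashboards: restrict queries to the most recent partitions so each refresh only scans fresh data. A hedged sketch, building the query as a string (table name is a placeholder; _PARTITIONDATE is the real pseudo-column for ingestion-time partitioned tables):

```python
from datetime import date, timedelta

# Restrict a query to recent daily partitions so dashboard refreshes
# only scan fresh data. ASSUMPTION: the table path is a placeholder;
# _PARTITIONDATE is BigQuery's ingestion-time partition pseudo-column.

def recent_events_query(table: str, days: int, today: date) -> str:
    """Query only the last `days` daily partitions of a partitioned table."""
    cutoff = today - timedelta(days=days)
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE _PARTITIONDATE >= DATE '{cutoff.isoformat()}'"
    )

print(recent_events_query("my_project.my_dataset.events", 7, date(2024, 6, 1)))
```

Because the partition filter prunes everything older than the cutoff, this keeps both latency and per-refresh cost roughly constant as the table grows.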

Real-time analytics results can be seamlessly integrated with data visualisation tools for live dashboards. Monitoring and alerting capabilities, dynamic schema handling, and integration with external streaming platforms further contribute to a comprehensive real-time analytics environment in BigQuery.

Drawbacks of BigQuery

  1. Limited Time Travel Functionality:
    • Challenge: In contrast to some competitors like Snowflake, BigQuery provides a more constrained time travel functionality, featuring a shorter retention period for data versioning.
    • Impact: Users may find limitations in the historical data retrieval options, which could impact scenarios requiring longer data retention for analysis and recovery.
  2. Manual Clustering:
    • Challenge: BigQuery does not choose clustering keys for you: users must define clustering columns themselves when creating or altering tables (although, once keys are defined, BigQuery does re-cluster data automatically in the background).
    • Impact: Choosing and maintaining good clustering keys requires additional effort to organise and structure data effectively; poor choices impact the efficiency of certain queries and increase costs.
  3. Data Loading Performance:
    • Challenge: While BigQuery is highly scalable, the performance of data loading can vary based on the structure and characteristics of the data being ingested.
    • Impact: Users should be mindful of the potential fluctuations in data loading performance, especially when dealing with diverse and sizeable datasets, to ensure consistent and predictable data upload times.
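
On drawback 1, the window is worth encoding explicitly: BigQuery's time travel retains data for 7 days by default (configurable between 2 and 7 days per dataset), versus up to 90 days on some Snowflake editions. A small validator sketch:

```python
from datetime import datetime, timedelta, timezone

# Validate a requested restore point against BigQuery's time-travel
# window: 7 days by default, configurable between 2 and 7 days.

TIME_TRAVEL_WINDOW = timedelta(days=7)

def can_time_travel(as_of: datetime, now: datetime) -> bool:
    """True if `as_of` is still within the time-travel retention window."""
    return now - TIME_TRAVEL_WINDOW <= as_of <= now

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(can_time_travel(now - timedelta(days=3), now))
print(can_time_travel(now - timedelta(days=30), now))
```

Anything older than the window is gone for time-travel purposes, so longer retention needs explicit snapshots or exports.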

Understanding and addressing these challenges is crucial for users seeking to maximise the benefits of BigQuery while navigating its complexities. Diligent management and optimisation efforts can help mitigate these drawbacks and enhance the overall experience with the platform.

In summary, the choice between BigQuery and other cloud warehouses depends on various factors, including your organisation’s existing infrastructure, specific use cases, budget considerations, and preferences in terms of features and capabilities. It’s essential to evaluate all platforms based on your unique requirements before making a decision.