HeatWave
HeatWave provisions HeatWave clusters for accelerated processing of analytic queries. A HeatWave cluster consists of a MySQL DB System node and two or more HeatWave nodes. The MySQL DB System node includes a HeatWave plugin that is responsible for cluster management, query scheduling, and returning query results to the MySQL DB System. HeatWave nodes store data in memory and process queries.
When a HeatWave cluster is enabled, queries that meet certain prerequisites are automatically offloaded from the MySQL DB System to the HeatWave cluster for accelerated execution. Queries are issued from a MySQL client or application that interacts with the HeatWave cluster by connecting to the MySQL DB System node. Results are returned to the MySQL DB System node and to the MySQL client or application that issued the query.
This guide describes how to deploy and manage HeatWave clusters. It is intended for HeatWave administrators and assumes familiarity with MySQL and related tools. After deploying a HeatWave cluster, refer to the HeatWave User's Guide for information about how to load data and run queries.
Alternatively, refer to one of the HeatWave Quickstarts:
- The HeatWave tpch Quickstart walks you through adding a HeatWave cluster to a MySQL DB System, importing the
tpch
sample database into the into the DB System using the MySQL Shell Parallel Table Import Utility, manually loading data into HeatWave, and running queries. - The HeatWave airportdb Quickstart walks you through adding a HeatWave cluster to a MySQL DB System, importing the
airportdb
sample database into the DB System using the MySQL Shell Dump Load utility, loading data into HeatWave using Auto Parallel Load, and running queries.
For information about MySQL, see the product documentation available at MySQL Documentation.
Adding a HeatWave Cluster to a DB System
- The DB System was created using a MySQL.HeatWave.VM.Standard.E3 or MySQL.HeatWave.BM.Standard.E3 shape. The MySQL.HeatWave.BM.Standard.E3 shape is recommended for data sets of 10TB or more.
Managing HeatWave Clusters
This topic describes how to manage your HeatWave clusters using the console. The following topics are described:
HeatWave Cluster Information
This section describes the HeatWave Cluster Information and HeatWave Node sections of the MySQL DB System Details page, where you can view, manage, and edit your HeatWave cluster .
Table 14-1 HeatWave Cluster Information
Field | Description |
---|---|
Actions | The following actions are possible:
|
Cluster Size | The number of nodes in the cluster. |
Shape | The shape used for HeatWave cluster nodes. |
State | The state of the HeatWave cluster:
|
Created | The date and time the HeatWave cluster was created. |
Last Updated | The date and time the HeatWave cluster was last updated. |
Table 14-2 HeatWave Nodes
Column | Description |
---|---|
Node Id | The name of the HeatWave node. |
State | Current state of the HeatWave node:
|
Created | The date and time the HeatWave node was created. |
HeatWave Cluster Failure and Recovery
When a HeatWave node failure is detected, HeatWave automatically attempts to bring the node back online and reform the cluster.
Status of the HeatWave cluster can be monitored on the MySQL DB Systems list page. See Checking HeatWave Cluster Status for more information. HeatWave monitors node status using a heartbeat mechanism. If there is no response from a node after 60 seconds, the node is considered down, which triggers the recovery process.
When the HeatWave cluster is reformed, HeatWave attempts to reload data that was previously loaded. By default, data is reloaded from an Object Storage persistence layer that is created when the HeatWave cluster is enabled for the first time. To facilitate recovery, data is persisted to Object Storage when it is loaded into HeatWave and when data changes are propagated from the DB System to HeatWave. If reloading data from Object Storage fails for any reason, HeatWave reloads the data from the DB System. Loading data from Object Storage is faster because the data does not need to be converted to the HeatWave storage format, as is required when loading data from the DB System. Data removed from HeatWave when a table is unloaded is removed from Object Storage in a background operation.
To monitor the recovery process, you can connect to the DB System using a MySQL client and query the rapid_service_status
variable, which reports the status of the HeatWave cluster. For information about connecting to the DB System, see Connecting to a DB System.
mysql> SHOW STATUS LIKE 'rapid_service_status';
+----------------------+--------------+
| Variable_name | Value |
+----------------------+--------------+
| rapid_service_status | ONLINE |
+----------------------+--------------+
OFFLINE
: The HeatWave cluster has not formed yet. No operations can run.CLUSTERREADY
: The HeatWave cluster has formed but some tables are pending recovery. Operations are permitted on tables that are loaded. To determine which tables are loaded, you can query the HeatWave Performance Schema tables.mysql> USE performance_schema; mysql> SELECT NAME, LOAD_STATUS FROM rpd_tables,rpd_table_id WHERE rpd_tables.ID = rpd_table_id.ID;
ONLINE
: The HeatWave cluster is established and previously loaded tables are reloaded. Operations are permitted.RECOVERYFAILED
: There was an unsuccessful attempt to recover the tables after a HeatWave cluster failure. The cause of the failure could be a network error, a software failure, a hardware failure, or some other issue that cannot be resolved by the automated recovery process. In most such scenarios, stopping and starting the HeatWave cluster can resolve the issue.One possible cause is that the HeatWave cluster has run out of memory. After the HeatWave cluster is back online, ensure that you have enough nodes for your data by performing a node count estimate. See Generating a Node Count Estimate.
HeatWave Cluster Size Estimates
This topic describes HeatWave Cluster Node Count Estimates.
HeatWave Cluster Node Count Estimates provide a recommendation for the number of HeatWave nodes needed to run a workload. Recommendations are based on machine learning. When the service is started, database tables on which HeatWave queries are run need to be loaded to HeatWave cluster memory. The size of the HeatWave cluster needed depends on the tables and columns to be loaded, and the compression achieved in memory for this data. Under-provisioning the HeatWave cluster results in a data load or query execution failure due to space limitations. Over-provisioning the HeatWave cluster results in additional costs for unneeded resources. Based on the database tables the user intends to load to memory, machine learning is used to intelligently predict the number of HeatWave nodes needed.
The following topics are described:
Generating a Node Count Estimate
This topic describes how to generate a HeatWave node count estimate to determine the number of nodes required for your data.
A node count estimate is generated using machine learning techniques based on the node shape that you select and the data present in the DB System associated with the HeatWave cluster. You can generate a node count estimate when adding a HeatWave cluster to a DB System, or at any time after to adjust the number of nodes as your data increases or decreases in size.
- The data you intend to load into the HeatWave cluster must be present on the DB System.
- Optionally, log into your DB System and run
ANALYZE TABLE
on tables you intend to load into the HeatWave cluster. Estimates should generally be valid without runningANALYZE TABLE
, but runningANALYZE TABLE
ensures that estimates are as accurate as possible.
Node Count Estimate Table Errors
This topic describes table errors that may appear in node count estimate results.
Node Count Estimate Table Errors
Table Error | Description |
---|---|
TOO MANY COLUMNS TO LOAD | The table has too many columns. The column limit is 470. |
ALL COLUMNS MARKED AS NOT SECONDARY | There are no columns to load. All table columns are defined as NOT SECONDARY . |
CONTAINS VARLEN COLUMN WITH >8000 BYTES | A VARLEN column exceeds the 8000 byte limit. For more information on VARLEN , see VARLEN Encoding |
ESTIMATION COULD NOT BE CALCULATED | The estimate could not be calculated. For example, a table estimate may not be available if statistics for VARLEN columns are unavailable. |
UNABLE TO LOAD TABLE WITHOUT PRIMARY KEY | A table must be defined with a primary key before it can be loaded into HeatWave. |
HeatWave tpch Quickstart
The HeatWave tpch Quickstart walks you through adding a HeatWave cluster to a MySQL DB System, importing data into the into the DB System using the MySQL Shell Parallel Table Import Utility, manually loading data into HeatWave, and running queries.
The following topics are described:
Prerequisites for the tpch Quickstart
This topic lists the prerequisites for the HeatWave tpch Quickstart.
- An operational MySQL DB System created using the MySQL.HeatWave.VM.Standard.E3 or MySQL.HeatWave.BM.Standard.E3 shape. If you do not have a MySQL DB System, refer to Getting Started with MySQL Database Service for the steps required to create one. Make note of the IP address of the DB System's Endpoint, and the administration user and password.
- A running Compute instance (Oracle Linux is used in this Quickstart) attached to a public subnet on the same VCN as the MySQL DB System. Make note of the public IP address of the compute instance. For information about setting up a Compute instance, refer to Creating a Compute Instance.
- MySQL Shell 8.0.22 or higher installed on the Compute instance. For installation instructions, see Connecting to the MySQL DB System with SSH and MySQL Shell.
- Access to Object Storage and an existing bucket.
- A valid OCI CLI configuration file. See SDK and CLI Configuration File. If you have installed and configured the Command Line Interface in the default location, you have a valid configuration file. If you have not installed and configured the CLI, you must either install it, or create a configuration file manually. MySQL Shell references the CLI configuration file when accessing the Object storage bucket.
- In addition to the mandatory policies for the MySQL DB System, you, or your group, have been granted the
mysql-analytics
policies described in Policy Details for MySQL Database Service. - The
tpch
sample database referenced in the Quickstart is an ad-hoc decision support database derived from the TPC Benchmark™ H (TPC-H) specification. The database consist of eight separate tables. For an overview of the database schema, refer to the Logical Database Design section of the specification document. Example DDL statements are provided for creating the schema and tables, but you must supply your own data files for populating thetpch
sample database. You can generate data using thedbgen
tool provided in the TPC-H Tools download from TPC Download Current. At least 1GB of data is recommended. For instructions, refer to the reference section of the HeatWave User Guide.
Uploading tpch Sample Data to Object Storage
This topic describes how to upload tpch
sample data files to an Object Storage bucket.
tpch
sample data files referenced in the following instructions are in CSV format with fields terminated by the pipe "|" character. In total, the files should contain approximately 1GB of data. The tpch
sample data files include:nation.tbl
region.tbl
part.tbl
supplier.tbl
partsupp.tbl
customer.tbl
orders.tbl
lineitem.tbl
tpch
sample data files to the Object Storage bucket:- In the Oracle Cloud Infrastructure Console navigation menu, go to Object Storage, and then select Object Storage.
- From the list of Buckets, click Create Bucket or select your object storage bucket.
- On the Bucket Details page, under Objects, click Upload.
- In the Upload Objects panel, drag and drop the
tpch
sample database files to the dropzone, or click select files to locate them on your machine. - Click Upload, and then click Close.
- Note the object URLs for each file that you uploaded. In the list of objects, click the Actions icon (three dots) for an object, and then click View Object Details. The object URLs are required later when loading data.
Creating the tpch Sample Database and Importing Data
This topic describes how to create the tpch
sample database on the MySQL DB System and import the sample data. The sample data must be available on the MySQL DB System before it can be loaded into the HeatWave Cluster.
Sample database creation and import operations are performed using MySQL Shell. The MySQL Shell parallel table import utility provides fast data import for large data files. The utility analyzes an input data file, divides it into chunks, and uploads the chunks to the target MySQL DB System using parallel connections. The utility is capable of completing a large data import many times faster than a standard single-threaded upload using a LOAD DATA
statement. For additional information, see Parallel Table Import Utility.
If you prefer to use your own data instead of the
tpch
sample database, refer to Importing and Exporting for information about loading data into the MySQL DB System. After loading data into the MySQL DB System, refer to the HeatWave User Guide for information about preparing data and loading it into the HeatWave Cluster.tpch
sample database on the MySQL DB System and import data:Adding a HeatWave Cluster
This topic describes how to add a HeatWave cluster to your MySQL DB System.
To add a HeatWave cluster:
- Open the navigation menu and select Databases. Under MySQL, click DB System.
- On the MySQL DB Systems page, select your DB System.
- On the MySQL DB Systems Details page, select Add HeatWave Cluster from the More Actions drop-down menu.
- On the Add HeatWave Cluster dialog, click Select Shape to select a shape for your HeatWave nodes. The shape defines the number of CPU cores, the amount of RAM, and so on. Currently, only the MySQL.HeatWave.VM.Standard.E3 shape is supported for HeatWave nodes.
- Define the number of HeatWave nodes to create. The default number of nodes is 2, which is a sufficient number of nodes for the HeatWave
tpch
Quickstart, assuming roughly 1GB of data. Optionally, you can perform a node count estimate to determine an appropriate number of nodes to create. For details, see Generating a Node Count Estimate. - Click Add HeatWave Cluster to create the HeatWave cluster.
Loading tpch Data Into the HeatWave Cluster
This topic describes how to load tpch
data into the HeatWave cluster.
RAPID
as the secondary engine, and executing table load operations.tpch
sample data into the HeatWave cluster:Running tpch Queries
This topic describes how to query tpch
data in the HeatWave cluster.
For other tpch
sample database queries that you can run, see Additional tpch Queries. For more information about running queries, refer to the HeatWave User Guide.
Additional tpch Queries
This topic provides additional tpch
queries that you can run to test the HeatWave Cluster.
TPCH-Q1: Pricing Summary Report Query
As described in the TPC Benchmark™ H (TPC-H) specification: "The Pricing Summary Report Query provides a summary pricing report for all lineitems shipped as of a given date. The date is within 60 - 120 days of the greatest ship date contained in the database. The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG
and LINESTATUS
, and listed in ascending order of RETURNFLAG
and LINESTATUS
. A count of the number of lineitems in each group is included."
SELECT
l_returnflag,
l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY l_returnflag , l_linestatus
ORDER BY l_returnflag , l_linestatus;
TPCH-Q3: Shipping Priority Query
As described in the TPC Benchmark™ H (TPC-H) specification: "The Shipping Priority Query retrieves the shipping priority and potential revenue, defined as the sum of l_extendedprice * (1-l_discount)
, of the orders having the largest revenue among those that had not been shipped as of a given date. Orders are listed in decreasing order of revenue. If more than 10 unshipped orders exist, only the 10 orders with the largest revenue are listed."
SELECT
l_orderkey,
SUM(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < DATE '1995-03-15'
AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey , o_orderdate , o_shippriority
ORDER BY revenue DESC , o_orderdate
LIMIT 10;
TPCH-Q9: Product Type Profit Measure Query
As described in the TPC Benchmark™ H (TPC-H) specification: "The Product Type Profit Measure Query finds, for each nation and each year, the profit for all parts ordered in that year that contain a specified substring in their names and that were filled by a supplier in that nation. The profit is defined as the sum of [(l_extendedprice*(1-l_discount)) - (ps_supplycost * l_quantity)]
for all lineitems describing parts in the specified line. The query lists the nations in ascending alphabetical order and, for each nation, the year and profit in descending order by year (most recent first). "
SELECT
nation, o_year, SUM(amount) AS sum_profit
FROM
(SELECT
n_name AS nation,
YEAR(o_ORDERdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM
part
STRAIGHT_JOIN partsupp
STRAIGHT_JOIN lineitem
STRAIGHT_JOIN supplier
STRAIGHT_JOIN orders
STRAIGHT_JOIN nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_ORDERkey = l_ORDERkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%') AS profit
GROUP BY nation , o_year
ORDER BY nation , o_year DESC;
HeatWave airportdb Quickstart
The HeatWave airportdb Quickstart walks you through adding a HeatWave cluster to a MySQL DB System, importing data into the DB System using the MySQL Shell Dump Load utility, loading data into HeatWave using Auto Parallel Load, and running queries.
The following topics are described:
The
airportdb
Quickstart involves uploading the airportdb
data files to an Object Storage bucket and loading airportdb
into a DB System from the Object Storage bucket. Alternatively, if you do not want to install from an Object Storage bucket, you can download airportdb
to a Compute instance and load it into the DB System directly from the Compute instance. For instructions, see Installing airportdb from a Compute Instance.Prerequisites for the airportdb Quickstart
This topic lists the prerequisites for the HeatWave airportdb Quickstart.
- An operational MySQL DB System created using the MySQL.HeatWave.VM.Standard.E3 or MySQL.HeatWave.BM.Standard.E3 shape. If you do not have a MySQL DB System, refer to Getting Started with MySQL Database Service for the steps required to create one. Make note of the IP address of the DB System's Endpoint, and the administration user and password.
- A running Compute instance (Oracle Linux is used in this Quickstart) attached to a public subnet on the same VCN as the MySQL DB System. Make note of the public IP address of the compute instance. For information about setting up a Compute instance, refer to Creating a Compute Instance.
- MySQL Shell 8.0.22 or higher installed on the Compute instance. For installation instructions, see Connecting to the MySQL DB System with SSH and MySQL Shell.
- Access to Object Storage and an existing bucket.
- A valid OCI CLI configuration file. See SDK and CLI Configuration File. If you have installed and configured the Command Line Interface in the default location, you have a valid configuration file. If you have not installed and configured the CLI, you must either install it, or create a configuration file manually. MySQL Shell references the CLI configuration file when accessing the Object storage bucket.
- In addition to the mandatory policies for the MySQL DB System, you, or your group, have been granted the
mysql-analytics
policies described in Policy Details for MySQL Database Service.
Uploading airportdb Sample Data to Object Storage
This topic describes how to upload the airportdb
data files to an Object Storage bucket.
- Download the
airportdb
sample database to a temporary location such as/tmp/
orC:\temp\
and unpack it. Theairportdb
sample database is available for download from Other MySQL Documentation as a compressed tar or Zip archive, which is approximately 640 MBs in size. The database consists of 14 tables and is approximately 2GB in size when installed. For an overview of theairportdb
schema, refer to Setting Up the airportdb Database. - In the Oracle Cloud Infrastructure Console navigation menu, go to Object Storage, and then select Object Storage.
- From the list of buckets, click Create Bucket or select your object storage bucket.
- On the Bucket Details page, under Objects, click Upload.
- In the Upload Objects panel, drag and drop the sample database files to the dropzone, or click select files to locate them on your machine.
- Click Upload, and then click Close.
Loading airportdb on the DB System
This topic describes how to load the airportdb
sample database on the MySQL DB System using the MySQL Shell dump loading utility. Data must be available on the MySQL DB System before it can be loaded into the HeatWave Cluster.
The MySQL Shell dump loading utility util.loadDump()
supports loading schemas on a DB System. Using parallel threads, the utility is capable of completing a large data loads many times faster than a standard single-threaded upload. For additional information, see Dump Loading Utility.
If you prefer to use your own data instead of the
airportdb
sample database, refer to Importing and Exporting for information about loading data into the MySQL DB System. After loading your own data into the MySQL DB System, refer to the HeatWave User Guide for information about preparing and loading data into the HeatWave Cluster.airportdb
sample database on the MySQL DB System:Adding a HeatWave Cluster
This topic describes how to add a HeatWave cluster to your MySQL DB System.
To add a HeatWave cluster:
- Open the navigation menu and select Databases. Under MySQL, click DB System.
- On the MySQL DB Systems page, select your DB System.
- On the MySQL DB Systems Details page, select Add HeatWave Cluster from the More Actions drop-down menu.
- On the Add HeatWave Cluster dialog, click Select Shape to select a shape for your HeatWave nodes. The shape defines the number of CPU cores, the amount of RAM, and so on. Currently, only the MySQL.HeatWave.VM.Standard.E3 shape is supported for HeatWave nodes.
- Define the number of HeatWave nodes to create. The default number of nodes is 2, which is a sufficient number of nodes for the HeatWave
airportdb
Quickstart, assuming roughly 2GB of data. Optionally, you can perform a node count estimate to determine an appropriate number of nodes to create. For details, see Generating a Node Count Estimate. - Click Add HeatWave Cluster to create the HeatWave cluster.
Loading airportdb Data Into the HeatWave Cluster
This topic describes how to load airportdb
data into HeatWave using Auto Parallel Load.
RAPID
as the secondary engine, and executing table load operations. Auto Parallel Load facilitates this process by automating those steps and optimizing the number of parallel load threads for each table. For more information, see Auto Parallel Load.airportdb
data into HeatWave:Running airportdb Queries
This topic describes how to query airportdb
data in the HeatWave cluster.
airportdb
sample database tables are loaded into the HeatWave cluster, queries that qualify are automatically offloaded to the HeatWave cluster for accelerated processing. To run queries:For other airportdb
sample database queries that you can run, see Additional airportdb Queries. For more information about running queries, refer to the HeatWave User Guide.
Additional airportdb Queries
This topic provides additional airportdb
queries that you can run to test the HeatWave Cluster.
Query 1: Number of Tickets > $500.00, Grouped By Price
SELECT
booking.price,
count(*)
FROM
booking
WHERE
booking.price > 500
GROUP BY
booking.price
ORDER BY
booking.price
LIMIT
10;
Query 2: Average Age of Passengers By Country, Per Airline
SELECT
airline.airlinename,
AVG(datediff(departure,birthdate)/365.25) as avg_age,
count(*) as nb_people
FROM
booking, flight, airline, passengerdetails
WHERE
booking.flight_id=flight.flight_id AND
airline.airline_id=flight.airline_id AND
booking.passenger_id=passengerdetails.passenger_id AND
country IN ("SWITZERLAND", "FRANCE", "ITALY")
GROUP BY
airline.airlinename
ORDER BY
airline.airlinename, avg_age
LIMIT 10;
Query 3: Most Tickets Sales by Airline for Departures from US Airports
SELECT
airline.airlinename,
SUM(booking.price) as price_tickets,
count(*) as nb_tickets
FROM
booking, flight, airline, airport_geo
WHERE
booking.flight_id=flight.flight_id AND
airline.airline_id=flight.airline_id AND
flight.from=airport_geo.airport_id AND
airport_geo.country = "UNITED STATES"
GROUP BY
airline.airlinename
ORDER BY
nb_tickets desc, airline.airlinename
LIMIT 10;
No comments:
Post a Comment