I can teach you Snowflake analytics! I have never seen a database do analytics better than Snowflake. Last week we taught you Lead, and this week we are teaching you Lag. You use a Lead to place the value from the next row on the current line of the answer set. You can then see today’s value, and on the same line, see tomorrow’s value. You do the opposite for the Lag. You can see today’s value, and on the same line, see yesterday’s value.
The first thing I am going to do is show you a Lead. Then, the Lag will make more sense.
In each example, you will see an ORDER BY statement, but it will not come at the end of the query. The ORDER BY keywords is always within the Lead or Lag calculation. It is the ORDER BY statement that runs first, and once the system orders the data, the Lead or Lag will calculate. It is the initial ordering of the data set that gives these analytics the name “Ordered Analytics.” The other name is “Window Functions,” because they calculate within a certain window of rows.
Let me summarize the Lead in the picture below. Order the data first by the column Sale_Date and after the data sorts, then begin with row one and ask, “Can we get the Daily_Sales value from the next row, and add it to the current line?” If the answer is “Yes,” then do it, but if the answer is “No,” then put in a Null. Continue this logic through the entire data set.
The LEAD above allows you to see the Daily_Sales for today next to tomorrow’s value. Notice the example where it has LEAD (Daily_Sales, 1). The one represents the moving window. We want to get the value of Daily_Sales one row down. If there were a two instead of a one, we would want to get the Daily_Sales value two rows down. Sometimes you will see only LEAD (Daily_Sales), which defaults to Lead (Daily_Sales, 1).
Check out the next picture below. This query is the same as the query example above, but instead of LEAD, we are using LAG.
The LAG above allows you to see the Daily_Sales for today next to yesterday’s value. Notice the example where it has LAG (Daily_Sales, 1). The one represents the moving window. We want to get the value of Daily_Sales one row up. If there were a two instead of a one, we would want to get the Daily_Sales value two rows above. Sometimes you will see only LAG (Daily_Sales), which defaults to LAG (Daily_Sales, 1).
In the example below, we are using two LAG window functions. We alias the first LAG as Prev_Value because it is capturing the Daily_Sales value one row above. We give the second Lag an alias of Prev_2_Rows because it is capturing the value two rows above. On the current line of the answer set, we see today’s Daily_Sales, yesterday’s Daily_Sales, and the Daily_Sales from two-days previous.
Now, we are going to take the next example even farther by adding a PARTITION BY statement. The PARTITION BY resets the calculation and acts much like a GROUP BY statement. Notice that each Product_ID calculates within the Product_ID only. When a new Product_ID appears, the LAG calculation starts over.
Now, in our next example, we will use PARTITION BY again, but we will have three lag statements with the last one to lag three rows back. I am doing this for a reason. I want to show you how to use the QUALIFY statement in the final example. In the picture below, notice the null values. They appear because there are no rows above available. We have run out of past values to display.
In the picture above, notice the null values. They appear because there are no rows above available. The last row in the answer set for each Product_ID gives us the full picture. Watch how clever this report gets when you add the QUALIFY statement. Compare the picture above with the picture below. Notice that the QUALIFY statement removes the additional rows not needed. Qualify is an additional filter that works on the analytic totals after they calculate.
The QUALIFY statement is to Ordered Analytics, much like a HAVING statement is to aggregation. Only Teradata and Snowflake have the QUALIFY statement available.
Here is an easy way to pivot Snowflake data with the Nexus. Download a free trial at www.CoffingDW.com.
Step 1: Run a query against a Snowflake, Teradata, SQL Server, Oracle, DB2, Redshift, Azure SQL Data Warehouse, Greenplum, Hadoop, Postgres, SAP HANA, or MySQL table. Pivot the answer set in the main Nexus Screen by pressing on the Pivot button.
When you click on the Pivot button, you will then see a new screen. Take a look at the picture below, but all you have to do is, “Left-click on OK.”
You will then see a new screen showing the columns from your answer set, which refer to as fields, and a series of box areas with the name of Filters, Columns, Rows, and Values. Left-click and drag any field(s) to the areas.
The pivot report builds as you drag the fields to the areas. Note that you can have multiple fields in some of the areas (i.e., Columns area). Below is an example of the fields in their areas, and the final pivot report.
The final pivot report is complete. Check out the picture below. Here are the steps:
- I drag the field “REGION” to the COLUMNS area.
- I also drag the field “SALES_PERSON” to the COLUMNS area.
- I drag the field “PRODUCT” to the ROWS area.
- I drag the field “COST” to the VALUES area.
- I nominate myself for “Employee of the Year.”
The final report is displayed. Check out the picture below. I see my salespeople from the North Region and the South Region, and the sum of their sales for Boots, Jackets, Jeans, and T-Shirts. Their region groups the salespeople. I then see subtotals for both the North Region and the South Region. I now see two different sets of grand totals. On the bottom is the total, which is the sum of all sales per salesperson. On the far right is the total for each product.
You can save the report in a wide variety of formats. Here is an example of saving the report to Excel.
To return to Nexus, press the minimize button or the X in the top right corner.
I hope you enjoyed today’s Snowflake analytic lesson. See you next week.
CEO, Coffing Data Warehousing
Direct: 513 300-0341
YouTube channel: CoffingDW