Snowflake Certification

The document discusses a hands-on lab guide for loading data into Snowflake and performing analytics. It covers loading structured and unstructured data, creating databases and tables, running queries, data sharing, and access controls. The guide walks through multiple steps to demonstrate Snowflake's capabilities.

Important links

18 July 2021 01:21

In case you have partner credentials.

SESSIONS
Data Cloud Tour for Financial Services
Data Cloud Tour for Telecommunications
Data Cloud Tour for Manufacturing
Data Cloud Tour for Retail & CPG
Data Cloud Tour for Public Sector
Data Cloud Tour for Media & Entertainment
Data Cloud Tour - Universal Session

--------------------------------------



HANDS-ON LAB GUIDE FOR IN-PERSON ZERO-TO-SNOWFLAKE
20 July 2021 01:42

• The “story” of this lab is based on the analytics team at Citi Bike, a real, citywide bike-share system in New York City, USA. This team wants to be able to run analytics on data to better understand their riders and how to serve them best. We will first load structured .csv data from rider transactions into Snowflake; this comes from Citi Bike's internal transactional systems. Later we will load open-source, semi-structured JSON weather data into Snowflake to see if there is any correlation between the number of bike rides and the weather.

• Create a database and table


• Create an external stage
• Create a file format for the data

Getting Data into Snowflake

There are many ways to get data into Snowflake from many locations, including:
• the COPY command
• Snowpipe auto-ingestion
• an external connector
• a third-party ETL/ELT product

For more information on getting data into Snowflake, see https://docs.snowflake.net/manuals/user-guide-data-load.html

We are using the COPY command and S3 storage for this module in a manual process so you can see and learn from the steps involved. In the real world, a customer would likely use an automated process or ETL product to make the data loading process fully automated and much easier.

Data

The data we will be using is bike share data provided by Citi Bike NYC. The data has been exported and pre-staged for you in an Amazon AWS S3 bucket in the US-EAST region. The data consists of information about trip times, locations, user type, gender, age of riders, etc. On AWS S3, the data represents 61.5M rows, 377 objects, and 1.9GB total size compressed.

DDL operations are free! Note that all the DDL operations we have done so far do NOT require compute resources, so we can create all our objects for free.

Step 1 - Create a table


Step 2 - Create an external Stage
Step 3 - Create an external File Format
Step 4 - Loading Data
4.1 - Resize and Use a Warehouse for Data Loading
○ The “Size” drop-down is where the size of the warehouse is selected. For larger data loading operations or more compute-intensive queries, a larger warehouse will be needed. The t-shirt sizes translate to underlying compute nodes, either AWS EC2 or Azure Virtual Machines. The larger the t-shirt size, the more compute resources from the cloud provider are allocated to that warehouse. As an example, the 4-XL option allocates 128 nodes. This sizing can also be changed up or down on the fly with a simple click.
○ If you have Snowflake Enterprise Edition or greater you will see the Maximum Clusters section. This is where you can set up a single warehouse to be multi-cluster, up to 10 clusters. As an example, if the 4-XL warehouse we just mentioned were assigned a maximum cluster count of 10, it could scale up to 1280 (128 * 10) AWS EC2 or Azure VM nodes powering that warehouse...and it can do this in seconds! Multi-cluster is ideal for concurrency scenarios, such as many business analysts simultaneously running different queries using the same warehouse. In this scenario, the various queries are allocated across the multiple clusters to ensure they run fast.
○ The final sections allow you to automatically suspend the warehouse so it suspends (stops) itself when not in use and no credits are consumed. There is also an option to automatically resume (start) a suspended warehouse, so when a new workload is assigned to it, it will automatically start back up. This functionality enables Snowflake's fair “pay as you use” compute pricing model, which lets customers minimize their data warehouse costs.

4.2. - Load the Data


○ Now we can run a COPY command to load the data into the TRIPS table we created earlier.
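A minimal sketch of such a load, assuming an external stage and file format like the ones created in the earlier steps (object names are illustrative, not the lab's exact script):
  COPY INTO trips
    FROM @citibike_trips                          -- external stage pointing at the S3 bucket
    FILE_FORMAT = (FORMAT_NAME = 'csv_format')
    ON_ERROR = 'CONTINUE';                        -- or leave the default ABORT_STATEMENT for strict loads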

4.3 - Create a new warehouse for Data Analytics


○ Going back to the lab story, let's assume the Citi Bike team wants to ensure no resource contention between their data loading/ETL workloads and the analytical end users using BI tools to query Snowflake. As mentioned earlier, Snowflake can easily do this by assigning different, appropriately-sized warehouses to different workloads. Since Citi Bike already has a warehouse for data loading, let's create a new warehouse for the end users running analytics. We will then use this warehouse to perform analytics in the next module.

Step 5 - Analytical Queries, Results Cache, Cloning

5.1 - Snowflake has a result cache that holds the results of every query executed in the past 24 hours. These are available across warehouses, so query results returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not changed. Not only do these repeated queries return extremely fast, but they also use no compute credits.

○ Run queries to see which day is busiest, etc.

5.2 - Snowflake allows you to create clones, also known as “zero-copy clones,” of tables, schemas, and databases in seconds. A snapshot of the data present in the source object is taken when the clone is created and is made available to the cloned object. The cloned object is writable and is independent of the clone source; that is, changes made to either the source object or the clone object are not part of the other. A popular use case for zero-copy cloning is to clone a production environment for Development & Testing to experiment on, (1) without adversely impacting the production environment and (2) without the need to set up and manage two separate environments for production and Development & Testing.

5.3 - A massive benefit is that the underlying data is not copied; just the metadata/pointers to the underlying data change. Hence “zero-copy,” and storage requirements are not doubled when data is cloned. Most data warehouses cannot do this; for Snowflake it is easy!

Step 6 - Working With Semi-Structured Data, Views, JOIN

6.1 - Snowflake can easily load and query semi-structured data, such as JSON, Parquet, or Avro, without transformation. This is important because an increasing amount of business-relevant data being generated today is semi-structured, and many traditional data warehouses cannot easily load and query this sort of data.

○ Snowflake's VARIANT data type allows Snowflake to ingest semi-structured data without having to pre-define the schema.

6.2 - Create an external stage


Via the Worksheet, create a stage pointing to where the semi-structured weather data is stored on AWS S3.

6.3 - Loading and Verifying the Unstructured Data


6.4 - Create a View and Query Semi-Structured Data

A View allows the result of a query to be accessed as if it were a table. Views can help you: present data to end users in a cleaner manner (like in this lab, where we will present “ugly” JSON in a columnar format), limit what end users can view in a source table for privacy/security reasons, or write more modular SQL. There are also Materialized Views, in which SQL results are stored almost as though the results were a table. This allows faster access but requires storage space. Materialized Views require Snowflake Enterprise Edition or higher.
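A minimal sketch of such a view over a VARIANT column (the table name, column name, and JSON paths are illustrative assumptions):
  CREATE VIEW json_weather_data_view AS
  SELECT
    v:time::timestamp    AS observation_time,
    v:city.name::string  AS city,
    v:main.temp::float   AS temp_kelvin
  FROM json_weather_data;   -- table with a single VARIANT column named v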

6.5 - Use a Join Operation to Correlate Against Data Sets

Step 7 - Using Time Travel

Snowflake's Time Travel capability enables accessing historical data at any point within a pre-configurable period of time. The default period of time is 24 hours, and with Snowflake Enterprise Edition it can be up to 90 days.

Some useful applications of this include:


○ Restoring data-related objects (tables, schemas, and databases) that may have been accidentally or intentionally deleted
○ Duplicating and backing up data from key points in the past
○ Analysing data usage/manipulation over specified periods of time

7.1 Drop and Undrop a Table

Step 8 - Roles Based Access Controls and Account Admin

In this module we will show some aspects of Snowflake role-based access control (RBAC), including creating a new role and granting it specific permissions. We will also cover the ACCOUNTADMIN (aka Account Administrator) role.

Step 9 - Data Sharing

• Snowflake enables account-to-account sharing of data through shares, which are created by data providers and “imported” by data consumers, either through their own Snowflake account or a provisioned Snowflake Reader account. The consumer could be an external entity/partner, or a different internal business unit that is required to have its own, unique Snowflake account.

• With Data Sharing –


○ There is only one copy of data, which lives in the data provider’s account
○ Shared data is always live, real-time and immediately available to consumers
○ Providers can establish revocable, fine-grained access grants to shares
○ Data sharing is simple and secure, especially compared to the “old” way of sharing data, which was often manual and involved transferring large .csv files across the Internet in a manner that might be insecure

Note - Data Sharing is currently only supported between accounts in the same Snowflake cloud provider and region.
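A minimal sketch of creating a share on the provider side (database, table, and consumer account names are illustrative):
  CREATE SHARE trips_share;
  GRANT USAGE ON DATABASE citibike TO SHARE trips_share;
  GRANT USAGE ON SCHEMA citibike.public TO SHARE trips_share;
  GRANT SELECT ON TABLE citibike.public.trips TO SHARE trips_share;
  ALTER SHARE trips_share ADD ACCOUNTS = consumer_account;   -- the consumer's account locator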



SnowPro Certification Overview
18 July 2021 01:19

• Snowflake's first-level industry certification


• Target audience
• Exam format
○ 100 multiple select and multiple choice
○ Passing score 80%
○ 2 hour limit
○ Closed book
• Subject areas
○ Architecture
○ Virtual warehouses

• Resources
○ SnowPro Core Certification FAQs
○ Sample test
○ Study guides
○ Certification badge information
○ Community.snowflake.com/s/



1.1 Architecture Overview
19 July 2021 19:26



• Live data sharing - the public Data Marketplace, plus your own secure private data exchange shared only with the people who need it.

• You can perform data sharing and build data marts.



• Snowflake is designed as an OLAP system.
• Snowflake is not designed as an OLTP system; it is not a good fit for OLTP use cases.
• Logical Datamarts



• Data Science - Extra Large (XL)
○ Resizing happens on the fly: running queries finish on the old size, and new queries run on the new size.
○ Suspend the warehouse once done.
○ Auto-suspend handles automatic shutdown.
• Marketing - they might have lots of queries running
○ We solve the concurrency problem with a multi-cluster warehouse.
○ As queries start to queue, additional clusters can start so more concurrent queries run, and they spin down once those workloads are done.
• Share your data
○ Publicly with the ecosystem
○ Data Exchange features, without copying the data.
• Cloning
○ Cloning allows you to create a copy of a database (or schema/table) instantly.
○ Quickly run your test and development lifecycle.
○ Example: a 1 TB database can be cloned for Dev and Test use without duplicating the storage.
○ Let's review the Snowflake editions.



1.2 Structure
19 July 2021 21:11

Snowflake Objects

• All Snowflake objects reside within logical containers, with the top-level container being the Snowflake account
• All objects are individually securable.
• Users perform operations on objects through privileges that are granted to roles.
• Sample Privileges
○ Create a virtual Warehouse
○ List Tables in Schema
○ Insert data into a table
○ Select data from table.
• Role-based access

• An additional level is Organizations --> multiple accounts.

Overview of Access controls

https://docs.snowflake.com/en/user-guide/security-access-control-overview.html#securable-objects

Table Types

• Permanent
○ Persist until dropped
○ Designed for data that requires the highest level of data protection and recovery
○ Default table type
○ Time travel - Up to 90 days with Enterprise
○ Fail Safe - Yes

• Temporary
○ Persist only for the duration of a session (think single user)
○ Used for transitory data (for example, ETL/ELT)
○ Time Travel - 0 or 1 days
○ Fail Safe - No
○ Single login session. Accessible only from a session.

• Transient
○ Persist until dropped
○ Multiple User
○ Used for data that needs to persist, but does not need the same level of data retention as a permanent table
○ Time Travel - 0 or 1 days
○ Fail Safe - no
○ Applicable to database, schema and table

• External
○ Persist until removed
○ Snowflake over an external data lake
○ Data accessed via an external stage
○ Read-only
○ Time Travel - No
○ Fail Safe - No

• Fail Safe
○ A safety-net backup feature that lets admins recover and restore data for 7 days after Time Travel has expired
○ Fail-safe recovery is only possible by contacting Snowflake Technical Support.
○ Only permanent tables support it
○ You are not paying for Fail-safe backup storage if your table is not permanent.
• Time Travel
○ It allows you to query data historically.
○ A customer, admin, or user can define the Time Travel retention period for a table, schema, or an entire database.
○ In Enterprise edition and above, you can have up to 90 days of Time Travel.
○ In Standard edition you can only have 1 day of Time Travel.
○ From a user perspective
▪ You can query the historical data up to that period of time.
▪ You may run a query against that data and see what it looked like 1 day, 2 days, or n days ago.
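A minimal sketch of creating each table type (table names and retention values are illustrative):
  CREATE TEMPORARY TABLE etl_scratch (id INT);              -- session-scoped, no Fail-safe
  CREATE TRANSIENT TABLE staging_events (payload VARIANT);  -- persists, 0/1 day Time Travel, no Fail-safe
  CREATE TABLE orders (id INT) DATA_RETENTION_TIME_IN_DAYS = 90;  -- permanent; >1 day needs Enterprise+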

Views



• Materialized views are one of the serverless features in Snowflake; their maintenance uses background compute, which is charged back to the customer on a per-second basis.
• So they don't require a customer-managed virtual warehouse; Snowflake maintains the compute in the background and charges for the time used.
• A (secure) materialized view incurs both compute and storage costs, since the result set is stored.
• A secure view may take more time to execute, because some optimizations are bypassed for security.
• For a secure view, only the role that created the view (or roles with ownership) can see its DDL.



1.3 Cloud services layer
07 November 2022 04:07

Management
• Centralized Management for all storage
• Manage the compute that works with storage
• Transparent, online updates and patches.

Optimizer services
• SQL Optimizer - Cost based optimizer
• Automatic join order optimization
○ No user input or tuning required.
• Automatic statistics gathering.
• Pruning using metadata about micro-partitions

Security
• Authentication
• Access control for Roles and users
• Access control for shares
• Encryption and Key management.

Metadata Management
• All statistics and metadata are managed and kept automatically up-to-date by the cloud services layer.
• Stores metadata as data is loaded into the system.
• Handles queries that can be processed completely from metadata.
• Used for time travel and cloning
• Every aspect of Snowflake architecture leverages metadata.



1.4 Data Sharing
07 November 2022 04:22

• Data providers and data consumers both have Snowflake accounts.

• Secure direct data sharing - the foundational data sharing feature.
• Public Data Marketplace and private Data Exchange - organizations can own their private data exchange for internal/external sharing.

• BI / Reporting can directly connect to shared data sets.


• A data share can be one-to-many; there is no limit on how many consumers data can be shared with.
• Inbound / Outbound shares.



• A reader account lets a Snowflake customer share data with a non-Snowflake customer.
○ The Snowflake customer creates a reader account under its own account and shares data with that customer.
• Snowflake does not charge you to create a data share.
○ The provider can monetize it and charge the consumer for it.
○ But Snowflake does not charge for the share itself.
○ Data consumers use the data through their data share.
• For a reader account,
○ The Snowflake customer (provider) provides the compute / virtual warehouses the reader uses to consume the data.
○ The provider can monetize it, charging the reader for data sharing.

What can you share?

• Tables
• Secure views
• Secure UDFs



• Consumers can't re-share a share
• Consumers cannot clone a shared database, schema, or table
• No fees for data sharing
• Data sharing is possible in any edition

• How do we do data sharing with row-level security?

○ This is possible by defining the share at a finer-grained level, typically via secure views that filter the rows each consumer can see.



1.5 User Defined Functions
09 November 2022 03:33

Stored Procedures
16 December 2022 00:38

• Stored procedures were originally JavaScript only.

• Currently, stored procedures can be written in SQL, Python, Java, JavaScript, and Scala.

Re-Structuring
23 October 2023 00:47

SNOWPRO® CORE CERTIFICATION OVERVIEW

• Data Loading and Transformation in Snowflake.


• Virtual Warehouse Performance and Concurrency
• DDL and DML Queries
• Using Semi-Structured and Unstructured Data
• Cloning and Time Travel
• Data Sharing
• Snowflake Account Structure and Management

• Exam Version: COF-C02


• Total Number of Questions: 100
• Question Types: Multiple Select, Multiple Choice
• Time Limit: 115 minutes
• Languages: English & Japanese
• Registration fee: $175 USD
• Passing Score: 750+ (scaled scoring from 0 to 1000)
• Unscored Content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score; additional time is factored in to account for this content.
• Prerequisites: No prerequisites
• Delivery Options:
• Online Proctoring
• Onsite Testing Centers



Important to Remember
10 January 2024 01:51

• Whatever you put in double quotes will be kept as-is.

• For example:
○ "HOME" - will be treated as uppercase
○ "home" - will be treated as lowercase
• Whereas if you do not use double quotes, identifiers are treated as uppercase.
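A minimal sketch of the behaviour (hypothetical table names):
  CREATE TABLE "home" (id INT);   -- stored exactly as lowercase "home"
  SELECT * FROM home;             -- fails: the unquoted name resolves to HOME
  SELECT * FROM "home";           -- works
  CREATE TABLE home2 (id INT);    -- unquoted, so stored as HOME2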

• The file formats that can be used to load data are CSV, XML, JSON, AVRO, ORC, and PARQUET.



Udemy - Course Content
12 November 2023 01:01

Udemy - Snowsight - Interface
14 November 2023 02:10

• Databases
○ Schemas
▪ Database Objects
□ Tables



Udemy - Snowflake Architecture
14 November 2023 02:20

7. Self-managed, cloud data platform.


○ Self managed
▪ No need to install any hardware or software.
▪ No software is needed.
□ It's like the Gmail service.
▪ No Maintenance
○ Completely Cloud Native.
▪ Designed for cloud.
□ Built for cloud
▪ Runs in the cloud
□ All components are completely in the cloud.
▪ Cloud Optimized
□ Storage and compute scale independently.
○ Unifying Data platform
▪ On single platform around data.
▪ Data Warehouse
▪ DataLake
▪ Data Science

8. Multi Cluster shared Disk

Architectures
○ Shared Disk
○ Shared nothing.
○ Multi-Cluster Shared Data architecture
▪ Hybrid architecture
□ Shared-disk architecture - common central storage
□ Shared-nothing - uses MPP (massively parallel processing) compute clusters
 Each node stores a portion of the data locally.

9. Three Distinct layer in Snowflake


○ Data base storage
▪ Decoupled from the compute.
▪ Hybrid columnar storage
□ The data is compressed into blobs
□ Those blobs will be stored in external cloud provider.
□ We don’t have access to the blob files.
□ Snowflake manages all aspects about storage.
□ Optimized for OLAP/Analytical purposes.
○ Query Processing layer - Virtual warehouses
▪ Muscle of the system
▪ Responsible for processing queries
▪ Warehouse = MPP compute cluster.
▪ It provides CPU , Memory and temporary storage.
○ Cloud Services - Brain of the system.
▪ Brain of the system.
▪ Collection of services to coordinate and manage the components.
▪ Also run on compute instances of cloud providers.
▪ What this cloud layer service is doing?
□ Authentication
□ Access control
□ Metadata Management
□ Query parsing and optimization.
□ Infrastructure management.

10. Data Load exercise

11. Snowflake Editions

Standard - introductory level
• Complete DWH; all foundational features are available
• Automatic data encryption
• Broad support for standard and special data types
• Time travel up to 1 day
• DR (Fail-safe) for 7 days beyond time travel
• Network policies
• Secure data sharing
• Federated authentication & SSO
• Premier support 24/7
• Credit cost: $2 / credit (US East / Ohio)

Enterprise - additional features for the needs of large-scale enterprises
• All Standard features
• Multi-cluster warehouses
• Time travel up to 90 days
• Materialized views
• Search optimization
• Column-level security
• 24-hour early access releases
• Credit cost: $3 / credit (US East / Ohio)

Business Critical - even higher data protection for organizations with extremely sensitive data
• All Enterprise features
• Additional security features such as customer-managed encryption
• Support for data-specific regulation, e.g. HIPAA
• Database failover/failback (DR)
• Credit cost: $4 / credit (US East / Ohio)

Virtual Private Snowflake - highest level of security
• All Business Critical features
• Need to reach out to Snowflake to set it up; not possible directly via the website
• Dedicated virtual servers and a completely separate, isolated Snowflake environment (different hardware)
• Dedicated metadata store
• Credit cost: contact Snowflake (US East / Ohio)

12. Pricing

• Compute and storage costs decoupled.


• Pay for what you need.
• Scalable and at affordable cloud prices.
• Pricing depending on the region/cloud provider.
• Data Transfer costs.

Compute Cost

• Active warehouses
○ Only paying for this.
○ Not paying for suspended.
○ Used for standard query processing.
• Cloud services
○ Behind the scenes
○ Managed by Snowflake.
○ Cost is very small.
○ Customers are not normally charged for this; cloud services are only billed if their consumption exceeds 10% of the daily warehouse (compute) consumption.
○ Managed by Snowflake.
• Serverless
○ Automatic reclustering
○ Snowpipe
○ Search optimization service
○ Managed by Snowflake
○ Automatically resized.

• Charged for active warehouses; credits are quoted per hour.


• Billed by second (minimum of 1 min)
• Depending on the size of the warehouse
• Time / Active warehouses / Size.
• Charged in Snowflake Credits

Q1 - How does the consumption works?


• Amount of time a warehouse is running and the size of the warehouse.
• Virtual warehouse sizes (credits per hour):
○ XS - 1
○ S - 2
○ M - 4
○ L - 8
○ XL - 16
○ ... (doubling per size) ...
○ 4XL - 128
• The credit cost is also dependent on which Region and Provider , your account is based on.

Q2 - What does a credit cost include?

13. Pricing continued

Data Storage Cost

• Monthly storage fees


• Based on average storage used per month.
• Cloud providers cost
○ $40/TB/Month
○ Region - US Ohio AWS.
• Cost calculated after compression
• Pay for what you use.
• Capacity storage
○ You can save some cost with capacity storage.
○ Pay upfront on defined capacity. This will be lower price.
• Scenario
○ Assume we need 1 TB
○ Capacity storage might cost around $23/TB/month (versus $40/TB/month on-demand).
○ If we only end up needing 100 GB, we still pay for the full committed capacity.

• Start with on-demand storage.

• Once you are sure about your usage, switch to capacity storage.

Data Transfer

• Data ingress is free.

• Data egress is charged.
• Rates depend on the cloud provider and region.

14. Storage Monitoring

Individual table storage

• SHOW TABLES - general statistics about table storage and table properties: Show Tables;
• TABLE_STORAGE_METRICS view in INFORMATION_SCHEMA - more detailed information about the amount of storage for active data, Time Travel, and Fail-safe: Select * from DB_NAME.INFORMATION_SCHEMA.TABLE_STORAGE_METRICS;
• TABLE_STORAGE_METRICS view in ACCOUNT_USAGE - the same detail across the account: Select * from snowflake.account_usage.table_storage_metrics;

• You can see detailed data storage at the table level in Admin --> Usage (with the ACCOUNTADMIN role).

15. Resource Monitors

• Definition - objects that we can use to control and monitor the credit usage of both warehouses and also our entire account. -- All editions.
• Set Credit Quota - a limit set for a defined cycle that resets every month.
• Can be set at account level or individual warehouses. On a group of warehouses as well.

• There are three types of actions, and we specify at what percentage of the quota each action should trigger:
• Notify - when the given % of credits is used
• Suspend and notify - when the given % of credits is used; existing queries are completed before the warehouse suspends.
• Suspend immediately and notify - when the given % of credits is used; currently running queries are aborted.
• We can use percentages above 100%.
• Resource monitors can only be created by users with the ACCOUNTADMIN role.
• The ACCOUNTADMIN can delegate some tasks by granting the MONITOR and MODIFY privileges on specific resource monitors.
• They also track the usage of cloud services needed to execute queries related to a virtual warehouse.
○ The resource monitor will suspend the virtual warehouse when the limit is reached, but it cannot entirely prevent the cloud services usage needed to run some basic steps for the warehouse.
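A minimal sketch of the setup described above (quota, thresholds, and warehouse name are illustrative):
  CREATE RESOURCE MONITOR monthly_quota
    WITH CREDIT_QUOTA = 100
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 80  PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND             -- finish running queries, then suspend
             ON 110 PERCENT DO SUSPEND_IMMEDIATE;  -- abort running queries
  ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_quota;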

16. Hands on Resource Monitors

17. Warehouses and Multi-Clustering

• ** Multi-cluster warehouses are available from Enterprise edition onward.


• Snowpark Optimized - Recommended for memory intensive workload like ML training.
• For Snowpark Optimized warehouses - The credit consumption is 50% higher.
Snowpark-optimized warehouse credits per hour:
XS - not available
S - not available
M - 6
L - 12
XL - 24
... (doubling per size) ...
6XL - 768 (24 * 2^5)

• Multi-Cluster Warehouses
○ This is very important when you have a lot of concurrent users.
○ So when we have a huge load,
▪ A single warehouse might not be able to serve all the queries.
▪ So there will be a queue of queries to be executed,
▪ which get executed one by one.
▪ This is very bad for the users, as they need to wait.
▪ So when we have a large number of queries to be processed, we group multiple compute nodes/warehouses together into one multi-cluster warehouse.
▪ This is something where we can use some autoscaling.
▪ So snowflake will automatically detect when there is enough workload to add additional clusters into our warehouse.
○ This is especially good for more concurrent users.
○ This is not the ideal solution when we have a more complex workload.
○ For example, for ML use cases where the data size is huge and the query is complex, a bigger warehouse makes much more sense.
○ This means we will
▪ Scale up
▪ rather than scaling out.
Scale up - move to a bigger warehouse
Scale out - add multiple same-size clusters to the existing compute capacity
○ Two modes
1. Maximized - always run the same number of clusters; no difference between minimum and maximum. This is for static workloads with a constant number of users.
2. Auto-scale - specify different minimum and maximum values. This is for dynamic workloads with varying numbers of users.
○ How does autoscale mode work? At what point does Snowflake decide to add an additional cluster?
▪ Based on the scaling policy:
▪ Standard - favors starting additional clusters for better performance.
▪ Economy - conserves credits over starting additional clusters.
○ Scaling policies

18. Setting up a multicluster warehouse.

• Query Acceleration - if you have unexpected workloads, they can be handled by the query acceleration service, i.e. extra compute managed by Snowflake itself. Basically, when you enable query acceleration you are telling Snowflake that there could be unexpected workloads, so it should be ready to provide some additional, scalable compute (which Snowflake manages) when needed.
• Maximized
This mode is enabled by specifying the same value for both the maximum and minimum number of clusters (note that the specified value must be larger than 1). In this mode, when the warehouse is started, Snowflake starts all the clusters so that maximum resources are available while the warehouse is running.
This mode is effective for statically controlling the available compute resources, particularly if you have large numbers of concurrent user sessions and/or queries and the numbers do not fluctuate significantly.
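A minimal sketch of a multi-cluster warehouse in auto-scale mode (names and values are illustrative):
  CREATE WAREHOUSE analytics_wh
    WAREHOUSE_SIZE    = 'MEDIUM'
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4             -- min != max, so this is auto-scale mode
    SCALING_POLICY    = 'STANDARD'    -- or 'ECONOMY'
    AUTO_SUSPEND      = 300
    AUTO_RESUME       = TRUE;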

19. Objects in Snowflake

• Hierarchy of snowflake is very important.


○ Organization - this level is managed by the ORGADMIN role
▪ Account1
▪ Account 2
□ Multiple account objects
□ Users
□ Roles
□ Databases
 Schemas
◊ UDF's
◊ Views
◊ Tables
◊ Stages
◊ Other objects (Pipes, Streams, etc.)
□ Warehouses
□ Other account objects (resource monitors)

• Why multiple accounts


○ You can have multiple accounts tied to different cloud providers.
○ So you can have Account 1 on Azure and Account 2 on AWS.
○ A global organization might have multiple accounts based on
▪ Multiple regions
▪ Different cloud providers
▪ Different departments

20. SnowSQL

• Connect to Snowflake using the command line.



Udemy - Loading and Unloading
11 January 2024 18:26

21. Internal stages

• So a stage is a location in Snowflake where we store data.


• So whenever we want to load some data into a table, we need to have the files first available in a stage.
• So this is the location where we load the data from, but we can also use this stage to put some files into and this is called unloading.
• **Not to be confused with datawarehouse stages

Stages

• Internal Stage
○ Snowflake managed
○ Cloud provider storage
○ You cant manage it yourself.
• External Stages
○ AWS, GCP or Azure

Internal Stages

• Put command to move data from local to internal stage


○ Data is compressed by default (AUTO_COMPRESS is set to TRUE).
○ Data is encrypted automatically with 128-bit or 256-bit keys.

• The COPY INTO command loads data from an internal stage into Snowflake tables.
• We can use unloading as well
○ Create a file from a table and then move it into stages.
• There are three different types
▪ User Stages
▪ Tied to a single user
▪ Cannot be accessed by other users
▪ Every user has default stage
▪ They cannot be altered or dropped
▪ Put files to that stage before loading.
▪ Explicitly remove files again.
▪ Load data to multiple tables
▪ Referenced using @~ (at sign and tilde)
▪ Table Stages
▪ Automatically created with a table
▪ Tied to that single table
▪ Cannot be altered or dropped
▪ Load to one table only
▪ Remove the files once they are loaded
▪ Referred to with '@%TABLE_NAME'

▪ Internal Named stages


▪ Snowflake database objects
▪ Everyone with privileges can access it.
▪ Most flexible
▪ Referred to with '@Stage_name'
• Use cases
▪ User stages
▪ You want to load some files from your local storage into multiple Snowflake tables and you do not have access to the cloud provider's storage.
▪ For this we could use the internal stages, as in the sketch below.
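A minimal sketch of loading through an internal (user) stage from SnowSQL (file path and table name are illustrative):
  PUT file:///tmp/trips.csv @~/staged;        -- upload to the user stage; AUTO_COMPRESS defaults to TRUE
  COPY INTO trips FROM @~/staged FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
  REMOVE @~/staged;                           -- explicitly remove the files afterwards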

22. External stages


• This is very similar to named internal stage.
• A database object which we create ourselves.
• Everyone with privileges can access it.
• Reference with '@stage_Name'

• When we create it, we need to provide the URL.
• We need to add some additional properties, since usually those containers or buckets are not publicly available.
• We need to add credentials.
• For example, access keys - but that is not the recommended method.
• The recommended method
▪ is to store these credentials in a separate object called a storage integration.
• This is more secure because we don't expose the credentials to everyone who has access to the stage.
• We can reuse the storage integration independently for multiple stages.
• Then we can add file formats to the stages.
▪ We can define the file_format as a separate object and then reuse it for multiple stages.
• Commands
▪ List
▪ List all files
▪ List @Stage_Name;
▪ List @~
▪ List @%Table_Stage_Name
• Referencing stages
▪ Copy from stage
▪ Copy into stage
▪ Query from stage
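A minimal sketch of the recommended setup (integration name, role ARN, bucket, and file format are illustrative):
  CREATE STORAGE INTEGRATION s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access'
    STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/raw/');
  CREATE STAGE ext_stage
    URL = 's3://my-bucket/raw/'
    STORAGE_INTEGRATION = s3_int
    FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
  LIST @ext_stage;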

23. Hands - on Stages

24. Copy Into (Bulk Loading)



26. File_Format

• Whenever you create a stage, some default properties get set, including the file format defaulting to CSV.
• It is not recommended to load without specifying the file_format parameter; you should always provide it.

• Define the file_format differently.


• Step 1 - Define the file format

• Step 2 - Define the Stage

• Step 3 - Copy into (loading data)

• Step 4 - You can override the stage's file format by specifying the file_format again in the COPY INTO command



• Step 5 - You can also specify a different file_format_object if needed.
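A minimal sketch of steps 1-4 (object names are illustrative):
  CREATE FILE FORMAT my_csv_format TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1;        -- step 1
  CREATE STAGE my_stage URL = 's3://my-bucket/data/' FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');  -- step 2
  COPY INTO my_table FROM @my_stage;                                                        -- step 3: uses the stage's format
  COPY INTO my_table FROM @my_stage
    FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|');                                       -- step 4: overrides it for this load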

27. File_Format Object Hands on

• ** If you don't specify skip_header while creating the file format, it defaults to 0, which means the header row will be read as data.
• ** We can alter a file format object, but we cannot change its type property.
▪ A CSV-type file format is different from a JSON-type file format.

28. Insert and Update

• INSERT OVERWRITE - it truncates the table first and then inserts the new rows.

29. Hands on - Storage integration

▪ This stores the access information for an external cloud storage location, for example an Azure container.

▪ We can reuse it across multiple setups.

30. Snowpipe

• This is continuous data loading.


• What is Snowpipe?
▪ A pipe is an object created in our Snowflake account; with it we can load data files immediately and automatically when they appear.
▪ Loads data immediately when a file appears in a storage.
▪ According to defined copy statement.
▪ If data needs to be available immediately for analysis.
▪ Snow pipe uses serverless features instead of warehouses.
▪ No user created warehouses are needed.
▪ Compute is managed by Snowflake.



• ** The pipe definition must contain a COPY statement; the data is loaded according to that COPY statement.
• The COPY statement can reference a stage, which loads the data based on a file format object and our storage integration.

• There are two different ways on how snowpipe can be triggered.


▪ Cloud Messaging
▪ For e.g. An azure cloud event notification
▪ Works with external stages
▪ RestAPI
▪ Works with Internal as well as external stages.

Important features for the exam.



• Serverless - No dedicated warehouse, compute is managed by Snowflake.
• Cost is calculated as Per-Second / Per-Core-Granularity
▪ Core Granularity is calculated based on
▪ Number of cores in a compute cluster and
▪ How long does this takes depends on a few factors, like how large the file is, how many files are available and so on.
▪ Overhead costs
• ** The recommended file size is roughly 100 MB to 250 MB (compressed) to avoid too many small files, because the per-file overhead makes loading many small files less efficient.

• **A pipe can be paused and resumed. This is often done to alter the ownership of the pipe.
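A minimal sketch of a pipe (stage, table, and pipe names are illustrative):
  CREATE PIPE my_pipe AUTO_INGEST = TRUE AS
    COPY INTO raw_events
    FROM @ext_stage
    FILE_FORMAT = (TYPE = JSON);
  -- pause/resume, e.g. before transferring ownership:
  ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = TRUE;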

31. Copy Options

• ** In general, the copy options will be mentioned in the copy command, but actually those are properties of our stage object.

• If we don't specify a value for a copy option, it takes its default; for example, ON_ERROR defaults to ABORT_STATEMENT.
• ** When creating the stage, we can already specify a few copy options.



• However, generally the copy options are defined within the COPY INTO command, and they override the copy options set on the stage.

Copy options details

• ** SKIP_FILE_<num> - e.g. SKIP_FILE_10: skip a file when the number of errors in it reaches 10.
• ** SKIP_FILE_<num>% - the same, but based on the percentage of error rows.
• ** The default ON_ERROR value for bulk loading is ABORT_STATEMENT.
• ** What's the difference between Snowpipe and bulk loading?
▪ SKIP_FILE is the default for Snowpipe, whereas ABORT_STATEMENT is the default for bulk loading.

• So after this limit is reached, no more files will be loaded.


• For e.g.,



32. Hands on - Copy Options

33. Validation Mode

• This will just validate the data in the copy command, instead of actually loading the data.
• ** Return_n_Rows - e.g. Return_5_Rows: Validates <n> rows (return error or rows)
▪ This would validate the first five rows and would actually return those five rows. If there are no errors, then all the rows will be returned.
▪ If there are errors, then only the first error will be returned.
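A minimal sketch (stage and format names are illustrative):
  COPY INTO trips FROM @citibike_trips
    FILE_FORMAT = (FORMAT_NAME = 'csv_format')
    VALIDATION_MODE = RETURN_5_ROWS;   -- validates (and returns) the first 5 rows, loads nothing
  -- other options: RETURN_ERRORS, RETURN_ALL_ERRORS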

34. Validate

• This function validates the files that have been loaded in the previous execution of the copy command.
• And then all the errors that have been encountered will be displayed.
• The default setting is ON_ERROR = ABORT_STATEMENT
▪ If the default setting is used, you won't get any output (the COPY aborted at the first error, so nothing was partially loaded to report on).

• Here , "_last" is a dynamic parameter used to identify the last immediate history.

35. Unloading

• The same COPY command is used, only rather than copying into a table, we copy into a stage.
• This is basically copying from a table into a file, which will be stored in a stage.
• Important parameters



▪ Default - by default the output is split into multiple files based on size.
▪ This is because SINGLE = FALSE is the default setting.

▪ MAX_FILE_SIZE is the parameter that indicates the point at which a new file is started.
▪ So for example, if it is 16 MB, then each file will be at most roughly 16 MB.
▪ The default is 16 MB; it can be increased up to 5 GB.

▪ A SELECT statement can also be used to unload only specific columns.

▪ There is a possibility to change the file_format.


▪ Default is csv.



▪ A file prefix can be specified so the output files are named in a specific way.
▪ The default prefix is data_
▪ ** By default the output files are compressed.
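A minimal sketch of an unload (stage path, columns, and sizes are illustrative):
  COPY INTO @my_stage/unload/trips_
    FROM (SELECT tripduration, start_station_name FROM trips)
    FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
    MAX_FILE_SIZE = 16777216      -- bytes (~16 MB); files split because SINGLE = FALSE
    OVERWRITE = TRUE;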



Udemy - Data Transformations
07 February 2024 23:13

37. Transformations & Functions

• Data Can be transformed when loading data.


○ Simplify ETL pipeline.
• The following are possible while using the COPY command:
○ Column reordering.
○ Cast data types
○ Remove Columns
○ Truncate (TruncateColumns)
• We can use a subset of SQL functions in the COPY INTO command (see the sketch below).
• There are also certain things that are not supported.
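A minimal sketch of transforming while loading (column positions and names are illustrative):
  COPY INTO trips_slim (ride_id, started_at, duration_min)
    FROM (SELECT $1, TO_TIMESTAMP($2), $4::NUMBER / 60    -- reorder, cast, and drop columns
          FROM @citibike_trips)
    FILE_FORMAT = (FORMAT_NAME = 'csv_format');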

• In general there are a lot of functions available when we want to query data.
• There are scalar functions, which return one value per row.

• Then we have aggregate functions such as MAX and MIN, i.e. one value per grouping.



• Then we have window functions - they operate on a subset (window) of rows.
• Here, PARTITION BY subcategory will pick the max value within each subcategory.

• Then we have table functions - Typically we use this functions to obtain information about snowflake features.
• Table functions return a table, allowing them to be used in the FROM clause of a SELECT statement. These functions can be used for various purposes, including
generating result sets that can be queried like a regular table.
• Example: FLATTEN() function takes an array or a variant column and produces a lateral view (i.e., a virtual table) of the elements.

• Then we have system functions.


System functions provide information about the Snowflake environment, such as session information, user information, etc. They do not operate on data from tables but rather on the
system metadata and properties.

Example: CURRENT_VERSION() function returns the current version of Snowflake



• We also have UDF and External Functions.

38. Estimating Functions

• Exact calculations on huge tables can be quite compute or memory intensive.


• But sometimes we are not interested in the exact value; an approximation is good enough, and therefore Snowflake has implemented some algorithms as functions that we can use to get an approximation in a much shorter time.

• If I want to find the number of distinct rows and I am OK with roughly 1.6% error, I can use such a function to get the number more quickly.

Example timing on a 150-million-row table:
• HLL function - 5.6 seconds
• COUNT(DISTINCT) - 12 seconds
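A minimal sketch of the comparison (table and column names are illustrative):
  SELECT COUNT(DISTINCT customer_id)        FROM orders;   -- exact, slower
  SELECT APPROX_COUNT_DISTINCT(customer_id) FROM orders;   -- HyperLogLog estimate, ~1.6% typical error
  SELECT HLL(customer_id)                   FROM orders;   -- alias for the same estimate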



• Estimation of Frequent Values
○ Space saving algorithm is used to estimate the most frequent values along with their frequency.
○ This is used to estimate the top_k_values.
○ Here k means that how many top values you want to fetch
▪ So if K=1 , then the most frequent value will be shown
▪ If k = 10, then the top 10 values will be shown.
○ Counters = the maximum number of distinct values that Snowflake tracks during the calculation.
○ The counters value should always be higher than the value of K.

• So if we look at the example:

• The exact function took 18 minutes to execute, whereas APPROX_TOP_K took 2 minutes 59 seconds.
Counters - This is the maximum number of distinct values that can be tracked at a time during the estimation process. For example, if counters is set to 100000, then the algorithm tracks
100,000 distinct values, attempting to keep the 100,000 most frequent values.
• To understand counters better, here is an example.

Let's imagine a table where people vote for their favorite fruit. This table has many rows, each representing a vote, and let's say there are 10 distinct fruits people
voted for, but the table has thousands of votes in total.
If you set counters to 5 when using APPROX_TOP_K, the algorithm doesn't randomly pick 5 fruits to track. It starts with an empty list and adds fruits as it encounters
votes for them. If more than 5 distinct fruits are found, it tracks the ones with the highest counts so far, potentially replacing less frequent ones as it processes more
data. With only 5 counters for 10 fruits, it's trying to keep the most frequent ones in its limited "memory" of 5, but it might miss some of the actual top fruits due to this
limit.

If your table has 1000 rows with votes for 10 different fruits and you use APPROX_TOP_K with counters set to 5, the algorithm will scan all the rows. It attempts to track the
most frequently voted fruits within its capacity to track 5 distinct fruits at any time. Through this process, it aims to estimate the top 5 fruits based on the highest counts, even
though it's only keeping track of 5 fruits at a time during the scan.

• Percentile Values

○ T-digest algorithm is used to estimate the percentile values.

○ Here APPROX_PERCENTILE returns the 50th-percentile value of total price.

○ In other words, it gives us the total price below which roughly 50% of the rows fall (the approximate median).



• Similarity of Two or more sets

○ Uses the MinHash algorithm to estimate the similarity.

○ Traditionally this was done using the Jaccard similarity coefficient.

• The output will be a value between 0 and 1, where

○ 0 means the sets are completely different
○ 1 means the two data sets are identical.

39. UDF's

• Supported languages - SQL, Python, Java, and JavaScript.

• When using Python, the language (and a handler) needs to be called out explicitly.



• A function is a schema level object.
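A minimal sketch of a SQL UDF (name and body are illustrative):
  CREATE FUNCTION area_of_circle(radius FLOAT)
    RETURNS FLOAT
    AS 'pi() * radius * radius';
  SELECT area_of_circle(2.0);
  -- add the SECURE keyword (CREATE SECURE FUNCTION ...) to hide the definition from non-owners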

40. Stored Procedures



• The default is owner's rights (EXECUTE AS OWNER).

41. External Functions



• User defined functions that are stored and executed outside of snowflake.
• Code is executed outside of snowflake.
○ For e.g., there is an azure function which we call within our code but will be external and will be executed in the azure environment.
○ Lambda Functions
○ Azure Functions
○ Https servers

• ** Additional languages can be used, including Go and C#.


• Accessing 3rd party libraries such as machine learning scoring libraries.
• Can be called from snowflake and from other software.

42. Secure UDF's & Procedures

• Secure UDFs & procedures hide the definition, meaning unwanted users won't be able to see the definition or infer the underlying data through the optimizer.
• It can be enabled using the Secure Keyword.
• ** How is the security related information of an external function stored? - Via an API integration.



43. Sequences

• *** If you run SELECT sequence_name.NEXTVAL, the sequence advances and returns the next value.

44. Semi Structured Data


• *** very important from exam point of view.

• Supported Formats
○ JSON
○ XML
○ Parquet
○ ORC
○ AVRO
• Data Types in Snowflake to manage semi-structured data
○ Object
○ Array
○ Variant



• Variant Data type is unique to Snowflake.


○ Can store values of any other data type including Array and Object
○ Standard data type suitable to store and query semi-structured data.
• Use case scenarios
○ Either you can explicitly define the hierarchy of ARRAYs and OBJECTs,
○ or let Snowflake convert semi-structured data into a hierarchy of ARRAY, OBJECT, and VARIANT values stored in a VARIANT column.
▪ For example, there is JSON data we need to store.
▪ So we create a JSON file format, and Snowflake automatically converts the data into a hierarchy of arrays and objects and stores it.
• Important considerations for the VARIANT data type
○ JSON null values are stored as VARIANT null ("null"), not SQL NULL.
○ Non-native values such as dates are stored as strings.
○ *** A VARIANT value can store up to 16 MB per row.

45. Hands on.

46. Query Semi Structured data.

• So you need to call the nested structure within the sql with a colon.
• Query - Select row_column:courses from Variant_table
• Query - Select $1:courses from variant_table



• In the above image, we can also use the path notation to choose an element within the hierarchy.

• Here the information is fetched from the array under the formats key.
• *** Array positions start at 0, then 1, and so on.

• If the result appears in double quotes, the value can be cast to a VARCHAR data type using two colons (::).

48. Flatten Hierarchical Data

• Within semi-structured data you have hierarchies, and we need to flatten those hierarchies.
• The FLATTEN function is used to convert semi-structured data into a relational (table) view.
• It's a table function.



• *** Flatten function cannot be used in the copy command.


• It basically converts the nested, hierarchical data into a table view.
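A minimal sketch, reusing the variant_table / row_column:courses example from above:
  SELECT f.value::string AS course
  FROM variant_table v,
       LATERAL FLATTEN(input => v.row_column:courses) f;   -- one output row per array element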

49. Insert JSON Data


• To insert JSON data into a VARIANT column, we need to use a SELECT with PARSE_JSON inside the INSERT, as follows:
Insert into semi_structured select parse_json(' {"key1": "value1", "key2": "value2", "key3": "value3"} ');

50.Unstructured Data
• Snowflake supports URLs to files such as images and videos; that is how Snowflake accesses unstructured files.
• The same URLs are used to share files with other users.
• The URLs can be used for both internal and external stages.
• There are multiple URL types
○ Scoped URL
▪ Encoded, Snowflake-hosted URL with temporary access to a file.
▪ We can give access to the file without granting access to the entire stage.
▪ The URL expires when the persisted query result period ends.
▪ This means when the result cache expires, which is currently 24 hours.
▪ We can generate it using the SQL file function BUILD_SCOPED_FILE_URL
○ File URL
▪ Permanent access to the file that we have specified.
▪ This is used when we want to give permanent access to the file; it doesn't expire.
▪ Called by using BUILD_STAGE_FILE_URL
○ Pre-Signed URL
▪ Https URL that can be directly used to access the file via a web browser with expiration time.
▪ It can be added as a parameter to the URL.
▪ Called by using BUILD_PRESIGNED_URL
• All of these URLs are generated with SQL file functions, like the scoped URL above.

• When each of the functions is called, it returns the corresponding URL for the staged file.

• For pre-signed URLs, we can add an optional parameter (in seconds) after which the URL expires.



51. Directory Tables
• This can also be used to get access to the file URL's in our stage.
• A directory table is just storing metadata about our staged files.
○ Its layered on a stage.
○ If we have sufficient privileges, we can enable it on our stage.
○ It needs to be enabled explicitly; by default it is disabled.
• *** Once the stage is created and you try to query the directory table for the first time, it won't give any results, as it first needs to be refreshed.
• To refresh the metadata the first time, we do it manually with the command:
○ ALTER STAGE stage_azure REFRESH;

52. Data Sampling


• Running queries on a small subset of the data is data sampling.
• Use cases - query development, data analysis, analytics, etc.
• Uses less compute resources.
• Methods in Snowflake
○ ROW or BERNOULLI method
▪ Write the normal SELECT and then SAMPLE ROW (<p>) SEED (<s>)
▪ p is the percentage of rows
▪ Select * from table sample row (10) SEED (15)
▪ The above select returns approximately 10% of the rows, chosen randomly.
▪ SEED will create reproducible results so that other people can also execute the same query to get exactly same results.
□ Like this we can take randomness out of the situation if needed.
○ Block or System Method
▪ Similar to Row , just need to add system in the query.
▪ It works on micro partitions.
○ How are they different?

○ ALTER SESSION SET USE_CACHED_RESULT = FALSE;


This means that Snowflake will execute all queries directly against the database, without using any previously cached results, ensuring that you get the most up-to-date data for each query execution. This can be particularly useful in scenarios where data is frequently updated and you need to ensure that your queries reflect the latest state of the data.

53. Tasks
• Used to schedule a SQL statement or Stored procedure.
• Often combined with streams to setup continuous ETL Workflows.



• A DAG can also be created.


○ Directed Acyclic Graph
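A minimal sketch of a scheduled task (warehouse, schedule, and SQL are illustrative):
  CREATE TASK nightly_load
    WAREHOUSE = etl_wh
    SCHEDULE = 'USING CRON 0 2 * * * UTC'    -- every day at 02:00 UTC
    AS INSERT INTO trips_summary SELECT start_station_name, COUNT(*) FROM trips GROUP BY 1;
  ALTER TASK nightly_load RESUME;            -- tasks are created suspended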

54. Streams
• A stream is an object that can be used to record data manipulation language (DML) changes.
○ It's similar to GoldenGate-style change capture.
○ Used in ETL to track DML changes.



• The process is called the change data capture.
• It basically stores metadata.
• Stream can be consumed as well.
• So if there are records in the stream, we can use those records and populate it in other tables.
• Once stream is consumed, those records will be emptied.
• Three different types of streams
○ Standard
○ Append-only
▪ Only internal tables
○ Insert-only
▪ Only external table

• Staleness
○ A stream becomes stale when its offset falls outside the retention period of the source table.
○ For example, even if the table's retention period is 7 days, Snowflake can extend it up to 14 days (the default maximum) for the stream; changes older than that can no longer be consumed.
○ Maximum retention period is 90 days.
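A minimal sketch of creating and consuming a stream (the audit table and its columns are illustrative):
  CREATE STREAM trips_stream ON TABLE trips;       -- standard stream: inserts, updates, and deletes
  SELECT * FROM trips_stream;                       -- pending changes plus METADATA$ACTION, METADATA$ISUPDATE, ...
  INSERT INTO trips_audit
    SELECT ride_id, METADATA$ACTION FROM trips_stream;   -- consuming it in a DML statement advances the offset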



Udemy - Additional Snowflake Tools & Connectors
15 February 2024 23:39

56. Connectors, Drivers & Partner Connect

• Snowsight - WebUI
• SnowSQL - Command line tool
• Drivers - PHP, JDBC, GO, .net, node.js, etc.
• Connectors - Python, Spark, Kafka

• Partner Connect

57. Snowflake Scripting


• Extension to Snowflake SQL with added support for procedural logic.



• *** The difference between a scripting block and a transaction: a block starts with BEGIN and ends with END, whereas a transaction starts with BEGIN TRANSACTION and ends with COMMIT or ROLLBACK.
• Objects can be used outside of the block, variables can be used only inside the block.
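A minimal sketch of an anonymous Snowflake Scripting block (in SnowSQL it would need to be wrapped in $$ ... $$; the table name is illustrative):
  DECLARE
    total INTEGER;
  BEGIN
    SELECT COUNT(*) INTO :total FROM trips;
    IF (total = 0) THEN
      RETURN 'empty';
    END IF;
    RETURN total;
  END;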

58. Snowpark

• Snowpark is a set of libraries and runtimes within Snowflake that enables you to securely deploy and process data using various programming languages like Python, Java, and Scala.

• No need to move data. Data can be queried from outside.


• The code gets pushed down to Snowflake, which executes it.
• Benefits?



Udemy - Continuous Data Protection
16 February 2024 00:17

59. Time Travel

• What is possible with Time Travel?


○ Query from data in the past.
○ Restore tables, schemas and databases that have been dropped.
○ Create clones of tables, schemas and databases from previous states.

• Time Travel SQL


○ Timestamp
▪ Provide the timestamp in the select query after AT keyword
○ Offset
▪ Provide the offset (in seconds) in the SELECT query after the AT keyword
▪ So if we want to go back 10 minutes, we use -10 * 60
○ Before
▪ Provide BEFORE (STATEMENT => '<query_id>') in the SELECT statement.



○ Undrop
▪ Undrop table table_name
□ This will recover the table if its within the retention period.
▪ Undrop Schema schema_name
▪ Undrop database database_name

○ Considerations
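A minimal sketch of the Time Travel syntax described above (table name and timestamp are illustrative):
  SELECT * FROM trips AT (TIMESTAMP => '2024-01-01 08:00:00'::timestamp_tz);
  SELECT * FROM trips AT (OFFSET => -10 * 60);               -- 10 minutes ago
  SELECT * FROM trips BEFORE (STATEMENT => '<query_id>');    -- state just before that statement ran
  UNDROP TABLE trips;                                        -- restore, if within the retention period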

61. Hands-On Undrop

62. Retention Period

• Number of days for which historical data is preserved and Time travel can be applied.
• Its default is 1 day; this is set at the account level.
• It is configurable per table, schema, database, and account.
• It can be set to 0, which disables Time Travel.

• An important parameter is MIN_DATA_RETENTION_TIME_IN_DAYS = 2.


○ This parameter sets the minimum data retention at the account level for all objects.
○ Even if an object's retention is set to 0, the effective retention will still be 2 days because of this account-level minimum.
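A minimal sketch of working with retention settings (names and values are illustrative):
  ALTER TABLE trips SET DATA_RETENTION_TIME_IN_DAYS = 30;    -- per-object override; >1 day needs Enterprise+
  ALTER ACCOUNT SET MIN_DATA_RETENTION_TIME_IN_DAYS = 2;     -- account-level floor, requires ACCOUNTADMIN
  SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE trips;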

Snowflake edition - maximum retention period:
• Standard - 0 to 1 day
• Enterprise - up to 90 days
• Business Critical - up to 90 days
• Virtual Private - up to 90 days

64. Fail Safe



• If there is data that cannot be recovered via Time Travel, we can fall back to Fail-safe.
○ Fail-safe is always 7 days and is non-configurable.
○ For permanent tables it is 7 days.
• Fail-safe is 7 days beyond Time Travel.
• We need to reach out to Snowflake for the actual restoration.

• This contributes to the storage cost.

65. Hands on - Storage cost

• There is no cost for using time travel or fail safe, but there is some cost involved for storage.
• So if out of 1,000 rows only 10 rows changed, only those modified rows are charged for, not the rest.
• So only the amount of modified data counts toward Time Travel storage.

• Time travel data is also part of database usage stats.

66. Table Types

• Permanent
• Transient
○ They are not fail safe.
• Temporary tables
○ 0 or 1 day of retention period.
○ Not supporting fail safe.

77. Multi factor authentication



• By default enabled at the account level, but a user needs to enrol


• Securityadmin can disable the MFA for a user.

78. Federated authentication / SSO

• User to login with SSO


• Federated environment
○ Service provider
○ External identity provider - maintains the credentials and is responsible for authentication.



• SSO - Login Workflow

79. Key Pair Authentication

80. Column Level security

• *** With dynamic data masking, data is masked at query run time; the data stored in the database itself is not changed.

• The CREATE MASKING POLICY statement shows that:

○ First we define the policy.
○ It is a schema-level object.
○ (val varchar) is the original column value that is passed in and returned.
○ The CASE statement says
▪ that for a specific role (or set of roles),
□ the original column value will be visible;
▪ for all other roles,
□ the value won't be visible (it is masked).
▪ The policy can then be applied to multiple columns, as sketched below.
□ You can SET it on a column,
□ and you can UNSET it as well.
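• A minimal sketch of this pattern (the policy, table, column and role names are hypothetical):

  -- 1. Define the policy (a schema-level object)
  CREATE MASKING POLICY email_mask AS (val VARCHAR) RETURNS VARCHAR ->
    CASE
      WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN val  -- these roles see the original value
      ELSE '*** MASKED ***'                             -- everyone else sees a masked value
    END;

  -- 2. Set it on a column ...
  ALTER TABLE riders MODIFY COLUMN email SET MASKING POLICY email_mask;

  -- 3. ... and unset it when no longer needed
  ALTER TABLE riders MODIFY COLUMN email UNSET MASKING POLICY;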

• The other option is external tokenization

82. Row - Level Security

• It's an Enterprise Edition feature.

• A row access policy always returns a BOOLEAN, which determines whether a given row is visible to the current role (sketched below).
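• A minimal sketch (the policy, table and role names are hypothetical):

  CREATE ROW ACCESS POLICY region_policy AS (region VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'ADMIN_ROLE' OR region = 'EMEA';   -- TRUE means the row is visible

  ALTER TABLE trips ADD ROW ACCESS POLICY region_policy ON (region);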

84. Network Policies.


• Security Admin role is needed to create the network policies.
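• A minimal sketch (the policy name and IP ranges are hypothetical):

  CREATE NETWORK POLICY corp_only
    ALLOWED_IP_LIST = ('203.0.113.0/24')
    BLOCKED_IP_LIST = ('203.0.113.99');

  -- Activate it for the whole account (it can also be set per user with ALTER USER)
  ALTER ACCOUNT SET NETWORK_POLICY = 'corp_only';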

85. Data Encryption

• All data is encrypted at rest and in transit.


• Fully managed by Snowflake.

• Tri-Secret Secure - allows customers to use their own (customer-managed) key alongside the Snowflake-managed key.
○ Requires Business Critical edition (or higher).
○ The feature is enabled by reaching out to Snowflake Support.

86. Account usage and information schema

• Object metadata
• Historical Usage data

Account Usage

• Information_Schema - available in every database and accessible to all users.

• Account_Usage -
○ Two types of views:
1. Object metadata views
2. Historical usage views
□ e.g. the COPY_HISTORY view
○ Data is not real time; the views have a latency of up to around 2 hours (it varies by view).
○ Data retention is up to 365 days (a comparison sketch follows below).
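• A minimal sketch comparing the two (the database name is hypothetical):

  -- INFORMATION_SCHEMA: per-database, real time, shorter history
  SELECT * FROM demo_db.information_schema.tables;

  -- ACCOUNT_USAGE: account-wide, some latency, up to 365 days of history
  SELECT query_text, total_elapsed_time
  FROM   snowflake.account_usage.query_history
  WHERE  start_time > DATEADD(day, -30, CURRENT_TIMESTAMP());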

87. Release Process


Udemy - Performance Concepts
17 February 2024 00:56

88. Query Profile and History

• The query profile is available for all queries.

• What is data spilling?

○ For a given SQL operation, if the data does not fit in the warehouse's memory, it gets spilled to local disk storage (and, if that is also exhausted, to remote storage), which slows the query down.

89. Caching

• Result Cache cannot be used if the underlying data has changed.

91. Micro Partitions

• Micro-partitions are the way data is stored at the storage layer (the cloud provider's object storage).
• Each micro-partition contains around 50 to 500 MB of uncompressed data.
• Micro-partitioning enables partition pruning.
○ Partition pruning means that when a query is executed, all unnecessary partitions are skipped.

• Micro-partitions are immutable.
○ This means they cannot be changed once they are written.
○ New data is always added as new micro-partitions.

92. Understanding Clustering Keys

• Clustering a table on a specific column redistributes the data across micro-partitions.
• This improves partition pruning.
• Some of the partitions can be ignored at the query runtime.

• When is a table well-clustered?

○ A lower average clustering depth means a better-clustered table.
○ Fewer overlapping micro-partitions also means a better-clustered table.
○ A partition in the "constant" state (its clustering key range is a single value) cannot be improved any further.

93. Defining Clustering keys

• Automatic reclustering is a serverless feature.


• Clustering is not a good choice for every table.

• How to determine which columns and which tables can benefit from clustering keys?
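• Once a candidate column has been identified, defining (or removing) the key is a one-liner. A minimal sketch with hypothetical table and column names:

  -- at creation time
  CREATE TABLE trips (ride_id INT, start_date DATE, station_id INT)
    CLUSTER BY (start_date);

  -- or on an existing table
  ALTER TABLE trips CLUSTER BY (start_date, station_id);
  ALTER TABLE trips DROP CLUSTERING KEY;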

94. System functions for Clustering.

• There are two system functions that report clustering information, as sketched below:
○ System$clustering_information
○ System$clustering_depth
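• A minimal sketch of calling them (the table and column names are hypothetical):

  SELECT SYSTEM$CLUSTERING_INFORMATION('trips', '(start_date)');
  SELECT SYSTEM$CLUSTERING_DEPTH('trips', '(start_date)');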

96. Search Optimization Service

• Serverless feature.
• Only available in Enterprise Edition and above.
• Enabled with ALTER TABLE table_name ADD SEARCH OPTIMIZATION (sketched below).
• Requires OWNERSHIP privileges on the table.
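• A minimal sketch (the table name is hypothetical):

  ALTER TABLE trips ADD SEARCH OPTIMIZATION;
  SHOW TABLES LIKE 'trips';   -- the search_optimization columns in the output show status and build progress
  ALTER TABLE trips DROP SEARCH OPTIMIZATION;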

97. Materialized Views (Enterprise Edition)

• Resource monitors can't control Snowflake-managed (serverless) warehouses.

99. Warehouse Considerations

Udemy - Zero Copy cloning & Sharing
16 February 2024 12:30

67. Zero Copy cloning

• It is always a metadata-only operation.

• The clone simply keeps referencing the same micro-partitions (the blobs stored with the cloud provider).
• So there is no additional storage cost at the time the clone is created; storage is only consumed as the clone's data diverges from the original.

• *** A database clone does not inherit the database's privileges, but all of its child objects do. So the child objects' privileges are inherited, just not those of the database itself.
• *** The same applies to schemas: schema privileges are not inherited, but the privileges of all objects within the schema are.

• *** Load history metadata is not copied.
• Cloning from a specific point in time (using Time Travel) is also possible, as sketched below.
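• A minimal sketch, including a clone from a past point in time (the names are hypothetical):

  CREATE TABLE trips_dev CLONE trips;          -- metadata-only copy
  CREATE DATABASE demo_db_backup CLONE demo_db
    AT (OFFSET => -60 * 60);                   -- the database as it was 1 hour ago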

69. Data Sharing

• Data can be shared without actually making the copy of the data.

• Available for all of the editions.

• The consumer gets a read-only database; they can only query the shared data.
• You can act as both a provider and a consumer; shares can go to other accounts and customers, or to accounts within your own organization.

• *** Best practice is to use secure views to make sure that confidential data is not shared mistakenly.

• *** A normal view can't be shared, so you always need to create a secure view to avoid accidentally exposing data (see the sketch below).
• To share directly, the consumer account needs to be in the same region and on the same cloud provider.
○ If it is in a different region or on a different cloud provider, replication needs to be enabled between the regions/providers first.
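• A minimal sketch of the provider-side steps (the share, view and account names are hypothetical):

  CREATE SECURE VIEW demo_db.public.trips_v AS
    SELECT ride_id, start_date FROM demo_db.public.trips;  -- only secure views should be shared

  CREATE SHARE trips_share;
  GRANT USAGE ON DATABASE demo_db TO SHARE trips_share;
  GRANT USAGE ON SCHEMA demo_db.public TO SHARE trips_share;
  GRANT SELECT ON VIEW demo_db.public.trips_v TO SHARE trips_share;

  ALTER SHARE trips_share ADD ACCOUNTS = consumer_account;  -- same region and cloud, unless replication is set up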

72. Database Replication


• Feature available for all editions.

• The difference here is that the data is actually extracted and copied over to the other account.
• The data needs to be refreshed (synchronized) periodically, as sketched below.
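• A minimal sketch of the flow (the organization, account and database names are hypothetical):

  -- On the source account: allow replication to the target account
  ALTER DATABASE demo_db ENABLE REPLICATION TO ACCOUNTS myorg.target_account;

  -- On the target account: create the secondary database and refresh it periodically
  CREATE DATABASE demo_db AS REPLICA OF myorg.source_account.demo_db;
  ALTER DATABASE demo_db REFRESH;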

Udemy - Account & Security
16 February 2024 20:55

73. Access control in Snowflake

• 4 key concepts
○ Securable object - an entity to which access can be granted.
○ Privilege - a defined level of access to an object; granted to a role.
○ Role - receives privileges and can be granted to other roles or to users.
○ User - an identity that logs in to the account.

74. Roles

• *** A privilege will always be granted to a role and then to the user.
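• A minimal sketch of the privilege -> role -> user flow (the names are hypothetical):

  CREATE ROLE analyst_role;
  GRANT USAGE ON DATABASE demo_db TO ROLE analyst_role;
  GRANT USAGE ON SCHEMA demo_db.public TO ROLE analyst_role;
  GRANT SELECT ON ALL TABLES IN SCHEMA demo_db.public TO ROLE analyst_role;
  GRANT ROLE analyst_role TO USER jane;        -- the user gets access only through the role
  GRANT ROLE analyst_role TO ROLE sysadmin;    -- best practice: attach custom roles to the role hierarchy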

• Types of Roles.
1. Orgadmin


2. Account Admin

3. Security Admin

• Can manage any object grant globally.

4. Sysadmin

5. User Admin

6. Public

7. Custom Roles

75. Privileges

• Privileges define the granular level of access to securable objects.



Snowflake - Architect - Important Links
20 February 2024 11:56

• https://www.linkedin.com/pulse/how-crack-snowpro-advanced-architect-exam-ruchi-soni/
