
Idan Novogroder

Last updated on August 11, 2024

Snowflake has many advantages, but its security and scalability are arguably the leading magnets for data practitioners. More and more businesses are migrating their data to Snowflake from big data systems like Teradata and Hadoop. 

A single Snowflake account can contain many databases, each with thousands of views, tables, and columns. To address business requirements, users from various departments will execute tasks and run queries.

This means that proper handling of user-run queries and data is important. The relationships between tables and views, a table’s most crucial columns, the most frequently accessed columns, and other such information are all key.

The easiest way to accomplish this is via an enterprise data catalog for Snowflake. What is a Snowflake data catalog, and how do you implement it? Keep reading to find out.

What is a Data Catalog?

A data catalog is an organized inventory of an organization’s data assets that uses metadata to make them discoverable and manageable. These assets include unstructured data kept in emails, documents, mobile data, videos, audio, and reports, as well as structured data kept in tables.

Data catalogs help users find and manage data held in e-commerce, ERP, HR, finance, and other business platforms. They facilitate data-driven business insights and improved data source management for enterprises.

The development of big data and the introduction of increasingly stringent data privacy laws in recent years have made it extremely difficult to inventory all dispersed data. In addition to adding data governance and asset management features to enhance data access and productivity, data catalog solutions help organizations break down data silos for more efficient data usage.

How Does Snowflake Data Catalog Work?

To help you understand the value of the Snowflake data catalog, here are a few examples of questions it will help you answer: 

  • Is my organization using this kind of data?
  • What is the relationship between the tables and views?
  • When was the last update to the data?
  • Which columns in a table are most crucial?

Essential Components of Snowflake Data Catalogs

The perfect Snowflake Data Catalog would capture technical, business, operational, social, and custom metadata. Additionally, the data catalog has the following components:

Native Connectors

The catalog should integrate natively with Snowflake and fetch metadata from the Account Usage views or the Information Schema. Setup should take minutes, not months, to complete.
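
As a rough illustration (a minimal sketch; the database name is a placeholder), a natively integrated catalog typically issues queries like these against the Information Schema or the Account Usage views:

-- Per-database metadata from the Information Schema
SELECT table_schema, table_name, column_name, data_type
FROM "<database_name>".INFORMATION_SCHEMA.COLUMNS
WHERE table_schema <> 'INFORMATION_SCHEMA';

-- Account-wide metadata from the shared SNOWFLAKE database
-- (requires IMPORTED PRIVILEGES on the SNOWFLAKE database)
SELECT table_catalog, table_schema, table_name, row_count, bytes
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES
WHERE deleted IS NULL;  -- skip dropped tables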

Keyword Search for Data Discovery

The catalog should be the only reliable source for all your assets. Furthermore, data asset searches must be simple and intuitive and include sophisticated filtering options and suggestions. Catalog users should be able to search across dashboards, glossaries, READMEs, metrics, etc.

Business Glossary

Every data asset should have 360° context provided by a business glossary integrated into the catalog. A business glossary functions as a knowledge network for your company, letting you establish and decipher connections between metrics, assets, and meanings.

Automated Column-Level Lineage

The catalog should provide column-level data lineage for all data, including non-Snowflake assets and Snowflake assets, to track data flow, transformations, and effects on downstream applications. 

Additionally, you can propagate rules using the visual lineage map. For example, you can propagate a column description or a “Critical” tag from your dashboard upstream to its source tables.

Embedded Collaboration

The best data catalog for Snowflake integrates seamlessly with your regular procedures, facilitating the exchange of data and granting access to essential resources.

It should let you use integrated collaboration to raise support requests, write notes for your coworkers, and quickly see information without ever leaving the catalog platform. 

Active Data Governance

The secret to successful data governance is a decentralized, community-led strategy. The data catalog should automatically classify, tag, and hide sensitive data assets. 

Data lineage mapping should also be supported for the auto-propagation of rules. Additionally, you may tailor your policies based on user roles, projects, and data domains with active data governance.

Benefits of Snowflake Data Catalog

Exploration and Discovery

The robust search features in the Snowflake Data Catalog help users locate data assets quickly and simply.

Minimal Costs of Data Integration

Direct, governed access to ready-to-query data essentially eliminates the processes and expenses associated with traditional ETL ingestion and transformation.

Quicker Access to New Information

Using Snowflake’s secure data-sharing technology, the Snowflake data catalog removes the trouble of copying data and transferring it to Snowflake. It makes live, shared, and governed data sets easier to access, and users also receive real-time updates to the data.

Monitoring Data Quality

Advanced quality control tools in the Snowflake catalog allow you to verify that the organization’s data is free of missing values, formatting problems, and duplicate entries. This helps boost the quality of the organization’s data.
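
To make this concrete, here is a hedged sketch of the kind of checks such tooling might run under the hood (the orders table and its columns are hypothetical):

-- Flag duplicate entries on a key column
SELECT order_id, COUNT(*) AS occurrences
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Count missing values in a required column
SELECT COUNT(*) AS missing_emails
FROM orders
WHERE customer_email IS NULL;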

Data Lineage

Data journeys (origin, transformations, and destination) are simple to trace using a Snowflake catalog. This aids in monitoring modifications made to the data, enabling impact and root cause analysis.

Top Snowflake Data Catalogs

1. Iceberg Polaris

Source: Iceberg Polaris data catalog

Apache Iceberg is interoperable with Amazon Web Services (AWS), Confluent, Dremio, Google Cloud, Microsoft Azure, Salesforce, and more, with complete enterprise security. Snowflake introduced the Polaris Catalog to give businesses and the Iceberg community new levels of choice, flexibility, and control over their data.

You can use Polaris to run several engines against a single copy of your data in one location, saving you the trouble of transferring and duplicating data for various engines and catalogs.

The open-source data catalog solution can be hosted on your preferred infrastructure or managed infrastructure from Snowflake.

2. Dataedo

Source: Dataedo data catalog

Dataedo is an on-premises metadata management and data catalog tool. It helps you classify, describe, and understand your Snowflake data through entity relationship diagrams (ERDs), a business glossary, and data dictionaries. By reading your data schema, Dataedo can easily define each data element.

3. Alation Data Catalog

Source: Alation data catalog

With data intelligence capabilities like Data Governance, Data Search & Discovery, Digital Transformation, and Analytics, Alation is a great Snowflake data catalog. 

It also features open APIs, a powerful behavioral analysis engine, and built-in collaborative features. It tackles data management problems by fusing human understanding with machine learning capabilities. 

4. Lumada Data Catalog

Source: Lumada data catalog

This is another Snowflake data catalog that automates data discovery, classification, and management via AI, machine learning, and proprietary fingerprinting technology. Lumada makes data easier to obtain and improves teamwork, ultimately letting the entire organization better use its data.

5. Tree Schema

Source: Tree Schema

The Tree Schema data catalog has all the capabilities you need in a complete data catalog: asset tagging, technical owners and data stewards for your datasets, data lineage, rich-text documentation, and more.

You can point this catalog at your data sources and fill it to the brim in less than five minutes. The Tree Schema data catalog supports S3, DynamoDB, Kafka, and other contemporary data sources.

6. Atlan

Source: Atlan data catalog

Atlan is a cloud-native data catalog with a user-friendly design suitable for a wide range of users, including engineers, data stewards, and business users. 

Atlan automates stewardship tasks including autonomous data profiling, glossary tagging, and data quality warnings using an ecosystem of bots and machine learning. Teams of various sizes can utilize Atlan since it has an Open API design and a pay-as-you-go pricing structure.

7. Stemma

Source: Stemma (acquired by Teradata)

This Snowflake Data Catalog is fully managed and powered by Amundsen. Stemma helps customers trust their data by bridging the gap between data producers and consumers. Enterprise management and enhanced metadata are also provided.

Thanks to Stemma, organizations can locate reliable data with ease. It provides users with an up-to-date snapshot of their data utilization by automatically documenting trends of data usage.

Snowflake Data Catalog Tools

There are several data catalog tools on the market for the Snowflake Data Cloud. We’ll examine them from the perspective of metadata, namely active versus passive metadata management. However, you can also group them according to their features, architecture, and other factors.

Active vs Passive Catalog Tools

Passive Catalogs

Passive metadata consists mostly of technical metadata such as schema, data types, models, and owner names. Often dismissed as “expensive shelfware,” passive data catalogs gather metadata from several tools and store it in yet another tool, creating one more silo.

Active Catalogs

Active metadata discloses all changes made to a data asset. In addition to technical metadata, it comprises operational, business, and social descriptive metadata.

Active catalogs send enriched metadata back into each tool in the data stack, enabling two-way metadata movement. By using the tools that are already part of your everyday processes, you can quickly get context and avoid having to jump between solutions.

How to Evaluate a Data Catalog Tool for Snowflake

Creating an assessment criteria framework that maps your needs helps you find the right tool for the job. 

Native integration with your Snowflake Data Cloud must be a main requirement.

The next steps are to watch demos of the solutions on your shortlist and test your best use cases using proofs of concept (POCs).

As you carry out the POCs, pay attention to the following:

  • The architecture of the tool
  • The configuration process
  • The tool’s capacity to extract metadata from the Snowflake Data Cloud, as well as from the other technologies in your contemporary data stack
  • Ease of use for both technical and business users

Steps to Set Up Data Catalog for Snowflake

Step 1: Create a Database Role in Snowflake

Namespaces, roles, and permissions are implemented differently in each database and data warehouse. Configuring a catalog will differ depending on how it’s implemented. 

The two access control models in Snowflake, Discretionary Access Control (DAC) and Role-Based Access Control (RBAC), cooperate to give your data a very adaptable and detailed control framework.

Only roles, not users, may be assigned permissions in Snowflake. In a Snowflake warehouse, you may create a role with OPERATE and USAGE rights as follows:

CREATE OR REPLACE ROLE data_catalog_role;
GRANT OPERATE, USAGE ON WAREHOUSE "<warehouse-name>" TO ROLE data_catalog_role;

Granting OPERATE lets the role examine any queries that are currently running or have previously run on your warehouse. It also enables the role to start, stop, suspend, and resume the warehouse. Be sure to check that the permissions you grant are not unduly liberal for the catalog’s needs.
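
At any point, you can verify what the role has been granted:

SHOW GRANTS TO ROLE data_catalog_role;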

Step 2: Create a Database User

After creating the database role, create a database user and associate it with the role. There are three ways to establish a database user: with a password, with a public key, or with Single Sign-On (SSO).

Method 1: With a Password
CREATE USER data_catalog_user PASSWORD='<password>' DEFAULT_ROLE=data_catalog_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';
Method 2: With a Public Key
CREATE USER data_catalog_user RSA_PUBLIC_KEY='<rsa_public_key>' DEFAULT_ROLE=data_catalog_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';
Method 3: With SSO

There are two ways to authenticate to Snowflake using SSO: native SSO from your identity provider (available exclusively for Okta) or browser-based SSO.
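
For browser-based SSO, the user is created without a password and authenticates through an external browser at connection time. A minimal sketch, assuming LOGIN_NAME matches the login known to your identity provider:

CREATE USER data_catalog_user LOGIN_NAME='<sso_login_name>' DEFAULT_ROLE=data_catalog_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';
-- Clients then connect with authenticator='externalbrowser',
-- or authenticator='https://<okta_account>.okta.com' for native Okta SSO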

Step 3: Identify the Snowflake Databases, Schemas, and Objects That You Want to Crawl

Data transformation, cleaning, and transfer can be messy processes. In the course of running internal experiments and testing queries, you may wind up building a large number of playground databases and schemas.

Although it’s ideal for everything to happen in a development environment, that isn’t always the case. So, you need to decide which schemas and databases to get data assets from.

Before establishing the initial connection, explore your data assets using commands like SHOW DATABASES, SHOW SCHEMAS, SHOW OBJECTS, and SHOW TABLES. Another approach is to query the INFORMATION_SCHEMA directly.
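
For example (database and schema names are placeholders):

SHOW DATABASES;
SHOW SCHEMAS IN DATABASE "<database_name>";
SHOW TABLES IN SCHEMA "<database_name>"."<schema_name>";

-- Or query the Information Schema directly
SELECT table_schema, table_name, table_type
FROM "<database_name>".INFORMATION_SCHEMA.TABLES
WHERE table_schema <> 'INFORMATION_SCHEMA';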

Database and object names frequently carry a prefix that denotes the maturity of the data they contain or a step in the data pipeline. These conventions can help you decide which assets to retrieve.

To enable the data catalog to begin retrieving metadata from your Snowflake warehouse, you must provide read access to your data_catalog_role in the next step. Therefore, carefully review the list of databases, schemas, and objects to be crawled.

Step 4: Assign Relevant Permissions to the Role

You can grant the data_catalog_role permissions in one of two ways:

  1. Catalog data assets and view structural metadata
  2. Catalog data assets, view structural metadata, preview data, and run queries against data assets

The options available depend on your catalog’s capabilities.

This is where Snowflake provides database metadata beyond the INFORMATION_SCHEMA schema, through two additional schemas: ACCOUNT_USAGE and READER_ACCOUNT_USAGE. Select whichever option your catalog offers.

Permit Access for Crawling Metadata

To enable the catalog to get metadata, you must provide extra access to the DATABASES, SCHEMATA, TABLES, VIEWS, COLUMNS, and PIPES views in the ACCOUNT_USAGE schema.

Moreover, you may take advantage of the default ACCOUNT_USAGE roles Snowflake offers:

  • OBJECT_VIEWER
  • USAGE_VIEWER
  • GOVERNANCE_VIEWER
  • SECURITY_VIEWER  

Similarly, you can give the READER_USAGE_VIEWER role to your data_catalog_role to leverage the READER_ACCOUNT_USAGE schema for retrieving query history, login history, etc.
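
Assuming your account exposes these as database roles in the shared SNOWFLAKE database (Snowflake’s documented default), the grants would look like this:

GRANT DATABASE ROLE SNOWFLAKE.OBJECT_VIEWER TO ROLE data_catalog_role;
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE data_catalog_role;
GRANT DATABASE ROLE SNOWFLAKE.READER_USAGE_VIEWER TO ROLE data_catalog_role;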

Since the most popular and comprehensive technique for crawling metadata is to use the INFORMATION_SCHEMA, let’s take a closer look at it. The following GRANT statements can be used to give the data_catalog_role crawling access to current data assets through the INFORMATION_SCHEMA:

GRANT USAGE ON DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON ALL STREAMS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT MONITOR ON PIPE "<pipe_name>" TO ROLE data_catalog_role;

These GRANT statements give the data_catalog_role the rights to fetch metadata for every existing data asset. To bring in metadata for any future data assets as well, include the FUTURE keyword in the GRANT statements, as seen below:

GRANT USAGE ON FUTURE SCHEMAS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE STREAMS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT MONITOR ON FUTURE PIPES IN DATABASE "<database_name>" TO ROLE data_catalog_role;

If you stop here, the catalog can display all the data assets in the database, but you won’t be able to preview or query the data from your data catalog.

Permission to View and Query the Data

Thanks to the integrated development environment (IDE) offered by many Snowflake data catalogs, you can explore the data and even work on it by writing queries, storing them, and sharing them with your team. 

With a small modification, you can allow users to preview and query data from your data catalog by executing the same commands you used in the previous step: GRANT SELECT on TABLES, EXTERNAL TABLES, VIEWS, and MATERIALIZED VIEWS instead of GRANT REFERENCES.

You must perform the same task twice: once for the data assets that exist now and once for future ones. SELECT on TABLES, both current and future, can be granted as follows:

GRANT SELECT ON ALL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;

You should now be able to access the data catalog and examine and query your Snowflake data.

Additional Privileges

Depending on the data catalog’s capabilities, you can apply a wide range of additional grants and privileges to the data_catalog_role.

For instance, if your data catalog includes data lineage capabilities, you might need to grant enhanced permissions, such as IMPORTED PRIVILEGES on the SNOWFLAKE database, to the data_catalog_role.
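
That grant looks like this:

GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE data_catalog_role;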

Step 5: Assign the Role to the Database User

After giving the role the necessary rights, use the following GRANT statement to assign the role to the database user:

GRANT ROLE data_catalog_role TO USER data_catalog_user;

At this point, your data catalog should allow you to create a connection to your Snowflake account.
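
To sanity-check the assignment before configuring the connection, you can run:

SHOW GRANTS TO USER data_catalog_user;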

Step 6: Configure the Snowflake Connector and Start Crawling Data

After running the GRANT commands, you can check whether data can be retrieved from your data catalog. To configure the Snowflake connector in your data catalog, you need to give it the database credentials.

Most connectors allow you to test the connection before moving on to the next stage and starting to crawl the data from your Snowflake database. If you cannot establish a connection to your Snowflake warehouse, investigate whether any networking or security settings were overlooked. To assess your network connectivity, use the SnowCD (Snowflake Connectivity Diagnostic) tool.
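
A rough sketch of the typical SnowCD flow (the exact invocation depends on your installation, and allowlist.json is a placeholder file name):

-- Generate the list of hosts and ports your environment must be able to reach
SELECT SYSTEM$ALLOWLIST();
-- Save the JSON output to a file, then run SnowCD from a shell on the
-- machine hosting the catalog, e.g.: snowcd allowlist.json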

After resolving connectivity problems, you may begin data crawling. Most data catalogs provide you with three distinct options to execute the crawler:

Execution Option | What It Does
Ad-hoc crawl | A manual crawl triggered from the data catalog console or with a CLI command
Scheduled crawl | A recurring crawl, defined with a cron expression, for example
Event-based crawl | A crawl triggered by an event that the data catalog is able to listen to

You may locate more data sources and integrate them with your data catalog once you’ve added your Snowflake data to it.

Conclusion

A Snowflake Data Catalog makes it easier for data teams to locate and understand the data stored in Snowflake. Since it looks for discrepancies in the data, such as duplicates, missing values, and more, the catalog is also useful for assessing data quality.

When searching for a Snowflake data catalog, you’ll find several alternatives. The features you’re looking for will determine your ideal choice.
