Trifacta – People. Transforming. Data

What’s New in Designer Cloud 9.7

2023-01-05T23:32:23Z

What’s New in 9.7

We’re excited to share our latest capabilities from the 9.7 release. As always, there’s a wide range of new features to discuss:

Restart Plans from Failed Tasks

In 9.7, if a plan fails, users can restart it from the first point of failure, running only the failed task and every task downstream of it. Users no longer need to rerun an entire plan after a failure, which saves resources and shortens execution time.
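As a rough illustration of which tasks a restart covers, the sketch below walks a small task graph from the failed task and collects everything downstream of it. The task names and the `downstream` mapping are hypothetical, and this is only a sketch of the idea, not how Designer Cloud implements it.

```python
from collections import deque


def tasks_to_rerun(downstream, failed_task):
    """Return the failed task followed by every task reachable from it."""
    seen = {failed_task}
    order = [failed_task]
    queue = deque([failed_task])
    while queue:
        current = queue.popleft()
        # Visit each direct successor once, in breadth-first order.
        for successor in downstream.get(current, []):
            if successor not in seen:
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order


# Hypothetical plan: each task maps to the tasks that run after it.
plan = {
    "ingest": ["clean"],
    "clean": ["aggregate", "profile"],
    "aggregate": ["publish"],
    "profile": [],
    "publish": [],
}

rerun = tasks_to_rerun(plan, "clean")
print(rerun)
```

In this example the upstream `ingest` task is left alone because it already succeeded.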

Toggle Histogram Option

Users can now toggle the Histograms and Data Quality Bar in the Transformer view, allowing for faster performance when conducting quick updates – especially when working with larger data samples. The column histograms can be enabled or disabled from the Transformer page. To activate this feature, it must first be enabled in workspace settings (Experimental Settings > Enable/Disable Data Grid from view options).

Schema Drift Detection Updates:

In 9.7, existing column drift detection capabilities have been extended to support the detection of new, removed, or moved columns in delimited source files (such as CSV, TSV, pipe delimited, etc.). With schema drift detection enabled, users are always notified about schema changes. If the corresponding setting is also enabled, jobs are stopped automatically when a schema change is detected.
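To make the three kinds of drift concrete, here is a minimal sketch that compares the header row of a delimited file against an expected column list. The function names and sample columns are hypothetical, it assumes column names are unique, and it does not reflect the product's actual detection logic.

```python
import csv
import io


def read_header(text, delimiter=","):
    """Return the first row of a delimited file as a list of column names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def detect_drift(expected, actual):
    """Classify differences between two column lists (names assumed unique)."""
    added = [c for c in actual if c not in expected]
    removed = [c for c in expected if c not in actual]
    # Compare only the columns present in both lists, so that an added or
    # removed column does not make every later column look "moved".
    shared_old = [c for c in expected if c in actual]
    shared_new = [c for c in actual if c in expected]
    moved = [c for i, c in enumerate(shared_old) if shared_new[i] != c]
    return {"added": added, "removed": removed, "moved": moved}


expected_columns = ["id", "name", "email", "city"]
new_file = "id|email|name|signup_date\n1|a@example.com|Ada|2023-01-05\n"

drift = detect_drift(expected_columns, read_header(new_file, delimiter="|"))
print(drift)
```

A job runner could then stop the job whenever any of the three lists is non-empty.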

Users Can Now Cancel Plan Runs

Users with viewer permission on plans can now cancel plan runs from the Run Details page, without needing to contact an admin.

Support for Ubuntu 20.04

With 9.7, Ubuntu 20.04 is now supported (upgraded from Ubuntu 18.04).

Google Cloud Dataprep Specific Features:

In 9.7, we have additional updates specific to Google Cloud Dataprep:

Support for Complex Data Types When Publishing to BigQuery: Users can now publish Objects and arrays of Objects to BigQuery Structs. Previously, these were published as String values.
In-VPC Support for Conversion Service for Private Data Planes: 9.7 adds support to allow Conversions to be run within Containers within Google Private Data Planes.
Default Settings for Dataflow Execution: We’ve added the ability to create default settings at the workspace level for Dataflow Execution, allowing admins to skip the step of manually assigning settings to each new user. With this update, admins can also lock the ability to override default settings, giving them full control of all execution settings if needed.

New connector with 9.7 Release

We continue our journey to help you connect to any data source, enabling additional use cases. With our 9.7 release, we support the following new connector:

AlloyDB (Read and Write)

For more information, see Early Preview Connection Types. You can learn all about our connectivity updates here.

If you haven’t done it already, it’s a great time to sign up for a free trial with Designer Cloud. Join us today on your journey to the cloud.

Easily Send Messages Through Microsoft Teams Using Plans’ HTTP Task

2022-12-28T19:20:30Z

Plans offer a variety of features that allow our users to orchestrate data pipelines. One of the key elements within Plans is the ability to connect with external messaging applications so that you can automatically communicate information to other platforms whenever you’d like.

Alteryx Designer Cloud has a native integration with Slack, making it easy for you to deliver messages to accessible Slack channels. But have you ever wondered how you can do this with other platforms? We’ll walk you through it! After reading this article, you will be able to use our built-in HTTP Task to send messages through Microsoft Teams as well.

The Challenge: Let’s imagine you’ve been using Alteryx Designer Cloud to send summaries of your data to your sales department via Slack for a while now. You have recently switched to a different role, and your new team communicates via Microsoft Teams. You now need to send similar Plan updates to specific Teams channels.

Your first flow takes in data and transforms it into a clean and usable format. It removes duplicate values, calculates averages, identifies max and min points in the data, etc. If the flow runs successfully, it will update our general dashboard with these new values. If it fails, you want it to inform the Teams channel of data scientists of all the needed information from that Plan run so that they can understand and fix the problem.

The Solution: To use the HTTP Task to connect with Teams, you will need to first obtain the correct endpoint URL. Using this guide, you can see that this is the correct URL format:

POST https://graph.microsoft.com/v1.0/teams/{team-id}/channels/{channel-id}/messages

From here, you can obtain your Team and Channel IDs in 2 different ways. The first is to go directly through the Microsoft Teams app, as explained here. The other is to use Microsoft’s Graph Explorer, where you enter the following links (using “GET”) and copy the correct ID from the Response Preview.

Team ID: https://graph.microsoft.com/v1.0/me/joinedTeams
Channel ID: https://graph.microsoft.com/v1.0/teams/{team-id}/channels

[Note: you need permission to access information from specific chats or channels – if you run into any permission errors, it means you don’t have adequate permissions for the Teams channel you’re using]

Your next step is to add headers, as explained in this guide. You can copy your access token from the Graph Explorer and input it as one of the header values: Bearer {access token}

Finally, you must add a body – this is the message that will be sent to your team. Fortunately, you can easily use metadata from the Plans to send them specific information from the run that can help them identify the cause of the problem.
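For readers who want to see the pieces together outside the HTTP Task form, the sketch below assembles the same URL, headers, and body in Python. The IDs, token, and message text are placeholders, and the JSON body shape (`body.content`) is our understanding of the Graph channel-message format, so verify it against Microsoft's guide before relying on it.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_teams_request(team_id, channel_id, access_token, message):
    """Build (but do not send) a POST request for a Teams channel message."""
    url = f"{GRAPH_BASE}/teams/{team_id}/channels/{channel_id}/messages"
    # Assumed body shape: the message text goes under body.content.
    payload = json.dumps({"body": {"content": message}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute the IDs and token you copied earlier.
request = build_teams_request(
    "TEAM_ID", "CHANNEL_ID", "ACCESS_TOKEN", "Plan run failed. See Run Details."
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a valid token and network access:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

In the HTTP Task itself, the same three parts map to the URL field, the header list, and the body field.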

Now that your configuration is saved, hit ‘Test’ in the Response tab and check the response to confirm that your configuration works.

That’s it – Mission accomplished! You’ve now learned how to successfully integrate your Plans with Microsoft Teams using Alteryx Designer Cloud’s HTTP Task! Now, you can automatically have error reports sent to your team instead of having to manually send them this information.

Read more about Plans here: Plans – The Command Center for Orchestrating Data Pipelines

Read more about our HTTP Task here: Create HTTP Task

What’s New in Designer Cloud 9.6

2022-12-14T17:52:58Z

What’s New in 9.6

We’re excited to share our latest capabilities from the Designer Cloud and Google Cloud Dataprep 9.6 release. As always, there’s a wide range of new features to discuss:

New Support for Snowflake on Azure

For customers that have hosted Designer Cloud on AWS, it is now possible to read and write data from Snowflake on Azure using JDBC connectivity (including pushdown support). This gives additional options to our customers who operate in a multi-cloud environment. This update requires Designer Cloud to be hosted on AWS (there is not a fully-hosted SaaS offering on Azure at this time).

Flow Editors Can Now Edit Custom SQL Datasets

Editors of flows can now be granted edit access to modify SQL queries and custom SQL datasets as needed (previously, editors could only be granted view access). This gives users the flexibility to tweak SQL themselves without going back to the original owner of a dataset. To access this feature, editors can right-click a SQL dataset in a flow and click “Edit custom SQL”.

Data Previews Can Now Be Toggled On/Off

Users can now toggle the data-grid preview on and off when building a recipe, so they can make updates without waiting for the data grid to refresh. Turning the preview off can improve performance by skipping the time needed to reload the grid when generating a preview. This can be particularly useful when applying the same changes to multiple recipes.

Job History Page Now Defaults to “Run by me”

The job history page now defaults to “Run by me” rather than “All jobs”. This will allow users to more easily access the flows that are the most relevant to them and improve the performance of the job history page.

Google Cloud Dataprep Specific Releases:

In 9.6, we have an additional update specific to Google Cloud Dataprep:

Full BigQuery Pushdown Support for Merge Operations: MERGE operations in Dataprep can now be executed using BigQuery pushdown processing (previously, MERGE operations were only supported via Dataflow). This will work for both file-to-table and table-to-table scenarios, allowing for faster loading times and reduced costs when performing merges and bringing in new data records from applications.

New connectors with 9.6 Release

We continue our journey to help you connect to any data source, enabling additional use cases. With our 9.6 release, we support the following new early-preview/read-only connectors:

Microsoft Dataverse
Adobe Analytics
Google Contacts
Workday (Added support for OAuth connection)

For more information, see Early Preview Connection Types. You can learn all about our connectivity updates here.

If you haven’t done it already, it’s a great time to sign up for a free trial with Trifacta. Join us today on our journey to the cloud.

The 4 Factors that Separate “Good” from “Great”

2022-12-28T19:27:06Z

Originally appeared on Alteryx Community.

—

Do you love your job? For most Trifacta Community members, the answer is a resounding “Yes!!” Getting to work with data to answer your business’s most important questions is no doubt an exciting occupation.

But … could your “good” job become “great?” Could your “great” job become “amazing??” Let’s explore.

What’s slowing you down?

As a data professional, you likely navigate many challenges throughout your day. For example, perhaps you have to wrangle dozens of different data sources and outputs. Did you know that, according to the International Data Corporation (IDC), the average analytical process involves 6 inputs and 7 outputs? That’s a lot of data sources and data destinations to keep track of.

Plus, the IDC finds analysts typically use anywhere from 4 to 7 different tools to get their analytic work done. How many different tools and technologies do you use every day??

In some cases, it’s not just the data or technology that might prove challenging to manage—but also people. For some analytics projects, you may have had to petition repeatedly for help from hard-to-reach (and slow-to-respond!) experts.

All this trouble to get from data to insight can be exhausting, with IDC reporting that data professionals worldwide spend a full 44% of their workday on unsuccessful data activities. In fact, a brand-new IDC report shows a whopping 93% of organizations are not fully using the analytics skills of their employees.

So, if you’re tired at the end of your workday, know you’re not alone!

Making positive change

You may be asking yourself: “How can I solve these kinds of issues?” Naturally, you’ll find a million-and-one ideas in the pages of this very Community, from the Academy to the Discussion Forums!

However, some issues can’t be fully solved with a clever workflow. The truth is company policies and organizational design may be the root cause of some of your woes. After all, data silos, insufficient privileges, and lack of necessary support for analytic work are often the result of business-level decisions.

Organizational concerns may sound like a murky area for an analyst to explore. After all, you’re a champion of measurable, quantitative data! Happily, these process and situational issues can be broken down into clear, measurable dimensions. So, they don’t have to remain a mysterious, subjective area any longer.

Let’s see what we can discover here.

Enter the “Analytics Maturity Model”

The International Institute of Analytics (IIA) has been studying the analytics practices of organizations across the globe for more than a decade. As a result of benchmarking hundreds of businesses, the IIA has created a model that measures how well (or poorly) organizations leverage analytics to drive insights and make decisions. They call it (unsurprisingly) the “Analytics Maturity Model.” It measures businesses along 4 different dimensions:

Data Maturity: Data is the raw ingredient, the foundational element for your analytics strategy. Do the right teams have the right access to the right quality of data?
Organizational Dynamics: Effective organizations have a clear analytics strategy. How is success defined? What resources, processes and structures have been set up to execute the strategy?
Analytic Team Dynamics: For analytics success, analytic teams must find the balance between control and freedom. Has analytics leadership identified and prioritized data-driven business opportunities? How well have they orchestrated their teams into action?
Usage and Technology: The set of tools, techniques, architectures, methods, and practices in use. How well do they connect analytics professionals to the rest of the organization? How well do they help the business realize its analytics strategy?

What Analytics Maturity does for you

Knowing your organization’s levels of Analytics Maturity across these 4 dimensions may sound abstract, but having a clear scorecard of how things are going today lays the foundation for future improvements. In particular, knowing your company’s current maturity scores helps identify where your company is doing the right things and where it should work to improve its analytics processes.

It’s not that kind of assessment, we promise!

The good news is that this isn’t like taking a test, where your company “passes” or “fails.” It’s about measuring your organization’s progress over time. And every business has room to grow—according to the IIA, the average organization today has an analytics maturity score of just 2.2 out of 5.

How does your business measure up?

Now you know what an analytics maturity assessment can reveal about your business. Are you ready to find out how your company measures up? Good news: the IIA’s analytics maturity assessment is available free for you today.

More good news: it takes less than 15 minutes to complete. And it’s just multiple-choice questions—no essays required! You can complete a maturity assessment yourself, or you can wow your leadership by sharing it with them as well. Or do both!

With the Analytics Maturity Assessment, you can create the ultimate win-win scenario. You can help improve your company, and your company can help you—perhaps by taking down data silos, streamlining onerous processes, getting you the tools and technologies to meet your needs—the sky’s the limit.

Take a look at potential systemic challenges that could be making your job needlessly difficult. If you do, you help steer your organization down a path that takes your work life from “good” to “great,” and even to “amazing.”

What’s New in Designer Cloud Powered by Trifacta 9.5

2022-12-14T17:52:39Z

What’s New in 9.5

We’re excited to share our latest capabilities from the Designer Cloud powered by Trifacta and Google Cloud Dataprep 9.5 release. As always, there’s a wide range of new features to discuss:

Introducing Designer Cloud powered by Trifacta!

For AWS and Azure users, Designer Cloud powered by Trifacta 9.5 officially introduces a new product name and an updated product look. For more information about this change, be sure to read our full blog post on the rebrand.

Snowflake Enhancements

Our 9.5 update brings several enhancements to Designer Cloud’s Snowflake integration:

Snowflake Pushdown Support for Sampling: For AWS users, pushdown processing can now be used when creating samples from data stored in Snowflake tables. This makes the data preparation process even more seamless by greatly decreasing the time needed to create a data sample based on full scans of a dataset. With 9.5, this pushdown support is available for all sampling techniques other than Clustering and Stratified samples.
Snowflake Pushdown Support for S3 Data Sources: Users working with S3 data files can now process their workflows up to 12x faster by leveraging Snowflake pushdown. Previously, Snowflake pushdown processing was only available when working with data sourced from Snowflake tables. With 9.5, Snowflake pushdown processing can now be used with data sourced from S3 buckets that is being written to Snowflake tables. This allows for ELT from S3 to Snowflake, and can greatly increase the run time efficiency of workflows for those using S3 data sources.
Upsert Support for Snowflake Publishing: Upserts can now be used when publishing to Snowflake, allowing users to add individual rows without processing and replacing entire tables.
Snowflake JDBC Connector (Private Preview): We’ve implemented a common JDBC framework so that users can connect to Snowflake wherever it is located (in AWS, Azure, or another cloud deployment). This allows users to ingest data from Snowflake on any cloud and makes it possible to write data to Snowflake on Azure (using pushdown processing!). This feature is still in private preview. Reach out to your Customer Success Manager or Account Executive to get access.
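To illustrate the upsert behavior mentioned in the list above, the sketch below applies a small batch of rows to an in-memory "table" keyed by an ID: rows with an existing key are updated, and rows with a new key are inserted. The table, column names, and `upsert` helper are hypothetical; this only illustrates the semantics and is not how publishing to Snowflake is implemented.

```python
def upsert(table, rows, key):
    """Update rows whose key already exists in the table; insert the rest."""
    for row in rows:
        existing = table.get(row[key], {})
        # Incoming values overwrite matching columns; other columns are kept.
        table[row[key]] = {**existing, **row}
    return table


# Hypothetical target table, indexed by its key column.
customers = {
    1: {"id": 1, "name": "Ada", "tier": "gold"},
    2: {"id": 2, "name": "Grace", "tier": "silver"},
}

incoming = [
    {"id": 2, "tier": "gold"},                     # existing key: update
    {"id": 3, "name": "Linus", "tier": "bronze"},  # new key: insert
]

upsert(customers, incoming, key="id")
print(customers)
```

Only the two affected rows are touched; row 1 is never rewritten.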

Individual Asset Transfer Between Users

Previously, in cases such as a user leaving a company, admins could bulk transfer all assets from one user to another via API.

In 9.5, the ability to transfer asset ownership has been expanded from admins to include individual users. It’s also now possible to transfer the ownership of individual assets between users (as opposed to the bulk transfer of all assets) – including all 1st class objects such as flows, connections, imported datasets, plans, macros, and UDFs. This allows for helpful use cases, such as transferring developed assets to a centralized operations account for scheduling.

We’ve also made asset transfer accessible via dropdowns in the UI, making it possible for non-admins to transfer assets without having to write code against the API. To top it off, we’ve added a table to record the transfer history of assets, giving admins a record of who has owned which assets over time.

Edit Recipes With Datagrid Disabled

In 9.5, users now have the option to launch the transformer page (recipe view) with the datagrid disabled. This allows users to edit recipe steps without waiting for data samples to load, allowing for faster edits when moving in and out of individual recipes. This is particularly useful when working with large datasets/recipes, or in cases where users find themselves in environments that have poor internet connectivity, causing slower sample loading times.

New Flow Parameter Type – Selector

In 9.5, we’ve added a new flow parameter type – Selector. The Selector Flow Parameter allows users to define a parameter based on an enumerated list of values with a single selection option. This can be used to define an override key, where an overridden value applies to all references of the parameter within a flow.

Flow Import and API Connection Mapping

When importing flows to a new environment or workspace, users can now specify connections, allowing for more plug-and-play usability. Using this feature, users can simply change their connections, and datasets will be replaced properly without any additional steps.

Refresh Excel, PDF, and Google Sheets Files

In 9.5, dataset refresh has been further expanded to include Excel files, PDFs, and Google Sheets. This builds on existing dataset refresh support for relational, delimited, schematized, and JSON files. When the underlying schema for a supported dataset changes, dataset refresh allows users to upload fresh data and refresh their datasets without the need to create a new dataset object and replace it in the flow. Dataset refreshes can be used to address schema changes, or to add or remove columns of data from a dataset. This makes your datasets more durable, reusable objects and helps to avoid workspace clutter and versioning issues.

Speeding Up the Job History Page

To speed up performance on the job history page, admins now have the option to change the default number of days displayed (180 days / 120 days / 60 days). Changing this default can reduce page rendering time by up to 20%, providing even faster performance.

Enable OAuth for SharePoint

For enhanced security, users can now leverage OAuth 2.0 connectivity to access SharePoint lists.

Google Cloud Dataprep Specific Releases:

In 9.5, we have some additional updates specific to Google Cloud Dataprep:

Enable Sort Transform: Dataprep users can now sort dataset samples in the transformer grid. Samples can be sorted based on columns in ascending or descending order or based on the order of rows when the dataset was created. This was an existing feature in Designer Cloud powered by Trifacta on AWS that has been brought to Google Cloud Dataprep.
Enable Service Accounts for In-VPC Batch Jobs and BigQuery Execution: For Dataprep users, service accounts can now be used to execute transformation jobs within your VPC and within BigQuery. This enhanced security measure removes calls to the Trifacta VPC for credentials and reduces timeouts on longer-running jobs.

New connectors with 9.5 Release

We continue our journey to help you connect to any data source, enabling additional use cases. With our 9.5 release, we support the following new early-preview/read-only connectors:

Workday
Google Calendar
QuickBooks

For more information, see Early Preview Connection Types. You can learn all about our connectivity updates here.

If you haven’t done it already, it’s a great time to sign up for a free trial with Trifacta. Join us today on our journey to the cloud.

Announcing Designer Cloud Powered by Trifacta

2022-12-13T06:00:17Z

Welcome to Designer Cloud

Notice anything different about Trifacta? If you haven’t already heard, we’re pleased to share that Trifacta is now Designer Cloud!

Designer Cloud 9.5 introduces a new product name and an updated product look. These small changes provide a glimpse into an exciting future for Trifacta that will bring a variety of new, powerful features you’re sure to love.

So, what is Designer Cloud?

In February 2022, Trifacta was acquired by Alteryx. Since then, we’ve been hard at work behind the scenes building something truly extraordinary for our customers.

Trifacta has long been well-known for providing the world’s most advanced self-service cloud data engineering platform. Our cloud-first focus has allowed us to build an infrastructure that combines infinite scalability with strong data governance and security.

At the same time, Alteryx has become a household name in the data community. Well known for its best-in-class Analytics Automation Platform, Alteryx has been making data professionals’ lives easier since 1997.

Our vision is to bring together Trifacta’s cloud-first, enterprise-grade capabilities with Alteryx’s best-in-class workflow and canvas in an enhanced, world-class experience we’re calling Designer Cloud powered by Trifacta.

Today marks a major step along this journey.

In Designer Cloud powered by Trifacta 9.5, you will see a new logo as well as a new product look.

Coming soon, Trifacta users will gain access to an additional interface that brings together elements from the Alteryx Analytics Automation platform to complement the Trifacta experience, along with a suite of new capabilities.

When it arrives, this new combined offering will give users with Designer expertise the opportunity to leverage many of Designer’s capabilities when building data workflows in the cloud. The combined cloud platform will serve the needs of entire enterprises, from data analytics teams and IT/technology teams to line-of-business users.

Rest assured, the Trifacta application isn’t going away. As these new features are added, the Trifacta platform will continue to be developed, and users will have the option to continue using Trifacta as they currently do today.

An End-to-End Portfolio: The Alteryx Analytics Cloud

With these changes, Designer Cloud powered by Trifacta is joining the Alteryx Analytics Cloud – a suite of cloud analytics tools that will allow users to accelerate their analytics journey like never before.

The Alteryx Analytics Cloud includes powerful tools such as Alteryx Machine Learning and Alteryx Auto Insights. As development of Designer Cloud powered by Trifacta continues, users will also see emerging integrations across this suite of products, making the Alteryx Analytics Cloud a truly unified, end-to-end analytics platform.

Ready to get started? Jump into Designer Cloud and take a look around!

What’s New in Designer Cloud 9.3/9.4

2022-12-14T03:33:35Z

What’s New in Designer Cloud 9.3/9.4

We’re excited to share our latest capabilities as part of the Designer Cloud 9.3/9.4 releases. As always, there’s a wide range of new features to discuss:

Designer Cloud?

To reflect our ongoing product development as part of the Alteryx Analytics Cloud, Trifacta is now Designer Cloud. This name change will be reflected in future versions of the product, along with some exciting new features. The name of our product has changed, but it’s still the same great Trifacta experience – and more!

What’s new in Designer Cloud 9.4:

JavaScript UDFs Can Now Be Executed With BigQuery Pushdown

In Designer Cloud 9.0, we announced the ability for users to create custom transformations using JavaScript User-Defined Functions (UDFs). In 9.4, we’re happy to announce that JavaScript UDFs are now generally available, with an update. For Dataprep users, JavaScript UDFs can now be executed with BigQuery Pushdown, allowing for faster and more efficient data transformations across your entire Dataprep job.

Access Plans from Flow Output Panel

In 9.4, it’s easier than ever to identify and access plans that are triggering jobs. If a job run is triggered from a plan, a link to the plan will now appear next to the job in the flow output panel, allowing users to easily navigate to the plan. This link will take users to the job’s task node within the plan, making it easier to review your jobs in context.

Faster Access to Data Quality Information After Job Runs

It’s now easier to identify data quality issues when you run your jobs. If a job successfully completes, but some Data Quality Rules do not pass, or there are column data mismatches, users will now be notified directly in the body of the Job Run notification email, without the need to open any attachments to see the pertinent information.

Job and Plan Emails Now Send by Default

We don’t want our users to miss a single important update, so email notifications will now be turned on by default when users run a new Flow or Plan. This will automatically notify users by email on job success or job failure. Existing workspace-level overrides and flow-level overrides have been preserved. Users can change the default at any time by navigating to their workspace settings.

Users Can Now Refresh JSON Files

In Designer Cloud 9.0, we added the ability for users to refresh relational, delimited, and schematized files. In 9.4, we’ve expanded dataset refresh to include JSON files. When the underlying schema for a JSON dataset changes, users can now upload fresh data and refresh their datasets without the need to create a new dataset object and replace it in the flow. Dataset refreshes can be used to address schema changes, or to add or remove columns of data from a dataset. In future releases, Dataset refresh will be expanded to support Excel and PDF files.

What’s new in Designer Cloud 9.3:

Improved Transformer Loading Experience

Users should now notice faster performance when entering the recipe mode/transformer grid. Rather than waiting for a sample to load, which could block users from taking action for a few seconds upon entering a recipe, sample loading now occurs asynchronously.

New Expandable and Collapsible Left Navigation Bar

The left navigation bar has been updated. You can now expand the left nav bar to display full-text options for each menu item, or collapse it to reclaim additional screen area.

Improved Support for Synapse External Tables

We’ve improved support for Synapse external tables. Users should see improved performance when reading large data files from Synapse external Serverless Pools. Users can also now publish to Synapse external tables.

Dataprep: Run SQL Jobs in Customer VPC

Dataprep customers can now connect to data sources and ingest, publish, and execute SQL steps directly from their own virtual private cloud (VPC). Design time access to schema, metadata, and sample data can also leverage the customer’s VPC. For more information, see Run Dataprep in Your VPC.

Dataprep: In-VPC Processing for Photon

In addition to SQL steps, Dataprep users can now execute Trifacta Photon In-Memory jobs within their own virtual private cloud (VPC).

New connectors with Designer Cloud 9.3/9.4

We continue our journey to help you connect to any data source, enabling additional use cases. With Designer Cloud 9.3/9.4, we support the following new early-preview/read-only connectors:

SendGrid
SAP HANA
Denodo
Zoho CRM
DocuSign

For more information, see Early Preview Connection Types. You can learn all about our connectivity updates here.

If you haven’t done it already, it’s a great time to sign up for a free trial with Designer Cloud. Join us today on our journey to the cloud.

Trifacta Legend June 2022: Huong Do

2022-12-28T20:06:08Z

Trifacta Legends recognizes customers every month who are doing groundbreaking work with data using Trifacta.

We’re pleased to announce the Trifacta Legend for June 2022: Huong Do from Canopy.

Huong Do is a product owner at Canopy. She has a strong background in the data space, including ingestion, aggregations, and ETL transformations. She has been in the data space for a very long time developing data products, including the bread and butter of Canopy software.

We talked to Huong about her experience building data products at Canopy. She shared some of the challenges Canopy faced, and how they overcame them with Designer Cloud’s self-service solution.

Trifacta: Huong, can you tell us a little more about yourself?

Huong: Sure! I graduated from National University of Singapore in 2017, and I have been part of Canopy since then. I’m in charge of designing and delivering automated functions for data ingestions and aggregations. I also help with evaluating business processes, anticipating requirements, and uncovering areas for improvement to develop solutions that help with automations in the company. Currently, I’m managing data-related products and I supervise other BAs on delivering requirements and solutions for our team and our clients.

Trifacta: Awesome! With that, can you tell us a little more about Canopy as a company?

Huong: Canopy is a cloud-based financial data aggregation and analytics platform. Our main clients are family offices, wealth managers, private banks, trustees, and self-directed traders. Currently, we report on more than $160 billion in assets across our 335 custodians. Our goal is to provide our clients with a more holistic view of their investment portfolios so that they are able to make better investment decisions. Currently, we offer 3 main types of services: data acquisition & aggregation, data cleansing & standardization, and analytics visualization & reporting. And Designer Cloud powered by Trifacta is playing a very important part in our data cleansing and standardization as a service for our clients.

Trifacta: That’s awesome. I’m glad the product is helping you achieve those goals. Do you mind sharing how you were doing things before, how Designer Cloud has simplified your process of standardization and cleansing, and how Canopy plays a role in the end product that customers touch?

Huong: So we used to have a multi-layered, time-consuming process, with large dependencies between the analysts and data engineers to do data cleansing and transformation. There was a lot of back-and-forth in communications between the analysts who understand the data and the data engineers who know how to code. As part of the normal process, the analyst had to put the logic down in a way that the developer could understand, then the developer would do the coding before going back to the analyst to do the testing of the transformation and the cleansing.

Trifacta: What were some of the challenges with that process?

Huong: The entire process was very human-dependent, and very time-consuming, because there was a lot of back-and-forth communication. It also depended on our development schedule as well. Because of that, we found that it was not effective for us to onboard new data sources quickly, smoothly, or effectively. So we needed to find a better solution and a better way of dealing with these issues so that we could reach a larger market and get in touch with more data providers without dependencies that would hinder us from onboarding more and more data sources in a timely manner for our clients.

Trifacta: That makes sense! And how did Designer Cloud help solve that back-and-forth problem?

Huong: Designer Cloud provided us with a very user-friendly platform for our data analysts to own and do the data transformation and cleansing by themselves without the dependencies on the developers who know how to code. So those loops of back-and-forth communication between the data analysts and data engineers are now totally removed, and data analysts can work directly on Designer Cloud to do the cleansing and transformation. And the results of their transformations are shown to them in real-time previews, which helps reduce our cleansing and transformation turnaround time and cuts the time to onboard new data sources from weeks to hours.

Trifacta: That’s incredible! What have been some of the benefits Canopy has experienced following this implementation of Designer Cloud?

Huong: This has created great opportunities for us to reach out to new markets and new data providers with new sources because we are able to shorten the turnaround time of onboarding new data sources. This also allowed us to reallocate our resources to more impactful or more important features in the products, and it has helped us with hiring as well, because now we just need one person and not multiple team members in a back-and-forth process.

Trifacta: That’s awesome! I love the fact that you are able to now scale and go into a broader market. Thanks so much for sharing your story with us, Huong!

Huong: It’s my pleasure!

Trifacta Legend May 2022: Mario Truss & Armin Meyer at Seibert Media

2022-12-28T20:06:17Z

Trifacta Legends recognizes customers every month who are doing groundbreaking work with data using Trifacta.

We’re pleased to announce the Trifacta Legends for May 2022: Mario Truss & Armin Meyer from Seibert Media.

Mario Truss is a Product Owner of Customer Data Engineering, and Armin Meyer is a Service Owner of Tools & Data at Seibert Media. Besides being a data nerd, Mario loves music and teaching things to people.

Armin has been focused on agile methods for 10+ years, and has been working for the past 3 years to enhance the usage of data, tools and processes at Seibert Media. Aside from working with data, Armin is an avid skier.

We talked to Mario & Armin about their experience facilitating data modernization and democratization at Seibert Media. They shared some of the challenges they faced, and how they overcame them with Dataprep’s self-service solution.

Trifacta: Armin, can you tell us about your business at Seibert Media?

Armin: We provide some of the best-selling apps in the Atlassian marketplace. Some of our solutions include draw.io, Linchpin, and Agile Hive. We also do consultancy, hosting, and license management for a lot of customers. We are well-known in the German-speaking region. We focus on team collaboration tools, and Mario & I work on the internal data management, data engineering, & business intelligence team.

Trifacta: Can you help us understand your data engineering journey, what technologies you use to help you achieve your objectives, and why?

Armin: As a Google Cloud Partner, we focus on doing this in Google Cloud, which is why we use Dataprep by Trifacta. But prior to that, we have always been hands-on with our data, and so we had a lot of manual processes to do reporting and controlling. One advantage this brought was that we didn’t have a lot of on-premise things in the data field, so we could go directly to the cloud before it was a “hot” thing. But a disadvantage was that, when you do things manually, you face a lot of problems. It’s a lot of work, you have to regularly pull the data out of the systems, and your reports are static and quickly become obsolete. And everything you do on your reports is very costly. So we wanted to get better at using the data we had.

Trifacta: What were some of the first steps you took towards eliminating some of these manual processes to make better use of your data?

Armin: We started with some groundwork, using Kafka to extract data continuously from all of the systems that had relevant data for us. Then we pulled this data to Google BigQuery and systematically started transforming it and processing it with Dataprep. As an output, we sent this to Data Studio.

Trifacta: And under this system, you were able to automate tasks that were previously quite manual?

Armin: That’s correct. We were able to replace a lot of manual work, like pulling the data out of the systems, and Dataprep made the data transformations automated in a lot of cases, or at least a lot faster.

Trifacta: That’s great. What have been some of the business impacts of this shift, both now and going forward?

Armin: We have a BI team that creates data and does the job for the business people. They raise the questions, and we do the data transformation. The next step where we want to go, and where Dataprep is essential, is to provide self-service capabilities to our analytics and also to our data integration into the other operative systems. So the BI team can now focus on doing things like semantic models of our main data objects, where they model dimensions and common metrics, and then let the users do the rest of the job themselves to get the insights they need.

Trifacta: That’s wonderful. Sounds like you guys have gone a long way towards democratization! Mario, are you able to share any more details on what the end-to-end data engineering process looks like for you at Seibert Media?

Mario: First we have data sources, which can be APIs, CSV files, or other data. We often have to deal with a lot of different data and a lot of varying data quality. We utilize Apache Kafka to bring data inside of BigQuery. After that, we load that data inside of our BigQuery data warehouse. That’s just raw data coming from the systems. Almost in every use case we use some sort of transformation or enrichment rather than just our raw data. And that’s where Dataprep comes into place. Once we transform our data, we sync it back to our BigQuery data warehouse, and afterwards we facilitate our analytics purposes inside of Google Data Studio.

Trifacta: You mentioned that Dataprep is a key part of this process. What makes Dataprep so essential for your team?

Mario: Dataprep makes it possible for people like myself, who don’t have a computer science background, to make ETL transformations to the data and put it into the format that we need in order to use it afterwards. It allows us to make those transformations without code and makes it more accessible to people who maybe aren’t able to write perfect SQL or some other programming language to process data.

Armin: To add on to that, we started around 3 years ago, and on the first stage we did all of the transformation and processing in BigQuery itself. But what we faced is that you needed really experienced people to do this. And this is when we established Dataprep, which was really a game changer for us, because we could get in people who weren’t so experienced with writing routines and SQL queries. So it’s now much easier to find people to do the job, and it’s quicker to do the job.

Trifacta: So how have you achieved your goals as a company through this modernization and democratization process?

Mario: One of our goals as a company is to become a data-driven organization or company. And we believe that we can only become that if non-technical people have some sort of interface to interact with that data. And Dataprep’s low-code, no-code tool makes that possible. The democratization and the transformation of the data is extremely valuable in making the data accessible to business users so that they can get insights. The BI team can try things out by themselves without being dependent on us. And we can streamline the whole process so we don’t have to rely on a multitude of tools.

Trifacta: That’s wonderful! Mario and Armin, thanks again for sharing your story.

Trifacta Legend March 2022: Bud Johnson at Healthgrades

2022-12-28T20:06:28Z

Trifacta Legends recognizes a customer every month who is doing groundbreaking work with data using Trifacta.

We’re pleased to announce the Trifacta Legend for March 2022: Bud Johnson, Sr. Data Operations Manager at Healthgrades.

Bud Johnson is a Sr. Data Operations Manager at Healthgrades, where he and his team help match people with the proper care at the proper time to maximize their healthcare outcomes. Bud is a Trifacta fan with a long background in the data space, and he has worked specifically in interactive advertising for over 25 years.

We talked to Bud about his experience and insight into interactive advertising. Bud shared some of the challenges he faced, and how he overcame them with automated processes powered by Trifacta.

Trifacta: Welcome Bud! Do you want to add anything to the introduction?

Bud: So, as you can tell, I’ve been around a while. I took my first computer class utilizing an old IBM 360 mainframe with an acoustic phone coupler in 1975. So I’ve been playing around with this stuff for quite a while, but went into sales and marketing, and came back to the dark side about 25 years ago directly in interactive advertising.

Trifacta: That’s awesome. You’ve probably seen a lot of variations and development in this space.

Bud: Oh, everything has changed.

Trifacta: That’s awesome! Bud, can you tell us about Healthgrades, and how we have probably interacted with Healthgrades without even knowing it?

Bud: So, we have the largest health marketplace in the country – possibly in the world. And our mission is to help match people with the proper care at the proper time to maximize their healthcare outcomes. So we can help you find the correct doctor – we have ratings from consumers, and we also bring in any public records that we can get. And the area that I work in the most is with internal customers who do interactive pharma advertising online.

Trifacta: Bud, can you take 30 seconds to explain what interactive advertising is?

Bud: So just think of interactive advertising as advertising on the web. If you’re looking at, let’s say, a dermatologist, you might get an acne ad for somebody who has a good pharmaceutical for that. So an interactive ad can help you find the right doctor, and possibly find the right treatments and medications. If you don’t know interactive advertising, this can be really complex to set up, maintain, and track, especially for our business, being in the healthcare industry, where we have much more government control over what we can and can’t say and do.

Trifacta: Excellent. So can you tell us about some of the biggest problems you face with interactive advertising, and how you use Trifacta to make your ads more effective and get more timely insights?

Bud: So one of the biggest problems that we have with interactive advertising from the data side is that most of our data is coming in as attachments to emails. Think CSVs and spreadsheets. We have no control over this, and they may come directly from our clients, or from their agencies, or it may be automated from a client’s Google Ad Manager or Google Campaign Manager via APIs or canned reporting. And that data can change with absolutely no notice, but we still attempt to make changes and get data published within the same work day it’s received. So we have to be very nimble.

Trifacta: So the accuracy and timeliness are absolutely important. Can you walk us through that a bit more?

Bud: Well, our data is used to ensure that we’re on pace for our KPIs. Using our data, we can tell if we are on schedule or behind, and this gives our trafficking teams the ability to go in and tweak campaigns as we go. Now, most of our customers have monthly goals. It would be very hard if we were only able to go in and make changes to their ads 3 times in that month. So frequency and recency of the data are very important – time is of the essence.

Trifacta: It sounds like the speed of insight is very important to you. What problems were you facing with your previous process that handled this incoming data?

Bud: So we were using a SaaS offering that was an all-in-one tool. And that tool helped us get out of doing a lot of things by hand and get to a single source of truth. But what we found was that if we brought in something new, as I discussed in the beginning, and if an agency suddenly changed the format of the data coming into us, even something as small as a name on a field, we would have to open a ticket and wait 1-3 days for resolution. And we don’t like our data being unavailable. So we made the choice to change from an all-in-one vendor to something that gave us a lot more speed and flexibility, but more importantly, more control over how the data was ingested, and how we joined the data. We needed to be able to bring in data from a lot of internal and external systems and be able to expand.

Trifacta: And how did Trifacta help to simplify this process for you and your team?

Bud: What we found is that Trifacta really helped us reduce the need for a lot of the things in how we originally envisioned writing this ourselves. We wrote a script to send information from our emails into S3, and Trifacta can immediately come and grab that information from S3 and standardize it. And that’s helpful, because certain types of data that come in from agencies tend to have the same problems. And if we have a recurring class of data, we can add that into our template to easily check for it on any file we have coming in, just in case.

Trifacta: So you’ve been able to use Trifacta to help templatize your process and speed up the onboarding of new files. That’s awesome. That’s great value from a technology perspective. What is some of the value you’ve gained from using Trifacta from a business value perspective?

Bud: One of the big things that we do is leverage Trifacta to bring in data multiple times every day, and we’ve used this to set up business rule enforcement. We can capture data very quickly, and when things aren’t going out properly, we can get an automatic alert sent out to the team. Right now, we have 25 different alerts that we send out every day. And that saves us I don’t even know how much when it comes to avoiding situations where we might have to give out potential refunds if ads aren’t showing where they were paid for. It also helps us avoid lost opportunities, because we can get data in very quickly. To give you an idea, using Trifacta, I was able to create end-to-end flows to bring in 27 new data sources in one week. That would have been impossible with our old SaaS provider. Trifacta is really the hub of our entire system.

Trifacta: That’s amazing. 27 sources in just one week. Thank you again Bud for walking us through your data journey!

Announcing the Designer Cloud 9.2 Release

2022-12-14T01:45:56Z

A month has passed by and it’s time for our latest software release. It’s our pleasure to share the new capabilities as part of the Designer Cloud 9.2 release. Let’s learn how we can make data work for us.

Easier iteration on Designer Cloud recipes with Lock/Unlock Column Data Types

You can now lock or unlock the data type of columns in your data, allowing you to iterate on the recipe without constantly resetting the data types. This also helps prevent the column’s data type from being re-inferred after subsequent transformations.

This new capability gives you the ability to define whether a column’s data type should be fixed or allowed to be inferred by the system, at various points within the transformation workflow. The capability can be initiated from multiple entry points:

A single column through the Change Type column menu option within the Transformer grid
A single column through a recipe step as an option on the Change Column Type transformation
A single column through the column data type drop-down in the Data Configuration settings
Multiple columns through a recipe step using the new Lock Column Type transformation step

Learn more from this helpful article.

Publish Arrays to Google BigQuery

An important data type within the cloud data warehouse ecosystem is an array. Cloud data warehouses such as Google BigQuery support these arrays natively and recommend the use of nested and repeated data types to optimize space and reduce the complexity of dataset joins.

Now you can publish arrays natively into BigQuery. This is part of our initiative to support a wide range of data types that you can publish directly onto cloud data warehouses such as BigQuery.

Faster data loading with Designer Cloud’s in-memory engine

You can now experience improved performance when you use Designer Cloud’s in-memory processing engine. With faster caching, this performance improvement executes only the incremental changes when new steps are added to the recipes.

This capability also increases the number of output tables that can be cached for a recipe. The caching is configurable on a per-workspace basis.

New connectors with Designer Cloud 9.2

We continue our journey to help you connect to any data source, enabling additional use cases. With Designer Cloud 9.2, we support the following new connectors:

Marketo: A marketing automation platform that enables marketers to manage personalized multi-channel programs and campaigns for prospects and customers.
SFTP support for the AWS Cloud: You can now use the SFTP connector with Designer Cloud on the AWS cloud as well.

You can learn all about our connectivity updates here.

If you haven’t done it already, it’s a great time to sign up for a free trial with Designer Cloud. Join us today on our journey to the cloud.

The Power of the Merge: Bringing Together The Best of Data and Analytics

2022-04-11T20:37:33Z

I recall an old ad campaign for Reese’s Peanut Butter Cups whose tagline was “Two great tastes that taste great together.”

Two great things coming together to create a new, better thing? That’s what I call the power of the merge.

And there are two examples of this power of the merge I want to share:

The merging of two data and analytics powerhouses, Trifacta and Alteryx, to help our customers drive their analytics transformation at scale
The merging of two critical market studies from Dresner Advisory Services, Data Preparation and Data Integration Pipelines, into a single 2022 Data Engineering Market Study, the first of its kind

It turns out one merger validates the other. I’ll explain.

Trifacta and Alteryx Combine to Help Every Worker Get the Insights They Need From Their Data

The powerful combination of Trifacta and Alteryx brings together Trifacta’s game-changing integrations and cloud-native capabilities with Alteryx’s industry-leading analytics solution. Trifacta anchors and accelerates Alteryx’s journey to the cloud and opens new categories of users across IT within large enterprises. Together, Trifacta and Alteryx advance the development of an integrated end-to-end, low code/no code analytics automation platform in the cloud.

I couldn’t be more proud of this formidable new entity we’ve created together. No other company is better suited to meet the analytics needs of its customers. We’re going to propel analytics forward.

The power of this combination emanates from a single, shared goal: analytics for all: empowering every worker to get the insights they need from their data. We believe data analytics should be accessible to everyone. Trifacta focuses on the democratization of data engineering. Alteryx focuses on the democratization of analytics. When data analytics are accessible to everyone, everyone wins, from IT to lines of business.

Dresner Merges Two Reports to Recognize Data Engineering

Dresner Advisory Services also demonstrated the power of the merge in 2022. This respected industry analyst combined two of its earlier market studies, Data Preparation and Data Integration Pipelines, into a new independent research report debuting this year: the 2022 Data Engineering Market Study.

And I’m delighted to announce Dresner ranked Alteryx/Trifacta as the top data engineering vendor in its first-ever data engineering study.

Data engineering is recognized as a stand-alone space for the first time in this new study. It explores market requirements and priorities for data orchestration, integration, and transformations including advanced analytics in the data engineering pipeline workflow.

Like other Dresner market studies, its research is exhaustive. Dresner surveyed a diverse cross-section of 6,000 organizations worldwide to review data engineering market trends, dig deep into end-user requirements and features, and rank 28 data engineering vendors.

I encourage you to read the full study, but here are some highlights:

Data engineering is important. Sixty-one percent of respondents indicate data engineering is “critical” or “very important.” It’s clear this technology is becoming an indispensable part of the 21st century business landscape.
Data engineering is popular. Sixty-three percent of respondents say their organizations use data engineering capabilities today, and 20% have plans to use data engineering tools within the next 12 months.
Data engineering approaches leave room for improvement. Only 20% of respondents rate their current approach to data engineering as highly effective.
Data engineering tools are versatile. Organizations often purchase and use data engineering tools for more than one use case. Some of the most common ones include data integration, cleansing, and building transformation workflows for data warehouses that support dashboards and reporting.
Data engineering tools are used across the organization. They’re no longer the domain of one department or function, validating Alteryx/Trifacta’s goal to democratize data analytics. Interestingly, executive management teams reported using data engineering constantly.

These highlights validate the need for the combination of Alteryx and Trifacta, and they confirm what we know: The success of your business depends on the success of your analytics program, and the success of your analytics depends on the success of your data engineering.

Dresner isn’t the only industry expert to see the premier value of Trifacta’s data engineering capabilities: the Data Breakthrough Awards recently named Trifacta the Data Transformation Solution of the Year. And, customers again ranked Trifacta a Leader in Data Preparation, Data Quality, and ETL Tools with 9 different G2 Awards in Spring 2022. And, the power of Trifacta and Alteryx will only increase as we work toward delivering the world’s first data engineering-backed analytics solution.

How to Sort Data in Google Sheets

2022-12-28T18:53:48Z

You’ve imported your data into Google Sheets—now you need to sort it. Thankfully, there’s an easier way than moving your columns up or down by hand. Google Sheets allows you to automatically sort your data numerically or alphabetically.

In this post, we’ll review how to sort data in Google Sheets as well as how to filter data in Google Sheets. Read on to learn more.

How to Sort Data in Google Sheets (Alphabetically or Numerically)

Any type of text data can be sorted alphabetically in Google Sheets. In this case, we have a list of names that we’d like to sort.

First, start by selecting your sheet by clicking the blank square in the upper left corner.
Under Data, select “Sort Range,” which will prompt a pop-up window to appear.
To exclude the header row in our sort, we’ll check the box that says “Data has header row.”
Under “Sort by,” we’ll select the “Name” column that we want to sort. We’ll keep the “A → Z” option selected since we want the names organized in ascending alphabetical order.

This same function also works with numbers. If we select our column of donation amounts, we could choose to organize those amounts from high to low (Z → A) or low to high (A → Z).

How to Sort Data in Google Sheets Across Multiple Columns

Google Sheets also gives us the option to sort multiple columns at once. For example, we could give our donor names first priority (sorted A → Z) and our donation amounts second priority (sorted Z → A).

Setting that up in Google Sheets would look like this:

Before setting up this sorting logic, our “Names” column was sorted alphabetically, but there was no sorting preference for the coinciding donation amounts.

In the image below, we can see that donor Amelia made three donations in the month of September, but her donation amounts aren’t organized in any particular order.

Now, let’s watch how those donation amounts change once we apply our multi-column sorting logic:

Amelia’s name is still listed alphabetically, but her donation values have been reorganized to be listed from highest to lowest (Z → A). Theoretically, we could also add a third column to be considered, such as the date of the donation, and should there be any repeat donors and repeat donation amounts, the date of donation would determine their order.

Here are more detailed instructions on how to sort data in multiple columns:

First, start by selecting your sheet by clicking the blank square in the upper left corner.
Under Data, select “Sort Range,” which will prompt a pop-up window to appear.
To exclude the header row, we’ll check the box that says “Data has header row.”
Select the first column that you’d like to sort and whether the values should be listed from high to low or low to high.
Click “Add another sort column” to add your second column. Repeat until you’ve selected all columns you’d like to sort.
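
For readers who want to see the same multi-column logic outside the spreadsheet, here is a minimal Python sketch. It only illustrates the sorting rule described above (names A → Z first, then donation amounts Z → A); it is not how Google Sheets works internally. The function name sort_donations and the sample rows are made up for this example.

```python
# Hypothetical donation rows; the names and amounts are invented for illustration.
donations = [
    {"name": "Brian", "amount": 40},
    {"name": "Amelia", "amount": 25},
    {"name": "Amelia", "amount": 100},
    {"name": "Carla", "amount": 60},
    {"name": "Amelia", "amount": 50},
]


def sort_donations(rows):
    """Sort by name ascending (A → Z), then by amount descending (Z → A)."""
    # Negating the numeric amount makes the second sort key descending,
    # while the name stays in ascending order.
    return sorted(rows, key=lambda row: (row["name"], -row["amount"]))


for row in sort_donations(donations):
    print(row["name"], row["amount"])
```

With this data, Amelia's three donations come first, ordered 100, 50, 25, followed by Brian and Carla. A third priority, such as a donation date, could be added as a third element of the key tuple.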

How to Filter Data in Google Sheets
Filtering data in Google Sheets is a great way to highlight certain data while removing (without deleting) other data that you aren’t interested in. It’s especially useful if you’re working on a shared document; there may be different questions that you’re looking to answer about the data vs. your colleagues. Filtering protects the integrity of the data while allowing you to quickly find the insights that you need.

Here’s how to do it:

1. Select the column or range of columns where you want to apply your filter.

2. In the upper right corner, click on the three dots and select the funnel “Create a filter.”

3. Now, the column(s) that you selected will be highlighted in green. Click on the green funnel next to your column name.

4. This will open up a pop-up window where we’ll decide how we want to filter the data. In this case, we’re going to be applying a conditional filter. A conditional filter allows you to apply certain rules; in this case we want to look at every donation amount above $75 so that we can analyze which donor has donated large amounts.

The resulting data looks like this:

Of course, there’s also the option to simply filter out certain values. For example, if you had a list of products and the states they were purchased in, you may want to filter out certain states to see the product’s popularity by region. Or, you could filter out products to see if they are more popular in certain states.

5. You have the option to save any filter you create so that other collaborators can reuse it. Simply return to the funnel in the upper right and click on the drop down arrow where you’ll select “Save as filter view.”

6. To close out of your filter view, click on the funnel once again so that it is deselected. When you want to return to your filter view, go back to the drop down arrow and select the name of your filter (in this case, “Filter 1”).
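
The conditional filter from step 4 can also be expressed in a few lines of Python. This is a rough sketch of the idea only, assuming a hypothetical list of donation rows and a made-up helper named filter_greater_than. Like a filter in Google Sheets, it hides rows that don't match without deleting anything from the original data.

```python
# Hypothetical donation rows; the names and amounts are invented for illustration.
donations = [
    {"name": "Amelia", "amount": 100},
    {"name": "Brian", "amount": 40},
    {"name": "Carla", "amount": 80},
    {"name": "Dev", "amount": 75},
]


def filter_greater_than(rows, column, threshold):
    """Return a new list of the rows whose value in `column` exceeds `threshold`."""
    # Building a new list leaves the original rows untouched,
    # much like a filter view leaves the underlying sheet intact.
    return [row for row in rows if row[column] > threshold]


large_donations = filter_greater_than(donations, "amount", 75)
for row in large_donations:
    print(row["name"], row["amount"])
```

Here only Amelia and Carla are shown. Dev's donation of exactly $75 is excluded because the rule is "above $75," and the original list still holds all four rows.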

Where Sorting or Filtering Data in Google Sheets Can Fall Short
Sorting or filtering data can be the answer for those trying to find out how to organize data in Google Sheets. But it’s also often used as a means to explore and better understand the contents of data—and this is where these methods can come up short.

Though filtering and sorting data in Google Sheets does offer insight into your data’s trends, at the end of the day, you’re still looking at rows and columns, which means it’s hard to get a clear picture of the data and any outliers, commonalities, etc. that it may contain. This difficulty only grows as the data increases and you’re scanning not tens or hundreds of rows but thousands.

Finally, users must remember that sorting or filtering data will do little good if the data isn’t cleaned properly. For example, say you’re filtering for all mentions of “California” in your data. The filter will not bring up any misspellings or abbreviations of the word. While there are ways to search for alternative representations, such as using a conditional filter of “Text starts with” or “Text ends with,” this can still be a time-consuming (and ultimately imperfect) process.
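
To make the "California" problem concrete, here is a small Python sketch comparing an exact-match filter with one that checks a hand-maintained list of variants. The variant list and the helper name is_california are assumptions made for this example, not part of any tool.

```python
# Hypothetical set of spellings we choose to treat as "California".
CALIFORNIA_VARIANTS = {"california", "ca", "calif", "calif."}


def is_california(value):
    """True if the cell text matches a known variant, ignoring case and outer spaces."""
    return value.strip().lower() in CALIFORNIA_VARIANTS


states = ["California", "CA", "calif.", "Californa", "Nevada"]

# An exact-match filter only finds the one correctly spelled value.
exact_matches = [s for s in states if s == "California"]

# A variant-aware filter finds more, but only the variants we thought to list.
loose_matches = [s for s in states if is_california(s)]

print(exact_matches)
print(loose_matches)
```

The exact filter finds one row and the variant-aware filter finds three, but the misspelled "Californa" still slips through. That is the imperfect part: every new variant has to be spotted and added by hand.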

The Alteryx Designer Cloud Data Preparation Platform

Though Google Sheets is an excellent tool for simple reporting or analytic tasks, many organizations are adopting data preparation platforms like Designer Cloud in order to prepare big or complex data for analysis—or to simply ensure that data of any size is free of errors.

The Designer Cloud platform automatically presents visual representations of your data based upon its content in the most compelling visual profile. This allows an immediate understanding of the data at a glance—no more searching or filtering through spreadsheets to find trends across your data.

It also alerts users to any data quality concerns, such as missing or invalid data, so that these data quality issues don’t slip through to the end analysis. And, since Trifacta is powered by machine learning, the platform is smart enough to recognize what the user is trying to do. If they want to standardize all versions or misspellings of California, for example, the tool will automatically suggest things like “CA” or “Calif.”

To be clear, there is no direct competition between Google Sheets and a data preparation platform like Trifacta—they are simply two great tools used for different purposes. In fact, the Designer Cloud technology can be found on the Google Cloud Platform as Google Cloud Dataprep by Trifacta. And while using Cloud Dataprep, it’s easy for users to pull in Google Sheets data to explore, join, and prepare for analytic use.

To learn more about Designer Cloud, kick off your free 30-day trial today!

What’s New in the Designer Cloud 9.1 Release

2022-12-14T03:03:01Z

Our software releases and updates come fast and furious. We’re excited to share the latest capabilities as part of the Designer Cloud 9.1 release. As always, we cover a wide range of features related to data engineering. Let’s dive into them.

General Availability of SSH Tunneling Connectivity Support

Hybrid architectures spanning the cloud and on-premises networks are common, especially for large enterprises with applications residing both on-premises and in the cloud. To support and strengthen hybrid architectures, we’re excited to announce the General Availability of connectivity using SSH Tunneling. This expands on our previous limited preview announcement of this capability with our 8.10 release last year.

To help connect to hosts such as database servers deployed within a private network, you can now enable SSH Tunneling within Designer Cloud. SSH Tunneling offers a secure solution where the SSH ports are open for access from public networks whenever needed. With this solution, you don’t need to whitelist specific IP addresses or open application ports to access these hosts. SSH Tunneling is a secure and widely accepted technology where all data is encrypted during transit, thereby maintaining a secure transport session.
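To make the idea concrete, here is a minimal sketch of the generic technique behind SSH tunneling: an OpenSSH local port forward through a publicly reachable SSH host to a database on a private network. This is not how Designer Cloud configures tunnels internally. The host names, user name, and ports are hypothetical placeholders, and the helper only builds the command rather than running it.

```python
import shlex


def build_ssh_tunnel_command(bastion_host, bastion_user, target_host,
                             target_port, local_port, ssh_port=22):
    """Return the argv list for an OpenSSH local port forward.

    Traffic sent to localhost:local_port is encrypted, carried over SSH to
    bastion_host, and delivered to target_host:target_port on the private
    network. Only the SSH port on the bastion has to be reachable.
    """
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:{target_host}:{target_port}",
        "-p", str(ssh_port),
        f"{bastion_user}@{bastion_host}",
    ]


# Hypothetical values, for illustration only.
cmd = build_ssh_tunnel_command(
    bastion_host="bastion.example.com",
    bastion_user="tunnel_user",
    target_host="db.internal",
    target_port=5432,
    local_port=15432,
)
print(shlex.join(cmd))
```

With the tunnel open, a client would connect to localhost on port 15432 as if it were the private database, without any application port being exposed publicly.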

You can learn more from our technical documentation.

Higher Data Accuracy with Schema Change Detection

Schema refers to the sequence and data types of the columns in a dataset. It is common for schemas to change over time, which can break transformation steps or recipes and corrupt data in downstream applications. This new capability enables you to monitor schema changes in your dataset and helps you identify data sources where the schema has changed. Further, the job can be configured to fail when this occurs. Detection works by comparing the current schema of the data source with the schema that was previously stored in the database.

Schema changes are detected if columns are added, removed, or moved. You can configure the jobs to fail if schema changes are detected. This is supported for JDBC and BigQuery sources and for Avro and Parquet file formats, with support for additional formats coming in upcoming releases. Learn more about this new capability with our community article here.
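As a rough illustration of the comparison described above, the sketch below checks a stored list of column names against the current one and reports added, removed, and moved columns. It is an assumption-laden sketch and not the product's implementation: the function names, the `SchemaChangedError` exception, and the `fail_on_change` flag are hypothetical.

```python
class SchemaChangedError(Exception):
    """Raised when a schema change is found and the job is set to fail."""


def detect_schema_changes(stored_columns, current_columns):
    """Compare two ordered lists of column names."""
    added = [c for c in current_columns if c not in stored_columns]
    removed = [c for c in stored_columns if c not in current_columns]
    # Look only at columns present in both schemas, then compare their order.
    shared_before = [c for c in stored_columns if c in current_columns]
    shared_after = [c for c in current_columns if c in stored_columns]
    moved = [after for before, after in zip(shared_before, shared_after)
             if before != after]
    return {"added": added, "removed": removed, "moved": moved}


def check_before_run(stored_columns, current_columns, fail_on_change=True):
    """Return the detected changes, or raise if the job should fail."""
    changes = detect_schema_changes(stored_columns, current_columns)
    if fail_on_change and any(changes.values()):
        raise SchemaChangedError(changes)
    return changes


stored = ["id", "name", "state", "zip"]
current = ["id", "state", "name", "phone"]
changes = detect_schema_changes(stored, current)
print(changes)
```

Here `phone` is reported as added, `zip` as removed, and `state` and `name` as moved because their relative order changed.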

Better Visibility with Sample Job IDs

With Designer Cloud, you can quickly start working with your dataset. This is accomplished by automatically generating a sample using the first set of rows of your dataset. Sample jobs are independent executions and you can always specify the type of sample you wish to create and initiate the job to create the sample. The sampling jobs run in the background.

Samples have unique IDs which previously could be accessed on the job history page. You can now visualize these IDs in the familiar Transformer view, helping you identify the samples easily. This also helps with better visibility showing all the samples along with their IDs on a single screen. The details of the sample including job ID can be accessed by clicking on a particular sample name.

You can learn all about sampling here.

Increased flexibility with dataset configurations

Datasets are the foundation for all data pipelines. Datasets often come from different sources containing extraneous columns, complex column names, or other inconsistencies leading to incorrect inference and inaccurate results. You can now overcome these hurdles by updating and preserving metadata configuration that can be reused consistently and applied each time the dataset is used in a new flow.

With this new capability, you can search for and select a subset of columns to be included in or omitted from flows whenever the dataset is used. You can manually rename columns as part of the reusable dataset and override system-inferred Trifacta data types, saving these settings for ongoing reuse. It is currently supported for relational, delimited, and schema files such as Parquet and Avro.

Learn more about configuration settings here.

Additional security with Customer Managed Encryption Keys (CMEK) for Dataflow

Customer Managed Encryption Keys are created, managed, and stored within the cloud key management service. These keys can be applied to individual objects. When used, data that is written for the objects scoped by the keys is automatically encrypted when written and decrypted when read.

Now, you can have user-specific CMEKs when using Google Cloud Dataflow. This will ensure that any intermediate files created by Dataflow will use the CMEKs. This capability is currently in Private Preview. Please contact Trifacta support if you would like to use this feature.

New connector with Designer Cloud 9.1

We now have a new connector in Early Preview: Instagram Ads, which is a method of paying for sponsored content on the Instagram platform. You can learn all about our connectivity updates here.

It’s never too late to sign up for a free trial with Designer Cloud. Join us today on our journey to the cloud with Alteryx.

Trifacta Legend February 2022: Bob Hall at The Home Depot

2022-12-28T20:06:37Z

Trifacta Legends recognizes a customer every month who is doing groundbreaking work with data using Trifacta.

We’re pleased to announce the Trifacta Legend for February 2022: Bob Hall, Sr. Manager of People Analytics at The Home Depot.

Bob Hall is a Sr. Manager of People Analytics at The Home Depot, where he and his team help support and drive efficiency for Home Depot’s field associates. Bob is passionate about process innovation and democratizing data to provide actionable and intelligent insights to his organization. During his time at Home Depot, Bob has played a key role in upgrading antiquated processes and replacing them with automation powered by a modern data stack. Bob has a bachelor’s degree in Business Management from the Georgia Institute of Technology.

We talked to Bob about his experience and insight into process innovation at The Home Depot. Bob shared some of the challenges he and his team faced and how he overcame those obstacles to create scalable processes using Trifacta’s Data Engineering Cloud.

Trifacta: First, thank you Bob for being a valued customer of Trifacta. It has been a pleasure to work with you and we look forward to the continued partnership. Can you tell us a little more about yourself and your role at The Home Depot?

Bob: Certainly. I am deeply passionate about process innovation. As you can imagine with a huge company like ours, there are some systems that are antiquated and some processes that are antiquated. Home Depot is really invested in making sure we’re staying up to speed with the times, and part of that is how do we make things better for our associates? All of our roles here at the Store Support Center are to support our field organization and our field associates and make their jobs easier.

Trifacta: That’s awesome. Can you tell us about one of the great use cases that you’ve delivered with Trifacta?

Bob: Sure. So we have a Paycard Reconciliation use case. This is one of our foundational use cases. And I say that because this is a use case that we identified very early on that really allowed us to test and learn.

So what is Paycard Reconciliation? Basically, as we’re delivering paycards to either terminated associates or people who have elected to get a paycard, we’re trying to ensure that this payment reconciles back financially. This is a very important process that has compliance aspects to it and associate satisfaction is also tied to it. It’s all about reducing errors, reducing fallouts, and ensuring accuracy and compliance. There’s also a big time saving component. Spending a couple hours doing this everyday was not going to be beneficial for our team when they could lend their expertise to something else and drive more value to our associates and customers.

Trifacta: So what was this process like before you started using Trifacta?

Bob: We were using a combination of Excel and Access. We would take Workday files and files from our banks that we leverage. In Excel, our associates would spend a couple of hours looking at an Excel spreadsheet and manually reconciling things that were fallouts or outliers. Then, we had to feed this back to the Access database so that we could re-reconcile the next day for the next run. So overall, it was a highly manual process with a lot of steps, pulling in a lot of data from different sources and disparate systems.

With Dataprep, until we get to RPA to feed Dataprep, we still have to take those reports and reconcile them from our vendors, our banks, but after that, the process is purely automated. So Dataprep takes them from storage, it does the reconciliation for us, it calls out any fallouts or errors, and then it feeds it back into the Dataprep cycle. We now have daily execution and automation through scheduling functionality in Dataprep. And we now save over 2 hours per run on this per day, saving us a significant amount of time while ensuring consistency and accuracy, which is probably one of the more important things for us.

Trifacta: That’s amazing. So when you talk about Trifacta, how would you articulate the value?

Bob: What’s the value? We’ve reduced the daily reconciliation process by 2 hours, and really annually by around 520 hours. That’s a huge measure of value for us. It’s also improved the quality of work. No one wants to do manual reconciliations for several hours a day. So we’ve improved that quality for them so that they can hopefully do the same for our associates that interact with our field customers. There’s also risk reduction. We now have full confidence in completeness and accuracy around this process, so that if we’re approached for audits we can now walk everyone through all of the steps. We’ve used Trifacta to scale up operations. Before, we had to manually scale down our scope, but with Dataprep, we can scale up virtually to audit the entire population systemically.

Trifacta: That’s incredible! Thanks again Bob for walking us through your use case and all the value you’re adding to The Home Depot. We really appreciate it.

Bob: You’re welcome!

What’s New in the Designer Cloud 9.0 Release

2022-12-14T01:43:07Z

It’s been a whirlwind of a New Year for us at Alteryx (formerly Trifacta) with all our big announcements and updates. On that note, our delivery of new capabilities showcasing our innovation in the data engineering space continues at a frenetic pace. 2022 is here and it’s time to get on board the new 9.x software release train. We have some innovative capabilities on the first release of this train, and we just got started on what is going to be an exciting journey ahead. Let’s dive right in!

General Availability of SQL-based ELT on Snowflake, the Data Cloud

Last year, we launched the private preview of SQL-based ELT on Snowflake with our 8.10 release at AWS re:Invent. We’re excited to announce that this capability is now generally available to all our customers.

With full pushdown on Snowflake, the data transformation logic also known as the data wrangling logic is converted into SQL, and the transformations are directly executed on Snowflake. During transformations, the data stays within Snowflake, resulting in a secure solution that efficiently uses the compute resources in the cloud while delivering a complete data transformation solution within the ELT architecture.
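To give a feel for what converting wrangling logic into SQL means, here is a toy sketch that turns a small list of recipe-like steps into a single SELECT statement. The step format (`op`, `columns`, `condition`) and the function name are hypothetical and far simpler than a real recipe. The SQL is built by plain string formatting for readability only, so it should not be used with untrusted input.

```python
def recipe_to_sql(table, steps):
    """Translate a list of simple steps into one SQL SELECT statement."""
    columns = "*"
    conditions = []
    for step in steps:
        if step["op"] == "keep_columns":
            columns = ", ".join(step["columns"])
        elif step["op"] == "filter":
            conditions.append(step["condition"])
        else:
            raise ValueError(f"unsupported step: {step['op']}")
    sql = f"SELECT {columns} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    return sql


# Hypothetical table and steps, for illustration only.
steps = [
    {"op": "filter", "condition": "state = 'CA'"},
    {"op": "keep_columns", "columns": ["order_id", "total"]},
]
sql = recipe_to_sql("orders", steps)
print(sql)
```

Because the whole recipe becomes one statement, the warehouse can execute it in place and the rows never have to leave it.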

Read our technical blog post to learn more.

Keep your data fresh and updated with ‘Dataset Schema Refresh’

Our newest capability allows you to refresh dataset metadata with the latest source schema. A schema is a skeleton structure representing the logical view of a dataset. This dataset can be a file, a table, or a SQL query in a database. Schemas may apply to relational tables and schematized file formats such as Parquet and Avro.

Dataset schemas often change at different rates, and often columns could be added or removed from your dataset. This could result in incorrect results as headers may not align with the new underlying data. To overcome this undesired outcome, users needed to import new copies of the dataset to keep it in sync. If the same dataset is being used in several flows, users needed to manually replace the dataset on each flow to sync with the new version. This can be highly error-prone.

Now, “Dataset Schema Refresh” enables on-demand updating of your imported dataset schemas to capture changes to columns. This helps reduce the number of duplicate or invalid datasets that are created from the same source. Additionally, this helps reduce the challenges of replacing datasets and retaking samples with changes to dataset schemas. You can initiate multiple refreshes of different datasets concurrently, helping you with increased efficiency. This capability applies to relational schemas, schema files, and delimited files.

Increased flexibility with data transformations using JavaScript User-Defined Functions

Today, Designer Cloud supports a wide range of functions to transform your data easily and efficiently. However, some use cases may need custom functions. To support these use cases, you can now create custom transformations using JavaScript User-Defined Functions (UDFs). These are also referred to as custom user-defined functions. These functions can be imported into Designer Cloud to use in your recipes.

Once a JavaScript UDF has been created and added by one user, it becomes searchable and available for all users within the familiar Designer Cloud Transformer interface. You can also combine the UDF with other generally available functions in the Transformer formula builder for further expansion.

With the 9.0 release, JavaScript UDFs are available in private preview. To enable this capability in your environment, please email support@trifacta.com with details of your use case. We look forward to working with you and enabling your specific use cases.

REST API Connectivity

We now support REST API connections on Designer Cloud that provide a generic interface to relational data. REST APIs have gained popularity as they provide a flexible, lightweight way to integrate applications, and have emerged as a common method to connect endpoints in different architectures.

Here is a high-level refresher on what APIs and REST APIs are. APIs or Application Programming Interfaces are sets of rules that define how applications can connect to, and communicate with each other. REST APIs communicate via HTTP requests to perform standard database functions such as creating, reading, updating, and deleting records within a resource. Using REST API connections within Designer Cloud, you can now create connections to individual endpoints across hundreds of REST-based applications. This is an import-only connection type. REST API connectivity is available in private preview with the 9.0 release.
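The mapping between database-style operations and HTTP requests can be sketched in a few lines. The example below uses Python's standard `urllib.request` module to build (but not send) a request for each operation. The URL is a hypothetical placeholder, and mapping "update" to PUT is a common convention rather than a rule, since some APIs use PATCH instead.

```python
import json
import urllib.request

# Conventional mapping of create/read/update/delete to HTTP methods.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(action, url, body=None):
    """Build an HTTP request object for a CRUD action without sending it."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        url, data=data, headers=headers, method=CRUD_TO_HTTP[action]
    )


# Hypothetical endpoint, for illustration only.
read_req = build_request("read", "https://api.example.com/v1/tickets")
create_req = build_request(
    "create", "https://api.example.com/v1/tickets", {"subject": "Hello"}
)
print(read_req.get_method(), read_req.full_url)
print(create_req.get_method(), create_req.full_url)
```

An import-only connection, as described above, would only ever issue the "read" style of request.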

Click here for more information.

Easy resolution for missing or invalid connections

When a flow containing pre or post-SQL scripts does not have a valid connection associated with it, the scripts will fail. This could happen due to moving flows from one environment to another where connections might not exist. It could also be due to users importing flows containing pre or post-SQL publishing actions using private connections that have not been shared with them.

We now provide users the ability to resolve missing or invalid connections for pre or post-run SQL scripts by alerting the users on these invalid connections on the flow view canvas as well as on the side panel. When the publishing action is opened, you will see that the current connection is invalid and you will have the option to choose another valid database connection that you have access to.

New connectors with Designer Cloud 9.0

We continue our journey to help you connect to any data source, enabling additional use cases. With Designer Cloud 9.0, we support the following new connectors:

Zendesk: A service-first CRM company that builds software designed to improve customer relationships.
LinkedIn Ads: A paid marketing tool that offers access to LinkedIn social feed through sponsored posts.

You can learn all about our connectivity updates here.

With our recent acquisition by Alteryx, we are forging ahead in our cloud journey for the best of data engineering and analytics. If you have not done so already, sign up for a free trial today and join us on this exciting ride. Onwards and upwards!

ETL Developer: Key Role in Determining and Supporting Data Systems and Data Storage

2022-12-28T19:34:55Z

Who Is an ETL Developer?

An ETL Developer is an IT specialist, well-versed in software engineering and database development, who designs, develops, automates, and supports complex applications to extract, transform, and load data. ETL stands for “extract, transform, load.” It refers to the 3-step process of preparing raw data so that data analysts and data scientists can use it to gain actionable insights about the business.

Step 1: Extract

Organizations generate massive volumes of data. This data may be stored across multiple systems and in a wide range of different formats. Data must be extracted from cloud environments, CRMs, or other external systems before it can be used in applications or for analytics or machine learning.

Step 2: Transform

After data is extracted and collected, it’s in a raw state and needs work to make it compatible with defined standards. Transforming data can involve:

Cleansing: removing inconsistencies and missing values
Standardizing: bringing datasets into a required format
Deduplicating: excluding duplicate or redundant data
Verifying: removing data that can’t be used and marking aberrations
Sorting: organizing data by type
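A few of these transformation types can be sketched on a small in-memory dataset. The example below is a minimal illustration covering cleansing, standardizing, deduplicating, and sorting. The field names (`id`, `state`) and the specific rules are assumptions chosen for the example, not a prescribed method.

```python
def transform(rows):
    """Cleanse, standardize, deduplicate, and sort a list of row dicts."""
    standardized = []
    for row in rows:
        # Standardizing: trim whitespace and upper-case the state code.
        state = (row.get("state") or "").strip().upper()
        # Cleansing: drop rows whose state value is missing or blank.
        if not state:
            continue
        standardized.append({"id": row["id"], "state": state})

    # Deduplicating: keep the first occurrence of each (id, state) pair.
    seen = set()
    unique = []
    for row in standardized:
        key = (row["id"], row["state"])
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Sorting: order the remaining rows by id.
    return sorted(unique, key=lambda r: r["id"])


raw_rows = [
    {"id": 2, "state": " ca "},
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "CA"},
    {"id": 3, "state": None},
]
clean_rows = transform(raw_rows)
print(clean_rows)
```

The two rows for id 2 collapse into one only because standardizing runs before deduplicating, which is why the order of transformation steps matters.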

Step 3: Load

The final step is to load transformed data into data storage, such as a data warehouse, cloud data warehouse, cloud data lake, or data lakehouse, or into external systems or applications. These systems include automated tools to make data accessible for users, such as business intelligence tools for visualizing and reporting on data.

What Are the Responsibilities of an ETL Developer?

ETL Developers must have a big-picture view of their organization’s data needs and environment and are responsible for a wide range of duties and tasks.

Determining Data Storage and Management Needs

ETL Developers figure out the exact storage needs of the organizations they work for. ETL Developers need a clear, detailed picture of their organization’s current and future data architecture, environment, and needs.

Designing and Building Data Storage and Management Systems

ETL Developers design systems, such as cloud data warehouses, cloud data lakes, or lakehouses, to address their organizations’ data needs and work with development teams to build them.

Building Data Pipelines

ETL Developers create and manage data pipelines—that is, reliable tools and processes that deliver data to end users—to connect to data in different formats and move it between systems.

Extracting, Transforming, and Loading of Data

When building data pipelines, the goal of an ETL Developer is to extract data, prepare it, and move it —in full loads and/or incremental data loads— from a source file into a destination, such as a cloud data warehouse, cloud data lake, data lakehouse, or external application.
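The difference between a full load and an incremental load can be shown with a short sketch. It assumes each source row carries an `updated_at` value and that the pipeline remembers a "watermark", the latest value it has already loaded. These names are hypothetical, and real pipelines often track changes differently.

```python
from datetime import date


def full_load(source_rows):
    """Replace the destination with a complete copy of the source."""
    return list(source_rows)


def incremental_load(source_rows, destination, watermark):
    """Append only rows newer than the watermark; return the new watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    destination.extend(new_rows)
    return max((r["updated_at"] for r in new_rows), default=watermark)


source = [
    {"id": 1, "updated_at": date(2022, 1, 1)},
    {"id": 2, "updated_at": date(2022, 1, 2)},
]

# Initial full load, then remember the latest timestamp seen.
destination = full_load(source)
watermark = max(r["updated_at"] for r in destination)

# A new row arrives in the source; only that row is moved.
source.append({"id": 3, "updated_at": date(2022, 1, 3)})
watermark = incremental_load(source, destination, watermark)
print([r["id"] for r in destination], watermark)
```

Running the incremental load again with no new source rows moves nothing and leaves the watermark unchanged.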

Testing and Troubleshooting

ETL Developers perform quality assurance tests to make sure their systems and pipelines are stable and run smoothly. ETL Developers also identify and resolve system problems that may arise within the warehousing system.

How Does Trifacta Help ETL Developers?

Trifacta significantly reduces the time, technical skills, and costs required for ETL Developers to access any type of data, wherever it resides, and automate the process of transforming data and building data pipelines.

The Trifacta Data Engineering Cloud helps ETL Developers transform data, ensure quality, and automate data pipelines, making data consumable at any scale. This intelligent, collaborative, self-service data engineering cloud platform helps ETL Developers:

Connect to data from any source. With universal data connectivity and a self-service architecture, Trifacta makes it fast and easy for ETL Developers to connect data from any source. This makes it easier for ETL Developers to support a wider range of data integration use cases and applications.

Transform raw data into ready-to-use data. ETL Developers can use Trifacta’s visual interface and predictive data transformation suggestions to greatly reduce the time it takes to detect and resolve complex data patterns and transform them into consumable data across the organization.

Create real-time previews of transformed data. Trifacta presents automated, visual, and interactive representations of data. ETL Developers can use these previews to explore data more deeply and understand it at its most granular level. Outliers in the data can be automatically identified and flagged for follow-up, helping ETL Developers easily eliminate bad data.

Build, automate, and deploy data pipelines. With just a few clicks, Trifacta helps ETL Developers build automated data pipelines at scale. With Trifacta, ETL Developers can deploy and manage self-service data pipelines in minutes, not months.

Interested in learning how Trifacta can help your ETL Developers reduce the time, technical skills, and costs required to transform data and build data pipelines? Schedule a demo of Trifacta today.

Faster Data Processing with Designer Cloud’s In-Memory Engine

2022-12-14T04:15:34Z

Data transformation has always been a tedious task for data processing engines. Distributed data processing engines can be very handy for processing very large datasets, but they are not competitive with a single process running on a single machine when the data fits in memory. Processing these small to medium-sized datasets has been a problem for well-known distributed data processing engines because of their initialization overhead. With this in mind, Photon aims to give users a data transformation experience that is intelligent, productive, fast, and efficient when working on small to medium-sized datasets. First introduced at Strata in 2016, Photon has since gained new developments that create an enhanced experience for users.

Photon is an in-memory, batch data processing engine, designed to be fast and efficient for small to medium-sized datasets due to minimal initialization overhead. When you build your recipe in the Designer Cloud, you can see the effects of the transformations that you are creating in real time. When you wish to produce result sets of these transformations, you must run a job, which performs a separate set of execution steps on the data. Photon snaps into this Intelligent Execution architecture of Designer Cloud to run side-by-side with more resource-intensive distributed computing frameworks like Apache Spark and Google Cloud Dataflow that Designer Cloud supports for big data processing.

Why do we need another execution engine?

We set two goals while designing Photon: first, to provide real-time feedback to users as they transform their sample data in the browser, and second, to create a fast and efficient environment for job execution on the complete dataset. As already mentioned, Designer Cloud leverages Google Dataflow and Apache Spark to process very large datasets efficiently in a distributed manner. For small to medium-sized data, Photon’s single-node, in-memory architecture significantly reduces the overhead during initialization and makes it the optimal choice, allowing us to provide our users with reduced job execution times and costs. In our internal testing, Photon jobs were 85-95% faster than Google Dataflow jobs. This lightweight design also allows us to embed Photon directly in the browser and power Designer Cloud’s real-time transformation UI, which many of our customers love.

How does Photon work?

Photon is Trifacta’s built-in, interactive data processing engine that runs in the web browser, providing users with real-time transformations for their datasets.

Photon takes the data transformation steps (also known as the Designer Cloud recipe) and converts them into a Protobuf representation. Protobuf is Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. Photon then interprets the different transforms and prepares an execution graph in which each node represents a transform to be applied to the data.

The data is then sent to each node in the form of multiple row batches (each a contiguous chunk of data), and the transform is applied to each batch. Execution proceeds in parallel as each node feeds its output batches to the next node.
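The batch-by-batch flow through a graph of transforms can be approximated with Python generators. The sketch below is a simplified, single-threaded stand-in: it models only a linear chain of nodes and passes batches lazily from one node to the next, rather than running nodes truly in parallel. All function names are hypothetical and unrelated to Photon's actual code.

```python
def row_batches(rows, batch_size):
    """Split the rows into contiguous batches."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def make_node(transform):
    """Wrap a per-row transform as a node that consumes and emits batches."""
    def node(upstream):
        for batch in upstream:
            yield [transform(row) for row in batch]
    return node


def run_graph(rows, transforms, batch_size=2):
    """Chain one node per transform and pull every batch through the chain."""
    stream = row_batches(rows, batch_size)
    for transform in transforms:
        stream = make_node(transform)(stream)
    return [row for batch in stream for row in batch]


result = run_graph(
    [1, 2, 3, 4, 5],
    transforms=[lambda x: x + 1, lambda x: x * 10],
)
print(result)
```

Because generators are lazy, the first batch passes through every node before the second batch is even read, which is the pipelining idea behind feeding each node's output to the next.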

Photon leverages the above mechanism in two ways: while designing the recipe step by step, and while running the entire recipe on a complete dataset.

While designing the recipe, Photon is built as a JavaScript module with the help of the Emscripten toolchain and interacts with the UI. An individual recipe step is sent to Photon, which checks whether a corresponding result table is present among the previously computed results in the “Photon cache” to avoid unnecessary computation. If not, it executes the recipe on the data shown in the UI, stores the results in the cache, and returns the results to the UI.
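The check-the-cache-first behavior is a form of memoization, which can be sketched as follows. This is only an illustration of the general pattern: the `StepCache` class, its key format (the tuple of recipe steps applied so far), and the assumption that the sample data stays fixed are all hypothetical and do not describe Photon's internals.

```python
class StepCache:
    """Cache result tables keyed by the recipe steps that produced them."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # counts how often a step was actually computed

    def run_step(self, steps, execute):
        """Return the cached result for these steps, computing it if needed."""
        key = tuple(steps)
        if key in self._results:
            return self._results[key]
        self.executions += 1
        result = execute()
        self._results[key] = result
        return result


sample = ["ca", "ny"]
cache = StepCache()
steps = ("uppercase state",)

first = cache.run_step(steps, lambda: [s.upper() for s in sample])
second = cache.run_step(steps, lambda: [s.upper() for s in sample])
print(first, cache.executions)
```

The second call returns the stored table without recomputing it, which is what keeps the UI responsive when a user revisits a step.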

Photon can also run transformations on the whole dataset when it is chosen as an execution engine within Designer Cloud.

This uses a fully managed and scalable infrastructure that Designer Cloud manages behind the scenes. Since Photon can be run as a standalone executable during job execution, it is easily containerized. This allows us to support Photon job execution directly in the user’s VPC, making job execution faster and more secure by bringing the execution engine to where the data resides.

In summary, Photon is ideal for processing small and medium datasets with a faster and more efficient architecture, overcoming the execution overhead that is typically observed in many mainstream processing engines. Users get real-time data transformation, which makes Photon easy to use with Designer Cloud’s intuitive interface.

We would love for you to give it a spin today. Sign up for our free trial and experience the magic of Photon from Designer Cloud.