SAS Users

From question to clarity: how SAS Viya Copilot changes the way we work with data

Sasha Karpinski — Wed, 01 Apr 2026 14:04:45 +0000

Most analytical journeys start the same way – with a question.

When did we have the highest profit?
Which customers are driving growth?
What segment should we look at next?

In traditional analytic workflows, turning those questions into answers often requires navigating menus, configuring visuals, writing calculations and interpreting results before a story emerges.

In the last blog post in this series, we explored how SAS® Viya® Copilot changes the traditional analytical experience inside SAS® Visual Analytics.

Embedded directly in SAS Visual Analytics, Copilot introduces a conversational AI assistant that helps users explore data, create reports, and interpret insights through natural language. By combining generative AI with the analytical capabilities of the SAS Viya platform, Copilot transforms business questions into analytical actions, recommendations, and explanations – creating a more fluid analytics experience.

Let's explore how Viya Copilot can help you move from question to clarity faster than ever.

Analytics that meet you where you are

Analytics doesn’t happen in a single step. It unfolds across stages:

Exploring and shaping data.
Building visualizations and reports.
Interpreting insights.

SAS Viya Copilot supports users throughout this journey – users simply need to ask.

A more natural way to work with data

Understanding the data is at the center of answering any business question. For many users, this early stage of analysis can take time – especially when working with unfamiliar datasets.

Within SAS Visual Analytics, Copilot simplifies this process by acting as a guide to the data – helping users explore, understand and prepare their data for further analysis.

To make the data fit-for-purpose, Copilot can change a data item’s properties – including the label, format and aggregation.
Copilot can also assist with common data transformations, including creating groups or bins from categories, defining hierarchies, and preparing data items for time-based or geospatial analysis.
Preparing data often requires creating entirely new calculations to answer a particular question. Instead of manually writing expressions to determine growth rate or profit margin, Copilot can generate new calculations from simple natural language.

This conversational interface lowers barriers for business users while accelerating exploration for experienced analysts.

From data to visuals – faster

Designing dashboards and reports often involves a series of manual steps – selecting visuals, assigning data roles, adjusting formatting, and refining layouts. Copilot helps streamline the process, enabling users to go from raw data to meaningful visual stories faster.

Copilot can generate new report pages from scratch, automatically creating relevant visuals such as charts and tables, along with other content like titles, headers and footers, based on your business question or area of analysis.
Designing dashboards is an iterative process – Copilot can refine existing reports by adding new visuals to the report page, replacing or changing existing visuals to show data in a different way, or deleting visuals or other content that are no longer relevant.
Copilot can focus insights by applying filters, ranks, display rules or interactions to existing visuals, helping narrow attention to the most relevant information.

The goal is to remove friction from the workflow so users can work more efficiently. By automating many of the tasks involved in report building, Copilot enables designers to focus their time on communicating the story behind the data.

Insights only a question away

Reports and dashboards provide insights – and often lead to brand new questions. The conversational nature of Viya Copilot allows users to ask questions about their data and quickly receive tailored responses in real time.

Copilot can create ad-hoc visualizations that help answer a user’s question. The categories, measures, and any additional logic – such as applied filters or ranks – are visible to the user, ensuring transparency into how the insight was produced.
New questions often emerge mid-analysis. Copilot retains the conversational context from earlier interactions to handle follow-up questions seamlessly.
Copilot can summarize visualizations, highlight key trends, and generate visual explanations that help users better understand relationships between important variables.

In this way, curiosity becomes part of the workflow. Questions surface and can be explored immediately with Copilot acting as a collaborative AI assistant.

A faster path from curiosity to clarity

SAS Visual Analytics has long included augmented analytics capabilities that automatically surface insights, highlight drivers, and explain outcomes. Viya Copilot builds on that foundation by adding generative AI to the experience.

With Copilot, analytics becomes conversational. This creates a more collaborative relationship between the user and the analytics platform – one where the system actively helps users navigate their analytical journey.

In the next blog post, we’ll focus on the Copilot capabilities available outside of the chat pane – embedding GenAI experiences directly into the user workflow for targeted assistance.

Learn more about how SAS Viya Copilot helps organizations turn analytical power into sustained decision momentum.

From question to clarity: how SAS Viya Copilot changes the way we work with data was published on SAS Users.

SAS Innovate 2026 and you: the SAS user

Chris Hemedinger — Mon, 02 Mar 2026 22:01:26 +0000

If you're a SAS user who has not been to a SAS conference in a while, I want you to know that SAS Innovate 2026 was designed with you in mind.

Whether your day‑to‑day work happens in SAS 9, SAS Viya, or a thoughtful combination of both, SAS Innovate brings together the people, ideas, and practical guidance that help you get more value from SAS today -- while preparing for what's next.

This year's event doubles down on what SAS users have always valued most: learning from each other, sharing real‑world experience, and finding practical ways to work smarter with the tools you already rely on.

SAS Users Day: Powered by the community

One of the highlights of SAS Innovate 2026 is SAS Users Day -- an afternoon focused completely on SAS users as presenters and audience.

SAS Users Day celebrates the way SAS professionals learn best: through user groups, communities, and peer‑to‑peer knowledge sharing.

If you've ever picked up a tip at a local user group meeting, learned something valuable from a SAS Communities post, or borrowed code from a fellow user, SAS Users Day will feel like home. SAS Users Day takes place on Monday afternoon and features experienced conference presenters, ending with a bit of social time for networking with your fellow users.

Expect:

Sessions led by active SAS user group members
Stories that reflect real constraints, real deadlines, and real wins
Plenty of opportunities to connect with people who speak your SAS language

It's a reminder that while software matters, the SAS user community is one of the platform's greatest strengths.

SAS 9: Productive today, Preparing for tomorrow

"Cooking with SAS 9" - throwback to 2003

If SAS 9 is still a big part of how you get work done, this SAS Innovate 2026 agenda is absolutely worth your time. Think of it as a chance to swap notes with people who've been there—sharing practical tips, hard‑won lessons, and smart ways to keep SAS 9 running strong while figuring out what comes next.

Presenters in the SAS 9 track include current SAS users, as well as developers in R&D's SAS 9 division who support the platform. What you'll learn:

How other teams are keeping SAS 9 fast, stable, and reliable for real production workloads
Simple, proven ways to clean up, manage, and future‑proof long‑lived SAS code
Where SAS 9 still shines -- and where it makes sense to complement it with newer tools (including AI!)
Practical patterns for integrating SAS 9 with cloud services, APIs, and open source
Real stories from customers who are successfully running SAS 9 today (and planning tomorrow)

SAS Innovate works because of SAS users -- and it's built to work for you. Register now, and we'll see you there!

SAS Innovate 2026 and you: the SAS user was published on SAS Users.

Don't crash this ship! DuckDB is heading straight for Iceberg!

Joe Cabral — Mon, 23 Feb 2026 20:37:10 +0000

Since its inception, DuckDB has been commanding respect in the data management sphere, carving its place as a highly performant data processing system. At SAS, the rapid advancements DuckDB has made have not gone unnoticed; that's why, in the 2025.07 release of SAS Viya, we introduced SAS/ACCESS Interface to DuckDB. Over the last seven months, it has brought enhancements to the SAS suite that we're proud of, improving our compatibility and extensibility with open file formats like Parquet, regardless of your choice of data storage. But the job is far from done. In fact, it won't end until we've quacked our last... uh... if quack is the verb... what's the noun?

Introducing Iceberg support

Today I'd like to share a very exciting development to the SAS/ACCESS Interface: enhanced Iceberg support! As the mountain of data organizations face continues to grow exponentially, it's no surprise that cost-efficient data storage solutions have become a major focus. This focus is two-fold: where do we store our data, for one, and equally important, how do we store our data?

Open file formats like Parquet and Avro are excellent solutions for the latter question, but depending on your choice of storage solution, you might still experience high latency, poor performance, and dangerously high storage costs. Your storage strategy, of course, is likely stratified based on data access frequency and purpose, so it's unlikely to see an organization's entire corpus of data sitting in the same solution. Nonetheless, wherever you keep your data, you'll want your storage affordable and your access efficient. DuckDB and Iceberg, used in tandem, can provide an easy, readable, and efficient methodology for accessing, querying, and even editing data while stored in cost-conscious locations. And now, as of SAS Viya's 2026.01 release, you can wield all that power yourself, from the comfort of your own LIBNAME.

Let's dig in on the what, where, and how!

Where's on First

To be clear, this demonstration is surely not the only way to combine SAS, DuckDB, and Iceberg, but I've found it particularly easy, both as an administrator setting it up, and as a user leveraging it. From an admin perspective, the location we'll be keeping our data in today is AWS's S3 - but not the traditional buckets for object storage. Instead, we'll be using S3 Table Buckets, which use Apache Iceberg to manage tables as objects, rather than just files. This is supremely important when working with open file formats like Parquet, because the number one detractor from Parquet is the computational difficulty in editing it. Parquet is highly effective for querying thanks to its columnar nature and paginated metadata, but it requires full file re-writes in order to effect change. Iceberg, as a table format, mitigates this by using metadata files to manage versioning (among many other benefits) of the data they govern. As the top layer, S3 Table Buckets abstract all of these mechanisms away from the user, so all we see are query-ready tables in a conventional database structure.
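To make the versioning idea concrete, here is a deliberately simplified toy model in Python. It is not Iceberg's actual file layout or API, and every name in it (ToyTable, Snapshot, part-0.parquet) is made up for illustration. It only sketches the principle described above: data files are never rewritten, and each change is recorded as a new snapshot in metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One table version: the immutable data files it references, plus the
    (file name, row index) positions that are logically deleted."""
    snapshot_id: int
    data_files: tuple
    deleted: frozenset = frozenset()


@dataclass
class ToyTable:
    files: dict = field(default_factory=dict)      # file name -> tuple of rows, never rewritten
    snapshots: list = field(default_factory=list)  # version history, oldest first

    def append(self, name, rows):
        """Add a new immutable file and commit a snapshot that references it."""
        self.files[name] = tuple(rows)
        prev = self.snapshots[-1] if self.snapshots else Snapshot(0, ())
        self.snapshots.append(
            Snapshot(prev.snapshot_id + 1, prev.data_files + (name,), prev.deleted)
        )

    def delete_where(self, predicate):
        """Commit a snapshot that marks matching rows as deleted; no file is touched."""
        prev = self.snapshots[-1]
        marks = {
            (name, i)
            for name in prev.data_files
            for i, row in enumerate(self.files[name])
            if predicate(row)
        }
        self.snapshots.append(
            Snapshot(prev.snapshot_id + 1, prev.data_files, prev.deleted | marks)
        )

    def scan(self, snapshot_id=None):
        """Read the table as of a given snapshot (default: the latest one)."""
        snap = self.snapshots[-1] if snapshot_id is None else next(
            s for s in self.snapshots if s.snapshot_id == snapshot_id
        )
        return [
            row
            for name in snap.data_files
            for i, row in enumerate(self.files[name])
            if (name, i) not in snap.deleted
        ]


table = ToyTable()
table.append("part-0.parquet", [("1992-12-30", 10.0), ("1993-06-01", 12.5)])
table.append("part-1.parquet", [("1992-01-15", 7.0)])
table.delete_where(lambda row: row[0] < "1993-01-01")

current_rows = table.scan()                     # latest version, after the delete
rows_before_delete = table.scan(snapshot_id=2)  # the earlier version is still readable
```

The delete only adds metadata, which is why the older snapshot can still be read afterwards. Real Iceberg tables track this through manifest and delete files rather than an in-memory set, so treat this as an intuition aid only.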

How's on Second

By this point, the how is probably pretty clear: the joint operation of the SAS/ACCESS Interface to DuckDB. SAS/ACCESS Interfaces have long been used to connect to remote data sources in SAS 9 and Viya alike. But the DuckDB Interface brings a slightly different modus operandi; rather than being specifically designed to connect to a SINGLE type of data source (e.g. ACCESS to Snowflake, Databricks, etc.), the DuckDB ACCESS Interface can explicitly extend to ANY cloud storage supported by DuckDB's vast extension mechanism. This extension mechanism is exactly what we'll be leveraging behind the scenes to establish a connection to Iceberg Tables, in this case those located on S3 Table Buckets.

To connect to Iceberg tables in an S3 Table Bucket, all we need is a LIBNAME statement, just as we would connect to any other remote data source. Here, duckdb is the engine name, and bowser is the library name:

libname bowser duckdb file_type=iceberg
  iceberg_catalog='arn:aws:s3tables:[region]::bucket/[bucket-name]'
  iceberg_endpoint_type=s3_tables
  s3_access_key=&s3key
  s3_secret=&s3secret
  s3_region='[region]'
  schema='[schema-name]';

Here's what you need to make this work:

Find the Amazon Resource Name (ARN) of your Table Bucket. This value (in quotes) will be the iceberg_catalog option.
Declare the file_type option to be iceberg and the iceberg_endpoint_type to be s3_tables. Note that while you can write a large portion of SAS code without any regard to capitalization, the values for these options MUST be lowercase.
Define your s3_access_key and s3_secret. For general best practice, I used macro variables that I declared elsewhere (in single quotes). These credentials should be tied to a user in your AWS account that has read & write access to the source Table Bucket.
Define the s3_region of your bucket. It's always good practice to keep your cloud resources in the same region when possible, as cross-regional data movement introduces extra latency.

The last thing you'll need is the schema value. When looking inside your Table Bucket from the S3 GUI, you'll notice each table has an associated namespace. This value is the schema in the fully qualified table location - so any given table in the bucket could be accessed explicitly at: iceberg_catalog...

Once you have all this information, you're ready to test out your new DuckDB-powered Iceberg-backed LIBNAME! You'll notice that just like a traditional Library, your member tables are all present on the left-hand side of the GUI, as seen below. This presence is a slight distinction from some other SAS/ACCESS to DuckDB use cases, where connections to external cloud storage defined within explicit single-transaction PROC SQL / CONNECT statements would, by design, not be considered member tables. I consider the LIBNAME approach a best practice regardless of the data storage to which you’re connecting. Learn more about the options in the LIBNAME statement here.
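
If you generate LIBNAME statements for several buckets, a small templating helper can catch the two easy mistakes mentioned above before the code ever reaches SAS. The Python sketch below is only an illustration: render_libname and the sample region, bucket, and schema values are hypothetical, and the statement text simply mirrors the shape of the LIBNAME shown in this post rather than any official template. The libref check relies on the standard SAS rule that a libref is at most 8 characters, starts with a letter or underscore, and contains only letters, digits, and underscores.

```python
import re


def render_libname(libref, region, bucket_name, schema,
                   file_type="iceberg", endpoint_type="s3_tables"):
    """Return LIBNAME text in the same shape as the statement shown in this post."""
    # SAS librefs: 1-8 characters, letter or underscore first, then letters/digits/underscores.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]{0,7}", libref):
        raise ValueError(f"invalid libref: {libref!r}")
    # The post notes that these two option values must be lowercase.
    for label, value in (("file_type", file_type),
                         ("iceberg_endpoint_type", endpoint_type)):
        if value != value.lower():
            raise ValueError(f"{label} value must be lowercase, got {value!r}")
    # ARN pattern copied from the post's example; adjust it to match your own bucket's ARN.
    arn = f"arn:aws:s3tables:{region}::bucket/{bucket_name}"
    return "\n".join([
        f"libname {libref} duckdb file_type={file_type}",
        f"  iceberg_catalog='{arn}'",
        f"  iceberg_endpoint_type={endpoint_type}",
        "  s3_access_key=&s3key",   # credentials stay in macro variables, as in the post
        "  s3_secret=&s3secret",
        f"  s3_region='{region}'",
        f"  schema='{schema}';",
    ])


# Hypothetical sample values, for illustration only.
statement = render_libname("bowser", "us-east-1", "my-table-bucket", "main")
print(statement)
```

Keeping the credentials as macro variable references means the helper never handles secrets itself.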

What's on Third

What I've done to the classic Abbott & Costello skit at this point is unforgivable, but I'm in too deep to stop now. Let's get some SAS code on the board and showcase the ease and speed that SAS/ACCESS to DuckDB can bring to your Iceberg tables. Simplest behaviors first: how do we read Iceberg tables?

PROC SQL outobs=10;
  SELECT * from BOWSER.target;
QUIT;

In the above snippet, we're simply asking DuckDB to return the top 10 rows of the target table within the BOWSER library. It looks no different than any other PROC SQL implicit read. Typically, this Bowser target would be named Mario, but in this case, it's taxi data:

When we were defining the LIBNAME, I mentioned that the S3 Tables each had fully qualified names. This is useful here, because many programmers, including myself, like to use database-specific flavors of SQL. If you're a big fan of DuckDB's SQL flavor syntax, then explicit passthrough in PROC SQL is for you:

PROC SQL;
  CONNECT USING BOWSER;
  SELECT * FROM CONNECTION TO BOWSER (
    SELECT * FROM iceberg_catalog.main.target LIMIT 10
  );
QUIT;

Using the full name iceberg_catalog.main.target, we can use DuckDB-specific SQL on our S3 Table. Given the breadth of DuckDB's SQL capabilities, this serves as a powerful connection without too much of a change to the PROC SQL wrapping syntax. Note that in both the implicit and the explicit passthrough scenarios, we are required to pay attention to capitalization for the bucket-specific resources. It doesn’t matter whether the LIBNAME is capitalized, but the case of the schema and table names must match their definition in the bucket.

Thanks to advancements in DuckDB itself at the tail end of 2025, SAS/ACCESS to DuckDB can not only read from Iceberg tables, but also write!

PROC SQL;
  CREATE TABLE BOWSER.line_items AS
    SELECT * FROM WORK.line_items;
QUIT;

Once again, there's no deviation from the standards of PROC SQL in table creation. This exact code might be useful if we wanted to transfer a SAS table (the above being from the TPC-H datasets) into an Iceberg Table in S3 as part of our storage strategy. In our S3 Table Bucket, we can validate that the new line_items table has been created:

Maybe we want to perform some changes to this table down the line. If this table were stored simply as a Parquet file in a bucket, this would be a computationally expensive hassle, but thanks again to Iceberg, edits on these table objects are easy!

/* How many records do we have to start? A: 6001215 */
PROC SQL;
  SELECT COUNT(*) FROM BOWSER.line_items;
QUIT;

/* Delete all the records with ship-date before 1993.
   Note: <= also removes rows dated exactly 01JAN1993. */
PROC SQL;
  DELETE FROM BOWSER.line_items
    WHERE L_SHIPDAT <= '01JAN1993'd;
QUIT;

/* Validation: How many records now? A: 5242432 */
PROC SQL;
  SELECT COUNT(*) FROM BOWSER.line_items;
QUIT;

In just 8 seconds, DuckDB executed the removal of all records with a shipping date before 1993 from the table. Similarly, we can add or update records with ease. Do note the intricacies of versioning and deletions within Iceberg that allow for such efficient editing of tables backed in immutable formats. I consider it important to understand not just the fact that DuckDB and Iceberg together add value, but why they do so behind the scenes.
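
For a sense of scale, the two row counts and the 8-second runtime quoted above can be turned into a few derived figures. This is just arithmetic on the numbers from this post, sketched in Python as a neutral scratchpad; the variable names are my own.

```python
# Figures quoted in the post.
rows_before = 6_001_215
rows_after = 5_242_432
elapsed_seconds = 8

rows_deleted = rows_before - rows_after
share_deleted = rows_deleted / rows_before           # fraction of the table removed
rows_per_second = rows_deleted / elapsed_seconds     # rough delete throughput

print(f"rows deleted:      {rows_deleted:,}")
print(f"share of table:    {share_deleted:.1%}")
print(f"approx. rows/sec:  {rows_per_second:,.0f}")
```

The throughput figure is only indicative, since the 8 seconds includes connection and commit overhead rather than pure delete time.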

The one caveat before we check into the performance of SAS/ACCESS to DuckDB + Iceberg is the current limitations on the editing side. With real-world data, adding, updating, and deleting records aren't the only things that happen. Oftentimes, we want to mutate the structure of a table itself: add, update, or remove columns. This is currently not supported in DuckDB itself, though DuckDB.org explicitly lists "schema evolution" in their near future planning. In the meantime, I highly recommend reading through the November announcement of write capabilities in DuckDB itself to learn the full extent of its capabilities.

Bringing it Home

I've harped on the value-add of DuckDB & Iceberg from the lens of SAS Viya for three bases now. Let's use a more in-depth query to validate the benefits. We'll use a widely popular (and freely available) dataset: the Yellow Taxi Trip Records courtesy of NYC.gov - specifically a single Iceberg table encompassing data from 2024, titled yellow_tripdata_2024:

The 41 million row dataset occupies 6.27GB sitting as a local .sas7bdat to the Viya environment, backed on a disk with NetApp Files Premium. For this test, we've gone ahead and pre-converted the dataset into an Iceberg table sitting in the S3 Table Bucket, where it occupies 656MB. With both versions of the dataset available, the test consists of a single implicit PROC SQL statement, seen below:

PROC SQL;
  SELECT passenger_count,
         payment_type,
         count(*) AS num_trips,
         avg(trip_distance) AS avg_distance,
         avg(fare_amount) AS avg_fare,
         avg(tip_amount) AS avg_tip
    FROM [library].yellow_tripdata_2024
    WHERE passenger_count is not NULL
      AND passenger_count > 0
      AND passenger_count < 5
      AND trip_distance < 100
      AND trip_distance > 0
    GROUP BY passenger_count, payment_type
    ORDER BY payment_type, passenger_count;
QUIT;

This query serves as a solid litmus test for overall engine performance, leveraging both CPU & I/O through its calculated columns, filtration, and ordering clause. We first ran it using a standard SAS Library connection to the data on the disk; then, we used the DuckDB Library to execute the query against the Iceberg table on the S3 Table Bucket. The test found a sizable improvement in both real-time and CPU-time performance by leveraging the S3 Table Bucket and querying it with the DuckDB engine:

Engine + Storage                               Real Time (s)   CPU Time (s)
Standard Engine + Azure NetApp Files Premium   25.82           18.96
DuckDB + Iceberg on S3 Tables Bucket           3.05            4.60
Efficiency Boost by DuckDB                     8.46x           4.12x
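
The "Efficiency Boost" row is simply the ratio of the two timings, and the same kind of ratio can be taken for the on-disk sizes quoted earlier (6.27GB as .sas7bdat versus 656MB as an Iceberg table). The Python sketch below reproduces that arithmetic from the numbers in this post. The dictionary names are my own, and the size ratio assumes decimal units (1 GB = 1000 MB), which the post does not specify.

```python
# Timings from the table above, in seconds.
baseline = {"real_s": 25.82, "cpu_s": 18.96}        # standard engine, data on disk
duckdb_iceberg = {"real_s": 3.05, "cpu_s": 4.60}    # DuckDB engine, Iceberg on S3 Tables

# Speedup = baseline time / DuckDB time, per metric.
speedup = {metric: baseline[metric] / duckdb_iceberg[metric] for metric in baseline}

# Storage footprint ratio, assuming decimal units (1 GB = 1000 MB).
sas7bdat_mb = 6.27 * 1000
iceberg_mb = 656
size_ratio = sas7bdat_mb / iceberg_mb

for metric, value in speedup.items():
    print(f"{metric}: {value:.4f}x faster")
print(f"storage footprint: {size_ratio:.2f}x smaller")
```

One detail worth noticing in the raw numbers: for the DuckDB run, CPU time (4.60s) is higher than real time (3.05s), which is what you would expect when work is spread across multiple cores.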

Now, while these results are exciting, I want to point out that not every query, not every table, and certainly not every use case will reap the same benefits. For example, some downstream analytical procedures in the SAS software suite can't be mimicked with a querying engine. But there are tangible, and sometimes massive, performance improvements that SAS/ACCESS to DuckDB can introduce to your pipelines. To boot, the S3 Table Bucket that we used to back this experiment is significantly cheaper than many of the high-performance storage disks. Here's a sample calculation based on current rates and size assumptions:

Migrating this individual workload not only noticeably improved performance, but it reduced the storage cost of the backing data by just about 99%, while also adding in the aforementioned benefits of Iceberg tables like time travel and versioning.

The last 7 months have brought incredible developments to SAS's integration with open file formats, and I'm personally thrilled to both witness and build with the resulting solutions in the SAS software suite. If you’re interested in learning more about the capabilities of SAS/ACCESS to DuckDB, I highly recommend taking the official SAS course: Working with DuckDB in SAS Viya®. It's quick, informative, and shows you a handful of tips, tricks, and best practices for maximizing the value of the engine. As always, thank you for reading, and I’ll see you at home plate.

Don't crash this ship! DuckDB is heading straight for Iceberg! was published on SAS Users.

Engage for Managed Cloud Services: Strategic automation for the future

John Conoley — Fri, 09 Jan 2026 19:37:23 +0000

“Speed it up, lock it down, and cut the clutter. If we’re not moving forward, we’re falling behind.” Some iteration of this message is hard not to feel in today’s world where pressures are rising to innovate and deliver. Fast.

At SAS, our Managed Cloud Services (MCS) division strives to meet these demands head-on through a powerful automation engine: the Engage platform.

Engage is a strategic automation framework that enables SAS to deliver cloud services with speed, precision and scalability. Whether supporting custom enterprise deployments or standardized offerings, Engage helps us deliver value faster and more reliably to our hosted customers.

What is Engage?

Engage is SAS’ internal automation platform designed to streamline service delivery across cloud environments. It powers everything from infrastructure provisioning to life cycle management, enabling self-service workflows and near real-time results. Built on modern cloud principles like Infrastructure as Code (IaC), DevOps, FinOps and security governance, Engage is the backbone of our cloud management strategy.

Within SAS Managed Cloud Services, Engage supports three core service lines:

Managed Cloud Services Enterprise: Tailored infrastructure for high-revenue contracts.

Managed Cloud Services Fleet: Immutable, standardized offerings for scale.

Managed Cloud Services Developer Experience (DevExp): Automation and governance for internal teams.

Let’s explore how each delivers strategic value.

SAS Managed Cloud Services Enterprise: Custom Infrastructure at Scale

For customers with complex requirements, SAS Managed Cloud Services Enterprise provides tailored infrastructure and software deployments. Engage automates the sizing and provisioning process by pulling data from each customer’s Cloud Card and translating it into precise infrastructure builds using IaC.

Once the infrastructure is ready, build administrators use Engage to trigger software installations via auto-DaC workflows. This automation reduces deployment timelines and enables flexible customization, accelerating time to value without sacrificing control.

What’s the strategic value? Enterprise customers get access to bespoke environments faster, with fewer manual steps and greater consistency.

Managed Cloud Services Fleet: Immutable, Pre-Packaged Offerings

Managed Cloud Services Fleet is designed for scale. It delivers standardized SAS® Viya® environments, like SAS® Viya® Essentials, that can be provisioned in under two hours. These offerings include pre-sized infrastructure and standardized software orders (e.g., Visual Analytics, Visual Statistics), all delivered through the Engage Catalog.

Because Fleet environments are immutable, lifecycle automation becomes predictable and reliable. Build administrators provide minimal inputs, and Engage handles the rest, right up to identity provider integration.

What’s the strategic value? Fleet enables rapid deployment at scale, making it ideal for SMBs, trials, and repeatable use cases.

Managed Cloud Services Developer Experience: Empowering Automation and Governance

Behind the scenes, our Developer Experience (DevExp) framework empowers SAS teams to build, deploy, and manage automation across hyperscaler environments like Azure, AWS, and Google Cloud.

DevExp includes:

Code repositories and CI/CD pipelines.

Governance tools for compliance.

Integration with platforms like Azure DevOps and GitHub Actions.

Self-service portals via MCAP, ServiceNow and Engage.

Life cycle management for infrastructure and operations.

A developer control plane for monitoring, logging, security and observability.

DevExp supports both Enterprise and Fleet offerings, ensuring automation is standardized, repeatable and scalable.

What’s the strategic value? DevExp enables internal teams to innovate faster while maintaining governance and transparency.

Continuous Innovation: What’s Next for Engage and Managed Cloud Services

Our journey with Engage is far from over. The platform is evolving to support AI-infused automation, deeper integrations with hyperscaler services, and expanded partner access. These innovations will further reduce friction, improve customer experience, and unlock new possibilities for service delivery.

As we continually modernize our cloud frameworks, Engage remains central to our strategy, helping us deliver smarter, faster and more secure cloud services.

Engage is setting a new standard for automation in SAS Managed Cloud Services. By combining self-service workflows, lifecycle automation and modern cloud principles, SAS is delivering faster outcomes and better experiences for our customers.

Interested in learning more? Read our white paper, Enabling data and analytics in the cloud.

Engage for Managed Cloud Services: Strategic automation for the future was published on SAS Users.

Update Data Live in SAS Viya: Integrating SAS Code with Interactive Reports

Danny Sprukulis — Thu, 08 Jan 2026 14:00:01 +0000

In SAS Viya 4, we can embed inputs directly on the reporting page with live results. These reports are backed by code that takes user inputs and runs a program, which rebuilds the dataset and updates it with the most recent data. This gives users the ability to create datasets on the fly and share the results with their teams.

We begin with our code, which creates macros that can communicate with our report and eventually the job that runs in the background of the report. Below (Figure 1), we can see the variables being brought into the code. They are then assigned values that will be used in our code and formulas (in my example, I have an optimization running in SAS code called Opt_Article.sas, which is where the macros are referenced). The code can then run in whichever program the user would like, but at the end of the program, there must be an upload process to get the final table into the CAS engine. Our table is named Project_Opt. This table will come into play later. For this example I am assigning values to project capacity, which could be used as budgets and time constraints.

Figure 1: Macro Variables being assigned

For this code to update a report live, the code must finish with an upload statement that takes down the previous dataset and replaces it with the updated dataset with the same name (Figure 2).

Figure 2: The table created from the program is called work.solution. It is then fed to a temporary table that exchanges the dataset with Project_Opt, leaving just the most up-to-date data in Project_Opt.
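
The replace-under-the-same-name step is the part that makes the report refresh work, so it may help to see the pattern in isolation. The Python sketch below is not the SAS code from Figure 2. It uses a plain dictionary as a stand-in for the tables loaded in CAS, and the publish function, the staging name, and the sample rows are all hypothetical.

```python
def publish(tables, name, new_rows):
    """Replace the shared table `name` with `new_rows`, keeping the same name."""
    staging = f"{name}__staging"          # hypothetical temporary table name
    tables[staging] = list(new_rows)      # 1. load the fresh result into a staging table
    tables.pop(name, None)                # 2. take down the previous version, if any
    tables[name] = tables.pop(staging)    # 3. promote the staging copy under the original name
    return tables[name]


# Toy stand-in for the tables the report can see.
shared = {"Project_Opt": [{"project": "A", "selected": 1}]}

# Toy stand-in for work.solution produced by the latest run.
solution = [
    {"project": "A", "selected": 0},
    {"project": "B", "selected": 1},
]

publish(shared, "Project_Opt", solution)
```

Because the report is bound to the table name rather than to a particular copy of the data, it picks up the new rows on its next reload without any change to the report itself.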

The next step is to create a job in SAS. The user must create a new Job Definition and change the form to HTML. In the code, the user will need to create HTML code that reflects the inputs required by the macros. In Figure 3, the HTML for the inputs is shown. These allow the user to provide a base value and title for the input.

Figure 3: HTML of an input variable

Then, the program side of the job must reference the code the user created that contains the formulas that utilize the macros (Figure 4).

Figure 4: Code for referencing the HTML, creating an input interface, and referencing the original program using the macros (Opt_Article.sas)

On the right-hand side of the job interface, the parameters must be selected (Figure 5). This is where the macros need to be reflected for the report, matching the ones entered in the code and HTML. The additional parameters are for debugging purposes.

Figure 5: Input Parameters (left) and setting up the parameters (right)

Once this is completed, the job is ready for use. Copy the URL provided in the properties tab under Job URL (Figure 6).

Figure 6: Under the properties tab, copy the address of the Job URL

Then, in a VA report, select the Web Content object. In the options tab of the object, place the URL. The job interface should appear. The user must then bring in the table that is created from the program, Project_Opt (Figure 7).

Figure 7: The Web Content Object with the Job URL linked inside of it.

The data can now be shown in graphical interfaces. In the options tab for any graph using the table, select Periodically Reload Data and set a short timeframe. Then switch to non-edit mode and run the job (Figure 8).

Figure 8: Final report which will update live after each Submit on the Web Content object. The report must not be in edit mode to get the updates.

The report should update as you run the submission, providing the updated dataset and report. This allows the user to share project results live with their teammates and gives non-coders the ability to manipulate the datasets to generate value for their workforce.

Update Data Live in SAS Viya: Integrating SAS Code with Interactive Reports was published on SAS Users.

Uploading and visualizing custom shapefiles in SAS Viya

Danny Sprukulis — Wed, 07 Jan 2026 17:49:28 +0000

Photo Source: Getty Images

SAS Viya has a large list of mapping shapes for geographic visualizations, but sometimes business units have custom regions that need to be displayed. These can include individual counties, sales regions, postal codes, and voting regions. To properly visualize these regions, SAS has a process to bring in regions with custom borders.

SAS already has many regions preloaded into the system. When creating a new geography item in Visual Analytics, Viya has three options: Geographic name or code lookup, Geographic data provider, or Latitude and Longitude in data. For building regions that are already in SAS, select the geographic name or code lookup option. A drop-down list will appear asking for a Name or Code lookup (Figure 1).

Figure 1: New geography item selection screen, selecting the name or lookup value

Depending on the user data, the context will change. If the column has full country names, use Country or Region Names. If it has two-letter state codes, use the corresponding option. There are quite a few options (Figure 2), but for regions that cover whole states and do not have custom borders/shapes outside of political boundaries, the option of Subdivision (State, province) Names is ideal. When that option is selected, another drop-down will appear for the country that the data refers to, giving worldwide options for region shapes (Figure 3).

Figure 2: Selecting the option for province names in the lookup

Figure 3: When the subdivision option is selected, the options will be country specific

The user can now visualize the provinces or states of individual countries (Figure 4). Sometimes, however, a business region spans several of these predefined shapes. To combine them, create a custom category in the Data tab and set the geography item created earlier as its “Based on” item. Then assign each province to the group for its region (Figure 5), and add a button bar or filter to select which regions to display (Figure 6). A data tip value can also be added so that hovering over the map shows the business region.

Figure 4: A map of Canada created using the Figure 3 options—These shapes are preloaded by SAS

Figure 5: Custom Category creating groups of region shapes

Figure 6: Using the custom category to visualize the region/group of provinces
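The grouping that a custom category performs can be pictured as a simple lookup from province to business region. The sketch below is only an illustration in Python, not something Visual Analytics runs; the region names, the province assignments, and the "Other" catch-all label are all hypothetical.

```python
# Hypothetical business regions built from preloaded Canadian provinces.
# The region names and groupings are illustrative only.
REGION_GROUPS = {
    "West": ["British Columbia", "Alberta"],
    "Prairies": ["Saskatchewan", "Manitoba"],
    "Central": ["Ontario", "Quebec"],
}


def build_lookup(groups):
    """Invert {region: [provinces]} into {province: region}, rejecting duplicates."""
    lookup = {}
    for region, provinces in groups.items():
        for province in provinces:
            if province in lookup:
                raise ValueError(f"{province} is assigned to more than one region")
            lookup[province] = region
    return lookup


def region_for(province, lookup, default="Other"):
    """Return the business region for a province, or a catch-all label."""
    return lookup.get(province, default)


lookup = build_lookup(REGION_GROUPS)
print(region_for("Alberta", lookup))  # West
print(region_for("Yukon", lookup))    # Other
```

Each province belongs to exactly one group, which is why the sketch rejects a province listed under two regions.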

If the region has custom borders, a different process is needed. First, the user must have Geographic permissions from the admin of the Viya environment (full admin permissions also work). The user also needs a shapefile of the region they wish to bring into SAS. The files required for this process are the .shp, .shx, .dbf, and .prj files (Figure 7). These files are used to turn a shapefile into a dataset SAS can read.

Figure 7: Files needed to create custom regions in SAS
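Before uploading, it can help to confirm that all four files sit side by side with the same base name. The following is a minimal Python sketch of such a check, run wherever the files currently live; the function name and the example path are hypothetical and are not part of any SAS tooling.

```python
from pathlib import Path

# The four files the article lists as required for the upload process.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to the .shp file.

    The check is case-sensitive on case-sensitive file systems, so a file
    named REGIONS.SHP will not satisfy a lookup for regions.shp.
    """
    shp = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not shp.with_suffix(ext).is_file()
    ]


# Hypothetical usage: an empty list means all four files are present.
# print(missing_shapefile_parts("/path/to/regions.shp"))
```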

Once those files are in SAS, a SAS macro needs to be run. This macro (Figure 8) has <> brackets marking the values the user must fill in. The shapefile path is the path to the .shp file, which must be stored in the same folder as the other three files. The ID column is the column in the shapefile dataset that holds the region names. Once in Visual Analytics, this column is what maps data to the shapes: the categorical column containing the region name in the user’s data is joined to the shapefile names. If the ID column is unknown, there is a workaround: fill in the other blanks, give the ID a random name, and run the code. The code will run with errors, but the columns in the .shp file are revealed along with the rows. Once the column name is discovered, replace the ID and rerun the code. The outtable is the name the SAS dataset will take once the code has run. The cashost and casport values are found in Manage Environment on the Servers tab and can be copied over directly. Lastly, caslib is the library where the SAS dataset will be output (typically Public or casuser).

Figure 8: SAS shapefile upload macro

Once the macro has run successfully, the user can navigate to Visual Analytics. This is where the shapefiles are connected to the dataset. Make sure the user’s data containing the region values is included in the report. Then select “New Data Item” in the data pane and “New Geography Item”. Select the region column that has the region names for the “Based on” item. Then select “Geographic data provider”. This is where the new region shapes are created. If the user has created these regions before, they can use a provider in the drop-down menu. In this case a new provider is needed, so click on the three dot icon. Select “New provider” (Figure 9).

Figure 9: Selecting the Geographic data provider and New provider options

The new geographic data provider menu will appear. Enter a Name and a Label; these can be the same or different. They determine how the provider will appear the next time the user wants to use these shapes. Leave Type and Server unchanged, unless the output of the shapefile macro was saved elsewhere. For Library, select the caslib that the macro in Figure 8 pointed to (Public in this case). For the table, select the outtable that the macro created. Select Polygon for geometry. Change the ID column to the value used for id in Figure 8. Then set the next four options to _seq_, SEGMENT, Y, and X (Figure 10). Lastly, the coordinate space is needed.

Depending on where the .shp file came from, the coordinate space could be one of many values. WGS84 is the standard I like to work with, but shapefiles sourced from outside the user’s ArcGIS may need a custom option. If the EPSG value is known, select the custom value option for the coordinate space and enter “EPSG <4-digit number>”. The user may not get it right on the first try, so keep trying EPSG values that are reasonably close to the estimate; searching online for the value is an option as well. A wrong EPSG value may scatter the shapes all over the map, but as the guesses get closer, the shapes will appear closer to their actual location.

Figure 10: Inputs to create the custom regions in Visual Analytics
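One way to narrow the guessing is to look inside the .prj file, which is plain text describing the coordinate system in well-known text (WKT) form. The Python sketch below is a rough heuristic rather than a full WKT parser: it only recognizes the older GEOGCS/PROJCS keywords and reports the coordinate system name, which can then be searched for to find its EPSG code. WGS84 itself is EPSG 4326. The function name and the sample string are illustrative.

```python
import re


def describe_prj(wkt):
    """Report the kind and name of the coordinate system in a .prj string.

    Heuristic only: handles WKT that starts with GEOGCS[...] or PROJCS[...].
    Anything else is reported as unknown.
    """
    match = re.match(r'\s*(PROJCS|GEOGCS)\s*\[\s*"([^"]+)"', wkt, re.IGNORECASE)
    if not match:
        return {"kind": "unknown", "name": None, "looks_like_wgs84": False}

    keyword, name = match.group(1).upper(), match.group(2)
    kind = "projected" if keyword == "PROJCS" else "geographic"

    # Normalize names such as "GCS_WGS_1984" or "WGS 84" before comparing.
    normalized = re.sub(r"[\s_]+", "", name).upper()
    looks_like_wgs84 = kind == "geographic" and normalized in {
        "GCSWGS1984", "WGS1984", "WGS84",
    }
    return {"kind": kind, "name": name, "looks_like_wgs84": looks_like_wgs84}


# Sample of what a WGS84 .prj file typically contains.
sample = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)
print(describe_prj(sample))
```

If the result is a projected system, the reported name (for example a UTM zone) is usually enough to look up the matching EPSG code instead of guessing.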

Once this is complete, the provider will be created. Now returning to the geography item page, select ID column as the same as the “Based on” column. Then the regions are complete. Select a geo-region map in the objects pane and select the new geography item as the geography (Figure 11).

Figure 11: The visualized custom regions

The new shapes are properly uploaded into SAS and can be visualized with data. The users now have the freedom to map any territory they prefer and can customize their mapping reports. By combining SAS Viya’s built-in geographic options with the ability to upload custom shapefiles, users can accurately represent business-specific regions that go beyond standard political boundaries. Whether you are grouping existing regions or bringing in fully custom shapes, this process gives you the flexibility to create precise, meaningful maps that better reflect how your organization views and analyzes geographic data.

Learn more

Mapping data over images

Utilizing SAS Viya’s Geocoding capabilities for multiple data configurations and languages

Simulating theme park wait times within SAS Viya, creating a live solution for hospitality operations

Uploading and visualizing custom shapefiles in SAS Viya was published on SAS Users.

The afterparty: Hyperparameter autotuning revisited

Stu Sztukowski — Wed, 17 Dec 2025 21:21:52 +0000

In my first article on hyperparameter autotuning, I used a cake analogy to show how to use hyperparameter autotuning with Optuna and the sasviya.ml package in Python to improve detecting Higgs bosons in a particle accelerator. SAS Viya Workbench now supports hyperparameter autotuning in SAS code with a variety of different machine learning models, helping you achieve greater accuracy with less code. Remember how using Optuna was like asking Gordon Ramsay to critique our cake? In Python, we had to get him into the kitchen and tell him what to critique and how to critique it. This time, he’s already here and ready to yell at you.

Let’s take a look at the absolute minimum amount of code needed to perform hyperparameter autotuning with a gradient boosting model using SAS code.

proc gradboost data=sashelp.iris;
    autotune;
    target species / level=nominal;
    input _NUMERIC_;
run;

Yep. That’s it. It’s literally one extra line: autotune.

When you add this single line, SAS will search for hyperparameters with a genetic algorithm, partition the data into a 70/30 train/validation split, run multiple parallel evaluations in a single iteration, and automatically stop after either 50 evaluations, 5 iterations, 10 hours, or stagnation in 4 iterations. Each hyperparameter has pre-configured default bounds and is fully modifiable. If you’re curious as to what these defaults are, check out the documentation for the AUTOTUNE statement in PROC GRADBOOST.
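To make the default stopping rules concrete, here is a small Python sketch of how such a check could be expressed. It is one plausible reading of the limits listed above, not SAS's implementation: the function name is hypothetical, the objective is assumed to be "higher is better", and stagnation is interpreted as no improvement in the best score over the last 4 iterations.

```python
def should_stop(evaluations, iterations, hours_elapsed, best_by_iteration,
                max_evals=50, max_iters=5, max_hours=10.0, stagnation=4):
    """Return the name of the first limit reached, or None to keep tuning.

    best_by_iteration holds the best objective value seen in each iteration
    (assumed here to be a score where larger is better).
    """
    if evaluations >= max_evals:
        return "max evaluations"
    if iterations >= max_iters:
        return "max iterations"
    if hours_elapsed >= max_hours:
        return "max time"
    # Stagnation: the last `stagnation` iterations did not beat what came before.
    if len(best_by_iteration) > stagnation:
        recent_best = max(best_by_iteration[-stagnation:])
        earlier_best = max(best_by_iteration[:-stagnation])
        if recent_best <= earlier_best:
            return "stagnation"
    return None


print(should_stop(10, 2, 0.5, [0.70, 0.74]))        # None
print(should_stop(50, 3, 1.0, [0.70, 0.72, 0.73]))  # max evaluations
```

Whichever limit is hit first ends the search, so a short run can stop on evaluations long before the time limit matters.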

That’s a whole lot of things happening from that one little statement. This gives you a good baseline, and considering how well-tuned the default values are, it may end up being all you need.

Predicting Higgs bosons with SAS

Let’s go back to solving the Higgs boson prediction problem, and this time we’ll do it with SAS code. One thing I love about programming in SAS is how compact the code is for modeling. To show you how little code is needed to solve this problem in its entirety, I’m going to post the whole program below.

filename inpipe pipe "unzip -p /workspaces/myfolder/data/higgs.zip | gunzip -c";

%let colnames =
    signal
    lepton_pt lepton_eta lepton_phi
    missing_energy_magnitude missing_energy_phi
    jet_1_pt jet_1_eta jet_1_phi jet_1_btag
    jet_2_pt jet_2_eta jet_2_phi jet_2_btag
    jet_3_pt jet_3_eta jet_3_phi jet_3_btag
    jet_4_pt jet_4_eta jet_4_phi jet_4_btag
    m_jj m_jjj m_lv m_jlv m_bb m_wbb m_wwbb
;

data higgs;
    infile inpipe dlm=',' dsd;
    input &colnames;
    drop m_:;
run;

proc gradboost data=higgs outmodel=low_level_model seed=42 earlystop(stagnation=0);
    partition fraction(validate=0.15 test=0.15 seed=42);

    autotune objective=auc
             searchmethod=bayesian
             targetevent='1'
             popsize=5
             historytable=higgs_low_level_history
             tuningparameters = (
                 ntrees(UB=300)
                 maxdepth(UB=30)
                 learningrate(UB=0.1)
             )
    ;

    target signal / level=nominal;
    input _NUMERIC_;
    output out=low_level_preds role;
run;

That’s a grand total of 38 lines, including some extra whitespace for readability. In those 38 lines, we:

Define where our file is and how to unpack with unzip and gunzip:
filename inpipe pipe "unzip -p /workspaces/myfolder/data/higgs.zip | gunzip -c";

Define our column names and their order:
%let colnames = ...;

Unzip, decompress, and read in the data:
data higgs; infile inpipe dlm=',' dsd; input &colnames; drop m_:; run;

Then, in one gradient boosting procedure:

Perform a repeatable train/validate/test split:
partition fraction(validate=0.15 test=0.15 seed=42)

Enable autotuning:
autotune

Set our autotuning objective, search method, and target event:
objective=auc searchmethod=bayesian targetevent='1'

Create 4 parallel evaluations, with 5 iterations and 5 evaluations per iteration:
popsize=5

Save the best model, autotuning history, predictions, and tag the role of each prediction:
outmodel=low_level_model historytable=higgs_low_level_history output out=low_level_preds role

Set a few hyperparameter bounds to try because we learned last time that more trees, lower depth, and a lower learning rate are important with this data:
tuningparameters = ( ntrees(UB=300) maxdepth(UB=30) learningrate(UB=0.1) )

Even though we changed the bounds of a few of our hyperparameters, we are still tuning the rest of the parameters as well. We just gave it some more suggestions.
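The idea behind a seeded train/validate/test partition can be sketched in a few lines of Python. This is only an illustration of the concept: it is not how PROC GRADBOOST assigns roles internally, the function name is hypothetical, and the role labels are placeholders. With validate=0.15 and test=0.15, roughly 70% of rows are left for training.

```python
import random


def assign_roles(n_rows, validate=0.15, test=0.15, seed=42):
    """Assign each row a role at random, repeatably for a given seed."""
    rng = random.Random(seed)  # a private generator, so the seed fully controls the result
    roles = []
    for _ in range(n_rows):
        u = rng.random()
        if u < validate:
            roles.append("VALIDATE")
        elif u < validate + test:
            roles.append("TEST")
        else:
            roles.append("TRAIN")
    return roles


roles = assign_roles(1000)
print({role: roles.count(role) for role in ("TRAIN", "VALIDATE", "TEST")})
```

Because the seed is fixed, rerunning the function gives the same assignment, which is what makes results comparable between runs.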

There are a ton of other autotuning options available. We can go much further than this, including excluding hyperparameters from tuning, adding a second objective, setting a maximum time, using k-fold cross validation, setting a stagnation threshold, and much more.

Not only that, but PROC GRADBOOST handles both classification and regression in only one PROC. You simply tell it what type of target you have, and it sorts out the rest. You’ll find this to be the case for the rest of the machine learning PROCs too, such as Deep Neural Networks and Support Vector Machines. This simplifies the world of models you need to know.

Let’s run this model and see how it compares.

The results

How did we do this time around? Using the SAS autotuning framework, we achieved a test AUC of 0.801 for the low-level model. Let’s put this into perspective against the other models, including our previous SAS model that was tuned with Optuna.

Model                                    AUC: Low-level    Δ vs SAS (%)
SAS: Tuned Gradient Boosting             0.801
SAS: Tuned Gradient Boosting (Optuna)    0.795             -0.75%
Paper: Boosted Decision Tree             0.73              -8.9%
Paper: Shallow Neural Network            0.733             -8.5%
Paper: Deep Neural Network               0.880             +9.9%
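The percentage column is the relative difference of each model's AUC from the autotuned SAS model's 0.801. A quick Python check of that arithmetic, using only the AUC values from the table (the variable and function names are just for illustration):

```python
BASELINE_AUC = 0.801  # SAS: Tuned Gradient Boosting (autotune)

OTHER_MODELS = {
    "SAS: Tuned Gradient Boosting (Optuna)": 0.795,
    "Paper: Boosted Decision Tree": 0.73,
    "Paper: Shallow Neural Network": 0.733,
    "Paper: Deep Neural Network": 0.880,
}


def pct_delta(auc, baseline=BASELINE_AUC):
    """Relative difference from the baseline AUC, in percent."""
    return (auc - baseline) / baseline * 100


for model, auc in OTHER_MODELS.items():
    print(f"{model}: {pct_delta(auc):+.2f}%")
```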

Here’s what all the evaluations looked like:

Overall, impressive results for how little code we needed, and we even got our best result just 9 evaluations in. This shows how well SAS has tuned its hyperparameter autotuning framework, looking at optimal combinations of hyperparameters to test without the end-user trying to find a good starting point for each one.

Going bigger

If you’re ready to scale out with huge data, you don’t need to learn a whole new language. Many SAS modeling procedures automatically change where the calculations happen, bringing massively parallel, multithreaded processing to your data without any extra effort on your part. All you need to do is load your data to CAS and change your input dataset, then SAS handles the rest.
Don’t believe me? Give it a try yourself on SAS Viya with our first example using Iris:

cas;
caslib _ALL_ assign;

data casuser.iris;
    set sashelp.iris;
run;

proc gradboost data=casuser.iris;
    autotune;
    target species / level=nominal;
    input _NUMERIC_;
run;

All we did was load the data into CAS and tell PROC GRADBOOST that our data is no longer on disk, but in CAS. Just like that, the processing is now massively parallel and ready to scale to your demands.

Wrapping it up

We went through two examples of using sasviya.ml’s GradientBoostingClassifier and PROC GRADBOOST, showing that you can get impressive results in either language. You may be wondering which one is better.

Want to know a secret?

They’re both using the exact same engine.

sasviya.ml isn’t some watered down set of algorithms for Python data scientists. Far from it. This is the same high-performance, trusted, battle-tested math that backs other SAS Viya machine learning models. The SAS Viya Python API is designed to be Pythonic and work with other open source packages, letting you seamlessly integrate your favorite packages in the open source ecosystem, like Optuna, with virtually no learning curve.

The flexibility of SAS Viya Workbench gives you the option to run your models in multiple languages while using the same underlying engine. Whether you’re using Python or a PROC, you can be assured that you’re going to get the performance and results that you expect from SAS. The language you use, and how you mix them together, is entirely up to you.

Links

Previous article: Boost ML accuracy with hyperparameter tuning (with a fun twist)

Documentation: Overview of Hyperparameter Autotuning in SAS Viya

UC Irvine Machine Learning Repository - Higgs boson dataset

References

Whiteson, D. (2014). HIGGS [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5V312. Licensed under CC BY-SA 4.0.

The afterparty: Hyperparameter autotuning revisited was published on SAS Users.

Mapping data over images

Danny Sprukulis — Tue, 16 Dec 2025 20:39:06 +0000

Source: Getty Images

Mapping in SAS Viya gives the user plenty of options for maps to use. Signing in to ESRI on SAS Viya gives us many more. But what if the map that is needed is not available and is technically not a map at all? Well, not to worry, as there is a solution for that. Mapping on images is a lot like creating shapefiles on regular maps; the main difference is how that map is uploaded to SAS. This solution gives the user full control of what they would like to see in their reports and gives them full customizability for mapping visualizations.

The first step is having an ArcGIS Online account. This account needs to be set up with some credits to use as data storage currency so that the image the user uploads can be saved in the system storage. Once this step is complete, the user must also have a desktop version of ArcGIS Pro, where the maps can be built. Within the desktop application, the user must be connected to their ArcGIS Online account through the sign-in page to ensure everything is connected.

Now the user can begin the process. Once in ArcGIS Pro, a new project must be created. This will bring up a map of the world. In the contents pane, right-click Map and select Add Data. A box will pop up to browse the files, and the user can select the image they wish to use (Figure 1). For this example, the user will be using an image of a stadium.

Figure 1: Selecting the image file in ArcGIS Pro

The image name will appear below Map in the Contents pane, along with an “Unknown Coordinate System” warning. This is fine, since the user will be specifying their own coordinates for the image. To provide these coordinates, go to the Imagery tab at the top of the application, where the Imagery options should appear (Figure 2).

Figure 2: Imagery tab

Right-click on the image name in the contents pane. Select “Zoom to Layer” and the user will be brought to where the picture is on the map (without a coordinate system, the image should originate in the middle of the Atlantic Ocean). In the Imagery tab, select the “Geo Reference” option. Then the system will allow the user to begin geo-referencing the image. The user must now choose where they would like the image to appear on a map. The user must select the “Add Control Points” option in the ribbon. Then the user can select a point on the picture. Here, a notable corner on the image is selected (Figure 3).

Figure 3: Selecting a source point on the map that can be easily identified on the base map

Once the initial point is selected, the user must then select a point on the map that this control point connects to (Figure 4).

Figure 4: Select the identifiable point on a map for a target point

The image will now be placed over the point in its original orientation (Figure 5).

Figure 5: The image is now connecting the target and source points. The image is still in its upwards position since it will default to that position when not enough control points are active.

The user must add another control point to get the image to follow the proper orientation of its place on a map. To add another control point, select the point on the image as a source point and then select another point on the map as a target point (Figure 6).

The image should now follow its proper orientation, but further control points may be needed for an exact fit. In my experience, 3-4 points is the most that is needed; beyond that, the image starts to distort and becomes less accurate.
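The behavior above has a simple geometric explanation: one control point can only fix the image's position, while two points are enough to also determine rotation and uniform scale. The Python sketch below works through that math for two source/target pairs. It illustrates the geometry only and makes no claim about how ArcGIS Pro computes its fit; the function name and the coordinates are made up.

```python
import cmath
import math


def similarity_from_two_points(src1, tgt1, src2, tgt2):
    """Fit z' = a*z + b (rotation, uniform scale, shift) from two point pairs.

    Points are (x, y) tuples treated as complex numbers x + yi.
    Returns (transform, scale, rotation_in_degrees).
    """
    s1, s2 = complex(*src1), complex(*src2)
    t1, t2 = complex(*tgt1), complex(*tgt2)
    if s1 == s2:
        raise ValueError("The two source points must be different")

    a = (t2 - t1) / (s2 - s1)  # encodes rotation and scale
    b = t1 - a * s1            # encodes the shift

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform, abs(a), math.degrees(cmath.phase(a))


# Made-up control points: image pixel coordinates -> map coordinates.
transform, scale, rotation = similarity_from_two_points(
    (0, 0), (10, 10),
    (1, 0), (10, 12),
)
print(scale, rotation)    # scale factor and rotation in degrees
print(transform((0, 1)))  # where another image point lands on the map
```

A third control point adds enough information for skew and non-uniform stretch, which is consistent with extra points introducing distortion when they are not placed precisely.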

The image is now mapped and needs to be saved. Right-click on the Map in the Contents pane and select Properties. Give the Map a name, then close the Properties tab. The new name should be reflected where the word "Map" used to be. Right-click the new map name and select “Edit Metadata”. The user should now give the Item a Title, some tags (brief descriptions of the map), and a summary. Select “Apply” in the ribbon and close the metadata page. The user can now upload this image to their ArcGIS Online account. Go to the Share tab and select Web Layer – Publish Web Layer (Figure 7).

Figure 7: Publish Web Layer location

A share page will pop up. Select Layer Type to be Tile and Share with Everyone. Then select Publish. The image will now be uploaded to the ArcGIS Online account.

In ArcGIS Online, go to the My Content page. Find the name of the map that was just created, and the file should say “Tile Layer (hosted)”. Select the map, and an overview of the map will pop up. Go to Settings and scroll to Tile Layer (hosted). This is where the user chooses how detailed the image will be on the map's scale. In Figure 8, World to Country is the default, but depending on the size of the image, the user should change the dropdown from Country. For instance, if it is a map of a building, then go to the size of a small building; if it is a large park select the option that is closest to that size.

Figure 8: Selecting the Visible Range for the image, depending on the size of the image, the user can change how far out it can be seen from viewing the map

Next, do not select Build Tiles. First, select Save below the visible range selector (Figure 9).

Figure 9: Saving the tile

Once saved, select Build Tiles. The tile screen will appear, and once it has finished uploading, select Create Tiles (Figure 10).

Figure 10: Creating the tiles

From there, our tile/image is uploaded to the ArcGIS cloud. To use this in SAS Viya, go into SAS Viya Explore and Visualize. The user must be signed into their ArcGIS Online account from there. If they are not, they can go to their user icon while in Explore and Visualize, select Settings, and select Geographic Mapping, and sign in there. Once signed in, bring a map into the Explore and Visualize workspace. This can be any of the maps in the Objects tab. In the options on the right-hand side, select Maps (Figure 11).

Figure 11: Choosing a map; it defaults to an Automatic map

The user’s maps will show up under User Tiles here. Select the file folder icon and a list of mapping options becomes available to the user (Figure 12). Since the user created a tile in ArcGIS Online, the image will be in that folder.

Figure 12: All mapping folders available. The ArcGIS.com folders only appear if the user is signed into their ArcGIS Online account in Viya

When the image is selected, it will populate within the map. Depending on the Visible Range selected in Figure 8, the user may need to zoom to the proper level to see the image. Shapefiles can be overlaid on the image as well. I have added shapefiles for each section of the stadium, so the map is more interactive with data (Figure 13).

Figure 13: The final map result in Visual Analytics using a geo-region map

Another example of the image mapping is in Figure 14, with mapped body sections over a diagram of the human body.

Figure 14: An image of the human body being mapped with custom shapefiles of body sections.

This is how the user can map over top of images. It allows them to customize their mapping interface for full freedom in their report visualizations.

Mapping data over images was published on SAS Users.

TouchDuck! Exploring unique features of SAS/ACCESS to DuckDB through college football

Joe Cabral — Wed, 10 Dec 2025 20:26:00 +0000

There's a chill in the air through much of the United States, and that can only mean one thing: the cold disappointment of your college football season as your team's playoff hopes fade away into the darkness. Back in September, I was curious about what data was readily available surrounding the CFB landscape: recruiting, rankings, gametime statistics, etc. Thanks to the brilliant data collection and export tools produced by CollegeFootballData.com, I was able to build a personalized series of datasets describing several areas of interest:

I was originally eyeing a predictive pipeline where I would use SAS Viya to make predictions on the season, but as the games distracted me from my code, I watched a wickedly entertaining (and arguably unlikely) season shake out before my eyes. I decided to pivot from a predictive approach to a more exploratory pipeline, and there's no better way to attack this than with SAS Viya's new ACCESS Engine to DuckDB. In this blog, I'll use my exploratory data analysis of college football data from the 2024-2025 off-season to highlight some neat features of DuckDB that can now be leveraged by the SAS/ACCESS Engine. Without further delay, let's dive into our data.

Data Ingestion

Before even viewing my datasets, I took the path of least resistance for ingestion and simply uploaded them from my PC to the SAS Server in the Viya environment.

One neat capability of DuckDB is its versatility in connecting to data sources; if I were dealing with much larger scale datasets, I might have stored them on an S3 bucket, which SAS/ACCESS to DuckDB can query from with ease using DuckDB's installable extensions. That won't be the case here given the small-scale data at hand. As with any SAS pipeline, the first code we'll execute will set up a Library connection using the DuckDB Engine:

Like most SAS/ACCESS Engines, DuckDB LIBNAME statements have plenty of options for configuring your connection. In this case, all I've done is set an upper bound for the memory usage of the database and select a local path on the SAS Server for the DuckDB database file. A full list of options can be found here. Common configurations include the file_path and file_type options, which would instruct the library to load all files of a given type in the defined directory. I've opted against this as a best practice, since I don't want to load all my tables, nor will I want all the columns for every table. I'd rather only load what I need, when I need it—and DuckDB affords me that capability as well.

Loading And Exploring Our Tables

Based on what I exported from the College Football Database, I want to start with a "base table" of teams and their characteristics. Let's investigate the teams.csv table and see what we might want from it.

Using PROC SQL, I push-down a DESCRIBE SELECT * statement into the DuckDB instance, instructing it to tell me about the CSV dataset columns by just reading one line. In that sense, it's a SQL-writer's parallel to PROC CONTENTS.

Well, that's a lot of columns, and I'm not sure I care about them all right now. Let's use PROC SQL to load what we want into the DUKLIB library:

I'm doing a few different things in one procedure here. First, I use the CONNECT statement (as done previously) to tell SAS which data source to connect to (the DuckDB library, of course). The EXECUTE statement explicitly pushes SQL code down to DuckDB, meaning that I can write anything I want in DuckDB's flavor of SQL (closely related to the PostgreSQL dialect). This dialect allows us to leverage tons of features unavailable to the standard PROC SQL syntax. An easy but significant example here is the CREATE OR REPLACE syntax, which, if run implicitly in PROC SQL on a standard SAS library, would not be recognized! This is obviously nice since I continually, without fail, forget to drop tables when I edit and re-run queries.

At the bottom of the query, the SELECT * FROM CONNECTION is used to generate an output in the SAS GUI. All EXECUTE statements do just as implied—they execute code in DuckDB—but their results aren't propagated back up to the SAS user interface. If you put a SELECT statement inside an EXECUTE statement, DuckDB would run it, but you, the SAS user, wouldn't see the results. Thus, the SELECT * FROM CONNECTION is what we use to retrieve results from a query.

We now have a TEAMS table under DUKLIB, which we can see in our Library pane, just as we would with any other type of data source:

I've talked a lot for just loading a single table, but it's important to lay the groundwork of how the syntax works when using PROC SQL with DuckDB. The benefits are substantial, especially for large and easily subsettable datasets, but—as with any new tool—there’s a learning curve to mastering the syntax. Let's move on and focus on some more data.

Investigation #1: Talent Gain & Loss Over the Offseason

In the modern age of transfer portals, NIL, and ever-increasingly wild coaching carousels, the number one burning question I had of my data was "who gained and who lost talent over the offseason?" I'm not interested in just the transfers, but also the graduates and the incoming freshmen. I have two CSV files pertaining to this, 2024_247_composite.csv and 2025_247_composite.csv, containing data procured from the 247 sports news site.

The first PROC SQL mimics what we did for the TEAMS dataset, and tells us we have three columns: “Year”, “Team”, and “Talent.” Knowing that we have two files, one for 2024 and one for 2025, I join BOTH tables to the DUKLIB.TEAMS table on team name, then create a calculated column, “Talent_Delta”, as the difference between the two 247-assessed talent metrics. Lastly, I use a SELECT * FROM CONNECTION statement to preview my new dataset, DUKLIB.TALENT.
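For readers who want to see the shape of that join without a DuckDB connection, here is a small Python sketch using the built-in sqlite3 module as a stand-in. It is not the article's query: the data is made up, the team key column name (school) is a guess at what the TEAMS table uses, and SQLite is swapped in purely so the example runs anywhere.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Made-up stand-ins for the teams table and the two yearly talent files.
    CREATE TABLE teams (school TEXT, conference TEXT);
    CREATE TABLE talent_2024 (Year INTEGER, Team TEXT, Talent REAL);
    CREATE TABLE talent_2025 (Year INTEGER, Team TEXT, Talent REAL);

    INSERT INTO teams VALUES
        ('Team A', 'Conf X'), ('Team B', 'Conf Y'), ('Team C', 'Conf Y');
    INSERT INTO talent_2024 VALUES (2024, 'Team A', 800.0), (2024, 'Team B', 750.0);
    INSERT INTO talent_2025 VALUES (2025, 'Team A', 820.5), (2025, 'Team B', 700.0);

    -- Join both years to the base table on team name and compute the change.
    CREATE TABLE talent AS
    SELECT t.school                  AS Team,
           y24.Talent                AS Talent_2024,
           y25.Talent                AS Talent_2025,
           y25.Talent - y24.Talent   AS Talent_Delta
    FROM teams AS t
    JOIN talent_2024 AS y24 ON y24.Team = t.school
    JOIN talent_2025 AS y25 ON y25.Team = t.school;
""")

rows = con.execute(
    "SELECT Team, Talent_Delta FROM talent ORDER BY Talent_Delta DESC"
).fetchall()
print(rows)
```

Because these are inner joins, a team that is missing from either yearly file (Team C here) drops out of the result.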

So far, everything I've done has been in PROC SQL, which makes sense, since DuckDB, after all, is operated with SQL. But can I leverage other PROC's from SAS' toolbox as I would with a different library connection? The answer is a resounding YES. Let's create two tables in our ephemeral WORK library by using PROC SORT on our DUKLIB.TALENT table, ordering our FBS teams by “Talent_Delta” in one output, and by “Talent_2025” in another.

Resulting DELTASORT Table:

You would expect the big Power 4 names to top the DELTASORT table, but, as we'll see in the TALENTSORT table, it's hard to post such large gains when your team was already so talented AND losing players to graduation and transfers. Sympathies do go out to Charlotte, who rounds out the bottom after being virtually gutted over the offseason. Perhaps that partially explains their 1-9 record on the year:

In terms of raw talent, the familiar leaders emerge atop the TALENTSORT table:

Writing this blog across October and November, I find this simple sort shows the pure beauty of college football: talent doesn't guarantee anything. Check #6 LSU, or #9 Penn State, or #12 Auburn, or #13 Florida, all four of which have missed their expectations so significantly that they've paid millions of dollars just to get rid of their coaches. My condolences go to the fanbases, but after all, it's the suffering that builds character.

Investigation #2: Offseason Recruiting

With a general idea of who gained "talent" between seasons, let's dive deeper and focus on the actual recruiting process. Down the line, it would surely be interesting to line that data up next to a list of publicly known NIL funds. I'd bet on a strong correlation. Money talks, in the end.

Anyways, let's investigate the recruiting using the 2025_recruiting_detailed.csv dataset. Just as in the previous investigation, I'll use the DUKLIB connection to select the results of a DESCRIBE statement to get the metadata on our columns.

The number one thing that sticks out to me after running the DESCRIBE statement alone is the PositionGroup variable—what ARE the Position Groups of recruiting, anyway? The SELECT DISTINCT statement/clause will parse exactly that.

The results tell us we have eight total Position Groups, and a summative group as well. That means every team has multiple records pertaining to their recruiting efforts (number of commits, average rating of commits) in different Position Groups, along with a cumulative record. I could simply use a WHERE clause to pull the cumulative records only, but that's no fun, and we'd lose plenty of valuable information—even though we don't need data to see that your school forgot to recruit a quarterback. Thankfully, DuckDB provides a key functionality that will allow us to transform this tall dataset into a wide format that will join one-to-one cleanly with our teams & talent data. Let's start by binning some of the Position Groups into categories:

The CASE clause within the EXECUTE statement does exactly what SQL programmers (and Python programmers alike) would expect—it buckets different situations into different results, and saves those results to a new column, called "Category." Here, we isolate four main categories, which closely align with the "three phases of the game": under Defense, we keep "Defensive Back," "Defensive Line," and "Linebacker." Under Offense, we keep "Offensive Line," "Receiver," and "Running Back." Quarterback and Special Teams get their own categories, the former of which could easily slot under Offense, but might be worth considering as its own entity due to its sheer impact on a team's performance. Here's what our DUKLIB.RECRUITING_RAW table looks like:

We haven't consolidated any records yet, but we can see the Category attributed to each record, and we can see that not every team has data in every position. Air Force, as seen above, has no data on Quarterback recruiting. There's some irony and a piloting joke in there, I'm sure of it.
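The bucketing logic described above can be sketched with a CASE expression. The Python example below uses the built-in sqlite3 module as a stand-in for DuckDB so that it runs anywhere; the sample rows are made up, the Commits column name is assumed, and sending any unlisted group (such as the summary record) to 'Other' is my assumption rather than something the article specifies.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE recruiting (Team TEXT, PositionGroup TEXT, Commits INTEGER)")
con.executemany(
    "INSERT INTO recruiting VALUES (?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("Team A", "Defensive Back", 3),
        ("Team A", "Linebacker", 2),
        ("Team A", "Receiver", 4),
        ("Team A", "Special Teams", 1),
        ("Team B", "Quarterback", 1),
        ("Team B", "Offensive Line", 5),
    ],
)

# Bucket each position group into one of the four categories from the article.
con.execute("""
    CREATE TABLE recruiting_raw AS
    SELECT Team, PositionGroup, Commits,
           CASE
               WHEN PositionGroup IN ('Defensive Back', 'Defensive Line', 'Linebacker')
                   THEN 'Defense'
               WHEN PositionGroup IN ('Offensive Line', 'Receiver', 'Running Back')
                   THEN 'Offense'
               WHEN PositionGroup = 'Quarterback' THEN 'Quarterback'
               WHEN PositionGroup = 'Special Teams' THEN 'Special_Teams'
               ELSE 'Other'  -- assumption: anything unlisted, e.g. a summary row
           END AS Category
    FROM recruiting
""")

categories = dict(
    con.execute("SELECT PositionGroup, Category FROM recruiting_raw").fetchall()
)
print(categories)
```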

In our next step, we introduce DuckDB functions, which match many of the push-down functions that PROC SQL can leverage in-database with different data sources. We are using explicit passthrough here to leverage DuckDB's functions themselves but note that an implicit passthrough query could leverage many of these same functions.

We've now consolidated our data into a maximum of four records per team: Defense, Offense, Quarterback, and Special_Teams using the sum() functions in DuckDB, and leveraging our GROUP BY and ORDER BY statements to keep everything lined up nicely.

With this consolidation done, we can introduce the highlight DuckDB feature of this article, something that can't be done using ANSI Standard PROC SQL: the PIVOT statement. The PIVOT statement in DuckDB, in conjunction with DuckDB's coalesce() function, allows us to restructure our data in one SQL statement, combining multiple records into multiple columns, grouped by the team.

Our newest table, DUKLIB.RECRUITING_PIVOT, now contains just a single record per team, with clearly named columns defining the number of commits, total rating, and (more importantly) average rating in each major category of the game.
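To make the reshaping concrete, here is a rough plain-Python sketch of what the consolidation and pivot accomplish together. It is not DuckDB's PIVOT implementation. The team names, numbers, and output column names (such as `Defense_Commits`) are invented for illustration, and the zero default plays the role that coalesce() plays in the query.

```python
from collections import defaultdict

CATEGORIES = ["Defense", "Offense", "Quarterback", "Special_Teams"]

# Tall input: one record per team and category (illustrative values only).
tall = [
    {"team": "Team A", "category": "Defense", "commits": 9, "total_rating": 765.0},
    {"team": "Team A", "category": "Offense", "commits": 8, "total_rating": 696.0},
    {"team": "Team B", "category": "Defense", "commits": 5, "total_rating": 430.0},
    {"team": "Team B", "category": "Quarterback", "commits": 1, "total_rating": 90.0},
]


def pivot_by_team(records):
    """Turn tall (team, category) records into one wide record per team."""
    # Consolidate first, in case a team has several records in one category.
    totals = defaultdict(lambda: {"commits": 0, "total_rating": 0.0})
    for rec in records:
        cell = totals[(rec["team"], rec["category"])]
        cell["commits"] += rec["commits"]
        cell["total_rating"] += rec["total_rating"]

    wide = {}
    for team in sorted({rec["team"] for rec in records}):
        row = {"team": team}
        for cat in CATEGORIES:
            # Missing team/category pairs default to zero, like coalesce(value, 0).
            cell = totals.get((team, cat), {"commits": 0, "total_rating": 0.0})
            commits, rating = cell["commits"], cell["total_rating"]
            row[f"{cat}_Commits"] = commits
            row[f"{cat}_Total_Rating"] = rating
            row[f"{cat}_Avg_Rating"] = rating / commits if commits else 0.0
        wide[team] = row
    return wide


wide = pivot_by_team(tall)
print(wide["Team B"]["Quarterback_Avg_Rating"])
print(wide["Team A"]["Quarterback_Commits"])
```

Each team ends up with exactly one row and the same set of columns, which is what makes the later one-to-one join possible.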

The PIVOT statement is a powerful piece of logic that is now accessible to all SAS Viya programmers working with the SAS/ACCESS Engine to DuckDB. It's only one of many benefits that the newly introduced ACCESS Engine brings to the programmer experience, but it highlights SAS' paradigm of meeting your data where it is.

In my next article, I'll highlight more neat features that PROC SQL can take advantage of with DuckDB, and I'll engage a few more of our football datasets, updated for the end of the regular season, remarking along the way on how drastically the 2025 season's results have deviated from the expectations. In the meantime, you can learn more about SAS/ACCESS to DuckDB in SAS Viya 4 here.

Thanks for reading, and I'll see you in ten yards.

Learn more

Smarter Access to Open Data: Introducing SAS/ACCESS to DuckDB

Speeding Up Workloads with Ducks(?!): Case Study of New DuckDB ACCESS Engine

The Quack is Back: SAS/ACCESS Meets DuckDB

TouchDuck! Exploring unique features of SAS/ACCESS to DuckDB through college football was published on SAS Users.

Getting Started with Job Scheduling in SAS Viya

Kevin Bickford — Thu, 04 Dec 2025 14:44:09 +0000

If you have ever had to manually trigger Viya jobs, you know the drill: it is tedious, and one forgotten click can throw everything off.

That is where SAS Viya job scheduling comes in. It lets you automate your programs, data loads, reports, and analytics workflows so they run exactly when you need them, no babysitting required.

What is a Job in Viya?

Think of a job as any piece of SAS work you want to run, such as a program, a data plan, or even a Visual Analytics report refresh. Once it is saved in Viya (for example, from SAS Studio or Data Studio), you can schedule it to run automatically at a set time or on a recurring basis.

The first step is to write your code. In this example, I am using SAS Studio to create SAS code that will be submitted to Viya CAS for execution.

After saving your job in your Viya content folder, open the More menu (the three-dot icon). From the list of options, select Schedule as a Job to create a scheduled job from your code.

After selecting Schedule as a Job, the New Trigger dialog box appears. From this dialog, choose the Frequency (the unit of repetition, such as daily or weekly) and the Interval (how many of those units pass between runs).

Next, specify the Start Time for when you want the job to begin running.

Select the appropriate Time Zone.

Then, set the Start Date for when the schedule should take effect.

Finally, choose an End Date to define how long the job should remain scheduled.

To verify that your job has been scheduled, open Environment Manager and navigate to the Jobs and Flows page. You should see your job listed there, and under the Scheduled column, a blue clock icon indicates that the job has an active schedule.

Scheduling Made Simple

SAS Viya’s Scheduling page (inside Environment Manager) is your main hub for automating jobs. From here, you can run a job immediately, set it to run daily, weekly, or yearly, and manage existing schedules — view, edit, disable, or delete them. You can even run a job as a different user (if you are an admin), which is handy when jobs need specific credentials. Currently, the only trigger type is time-based, so it is all about scheduling by the clock.
Environment Manager is also where you will need to go to edit any jobs you have scheduled using visual tools (Visual Analytics, Visual Statistics, Visual Data Mining and Machine Learning, Visual Forecasting, and SAS Studio).

The first step is to create a job from the SAS code or program you want to automate. Below are the steps to create a job in Environment Manager.

Navigate to Environment Manager by selecting the more menu from the Viya Welcome page.

Then select Manage Environment.

This is the Administrator’s view of Viya Environment Manager.

Navigate to the Jobs and Flows page. By default, the page will open in the Monitoring view. You will want to select the Scheduling View.

Select New -> Job from the pulldown menu.

The New Job window appears. Enter the name you want to give your job.

Select the location of the program you want to create a job for.

Navigate to the folder that contains the program you want to schedule.

Select OK.

Now that you have completed the job creation steps, you will want to schedule the job.

How to Schedule a Job in SAS Viya

Here is how it works in a nutshell:
Open SAS Environment Manager and go to the Jobs and Flows page. Select the Scheduling view, select the job you want to automate, right-click the job, and select Schedule.

If you have admin credentials and want to run this scheduled job as another user, select the icon next to “Run As” to open the list of available users.

Select the name of the user that the job should run as. This works only if you have admin credentials. Once you have selected the user ID, select OK.

Next, add a time trigger for the job by selecting the + (plus sign). The example below uses the following settings:

Name: Daily News Digest Trigger

(A descriptive name for the time-based trigger to easily identify it in the scheduler.)

Frequency: Daily

(The job will run daily.)

Interval: 1 Day

(It occurs every 1 day, meaning once per day.)

Run Time: 08:00 AM

(The job starts at 8:00 AM local time for morning execution.)

Time Zone: America/New York (Eastern Standard Time)

(Suitable for US East Coast operations; adjust if targeting a different region.)

Start Date:

End Date: you can select an exact date or never end the scheduled job.

(The job stops running after the end date; choose Never if the schedule should run indefinitely.)

This setup triggers the job once a day at 8:00 AM in the selected time zone, beginning on the start date, and keeps running it daily until the end date is reached or you cancel the schedule. When you are finished, select Save.
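For readers who think in code, the trigger settings above can be pictured as a small schedule calculation. This is only a sketch of the idea, not how the Viya scheduler is implemented. The function name `daily_run_times`, its parameters, and the example dates are hypothetical, and time zone handling is left out for brevity.

```python
from datetime import date, datetime, time, timedelta


def daily_run_times(start_date, run_time, interval_days=1, end_date=None, limit=5):
    """List up to `limit` run times for a daily trigger.

    end_date=None stands for a schedule that never ends.
    """
    runs = []
    current = start_date
    while len(runs) < limit and (end_date is None or current <= end_date):
        runs.append(datetime.combine(current, run_time))
        current += timedelta(days=interval_days)
    return runs


# Frequency: Daily, Interval: 1 day, Run Time: 08:00 AM (dates are examples).
for run in daily_run_times(date(2025, 12, 5), time(8, 0), end_date=date(2025, 12, 7)):
    print(run.isoformat())
```

Changing `interval_days` to 2 would model a trigger that fires every other day, which is the role the Interval setting plays in the dialog.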

To verify your job has been scheduled, locate your job in the list of Jobs and Flows and verify that there is a blue clock icon next to the job under the Scheduled column.

Monitoring Your Jobs

Once your jobs are scheduled, you can track their progress right in the interface. The Monitoring view shows when jobs ran and whether they succeeded or failed. You can zoom in by time and filter by job name or date range.

Going Beyond Jobs: What Are Flows?

Sometimes when scheduling a job, a date and time trigger just is not enough. That is where job flows come in. A flow lets you connect multiple jobs together with conditions, so they run in sequence. For example, Job A loads data and Job B runs analytics. Job B will only run after Job A finishes successfully. You can even build logic into your flows using AND/OR gates, which is like chaining tasks together into a smart, automated process.

Job flows are essential for enterprise-level scheduling, enabling reliable, repeatable automation of data processing pipelines, ETL (Extract, Transform, Load) operations, reporting, and more. They go beyond single-job scheduling by managing interdependencies and handling errors.

Components of Job Flow

A flow can include:

Jobs (SAS programs, data plans, imports)

Job Flows Dependencies and connections (nested flows)

Logic Gates (AND, OR conditions)

Time Events (scheduled triggers)

Command Line Actions (bash scripts or external commands)

Below is an example of a job flow created in the Jobs and Flows page of a Viya Environment Manager. This flow includes five jobs — a mix of SAS code, Data Explorer import tasks, and a Data Plan project — connected by two logic gates (one AND and one OR) and multiple connectors. The connectors (arrows) define job dependencies and determine the execution order. In this example, Jobs A, B, and C run at the same time when the flow starts. Job D runs only after Job C completes successfully, and Job E runs when Job D succeeds and either Job A or Job B finishes successfully.
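The gate logic in this example can be written out as a few boolean expressions. The sketch below is only a way to reason about the flow, not something Environment Manager generates. The function name `evaluate_flow` and the dictionary format are assumptions made for the illustration.

```python
def evaluate_flow(outcomes):
    """Decide which downstream jobs run in the example flow.

    `outcomes` maps a job name to True (succeeded) or False (failed).
    Jobs A, B, and C always start; D and E depend on the connectors and gates.
    """
    d_runs = outcomes["C"]                             # connector: C must succeed before D
    d_succeeded = d_runs and outcomes.get("D", False)  # D can only succeed if it ran
    or_gate = outcomes["A"] or outcomes["B"]           # OR gate: A or B succeeded
    e_runs = d_succeeded and or_gate                   # AND gate: D and (A or B)
    return {"D": d_runs, "E": e_runs}


print(evaluate_flow({"A": False, "B": True, "C": True, "D": True}))
print(evaluate_flow({"A": False, "B": False, "C": True, "D": True}))
```

In the first call Job E runs because Job B satisfies the OR gate; in the second call Job D still runs, but Job E is held back because neither Job A nor Job B succeeded.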

You can also include a command-line action as part of a job flow. For example, this allows you to launch a SAS program in batch mode or run a Bash script directly from within the flow. Command-line actions can be saved to the Saved Actions area for reuse in other flows, making it easy to build on previous work. However, saved actions are user-specific and cannot be shared between different users.

This is an example of using the SAS Viya Command line as an Action in a Job Flow.

This is an example of using a Linux Bash command line as an Action in a Job Flow.

Creating a Job Flow

In Environment Manager, navigate to the Jobs and Flows page and switch to the Scheduling view. Just as when creating a new job, select New, and then select Job Flow from the list.

In the New Job Flow window, select objects that you want to add to the flow in the tree view and drag them to the flow editor.

As you add objects to the flow, make connections between the objects to specify the sequence of the flow.

Click on the right side of the first object in the order and drag a line to the right side of the next object in the order.

Each object (other than gates) can have only one input connection and one output connection.

Objects change positions in the window automatically as you make connections.

Adding an AND gate.

Connect them in the order you want each job to run.

In this example, this is the order the jobs need to run; the first two jobs will be executed first and if they both run successfully then the last job will run.

In this example, you can see additional options available in the right-hand menu. The options displayed depend on the section of the flow that you select, allowing you to customize and control various aspects of the flow’s functionality.

Once you have made all your selections, you can save your flow.

Once saved, you can schedule your flow just like you scheduled your job.

For more information on scheduling in Viya, see SAS Help Center: SAS Viya Platform: Jobs and Flows.

Command Line Scheduling (For Power Users)

Prefer the command line? SAS Viya lets you run and schedule jobs using the CLI (Command Line Interface). You can create the same time-based triggers — down to minutes, hours, days, weeks, or months — right from your terminal. This is great for admins who like automation scripts or want to integrate Viya jobs into broader IT workflows.

For more information on scheduling Viya jobs using the command line, see SAS Help Center: Jobs and Flows: How To (CLI).

Getting Started with Job Scheduling in SAS Viya was published on SAS Users.

SAS Viya Workload Management

Kevin Bickford — Wed, 12 Nov 2025 17:13:33 +0000

What is SAS Viya with Workload Management

Managing workloads in modern analytics environments is not just about keeping systems running; it’s about making sure the right jobs get the right resources at the right time. As organizations move analytics to the cloud, powered by Kubernetes, balancing workloads across compute resources becomes a critical challenge.

That’s where SAS Viya Workload Management (WLM) comes in. By extending Kubernetes’ native scheduling and workload orchestration, SAS Workload Management ensures jobs run efficiently, users get predictable performance, and administrators maintain visibility and control.

This post breaks down what SAS Workload Management is, how it works, and why it’s a game-changer for SAS Viya environments.

What is SAS Workload Management?

SAS Workload Management is a framework that distributes and balances SAS computing tasks across Kubernetes clusters. It builds on Kubernetes’ workload orchestration but adds intelligence for SAS use cases.

With Workload Management, administrators can:

Define job priorities by user, group, or workload type.

Optimize compute resource usage.

Prevent system overload.

Ensure that critical work gets done quickly.

The result is a smarter, policy-driven approach to managing complex analytics environments.

Key Features of SAS Workload Management

Centralized Management: Administrators can centrally define policies that align with business goals. For example, critical users or workloads can be prioritized to ensure they always receive the resources they need. Policies also extend monitoring capabilities beyond Kubernetes admins—SAS administrators and even end-users can view and track their own jobs.

Multi-User Workload Balancing: In multi-user environments, job distribution is crucial. SAS Workload Management automatically directs jobs to the best available host, preventing bottlenecks. By avoiding situations where too many jobs run on a single node, the system ensures timely job completion across all users.

Parallelized Workload Execution: Traditional SAS programs are executed line by line. However, many steps in a workflow are independent and can be executed simultaneously. With Workload Management, independent steps can be run as separate jobs in parallel, significantly reducing execution time compared to serial processing.

Enterprise Scheduling: While Kubernetes offers basic scheduling, SAS Workload Management introduces policy-driven scheduling. Jobs are not just queued; they’re executed on the most appropriate host, increasing the likelihood of on-time completion.

High Availability & Scalability: SAS Workload Management leverages Kubernetes to provide resiliency and elasticity. Jobs can automatically restart from their last checkpoint, ensuring business continuity even after interruptions. Additionally, clusters can scale up or down dynamically based on demand.

Monitoring & Administration: Administrators aren’t left in the dark. Through the SAS Environment Manager and dashboards in Grafana, SAS Workload Management offers detailed monitoring of jobs, queues, and hosts. Unlike Kubernetes’ general metrics, these dashboards deliver SAS-specific insights.

Administration Tools

Administrators can interact with SAS Workload Management through two main tools:

SAS Environment Manager: Provides a graphical interface to monitor and manage jobs, queues, and hosts. Both administrators and users can access dashboards tailored to their roles. SAS Workload Management | SAS

Command Line Interface (CLI): Offers flexibility for automation and scripting. Administrators can use CLI plugins to manage workloads remotely, making it easy to integrate into existing DevOps processes. SAS Viya Platform: Using the Command-Line Interface

Components of SAS Workload Management

These are the key components:

Workload Orchestrator: The central brain that manages, monitors, and collects data on jobs and hosts.

Jobs: Units of work submitted by users. Each job has states like RUNNING or COMPLETED and runs in its own Kubernetes pod.

Queues: Containers where jobs wait before being dispatched. Queues are governed by policies and priorities.

Policies: Rules that control how workloads are dispatched, ensuring resources are allocated fairly and strategically.

Hosts: Kubernetes nodes where jobs run. Each host has limits on how many jobs it can process concurrently.

Together, these components create a dynamic, policy-driven environment for managing SAS workloads.

Workload Manager Dashboard page

Workload Manager Jobs page

Workload Manager Queue Page

Workload Manager Host Page

Workload Manager Logs page

Workload Manager Configuration page

Workload Manager Log Levels page

How SAS Workload Management Works

Here is what happens behind the scenes when a SAS job is submitted:

A user submits a job through a service like SAS Studio.

The job is sent to a queue, where it is prioritized relative to other jobs.

The Workload Orchestrator evaluates available Kubernetes nodes and selects the best host based on resources and policies.

The job is dispatched to the chosen host and executed in a pod.

If resources are unavailable, the job waits or, in some cases, a lower-priority job may be preempted.

Once complete, the pod is shut down, freeing resources for future jobs.
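The steps above can be pictured with a toy dispatcher. The sketch below is a deliberately simplified model, not SAS Workload Orchestrator's algorithm. The class names, the numeric priorities, and the "least busy host" rule are all assumptions made for illustration, and preemption is left out.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Host:
    name: str
    max_jobs: int                      # limit on concurrently running jobs
    running: list = field(default_factory=list)

    def has_capacity(self):
        return len(self.running) < self.max_jobs


class ToyOrchestrator:
    def __init__(self, hosts):
        self.hosts = hosts
        self._queue = []               # heap of (-priority, arrival order, job name)
        self._arrival = count()

    def submit(self, job_name, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first;
        # the arrival counter keeps first-in, first-out order among equal priorities.
        heapq.heappush(self._queue, (-priority, next(self._arrival), job_name))

    def dispatch(self):
        """Place queued jobs on hosts until the queue or the capacity runs out."""
        placed = []
        while self._queue:
            candidates = [h for h in self.hosts if h.has_capacity()]
            if not candidates:
                break                  # remaining jobs wait in the queue
            host = min(candidates, key=lambda h: len(h.running))
            _, _, job_name = heapq.heappop(self._queue)
            host.running.append(job_name)
            placed.append((job_name, host.name))
        return placed

    def complete(self, job_name):
        """Free the slot a finished job was using."""
        for host in self.hosts:
            if job_name in host.running:
                host.running.remove(job_name)
                return


orchestrator = ToyOrchestrator([Host("node-1", 1), Host("node-2", 1)])
orchestrator.submit("report", priority=1)
orchestrator.submit("model", priority=10)
orchestrator.submit("etl", priority=5)
print(orchestrator.dispatch())
orchestrator.complete("model")
print(orchestrator.dispatch())
```

With two single-slot hosts, the two highest-priority jobs are placed first and the lowest-priority job waits until a slot is freed, mirroring the wait described in the steps.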

Flow of how Workload Manager works:

Job Lifecycle

This is an example of the life cycle of a Viya job once it is submitted to Workload Manager.

High-level flow

This is an example of a Viya job flow once it is submitted to Workload Manager.

SAS Viya Workload Management vs. SAS 9 Grid Manager

If you have SAS 9.4 Grid Manager, this may all sound familiar. SAS 9.4 Grid Manager and Viya Workload Manager share the same fundamental mission: to orchestrate, balance, and prioritize SAS jobs across SAS computing resources.

The main difference is that SAS 9.4 Grid Manager was built for traditional on-premises clusters, while Viya 4 Workload Manager is Kubernetes-native, making it more flexible for cloud and hybrid deployments.

For organizations migrating from SAS 9 Grid Manager, Workload Management in SAS Viya will feel familiar but more modern.

Both systems balance workloads and manage resources.

SAS Viya WLM leverages Kubernetes, making it more flexible and scalable in cloud-native environments.

The interface and functionality are designed to ease migration, maintaining a sense of continuity for administrators.

In short, SAS Viya Workload Management is the next step in workload orchestration—built for today’s hybrid and cloud-native analytics environments.

Below is a table for comparison between SAS 9.4 Grid Manager and SAS Viya 4 Workload Manager.

| Category | SAS 9.4 Grid Manager | SAS Viya 4 Workload Management | Similarity / Difference |
| --- | --- | --- | --- |
| Core Purpose | Distributes SAS jobs across a cluster of servers | Distributes SAS jobs across Kubernetes nodes | Similar: both provide workload distribution |
| Architecture | Built on traditional cluster/grid infrastructure | Cloud-native, Kubernetes-based orchestration | Difference |
| Job Distribution | Uses LSF (Load Sharing Facility) or the SAS 9.4 Workload Orchestrator (SWO) scheduler to distribute jobs | Uses SAS Workload Orchestrator (SWO) integrated with Kubernetes | Similar in concept, different implementation |
| Queues & Policies | Jobs submitted to queues, policies define priorities | Jobs submitted to queues, policies define priorities | Similar |
| Administration Tools | SAS Management Console (Grid Manager plug-in) | SAS Environment Manager, CLI, Grafana dashboards | Difference |
| Monitoring | Admins monitor jobs, queues, and nodes via Grid tools | Admins and users monitor jobs, queues, and hosts with Environment Manager | Similar, difference in tooling |
| User Access | Centralized control, mostly admin-focused | Broader monitoring: admins, queue admins, and end users can view their own jobs | Similar, but Viya is more user-friendly |
| Parallelization | Supports splitting jobs into multiple sub-tasks for parallel execution | Supports parallelized workloads (independent steps/jobs run in separate pods) | Similar |
| High Availability | Provides job recovery/failover in case of node failure | Kubernetes-native HA: restarts jobs, autoscales the cluster, ensures continuity | Similar, Viya has stronger cloud-native resiliency |
| Scalability | Limited by physical cluster resources | Kubernetes allows elastic scaling (horizontal/vertical) | Difference |
| Deployment Model | Primarily on-premises clusters | Cloud, hybrid, or on-premises with Kubernetes | Difference |
| Target Users | Enterprises running SAS 9 workloads on grid | Enterprises moving to SAS Viya cloud-native analytics | Similar audience, different platform maturity |

CAS vs Workload Manager

SAS Viya CAS (Cloud Analytic Services): think of CAS as a turbo engine built to crunch huge datasets quickly using in-memory computing. When you use Viya visual tools such as Visual Analytics or Model Studio, or when you run CAS actions, the work takes advantage of CAS.

CAS is the Viya engine where heavy analytics work happens.

CAS loads your data into memory so it can process your jobs fast.

It runs calculations, models, and reports.

The more worker nodes you have, the faster it can run your jobs.

Workload Manager is like a traffic controller or project manager for SAS jobs that run on the compute side of Viya, that is, traditional CPU-based execution of SAS code.

It decides which jobs run first, and which ones wait.

It makes sure no single user or job hogs the system.

It manages queues, priorities, and resources so everything runs smoothly.

It is built on Kubernetes, which means it can scale up or down automatically in the cloud.

Real-World Benefits

Implementing SAS Workload Management translates into measurable improvements:

Efficiency: Resources are used optimally, reducing wasted capacity.

Performance: Critical jobs finish faster with prioritization and parallel execution.

Transparency: Administrators and users alike can monitor workloads in real time.

Resiliency: Built-in high availability keeps business processes running smoothly.

Conclusion

As analytics environments grow in complexity, managing workloads effectively is no longer optional—it’s essential. SAS Viya Workload Management provides the tools and intelligence needed to ensure that jobs are prioritized, resources are optimized, and business-critical tasks are completed without interruption.

Whether you’re migrating from SAS 9 Grid Manager or starting fresh with SAS Viya, Workload Management offers a future-ready solution to workload orchestration.

Since SAS Viya Workload Manager is now standard with Viya 4, now is the time to explore how Viya 4 Workload Management can help you get more value out of your analytics environment.

SAS Viya Workload Management was published on SAS Users.

Quickly and easily register Python models in SAS Viya Workbench

Stu Sztukowski — Tue, 30 Sep 2025 13:53:14 +0000

It’s done. It’s finally done. After many caffeine-fueled late nights and countless hours of feature engineering, you’ve built the ultimate model. But the journey isn’t over. Not quite yet. There’s one thing left for you to do: get that model into production and let your ModelOps team take the reins. That’s where SAS Model Manager shines. While there are many ways to deploy a model built in SAS Viya Workbench (even if you have no other SAS products), using SAS Model Manager to deploy a model within a SAS Viya environment is an excellent option and has many benefits. In this article, we’ll do a deep dive into how to do that.

When you’re in SAS Viya Workbench, you have access to powerful compute environments for complex experimentation that your thin client laptop just can’t handle. But that doesn’t mean it lives on its own isolated island. Once your model is ready to go, you need a bridge to SAS Viya to get it into SAS Model Manager. For SAS models, that’s PROC REGISTERMODEL. For open-source models, that’s sasctl, SAS’s open-source package with flavors in both Python and R designed to handle everything you need to register your model with SAS Viya.

With all the tools we have in our arsenal, let’s walk through a quick example of building, registering, and testing an XGBoost model that predicts loan defaults in the classic HMEQ dataset.

Hello, I’m SAS Viya Workbench. Can you hear me?

If you’ve ever registered a Python model from your own machine into SAS Model Manager, you’ll be happy to know that there’s no difference with SAS Viya Workbench. The same code that you would use on your local machine works the same way in SAS Viya Workbench. All you need to do is make sure that SAS Viya Workbench can talk with your SAS Viya Server.

The first thing you need to do is run a quick test. Run the code snippet below:

import requests

host = 'https://my-viya-server.com'
resp = requests.get(f'{host}/SASLogon', verify=False)

if resp.status_code == 200:
    print('Status: 200. You can successfully communicate with the server.')
else:
    print("WARNING: Received a non-200 status code:", resp.status_code)

If you receive no connection errors and a 200 status code, you’re all set: SAS Viya Workbench can talk with your SAS Viya server. If not, ask your administrator to allow a connection from SAS Viya Workbench to your SAS Viya server. And if all else fails, you still have the option to download your model as a zip file and upload it to SAS Model Manager – we’ll talk about that later.

If you’re already up to speed on exactly how to register a Python model to SAS Model Manager, then you can stop here. Seriously: that’s how seamless SAS Viya Workbench is. But if you’d like a refresher, a bit more explanation, and a notebook you can run yourself, then keep on reading.

Let’s register

We are going to use the pzmm (Python Zip Model Management) package from sasctl to do the heavy lifting. SAS Model Manager expects certain files and score code. You can technically write all of them yourself, but it’s much easier to let pzmm do it for you. Let’s set the stage with the model we’ll be registering.

The model

I’m going to intentionally skip past the model building details and get straight to what you’re here for: what is it that I need to do to register my Python model to SAS Viya? Just so you have a bit more context, here’s everything you need to know:

Our data is HMEQ

We’re predicting bad loans using the column "BAD"

We’ve built an XGBoost model with early stopping based on validation data

The model is:

xgb_model = xgb.XGBClassifier(
    objective="binary:logistic",
    random_state=42,
    n_estimators=xgb_eval.get_booster().best_iteration + 1
)

Now that you’ve got the background, let’s talk turkey.

It’s all about the prep work

SAS Model Manager requires specific information about your model that pzmm formats and organizes for you. You’ll need:

The name of the model

A description of the model

The project name for SAS Model Manager

The modeler’s name or user ID

The model instance to be pickled

The data used to train the model

The expected input column names

The target variable name

The expected target output values

The name of the target output variables

Where the model files should be saved

To keep it tidy and manageable, I like to put all of this into a single place:

prefix = 'XGBoost'                                              # Model name
model_desc = "XGBoost model for hmeq"                           # Model description
project = "HMEQ Models"                                         # Name of project
modeler = input('Enter modeler username')
model = xgb_model                                               # Model instance
data = df_hmeq                                                  # Data for model
inputs = X.columns                                              # Input columns
target = 'BAD'                                                  # Target variable
target_values = ["0", "1"]                                      # Target values: 0/1 for HMEQ
target_cols = ["EM_CLASSIFICATION", "EM_EVENTPROBABILITY"]      # Model output variables
model_path = '/workspaces/myfolder/models'                      # Path to model files

Once you have all of this, you can invoke the features of pzmm to automatically convert it into files that SAS Model Manager can use. Let’s take a look at each step.

Step 1: Pickle it

pzmm has a built-in method, pickle_trained_model, which will pickle your model and put it in the same folder you’ll be writing all the other files to.

pzmm.PickleModel.pickle_trained_model(
    model_prefix=prefix,
    trained_model=model,
    pickle_path=model_path
)

That’s all there is to it!

Step 2: Define inputs and outputs

SAS Model Manager needs to know what the model inputs and outputs are to correctly score data. A single method does this: write_var_json. As you might expect from the name, it saves all this information to JSON files.

Model Inputs

pzmm.JSONFiles.write_var_json(
    input_data=data[inputs],
    is_input=True,
    json_path=model_path
)

Model Outputs

output_var = pd.DataFrame(columns=target_cols, data=[["A", 0.5]])

pzmm.JSONFiles.write_var_json(
    output_var,
    is_input=False,
    json_path=model_path
)

Note that the output data might seem a little strange at first. What we’re doing is creating a DataFrame with (1) the expected output variable names and (2) examples of the expected output types. We’re using “A” and “0.5” as examples to indicate that the first column (EM_CLASSIFICATION) will be labels while the second column (EM_EVENTPROBABILITY) will be probabilities. If you open up outputVar.json, you’ll see it automatically interpreted the example output values of “A” and “0.5” to be strings and intervals:

[
    {
        "name": "EM_CLASSIFICATION",
        "level": "nominal",
        "type": "string",
        "length": 1
    },
    {
        "name": "EM_EVENTPROBABILITY",
        "level": "interval",
        "type": "decimal",
        "length": 8
    }
]
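To see why a single row of example values is enough, here is a simplified sketch of the kind of inference involved. It is not pzmm's actual implementation. `describe_columns` is a hypothetical helper, and the rules it applies (strings become nominal with their character length, numbers become interval decimals of length 8) are assumptions chosen to reproduce the output shown above.

```python
def describe_columns(names, example_row):
    """Infer variable metadata from one example value per column."""
    described = []
    for name, value in zip(names, example_row):
        if isinstance(value, str):
            # Text values are treated as labels (nominal strings).
            entry = {"name": name, "level": "nominal", "type": "string", "length": len(value)}
        else:
            # Numeric values are treated as continuous (interval decimals).
            entry = {"name": name, "level": "interval", "type": "decimal", "length": 8}
        described.append(entry)
    return described


target_cols = ["EM_CLASSIFICATION", "EM_EVENTPROBABILITY"]
for entry in describe_columns(target_cols, ["A", 0.5]):
    print(entry)
```

The practical takeaway is that the example values only need the right types; their actual contents do not matter.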

Are you wondering why we’re using EM_CLASSIFICATION and EM_EVENTPROBABILITY as the variable names? Seasoned SAS veterans already know, but I’ll leave that as a history lesson for you. Leave a comment if you know!

Step 3: What file does what?

You can upload a lot of different types of files into SAS Model Manager, and of the things in there, it needs to know exactly which ones will allow it to score your data. That’s where write_file_metadata_json comes in:

pzmm.JSONFiles.write_file_metadata_json(model_prefix=prefix, json_path=model_path)

If you look at the output file, file_metadata.json, it gives a role to each file:

[
    {
        "role": "inputVariables",
        "name": "inputVar.json"
    },
    {
        "role": "outputVariables",
        "name": "outputVar.json"
    },
    {
        "role": "score",
        "name": "score_XGBoost.py"
    },
    {
        "role": "scoreResource",
        "name": "XGBoost.pickle"
    }
]

This step is critical: skipping it means that SAS Model Manager won’t know what each file is supposed to be used for, and it will be unable to score your data.
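Because the metadata refers to files by name, a quick consistency check can catch a missing or misnamed file. The helper below is a hypothetical convenience, not part of pzmm or sasctl; it assumes only the file_metadata.json layout shown above.

```python
import json
from pathlib import Path


def missing_model_files(model_path):
    """Return (role, name) pairs listed in file_metadata.json but absent on disk."""
    folder = Path(model_path)
    # The file name matches the output of write_file_metadata_json described above.
    entries = json.loads((folder / "file_metadata.json").read_text())
    return [
        (entry["role"], entry["name"])
        for entry in entries
        if not (folder / entry["name"]).exists()
    ]
```

An empty list means every file listed in the metadata is present. Files that a later step generates, such as the score code, show up in the list until that step has run.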

Step 4: Build model properties

Finally, as the last step before registering, you need to write out the model properties. This gives SAS Model Manager information about the model’s name, the Python version, algorithm, and more. write_model_properties_json does this for you.

pzmm.JSONFiles.write_model_properties_json(
    model_name=prefix,
    target_variable=target,          # Target variable to make predictions about (BAD in this case)
    target_values=target_values,     # Possible values for the target variable (0 or 1 for binary classification of BAD)
    json_path=model_path,            # Where are all the JSON files?
    model_desc=model_desc,           # Describe the model
    model_algorithm="Ensemble",      # What kind of algorithm is it?
    modeler=modeler                  # Who made the model?
)

If you take a look at ModelProperties.json, you can view what SAS Model Manager will add to the model properties once you register it.

{
    "name": "XGBoost",
    "description": "XGBoost model for hmeq",
    "scoreCodeType": "python",
    "trainTable": "",
    "trainCodeType": "Python",
    "algorithm": "Ensemble",
    "function": "classification",
    "targetVariable": "BAD",
    "targetEvent": "1",
    "targetLevel": "Binary",
    "eventProbVar": "P_1",
    "modeler": "Stu Sztukowski ",
    "tool": "Python 3",
    "toolVersion": "3.11.13",
    "properties": []
}

Step 5: Connect to SAS Viya

Use the Session function of sasctl to start a session on your SAS Viya server. You have a few different authentication methods available:

Username/password

Client ID and secret

OAuth token

Your administrator can help you figure out the appropriate authentication method. Here is one example of how you might connect using a username and password:

import getpass
from sasctl import Session

sess = Session(
    'https://my-viya-server.com',
    username=input('Enter username'),
    password=getpass.getpass('Enter password'),
    protocol='https',
    verify_ssl=False
)

Step 6: Register your model

You’re now ready to register your model to SAS Viya. Use import_model from pzmm to zip up all the files and register your model automatically. When you do this, lots is happening behind the scenes: a project is built, model files are loaded, and a plethora of checks are done to ensure that it all is going as planned.

pzmm.ImportModel.import_model(
    model_files     = model_path,                               # Where are the model files?
    model_prefix    = prefix,                                   # What is the model name?
    project         = project,                                  # What is the project name?
    input_data      = X,                                        # What does example input data look like?
    predict_method  = [xgb_model.predict_proba, [int, int]],    # What is the predict method and what does it return?
    overwrite_model = True,                                     # Overwrite the model if it already exists?
    score_metrics   = target_cols,                              # What are the output variables?
    target_values   = target_values,                            # What are the expected values of the target variable?
    target_index    = 1,                                        # What is the index of the target value in target_values?
    model_file_name = prefix + ".pickle",                       # How was the model file serialized?
    missing_values  = True                                      # Does the data include missing values?
)

If successful, you will see the following message:

The model was successfully imported into SAS Model Manager as XGBoost with the following UUID: {UUID}.

Log in to SAS Model Manager and open the Projects page. Sort it in descending order by modified time and you’ll see your project at the top:

Open the project and you’ll see your model:

But what if I can’t connect to SAS Viya?

You still have an option available. import_model is also a wrapper for two methods that you can run yourself:

write_score_code

zip_files

Running both will generate your score code and zip up everything into a nice package that you can then download to your machine and manually import into Model Manager through the UI.

pzmm.ScoreCode.write_score_code(
    model_files     = model_path,                               # Where are the model files?
    score_code_path = model_path,
    model_prefix    = prefix,                                   # What is the model name?
    project         = project,                                  # What is the project name?
    input_data      = X,                                        # What does example input data look like?
    predict_method  = [xgb_model.predict_proba, [int, int]],    # What is the predict method and what does it return?
    overwrite_model = True,                                     # Overwrite the model if it already exists?
    score_metrics   = target_cols,                              # What are the output variables?
    target_values   = target_values,                            # What are the expected values of the target variable?
    target_index    = 1,                                        # What is the index of the target value in target_values?
    model_file_name = prefix + ".pickle",                       # How was the model file serialized?
    missing_values  = True,                                     # Does the data include missing values?
    is_viya4        = True
)

pzmm.ZipModel.zip_files(
    model_files  = model_path,   # Where are the model files?
    model_prefix = prefix,       # What is the model name?
    is_viya4     = True          # Set to False if Viya 3.5 or earlier
)

Simply download the zip file to your machine, then log into SAS Model Manager and create a new modeling project. Open it up and select Add Models. From there, you can import your zip file.
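Before uploading, it can be worth a quick look inside the archive to confirm nothing is missing. Here is a minimal sketch using only Python's standard zipfile module. The helper names and the zip file name in the usage comment are assumptions for illustration, and the list of expected files is only an example.

```python
import zipfile


def list_zip_contents(zip_path):
    """Return the sorted file names inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def missing_files(zip_path, expected):
    """Return the expected file names that are not in the archive."""
    present = set(list_zip_contents(zip_path))
    return [name for name in expected if name not in present]


# Usage (the archive name is hypothetical; adjust to your own prefix):
# print(missing_files("XGBoost.zip", ["ModelProperties.json", "XGBoost.pickle"]))
```

An empty list back from missing_files means every file you asked about is in the package.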

Now you know where the zip part comes from in Python Zip Model Management!

Run a scoring test

It’s a good idea to confirm that your model will run successfully and that SAS Viya isn’t missing any packages that the model needs. Running a scoring test on the model is a quick way to find out. You can run a new test from your project’s Scoring tab. Click the New Test button and fill out the form. Make sure that the data you upload is in the exact format that your model expects.

The test page will show you if it was successful. You can dive further into the results to confirm that it is working by looking at the log and output.

If it all looks good, then you’re done! Your model is now ready to be published to a multitude of destinations and put to work.

There’s no place like home

One of the most amazing things about SAS Viya Workbench is how you can spin up powerful environments in seconds yet still have that feeling of running code directly from your laptop. If you’ve been building Python models and registering them into SAS Model Manager already, then you already know how to do it on SAS Viya Workbench. It really is that easy. But we’ve just covered the basics. There’s so much more you can do, like build Model Cards, enable model performance monitoring, and loads of other features you need for the last mile of AI.

What models do you find yourself working with most? How do you use Model Manager with your Python models? Let me know in the comments below!

Links

Example Notebook: Registering Open Source Models to SAS Viya from SAS Viya Workbench

Model Management Resources for Open-Source Models

Open Source Models in the SAS Viya Platform

Registering MLFlow Models to SAS Model Manager using sasctl: A Comprehensive Guide

sasctl and pzmm Examples

sasctl and pzmm API Reference

Quickly and easily register Python models in SAS Viya Workbench was published on SAS Users.

Building seamless data pipelines across multiple languages: A life sciences and healthcare use case leveraging R and SAS

Emmett Smith — Thu, 07 Aug 2025 16:45:36 +0000

Managing multiple programming languages in a data science workflow often means jumping from one environment to another—adding friction to already complex processes. This slows down collaboration and innovation among teams. But what if there were a way to remove this friction between environments? Working in a single environment that supports multiple coding languages helps give teams time back for development, rather than managing tools. For example, being able to run Python, R, and SAS together in one environment eliminates the need to switch between two or three different platforms and allows experts to work in the language they’re most comfortable with.

In Life Sciences and Healthcare, working across different tools often means switching platforms or duplicating work. In this article, we explore how combining multiple programming languages within the same workflow can help reduce that friction—using SAS for data cleaning and exploration, and R for modeling and app development.

Step 1: Clean and explore data

The first thing you want to do is load your data. For this case, I’m using a dataset about classifying heart disease. This dataset includes 14 variables selected by researchers to describe each patient and their current health. The original researchers had 76 features and chose 13 to build linear regressions and feed into an algorithm based on Bayes’ theorem. The study is available online if you're interested in reading more about it.

In a SAS Notebook, I first import the data before jumping into data cleaning by checking for missing values and replacing them with a relevant value. In this case, we see that only ca has missing values—this variable represents the number of major vessels visible during fluoroscopy. It ranges from 0 to 3, and if values are missing, we assume the test wasn’t performed on that patient, so we assign it a value of 0.
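The cleaning itself was done in a SAS Notebook, but the imputation rule is simple enough to sketch in a few lines. The Python below is only an illustration of the same logic, not the code used in the article, and the sample rows are made up.

```python
def fill_missing_ca(rows, fill_value=0):
    """Return copies of the rows with a missing 'ca' replaced by fill_value."""
    cleaned = []
    for row in rows:
        fixed = dict(row)  # copy so the original rows are left untouched
        if fixed.get("ca") in (None, ""):
            fixed["ca"] = fill_value
        cleaned.append(fixed)
    return cleaned


# Made-up rows, purely for illustration:
patients = [{"age": 63, "ca": 0}, {"age": 67, "ca": None}, {"age": 41, "ca": 3}]
print(fill_missing_ca(patients))
```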

Next, I checked to make sure that there is an even distribution of data between our two classes by plotting our target variable with a pie chart. Neither class has an overwhelming presence in the dataset, which means there is no need to balance the dataset. This means that our quick data cleaning and exploration is done and that we can move over into our modeling in R.
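The pie chart answers one question: what share of the rows falls into each class? As a rough sketch of the same check in Python (the function name is an assumption and the labels are placeholders, not the real data):

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of rows belonging to each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Placeholder labels, not the actual heart disease data:
print(class_proportions([0, 1, 1, 0, 1, 0]))
```

If the shares are reasonably close, as they were here, rebalancing is not needed.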

Step 2: Transfer data into R and train gradient boosting model

This step can be done in a couple of different ways. The first is a bit more roundabout: exporting the .sas7bdat files to a CSV and then reading that CSV into an R notebook. The second option is to install the haven package in R and use the read_sas() function to bring the data directly into R. I went with the second option before creating training and test datasets to train a gradient boosting model using the xgboost library.

Step 3: Display model performance with Shiny app

We can now evaluate how well our model performed by gathering metrics like Accuracy, Recall, Precision, and F1 Score. Once these are calculated, we can create a simple Shiny app that allows users to choose which metric they want to view. This app can be as complex as you'd like—with interactive charts—or as simple as a drop-down menu.
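The app itself was built in R with Shiny, but the four metrics all come from the same confusion-matrix counts. The Python sketch below shows only the formulas. The function name and the counts in the usage line are assumptions for illustration, not results from the article.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, purely to show the call:
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

A drop-down in the app then only has to pick one key from the returned dictionary.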

Conclusion

Bringing together the strengths of SAS and R creates a powerful, end-to-end data science workflow. SAS excels at data cleaning, transformation, and preparation—especially when working with large, structured datasets in enterprise environments. Once the data is clean and ready, R adds value as a powerful tool for building machine learning models using packages like xgboost. By integrating a Shiny app into the workflow, results become easily accessible and interactive. This seamless pipeline empowers teams to make data-driven decisions with clarity and confidence.

All this to say, there are three main takeaways from using a multi-language environment:

Choose the best tool for the job: Each coding language can be used to excel in different areas of the pipeline.

Collaborate across unique skill sets: Developers from different language backgrounds can bring their expertise to build dynamic teams. Each team can communicate seamlessly between components.

Adapt to evolving tech trends: Each language brings new innovations, and by including multiple languages together, we can stay competitive and future-ready.

Ultimately, a multi-language environment empowers developers to collaborate on scalable solutions faster and smarter.

Learn more

Python ML pipelines with Scikit-learn: A beginner’s guide

CI/CD for Python and SAS: Build modern workflows with GitHub Actions

Unlocking the power of SAS Procedures

Building seamless data pipelines across multiple languages: A life sciences and healthcare use case leveraging R and SAS was published on SAS Users.

CI/CD for Python and SAS: Build modern workflows with GitHub Actions

Sean Ford — Fri, 01 Aug 2025 17:05:07 +0000

You have years of legacy SAS code dating back to the time of your great-great-great-great grandparents (okay, SAS hasn’t been around quite that long). But, you want to automate your production jobs on your server and get your company into the modern era with version control and things like CI/CD. Oh, almost forgot, half your developers now want to include Python in your processes, right? So, you’ll need to be able to handle SAS, Python, and a mechanism for CI/CD and automation.

Your SAS developers are most likely scattered around the company. They like to use desktop applications like SAS Enterprise Guide, with a handful in the cloud on SAS Viya Workbench or SAS Analytics Pro. They might even say SAS can’t be modernized – but it most definitely can! And it will work well with your old SAS server just as easily as with the modern SAS analytics platforms like SAS Viya while integrating smoothly with Python.

The Python devs, well… Python can be the Wild West – VS Code, PyCharm, Spyder, Jupyter… the list goes on and on. When developing across languages, I personally like to use Viya Workbench because I have instant access to on-demand compute and can run both SAS and Python in the same dev environment, plus handle a lot of other things like shell scripts and yml.

Side note – yml or yaml, what gives? Are they the same? Do I say “yammel” or “why em el” when speaking to people so I don’t sound like a noob? Since we’re going to be working with GitHub (i.e., Microsoft), you’re going to see .yml extensions here. For all intents and purposes, yaml and yml are the same. It’s like the tabs vs. spaces debate in the show Silicon Valley – hilarious, yet entirely pointless. If you want to dive down the rabbit hole, see this fun Stack Overflow conversation: Is it .yaml or .yml?

The example I’m laying out is not idle curiosity on my part, but one I’ve recently encountered with a company. I want to walk you through the steps we took to meet their needs, with a few additions and changes to help clarify how to modernize a legacy SAS workflow integrated with Python. The complete example code can be found at this GitHub repo.

The basic legacy process is shown above. The goal is to simplify this process by removing PuTTY and automating the script generation, while ensuring the production server automatically gets the most recent code. Because this is a legacy process using Enterprise Guide (EG) projects, we necessarily start there. However, the process outlined below will work just as well with code.

I’ll be using a slew of tools including EG, Viya Workbench, and GitHub to show you how you can collaborate across multiple SAS and Python development environments while taking advantage of automation features in GitHub via Actions and Workflows.

Export the EG project to SAS code

My example company here has been using projects within EG for many years and doesn’t want to lose their legacy work within those projects while moving forward using Python and SAS for advanced analytics; so we’re going to start our journey in EG. By the way, EG now integrates with both Viya and Viya Workbench. If you’re unfamiliar with EG, you can skip ahead since I’ll simply be exporting the project as raw SAS code and working from there.

As you can see, I have a relatively simple project with a few process flows using both code and tasks. I’m using this project out of a GitHub repository that has a host of other things, including my automation scripts and code. To ensure my project plays nice with a Git repository, I have made sure to enable the relative paths for the project: Properties → File References → Use paths relative…

I’m also using the integrated Git features of EG to manage my project in GitHub from within the tool when working on my project.

Since I am using legacy EG projects and don’t feel like pulling each piece of code out into its own file, I use the project level Share capability to Export All Code in Project.

Now I can continue to work with my EG Project, but collaborate with others using Viya Workbench or Viya through GitHub, and keep most of my legacy production process in place.

Let’s move into Viya Workbench now, which has both SAS and Python runtime environments and an IDE that can handle any programming language, which will come in handy here where we’re dealing with SAS, Python, shell (bash), and yml.

Integrate Python with the workflow

The team wants to begin leveraging Python for this workflow. SAS code can easily integrate with Python through system commands and PROC PYTHON, but in this case we’re going to take advantage of our GitHub action to run the necessary Python code and drop off data that our SAS code can pick up later. This approach has the advantage of letting the container running our action manage the Python package dependencies rather than having to manage those dependencies manually on our production server. Your IT team may either love you for this approach that doesn’t require them to maintain Python on the company server, or hate you because they are losing control.

Let’s add some Python code and test it out so we can see how our Python devs can join in the fun. To keep things simple, I’m going to create two files: one is a requirements file that will later inform our GitHub Action about the necessary Python dependencies, and the other contains the actual code.

And here are the contents of the files side-by-side. On the left is the requirements text file and the right is the Python script.

Your requirements file should contain all package dependencies needed by your Python code (in this case, just pandas and numpy). The code in our example here is overly simple, while in reality you could have anything from complex ETL code to machine learning models. You could even have serialized models here (i.e., pickled models, ONNX, etc.) that run before handing information back over to the workflow in the form of a file or database update.

Automate SAS scripts

Remember, my goal here is to allow my SAS developers to work in any environment they are comfortable in, collaborate with Python users, and automate at least part of my clunky old legacy process. So, instead of manually creating new shell scripts to run on my production Linux server every time someone updates the main branch, I want to automate that through GitHub Actions so neither my SAS nor Python developers have to ever worry about creating shell scripts.

I want to create a script that will look in my code folder and generate individual scripts for each SAS code file. Each script will run the SAS program in batch and write the log to a file that matches the program name plus the date and time. I want the date and time to match when the code is run (not when the script was created). To help me test this in my EG installation on my Windows desktop, I also create a batch file that does the same thing.

Most of the first 20 or so lines of the generate script really belong in a config file to improve portability across systems. But for simplicity, everything is currently hard-coded into the script.

The main loop looks through the target directory and builds a script and batch file for each SAS code file found in the target directory. These get dropped off in the /scripts directory. To summarize:

$TARGET_DIR – user defined folder to look for SAS code.

-maxdepth 1 – only look in the current folder. Don’t recurse.

-type f – look for files (as opposed to directories).

-name "*.sas" – only include files that end with the 4 characters ".sas".
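The generate script in this walkthrough is a bash script, but the idea carries over to other languages. Here is a rough Python sketch of the same loop, written under a few assumptions: the function name, the folder arguments and the exact SAS command line (sas with -sysin and -log) are illustrative and may differ from the real script and from how SAS is invoked on your server.

```python
from pathlib import Path

# Template for each generated script. The sas command line is an assumption;
# adjust it to match how SAS is launched in batch on your server.
SCRIPT_TEMPLATE = """#!/bin/bash
# The timestamp is evaluated when this script runs, not when it was generated.
STAMP=$(date +%Y%m%d_%H%M%S)
sas -sysin "{program}" -log "{log_dir}/{stem}_${{STAMP}}.log"
"""


def generate_scripts(target_dir, scripts_dir, log_dir):
    """Write one shell script per .sas file found directly in target_dir."""
    out = Path(scripts_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    # glob("*.sas") does not recurse (like -maxdepth 1 with -name "*.sas");
    # is_file() plays the role of -type f.
    for program in sorted(Path(target_dir).glob("*.sas")):
        if not program.is_file():
            continue
        script = out / f"{program.stem}.sh"
        script.write_text(
            SCRIPT_TEMPLATE.format(program=program, log_dir=log_dir, stem=program.stem)
        )
        script.chmod(0o755)  # make the generated script executable
        created.append(script)
    return created
```

Because the date command sits inside the generated script, the log name reflects when the SAS program runs rather than when the script was built.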

Once the script is finished, I can run it within Viya Workbench to test that it does what I want. One thing to confirm is that the script is executable. Line 34 does this for us on each script it creates so we don’t have to worry about it later, but if you have security concerns, you could omit this line and run the command prior to runtime on your production server. You’ll need to make sure to run the same command on the generate_sh_scripts.sh file before running.

From the Terminal (just click on the Terminal tab at the bottom of Viya Workbench) navigate to the scripts folder in your repo and run the script:

>cd sas-workflow/scripts
>chmod +x generate_sh_scripts.sh
>./generate_sh_scripts.sh

Note that you need to run this command from within the /scripts folder. We’ll make sure to take care of that with our GitHub action later on.

Voila, I now have my scripts. The single script I created will automatically generate all the necessary scripts needed for automation. That’s great, but I want to be able to run this script automatically, regardless of who adds code to the repository, so that my production process always has the most up-to-date code.

Set up the GitHub Actions

GitHub Actions is a powerful tool that lets you automate nearly anything you need. In this case, I’m going to use an action to build my shell scripts that will be picked up by my production server. It’s also going to run that Python code to create data that may be needed by the rest of my process.

When I initially got going on this project, I started with a preconfigured simple workflow and then added in the custom components I needed. This is typically the easiest way to get started with a new workflow. Try it out by going to your repository and click the Actions tab. From here, you can create simple workflows using preconfigured templates, custom workflows, or a wide array of other actions already created by other developers within the GitHub marketplace. If you’re feeling particularly creative, you can even add your own to the marketplace.

Type “simple workflow” into the search and hit enter to find this easy initial workflow. Then commit it to your repo and GitHub will do the rest. You’ll notice it puts a .yml in a new folder called “.github/workflows”. This is the default path in which GitHub looks for actions to perform. You don’t need to do anything else.

As I joked about earlier, the actual action is in a .yml (not .yaml) file. Once I create the basic template in GitHub for my workflow and push it to the repo, I can pull the repo into Viya Workbench and edit the file there. You could just as easily edit the yml file in GitHub, but I like the ease of Viya Workbench with its full Visual Studio Code IDE and the ability to use other extensions I installed on VS Code to do things like automatic linting so I don’t mess up the syntax.

Let’s peruse the action.yml file and note the things I added/modified from the simple workflow GitHub provides in the marketplace.

One of the most important components is the trigger, which tells GitHub under what conditions to actually run this action. The trigger I selected is any push to the main branch. There are a lot of options that let you get really specific with when you want to run the action. In our case, it might have been a good idea to use the paths option since we only want to recreate our scripts every time our SAS code is updated.

on:
  push:
    paths:
      - 'sub-project/**'
      - '!sub-project/docs/**'
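To get a feel for how a paths filter like this behaves, here is a rough Python approximation. It leans on the standard fnmatch module, which is not GitHub's matcher (for example, a single * in fnmatch also crosses folder separators), so treat it as a thinking aid only. The function names are made up for this sketch. The two rules it mimics are that patterns are checked in order, so a later ! pattern can exclude a path matched earlier, and that the workflow runs if at least one changed file passes the filter.

```python
from fnmatch import fnmatchcase

PATTERNS = ["sub-project/**", "!sub-project/docs/**"]


def path_passes_filters(path, patterns):
    """Approximate whether one changed path passes an ordered list of filters."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(path, glob):
            # The last matching pattern decides: a "!" match excludes the path.
            included = not negated
    return included


def workflow_would_run(changed_paths, patterns):
    """The workflow runs if at least one changed path passes the filters."""
    return any(path_passes_filters(path, patterns) for path in changed_paths)


print(workflow_would_run(["sub-project/docs/readme.md"], PATTERNS))
print(workflow_would_run(["sub-project/code/job.sas"], PATTERNS))
```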

In the jobs section, note the runs-on section which specifies the OS to use. It also specifies the working directory. Remember above where we had to navigate to the scripts directory to run the generate script? This tells the container to navigate to that same directory and work from there.

And now for the fun stuff. Within the steps section, you need to use name/uses or name/run pairs. Here, you can write custom commands similar to those you'd use in the command line, or leverage actions available in the ever-growing GitHub Actions marketplace. I use both approaches here so you can see how they work.

To summarize the first handful of steps which set up our container:

Line 26 – checkout the repo, using a predefined action from the marketplace.

Line 30 – set up the Python environment. Specify Python version 3.11.11.

Line 34 – install required Python dependencies listed in the requirements file within the repo.

The next section of steps runs the desired commands and updates the repo:

Line 38 – runs the generate script.

Line 42 – runs the Python data generation script.

Line 46/47 – this is important: tell the environment to use the default GITHUB_TOKEN. Without this, you won’t be able to push back to your repo.

Line 48 – Notice the pipe “|” which lets you run multiple lines.

Line 49/50 – this is required. For an auto action like this, I like to call the user some sort of bot and leave the email blank so it’s clear this was an automated action.

Line 51-53 – Git commands to add/commit/push.

And that’s it! This is a relatively straightforward action, but feel free to explore the documentation for additional options.

One final gotcha: if you push this updated file as-is, the action will fail. That’s because it tries to push the new files back to the repository but doesn’t have the necessary permissions.

In a Viya Workbench terminal, type:

>git add --all
>git commit -m "added Github workflow"
>git push

Clicking on the attempt and looking at the failed action, we can see there’s a Permission Denied error on the add/commit/push step (line 45 in the actions.yml file).

To allow GitHub to push the changes made by the scripts back to the repo, you’ll need to make a change in your GitHub settings for your repo. Under the repo’s Settings tab, click on Actions → General and then scroll down to the Workflow Permissions section. Make sure you select the “Read and write permissions” radio button and save the changes.

There’s a note that this uses the default GITHUB_TOKEN. Remember, we specified in the actions.yml file (line 47) that the environment should use this token. You could set up your own more granular permissions and/or use something like a personal access token if you wish, but I’ll stick to the default GitHub token here for simplicity’s sake.

Now, on my production server, all I need is one script that is scheduled using cron. I don’t cover that here, but it controls the entire process instead of having to manually create scheduled actions for each script. This one script pulls my repository from GitHub, which contains all my code and scripts, and then runs each script in the desired order. I never have to remember to check for new code or rebuild scripts.
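The production script isn't shown here, but its shape is easy to sketch: pull the repository, then run each generated script in a fixed order. The Python below is one hypothetical way to do that. The repository path, the script names and both function names are assumptions, and the real server-side script may look quite different.

```python
import subprocess
from pathlib import Path


def build_plan(repo_dir, script_names):
    """Return the commands to run: a git pull, then each script in order."""
    repo = Path(repo_dir)
    plan = [["git", "-C", str(repo), "pull"]]
    for name in script_names:
        # Generated scripts are assumed to live in the repo's scripts folder.
        plan.append([str(repo / "scripts" / name)])
    return plan


def run_plan(plan):
    """Run each command in order, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)


# Usage (paths and names are hypothetical):
# run_plan(build_plan("/opt/sas-workflow", ["01_prepare.sh", "02_report.sh"]))
```

Keeping the plan separate from its execution makes it easy to print or log what the scheduled job is about to do.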

Summary

We started with a legacy SAS EG project where the process to get the code to run on the production server was completely manual. In this walkthrough, we automated much of the process using Viya Workbench and GitHub Actions. We also integrated Python into the process, enabling SAS and Python developers to collaborate through a shared GitHub repository. This setup combines elements from both environments and ensures the production process stays up to date automatically. This highlights how a legacy process can be quickly and easily modernized with no change to the original code in a way that will still work with open-source and more modern SAS programming and AI/ML techniques in SAS Viya Workbench and SAS Viya.

Learn more

Integrate Git repositories via SSH

Setting Up SAS Viya with GitLab CI/CD Pipelines: Step One

Install VS Code extensions offline: step-by-step guide

SAS Notebooks in VS Code: A developer’s guide

CI/CD for Python and SAS: Build modern workflows with GitHub Actions was published on SAS Users.

Boost ML accuracy with hyperparameter tuning (with a fun twist)

Stu Sztukowski — Fri, 13 Jun 2025 16:19:14 +0000

Building a machine learning model isn’t always as easy as running .fit() and calling it a day. Sometimes, you need to eke out a little more accuracy, because even a 1% improvement can mean a lot to the bottom line. Many machine learning models have a lot of buttons and knobs you can adjust. Changing one value here, tweaking another value there, checking the accuracy one at a time, making sure it’s generalizable and not overfitting… it’s a lot of work to find the right model. Needless to say, trying all of these different combinations by hand can be a tedious task. But it doesn’t have to be. We can have the computer do it for us, and more importantly, intelligently. Enter: hyperparameter autotuning. Let’s talk about what it is first.

Like baking a cake: a gentle introduction to hyperparameter autotuning

A hyperparameter is a configuration value that you set before training a machine learning model. It controls how the learning happens. Think of it like baking a cake:

The recipe is your model

The cake batter is your training data

The oven temperature, baking time, and pan size are your hyperparameters

Let’s say your standard recipe says to use a 12-inch diameter cake pan and bake it at 350 degrees Fahrenheit for 30 minutes. These are the suggested values that make a good cake in most situations. But what if you wanted a smaller cake? Should you change the temperature? What about the baking time? If you have prior knowledge of what works, then you can make some estimates of what cooking time and even temperature to use. If you had an unlimited amount of cake batter, you could try as many combinations of these as you wanted, learning from what works and what doesn’t to bake the perfect cake. That’s what intelligent hyperparameter autotuning does.

When you perform hyperparameter autotuning on your model, you’re asking the computer to try different combinations of hyperparameters and estimate how it improves a performance metric. For example: sum of squared errors, accuracy, misclassification rate, F1 score, etc. Your goal is to maximize or minimize some metric against validation data.
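To make the idea concrete, here is a minimal sketch of the simplest possible search: an exhaustive grid. This is not the intelligent search described above, which learns from earlier trials to decide what to try next. It only shows the basic loop of trying combinations and keeping the best score. The function names and the toy scoring function are assumptions, with the scoring function standing in for "train the model and evaluate it on validation data".

```python
from itertools import product


def grid_search(score_fn, grid, maximize=True):
    """Try every combination in the grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if best_score is None:
            better = True
        else:
            better = score > best_score if maximize else score < best_score
        if better:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cake_score(params):
    """Toy stand-in for a validation metric: best at 350 degrees for 30 minutes."""
    return -abs(params["temperature"] - 350) - abs(params["minutes"] - 30)


grid = {"temperature": [325, 350, 375], "minutes": [25, 30, 35]}
print(grid_search(toy_cake_score, grid))
```

With three values for each of two hyperparameters, that is already nine trials, which hints at why smarter search strategies matter as the grid grows.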

Overfitting can ruin your cake

So why don’t we just find the absolute best set of hyperparameters for every model all the time? There’s a catch to this: you’re at risk of overfitting your data, you’re making your model harder to interpret, it can take a lot of time, and it’s computationally expensive. When you overfit your training data, the model is harder to generalize. Even worse, when you start changing values of hyperparameters, your model is harder to understand. To see why this is, let’s go back to our cake analogy.

You’ve made a bunch of cake batter using your favorite ingredients and derived the best instructions for the ultimate cake. You spent hundreds of dollars on ingredients, countless hours experimenting, and have one hefty electric bill. But this is all a one-time cost, right? You invite your friends over to try your ultimate cake, and they agree: it’s an incredible cake. One of your friends says they’re having a party and wants you to bake a cake for it. Your friend has all the ingredients you need and even offered their oven and kitchen. On the day of the party, you go to your friend’s house and bake the cake using your precise instructions. Your friend says to everyone, “This is the best cake you will ever eat!” When they all taste it, it’s…just okay. Not bad, not great, but certainly not living up to the hype. What happened?

You think back to what could have gone wrong. You know you used the same instructions as you did last time, and you definitely used the same type of ingredients. But wait: your friend had slightly different forms of ingredients. The flour was organic, the sugar was cane sugar, and she had a gas convection oven while you had a standard electric. Your cake recipe was great for your environment, not all possible environments. Not only do all these things not work together with your instructions, but you don’t know why they don’t. You’re going to need to do even more experiments to generalize it better. Time to break out the credit card and start baking up a storm once again.

Stress-testing the recipe

When you’re performing hyperparameter autotuning, it’s vitally important to validate your results to help generalize the model. Real-world data is full of variance, and your training data may only capture some of it. One of the most common ways is k-fold cross-validation: split the data into k parts, train the model k times on k-1 parts while reserving the remaining part for validation, then take the average of the accuracy metric across all k trials. A well-generalized model performs accurately and consistently not only on the training data but also on new, unseen data, just like baking a cake in someone else’s kitchen with different variations of ingredients.

Imagine you have access to five kitchens, each stocked with all the ingredients and tools to bake a cake. Each kitchen is slightly different: one uses a gas oven while another is electric, one has organic flour and another has all-purpose flour, etc. You refine your recipe using four of the five kitchens, then try the final result in the fifth. You repeat this process four more times so that every kitchen is used exactly once as the test kitchen.

The more consistent the results, the better your instructions generalize outside of ideal conditions. Cross-validation helps you avoid baking a cake that only works in your kitchen.
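The k-fold procedure can be sketched in plain Python. This is a bare-bones illustration with assumed function names: it splits row indices into k contiguous folds without shuffling, whereas in practice you would usually shuffle first or use a library routine.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(score_fn, n_samples, k):
    """Average score_fn(train, validation) across the k folds."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Five "kitchens": each one serves as the validation fold exactly once.
for train, validation in k_fold_indices(n_samples=5, k=5):
    print("train:", train, "validate:", validation)
```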

Validating your model under a variety of conditions is crucial to making sure it behaves in an expected way with real-world results. It’s even more important when you’ve tuned the hyperparameters. If it’s giving weird predictions, it will be hard to explain why it did what it did; and trust me, when an executive needs to know why your model missed the biggest and most important prediction of the year, “I don’t know” isn’t a great answer. You’re making a tradeoff between interpretability and performance when deciding whether to tune hyperparameters. If interpretability is important, use a simpler model. If accuracy is important, consider tuning the hyperparameters. If you need a bit of both, take a look at interpretability tools like LIME or Shapley to help you understand the results better.

All right, now that we understand what hyperparameter autotuning is, let’s get into an interesting use case in physics: colliding particles near the speed of light to find Higgs bosons.

Smashing particles like cakes: a real-world example

In June of 2014, P. Baldi, P. Sadowski, and D. Whiteson of UC Irvine published the paper Searching for Exotic Particles in High-Energy Physics with Deep Learning. Their goal was to use machine learning and deep learning to identify a signal vs. background particles after colliding two particles together: more specifically, Higgs bosons (the signal). According to their research, the Large Hadron Collider collides approximately 10¹¹ (100 billion) particles per hour. Of those particles, 300 of them are Higgs bosons – a 0.0000003% rate overall. That’s pretty rare.
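For anyone who wants to check that percentage, the arithmetic is a one-liner. The variable names below are just for illustration.

```python
collisions_per_hour = 10**11   # approximately 100 billion
higgs_per_hour = 300

# Share of collisions that produce a Higgs boson, expressed as a percentage.
rate_percent = higgs_per_hour / collisions_per_hour * 100
print(f"{rate_percent:.7f}%")  # 0.0000003%
```

Put another way, that is roughly 3 Higgs bosons for every billion collisions.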

To determine whether a particle is signal or background, 28 measurements are used: 21 low-level features and 7 high-level features. Low-level features are the basic kinematic properties recorded by particle detectors in the accelerator. In other words, these are raw values recorded by highly precise instruments. The 7 high-level features are derived by physicists to help discriminate between signal and background particles. These are manually derived, labor-intensive non-linear functions of the raw features. The overall goal of this research is to use deep learning on the low-level features alone to improve predictions of signal vs. background particles. If that works, it would eliminate the need to derive high-level features, because a deep learning model could produce classifications that are as good or better without them.

The researchers simulated 11M particle collisions with both low and high-level features. They used three types of models:

Boosted Decision Tree

Shallow Neural Network

Deep Neural Network

Within each model, they had three types of inputs:

Low-level only

High-level only

Complete: both low-level and high-level

They showed that a deep neural network trained on only the low-level features outperformed the boosted decision tree and shallow neural network in all cases, and performed nearly as well as the same network trained on the complete feature set. Deep neural networks require a lot of computing power, time, fine-tuning, design, and in this case, GPU acceleration. They’re kind of like the multi-tier wedding cake of the modeling world: you need an expert who knows what they’re doing to build it, they have a ton of layers, and they take a lot of time and material to make.

Can we bake a simpler cake?

The paper tried out three models and ultimately landed on a deep neural network to achieve the highest level of prediction possible. What if we could use a simpler model with less computing power, like a smaller wedding cake whose taste still packs a punch?

Our goal: build a machine learning model with only low-level inputs that beats the boosted decision tree and shallow neural network for both low and high-level inputs. If we’re really lucky, we might even get close to the deep neural network. Let’s get started.

The ingredients

The data consists of 11M observations with 29 columns:

Signal: Binary indicator of a signal (1) or background (0)

21 Low-level features: Raw measurements from a particle accelerator

7 High-level features: Derived measurements from low-level features

All columns are numeric, which is quite convenient for modeling.

The kitchen

The researchers used a 16-core, 64GB machine with an NVIDIA Tesla C2070 GPU. What I need is the ability to scale with minimal interruption, and SAS Viya Workbench does exactly that. The nice thing about SAS Viya Workbench is that I can choose from multiple environment sizes. I can start small and scale up as needed, and it launches almost immediately after I make the change. For this case, we’ll go with an environment similar to the one the researchers had, especially since we’re working with a nearly 8GB CSV file.

Analyzing our ingredients

Before embarking on any modeling mission, we need to see what we’re dealing with:

Are there any missing values?

How balanced is the data?

Is multicollinearity going to be an issue?

Using a standard data summary, such as describe() in polars or pandas, we can see that none of the variables have missing values. Perfect.

It’s also nearly perfectly balanced with a 47/53 split of background (0)/signal (1).

And almost all variables have low correlation with each other.

The paper already describes the distribution of many of these variables, and in the interest of saving space, I’ll let you take a look at the full Jupyter notebook, which graphs many of them. All of them are fairly right-skewed, which might mean we want to do some sort of transformation or scaling for our models, depending on which model we ultimately choose. But with all of these models out there, which one should we choose?

Bakeoff: what’s the best vanilla cake?

This data is fairly large, and if there’s one thing SAS truly excels at, it’s dealing with big datasets. This is especially true with multi-pass algorithms like gradient boosting or random forest. I did some initial tests with scikit-learn models, and the performance didn’t hold up; some models wouldn’t even finish. So, we’ll be turning to SAS models, which are also compatible with scikit-learn’s utilities to boot.

from sasviya.ml.linear_model import LogisticRegression
from sasviya.ml.svm import SVC
from sasviya.ml.tree import DecisionTreeClassifier, ForestClassifier, GradientBoostingClassifier

We’ll do a model bakeoff by running five SAS classification models through the gauntlet to see which comes out best. Here’s the setup:

Create a set of pipelines that run default Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosting

Transform with a Standard Scaler depending upon the model

Perform 5-fold cross-validation on all 11M observations

Compare the average accuracy and standard deviation
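The setup above can be sketched roughly as follows. To keep it runnable without a SAS environment, this uses scikit-learn estimators as stand-ins on a small synthetic dataset, and only two of the five model types. Since the SAS models are described as compatible with scikit-learn's utilities, the `sasviya.ml` classes could presumably be dropped into the same pipelines, but that is an assumption I have not verified here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary-classification dataset (not the HIGGS data).
X, y = make_classification(n_samples=2_000, n_features=21, random_state=42)

# Scale only where the model benefits from it; trees do not need scaling.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1_000)
    ),
    "decision_tree": make_pipeline(DecisionTreeClassifier(random_state=42)),
}

# 5-fold cross-validation, then mean and standard deviation of accuracy.
results = {}
for name, pipeline in candidates.items():
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())

ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)
for name, (mean_acc, std_acc) in ranked:
    print(f"{name:20s} accuracy = {mean_acc:.3f} +/- {std_acc:.3f}")
```

Putting the scaler inside the pipeline means it is refit on the training folds of each split, so nothing from the held-out fold leaks into preprocessing.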

This is like testing out variations of vanilla cakes before we turn it into something magnificent. The tastiest one wins. Here are the results:

Gradient boosting is the clear winner here, followed closely by random forest. This isn’t too surprising, as tree-based models tend to really shine on tabular data like this. Even more impressive is that our standard deviation for accuracy is negligible, meaning every model performed consistently across the five folds.

We’re going to decide on gradient boosting. We don’t care too much about the interpretability of our model. We just want the best-performing, most generalizable model possible. You already know where this is going. Let’s do some hyperparameter autotuning.

Fine-tuning the flavor

We’re going to try tuning the hyperparameters of the gradient boosting model to see if we can get more performance. Not only that, we’re going to do it intelligently: we don’t want to randomly choose values and hope for the best. We want to try values that work and move in a direction that makes sense. This is where Optuna comes in.

Think of Optuna like a smart food critic who looks at every attempt and says “Hmm, that last cake was too dense. Maybe try lowering the oven temperature a bit next time.” It learns from each cake you bake, gradually steering you towards the perfect cake instead of blindly trying every temperature and time combination.

From a technical perspective, Optuna is an open source model-agnostic intelligent hyperparameter autotuning framework that uses Bayesian optimization to choose hyperparameters that work well. It builds a probabilistic model of which hyperparameters might work and tests the most promising ones next. This makes it faster, smarter, and more effective. In a world where time and compute are money, this is more important than ever.

Here's our plan:

Split the data into 70% train, 15% validation, and 15% test

Autotune gradient boosting against the validation set for both low-level and high-level models

Verify the results with the untouched test dataset
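A 70/15/15 split like the one in the plan can be sketched in a few lines. This version shuffles row indices with the standard library; the helper name `split_indices` and the seed are my own choices, and in practice you might reach for scikit-learn's `train_test_split` twice instead.

```python
import random


def split_indices(n_rows, train=0.70, val=0.15, seed=42):
    """Shuffle row indices and cut them into train/validation/test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train)
    n_val = round(n_rows * val)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains (about 15%)
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The important property is that the three parts never overlap: the test rows stay untouched until the very end.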

Because we have so much data and because we saw our models performed consistently in k-fold cross-validation, we’re going to tune our hyperparameters against a validation dataset, then confirm it with a test dataset. This is a great way to reduce computing time. Because we have a test dataset, we can make sure that our model is generalizable and not just tuned to the validation dataset.

Optuna, our own personal Gordon Ramsay

To do this, we’ll create a performance objective for Optuna to maximize or minimize, and tell it to run a study on that objective. In other words, we tell Optuna a range of hyperparameters to try and how many combinations to try, then see what we end up with. It’s kind of like having Gordon Ramsay critique our cakes and suggest ways to improve them based on how the last one came out, just without all the shouting and deeply personal insults.

We’re going to optimize the ROC Area Under the Curve (AUC), which is the same metric that is used in the paper (note that this is different from accuracy that we used above for our initial model selection). If you’re unfamiliar with this metric, check out this comprehensive tutorial from Jeff Thompson about ROC charts in SAS and how to interpret them. All you need to know right now is that it's a number ranging from 0 to 1. 0.5 means the model is as good as a coin flip, 1 means it predicts perfectly, and anything below 0.5 means it’s worse than random guessing. The closer to 1 you are, the better the model is at distinguishing between signals and background particles. According to the research paper, “small increases in AUC can represent significant enhancement in discovery significance”, so we’ll take any improvements we can get.
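If it helps to see where those numbers come from, AUC can be read as the chance that a randomly chosen signal gets a higher score than a randomly chosen background particle, with ties counting as half. Below is a small pure-Python sketch of that definition. The function name `roc_auc` and the toy scores are mine, and this pairwise version is only practical for tiny inputs; for real data you would use a library routine such as scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (signal, background) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # signal ranked above background
            elif p == q:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])     # every signal on top
coin_flip = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])   # no information at all
backwards = roc_auc(labels, [0.9, 0.8, 0.2, 0.1])   # every signal on the bottom
mixed = roc_auc(labels, [0.1, 0.4, 0.35, 0.8])      # one pair out of order

print(perfect, coin_flip, backwards, mixed)
```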

import optuna
from sklearn.metrics import roc_auc_score

def low_level_objective(trial):
    # Hyperparameter ranges Optuna is allowed to explore in each trial
    n_bins = trial.suggest_int('n_bins', 10, 100, step=5)
    n_estimators = trial.suggest_int('n_estimators', 50, 300, step=10)
    max_depth = trial.suggest_int('max_depth', 5, 17)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 100, 1000, step=50)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.1, step=0.01)

    model = GradientBoostingClassifier(
        n_bins=n_bins,
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        learning_rate=learning_rate,
        random_state=42,
    )

    # Train on the training set, then score on the validation set
    model.fit(X_train_low, y_train)
    # Second column: predicted probability of class 1 (signal)
    preds = model.predict_proba(X_val_low).iloc[:, 1]
    auc = roc_auc_score(y_val, preds)
    return auc

# Run 30 trials, keeping the hyperparameters with the highest validation AUC
low_level_study = optuna.create_study(
    direction='maximize',
    study_name='Low-level variables: Gradient Boosting Autotuning',
)
low_level_study.optimize(low_level_objective, n_trials=30)

Now all we do is sit back, relax, and let Optuna do all the work.

The final taste test

Two SAS gradient boosting models were tuned: a model with low-level variables and a model with high-level variables. Each model went through 30 trials, meaning we ran 60 variations of gradient boosting models. The SAS models handled all these combinations without any concerns, even as complexity grew drastically. The fact that they handled these dozens of training iterations with ease shows just how well-optimized the platform is for use cases requiring serious computing power.

Optuna comes with some cool graphics. Let’s take a look at the tuning history to see how it learned across trials.

Low-level gradient boosting model

High-level gradient boosting model

You can very clearly see how Optuna tended to make both models better over time. This is the power of Bayesian hyperparameter autotuning: we don’t need to go through every possible combination. We give it some guard rails, and it will continue trying values that tend to work well. One interesting thing to note is how the high-level variables do such a good job predicting on their own. Remember, these are only 7 variables compared to the 21 raw inputs. It’s a true testament to the deep expertise of particle physicists who can mathematically derive useful variables that improve predictive power over the raw variables alone. This is an important example of how feature engineering using domain knowledge can be massively helpful for improving model performance.

Let’s compare all our models together with both our Validation and Test datasets to see how we did.

Thanks to hyperparameter autotuning, our champion model’s low-level AUC improved by 15.2% over the default gradient boosting model, 8.9% over the paper’s boosted decision tree model, and 8.5% over the paper’s shallow neural network model. In addition, the results are nearly identical between the validation and test datasets. That’s excellent for generalization.

Here’s a breakdown of the tuned gradient boosting model compared to those in the paper:

Model	AUC: Low-level	Δ vs SAS (%)	AUC: High-level	Δ vs SAS (%)
SAS: Tuned Gradient Boosting	0.795	baseline	0.796	baseline
Paper: Boosted Decision Tree	0.73	-8.2%	0.78	-2.0%
Paper: Shallow Neural Network	0.733	-7.8%	0.777	-2.4%
Paper: Deep Neural Network	0.880	+10.7%	0.885	+11.2%
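One thing worth spelling out is how the percentages are calculated, because the base matters. The Δ columns above measure each paper model relative to the tuned SAS model, while an "improvement over" figure measures the SAS model relative to the paper model. Here is a short sketch using only the AUC values from the table; the helper name `pct_change` is mine.

```python
def pct_change(value, baseline):
    """Percent difference of value relative to baseline."""
    return (value - baseline) / baseline * 100


sas = {"low": 0.795, "high": 0.796}
paper = {
    "Boosted Decision Tree": {"low": 0.73, "high": 0.78},
    "Shallow Neural Network": {"low": 0.733, "high": 0.777},
    "Deep Neural Network": {"low": 0.880, "high": 0.885},
}

# Table view: each paper model measured against the tuned SAS model.
delta_vs_sas = {
    model: {inputs: round(pct_change(auc, sas[inputs]), 1) for inputs, auc in aucs.items()}
    for model, aucs in paper.items()
}

# Flipped view: the tuned SAS model measured against a paper model.
gain_over_bdt = round(pct_change(sas["low"], paper["Boosted Decision Tree"]["low"]), 1)
gain_over_shallow = round(pct_change(sas["low"], paper["Shallow Neural Network"]["low"]), 1)

for model, deltas in delta_vs_sas.items():
    print(f"{model:24s} low: {deltas['low']:+.1f}%  high: {deltas['high']:+.1f}%")
print(f"SAS gain over boosted decision tree (low-level): {gain_over_bdt:+.1f}%")
print(f"SAS gain over shallow neural network (low-level): {gain_over_shallow:+.1f}%")
```

That is why the same pair of AUCs can show up as -8.2% in the table and as an 8.9% improvement in a sentence: the first divides by the SAS AUC, the second by the paper's.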

The tuned gradient boosting model couldn't match the performance of a fully optimized deep neural network, but it outperformed the boosted decision tree, shallow neural network, and default SAS gradient boosting model — all on a modest CPU server, no GPU required.

Blowing out the candles

Hyperparameter autotuning is a powerful tool for squeezing out extra accuracy, but it comes with trade-offs: increased model complexity, higher risk of overfitting, and reduced interpretability. Not to mention, you’re going to be using more time and computing power. That’s why validation is critical. The more diverse and representative your data, the better your model will generalize. A model that performs well on the training set but falls apart in the real world may have simply learned to predict noise. It’s kind of like fondant: sure, it makes your cake look pretty, but it tastes downright awful.

In our case, we had a large, clean, balanced, and diverse dataset. Our ultimate goal was to improve accuracy, not interpretability, so with all the right conditions in place, autotuning helped get us the most out of our model. Sometimes you really can have your cake and eat it too.

Links

SAS® Viya® Workbench Demo: Solving Your Productivity Challenges

Check out the full detailed project on my GitHub

UC Irvine Machine Learning Repository - Higgs boson dataset

Optuna - A model-agnostic hyperparameter optimization framework

References

Baldi, P., Sadowski, P., & Whiteson, D. (2014, June 5). Searching for exotic particles in high-energy physics with Deep Learning. arXiv.org. https://arxiv.org/abs/1402.4735.

Taylor, Lucas; CERN. Simulated Higgs to two jets and two electrons. Source: cds.cern.ch. Licensed under CC BY-SA 4.0.

Whiteson, D. (2014). HIGGS [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5V312. Licensed under CC BY-SA 4.0.

Boost ML accuracy with hyperparameter tuning (with a fun twist) was published on SAS Users.

Model	AUC: Low-level	Δ vs SAS (%)
SAS: Tuned Gradient Boosting	0.801
SAS: Tuned Gradient Boosting (Optuna)	0.795	-0.75%
Paper: Boosted Decision Tree	0.73	-8.9%
Paper: Shallow Neural Network	0.733	-8.5%
Paper: Deep Neural Network	0.880	+9.9%

SAS Users

From question to clarity: how SAS Viya Copilot changes the way we work with data

Most analytical journeys start the same way – with a question.

Analytics that meet you where you are

A more natural way to work with data

From data to visuals – faster

Insights only a question away

A faster path from curiosity to clarity

Learn more about how SAS Viya Copilot helps organizations turn analytical power into sustained decision momentum.

SAS Innovate 2026 and you: the SAS user

SAS Users Day: Powered by the community

SAS 9: Productive today, Preparing for tomorrow

Don't crash this ship! DuckDB is heading straight for Iceberg!

Introducing Iceberg support

Where's on First

How's on Second

What's on Third

Bringing it Home

Engage for Managed Cloud Services: Strategic automation for the future

What is Engage?

SAS Managed Cloud Services Enterprise: Custom Infrastructure at Scale

Managed Cloud Services Fleet: Immutable, Pre-Packaged Offerings

Managed Cloud Services Developer Experience: Empowering Automation and Governance

Continuous Innovation: What’s Next for Engage and Managed Cloud Services

Update Data Live in SAS Viya: Integrating SAS Code with Interactive Reports

Uploading and visualizing custom shapefiles in SAS Viya

Learn more

The afterparty: Hyperparameter autotuning revisited

Predicting Higgs bosons with SAS

The results

Going bigger

Wrapping it up

Links

References

Mapping data over images

TouchDuck! Exploring unique features of SAS/ACCESS to DuckDB through college football

Data Ingestion

Loading And Exploring Our Tables

Investigation #1: Talent Gain & Loss Over the Offseason

Resulting DELTASORT Table:

Investigation #2: Offseason Recruiting

Learn more

Getting Started with Job Scheduling in SAS Viya

Getting Started with Job Scheduling in SAS Viya

What is a Job in Viya?

Scheduling Made Simple

How to Schedule a Job in SAS Viya

Monitoring Your Jobs

Going Beyond Jobs: What Are Flows?

Components of Job Flow

Creating a Job Flow

Command Line Scheduling (For Power Users)

SAS Viya Workload Management

What is SAS Viya with Workload Management

What is SAS Workload Management?

Key Features of SAS Workload Management

Administration Tools

Components of SAS Workload Management

Workload Manager Dashboard page

Workload Manager Jobs page

Workload Manager Queue Page

Workload Manager Host Page

Workload Manager Logs page

Workload Manager Configuration page

Workload Manager Log Levels page

How SAS Workload Management Works.

Flow of how Workload Manager works:

Job Lifecycle

High-level flow

SAS Viya Workload Management vs. SAS 9 Grid Manager

CAS vs Workload Manager

Real-World Benefits

Conclusion

Quickly and easily register Python models in SAS Viya Workbench

Hello, I’m SAS Viya Workbench. Can you hear me?

Let’s register

The model

It’s all about the prep work

Step 1: Pickle it

Step 2: Define inputs and outputs