<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>The SAS Data Science Blog</title>
	
	<link>https://blogs.sas.com/content/subconsciousmusings</link>
	<description>Advanced analytics from SAS data scientists</description>
	<lastBuildDate>Mon, 08 Nov 2021 20:34:03 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.3</generator>
	<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/advanalytics" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="advanalytics" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">advanalytics</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">https://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>SAS opens its code editor interface to Python users</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/11/08/sas-studio-python-editor/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/11/08/sas-studio-python-editor/#respond</comments>
		
		<dc:creator><![CDATA[Marinela Profi]]></dc:creator>
		<pubDate>Mon, 08 Nov 2021 20:15:06 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[developers]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[SAS Studio]]></category>
		<category><![CDATA[SAS Viya]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9915</guid>

					<description><![CDATA[<p>[Editor's note: this post was co-authored by Marinela Profi and Wilbram Hazejager] Data science teams are multidisciplinary, each with different skills and technologies of choice. Some of them use SAS, others may have analytical assets already built in Python or R. Let's just say each team is unique. As part [...]</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/11/08/sas-studio-python-editor/">SAS opens its code editor interface to Python users</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><em>[Editor's note: this post was co-authored by Marinela Profi and Wilbram Hazejager]</em></p>
<p>Data science teams are multidisciplinary, each with different skills and technologies of choice. Some of them use SAS, others may have analytical assets already built in Python or R. Let's just say each team is unique. </p>
<p>As part of our continuous integration/continuous delivery (CI/CD) approach with monthly releases, we are always looking to extend <a href="https://www.sas.com/en_us/software/viya/open.html">SAS Viya integration capabilities</a> to support open-source users and technology. The goal is to enable all types of users to leverage their best skills, ensuring governance of assets, explainable AI and operationalization of models (<a href="https://www.sas.com/en_us/insights/articles/analytics/modelops.html">ModelOps</a>).</p>
<p>With the <a href="https://support.sas.com/en/documentation/whats-new.html">October 2021 release of SAS Viya</a>, we introduced the Python Code Editor. Data Scientists and Python programmers can now code, execute and schedule Python scripts from within the <a href="https://www.sas.com/en_us/software/studio.html">SAS code editor interface</a> (SAS Studio) or add Python steps to a SAS Flow quickly and intuitively. </p>
<p>Both options offer Data Scientists the flexibility to:</p>
<ul>
<li>use Python inside a SAS environment for query, preparation and analysis depending on users' skills, comfort and preferences, as well as the problem they are trying to solve </li>
<li>
efficiently create a flow to integrate SAS and Python code for consistent delivery of analytics-ready data pipelines</li>
<li>
support Information Governance initiatives by displaying tables created by Python, using SAS lineage diagrams</li>
<li>
leverage the <a href="https://www.sas.com/en_us/webinars/need-sas-viya.html?utm_source=LinkedIn&amp;utm_medium=social-voicestorm&amp;utm_content=98d36dc9-502a-4614-9037-d79244f32719">benefits of using SAS Viya with open source</a> for achieving business value</li>
</ul>
<p>This post demonstrates the use of this capability with some simple Python code. You will learn:</p>
<ul>
<li>how to create a single program or a flow integrating Python code in a SAS environment</li>
<li>
what happens from a high-level architecture perspective </li>
<li>
how to use SAS lineage diagrams to present Python table usage</li>
</ul>
<h2>Getting started</h2>
<p>A standard SAS Viya installation does not have Python support enabled, because the environment runs in lockdown mode for security purposes. Your Kubernetes or SAS administrator can enable Python support in lockdown mode and configure the default location for the SAS environment. For this post, it is enough to know that Python 3.x is supported and that you can use any Python package, as long as it is available inside the directory structure that contains the Python executable. These configuration options are described in detail in the section <a href="https://documentation.sas.com/?cdcId=sasadmincdc&amp;cdcVersion=default&amp;docsetId=calsrvpgm&amp;docsetTarget=n1a7ados7ybdn1n15f0td8twwca9.htm">Configure SAS to Run External Languages</a> in the SAS Viya Administration documentation.</p>
<h2>Intro to Python Code Editor</h2>
<p><a href="https://go.documentation.sas.com/doc/en/webeditorug/v_011/p07fdumu0y6e6on18jz1n7bsq68q.htm">The Python Code Editor</a> allows you to write, run, and save Python programs. In the example outlined below, we use data in a SAS table called sashelp.class and apply a data transformation using Python. The source table could equally be a SAS data set or a DBMS table supported through SAS/ACCESS software, which includes support for ODBC and JDBC.<br />
<div id="attachment_9927" style="width: 1717px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonCodeEditor.png"><img aria-describedby="caption-attachment-9927" loading="lazy" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonCodeEditor.png" alt="" width="1707" height="1038" class="size-full wp-image-9927" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonCodeEditor.png 1707w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonCodeEditor-300x182.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonCodeEditor-1024x623.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonCodeEditor-1536x934.png 1536w" sizes="(max-width: 1707px) 100vw, 1707px" /></a><p id="caption-attachment-9927" class="wp-caption-text">Figure 1 - Python Code Editor in SAS Studio</p></div>
<p>When you run your Python code from the Python Code Editor, SAS Studio runs SAS code invoking the <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/proc/p1iycdzbxw2787n178ysea5ghk6l.htm">PYTHON Procedure</a> and embeds the Python code in a <code>submit/endsubmit</code> block. This procedure enables running Python statements within SAS code. As Figure 1 shows, the Python editor looks similar to the SAS Code Editor; however, notice the Python.py filename: this is Python code! The <strong>Code</strong> tab is for editing your Python code and the <strong>Log</strong> tab displays the executed statements and information from the Python console. When the Python code yields results stored in a SAS table, the table is displayed in the <strong>Output Data</strong> tab, where you can interactively browse its content. </p>
<h2>The PYTHON procedure</h2>
<p>The PYTHON procedure creates a Python subprocess from a <a href="https://go.documentation.sas.com/doc/en/calcdc/3.4/calsrvpgm/n00001viyaprgmsrvs00000admin.htm#:~:text=The%20SAS%20Compute%20Server%20enables,message%20to%20a%20SAS%20log.">SAS Compute Server</a> process and automatically imports the SAS module, which enables interaction between SAS and your Python instance. The next section explains this in more detail.</p>
<p>This module provides callback methods that share variables between Python and SAS (SAS.symget and SAS.symput), move data between SAS data sets and Pandas DataFrames (SAS.sd2df and SAS.df2sd), invoke SAS and <a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=default&amp;docsetId=proc&amp;docsetTarget=n00v1o3xddpt7hn1x7kp52u78usz.htm">FCMP</a> functions (SAS.sasfnc), and submit SAS code within your Python statements (SAS.submit). The <a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=default&amp;docsetId=proc&amp;docsetTarget=p0sj9pq2ryjlphn1ceq7ntpc1ipp.htm">documentation for the Python procedure</a> provides examples for each of these.</p>
<p>Using these callback methods, you can create a workflow that mixes SAS and Python programming as needed.</p>
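<p>As a minimal sketch of the variable-sharing callbacks, the snippet below reads one SAS macro variable and writes another. Inside PROC PYTHON the <code>SAS</code> object is supplied for you; the <code>_StandInSAS</code> class, its sample value and the <code>greeting</code> macro variable are hypothetical, included only so the snippet also runs in a plain Python interpreter.</p>

```python
# Inside PROC PYTHON the name SAS is provided automatically.
# The stand-in below is hypothetical and exists only so this sketch
# also runs outside SAS, in a plain Python interpreter.
try:
    SAS
except NameError:
    class _StandInSAS:
        """Hypothetical in-memory substitute for the SAS callback object."""

        def __init__(self):
            self._macro_vars = {"SYSUSERID": "demo_user"}

        def symget(self, name):
            # Real callback: returns the value of a SAS macro variable.
            return self._macro_vars[name]

        def symput(self, name, value):
            # Real callback: assigns a value to a SAS macro variable.
            self._macro_vars[name] = str(value)

    SAS = _StandInSAS()

# Read a macro variable from SAS, then hand a new one back to SAS.
user = SAS.symget("SYSUSERID")
SAS.symput("greeting", "Hello from Python, " + user)
print(SAS.symget("greeting"))
```

<p>In a real session, SAS code that runs after the procedure could then reference the new macro variable.</p>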
<h2>How is the Python code handled at runtime?</h2>
<p>What happens when you run your Python program in the SAS coding interface? Let's take an architectural deep dive to explain. The following diagram summarizes what happens when you run a program in the Python Code Editor.<br />
<div id="attachment_9930" style="width: 1290px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramArchitecture.png"><img aria-describedby="caption-attachment-9930" loading="lazy" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramArchitecture.png" alt="" width="1280" height="720" class="size-full wp-image-9930" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramArchitecture.png 1280w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramArchitecture-300x169.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramArchitecture-1024x576.png 1024w" sizes="(max-width: 1280px) 100vw, 1280px" /></a><p id="caption-attachment-9930" class="wp-caption-text">Figure 2 - Running Python code in SAS Studio - what happens?</p></div>
<ol>
<li>Run your Python code from the Python Code Editor.</li>
<li>
Because SAS Studio works with SAS Compute Server, it invokes the PYTHON procedure on the SAS Compute Server, which starts a Python subprocess. </li>
<li>
The code provided in the code editor is put inside a proc python <code>submit/endsubmit</code> block, which executes inside the Python subprocess. </li>
<li>
Because the Python code contains a call to the SAS.sd2df() function, the SAS Compute Server reads the specified SAS table, and provides data in the specified Pandas DataFrame.</li>
<p>The sample Python code contains some statements calling a few Pandas functions, but you could use any available Python package in the Python directory configured by your SAS Administrator.</p>
<li>The sample code issues a SAS.df2sd() call, transferring the results from a Pandas DataFrame back into a SAS table. </li>
</ol>
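<p>The wrapping described in the steps above can be sketched in a few lines of Python. This is an illustration of the idea, not the code SAS Studio actually generates: the helper name <code>wrap_in_proc_python</code>, the <code>Height_cm</code> column and the <code>work.class_python</code> table are assumptions made for the example.</p>

```python
def wrap_in_proc_python(python_code):
    """Return SAS code that runs python_code through PROC PYTHON."""
    # The Python statements stay unindented: indentation is significant to Python.
    body = python_code.rstrip("\n")
    return "\n".join(["proc python;", "submit;", body, "endsubmit;", "run;"])


# A program like the one typed into the Python Code Editor (kept as text here,
# because the SAS callbacks only exist inside PROC PYTHON).
python_program = """\
df = SAS.sd2df("sashelp.class")
df["Height_cm"] = df["Height"] * 2.54
SAS.df2sd(df, "work.class_python")
"""

sas_code = wrap_in_proc_python(python_program)
print(sas_code)
```

<p>The printed text is ordinary SAS code, which is why the SAS Compute Server can run it like any other submitted program.</p>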
<h2>Integrating Python steps with SAS steps in a flow</h2>
<p>Now that we have developed and tested our Python code, we are going to embed it in a <a href="https://go.documentation.sas.com/doc/en/webeditorcdc/3.8/webeditorug/n0ekm94albqbi1n19u7owsw5qhqb.htm">Studio Flow</a>. This allows us to combine Python code and standard data transformation functionality in SAS Studio Flow. Moreover, we can view and define the data transformation using a graphical flow with a mix of SAS and Python transformations. And finally, we can easily schedule the flow.</p>
<p>In the next example, we modify the Python code so it contains no hardcoded table references. Instead, it uses tables connected to the input and output ports of the step.</p>
<p>In the following flow, we have dropped a Python Program step onto the canvas and connected tables representing the input and output tables. The table steps contain information about data location. Note that a table step can represent a SAS data set or a table that lives in a DBMS.</p>
<div id="attachment_9933" style="width: 1286px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramNodeFlow.png"><img aria-describedby="caption-attachment-9933" loading="lazy" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramNodeFlow.png" alt="" width="1276" height="510" class="size-full wp-image-9933" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramNodeFlow.png 1276w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramNodeFlow-300x120.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramNodeFlow-1024x409.png 1024w" sizes="(max-width: 1276px) 100vw, 1276px" /></a><p id="caption-attachment-9933" class="wp-caption-text">Figure 3 - Flow using Python Program node, showing use of input and output ports in Python code</p></div>
<p>The Python Program step supports input and output ports (not networking ports!) and each has an associated variable representing the name of the table connected to that port. The user can overwrite the names of these ports.</p>
<p>Note: The Python Program step has a single input and output port by default, but you can add and remove ports using the popup menu.</p>
<p>When we run the flow, the code framework generates code for each step in the flow. For the Python Program step, it also creates Python variables, _input1 and _output1, which contain the name of the connected table using SAS table reference syntax <code>libref.tablename</code>.</p>
<p>Just as we observed previously with the Python Code Editor, the Python Program step generates SAS code that invokes proc python, and the native Python code provided in the Code tab of the step is embedded inside a <code>submit/endsubmit</code> block. </p>
<p>As a result, the generated code for the Python Program step, using input and output ports in Python code from Figure 3, looks like this:<br />
<div id="attachment_9987" style="width: 572px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/11/Generated-Python-code-for-Python-Program-step-12pt-font-cropped.png"><img aria-describedby="caption-attachment-9987" loading="lazy" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/11/Generated-Python-code-for-Python-Program-step-12pt-font-cropped.png" alt="" width="562" height="443" class="size-full wp-image-9987" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/11/Generated-Python-code-for-Python-Program-step-12pt-font-cropped.png 562w, https://blogs.sas.com/content/subconsciousmusings/files/2021/11/Generated-Python-code-for-Python-Program-step-12pt-font-cropped-300x236.png 300w" sizes="(max-width: 562px) 100vw, 562px" /></a><p id="caption-attachment-9987" class="wp-caption-text">Figure 4 - Generated code for a Python Program step</p></div>
<p>This code is visible in the Generated Code tab in Studio Flow. </p>
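<p>To make the port variables concrete, here is a rough sketch of how a step's Python code could be prefixed with <code>_input1</code> and <code>_output1</code> assignments. The helper <code>build_step_code</code> and the table names are hypothetical; the code that Studio Flow really generates is the one shown in Figure 4 and may differ in detail.</p>

```python
def build_step_code(user_code, inputs, outputs):
    """Prefix the step's Python code with one variable per connected port."""
    lines = []
    for i, table in enumerate(inputs, start=1):
        lines.append('_input{} = "{}"'.format(i, table))
    for i, table in enumerate(outputs, start=1):
        lines.append('_output{} = "{}"'.format(i, table))
    return "\n".join(lines) + "\n" + user_code


# Step code without hardcoded table references: it only uses the port variables.
step_code = """\
df = SAS.sd2df(_input1)
SAS.df2sd(df, _output1)
"""

generated = build_step_code(step_code, ["SASHELP.CLASS"], ["WORK.CLASS_PYTHON"])
print(generated)
```

<p>Because the step code refers only to the port variables, pointing a Table step at a different table changes the prefix and leaves the Python code untouched.</p>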
<p>Now that we have embedded our code in a Python Program step in Studio Flow, we can change the properties of the input and output Table steps and point to a different table. The Python code picks up the new table names when we run the flow interactively. If your data requires pre- or post-processing, you can add more steps to the flow. </p>
<p>In the example below, we added a <em>Query</em> step to join data from an additional source using an inner join and applied a filter. We also changed the properties of the Table step by adjusting the name of the table where we store the results of the Python processing. </p>
<div id="attachment_9939" style="width: 1217px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramInFlow.png"><img aria-describedby="caption-attachment-9939" loading="lazy" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramInFlow.png" alt="" width="1207" height="891" class="size-full wp-image-9939" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramInFlow.png 1207w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramInFlow-300x221.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/pythonProgramInFlow-1024x756.png 1024w" sizes="(max-width: 1207px) 100vw, 1207px" /></a><p id="caption-attachment-9939" class="wp-caption-text">Figure 5 - Using Python Program and other nodes in same flow</p></div>
<p>The flow is now complete and the green tick-marks on each of the transformation steps indicate the steps ran successfully. </p>
<h2>How to keep an overview of which tables are used where?</h2>
<p>Thus far we have created a single Studio Flow with steps that work on input tables and create output tables. When you have multiple flows, perhaps created by other users in the environment, and some of these flows use tables created/updated by our flow, how do you keep an overview of which tables were used where? This is where <a href="https://documentation.sas.com/?cdcId=dprepcdc&amp;cdcVersion=default&amp;docsetId=dmlinug&amp;docsetTarget=n0y1mz75np09g1n1cwjs97pnquqs.htm">SAS Lineage Viewer</a> comes into play.</p>
<p>So, let’s assume another user creates a flow that uses our output table to perform some further data transformations creating another table. How would we know about this?<br />
<div id="attachment_9942" style="width: 1142px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/sasLineage.png"><img aria-describedby="caption-attachment-9942" loading="lazy" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/sasLineage.png" alt="" width="1132" height="720" class="size-full wp-image-9942" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/sasLineage.png 1132w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/sasLineage-300x191.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/sasLineage-1024x651.png 1024w" sizes="(max-width: 1132px) 100vw, 1132px" /></a><p id="caption-attachment-9942" class="wp-caption-text">Figure 6 - SAS Lineage Viewer showing where tables are used</p></div>
<p>The lineage diagram above shows that the CLASS and CLASSFIT tables are inputs to Flow 1b, and that this flow writes data into table CLASS_PYTHON_NEW. In Flow 2, that table is the input, and the flow writes data into table CLASS_PYTHON_FINAL. </p>
<p>The definition of each flow is known to SAS Lineage Viewer, and we searched it for our flow, called Flow 1b. The result shows a diagram with all input and output tables. If a table is used somewhere else, it has a (+) icon. In our example, expanding the CLASS_PYTHON_NEW table shows that it is used by Flow 2.<br />
By using SAS Lineage Viewer, you can quickly see which tables are used where, even when your Studio Flows contain Python Program steps.</p>
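<p>The question a lineage diagram answers can be sketched as a small graph lookup. The snippet below is a conceptual illustration only and does not use any SAS Lineage Viewer API; the flow and table names come from the example above, while the dictionary layout and function names are assumptions.</p>

```python
# Flow definitions taken from the lineage example in this post.
flows = {
    "Flow 1b": {"inputs": ["CLASS", "CLASSFIT"], "outputs": ["CLASS_PYTHON_NEW"]},
    "Flow 2": {"inputs": ["CLASS_PYTHON_NEW"], "outputs": ["CLASS_PYTHON_FINAL"]},
}


def flows_reading(table):
    """Return the names of all flows that use the table as an input."""
    return sorted(name for name, io in flows.items() if table in io["inputs"])


def downstream_tables(table):
    """Follow the lineage forward and collect every table derived from table."""
    found = []
    queue = [table]
    while queue:
        current = queue.pop(0)
        for name in flows_reading(current):
            for out in flows[name]["outputs"]:
                if out not in found:
                    found.append(out)
                    queue.append(out)
    return found


print(flows_reading("CLASS_PYTHON_NEW"))
print(downstream_tables("CLASS"))
```

<p>The first lookup corresponds to expanding the (+) icon on CLASS_PYTHON_NEW; the second follows the chain all the way to CLASS_PYTHON_FINAL.</p>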
<h2>Wrapping things up</h2>
<p>The Python Code Editor allows programmers and data scientists to code, execute and schedule Python scripts. The functionality is available from within the code editor interface in SAS Studio or by adding Python Program steps to a SAS Studio Flow. In either case, it offers the flexibility to use Python or SAS for query, preparation, and analysis depending on users' skills, comfort and preferences, and the problem they are trying to solve. You can now create a single program or flow that integrates SAS and Python code, and combine SAS and Python steps for consistent delivery of analytics-ready data pipelines. In this post we introduced the editor, how to get started, how it works, and the architecture behind it.</p>
<h2>Learn more</h2>
<p>SAS integrates with open source in every step of the analytics life cycle. Users can learn more about how SAS works with open source by downloading the eBook, <a href="https://www.sas.com/content/dam/SAS/documents/marketing-whitepapers-ebooks/ebooks/en/sas-open-source-integration-112134.pdf">“Drive Analytic Innovation Through SAS and Open Source Integration”</a> and by visiting <a href="https://developer.sas.com/home.html#filterlist=cbo744792">developer.sas.com</a>. </p>
<p>We have also developed specific packages available on GitHub and supported by SAS R&amp;D: <a href="https://blogs.sas.com/content/sasdummy/2017/04/08/python-to-sas-saspy/">SASPy</a> and <a href="https://blogs.sas.com/content/sgf/2020/02/19/building-machine-learning-models-by-integrating-python-and-sas-viya/">SWAT</a>, both aimed at a Python programmer wanting to use functionality available on a SAS server from their Python client environment.</p>
<h2>About the co-author</h2>
<h4>Wilbram Hazejager - Principal Systems Architect</h4>
<p>Wilbram Hazejager is a principal systems architect for data management products at SAS. He has been involved for many years in SAS’ data management products in various roles, including product management, working closely with customers, and authoring a number of papers on SAS data management products. He holds a Master of Science degree in Applied Mathematics from the Eindhoven University of Technology in the Netherlands.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/11/08/sas-studio-python-editor/">SAS opens its code editor interface to Python users</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/11/08/sas-studio-python-editor/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/11/pythonEditor2-150x150.png" />
	</item>
		<item>
		<title>Using State Space Models for the Stability Monitoring of Streaming Data</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/10/26/using-state-space-models-for-the-stability-monitoring-of-streaming-data/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/10/26/using-state-space-models-for-the-stability-monitoring-of-streaming-data/#comments</comments>
		
		<dc:creator><![CDATA[Rajesh Selukar]]></dc:creator>
		<pubDate>Tue, 26 Oct 2021 18:58:37 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Analytics R&D]]></category>
		<category><![CDATA[PROC CSSM]]></category>
		<category><![CDATA[SAS Econometrics]]></category>
		<category><![CDATA[SAS Viya]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9798</guid>

					<description><![CDATA[<p>SAS' Rajesh Selukar introduces you to a new scoring feature.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/26/using-state-space-models-for-the-stability-monitoring-of-streaming-data/">Using State Space Models for the Stability Monitoring of Streaming Data</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>State space models (SSMs) have long been used in a wide variety of fields like econometrics, signal processing, and environmental science.  They are used to analyze data generated in a sequential fashion such as time series, panels of time series, and longitudinal data.  The <a href="https://documentation.sas.com/doc/en/pgmsascdc/default/casecon/casecon_cssm_toc.htm">CSSM procedure</a> in <a href="https://www.sas.com/en_us/software/econometrics.html">SAS<sup>®</sup> Econometrics</a>, the <a href="https://www.sas.com/en_us/software/viya.html">SAS<sup>®</sup> Viya<sup>®</sup></a> version of the SSM procedure in <a href="https://www.sas.com/en_us/software/ets.html">SAS/ETS<sup>®</sup></a>, provides a comprehensive set of tools for analyzing such sequential (streaming) data. Scoring is an important new feature in the 2021.1.6 release of PROC CSSM. Scoring can be used for efficient, model-based scenario analysis and stability monitoring of an ongoing data stream.  This post discusses this new scoring feature.</p>
<h2>Scoring with state space models</h2>
<p>Scoring is evaluating a previously fitted model at some new predictor setting.  For example, a bank might score a new loan application by applying a rule that is based on a previously fitted logistic regression model.  You might also want to use SSMs for scoring.  Because of the sequential nature of the observation process, it is important to realize that the scoring process for an SSM is inherently different from the scoring process for a model that is based on independent observations, such as simple linear regression or logistic regression.</p>
<p>Think of a sequential data process like a long-running TV series, with a new episode every week. Watching the series for some time and learning about the main characters and their relationships to each other is akin to fitting a model to the initial portion of a sequential data stream.  After you have watched some episodes, you can make educated guesses about the subsequent plotline. You can also imagine different ways the series could play out, while still being consistent with the storyline.</p>
<p>For sequential data, this process of forecasting and what-if analysis, based on the fitted model and ever-increasing history, is called scoring.  Additionally, the scoring process enables you to monitor an ongoing observation process for additive outliers (that is, one-off observations) and structural breaks such as shifts in the mean level or other patterns.</p>
<h2>PROC CSSM</h2>
<p>The scoring process requires a model that is a good fit for the initial portion of the data stream.   After a suitable model is fitted, a score store is created.  The score store contains sufficient information for you to forecast subsequent observations of the data stream, without having access to its history. If scenario analysis is the only goal, this initially created score store is all that is needed for scoring different future scenarios.  On the other hand, if the goal is to perform stability monitoring of the ongoing data stream, the score store must be updated on a continual basis to incorporate the information from the incoming data.  The CSSM procedure enables you to easily handle all aspects of such a scoring process.  With PROC CSSM you can:</p>
<ul>
<li>Fit and diagnose a wide variety of SSMs for a wide variety of sequential data types</li>
<li>Create the initial score store, after a suitable SSM is found for the initial portion of the data stream</li>
<li>Without access to the historical data:
<ul>
<li>Perform forecasting and scenario analysis using a previously created score store</li>
<li>Create an updated score store based on the newly arrived data</li>
<li>Perform stability analysis of an ongoing data stream</li>
</ul>
</li>
</ul>
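<p>To see why a compact summary can replace the full history, consider a much simpler model than the ones PROC CSSM handles: a univariate local level model, where the filtered mean and variance of the level play the role of the score store. The sketch below is an analogy under that assumption and says nothing about the actual contents of a PROC CSSM score store; the function name, starting values and variances are all illustrative.</p>

```python
def update_store(store, y, obs_var, level_var):
    """One Kalman filter step for a local level model.

    store is a (mean, variance) pair summarizing all data seen so far.
    Returns the updated store and the standardized one-step-ahead error.
    """
    mean, var = store
    pred_var = var + level_var      # predict: the level follows a random walk
    err = y - mean                  # one-step-ahead forecast error
    err_var = pred_var + obs_var
    gain = pred_var / err_var
    new_store = (mean + gain * err, (1.0 - gain) * pred_var)
    return new_store, err / err_var ** 0.5


store = (10.0, 1.0)  # illustrative starting values, as if taken from a fitted model
for y in [10.2, 9.9, 10.1, 7.0]:
    forecast = store[0]
    store, z = update_store(store, y, obs_var=0.25, level_var=0.01)
    flag = "  possible outlier or break" if abs(z) > 3 else ""
    print("forecast {:.2f}  observed {:.2f}  z {:+.2f}{}".format(forecast, y, z, flag))
```

<p>Each new observation is forecast from the store alone, the store is then updated, and an unusually large standardized error flags the observation for attention. Only the last value in the sample sequence is far enough from its forecast to be flagged.</p>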
<p>Next, let’s illustrate this SSM-based scoring process with a real-life example.   The data used in this illustration are set to a monthly frequency and the computing time for scoring is not a constraint.  However, SSM-based scoring can be effectively used in applications where the scoring must be done at a much higher frequency, such as every minute or every second.</p>
<h2>Analyzing monthly traffic accident numbers</h2>
<p>The traffic accident data analyzed in this example consist of observations on four variables recorded at monthly intervals from January 1969 to December 1985.  The variables, all in the log scale, are:</p>
<ul>
<li>F_KSI: front-seat passengers killed or seriously injured</li>
<li>R_KSI: rear-seat passengers killed or seriously injured</li>
<li>logKM: average number of kilometers traveled per car per month</li>
<li>logPrice: real price of petrol</li>
</ul>
<p>The time series plots below show the F_KSI and R_KSI numbers over the years:</p>
<p><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/rajesh-SSM-time-series-figure-1.png"><img loading="lazy" class="aligncenter wp-image-9804 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/rajesh-SSM-time-series-figure-1.png" alt="State Space Models: time series plots showing the F_KSI and R_KSI numbers over the years" width="2000" height="1500" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/rajesh-SSM-time-series-figure-1.png 2000w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/rajesh-SSM-time-series-figure-1-300x225.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/rajesh-SSM-time-series-figure-1-1024x768.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/rajesh-SSM-time-series-figure-1-1536x1152.png 1536w" sizes="(max-width: 2000px) 100vw, 2000px" /></a></p>
<p>From the time series plots you can see that:</p>
<ul>
<li>F_KSI and R_KSI numbers don’t show a pronounced upward or downward trend but do show monthly seasonal variation</li>
<li>These two series seem to move together, at least until February 1983. Starting in February 1983, the mean level of F_KSI appears to drop whereas the R_KSI pattern seems to remain unaffected.</li>
</ul>
<p>The drop in the mean level of F_KSI is attributed to the enactment of the seat-belt law of February 1983. This law required the front seat passengers to wear seat belts.  It didn’t apply to the rear seat passengers.  These data have been analyzed by many researchers in the past.</p>
<h2>Stability monitoring of monthly traffic accident numbers</h2>
<p>In this section, we will see how you can perform model-based stability monitoring of the bivariate series (F_KSI, R_KSI).  We will use data from January 1969 to December 1981 as the initial portion of the series. This portion is used to fit a suitable model and to create the score store.  Note that the drop in the mean level of F_KSI, which happened in February 1983, is not part of this initial portion.  We hope to detect this drop during the stability monitoring of this series, which begins in January 1982.</p>
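<p>As a quick check on the size of each portion, the following sketch counts the monthly observations, assuming the series has no missing months. The helper name <code>months_between</code> is ours.</p>

```python
from datetime import date


def months_between(start, end):
    """Number of monthly observations from start to end, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


train_months = months_between(date(1969, 1, 1), date(1981, 12, 1))
monitor_months = months_between(date(1982, 1, 1), date(1985, 12, 1))
months_until_law = months_between(date(1982, 1, 1), date(1983, 2, 1))
print(train_months, monitor_months, months_until_law)
```

<p>Under that assumption, the model is fitted on 156 months, monitoring covers the remaining 48, and the seat-belt law takes effect in the 14th monitored month.</p>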
<p>Denoting by <i>Y</i><sub>t</sub> the bivariate series (F_KSI, R_KSI), and by <i>X</i><sub>t</sub> <span class='MathJax_Preview'>\(\beta\)</span><script type='math/tex'>\beta</script> the regression effects of logKM and logPrice, the model <i>Y</i><sub>t</sub> = <i>X</i><sub>t</sub> <span class='MathJax_Preview'>\(\beta\)</span><script type='math/tex'>\beta</script> + <span class='MathJax_Preview'>\(\mu\)</span><script type='math/tex'>\mu</script><sub>t</sub> + <span class='MathJax_Preview'>\(\gamma\)</span><script type='math/tex'>\gamma</script><sub>t</sub> + <span class='MathJax_Preview'>\(\epsilon\)</span><script type='math/tex'>\epsilon</script><sub>t</sub> has been shown to be a reasonable model for this initial portion of the series. Here <span class='MathJax_Preview'>\(\mu\)</span><script type='math/tex'>\mu</script><sub>t</sub> denotes the mean level, <span class='MathJax_Preview'>\(\gamma\)</span><script type='math/tex'>\gamma</script><sub>t</sub> denotes the monthly seasonal component, and <span class='MathJax_Preview'>\(\epsilon\)</span><script type='math/tex'>\epsilon</script><sub>t</sub> denotes the noise component (all the terms in the model are bivariate).  The data table mycas.train contains the initial portion of the series.  The following statements fit this model and create the score store, mycas.inscore:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="sas" style="font-family:monospace;">   <span style="color: #000080; font-weight: bold;">proc cssm</span> <span style="color: #000080; font-weight: bold;">data</span>=mycas.train;
      id <span style="color: #0000ff;">date</span> interval=<span style="color: #0000ff;">month</span>;
      state wn<span style="color: #66cc66;">&#40;</span><span style="color: #2e8b57; font-weight: bold;">2</span><span style="color: #66cc66;">&#41;</span> type=WN cov<span style="color: #66cc66;">&#40;</span>g<span style="color: #66cc66;">&#41;</span>;
      comp wn1 = wn<span style="color: #66cc66;">&#91;</span><span style="color: #2e8b57; font-weight: bold;">1</span><span style="color: #66cc66;">&#93;</span>;
      comp wn2 = wn<span style="color: #66cc66;">&#91;</span><span style="color: #2e8b57; font-weight: bold;">2</span><span style="color: #66cc66;">&#93;</span>;
      state meanLevel<span style="color: #66cc66;">&#40;</span><span style="color: #2e8b57; font-weight: bold;">2</span><span style="color: #66cc66;">&#41;</span> type=RW cov<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">rank</span>=<span style="color: #2e8b57; font-weight: bold;">1</span><span style="color: #66cc66;">&#41;</span> checkbreak;
      comp f_KSI_Level = meanLevel<span style="color: #66cc66;">&#91;</span><span style="color: #2e8b57; font-weight: bold;">1</span><span style="color: #66cc66;">&#93;</span>;
      comp r_KSI_Level = meanLevel<span style="color: #66cc66;">&#91;</span><span style="color: #2e8b57; font-weight: bold;">2</span><span style="color: #66cc66;">&#93;</span>;
      state season<span style="color: #66cc66;">&#40;</span><span style="color: #2e8b57; font-weight: bold;">2</span><span style="color: #66cc66;">&#41;</span> type=season<span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">length</span>=<span style="color: #2e8b57; font-weight: bold;">12</span><span style="color: #66cc66;">&#41;</span>;
      comp f_KSI_season = season<span style="color: #66cc66;">&#91;</span><span style="color: #2e8b57; font-weight: bold;">1</span><span style="color: #66cc66;">&#93;</span>;
      comp r_KSI_season = season<span style="color: #66cc66;">&#91;</span><span style="color: #2e8b57; font-weight: bold;">2</span><span style="color: #66cc66;">&#93;</span>;
      model f_KSI = f_KSI_Level logKM logPrice f_KSI_season wn1;
      model r_KSI = r_KSI_Level logKM logPrice r_KSI_season wn2;
      <span style="color: #0000ff;">output</span> break<span style="color: #66cc66;">&#40;</span>alpha=<span style="color: #2e8b57; font-weight: bold;">0.01</span><span style="color: #66cc66;">&#41;</span> ao<span style="color: #66cc66;">&#40;</span>alpha=<span style="color: #2e8b57; font-weight: bold;">0.01</span><span style="color: #66cc66;">&#41;</span>;
      score out<span style="color: #66cc66;">&#40;</span><span style="color: #2e8b57; font-weight: bold;">8</span><span style="color: #66cc66;">&#41;</span>=mycas.inscore;
   <span style="color: #000080; font-weight: bold;">run</span>;</pre></td></tr></table></div>

<p>The score store, mycas.inscore, saves the following information:</p>
<ul>
<li>Model specification and parameter estimates
<ul>
<li>This also includes the user-specified outlier and structural break detection options such as the CHECKBREAK option in the specification of the bivariate mean level, and the significance levels of the structural break and outlier detection tests (0.01 for both types of tests in this case).</li>
</ul>
</li>
<li>Estimate of the latent state vector (and its covariance), which serves as a bridge between the past and the future.</li>
<li>The last 8 rows, from May to December of 1981, of the input data table mycas.train. This is because we used the OUT(k)= form of the OUT option in the SCORE statement.   This user-supplied, 8-observation window is used during the stability monitoring.</li>
</ul>
<p>With the information saved in the score store, mycas.inscore, neither the model specification nor the historical data (mycas.train) is needed to perform the stability monitoring of future observations. During the stability monitoring, new observations arrive one at a time. Each new observation is read into a one-row table, mycas.nextObs, which, in turn, is scored:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="sas" style="font-family:monospace;">   <span style="color: #000080; font-weight: bold;">proc cssm</span> <span style="color: #000080; font-weight: bold;">data</span>=mycas.nextObs;
      score <span style="color: #0000ff;">in</span>=mycas.inscore out=mycas.outscore;
   <span style="color: #000080; font-weight: bold;">run</span>;</pre></td></tr></table></div>

<p>This simple scoring step accomplishes two key operations:</p>
<ul>
<li>The 9 observations, which are formed by joining the new observation with the 8-observation sliding window of the preceding observations that is stored in mycas.inscore, are scanned for any structural breaks or additive outliers</li>
<li>An updated score store, mycas.outscore, is created by utilizing the new observation</li>
</ul>
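<p>The bookkeeping behind these two operations can be pictured with a small Python sketch. It is only an illustration, not how PROC CSSM is implemented: the <code>ScoreStore</code> class and the <code>score</code> function are hypothetical names, the store here keeps only the sliding window (a real score store also holds the model and the latent state estimate), and no break or outlier tests are run.</p>

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ScoreStore:
    """Toy stand-in for a score store: keeps only the sliding window.

    A real score store also holds the model specification, parameter
    estimates, and the latent state estimate; those are omitted here.
    """
    window_size: int = 8
    window: deque = field(default_factory=deque)


def score(in_store, new_obs):
    """Return (observations to scan, updated store) for one new observation."""
    # Join the stored window with the new observation (8 + 1 = 9 rows).
    to_scan = list(in_store.window) + [new_obs]
    # The updated store keeps only the most recent `window_size` rows.
    out_window = deque(to_scan[-in_store.window_size:])
    return to_scan, ScoreStore(in_store.window_size, out_window)


# Month labels stand in for the rows of the input table.
inscore = ScoreStore(8, deque(["MAY81", "JUN81", "JUL81", "AUG81",
                               "SEP81", "OCT81", "NOV81", "DEC81"]))
for month in ["JAN82", "FEB82"]:
    scanned, outscore = score(inscore, month)
    inscore = outscore  # swap: the next observation uses the latest store
```

<p>Each pass scans nine rows and hands an eight-row window to the next pass, so the amount of history that is carried forward stays constant no matter how long the monitoring runs.</p>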
<p>After each scoring step, the two score stores (mycas.inscore and mycas.outscore)  are swapped so the new observation always gets scored using the latest information. See <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_018/casecon/casecon_cssm_examples20.htm">SAS Help Center: Scoring: Monitoring Streaming Data</a> for a complete code illustration of these steps.   In our example, stability monitoring starts in January 1982.  The main findings of the monitoring process are as follows:</p>
<ul>
<li>No breaks in the mean pattern or additive outliers are found between January 1982 and January 1983</li>
<li>Starting with February 1983, for the next several months, the structural break/outlier detection tests flag February 1983 as an unusual month for two reasons:
<ul>
<li>A drop in the mean level of F_KSI, which is called a level shift (LS)</li>
<li>The unusually low value of F_KSI, which is called an additive outlier (AO)</li>
</ul>
</li>
</ul>
<p>The following table shows a summary of this monitoring process:</p>
<ul>
<li>New_Obs column shows the month of the observation that was scored</li>
<li>AO_Z and LS_Z columns show the Z-statistics for these AO and LS tests, respectively</li>
</ul>
<table style="height: 213px" border="1" width="373" cellpadding="0">
<tbody>
<tr>
<td align="center"><strong>New_Obs</strong></td>
<td align="center"><strong>AO_Z</strong></td>
<td align="center"><strong>LS_Z</strong></td>
</tr>
<tr>
<td align="center">FEB83</td>
<td align="center">-7.13</td>
<td align="center">-7.13</td>
</tr>
<tr>
<td align="center">MAR83</td>
<td align="center">-6.57</td>
<td align="center">-8.83</td>
</tr>
<tr>
<td align="center">APR83</td>
<td align="center">-6.43</td>
<td align="center">-9.19</td>
</tr>
<tr>
<td align="center">MAY83</td>
<td align="center">-6.22</td>
<td align="center">-9.89</td>
</tr>
<tr>
<td align="center">JUN83</td>
<td align="center">-5.97</td>
<td align="center">-10.63</td>
</tr>
<tr>
<td align="center">JUL83</td>
<td align="center">-5.84</td>
<td align="center">-11.41</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>Clearly, the Z-statistics for both AO and LS are highly significant in all the months between February and July.  However, the significance of LS increases steadily whereas the significance of AO decreases steadily.  This strongly points to February 1983 being a level shift for F_KSI, rather than a one-off value.</p>
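<p>The reasoning in the previous paragraph can be checked mechanically. The following Python sketch copies the Z-statistics from the summary table and tests whether the LS evidence grows while the AO evidence fades. The helper names and the "level shift" versus "inconclusive" rule are illustrative assumptions, not part of PROC CSSM.</p>

```python
# Z-statistics copied from the monitoring summary table.
summary = [
    ("FEB83", -7.13, -7.13),
    ("MAR83", -6.57, -8.83),
    ("APR83", -6.43, -9.19),
    ("MAY83", -6.22, -9.89),
    ("JUN83", -5.97, -10.63),
    ("JUL83", -5.84, -11.41),
]

ao_abs = [abs(ao) for _, ao, _ in summary]
ls_abs = [abs(ls) for _, _, ls in summary]


def strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    """True when every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


ls_strengthens = strictly_increasing(ls_abs)
ao_weakens = strictly_decreasing(ao_abs)
# Illustrative decision rule: growing LS evidence plus fading AO evidence.
verdict = "level shift" if ls_strengthens and ao_weakens else "inconclusive"
print(verdict)
```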
<p>Usually when a structural break, such as a shift in the mean level, is detected, the monitoring is suspended. Before the monitoring can resume, you must either update the model to account for the break or address the cause of the break by making appropriate changes to the mechanics of the observation process. In this traffic monitoring case, the monitoring can resume after the model is updated to account for the effect of the seat-belt law. In this example, a break in the mean level was detected. You can use PROC CSSM (or PROC SSM) to <a href="http://support.sas.com/resources/papers/proceedings17/SAS0456-2017.pdf">detect more general types of breaks</a>.</p>
<h2>Conclusion</h2>
<p>Streaming data are everywhere, and they are usually monitored without explicit human intervention. SAS Viya provides excellent tools for dealing with such data (for example, <a href="https://www.sas.com/en_us/software/analytics-iot.html">SAS Analytics for IoT</a>). The new scoring capability in PROC CSSM (or the underlying CAS actions ssmFit and ssmScore) is designed to fit neatly into such applications.</p>
<a href="https://www.sas.com/en_us/software/viya.html" class="sc-button sc-button-default"><span>
<span class="btnheader">LEARN MORE |</span> SAS Viya </span></a>
<a href="https://www.sas.com/en_us/software/econometrics.html" class="sc-button sc-button-default"><span>
<span class="btnheader">LEARN MORE |</span> SAS Econometrics </span></a>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/26/using-state-space-models-for-the-stability-monitoring-of-streaming-data/">Using State Space Models for the Stability Monitoring of Streaming Data</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/10/26/using-state-space-models-for-the-stability-monitoring-of-streaming-data/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/traffic-150x150.jpg" />
	</item>
		<item>
		<title>How SAS developed a digital twin of a supply chain</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/10/18/how-sas-developed-a-digital-twin-of-a-supply-chain/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/10/18/how-sas-developed-a-digital-twin-of-a-supply-chain/#respond</comments>
		
		<dc:creator><![CDATA[Bahar Biller]]></dc:creator>
		<pubDate>Mon, 18 Oct 2021 12:30:44 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Analytics R&D]]></category>
		<category><![CDATA[digital twin]]></category>
		<category><![CDATA[risk management]]></category>
		<category><![CDATA[SAS Optimization]]></category>
		<category><![CDATA[SAS Simulation Studio]]></category>
		<category><![CDATA[SAS Visual Analytics]]></category>
		<category><![CDATA[SAS Visual Data Mining and Machine Learning]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[supply chain]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9606</guid>

					<description><![CDATA[<p>SAS' Bahar Biller, an operations researcher, details how to develop a supply chain digital twin.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/18/how-sas-developed-a-digital-twin-of-a-supply-chain/">How SAS developed a digital twin of a supply chain</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Managing supply chain network operations is a key challenge for SAS customers. This is because inventory holding and logistics costs have a major impact on a company’s profits. Furthermore, supply chain complexity, market uncertainty, and operational risk make it difficult to determine profit-optimal decisions. The following describes how we overcame this challenge by creating a digital twin of the supply chain enabling profit-optimal decisions under uncertainty. We will present a high-level description of the supply chain digital twin development at SAS.</p>
<h2>What is a supply chain digital twin?</h2>
<aside class="modern-quote pull alignright">A supply chain digital twin is a virtual representation of real-world supply-chain entities and processes.</aside>
<p>Envision the following use case. A major consumer packaged goods (CPG) client wanted to:</p>
<ul>
<li>predict product shortages</li>
<li>identify the bottlenecks that cause the supply chain to experience low fill rates</li>
<li>determine the corrective actions to take to reduce the risk of future shortages</li>
</ul>
<p>In other words, could we build a digital twin of the supply chain to predict problems? And could we identify solutions to fix future problems before they happened?</p>
<p>A supply chain digital twin is a virtual representation of real-world supply-chain entities and processes. Its digital and physical counterparts are synchronized at a specified frequency and fidelity (Digital Twin Consortium 2021). The expectation of the digital twin is that it will provide enhanced visibility into the future of supply chain operations. It also enables playing operational “what-if” games before the implementation of a decision or a policy.</p>
<h2>How to build a supply chain digital twin</h2>
<p>We custom-built a digital twin for this client through the integrated use of several SAS applications to represent, visualize, simulate, analyze, and optimize the supply chain operations:</p>
<ul>
<li>SAS DATA and PROC steps retrieved, wrangled, and analyzed supply chain data</li>
<li><a href="https://www.sas.com/en_us/software/visual-analytics.html">SAS<sup>®</sup> Visual Analytics</a> visualized the supply chain</li>
<li><a href="https://www.sas.com/en_us/software/simulation-studio.html">SAS<sup>®</sup> Simulation Studio</a> simulated the supply chain</li>
<li><a href="https://www.sas.com/en_us/software/visual-data-mining-machine-learning.html">SAS<sup>®</sup> Visual Data Mining and Machine Learning</a> accelerated the “what-if” games</li>
<li><a href="https://www.sas.com/en_us/software/optimization.html">SAS<sup>®</sup> Optimization</a> helped to identify the corrective actions</li>
</ul>
<p>As a result, our client is now equipped with the power to continuously optimize their supply chain using a digital twin.</p>
<h2>Supply chain network flow</h2>
<p>The development of a supply chain digital twin starts with the description of the supply chain network flow. Motivated by the CPG industry, Figure 1 shows the flow of a multitiered network composed of suppliers, factories, warehouses, and customers. There is also a transportation channel between each pair of tiers.</p>
<p>At first glance, the illustration might indicate one supplier, one factory, one warehouse, and one customer pool. Nevertheless, at the heart of the supply chain digital twin development lies a flexible supply chain simulation model that is data-driven and scalable with the number of suppliers, factories, warehouses, and customers. Furthermore, the flow might appear to be moving in a single direction. However, the underlying information flow is much more complicated. Warehouses replenish finished goods inventory from factories, whereas factories replenish component inventory from their suppliers.</p>
<p>Each of these events must happen at the right time for the right quantities. Thus, there is a complex logic to the network flow moving both forward and backward. It requires the implementation of plans to manage suppliers, production, transportation, and inventory throughout the supply chain. Figure 1 also shows fill rate and total cost as two generic key performance indicators (KPIs). The goal of the supply chain digital twin is to mimic this flow in order to:</p>
<ul>
<li>predict various KPIs and gain visibility into the future of the supply chain operations</li>
<li>assess the impact of operational policies and investment decisions in a virtual environment</li>
<li>stress test the supply chain and identify the best courses of action to take when faced with disruptions</li>
</ul>
<div id="attachment_9609" style="width: 1456px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-1.png"><img aria-describedby="caption-attachment-9609" loading="lazy" class="wp-image-9609 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-1.png" alt="Digital twin of a supply chain - illustration of supply chain network flow" width="1446" height="621" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-1.png 1446w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-1-300x129.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-1-1024x440.png 1024w" sizes="(max-width: 1446px) 100vw, 1446px" /></a><p id="caption-attachment-9609" class="wp-caption-text">Figure 1 An illustration of supply chain network flow</p></div>
<p>Once the supply chain network configuration is identified, the next step is to collect all relevant information and data which fully describe the supply chain. A generic list of input data sets includes:</p>
<ul>
<li>supply contracts and supplier data</li>
<li>initial inventory</li>
<li>production plan</li>
<li>customer demand</li>
<li>transportation details</li>
<li>inventory control policies</li>
<li>supply chain cost parameters</li>
<li>characterization of disruptive events</li>
</ul>
<p>The disruptive event list could include loss of inventory and/or capacity due to natural disasters or disturbances such as fires, site contaminations, and equipment shutdowns. However, the details of data collection and definition are dependent on the types of decisions that the supply chain digital twin supports. They are further affected by the speed of decision-making.</p>
<p>A production plan is an example of the input data that are used to build a supply chain digital twin. Figure 2, created with SAS Visual Analytics, displays the daily production quantity scheduled on each production line in two factories of the production channel. It further illustrates the division of the production plan into hourly buckets with pre-specified production start and end times.</p>
<p>You might wonder where this production plan comes from. It comes from a production plan optimization solved using SAS Optimization. The supply chain network simulation takes the optimized production plan as an input. It then uses SAS Simulation Studio to quantify the risk of implementing this plan under uncertainty. SAS DATA and PROC steps provide feedback on how the production plan can be improved through an in-depth analysis of the simulation-generated output data. This is how we bring optimization and simulation into a closed loop that reflects continuous learning in digital twin development.</p>
<div id="attachment_9612" style="width: 1267px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-2.png"><img aria-describedby="caption-attachment-9612" loading="lazy" class="wp-image-9612 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-2.png" alt="" width="1257" height="607" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-2.png 1257w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-2-300x145.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-2-1024x494.png 1024w" sizes="(max-width: 1257px) 100vw, 1257px" /></a><p id="caption-attachment-9612" class="wp-caption-text">Figure 2 Visualization of a production plan over the course of a week</p></div>
<h2>Supply chain KPI prediction</h2>
<p>Combining supply chain network flow with all necessary pieces of information and data completes the characterization of the supply chain logic. This logic is represented in SAS Simulation Studio to mimic the flow of thousands of products. These products start as raw materials and end as finished goods, moving through a network of suppliers, factories, warehouses, and customers with transportation fleets operating among the different tiers of the supply chain. The logic is represented at the level of detail needed to support the way the client plans to use the supply chain digital twin. Furthermore, the supply chain digital twin design aligns with the speed of decision-making.</p>
<p>The representation of the supply chain logic in SAS Simulation Studio is followed by the characterization of the uncertainty in the random supply chain inputs. Examples of random inputs include supplier lead time, production and assembly times, transportation time, and customer demand size. Figure 3 illustrates two example probability density functions, obtained from historical data, that represent the randomness in the production-time and transportation-time processes. SAS Visual Analytics plots each of these density functions, with the x-axis showing the possible values that production time or transportation time might take and the y-axis showing how likely each of these values is to be observed.</p>
<p>During the simulation of the supply chain network flow, SAS Simulation Studio captures input uncertainty by sampling from these density functions, propagates it throughout the supply chain network, and embeds the effect of uncertainty into the simulation-generated output data.</p>
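<p>The idea of sampling random inputs and propagating them to an output can be sketched in a few lines of Python. This is a deliberately tiny illustration and not the SAS Simulation Studio model: the gamma and triangular densities, their parameters, and the single "delivery time" output are all assumptions made for the example.</p>

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def sample_production_hours():
    # Hypothetical right-skewed density: gamma with mean 2 * 3 = 6 hours.
    return rng.gammavariate(2.0, 3.0)


def sample_transport_hours():
    # Hypothetical density: triangular between 4 and 12 hours, mode 6.
    return rng.triangular(4.0, 12.0, 6.0)


def simulate_order():
    # Propagate both sources of uncertainty into one output: time to deliver.
    return sample_production_hours() + sample_transport_hours()


# Each replication is one possible future for a single order.
delivery_hours = [simulate_order() for _ in range(10_000)]
mean_hours = statistics.mean(delivery_hours)
p95_hours = statistics.quantiles(delivery_hours, n=20)[-1]  # 95th percentile
print(f"mean={mean_hours:.1f} h, 95th percentile={p95_hours:.1f} h")
```

<p>Because every replication draws fresh input values, the spread of the simulated output reflects the input uncertainty, which is what allows risk to be read from the output data rather than only an average.</p>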
<p>The next step is to design the “what-if” games. This is done by specifying the levers that the client could be changing in the search for better ways to manage the supply chain. If the purpose of developing the supply chain digital twin is to help improve the management of inventory, then the levers could be chosen as the inventory control parameters. If the goal is to support fleet management, then the levers would likely be chosen as the attributes of the transporters. The lever selection could be further customized to support supply chain reconfiguration decisions.</p>
<p>Next, we execute the supply chain network simulation, which quickly generates vast amounts of data representative of how the supply chain might perform in the future. Figure 3, created with SAS Visual Analytics, uses box-and-whisker plots to illustrate risk profiles for the supply chain fill rate and the total cost. As a result, not only do we predict the expected values of the two key supply chain KPIs, but we also quantify the risk in their predicted values.</p>
<div id="attachment_9618" style="width: 1470px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-3.png"><img aria-describedby="caption-attachment-9618" loading="lazy" class="wp-image-9618 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-3.png" alt="" width="1460" height="575" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-3.png 1460w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-3-300x118.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-3-1024x403.png 1024w" sizes="(max-width: 1460px) 100vw, 1460px" /></a><p id="caption-attachment-9618" class="wp-caption-text">Figure 3 Combining logic, input uncertainty, and game design to predict KPIs</p></div>
<h2>Identification of corrective actions</h2>
<p>Our analysis of the supply chain simulation output data is not limited to the prediction of supply chain fill rate and total cost. The supply chain network simulation is designed to generate time-stamped data that show the flow of every raw material, work-in-process item, and finished good through the supply chain. The data also capture the temporal usage of every resource, including people, machines, facilities, and transportation fleets, across the many possible futures that might be realized. Using SAS DATA and PROC steps to analyze this supply chain output data set yields time-based operational KPIs such as flow times, capacity utilization, and inventory fluctuations. Further analysis of these KPIs results in the identification of the supply chain bottlenecks.</p>
<p>Figure 4, created with SAS Visual Analytics, illustrates how the temporal study of supply chain fill rate together with factory inventory and utilization identifies a specific factory as the bottleneck. This bottleneck causes the supply chain fill rate to fall below 50% between the 6th and 19th months of the time horizon. This simulation-data-driven insight is critical for the identification of a corrective action to help better manage the supply chain under uncertainty.</p>
<p>Consider the following situation. One action to prescribe for the supply chain shortages illustrated in Figure 4 is to add capacity to the overutilized factory. This can be done in three different ways:</p>
<ol>
<li>Purchasing a flexible machine that can perform multiple tasks (solution A).</li>
<li>Allowing weekend production (solution B).</li>
<li>Adding a third shift (solution C).</li>
</ol>
<div id="attachment_9642" style="width: 919px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-4.png"><img aria-describedby="caption-attachment-9642" loading="lazy" class="size-full wp-image-9642" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-4.png" alt="" width="909" height="552" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-4.png 909w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-4-300x182.png 300w" sizes="(max-width: 909px) 100vw, 909px" /></a><p id="caption-attachment-9642" class="wp-caption-text">Figure 4 An illustrative supply chain digital twin KPI dashboard</p></div>
<p>The supply chain digital twin can be used to select which one (or a combination) of these actions to take. Figure 5 illustrates an example (operational) risk profile of the supply chain fill rate for each candidate solution. It suggests that the purchase of a flexible machine (solution A) would not only maximize the supply chain fill rate but also reduce the level of risk exposure. Similar information can be readily obtained from a supply chain digital twin for a wider variety of operational and financial KPIs. This enables the client to make decisions with more information.</p>
<div id="attachment_9645" style="width: 713px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-5.png"><img aria-describedby="caption-attachment-9645" loading="lazy" class="wp-image-9645 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-5.png" alt="Digital twin of a supply chain - optimizing Supply Chain Fill Rate under Uncertainty" width="703" height="382" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-5.png 703w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/biller-digital-supply-chain-twin-figure-5-300x163.png 300w" sizes="(max-width: 703px) 100vw, 703px" /></a><p id="caption-attachment-9645" class="wp-caption-text">Figure 5 Optimizing supply chain fill rate under uncertainty</p></div>
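<p>A comparison of candidate solutions by risk profile can be sketched in Python as follows. The fill-rate numbers are made up for illustration (they are not the client's data or the values behind Figure 5), and the ranking rule, highest median first and narrowest interquartile range as a tie-breaker, is one assumed way to weigh performance against risk.</p>

```python
import statistics

# Hypothetical simulated fill rates (%) per candidate solution; in practice
# these would come from replications of the supply chain simulation.
fill_rates = {
    "A: flexible machine": [93, 95, 96, 94, 97, 95, 96, 94],
    "B: weekend production": [85, 92, 78, 95, 88, 81, 90, 84],
    "C: third shift": [88, 91, 86, 93, 90, 85, 92, 89],
}


def risk_profile(values):
    """Median and interquartile range, the core of a box-and-whisker plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"median": median, "iqr": q3 - q1}


profiles = {name: risk_profile(v) for name, v in fill_rates.items()}
# Prefer a higher median fill rate; break ties with the narrower spread.
best = max(profiles, key=lambda n: (profiles[n]["median"], -profiles[n]["iqr"]))
print(best)
```

<p>With these illustrative numbers, the flexible machine has both the highest median fill rate and the narrowest spread, which mirrors the kind of conclusion described for solution A.</p>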
<h2>Digital twin of a supply chain summary</h2>
<aside class="modern-quote pull alignright">We built digital twins customized to our client's supply chains through the integrated use of multiple SAS applications</aside>
<p>This post covers the development of a supply chain digital twin. We helped a SAS client gain enhanced visibility into the future of supply chain network operations. At the heart of this custom-built digital twin lies a flexible, data-driven, and scalable network simulation. Because this simulation can be viewed as a large supply-chain data generation program, KPI prediction and optimization can be accelerated through integration with SAS Visual Data Mining and Machine Learning.</p>
<p>We built digital twins customized to our client's supply chains through the integrated use of multiple SAS applications that allowed us to:</p>
<ul>
<li>represent and visualize the supply chain operations (SAS Visual Analytics)</li>
<li>simulate them (SAS Simulation Studio)</li>
<li>analyze them (SAS Visual Data Mining and Machine Learning)</li>
<li>optimize them (SAS Optimization)</li>
</ul>
<p>The resulting capability equips our clients with the power to predict supply chain KPIs and to quantify the risk in those predictions. Clients can also identify the corrective actions that maximize profit and improve resiliency. The supply chain digital twin also serves as a virtual laboratory to assess the impact of disruptions, operational policies, and investment decisions.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/18/how-sas-developed-a-digital-twin-of-a-supply-chain/">How SAS developed a digital twin of a supply chain</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/10/18/how-sas-developed-a-digital-twin-of-a-supply-chain/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/supply-chain2-150x150.jpg" />
	</item>
		<item>
		<title>Who else wants to know the mystery behind GANs?</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/10/13/the-mystery-behind-gans/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/10/13/the-mystery-behind-gans/#respond</comments>
		
		<dc:creator><![CDATA[Susan Kahler]]></dc:creator>
		<pubDate>Wed, 13 Oct 2021 15:00:35 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data scientist]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[GAN]]></category>
		<category><![CDATA[StyleGan]]></category>
		<category><![CDATA[Tabular GAN. synthetic data]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9714</guid>

					<description><![CDATA[<p>Generative adversarial networks (GANs) are one of the newer machine learning algorithms that data scientists are tapping into. When I first heard the term, I wondered: how can networks be adversarial? I envisioned networks with swords drawn going at it. Close… but I can assure you that no networks were harmed in the making of this article.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/13/the-mystery-behind-gans/">Who else wants to know the mystery behind GANs?</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Generative adversarial networks (GANs) are one of the newer machine learning algorithms that data scientists are tapping into. When I first heard the term, I wondered: how can networks be adversarial? I envisioned networks with swords drawn going at it. Close… but I can assure you that no networks were harmed in the making of this article.</p>
<p>Let’s break <font color="red"><strong>GAN</strong></font> down further to understand how this algorithm works and dispel the mystery behind it.</p>
<ul>
<li><font color="red"><strong>G</strong></font>enerative model: A statistical model that can generate new data. This includes the distribution of the data.</li>
<li><font color="red"><strong>A</strong></font>dversarial training process: There are two networks involved in training. One network generates the data (the generator) while the other network tries to discriminate (the discriminator) if that data is real or fake. If it is deemed to be fake, the generator is notified and tries to improve on the next batch of generated data. Therefore, the two networks are training against each other, hence the adversarial part.</li>
<li><a href="https://www.sas.com/en_us/insights/analytics/deep-learning.html">Deep learning</a> <font color="red"><strong>N</strong></font>etworks: Deep learning methods use neural network architectures to process data, which is why they are often referred to as deep neural networks.</li>
</ul>
<h2>Why on earth would you want to use a GAN?</h2>
<p>Now that you know what a GAN is, what do you do with it? You may have heard of deepfakes and enjoyed seeing videos of political leaders uttering some unbelievable statements. (Some days, I wonder how we would know the difference!) Other than playing tricks on the world, GANs do have a valuable purpose.</p>
<p>Deep learning models are data-hungry. What if you could just snap your fingers and grow your training data set? Well, GANs can help you create synthetic data for those deep learning models. Synthetic data, or artificial data, serves as proxy data because it maintains the statistical characteristics of the real-world data that it is based on. Synthetic data should generate observations based on existing variable distributions and preserve correlations among the variables in the data set.</p>
<p>Deepfakes typically use image data, and the type of GAN used to create synthetic image data is called a StyleGAN. However, other types of data such as tabular data (think rows and columns of integers, text, etc.) can also be created. The type of GAN used for this is a tabular GAN.</p>
<p>Watch SAS Data Scientist, Brett Wujek, talk about StyleGANs in the SAS Viya Release Highlights (2021.1.2).<br />
<center><br />
<iframe title="StyleGANs, Custom Performance Reports | SAS Viya 2021.1.2" width="702" height="395" src="https://www.youtube.com/embed/23XbbdxgdxM?start=310&feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><br />
</center></p>
<p>I see lots of potential with GANs and synthetic data. Synthetic data allows you to create deep learning models when you may not have previously been able to do so. There simply may not be the volume of data available that is required, especially when you are working with new products or processes. Data may also be expensive and time-consuming to acquire from third-party resources or through data collection methods such as surveys and studies. Synthetic data may also help fill the gaps in underrepresented groups such as customer segments, regions, or even the different driving conditions required by computer vision models for self-driving cars. Lastly, because this data is generated, it does not impact human privacy (think GDPR and personal data sharing regulations) and is less risky should the data be breached.</p>
<p>Remember that while synthetic data has the potential to help us progress with deep learning, the patterns in the synthetic data must be representative of the real data, and this should be verified as an initial step in the modeling process.</p>
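<p>As a rough illustration of that verification step, the sketch below compares column means, standard deviations, and pairwise correlations between a real and a synthetic table. It is a minimal sketch in Python with NumPy, not a SAS feature: the function name <code>compare_real_and_synthetic</code> and the simulated data are assumptions made for this example, and a real check would likely look at full distributions as well.</p>

```python
import numpy as np


def compare_real_and_synthetic(real, synthetic):
    """Return the largest absolute gaps in column means, standard
    deviations, and pairwise correlations between two numeric tables."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    mean_gap = np.max(np.abs(real.mean(axis=0) - synthetic.mean(axis=0)))
    std_gap = np.max(
        np.abs(real.std(axis=0, ddof=1) - synthetic.std(axis=0, ddof=1))
    )
    corr_gap = np.max(
        np.abs(
            np.corrcoef(real, rowvar=False)
            - np.corrcoef(synthetic, rowvar=False)
        )
    )
    return {
        "mean_gap": float(mean_gap),
        "std_gap": float(std_gap),
        "corr_gap": float(corr_gap),
    }


# Hypothetical data: two correlated columns stand in for the "real" table.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
real = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# A faithful synthetic sample keeps the correlation; a poor one loses it.
good_synth = rng.multivariate_normal([0.0, 0.0], cov, size=5000)
bad_synth = rng.normal(size=(5000, 2))

good = compare_real_and_synthetic(real, good_synth)
bad = compare_real_and_synthetic(real, bad_synth)
print("faithful synthetic data:", good)
print("uncorrelated synthetic data:", bad)
```

<p>In this toy setup the second synthetic table matches the means and variances but not the correlation, which is exactly the kind of mismatch that summary statistics per column would miss.</p>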
<h3>Learn more</h3>
<p>GANs are available in the <a href="https://www.sas.com/en_us/software/data-science-offerings.html">SAS Data Science Offerings</a>.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/13/the-mystery-behind-gans/">Who else wants to know the mystery behind GANs?</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/10/13/the-mystery-behind-gans/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/images_GAN_model-150x150.png" />
	</item>
		<item>
		<title>Art or science? Choosing the right regression model</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/10/11/art-or-science-choosing-the-right-regression-model/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/10/11/art-or-science-choosing-the-right-regression-model/#respond</comments>
		
		<dc:creator><![CDATA[Udo Sglavo]]></dc:creator>
		<pubDate>Mon, 11 Oct 2021 13:00:25 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Analytics R&D]]></category>
		<category><![CDATA[peace of mind]]></category>
		<category><![CDATA[SAS Econometrics]]></category>
		<category><![CDATA[SAS Visual Statistics]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9549</guid>

					<description><![CDATA[<p>SAS' Udo Sglavo and Jan Chvosta discuss the power of a regression framework and choosing the correct regression model.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/11/art-or-science-choosing-the-right-regression-model/">Art or science? Choosing the right regression model</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div id="attachment_9471" style="width: 160px" class="wp-caption alignright"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan.jpg"><img aria-describedby="caption-attachment-9471" loading="lazy" class="wp-image-9471 size-thumbnail" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan-150x150.jpg" alt="Photo of Jan Chvosta" width="150" height="150" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan-150x150.jpg 150w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan.jpg 200w" sizes="(max-width: 150px) 100vw, 150px" /></a><p id="caption-attachment-9471" class="wp-caption-text">Jan Chvosta</p></div>
<p><strong><em>Note from <a href="https://blogs.sas.com/content/subconsciousmusings/author/udosglavo/">Udo Sglavo</a></em>: </strong><em>In our <a href="https://blogs.sas.com/content/subconsciousmusings/2021/10/05/which-regression-technique-is-appropriate-for-my-data/">previous post</a>, <a href="https://blogs.sas.com/content/subconsciousmusings/author/janchvosta/">Jan Chvosta</a>, the director of Scientific Computing at SAS, and I discussed the origins of regression analysis and some of the ways it is used today. Now we further discuss the power of the regression framework and how to choose the correct regression model.</em></p>
<h3>Udo: In the previous post we discussed the power of the regression framework. In a way, all practitioners are attempting to accomplish a similar task. They want to choose the best regression model and fit it to the data available. It sounds simple in principle, but deciding what is best can be tricky. How would you go about this in the case of regression analysis?</h3>
<p><strong>Jan</strong>: In principle, this sounds simple but there is much to be considered in the process. In the first post, we talked a lot about linear regression but in reality, there are many more types that need to be considered.  It begins with understanding your data and choosing the right regression analysis type. For example, if your data is binary, logistic regression might be a good choice. There are many questions that need to be asked and decisions that need to be made to choose the right regression model. <a href="https://support.sas.com/en/software/visual-statistics-support.html#documentation">SAS<sup>®</sup> Visual Statistics</a> and <a href="https://support.sas.com/en/software/sas-econometrics.html#documentation">SAS<sup>®</sup> Econometrics</a> documentation can help with these decisions. If you are new to regression analysis, <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/statug/statug_introreg_toc.htm">Introduction to Regression Procedures</a> can help to get you started.</p>
<h3><strong>Udo: We often hear about regression assumptions that are critical for the framework to work. Could you explain what this means? </strong></h3>
<p><strong>Jan</strong>: These assumptions are crucial because they play an important role in determining whether the model will be a success or failure. Each regression type has its own set of assumptions that you need to consider and evaluate. If they are too restrictive, you might need to think about a different model. For example, if you are fitting a linear regression, you need to confirm that:</p>
<ul>
<li>the relation between your response and explanatory variables is linear</li>
<li>your observations are independent of one another</li>
<li>the variance of the residuals is constant for all observations</li>
<li>the residuals are normally distributed</li>
</ul>
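<p>The assumptions listed above can be screened numerically once a model is fitted. The following is a minimal sketch in Python with NumPy rather than a SAS procedure; the function <code>residual_diagnostics</code>, the simulated data, and the choice of summary statistics are assumptions made for illustration. It covers independence, constant variance, and normality of the residuals only roughly, and it does not test linearity, which is usually judged from a residual plot.</p>

```python
import numpy as np


def residual_diagnostics(x, y):
    """Fit y = b0 + b1 * x by least squares and summarize the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted

    # Independence: the Durbin-Watson statistic is near 2 when consecutive
    # residuals are uncorrelated (rows are assumed to be in collection order).
    durbin_watson = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

    # Constant variance: if the residual spread grows or shrinks with the
    # fitted values, |residual| correlates with the fitted values.
    spread_corr = np.corrcoef(np.abs(resid), fitted)[0, 1]

    # Normality: skewness and excess kurtosis are both near 0 for
    # normally distributed residuals.
    z = (resid - resid.mean()) / resid.std()
    skewness = np.mean(z ** 3)
    excess_kurtosis = np.mean(z ** 4) - 3.0

    return {
        "intercept": float(beta[0]),
        "slope": float(beta[1]),
        "durbin_watson": float(durbin_watson),
        "spread_corr": float(spread_corr),
        "skewness": float(skewness),
        "excess_kurtosis": float(excess_kurtosis),
    }


# Hypothetical data that satisfies the assumptions.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=2000)
y_ok = 2.0 + 3.0 * x + rng.normal(size=x.size)

# Hypothetical data whose noise grows with x (non-constant variance).
y_hetero = 2.0 + 3.0 * x + x * rng.normal(size=x.size)

ok = residual_diagnostics(x, y_ok)
hetero = residual_diagnostics(x, y_hetero)
print(ok)
print(hetero)
```

<p>These numbers are screening aids, not formal tests. A clearly non-zero <code>spread_corr</code>, as in the second data set, would be a prompt to look at the residual plot and consider a different model.</p>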
<h3>Udo: What happens if the assumptions are violated?</h3>
<aside class="modern-quote pull alignright">If we choose the wrong regression model, our parameter estimates, inference about them, and even predictions can be all wrong.</aside>
<p><strong>Jan</strong>: I think we can illustrate this with a situation we face in our daily lives. We might assume that there isn’t a traffic jam during our morning commute and therefore predict that we will make it to work on time. If our assumption is wrong, however, and there is a traffic jam, then our prediction that we make it to work on time is also wrong. It is similar in regression analysis. If we choose the wrong regression model, our parameter estimates, inference about them, and even predictions can be all wrong.</p>
<h3><strong>Udo: This seems like a complex problem. Can you provide us with some guidelines on how to choose the correct model?</strong></h3>
<p><strong>Jan</strong>: There are many ways to go about this and fortunately SAS is here to help. SAS software can address a variety of models based on different distributional assumptions and data complexities. It can also handle tall and wide data, and overall help you with your modeling efforts. You have a set of very powerful tools available. But the success of a modeling effort is often decided even before statistical or econometric modeling starts.</p>
<aside class="modern-quote pull alignright">SAS provides robust tools for users to expand their modeling capabilities and predictive modeling power.</aside>
<p>The idea is that you need to understand what you want prior to any analysis, regression or not. You need to ask yourself many questions before you even start fitting a model. For example, you might ask if a simple but interpretable model is more important to you than a more complex model with higher prediction accuracy.</p>
<p>Many questions will point to your data. Are your data observations discrete, count, rare events, independent in time? Do they follow a pattern? Is this analysis univariate or multivariate? In all of these situations, SAS provides robust tools for users to expand their modeling capabilities and predictive modeling power. However, the users play an essential role because they control the modeling process and make important decisions. The questions asked and tools used will also likely depend on the field specialization of the users.</p>
<h3>Udo: Are you suggesting that scientists working in different fields would have a different approach to regression analysis?</h3>
<p><strong>Jan</strong>: Yes, domain knowledge is important. It comes into play when you need to build a sensible and interpretable scientific model. The domain expert tells you what they need from the model. Regression analysis experts (such as statisticians, econometricians, or data scientists) can provide data modeling inputs that are more appropriate to the choice of regression analysis. The domain knowledge is also necessary for correctly interpreting the results. The two need to work hand in hand to achieve a better fit, a better prediction, more robustness, and better interpretability.</p>
<h3>Udo: We touched on many important issues and it is wonderful to hear that we have many regression tools available at SAS. Can you provide a high-level overview of these tools?</h3>
<p><strong>Jan</strong>: Most of our regression tools are available in SAS Visual Statistics and SAS Econometrics. There is certainly more than one way to group these tools. I am going to choose a grouping that follows a frequently used modeling decision process. It also highlights the procedures we have available for each of the areas.</p>
<p>Figure 1 depicts regression procedures in SAS Visual Statistics. They are grouped according to whether the response variable is categorical or continuous, whether the model targets means or quantiles, whether it is a generalized linear model, and whether the approach is nonparametric, semiparametric, or parametric.</p>
<div id="attachment_9570" style="width: 827px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-1.png"><img aria-describedby="caption-attachment-9570" loading="lazy" class="size-full wp-image-9570" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-1.png" alt="Figure 1: Regression Procedures" width="817" height="900" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-1.png 817w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-1-272x300.png 272w" sizes="(max-width: 817px) 100vw, 817px" /></a><p id="caption-attachment-9570" class="wp-caption-text">Figure 1: Regression Procedures</p></div>
<p>Figure 2 depicts regression procedures available in SAS Econometrics. The econometric regression tools can be grouped into count data modeling, cross-sectional data regression, spatial data regression, and panel data regression. The time series regression procedures can be grouped into univariate and multivariate analyses.</p>
<div id="attachment_9573" style="width: 739px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-2.png"><img aria-describedby="caption-attachment-9573" loading="lazy" class="size-full wp-image-9573" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-2.png" alt="Figure 2: SAS Econometric and Time Series Regression Procedures" width="729" height="777" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-2.png 729w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-b-figure-2-281x300.png 281w" sizes="(max-width: 729px) 100vw, 729px" /></a><p id="caption-attachment-9573" class="wp-caption-text">Figure 2: SAS Econometric and Time Series Regression Procedures</p></div>
<h3>Udo: What are some recent additions to SAS Visual Statistics regression analysis?</h3>
<p><strong>Jan</strong>:  Regression analyses have been enhanced in SAS Visual Statistics and SAS Econometrics in many areas. I will just mention several recent ones.</p>
<p><a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/statug/statug_categories_sect005.htm">Causal inference</a> in SAS Visual Statistics is where regression techniques are used to not only model the data but also help in providing valid causal interpretations.</p>
<p>In addition, Bayesian computation regression techniques are implemented in several procedures. This includes the often-used <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/statug/statug_genmod_toc.htm">GENMOD procedure</a> that handles generalized linear regression models. As well, the <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/statug/statug_bglimm_toc.htm">BGLIMM procedure</a> provides regression modeling capabilities to fixed and clustered data. Combining multiple sources of information in a regression analysis brings unique advantages in numerous applications.</p>
<p>Model selection is an inevitable frontier in regression analysis, as the dimensions of the data get to be both incredibly wide and long. <a href="https://support.sas.com/en/software/sas-stat-support.html">SAS/STAT<sup>®</sup></a> provides a broad spectrum of tools to help you select variables in big data. These range from traditional but optimized methods, such as forward and stepwise selection, to shrinkage methods, such as LASSO, SCAD, and MCP, to projective methods, such as principal components regression, to model averaging. These techniques are available in many regression tools, such as linear, GLM, quantile regression, and Cox proportional hazards models, as well as nonparametric and semiparametric additive models.</p>
<h3>Udo: Are there also recent additions to SAS Econometrics regression analysis?</h3>
<p><strong>Jan</strong>: In econometrics, the hidden Markov model (HMM) is an advanced regression tool for time series analysis. In HMMs, there are hidden, unobserved states, and in different states the observations follow different regression models.</p>
<p>State Space Modeling (SSM) is another quickly growing area in time series because almost all kinds of time series models can be written in the state space form. The <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/casecon/casecon_cssm_toc.htm">CSSM procedure</a> was added to SAS Econometrics in the 2020.1.3 release and will soon introduce scenario analysis, forecasting, and monitoring of streamed data.</p>
<p>Spatial data regression available in SAS Econometrics is another relatively new tool that is quickly gaining popularity. I think it is not that hard to imagine that observations are often spatially dependent and omitting that dependence from your model can impact your results.</p>
<h3>Udo: Anything else you would like to share with our readers?</h3>
<p><strong>Jan</strong>: When it comes to regression analysis, there are many procedures and action sets to choose from. There is a lot to learn and explore. Mastering regression analysis and applying it to your research or business problems will certainly have a great impact on meeting your goals and objectives, regardless of your area of interest. With SAS you have a very compelling set of tools available. Learning and exploring them can be fun and can have a great impact on your business.</p>
<p>For more information on spatial econometric modeling:</p>
<ul>
<li><a href="https://blogs.sas.com/content/subconsciousmusings/2021/03/02/spatial-econometric-modeling-unleashes-the-geographic-potential-of-your-data/">Spatial econometric modeling unleashes the geographic potential of your data</a></li>
<li><a href="https://blogs.sas.com/content/subconsciousmusings/2021/08/09/automate-spatial-regression-model-selection-using-proc-cspatialreg/">Automate spatial regression model selection using proc cspatialreg</a></li>
</ul>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/11/art-or-science-choosing-the-right-regression-model/">Art or science? Choosing the right regression model</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/10/11/art-or-science-choosing-the-right-regression-model/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/art-or-science-150x150.jpg" />
	</item>
		<item>
		<title>Which regression technique is appropriate for my data?</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/10/05/which-regression-technique-is-appropriate-for-my-data/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/10/05/which-regression-technique-is-appropriate-for-my-data/#respond</comments>
		
		<dc:creator><![CDATA[Udo Sglavo]]></dc:creator>
		<pubDate>Tue, 05 Oct 2021 05:00:12 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Analytics R&D]]></category>
		<category><![CDATA[linear regression]]></category>
		<category><![CDATA[peace of mind]]></category>
		<category><![CDATA[SAS Econometrics]]></category>
		<category><![CDATA[SAS Visual Statistics]]></category>
		<category><![CDATA[SAS Viya]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9435</guid>

					<description><![CDATA[<p>SAS' Udo Sglavo interviews colleague Jan Chvosta, director of Scientific Computing at SAS, on regression analysis and how it works.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/05/which-regression-technique-is-appropriate-for-my-data/">Which regression technique is appropriate for my data?</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div id="attachment_9471" style="width: 210px" class="wp-caption alignright"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan.jpg"><img aria-describedby="caption-attachment-9471" loading="lazy" class="size-full wp-image-9471" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan.jpg" alt="Photo of Jan Chvosta" width="200" height="200" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan.jpg 200w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-jan-150x150.jpg 150w" sizes="(max-width: 200px) 100vw, 200px" /></a><p id="caption-attachment-9471" class="wp-caption-text">Jan Chvosta</p></div>
<p><em><strong>Note from <a href="https://blogs.sas.com/content/subconsciousmusings/author/udosglavo/">Udo Sglavo</a></strong></em>:<em> In many different areas of data science, we often talk about which regression technique to use. There are many other important tools outside of the regression framework. It seems, though, that regression is the tool of choice. Interestingly, if you mention the word regression to a group of data scientists, it never fails to create an interesting discussion. They will passionately discuss what the concept means to them. However, the discussion will be heavily influenced by the field of their expertise and experience. We all grasp the general concept, but it seems that regression means different things to different groups of practitioners. In the first of two blog posts, I discuss the origins of regression analysis and how it works with <a href="https://blogs.sas.com/content/subconsciousmusings/author/janchvosta/">Jan Chvosta</a>, the director of Scientific Computing at SAS.</em></p>
<h3>Udo: Linear regression has been around for a very long time. Can you discuss its origins?</h3>
<p><strong>Jan</strong>: The idea of regression goes back to the 19<sup>th</sup> century and the research work Sir Francis Galton did concerning the hereditary characteristics of sweet pea plants. In his experiments, he discovered that children of plants with very high weight didn’t necessarily possess the same characteristic. On average they were closer to the average population weight than their parents were. In other words, Galton observed that some extreme characteristics weren’t completely passed to the offspring but instead these characteristics tended to regress towards a standard point. He repeated the experiment hundreds of times, discovering the principle of regression to the mean. In addition to many other areas, modern regression analysis is a powerful tool also used in the field of genetics that Galton was pioneering.</p>
<h3>Udo: Interesting story but how would you explain a regression today?</h3>
<aside class="modern-quote pull alignright">Simply put, linear regression is an approach that allows us to model a relationship between variables in a linear fashion.</aside>
<p><strong>Jan</strong>: If you were to conduct a quick search for books on regression analysis, you would find many different books written on the topic. Taking a closer look, you would realize that one theme is dominant - linear regression. Simply put, linear regression is an approach that allows us to model a relationship between variables in a linear fashion. Response, also known as the dependent variable, is modeled by one or more explanatory (independent) variables. If there is just one explanatory variable, we get a simple linear regression. If there is more than one explanatory variable, we get multiple linear regression. I would start with linear regression.</p>
<h3>Udo: What is the intuition behind linear regression? Can you explain how it works?</h3>
<p><strong>Jan</strong>: In the case of multiple linear regression, it provides a very intuitive explanation of how a unit change in one independent variable (say age) impacts the dependent variable (say weight), holding all other variables (height, gender) in the model constant. The simplicity makes the modeling and its interpretation accessible to a wide range of audiences. It also often works well in applications. The extension from a simple regression to multiple regression leads to the capability of having many covariates jointly predict a response variable. Not only does this frequently work better than simple regressions, but it also provides a better understanding of the association of variables with the outcome. Figure 1 shows the PROC REG output from a multiple linear regression of children's weight on age, height, and gender.</p>
<div id="attachment_9447" style="width: 368px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-1.png"><img aria-describedby="caption-attachment-9447" loading="lazy" class="wp-image-9447 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-1.png" alt="Regression Technique Figure 1: Multiple Linear Regression of Children Weight" width="358" height="159" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-1.png 358w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-1-300x133.png 300w" sizes="(max-width: 358px) 100vw, 358px" /></a><p id="caption-attachment-9447" class="wp-caption-text">Figure 1: Multiple Linear Regression of Children Weight</p></div>
<p>If you hold all other variables constant, it is not surprising that age has a positive impact on weight. Each additional month of age increases the weight of a child by 0.43 lbs. There is also a positive impact of height on weight. A one-inch increase in height increases the weight by 0.49 lbs. Both of these coefficients are significantly different from zero at the 0.01% level of significance. The regression results seem to suggest that if the child is a female, she weighs less by 0.73 lbs. However, the coefficient on the female variable is not significantly different from zero at the 10% level. This small example demonstrates clearly that regression analysis can be a powerful tool.</p>
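<p>To make the "holding all other variables constant" reading concrete, here is a small Python sketch that plugs the three coefficients quoted above into a linear equation. The post does not report the intercept, so <code>INTERCEPT</code> is a hypothetical placeholder and the absolute predictions are not meaningful; only the differences between predictions are. The function name and the example child (140 months, 60 inches) are also assumptions for illustration.</p>

```python
# Coefficients quoted in the text (lbs per unit of each variable).
COEFFICIENTS = {"age_months": 0.43, "height_inches": 0.49, "female": -0.73}

# Hypothetical placeholder: the intercept is not reported in the post.
INTERCEPT = 0.0


def predict_weight(age_months, height_inches, female):
    """Linear prediction: intercept plus coefficient times value, summed."""
    return (
        INTERCEPT
        + COEFFICIENTS["age_months"] * age_months
        + COEFFICIENTS["height_inches"] * height_inches
        + COEFFICIENTS["female"] * female
    )


# Two hypothetical children who differ only in age by one month.
base = predict_weight(age_months=140, height_inches=60, female=0)
one_month_older = predict_weight(age_months=141, height_inches=60, female=0)
age_effect = one_month_older - base

# Two hypothetical children who differ only in height by one inch.
one_inch_taller = predict_weight(age_months=140, height_inches=61, female=0)
height_effect = one_inch_taller - base

print(f"effect of one extra month: {age_effect:.2f} lbs")
print(f"effect of one extra inch: {height_effect:.2f} lbs")
```

<p>Whatever value the intercept takes, it cancels out of each difference, which is why a coefficient can be read as the effect of a one-unit change with everything else fixed.</p>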
<h3>Udo: A picture is often worth a thousand words. Is there a way to show a linear regression with a picture?</h3>
<p><strong>Jan:</strong> You can think of a linear regression as a modeling approach where you fit a line to data points in such a way that it minimizes departures (squared) of all points from this line. The <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_017/statug/statug_reg_toc.htm">REG procedure</a> allows you to fit many different linear and even some nonlinear regression models. Figure 2 shows an example of PROC REG fitted linear regression of weight on age along with information about the data and regression fit.</p>
<div id="attachment_9450" style="width: 650px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-2.png"><img aria-describedby="caption-attachment-9450" loading="lazy" class="wp-image-9450 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-2.png" alt="Regression Technique Figure 2: Fit Plot for Weight" width="640" height="480" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-2.png 640w, https://blogs.sas.com/content/subconsciousmusings/files/2021/10/chvosta-linear-regression-part-a-figure-2-300x225.png 300w" sizes="(max-width: 640px) 100vw, 640px" /></a><p id="caption-attachment-9450" class="wp-caption-text">Figure 2: Fit Plot for Weight</p></div>
<p>Interestingly, even this fit plot can be tied back to some of the work that Sir Francis Galton did. He was also plotting the mother and daughter pea plant data. He was trying to explain the relationship between them by putting a straight line through the data. Without modern software, however, it was a lot of work.</p>
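<p>The idea of fitting a line that minimizes the squared departures of the points can be sketched in a few lines of Python with NumPy. This is an illustration of ordinary least squares in general, not of how PROC REG is implemented; the helper names and the five data points are made up for the example.</p>

```python
import numpy as np


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then x itself.
    design = np.column_stack([np.ones_like(x), x])
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(intercept), float(slope)


def sum_squared_departures(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - (intercept + slope * x)) ** 2))


# Made-up data points lying close to a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

intercept, slope = fit_line(x, y)
best = sum_squared_departures(x, y, intercept, slope)

# Any other line, such as one with a slightly different slope, fits worse.
nudged = sum_squared_departures(x, y, intercept, slope + 0.1)

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"squared departures: fitted line {best:.3f}, nudged line {nudged:.3f}")
```

<p>Comparing the fitted line against a nudged one shows what "best" means here: no other straight line gives a smaller sum of squared departures for these points.</p>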
<h3>Udo: Thank you for describing the principles of linear regression. Can you elaborate more on its use?</h3>
<aside class="modern-quote pull alignright">…the relationship between variables matters and regression analysis can help you to explore it.</aside>
<p><strong>Jan</strong>: At a very high level, the relationship between variables matters, and regression analysis can help you to explore it. Imagine you are running a marketing campaign trying to increase sales of your company. You collected the data from similar campaigns in the past and now you are trying to understand how various marketing options impacted the sales. Fitting a regression can help you to determine what marketing options (explanatory variables) had a significant impact on sales (response variable). It can also help you to predict the sales for various combinations of marketing options and find the best combination to maximize your sales.</p>
<h3>Udo: Are you suggesting that regression analysis can be used for predictive modeling?</h3>
<p><strong>Jan</strong>: Yes, regression analysis can definitely help you if you are interested in predictive modeling. For example, if you picture the regression in linear form with the estimated coefficients from Figure 1, you just need to plug in values for your inputs (age, height, gender) to predict the dependent variable (weight). You can even get predictions for values of age, height, and gender that weren’t available in the dataset you used to estimate your coefficients.</p>
<h3>Udo: Regression analysis has many different aspects. How would you go about learning more?</h3>
<p><strong>Jan</strong>: Regression analysis is a very powerful tool. You need to understand how to correctly use it. <a href="https://support.sas.com/en/software/visual-statistics-support.html#documentation">SAS<sup>®</sup> Visual Statistics</a> and SAS<sup>®</sup> Econometrics documentation is a great resource with many examples. If you are entirely new to regression analysis in SAS, <a href="https://support.sas.com/documentation/onlinedoc/stat/141/introreg.pdf">Introduction to Regression Procedures</a> provides a great overview. In our next post, we further discuss the regression framework and choosing the correct models.</p>
<a href="https://www.sas.com/en_us/software/econometrics.html" class="sc-button sc-button-default"><span>
<span class="btnheader">LEARN MORE |</span> SAS Econometrics </span></a>
<a href="https://www.sas.com/en_us/software/visual-statistics.html" class="sc-button sc-button-default"><span>
<span class="btnheader">LEARN MORE |</span> SAS Visual Statistics </span></a>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/10/05/which-regression-technique-is-appropriate-for-my-data/">Which regression technique is appropriate for my data?</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/advanalytics?a=ydvWlKDFEYQ:C-t0pEE7PiY:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/advanalytics?d=yIl2AUoC8zA" border="0"></img></a>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/10/05/which-regression-technique-is-appropriate-for-my-data/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/10/sweet-peas-2-150x150.jpg" />
	</item>
		<item>
		<title>SAS Leads Market 28 Years in a Row, IDC Advanced &amp; Predictive Analytics Software</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/09/30/sas-leads-market-28-years-in-a-row/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/09/30/sas-leads-market-28-years-in-a-row/#respond</comments>
		
		<dc:creator><![CDATA[Briana Ullman]]></dc:creator>
		<pubDate>Thu, 30 Sep 2021 12:59:51 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[advanced and predictive analytics]]></category>
		<category><![CDATA[banking]]></category>
		<category><![CDATA[IDC]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9420</guid>

					<description><![CDATA[<p>IDC measures advanced &#038; predictive analytics in its annual Worldwide Business Intelligence and Analytics Software Market Shares* report – and has consistently ranked SAS as the #1 market leader for over two decades!</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/30/sas-leads-market-28-years-in-a-row/">SAS Leads Market 28 Years in a Row, IDC Advanced &#038; Predictive Analytics Software</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>28 years ago, Seinfeld was a staple on our televisions and Doc Martens were as popular as the Nirvana CDs flying off shelves. While the 90s may be back again, some good things never left – like SAS fueling advanced & predictive analytics innovation for 28 years running.</p>
<p><a href="https://www.sas.com/en_us/insights/analytics/predictive-analytics.html">Advanced and predictive analytics</a> software includes data mining and statistical software and uses techniques including machine learning, regression, neural networks, rule induction, and clustering to create, test, and execute statistical models.</p>
<p>IDC measures advanced & predictive analytics in its annual Worldwide Business Intelligence and Analytics Software Market Shares* report – and has <a href="https://www.sas.com/en_us/news/press-releases/2020/october/sas-a-leader-in-idc-marketscape-for-advanced-machine-learning-software-platforms.html">consistently ranked SAS as the #1 market leader</a> for over two decades!</p>
<p>Advanced and predictive analytics can be used to discover relationships in data and dig into that data to make predictions. SAS helps customers gain these insights through our analytics technology.</p>
<blockquote><p>
“In the advanced and predictive analytics market, SAS represents over a quarter of the market share, with continued focus on statisticians and data scientists and the broader need for advanced analytics.”<br />
—Dan Vesset, IDC Group Vice President, Analytics and Information Management
</p></blockquote>
<p>Let's take a look at how advanced and predictive analytics from SAS helped a customer achieve their business goals:</p>
<p>OTP Bank Romania is part of OTP Bank Group, one of the largest financial institutions in Central and Eastern Europe. They relied on a combination of data mining and predictive analytics to gain insights from an increasing amount of data – and make sure they’re not missing out on ways to increase customer retention. They were able to quickly develop descriptive and predictive models through a streamlined data mining process.</p>
<p><a href="https://www.sas.com/en_us/customers/otp-bank-romania.html">SAS gave OTP Bank Romania</a> the ability to harness algorithms and techniques including decision trees, time series analysis, neural networks, linear and logistic regression, sequence and web path analysis, market basket analysis and link analysis. The bank relied on SAS as a workbench and a set of tools for statisticians or data scientists, which increased collaboration with analysts to improve their business.</p>
<p>They began using predictive analytics to meet the needs of their modeling team. This allowed the bank to ensure control over the quality of loan originations, achieve more accurate prediction of business and risk outcomes, and meet profitability targets required for the bank’s loan portfolios.<br />
OTP Bank Romania gained the ability to:</p>
<ul>
<li>More easily work with databases. </li>
<li>Conduct new analyses based on performance indicators. </li>
<li>Develop numerous models for a single scope. </li>
<li>Compare results. </li>
<li>Choose the best result to meet the bank’s objectives.</li>
</ul>
<p>SAS brought valuable insights to OTP Bank Romania in addition to 91 of the top 100 companies on the 2020 Fortune Global 500®. A growing part of the business intelligence market, advanced & predictive analytics from SAS can help organizations everywhere achieve their business goals.</p>
<p>Learn more about how <a href="https://www.sas.com/en_us/customers.html">SAS helps customers uncover insights using analytics</a> – and <a href="https://www.sas.com/en_us/trials.html">try SAS for yourself</a>.</p>
<p><small><tt>*IDC Business Intelligence and Analytics Software Market Share Reports, 1993-2021</tt></small></p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/30/sas-leads-market-28-years-in-a-row/">SAS Leads Market 28 Years in a Row, IDC Advanced &#038; Predictive Analytics Software</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/advanalytics?a=o3R5fcISWp8:zPKzIwd78a0:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/advanalytics?d=yIl2AUoC8zA" border="0"></img></a>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/09/30/sas-leads-market-28-years-in-a-row/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/linkedin-sales-solutions-46bom4lObsA-unsplash-150x150.jpg" />
	</item>
		<item>
		<title>Generating word embeddings</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/09/22/generating-word-embeddings/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/09/22/generating-word-embeddings/#respond</comments>
		
		<dc:creator><![CDATA[Sophia Rowland]]></dc:creator>
		<pubDate>Wed, 22 Sep 2021 15:08:07 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ELMo]]></category>
		<category><![CDATA[GloVe]]></category>
		<category><![CDATA[SAS Text Miner]]></category>
		<category><![CDATA[SAS Visual Text Analytics]]></category>
		<category><![CDATA[tmCooccur Action]]></category>
		<category><![CDATA[word embedding]]></category>
		<category><![CDATA[Word2Vec]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9370</guid>

					<description><![CDATA[<p>Word embeddings are the learned representations of words within a set of documents. Each word or term is represented as a real-valued vector within a vector space. Terms or words that reside closer to each other within that vector space are expected to share similar meanings. Thus, embeddings try to capture the meaning of each word or term through its relationships with the other words in the corpus.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/22/generating-word-embeddings/">Generating word embeddings</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Unstructured text data is often rich with information. This information can affect strategies, operations, and decisions in an organization. The information found in text can serve as inputs into machine learning models or visualizations in reports served to decision-makers. But the insights within text data aren't readily available. Text data requires processing to create structured data from documents in a corpus. There are numerous techniques available for text processing and text analytics, but today we will focus on generating word embeddings.</p>
<h2>Generating and using word embeddings</h2>
<p>Word embeddings are the learned representations of words within a set of documents. Each word or term is represented as a real-valued vector within a vector space. Terms or words that reside closer to each other within that vector space are expected to share similar meanings. Thus, embeddings try to capture the meaning of each word or term through its relationships with the other words in the corpus.</p>
<p><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word-Embeddings.png"><img loading="lazy" class="aligncenter size-full wp-image-9373" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word-Embeddings.png" alt="" width="648" height="587" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word-Embeddings.png 648w, https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word-Embeddings-300x272.png 300w" sizes="(max-width: 648px) 100vw, 648px" /></a></p>
<p>Word embeddings are one way to create structured data from text. These embeddings can be used for finding similar terms or in machine translation. Additionally, embeddings can be used in machine learning models to classify sentiment or in document categorization.</p>
<p>There are several word embedding methods available but new techniques seem to move into <a href="https://chatbotslife.com/a-brief-tour-to-the-nlp-sesame-street-7bba02d75ae3" target="_blank" rel="noopener noreferrer">Sesame Street</a> at a rapid pace. Let's move chronologically through the development of a few of the most popular word embedding techniques.</p>
<h2>Early word embedding techniques</h2>
<p>SAS has been doing word embeddings since <a href="https://www.sas.com/en_us/software/text-miner.html" target="_blank" rel="noopener noreferrer">SAS Text Miner</a>, but our approach has changed over the years as new research and techniques have been developed. In SAS Text Miner, word embeddings started with the Term-by-Document matrix. This matrix represents each document in the corpus as a column and each term as a row. The values stored in each cell represent a weighted value of term frequency within a document. Even with a small corpus, this matrix could be large and sparse. A process called Singular Value Decomposition (SVD) can factor this large, sparse matrix into smaller, more information-dense matrices. Performing SVD on the Term-by-Document matrix results in a word embedding matrix as well as a document embedding matrix.</p>
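<p>In matrix terms, the factorization described above can be written roughly as follows. This is a sketch of the standard truncated SVD, with the symbols <em>m</em> (number of terms), <em>n</em> (number of documents), and <em>k</em> (number of retained dimensions) introduced here for illustration; it is not necessarily the exact scaling that SAS Text Miner applies.</p>

```latex
A \approx U_k \Sigma_k V_k^{\top}, \qquad
A \in \mathbb{R}^{m \times n},\quad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{n \times k},\quad
k \ll \min(m, n)
```

<p>Under this convention, each row of \(U_k \Sigma_k\) is a <em>k</em>-dimensional embedding of one term and each row of \(V_k \Sigma_k\) is a <em>k</em>-dimensional embedding of one document. Some implementations drop or rescale the \(\Sigma_k\) factor.</p>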
<p><a href="https://arxiv.org/pdf/1301.3781.pdf" target="_blank" rel="noopener noreferrer">Word2Vec</a> emerged in 2013 through research at Google. Word2Vec takes a prediction-based approach to word embeddings. This approach uses a shallow neural network in one of two configurations. Within both configurations, the context for a word is defined by a predetermined number of terms before and after the given word, also known as a sliding window. In the first configuration, called Continuous Bag of Words (CBOW), the context of a word (i.e. terms around each word) is used as the inputs to a neural network, with the word itself as the output. The second configuration, called Skip-Gram, is like CBOW but reversed. The word itself is input to the neural network and the context is the output.</p>
<div id="attachment_9379" style="width: 814px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word2Vec.png"><img aria-describedby="caption-attachment-9379" loading="lazy" class="size-full wp-image-9379" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word2Vec.png" alt="" width="804" height="491" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word2Vec.png 804w, https://blogs.sas.com/content/subconsciousmusings/files/2021/09/Word2Vec-300x183.png 300w" sizes="(max-width: 804px) 100vw, 804px" /></a><p id="caption-attachment-9379" class="wp-caption-text">Source Mikolov et al. (2013) "Efficient Estimation of Word Representations in Vector Space"</p></div>
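<p>To make the two configurations concrete, here is a minimal sketch of how a sliding window turns a sentence into training examples. It only builds the input/output pairs; it does not train a neural network, and the function names are made up for this illustration.</p>

```python
def context_windows(tokens, window=2):
    """Yield (center_word, context_words) for every position in the token list."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right

def cbow_examples(tokens, window=2):
    # CBOW: the surrounding context is the input, the center word is the output.
    return [(context, center) for center, context in context_windows(tokens, window)]

def skipgram_examples(tokens, window=2):
    # Skip-Gram: the center word is the input, each context word is an output.
    return [(center, ctx)
            for center, context in context_windows(tokens, window)
            for ctx in context]

tokens = "the quick brown fox jumps".split()
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)

print(cbow[1])       # (['the', 'brown'], 'quick')
print(skipgram[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```

<p>The same windows feed both configurations; only the direction of prediction changes.</p>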
<p>In the following year, <a href="https://aclanthology.org/D14-1162.pdf" target="_blank" rel="noopener noreferrer">Global Vectors for Word Representation (GloVe)</a> emerged through research at Stanford. GloVe is an unsupervised approach that uses a co-occurrence matrix. The main idea behind the co-occurrence matrix is that similar words tend to occur together and have a similar context. Instead of using a Term-by-Document matrix, GloVe utilizes a symmetric Term-by-Term matrix, where each row and each column represents a term and each cell holds the co-occurrence count for that pair of terms. The resulting matrix is also large and sparse. Rather than factoring it with SVD or PCA (another dimensionality-reduction technique), GloVe fits the word vectors with a weighted least-squares objective so that the dot product of two word vectors approximates the logarithm of their co-occurrence count. The result is again a smaller, information-dense word embedding. And like Word2Vec, GloVe utilizes a sliding window of terms to calculate co-occurrence.</p>
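<p>The sketch below shows one simple way to build a symmetric Term-by-Term co-occurrence table with a sliding window. It is a simplification for illustration: it uses plain counts and whitespace tokenization, whereas the GloVe paper down-weights pairs that are farther apart, and the function name is hypothetical.</p>

```python
from collections import defaultdict

def cooccurrence_counts(documents, window=2):
    """Count how often two terms appear within `window` tokens of each other."""
    counts = defaultdict(int)
    for doc in documents:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            # Look ahead only, then record both directions to keep the table symmetric.
            for neighbor in tokens[i + 1:i + 1 + window]:
                counts[(word, neighbor)] += 1
                counts[(neighbor, word)] += 1
    return counts

docs = ["the cat sat on the mat", "the dog sat on the log"]
counts = cooccurrence_counts(docs, window=1)

print(counts[("sat", "on")])             # 2
print(counts[("on", "sat")])             # 2
print(counts.get(("cat", "dog"), 0))     # 0
```

<p>Storing only the nonzero pairs in a dictionary is one way to cope with the sparsity mentioned above, since most term pairs never co-occur.</p>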
<p>Word2Vec and GloVe tend to show better results on semantic and syntactic word analogy tasks than the Term-by-Document matrix, but Word2Vec and GloVe don't do the best job on capturing context. Word2Vec and GloVe generate a single embedding for each word, which isn't great for words with the same spellings but different meanings. Thus, starting in 2018 we see several new methods materialize.</p>
<h2>A growing number of techniques</h2>
<p>In 2018, SAS released the <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casvtapg/cas-textutil-tmcooccur.htm" target="_blank" rel="noopener noreferrer">tmCooccur Action</a> in <a href="https://www.sas.com/en_us/software/visual-text-analytics.html" target="_blank" rel="noopener noreferrer">SAS Visual Text Analytics</a>. Expanding on GloVe, this method utilizes a sentence-level context to calculate term co-occurrence and can utilize the results of <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casvtapg/cas-textparse-tpparse.htm" target="_blank" rel="noopener noreferrer">SAS's text parsing action</a> to better capture context and multi-word phrases such as noun groups and entities.</p>
<p><a href="https://arxiv.org/pdf/1802.05365.pdf" target="_blank" rel="noopener noreferrer">Embeddings from Language Models (ELMo)</a> is another method that also uses the whole sentence to provide context. Unlike previously mentioned methods, ELMo is a pre-trained bi-directional Long Short-Term Memory (LSTM) model. LSTM is a type of Recurrent Neural Network (RNN). Recurrent Neural Networks are often used in text analytics because they can model sequential data through neuron feedback connections. LSTM adds a few components to RNN models to govern the neuron's output.</p>
<p><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/ELMO.png"><img loading="lazy" class="aligncenter size-full wp-image-9382" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/ELMO.png" alt="" width="629" height="293" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/ELMO.png 629w, https://blogs.sas.com/content/subconsciousmusings/files/2021/09/ELMO-300x140.png 300w" sizes="(max-width: 629px) 100vw, 629px" /></a></p>
<p>Another pre-trained model is the <a href="https://arxiv.org/pdf/1810.04805v2.pdf" target="_blank" rel="noopener noreferrer">Bidirectional Encoder Representations from Transformers (BERT)</a> released by Google. Like ELMo, BERT can generate a different embedding for each term based on its context. Unlike ELMo, BERT uses a Transformer. Transformers are another type of neural network that handles sequential data well, but they don't have to process the data in order. For example, a Transformer may process the end of a sentence before the beginning.</p>
<div id="attachment_9385" style="width: 1026px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/BERT-and-ELMO.png"><img aria-describedby="caption-attachment-9385" loading="lazy" class="size-full wp-image-9385" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/BERT-and-ELMO.png" alt="" width="1016" height="328" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/BERT-and-ELMO.png 1016w, https://blogs.sas.com/content/subconsciousmusings/files/2021/09/BERT-and-ELMO-300x97.png 300w" sizes="(max-width: 1016px) 100vw, 1016px" /></a><p id="caption-attachment-9385" class="wp-caption-text">Source Devlin et al. (2019) "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"</p></div>
<h2>The best word embedding technique</h2>
<p>There are certainly a lot of different word embedding techniques. But just like prediction and classification models, there isn't a technique that is best for all use cases and data sets. In <a href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3295-2019.pdf" target="_blank" rel="noopener noreferrer">benchmarking tests</a>, one approach may be better for word similarity tasks whereas another may be better for document categorization. With that in mind, experiment and test! If you've generated a word embedding using open source that you want to use on SAS Viya, you can bring it in using the <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casvtapg/n047bll9q5h0nln1uulc1qpo0b2y.htm" target="_blank" rel="noopener noreferrer">Word Vector Action</a>. Additionally, using the Scripting Wrapper for Analytics Transfer (SWAT) package, you can run this action and other analytics in SAS Viya from Python or R. With this new knowledge <em>embedded </em>in your mind, I hope you can make the most of your text data!</p>
<h2>Learn more about word embeddings</h2>
<ul>
<li><a href="https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3295-2019.pdf" target="_blank" rel="noopener noreferrer">The Wondrous New tmCooccur SAS® Cloud Analytic Services (CAS) Action and Some of Its Many Uses</a></li>
<li><a href="https://medium.com/analytics-vidhya/co-occurrence-matrix-singular-value-decomposition-svd-31b3d3deb305" target="_blank" rel="noopener noreferrer">Co-occurrence matrix &amp; Singular Value Decomposition (SVD)</a></li>
<li><a href="https://medium.com/data-science-group-iitr/word-embedding-2d05d270b285" target="_blank" rel="noopener noreferrer">Word embedding</a></li>
<li><a href="https://www.geeksforgeeks.org/word-embeddings-in-nlp/" target="_blank" rel="noopener noreferrer">Word Embeddings in NLP</a></li>
<li><a href="https://medium.com/compassred-data-blog/introduction-to-word-embeddings-and-its-applications-8749fd1eb232" target="_blank" rel="noopener noreferrer">Introduction to Word Embeddings and its Application</a></li>
<li><a href="https://go.documentation.sas.com/doc/en/tmref/14.3/n0wkpj99c809qln1sdf48w1d35lc.htm" target="_blank" rel="noopener noreferrer">Singular Value Decomposition</a></li>
<li><a href="https://www.geeksforgeeks.org/overview-of-word-embedding-using-embeddings-from-language-models-elmo/#:~:text=%20Embeddings%20from%20Language%20Models%20%28ELMo%29%20%3A%20,the%20complete%20sentence%20containing%20that%20word.%20More%20" target="_blank" rel="noopener noreferrer">Overview of Word Embedding using Embeddings from Language Models (ELMo)</a></li>
<li><a href="https://towardsdatascience.com/elmo-contextual-language-embedding-335de2268604" target="_blank" rel="noopener noreferrer">ELMo: Contextual language embedding</a></li>
</ul>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/22/generating-word-embeddings/">Generating word embeddings</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/advanalytics?a=gYDiDV84obc:kvoKz8QJPhw:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/advanalytics?d=yIl2AUoC8zA" border="0"></img></a>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/09/22/generating-word-embeddings/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/09/ELMO-150x150.png" />
	</item>
		<item>
		<title>Causal inference and policy evaluation with deep neural networks</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/09/07/causal-inference-and-policy-evaluation-with-deep-neural-networks/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/09/07/causal-inference-and-policy-evaluation-with-deep-neural-networks/#respond</comments>
		
		<dc:creator><![CDATA[Xilong Chen]]></dc:creator>
		<pubDate>Tue, 07 Sep 2021 12:30:39 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Analytics R&D]]></category>
		<category><![CDATA[PROC DEEPCAUSAL]]></category>
		<category><![CDATA[SAS Econometrics]]></category>
		<category><![CDATA[SAS Viya]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=9157</guid>

					<description><![CDATA[<p>SAS' Xilong Chen introduces the new DEEPCAUSAL procedure in SAS Econometrics for causal inference and policy evaluation and much more.</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/07/causal-inference-and-policy-evaluation-with-deep-neural-networks/">Causal inference and policy evaluation with deep neural networks</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In this post, I will introduce the new <a href="https://documentation.sas.com/doc/en/pgmsascdc/default/casecon/casecon_deepcausal_toc.htm">DEEPCAUSAL procedure</a> in <a href="https://support.sas.com/en/software/sas-econometrics.html">SAS Econometrics</a> for causal inference and policy evaluation. It was introduced in the 2021.1.4 release of <a href="https://www.sas.com/en_us/software/viya.html">SAS Viya 4</a>. First, I review causal modeling and its challenges. Second, I discuss how machine learning techniques embedded in the semiparametric framework can help us to overcome some of these difficulties. Finally, I demonstrate the powerful and easy-to-use PROC DEEPCAUSAL.</p>
<h2>Overview of causal inference</h2>
<p>Cause-and-effect relationships have been puzzling humans for centuries. Which came first – the chicken or the egg? Whenever you ask why or what-if questions, you are indirectly trying to find answers about cause and effect. Causal inference can help you in that effort. It is the new science of methods and tools for identifying causes and measuring their effects.</p>
<p>In measuring the causal effect, the randomized experiment is always the gold standard. However, in most cases, the randomized experiment is too expensive or even impossible to conduct. For example, if you would like to know the effect smoking has on developing lung cancer, you can’t randomly assign someone to smoke. If you would like to know how having a college degree affects salary, you can’t randomly send someone to college. In real life, most causal analysis is based on observational data!</p>
<h2>Big-data challenge and machine learning tools</h2>
<p>Before discussing the challenges of estimating causal effects based on observational data, let’s first focus on the data itself. In the era of big data, it’s not surprising to see tens or even hundreds of thousands of variables that are used to describe an object of interest (customer, patient, area, store). Those variables can be categorical (gender, race), count (number of siblings), or continuous (spending amount on a product last year). In this scenario, two variables are of main interest: the outcome variable, denoted as <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script>, and the treatment variable, denoted as <span class='MathJax_Preview'>\(\mathit{T}\)</span><script type='math/tex'>\mathit{T}</script>. For instance, in some personalized pricing cases, <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script> could represent demand, and <span class='MathJax_Preview'>\(\mathit{T}\)</span><script type='math/tex'>\mathit{T}</script> could represent whether a discount was offered or not. A causal model typically studies the relationship between these two variables.</p>
<p>Later in this post, we will discuss the other variables (denoted as <span class='MathJax_Preview'>\(\mathit{X}\)</span><script type='math/tex'>\mathit{X}</script>), which might also be critical in estimating the causal effect. However, incorporating those high-dimensional, mixed discrete-and-continuous variables (<span class='MathJax_Preview'>\(\mathit{X}\)</span><script type='math/tex'>\mathit{X}</script>) into the estimation is challenging, especially when the relationships among all these variables take some unknown nonlinear form. Classical linear regression and generalized linear models might be misspecified in this setting, so we might need more powerful nonparametric tools. Due to the high dimensionality of the problem, using machine learning or deep learning tools might also be a good idea.</p>
<h2>Causation is more than a prediction</h2>
<p>Using machine learning and deep learning techniques certainly sounds promising. However, we might not be able to apply them directly to causal effect estimation. Machine learning is suitable for prediction. Unfortunately, causation is more than a prediction. In the simplest and most common case, the treatment variable is binary. The person is treated or not, and the causal effect is defined as the difference between the potential outcome if the individual were treated, <span class='MathJax_Preview'>\(\mathit{Y(1)}\)</span><script type='math/tex'>\mathit{Y(1)}</script>, and the potential outcome if the individual were not treated, <span class='MathJax_Preview'>\(\mathit{Y(0)}\)</span><script type='math/tex'>\mathit{Y(0)}</script>. However, each individual is either treated or not. You can never observe both <span class='MathJax_Preview'>\(\mathit{Y(1)}\)</span><script type='math/tex'>\mathit{Y(1)}</script> and <span class='MathJax_Preview'>\(\mathit{Y(0)}\)</span><script type='math/tex'>\mathit{Y(0)}</script> for the same individual in the data. This effectively means that <span class='MathJax_Preview'>\(\mathit{Y(1)} - \mathit{Y(0)} \)</span><script type='math/tex'>\mathit{Y(1)} - \mathit{Y(0)} </script>, the quantity of interest, is missing for every individual! Hence, we can’t use machine learning tools to directly predict the causal effect.</p>
<p>Instead of estimating the causal effect directly, we can consider breaking the algorithm into three steps. We then use the machine learning tools for the prediction in steps 1 and 2. In step 3, we estimate the average causal effect using results from machine learning steps 1 and 2 as follows:</p>
<ul>
<li>We estimate the relationship <span class='MathJax_Preview'>\(\mathit{f^{(1)}}\)</span><script type='math/tex'>\mathit{f^{(1)}}</script>(.) between <span class='MathJax_Preview'>\(\mathit{X}\)</span><script type='math/tex'>\mathit{X}</script> and <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script> by using the data of individuals who are treated, <span class='MathJax_Preview'>\(\mathit{E(Y^{(1)}) = f^{(1)}(X^{(1)})}\)</span><script type='math/tex'>\mathit{E(Y^{(1)}) = f^{(1)}(X^{(1)})}</script>, and the relationship <span class='MathJax_Preview'>\(\mathit{f^{(0)}}\)</span><script type='math/tex'>\mathit{f^{(0)}}</script>(.) between <span class='MathJax_Preview'>\(\mathit{X}\)</span><script type='math/tex'>\mathit{X}</script> and <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script> by using the data of individuals who are not treated, <span class='MathJax_Preview'>\(\mathit{E(Y^{(0)}) = f^{(0)}(X^{(0)})}\)</span><script type='math/tex'>\mathit{E(Y^{(0)}) = f^{(0)}(X^{(0)})}</script>, where <span class='MathJax_Preview'>\(\mathit{E}\)</span><script type='math/tex'>\mathit{E}</script>(.) is the expectation and <span class='MathJax_Preview'>\(\mathit{Y^{(T)}}\)</span><script type='math/tex'>\mathit{Y^{(T)}}</script> and <span class='MathJax_Preview'>\(\mathit{X^{(T)}}\)</span><script type='math/tex'>\mathit{X^{(T)}}</script> denote the data when <span class='MathJax_Preview'>\(\mathit{T}\)</span><script type='math/tex'>\mathit{T}</script> = 0 (untreated) or 1 (treated);</li>
<li>We predict the <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script>(0)s for individuals who are treated by using <span class='MathJax_Preview'>\(\mathit{f^{(0)}}\)</span><script type='math/tex'>\mathit{f^{(0)}}</script>(.) and the <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script>(1)s for individuals who are not treated by using <span class='MathJax_Preview'>\(\mathit{f^{(1)}}\)</span><script type='math/tex'>\mathit{f^{(1)}}</script>(.);</li>
<li>Now, we average the difference between <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script>(1)s and <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script>(0)s of all individuals and that is the estimated (average) causal effect.</li>
</ul>
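<p style="text-align: left">The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-variable least-squares line stands in for the machine learning model, the data are synthetic with a built-in effect of 3, and the function names are invented here. It is not how PROC DEEPCAUSAL is implemented. Note that the model fitted on the untreated group supplies the missing untreated outcome for treated individuals, and vice versa.</p>

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; a stand-in for any prediction model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def plug_in_ate(x, t, y):
    """Three-step plug-in estimate of the average causal effect."""
    # Step 1: fit f1 on treated individuals and f0 on untreated individuals.
    f1 = fit_line([xi for xi, ti in zip(x, t) if ti == 1],
                  [yi for yi, ti in zip(y, t) if ti == 1])
    f0 = fit_line([xi for xi, ti in zip(x, t) if ti == 0],
                  [yi for yi, ti in zip(y, t) if ti == 0])
    # Step 2: predict the unobserved potential outcome for each individual.
    y1 = [yi if ti == 1 else f1(xi) for xi, ti, yi in zip(x, t, y)]
    y0 = [yi if ti == 0 else f0(xi) for xi, ti, yi in zip(x, t, y)]
    # Step 3: average the individual differences Y(1) - Y(0).
    return sum(a - b for a, b in zip(y1, y0)) / len(x)

# Synthetic data (an assumption for this sketch): Y = 1 + 2*X + 3*T, no noise.
x = [0, 1, 2, 3, 4, 5, 6, 7]
t = [1, 0, 1, 0, 1, 0, 1, 0]
y = [1 + 2 * xi + 3 * ti for xi, ti in zip(x, t)]

ate = plug_in_ate(x, t, y)
print(ate)
```

<p style="text-align: left">Because treatment in this toy data does not depend on anything that also drives the outcome, the estimate recovers the built-in effect of 3.</p>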
<p style="text-align: left">If the data are not observational but come from randomized controlled trials, the three-step method above works well. However, when the data are observational, this plug-in method is likely to produce highly biased estimates. This is because it does not account for an important kind of variable, known as confounders. Confounders are variables that affect both the outcome <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script> and the treatment <span class='MathJax_Preview'>\(\mathit{T}\)</span><script type='math/tex'>\mathit{T}</script>. <a href="https://en.wikipedia.org/wiki/Simpson%27s_paradox">Simpson’s Paradox</a> is often used to illustrate the importance of confounders.</p>
<p style="text-align: left">For example, in a hospital, 2,050 patients with an illness are observed. Of these, 550 receive the treatment and the remaining 1,500 do not. In the treated group, 445 patients are cured; in the untreated group, 1,260 patients are cured. The success rate of the untreated group, 84%, is therefore higher than that of the treated group, 81%, so the causal effect of the treatment appears to be nonpositive.</p>
<p style="text-align: left">However, when a confounder, the severity of illness (SOI), is considered, an interesting fact is revealed. For patients with higher SOI, the success rate of the treated group is higher than that of the untreated group. For patients with lower SOI, the success rate of the treated group is also higher than that of the untreated group! That is, without considering the confounder, we might reach a totally wrong conclusion about the causal effect. The details of the example are shown in Table 1.</p>
<table class="aligncenter" style="height: 136px" border="1" width="354">
<tbody>
<tr>
<td style="text-align: center" width="33%"></td>
<td style="text-align: center" width="33%"><strong>Untreated</strong></td>
<td style="text-align: center" width="33%"><strong>Treated</strong></td>
</tr>
<tr>
<td style="text-align: center" width="33%"><strong>Total</strong></td>
<td style="text-align: center" width="33%"><span style="color: #ff0000">1260/1500=84%</span></td>
<td style="text-align: center" width="33%">445/550=81%</td>
</tr>
<tr>
<td style="text-align: center" width="33%"><strong>Higher SOI</strong></td>
<td style="text-align: center" width="33%">70/100=70%</td>
<td style="text-align: center" width="33%"><span style="color: #ff0000">400/500=80%</span></td>
</tr>
<tr>
<td style="text-align: center" width="33%"><strong>Lower SOI</strong></td>
<td style="text-align: center" width="33%">1190/1400=85%</td>
<td style="text-align: center" width="33%"><span style="color: #ff0000">45/50=90%</span></td>
</tr>
</tbody>
</table>
<p style="text-align: center">Table 1:  An example of Simpson’s Paradox</p>
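<p>The arithmetic behind Table 1 can be checked directly. The sketch below recomputes the pooled rates and then applies standardization over SOI, a simple adjustment that weights each stratum's success rate by that stratum's share of all 2,050 patients. Treating SOI as the only confounder is an assumption made for illustration, and the function names are hypothetical.</p>

```python
# (cured, total) counts copied from Table 1
table = {
    "higher_soi": {"untreated": (70, 100), "treated": (400, 500)},
    "lower_soi": {"untreated": (1190, 1400), "treated": (45, 50)},
}

def pooled_rate(arm):
    """Success rate of one arm, ignoring severity of illness."""
    cured = sum(stratum[arm][0] for stratum in table.values())
    total = sum(stratum[arm][1] for stratum in table.values())
    return cured / total

def adjusted_rate(arm):
    """Stratum success rates weighted by each stratum's share of all patients."""
    n_all = sum(n for stratum in table.values() for _, n in stratum.values())
    result = 0.0
    for stratum in table.values():
        cured, total = stratum[arm]
        stratum_size = sum(n for _, n in stratum.values())
        result += (cured / total) * stratum_size / n_all
    return result

naive = pooled_rate("treated") - pooled_rate("untreated")
adjusted = adjusted_rate("treated") - adjusted_rate("untreated")
print(f"naive difference:    {naive:+.3f}")
print(f"adjusted difference: {adjusted:+.3f}")
```

<p>The pooled comparison gives a difference of about 3 percentage points against the treatment, whereas the SOI-adjusted comparison gives about 6.5 percentage points in its favor. The sign flips because most treated patients fall in the higher-SOI stratum, where cure rates are lower for everyone.</p>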
<h2>Semiparametric framework comes to the rescue</h2>
<p>As you can see, it’s not easy to estimate the causal effect from observational data. This is true even when machine learning tools can help us handle the huge number of mixed discrete and continuous variables <span class='MathJax_Preview'>\(\mathit{X}\)</span><script type='math/tex'>\mathit{X}</script>, <span class='MathJax_Preview'>\(\mathit{Y}\)</span><script type='math/tex'>\mathit{Y}</script> and <span class='MathJax_Preview'>\(\mathit{T}\)</span><script type='math/tex'>\mathit{T}</script>, as well as the unknown nonlinear relationships among them. Here, the semiparametric framework comes to the rescue! There are two main steps in the semiparametric framework:</p>
<ul>
<li>Nonparametric step: use the machine learning tools (here Deep Neural Networks, DNNs) to estimate the relationship between the treatment variable and the covariates (the so-called propensity score model) and the relationship between the outcome variable and the covariates with the treatment variable (the so-called outcome model).</li>
<li>Parametric step: for parameters of interest, construct the doubly robust estimator through the influence functions, which takes care of the impact of the confounders.</li>
</ul>
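<p>To make the two steps concrete, here is a minimal sketch of a doubly robust estimator of the average treatment effect, in the augmented inverse-probability-weighting form. It is not the DEEPCAUSAL implementation: a one-covariate logistic regression and straight-line fits replace the DNNs in the nonparametric step, the data are synthetic with a true effect of 2.0, and all function names are hypothetical.</p>

```python
import math
import random
from statistics import mean, stdev

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def fit_logistic(xs, ts, iters=15):
    """Newton's method for P(T=1 | x) = sigmoid(a + b*x)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += t - p
            g1 += (t - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def aipw_ate(x, t, y):
    # Nonparametric step (simple stand-ins for the DNNs).
    a, b = fit_logistic(x, t)                        # propensity score model
    a1, b1 = fit_line([xi for xi, ti in zip(x, t) if ti == 1],
                      [yi for yi, ti in zip(y, t) if ti == 1])
    a0, b0 = fit_line([xi for xi, ti in zip(x, t) if ti == 0],
                      [yi for yi, ti in zip(y, t) if ti == 0])
    # Parametric step: average the doubly robust score of each individual.
    scores = []
    for xi, ti, yi in zip(x, t, y):
        e = sigmoid(a + b * xi)
        mu1, mu0 = a1 + b1 * xi, a0 + b0 * xi
        scores.append(mu1 - mu0
                      + ti * (yi - mu1) / e
                      - (1 - ti) * (yi - mu0) / (1.0 - e))
    return mean(scores), stdev(scores) / math.sqrt(len(scores))

# Synthetic observational data (hypothetical): x drives both T and Y.
rng = random.Random(1)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
t = [1 if sigmoid(xi) > rng.random() else 0 for xi in x]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0, 1) for xi, ti in zip(x, t)]

naive = (mean(yi for yi, ti in zip(y, t) if ti == 1)
         - mean(yi for yi, ti in zip(y, t) if ti == 0))
ate_hat, se_hat = aipw_ate(x, t, y)
print(f"naive={naive:.2f}  doubly robust={ate_hat:.2f}  se={se_hat:.3f}")
```

<p>In this simulation the naive difference in means is pulled well away from 2.0 by the confounder, while the doubly robust estimate should land close to it. The standard error comes from the sample standard deviation of the per-individual scores, which is what makes inference possible in the parametric step.</p>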
<p>Among so many machine learning tools, why are DNNs selected? It's because of the DNNs’ convergence rate. One issue that we sometimes experience with machine learning tools in econometrics and other fields is the interpretability of the model. In this particular case, the estimator needs the standard errors to conduct inference. If the machine learning tools can converge to the true unknown functions fast enough in the nonparametric step, the semiparametric framework can provide the valid standard errors of the estimators in the parametric step.</p>
<p>To our knowledge, thanks to the proof in <a href="https://doi.org/10.3982/ECTA16901">Max H. Farrell, Tengyuan Liang, and Sanjog Misra (2021)</a>, DNN is the only machine learning method that can directly fulfill this fast-convergence requirement. So, by using DNNs in the nonparametric step, we can obtain not only an unbiased estimator but also its standard errors for inference.</p>
<h2>The powerful and easy-to-use DEEPCAUSAL procedure</h2>
<p>Successfully applying the semiparametric framework, including the DNN implementation, might not be easy because of the complexity of the estimation. You don’t need to worry, however, because all the technical details are taken care of in the <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casecon/titlepage.htm">DEEPCAUSAL</a> procedure. For example, the SAS code for estimating the causal effects of interest might look like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="sas" style="font-family:monospace;">   <span style="color: #000080; font-weight: bold;">proc deepcausal</span> <span style="color: #000080; font-weight: bold;">data</span>=mycas.mydata;
         id rowId;
         psmodel t=x1-x20;
         model y=x1-x20;
         infer out=mycas.oest;
   <span style="color: #000080; font-weight: bold;">run</span>;</pre></td></tr></table></div>

<p>In this simple example, no DNN options are needed, because tens of DNN options can take their default values. Of course, if you are an advanced user, you can even specify different DNNs with different training hyperparameters for different models. See the documentation for details.</p>
<p>The output of PROC DEEPCAUSAL includes the estimates of parameters of interest for both the full population and the subpopulation. For example, the two most famous and useful parameters of interest are in the output: the average treatment effect (ATE), and the average treatment effect for the treated (ATT). An example of the output from the DEEPCAUSAL procedure is shown in Table 2.</p>
<div id="attachment_9250" style="width: 1014px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table2-aug.png"><img aria-describedby="caption-attachment-9250" loading="lazy" class="wp-image-9250 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table2-aug.png" alt="Table 2: causal inference and policy evaluation" width="1004" height="633" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table2-aug.png 1004w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table2-aug-300x189.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table2-aug-343x215.png 343w" sizes="(max-width: 1004px) 100vw, 1004px" /></a><p id="caption-attachment-9250" class="wp-caption-text">Table 2: Example of the estimated parameters of interest</p></div>
<h2>The DEEPCAUSAL procedure and policy evaluation</h2>
<p>Besides the parameter estimates, PROC DEEPCAUSAL also supports policy evaluation. A policy is a rule that is used to assign the treatment. For example, the policy of sending coupons to customers might be sending them only to customers who spent more than $400 last year. If we know the causal effect on each individual and we can determine the treatment assignment, we have an opportunity to optimize business objectives (maximize revenue, minimize loss, evaluate the impact of a marketing campaign) by setting up the optimal policy.</p>
<p>There are many applications of policy optimization in different fields. Here are just a few: customer targeting, personalized pricing, stratification in clinical trials, learning the click-through rates, and so on. PROC DEEPCAUSAL can evaluate the average effect of a policy or compare the difference between two policies by using the observational data that you have in hand. To do so, you only need to provide the policies in the POLICY= option and the policies to be compared in the POLICYCOMPARISON= option in the INFER statement.</p>
<p>For example, the observational data shows the base policy denoted by <strong>t</strong>. What if we changed that policy to a new policy denoted by <strong>s</strong>?  Here are two candidates. Policy <strong>s1</strong> is to assign treatment to individuals only if their estimated individual treatment effect is positive. The policy <strong>s0</strong> is the opposite. It only assigns treatment to individuals whose estimated individual treatment effects are negative. You’d like to evaluate each policy’s average outcome and compare them with the observed treatment <strong>t</strong>. The SAS code might look like this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="sas" style="font-family:monospace;">   <span style="color: #000080; font-weight: bold;">proc deepcausal</span> <span style="color: #000080; font-weight: bold;">data</span>=mycas.mydata;
         id rowId;
         psmodel t=x1-x20;
         model y=x1-x20;
         infer policy=<span style="color: #66cc66;">&#40;</span>s1 s0<span style="color: #66cc66;">&#41;</span> policyComparison=<span style="color: #66cc66;">&#40;</span>base=<span style="color: #66cc66;">&#40;</span>t<span style="color: #66cc66;">&#41;</span> compare=<span style="color: #66cc66;">&#40;</span>s0 s1<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
               out=mycas.oest;
   <span style="color: #000080; font-weight: bold;">run</span>;</pre></td></tr></table></div>

<p>Table 3 shows the output.</p>
<p><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3-aug.png"><img loading="lazy" class="size-full wp-image-9274 aligncenter" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3-aug.png" alt="" width="621" height="184" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3-aug.png 621w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3-aug-300x89.png 300w" sizes="(max-width: 621px) 100vw, 621px" /></a></p>
<div id="attachment_9277" style="width: 790px" class="wp-caption aligncenter"><a href="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3b-aug.png"><img aria-describedby="caption-attachment-9277" loading="lazy" class="wp-image-9277 size-full" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3b-aug.png" alt="Causal inference and policy evaluation: Table 3" width="780" height="181" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3b-aug.png 780w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/xilong-jan-table3b-aug-300x70.png 300w" sizes="(max-width: 780px) 100vw, 780px" /></a><p id="caption-attachment-9277" class="wp-caption-text">Table 3: Example of policy evaluation and comparison</p></div>
<p>As you can see from the table, different policies have different average outcomes. Policy <strong>s1</strong> leads to an average outcome of 0.65 whereas policy <strong>s0</strong> leads to an average outcome close to 0. Compared to the observed treatment assignment <strong>t</strong>, policy <strong>s1</strong> has significant positive gains whereas policy <strong>s0</strong> has a significant loss. This is an example of where policy matters and why policy optimization is so important.</p>
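<p>The idea of evaluating and comparing policies can be illustrated with a small plug-in calculation. This sketch assumes that estimated potential outcomes under no treatment (<code>mu0</code>) and under treatment (<code>mu1</code>) are already available for each individual; the six values below are made up, and the function <code>policy_value</code> is hypothetical. It does not reproduce the estimators or the standard errors that PROC DEEPCAUSAL reports.</p>

```python
from statistics import mean

def policy_value(policy, mu0, mu1):
    """Average outcome if individual i receives treatment policy[i]."""
    return mean(m1 if s == 1 else m0 for s, m0, m1 in zip(policy, mu0, mu1))

# Hypothetical estimated potential outcomes for six individuals.
mu0 = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6]
mu1 = [1.0, 0.1, 0.8, 0.2, 0.7, 0.5]
t = [1, 1, 0, 0, 1, 0]                      # observed treatment assignment

ite = [m1 - m0 for m0, m1 in zip(mu0, mu1)]  # estimated individual effects
s1 = [1 if d > 0 else 0 for d in ite]        # treat only if the effect is positive
s0 = [1 if 0 > d else 0 for d in ite]        # treat only if the effect is negative

values = {name: policy_value(p, mu0, mu1)
          for name, p in {"t": t, "s1": s1, "s0": s0}.items()}
gains = {name: values[name] - values["t"] for name in ("s1", "s0")}
print(values)
print(gains)
```

<p>By construction, <strong>s1</strong> picks the better estimated outcome for every individual and <strong>s0</strong> picks the worse one, so their values bracket the value of any other assignment, including the observed <strong>t</strong>.</p>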
<h2>Other powerful tools for causal inference</h2>
<p>Although we covered several topics in this post, there are many more related to causal modeling. The following list mentions a few that might be of interest to you.</p>
<ul>
<li><a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=statug&amp;docsetTarget=titlepage.htm">CAUSALGRAPH</a> procedure in SAS/STAT enables you to analyze graphical causal models and to construct sound statistical strategies for causal effect estimation.</li>
<li><a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=statug&amp;docsetTarget=titlepage.htm">CAUSALMED</a> procedure in SAS/STAT enables you to decompose a (total) causal effect into direct and indirect effects.</li>
<li><a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=statug&amp;docsetTarget=titlepage.htm">CAUSALTRT</a> procedure in SAS/STAT enables you to perform an estimation of a causal effect.</li>
<li><a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=etsug&amp;docsetTarget=titlepage.htm">MODEL</a>, <a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=etsug&amp;docsetTarget=titlepage.htm">PANEL</a>, <a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=etsug&amp;docsetTarget=titlepage.htm">QLIM</a>, and <a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=etsug&amp;docsetTarget=titlepage.htm">TMODEL</a> procedures in SAS/ETS and the <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casecon/titlepage.htm">CPANEL</a> and <a href="https://go.documentation.sas.com/doc/en/pgmsascdc/v_016/casecon/titlepage.htm">CQLIM</a> procedures in SAS Econometrics support instrumental variables when there are unmeasured confounders.</li>
<li><a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=statug&amp;docsetTarget=titlepage.htm">PSMATCH</a> procedure in SAS/STAT enables you to perform propensity score analyses and to assess covariate balance.</li>
<li><a href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;cdcVersion=v_016&amp;docsetId=etsug&amp;docsetTarget=titlepage.htm">VARMAX</a> procedure supports Granger causality tests for time series data.</li>
</ul>
<h2>Causal inference conclusion</h2>
<p>Correctly estimating the causal effects is critical for decision-making in our daily lives and business applications, especially in this big data era. In the 2021.1.4 release, SAS provided a new DEEPCAUSAL procedure that takes advantage of both deep learning and econometrics methods and makes your causal inference modeling much easier.  We hope you find it useful in your modeling efforts.</p>
<a href="https://www.sas.com/en_us/software/stat.html" class="sc-button sc-button-default"><span>
<span class="btnheader">LEARN MORE |</span> SAS/STAT </span></a>
<a href="https://www.sas.com/en_us/insights/analytics/deep-learning.html" class="sc-button sc-button-default"><span>
<span class="btnheader">LEARN MORE |</span> Deep learning </span></a>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/07/causal-inference-and-policy-evaluation-with-deep-neural-networks/">Causal inference and policy evaluation with deep neural networks</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/09/07/causal-inference-and-policy-evaluation-with-deep-neural-networks/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/big-data-150x150.jpeg" />
	</item>
		<item>
		<title>Analyzing movement and tracking data using SAS Visual Analytics</title>
		<link>https://blogs.sas.com/content/subconsciousmusings/2021/09/02/analyzing-movement-and-tracking-data-using-sas-visual-analytics/</link>
					<comments>https://blogs.sas.com/content/subconsciousmusings/2021/09/02/analyzing-movement-and-tracking-data-using-sas-visual-analytics/#respond</comments>
		
		<dc:creator><![CDATA[Falko Schulz]]></dc:creator>
		<pubDate>Thu, 02 Sep 2021 15:00:45 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[advanced analytics]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[SAS Visual Analytics]]></category>
		<category><![CDATA[SAS Viya]]></category>
		<guid isPermaLink="false">https://blogs.sas.com/content/subconsciousmusings/?p=8974</guid>

					<description><![CDATA[<p>Technological advancements in connectivity and global positioning systems (GPS) have led to increased data tracking and related business use cases to analyze such movements. Whether analyzing a vehicle, an animal or a population's movements - each use case requires analyzing underlying spatial information. Global challenges such as virus outbreaks, deforestation [...]</p>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/02/analyzing-movement-and-tracking-data-using-sas-visual-analytics/">Analyzing movement and tracking data using SAS Visual Analytics</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h4>Technological advancements in connectivity and global positioning systems (GPS) have led to increased data tracking and related business use cases to analyze such movements. Whether analyzing a vehicle, an animal or a population's movements - each use case requires analyzing underlying spatial information. Global challenges such as virus outbreaks, deforestation and climate change all benefit from deeply analyzing location and movement data.</h4>
<p>This year's <a href="https://vast-challenge.github.io/2021/index.html">IEEE Visual Analytics Science and Technology (VAST) Challenge</a> covered such a scenario and a team of SAS volunteers decided to put <a href="https://www.sas.com/en_us/software/viya.html">SAS® Viya®</a> to the test and submit a solution for both Mini Challenges 2 and 3. The mission was to analyze movement and tracking data for several employees of a fictitious company named <em>GAStech</em>. Two weeks' worth of data was provided before a disappearance (kidnapping) of some of the company's employees. Mini Challenge 2 involved using provided credit card and loyalty card usage data to identify any anomalies and suspicious behavior leading up to the disappearance. Mini Challenge 3 required further analysis of social media and text information to determine any changes in risk levels during the night of the disappearance.</p>
<h2>The process</h2>
<p>We used <a href="https://www.sas.com/en_us/software/visual-analytics.html">SAS® Visual Analytics</a> to track the area's vehicle traffic and to analyze financial and loyalty data. We detected popular locations and points of interest by analyzing credit card transaction records. The data also revealed employees' classification given their movement profile at different times during the day. We utilized SAS Viya's support for machine learning (ML) and decision tree scoring to determine actual credit card owners based on location visits and frequency.</p>
<p>We also utilized <a href="https://www.sas.com/en_us/software/visual-text-analytics.html">SAS Visual Text Analytics</a> to tackle Mini Challenge 3 by extracting useful information from provided social media posts to detect risk levels using natural language processing (NLP).</p>
<h3>The tools</h3>
<div id="attachment_8983" style="width: 310px" class="wp-caption alignright"><img aria-describedby="caption-attachment-8983" loading="lazy" class="wp-image-8983 size-medium" style="font-size: 14px; font-family: 'Open Sans', Arial, sans-serif;" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-02-300x210.png" alt="" width="300" height="210" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-02-300x210.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-02-1024x717.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-02-1536x1075.png 1536w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-02.png 1793w" sizes="(max-width: 300px) 100vw, 300px" /><p id="caption-attachment-8983" class="wp-caption-text">Map plotting geo-referenced locations</p></div>
<p>The majority of work was done using SAS Studio and SAS Visual Analytics. We leveraged some core capabilities in SAS data steps to prepare and adjust the provided data for easier analysis. In particular, we calculated aspects like vehicle speeds and credit card owner scores in SAS Studio with greater flexibility. We also used Esri's ArcGIS mapping tools to geo-reference a provided image of the city streets and tourist map. Some shop and business locations were determined by inspecting both the map and provided Esri shape files.</p>
<h3>The Solution - Mini Challenge 2</h3>
<p>The first step in discovering meaningful patterns was to analyze provided credit and loyalty card data. Each employee was issued a company card and the challenge was to determine the actual card owner based on anonymized transaction data. Knowing who made which purchases at what time of the day allowed not only for the classification of shops and businesses, but also revealed some anomalies in the data (e.g., wrong/missing transaction timestamps, purchases at exact noon times, etc.).</p>
<div id="attachment_8986" style="width: 273px" class="wp-caption alignright"><img aria-describedby="caption-attachment-8986" loading="lazy" class="size-medium wp-image-8986" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-03-263x300.png" alt="" width="263" height="300" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-03-263x300.png 263w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-03-897x1024.png 897w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-03.png 1307w" sizes="(max-width: 263px) 100vw, 263px" /><p id="caption-attachment-8986" class="wp-caption-text">Purchase prices by location</p></div>
<p>Further cluster analysis using SAS Visual Analytics allowed us to categorize locations into three natural groupings based on the purchase frequency, price and busiest time.</p>
<p>We also added vehicle data into our analysis of the credit and loyalty card data. Using Esri ArcGIS mapping tools, we georeferenced both the city streets and tourist map and created a list of all known locations including their exact GPS location.</p>
<p>Taking the actual vehicle tracking data into account, we could then determine where vehicles were at a given point in time. Clustering GPS data also allowed the creation of travel hotspots grouped by an employee's department. This showed that some employee groups visit some areas more frequently than others.</p>
<div id="attachment_8989" style="width: 233px" class="wp-caption alignleft"><img aria-describedby="caption-attachment-8989" loading="lazy" class="size-medium wp-image-8989" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-223x300.png" alt="" width="223" height="300" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-223x300.png 223w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-760x1024.png 760w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01.png 828w" sizes="(max-width: 223px) 100vw, 223px" /><p id="caption-attachment-8989" class="wp-caption-text">Travel hotspots</p></div>
<p>From the general movement patterns, GAStech employees can be classified into two broad types:</p>
<ol>
<li>General staff who visit all stores in Abila except Abila Airport, Carlyle Chemical Inc. and Nationwide Refinery (which are all locations involved in industrial interactions with GAStech).</li>
<li>Truck drivers who typically drive between industrial locations and GAStech headquarters, with only a few exceptions.</li>
</ol>
<p>We also geo-referenced 25 credit card transaction locations from the Abila tourist map so that they could be plotted on the provided shape files. The other 9 locations were not geo-referenced because they had no associated place on the provided map. Additional non-credit card locations were included in the analysis such as parks, shops and other popular points of interest.<br />
 <br />
GPS data was used to determine a vehicle’s speed (mph) and time spent between waypoints. Knowing when a vehicle was parked is essential to determine potential store visits. We were also able to track a vehicle's motion by animating its travel pattern on a geographical map using SAS Visual Analytics:</p>
<div id="attachment_9013" style="width: 710px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-9013" loading="lazy" class="size-full wp-image-9013" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-05-min_resized.gif" alt="" width="700" height="371" /><p id="caption-attachment-9013" class="wp-caption-text">Travel Route Animation</p></div>
<p>We assumed that a parked vehicle (speed less than 5 mph) was visiting the closest known point of interest if that location was less than 0.25 miles away. Taking into account the time of travel revealed that some employees were active all day, including night hours. Others showed recurring events, such as executives playing golf on the weekends.</p>
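<p>The speed and nearest-location rules described above can be sketched as follows. The team did this work in SAS Studio, so the Python below is only an illustration: the coordinates, the point-of-interest names, the waypoint layout <code>(seconds, latitude, longitude)</code> and all function names are hypothetical, and distance is computed with the haversine formula on a spherical Earth.</p>

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def speed_mph(p, q):
    """Average speed between two waypoints given as (seconds, lat, lon)."""
    hours = (q[0] - p[0]) / 3600.0
    return haversine_miles(p[1], p[2], q[1], q[2]) / hours if hours > 0 else 0.0

def nearest_poi(lat, lon, pois, max_miles=0.25):
    """Name of the closest point of interest, or None if it is too far away."""
    best = min(pois, key=lambda name: haversine_miles(lat, lon, *pois[name]))
    return best if max_miles > haversine_miles(lat, lon, *pois[best]) else None

def stops_with_poi(track, pois, max_speed=5.0):
    """(time, poi) pairs for waypoints reached at less than max_speed mph."""
    visits = []
    for p, q in zip(track, track[1:]):
        if max_speed > speed_mph(p, q):
            poi = nearest_poi(q[1], q[2], pois)
            if poi is not None:
                visits.append((q[0], poi))
    return visits

# Hypothetical points of interest and one short vehicle track.
pois = {"Cafe": (36.0500, 24.8700), "Park": (36.0700, 24.8500)}
track = [(0, 36.0600, 24.8700), (60, 36.0510, 24.8700), (120, 36.0509, 24.8700)]

visits = stops_with_poi(track, pois)
print(visits)
```

<p>The thresholds of 5 mph and 0.25 miles are the ones quoted in the text and are exposed as parameters so that they can be tuned for other data.</p>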
<p>Utilizing geographic polyline visualization within SAS Visual Analytics also allows the visualization of trajectories and helps when comparing routes taken on different days or cars driven by different employees.</p>
<div id="attachment_8995" style="width: 712px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-8995" loading="lazy" class="size-large wp-image-8995" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/q2_travel_pattern_dashboard-1024x696.png" alt="" width="702" height="477" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/q2_travel_pattern_dashboard-1024x696.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/q2_travel_pattern_dashboard-300x204.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/q2_travel_pattern_dashboard-1536x1044.png 1536w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/q2_travel_pattern_dashboard.png 1652w" sizes="(max-width: 702px) 100vw, 702px" /><p id="caption-attachment-8995" class="wp-caption-text">Daily routes taken by CEO</p></div>
<p>To infer the owners of each credit card, we utilized the previously calculated stop locations of each car. If a given car was near a location at the time of a card transaction, the employee assigned to the car represents a potential match. This method's accuracy decreases if there are multiple candidates for a given transaction.</p>
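<p>A minimal sketch of this matching step is shown below. It counts, for every card, how often each employee's car was stopped at the purchase location at the time of the purchase. The record layouts, names and values are hypothetical and do not reflect the challenge data schema.</p>

```python
from collections import Counter

# Hypothetical car stops: (employee, location, arrive_minute, leave_minute)
stops = [
    ("alice", "Cafe", 480, 500),
    ("bob", "Cafe", 485, 495),
    ("alice", "Deli", 720, 750),
    ("bob", "Gas", 725, 735),
    ("alice", "Cafe", 1920, 1940),
]
# Hypothetical card transactions: (card_id, location, minute)
transactions = [
    ("card_17", "Cafe", 490),
    ("card_17", "Deli", 730),
    ("card_17", "Cafe", 1930),
    ("card_42", "Gas", 730),
]

def candidate_counts(stops, transactions):
    """Per card, tally the employees whose car was stopped at the
    transaction location when the purchase happened."""
    counts = {}
    for card, loc, minute in transactions:
        tally = counts.setdefault(card, Counter())
        for employee, stop_loc, arrive, leave in stops:
            if stop_loc == loc and arrive <= minute <= leave:
                tally[employee] += 1
    return counts

counts = candidate_counts(stops, transactions)
# The most frequent candidate is the simplest possible guess at the owner.
best_owner = {card: tally.most_common(1)[0][0]
              for card, tally in counts.items() if tally}
print(counts)
print(best_owner)
```

<p>The first transaction has two candidates, which is exactly the ambiguity that lowers accuracy. Accumulating counts over many transactions is one simple way to separate such candidates.</p>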
<p>In order to rank candidates, we used two different approaches:</p>
<ol>
<li>Machine learning / decision tree scoring.</li>
<li>Manual modeling and calculation for ranking and scoring.</li>
</ol>
<div id="attachment_8998" style="width: 712px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-8998" loading="lazy" class="size-large wp-image-8998" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-02-1024x529.png" alt="" width="702" height="363" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-02-1024x529.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-02-300x155.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-02-1536x793.png 1536w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-02.png 1567w" sizes="(max-width: 702px) 100vw, 702px" /><p id="caption-attachment-8998" class="wp-caption-text">Leaf nodes to compare candidate scores</p></div>
<p>With the knowledge about which employee made which purchase, we were able to create purchase profiles. Knowing an individual's credit card transaction history allowed us to validate outlier purchases and highlight anomalies.</p>
<div id="attachment_9001" style="width: 310px" class="wp-caption alignright"><img aria-describedby="caption-attachment-9001" loading="lazy" class="wp-image-9001 size-medium" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-02-300x193.png" alt="" width="300" height="193" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-02-300x193.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-02-1024x660.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-02-1536x990.png 1536w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-02.png 1547w" sizes="(max-width: 300px) 100vw, 300px" /><p id="caption-attachment-9001" class="wp-caption-text">Travel Route Network</p></div>
<p>Analyzing vehicle movement data allowed the identification of travel routes and related stops. Knowing each start and stop location of a given trip allowed us to create a network showing popular travel routes. Comparing visit times (start/stop) and the duration of the stay across employees reveals commonly visited locations like the GAStech headquarters, but also subsets of employees that visit the same location at the same time.</p>
<p>If an employee visits a point of interest (POI) at the same time as someone else, we consider this a meeting event. Meeting events include employees who live in the same household or work together.</p>
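<p>Detecting meeting events reduces to finding overlapping stays at the same location. The sketch below is an illustration with made-up stays; the record layout and the <code>min_overlap</code> threshold of 10 minutes are assumptions that do not come from the original analysis.</p>

```python
from itertools import combinations

# Hypothetical stays: (employee, poi, start_minute, end_minute)
stays = [
    ("ceo", "Golf", 600, 780),
    ("cfo", "Golf", 630, 800),
    ("eng1", "Cafe", 480, 495),
    ("eng2", "Cafe", 490, 492),
    ("eng1", "HQ", 540, 1020),
    ("eng2", "HQ", 545, 1000),
]

def meeting_events(stays, min_overlap=10):
    """Pairs of different employees at the same POI whose stays overlap
    by at least min_overlap minutes."""
    events = []
    for (e1, p1, s1, f1), (e2, p2, s2, f2) in combinations(stays, 2):
        if e1 == e2 or p1 != p2:
            continue
        overlap = min(f1, f2) - max(s1, s2)
        if overlap >= min_overlap:
            events.append((e1, e2, p1, overlap))
    return events

events = meeting_events(stays)
for e1, e2, poi, minutes in events:
    print(f"{e1} and {e2} met at {poi} for {minutes} minutes")
```

<p>Counting how often each pair appears in such a list gives the edge weights for a network of employee relationships.</p>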
<div id="attachment_9004" style="width: 310px" class="wp-caption alignleft"><img aria-describedby="caption-attachment-9004" loading="lazy" class="wp-image-9004 size-medium" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-05-300x294.png" alt="" width="300" height="294" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-05-300x294.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-05-1024x1004.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q4-05.png 1308w" sizes="(max-width: 300px) 100vw, 300px" /><p id="caption-attachment-9004" class="wp-caption-text">Social Company Network</p></div>
<p>If we consider all possible meetings across all employees we can create a scheduling matrix and identify potential relationships if employees meet regularly. This social network reveals strong relationships between the executives as well as some individuals in the Engineering department.</p>
<p>The final steps in the analysis included the comparison of meeting events. Comparing meeting duration and location not only reveals when individuals arrived at a given location, but also the route they took and other POIs they visited while traveling. This analysis revealed that some employees share a common household, have common social activities (e.g. executives playing golf) or work unusual hours at HQ. A Friday night party just one day before the disappearances is particularly interesting given that many employees from both the Engineering and IT departments met at the same location.</p>
<h3>The Solution - Mini Challenge 3</h3>
<div id="attachment_9046" style="width: 310px" class="wp-caption alignright"><img aria-describedby="caption-attachment-9046" loading="lazy" class="size-medium wp-image-9046" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-1-300x95.png" alt="" width="300" height="95" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-1-300x95.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-1-1024x324.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-1-1536x486.png 1536w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-01-1.png 1540w" sizes="(max-width: 300px) 100vw, 300px" /><p id="caption-attachment-9046" class="wp-caption-text">Risk level through the evening</p></div>
<p>Solving Mini-Challenge 3 required a deep analysis of social media and documents published during the time of the disappearance. The team used SAS Visual Text Analytics to classify text messages into low-, medium- and high-risk categories depending on whether they relate to emergencies or other generic events. We also uncovered noteworthy events using our automated categorization process (a rally, a fire, an explosion, reckless driving, running a red light, a bicycle/pedestrian hit-and-run and a hostage situation).</p>
<p>We used a risk level model to pinpoint when the risk was rising or falling. The peaks in the visualization represent the fire (first peak), a car being hit (second peak), a hostage situation (all remaining peaks in the latter half of the timeline), and an explosion (the very last peak).</p>
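<p>The sketch below shows one simple way such a risk timeline could be approximated in Python. It is not the model the team built: the risk weights, the 30-minute window and the sample messages are assumptions chosen purely for illustration, and each message is assumed to already carry a low, medium or high label.</p>

```python
from collections import defaultdict

# Assumed weights per risk label; the original post does not specify any.
RISK_WEIGHT = {"low": 0, "medium": 1, "high": 3}


def risk_by_window(messages, window_minutes=30):
    """Sum risk weights per time window; returns one total per window."""
    totals = defaultdict(int)
    for minute, label in messages:
        totals[minute // window_minutes] += RISK_WEIGHT[label]
    if not totals:
        return []
    return [totals.get(i, 0) for i in range(max(totals) + 1)]


def find_peaks(series):
    """Return indices whose value is strictly above both neighbors."""
    peaks = []
    for i, value in enumerate(series):
        left = series[i - 1] if i > 0 else 0
        right = series[i + 1] if i + 1 != len(series) else 0
        if value > left and value > right:
            peaks.append(i)
    return peaks


# Hypothetical (minute since start, risk label) pairs.
messages = [
    (5, "low"), (12, "high"), (20, "medium"), (40, "low"),
    (70, "high"), (75, "high"), (80, "medium"), (100, "low"),
]

series = risk_by_window(messages)
peaks = find_peaks(series)
print(series, peaks)
```

<p>Each peak index marks a window in which risky messages cluster, which is the kind of signal an analyst would then inspect to identify the underlying event.</p>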
<div id="attachment_9043" style="width: 310px" class="wp-caption alignright"><img aria-describedby="caption-attachment-9043" loading="lazy" class="size-medium wp-image-9043" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-07-300x146.png" alt="" width="300" height="146" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-07-300x146.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q1-07.png 521w" sizes="(max-width: 300px) 100vw, 300px" /><p id="caption-attachment-9043" class="wp-caption-text">Author Similarity Network</p></div>
<p>Some messages come from microblogs and contain informative reports (from government, media accounts, and eyewitnesses, for example) as well as chatter/junk/spam. There are re-posts for both types of messages. One way to distinguish between those two types is to identify usernames that may be more trustworthy than others. The term scores from the top 25 authors allowed for the creation of a network that shows pockets of similarity.</p>
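<p>As a rough illustration of how term scores can produce such a network, the Python sketch below links authors whose term-score vectors have a high cosine similarity. Cosine similarity, the 0.5 threshold, the author names and the scores are assumptions made for this example; the post does not state which similarity measure the team used.</p>

```python
import math
from itertools import combinations

# Hypothetical term scores per author (illustrative values only).
author_terms = {
    "news_desk": {"fire": 0.9, "evacuation": 0.7, "police": 0.4},
    "city_alerts": {"fire": 0.8, "evacuation": 0.6, "traffic": 0.2},
    "deal_bot": {"sale": 0.9, "discount": 0.8},
}


def cosine(a, b):
    """Cosine similarity between two sparse term-score dictionaries."""
    shared = set(a).intersection(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(author_terms, threshold=0.5):
    """Return author pairs whose similarity meets the threshold."""
    edges = {}
    for a, b in combinations(sorted(author_terms), 2):
        score = cosine(author_terms[a], author_terms[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


edges = similarity_edges(author_terms)
print(edges)
```

<p>In this toy example the two accounts reporting on the fire are linked, while the spam-like account shares no terms with them and stays isolated, mirroring the pockets of similarity described above.</p>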
<p>As part of our solution, the team also compiled a visualization dashboard that gives first responders a way to monitor potential needs and prioritize their activities.</p>
<div id="attachment_9049" style="width: 712px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-9049" loading="lazy" class="size-large wp-image-9049" src="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-01_horiz-1024x487.png" alt="" width="702" height="334" srcset="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-01_horiz-1024x487.png 1024w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-01_horiz-300x143.png 300w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-01_horiz-1536x730.png 1536w, https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q3-01_horiz-2048x973.png 2048w" sizes="(max-width: 702px) 100vw, 702px" /><p id="caption-attachment-9049" class="wp-caption-text">Risk Level Dashboard</p></div>
<h3>Results</h3>
<p>We compiled our findings in one video highlighting some of the approaches taken when analyzing the VAST challenge data:</p>
<p><center><iframe title="Visual Analytics of Movement and Transaction Data" width="702" height="395" src="https://www.youtube.com/embed/BTX8mT4Kfr8?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></center></p>
<p>The VAST Challenge provides a great opportunity to validate our software against real-world scenarios using complex data sets. Not only do we learn from these projects, but we also send feedback to our development teams to further improve product capabilities for customers.</p>
<h3>The team</h3>
<p>Spending time on VAST challenges is always fun, but it also requires a lot of commitment and technical knowledge in various areas of technology. Submitting a solution for these challenges wouldn't have been possible without the help of <strong>Riley Benson</strong>, <strong>Cheryl LeSaint</strong>, <strong>Rajendra Singh</strong>, <strong>Don Chapman</strong> (MC2) and <strong><a href="https://blogs.sas.com/content/author/biljanabelamaricwilsey/">Biljana Belamaric Wilsey</a></strong>, <strong>Russell Albright</strong> (MC3). <a href="https://blogs.sas.com/content/author/falkoschulz/"><strong>Falko Schulz</strong></a> used SAS Visual Analytics to explore and visualize the data to tell a complete story and focus on the Mini Challenges' questions. A huge thanks also goes to <strong>Rachel Nisbet</strong> and <strong>Chelsea Mayse</strong> for their willingness and thoroughness in producing a beautiful video summary. None of this would have been possible without each of you.</p>
<p>Thanks again to the entire SAS team!</p>
<h2>References</h2>
<ul>
<li><a href="https://vast-challenge.github.io/2021/index.html">VAST Challenge 2021</a></li>
<li><a href="http://visualdata.wustl.edu/varepository/benchmarks.php">Visual Analytics Benchmark Repository</a></li>
<li>YouTube - <a href="https://youtu.be/BTX8mT4Kfr8">Submission Video</a></li>
<li>SAS Institute Inc. <a href="https://www.sas.com/en_us/software/visual-analytics.html">SAS Visual Analytics</a>. (Online.) 2021.</li>
</ul>
<p>The post <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings/2021/09/02/analyzing-movement-and-tracking-data-using-sas-visual-analytics/">Analyzing movement and tracking data using SAS Visual Analytics</a> appeared first on <a rel="nofollow" href="https://blogs.sas.com/content/subconsciousmusings">The SAS Data Science Blog</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogs.sas.com/content/subconsciousmusings/2021/09/02/analyzing-movement-and-tracking-data-using-sas-visual-analytics/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			<enclosure url="https://blogs.sas.com/content/subconsciousmusings/files/2021/08/figure-q2-06-150x150.png" />
	</item>
	</channel>
</rss>
