<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DeepSee.io</title>
	<atom:link href="https://deepsee.io/feed" rel="self" type="application/rss+xml" />
	<link>https://deepsee.io</link>
	<description>Detect ad fraud before it becomes a problem.</description>
	<lastBuildDate>Thu, 13 Apr 2023 20:21:46 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.2.2</generator>
	<item>
		<title>DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation</title>
		<link>https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Wed, 08 Mar 2023 14:00:00 +0000</pubDate>
				<category><![CDATA[Research & Development]]></category>
		<category><![CDATA[News & Events]]></category>
		<category><![CDATA[ad fraud]]></category>
		<category><![CDATA[deepstreamer]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[Laundering]]></category>
		<category><![CDATA[malwarebytes]]></category>
		<category><![CDATA[pin]]></category>
		<category><![CDATA[Piracy]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1508</guid>

					<description><![CDATA[This investigation was a joint effort between Malwarebytes Threat Intelligence’s Jérôme Segura, DeepSee’s Rocky Moss and Antonio Torres. Key findings Introduction Online video streaming sites have always been some of the most visited destinations on the web. Legitimate ones will typically require a subscription fee or rely on advertising as part of their business model. &#8230; <a href="https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation">Continued</a>]]></description>
										<content:encoded><![CDATA[
<p><em>This investigation was a joint effort between Malwarebytes Threat Intelligence’s Jérôme Segura, DeepSee’s Rocky Moss and Antonio Torres.</em></p>



<h2 class="wp-block-heading" id="0-key-findings">Key findings</h2>



<ul>
<li>Over a dozen unique domains were found selling ad inventory through Google Ad Manager, even though the pages were embedded invisibly under the content of illegal movie &amp; porn streaming sites</li>



<li>Streaming sites in the DeepStreamer fraud ring generated an estimated <strong>210,550,928 visits</strong> in January 2023, as measured by SimilarWeb</li>



<li>No single seller was common to all of the sites used for laundering (the “money sites”), but most offered their inventory for sale through Google Ad Manager</li>



<li>Using extremely conservative estimates, which factor in a 50% ad-block rate &amp; 70% ad-unit fill rate, we project advertiser spend on this scheme at between <strong>$120k and $1.2 million</strong> in January 2023 alone</li>



<li>Working with a leading ad buying platform, we were able to confirm that hundreds of millions of bid requests were generated for these domains between January and February 2023</li>
</ul>


<div class="ub_table-of-contents" data-showtext="show" data-hidetext="hide" data-scrolltype="auto" id="ub_table-of-contents-6537b21c-e540-4996-a0cc-d3a057bfc97e" data-initiallyhideonmobile="false"
                    data-initiallyshow="true"><div class="ub_table-of-contents-header-container"><div class="ub_table-of-contents-header">
                    <div class="ub_table-of-contents-title">Table of Contents</div></div></div><div class="ub_table-of-contents-extra-container"><div class="ub_table-of-contents-container ub_table-of-contents-1-column "><ul><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#0-key-findings>Key findings</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#1-introduction>Introduction</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#2-a-deceptive-business-model>A deceptive business model</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#3-anti-debugging-tricks>Anti-debugging tricks</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#4-hidden-iframe-containers>Hidden iframe containers</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#5-evasion-techniques>Evasion techniques</a><ul><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#6-unintended-3rd-party-measurement-evasion>(Un)intended 3rd Party Measurement Evasion</a></li></ul></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#7-monetization>Monetization</a><ul><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#8-the-roster-of-embedded-sites>The Roster of Embedded Sites</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#9-the-non-google-dsp-perspective>The Non-Google DSP Perspective</a><ul><li><a 
href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#10-google-was-the-top-exchange-offering-these-opportunities-there-was-not-1-particular-seller-in-common>Google Was the Top Exchange Offering These Opportunities; There Was Not 1 Particular Seller in Common</a></li></ul></li></ul></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#11-conclusion>Conclusion</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#13-about-malwarebytes>About Malwarebytes</a></li><li><a href=https://deepsee.io/blog/deepstreamer-piracy-platforms-hide-lucrative-ad-fraud-operation#14-about-deepsee>About DeepSee</a></li></ul></div></div></div>


<h2 class="wp-block-heading" id="1-introduction">Introduction</h2>



<p>Online video streaming sites have always been some of the most visited destinations on the web. Legitimate ones will typically require a subscription fee or rely on advertising as part of their business model. Unfortunately, at any given point in time, there are thousands of sites that allow users to illegally stream pirated content, and they often manage to devise strategies that allow them to monetize their illegally sourced content with programmatic advertising.</p>



<p>Researchers at <a href="https://deepsee.io/">DeepSee</a> and <a href="https://www.malwarebytes.com/" target="_blank" rel="noopener">Malwarebytes</a> have identified an invalid traffic scheme that has run undetected for over a year across a number of illegal video streaming platforms. DeepStreamer used several techniques to evade detection and forge traffic, surreptitiously loading “money sites” (ad-monetized sites used to launder the human traffic of pirate sites) filled with Google ads that were completely hidden from view while internet users were watching movies.</p>



<p>Not only are these streaming sites breaking the law by using copyrighted material, they are also defrauding advertisers of possibly up to $1.2 million per month, based on conservative estimates.</p>



<h2 class="wp-block-heading" id="2-a-deceptive-business-model">A deceptive business model</h2>



<p>DeepSee researchers contacted Malwarebytes about a scheme they had observed recently via a video streaming website called moviesjoy[.]to. DeepSee’s crawlers had observed the site mikerin[.]com loading ads deep under the content of moviesjoy, but it wasn’t exactly clear <em>how </em>this was happening.</p>



<p>Interestingly, the site claims to offer free HD movies and TV series with “<em>absolutely zero ads on our site. Once you hit the play button, you can start streaming right away, without any interruptions in the middle.</em>”</p>



<p>On the internet, if something is “free”, it usually means you are the product in some shape or form. Hosting and streaming cost money that needs to be recouped so the service can stay online.</p>



<p>What we identified was not entirely surprising but was quite clever. The platform does indeed rely on ads, but rather than showing them on the site, it embeds and hides them.</p>



<p>While the site owner could display ads to their visitors, there is no way legitimate advertisers (meaning those that would pay more) would accept traffic coming from a site offering pirated movies.</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/v_DILvcCeDgGxHI2L5YJ0PjEHwOK7BP1fD8GQ1g1yUZ2WEKc210wRomyJe91UX6NhHSopw1r-00xDdlwD9x09gzT2lRxAi3_nligqD5kplGa3aoDIe5p_OnhaoOwIb6dEqDrat_hiF-p7lPhXwHguqE" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>The trick consists of loading ads from seemingly regular websites and not showing them to anyone. Those “legitimate” websites are embedded in the page as hidden iframes while users are watching movies.</p>



<p>There are 4 Google ads that load per page and the pages reload periodically. Advertisers are buying ad space for mainstream content but on websites that are inserted as invisible iframes into illegal video streaming platforms.</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/539Eh7MF4yyo7HKgNZuqzidOVMCAn8hICITnC8vm2ae0Wk9ohctwT2mnSFndVxUuDjsr9Ini08tofGSdZp1cZt6tnCB8dfr-jD0dIZ1cWmJyCznkys_OUBgntVSjKU01ByaiSjTuDzjfTZ-7oEIm198" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<h2 class="wp-block-heading" id="3-anti-debugging-tricks">Anti-debugging tricks</h2>



<p>Rather than using simpler techniques such as popunders, DeepStreamer relies on intermediary domains that create hidden iframe containers within the existing page.</p>



<p>The code that they use is highly obfuscated and detects the presence of debuggers. Capturing network traffic externally will only show some static elements, and not the dynamically created iframes.</p>



<p>Here is the overall traffic view, from the streaming site (moviesjoy) to the money site (mikerin):</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/DWrZFG8BximuAnt504oBMtsrol8nOg4ewO2vyGtWvxd0jej6dPrm1aseL5btaiW1FSyjbrYcN520zPqki7FY1dnp5CFNTxzyOf5GppCD9BahdBwxxbYgqdKfzkVWYJvGQwg_T6pHBKrbHIQpRts2BhQ" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>Several anti-debugging tricks are in use. The first one comes from the online video streaming site itself:&nbsp;</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/m-Rilg3SJnq8ljOSMpiQGwTP-IATYF_4R7DlbXC6Mmjgea8jpXTpb8OTbTlz9U5wTS3XR9kyQFeWwp-hpWMrDisAzSIex7rhfQVTaNa6SVgg6Kp8hx1mTG_FTAzhFhvylNn1Zy5PTge7ClV0R1KyNHM" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>The domain hosted at adtrue[.]top (or adtrue[.]info) plays an important role in loading the money domains by performing a HEAD /dynamic/ads/ HTTP request, and yet it shows an enigmatic 404 code response.</p>



<p>We were able to replay the attack by putting a breakpoint on adtrue[.]info using an external web debugger (Fiddler) and observed that it started loading the domain immediately responsible for rendering the money site.</p>



<figure class="wp-block-image is-resized"><img decoding="async" loading="lazy" src="https://lh5.googleusercontent.com/Eq9nL5vcp3qZvCFHUR-CQTtpjrVZY-6CIagnTtZqOkA42jB6uRknCy-dn2f_mfOZgDkVseJUeHuQPYNjvAtPLina0FUTEvBGWyJjnR-0qatNVn6u8ED-IiiMtNP_k4uS8SnACOcauZF-8czKGCcfwfw" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" width="808" height="489" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>It appears, though, that all of these intermediary domains are connected and watch out for one another.</p>



<h2 class="wp-block-heading" id="4-hidden-iframe-containers">Hidden iframe containers</h2>



<p>Let’s look at the difference between static and dynamically rendered content with mikerin[.]ml, which is related to mikerin[.]com (a money site with ads). It appears to load only jquery.js:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/y6LgxVZ1LclMk_cjAt0PZlkF7X-6BeU3CKCo9DQtAzbJ7fshD1d7QRDtKweVTn9LUtbgaKSNmWvtCVmDdwdDCKgOIn7FDa1XjrEsNiA7BbY5fSTP7KQMBR9wwR1AmMduMLTyYRvfy6G0mh-3rMUHJ4I" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>This file has nothing to do with the popular jQuery JavaScript library. It is heavily obfuscated, debugger-resistant code that contains the clues to how DeepStreamer loads its iframe:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/FiXc4B6pTGokUNzxjLQli5CE5eoSxywtiv8HiyS2lpiZx0fWvEA91xFt17kLIzd9a21h8ciWQSJ2GIKHL4i83Uj3eEdhsKxvVNtAHjNIDZJhv2IXCdrpmrnCmilMEshqGV-nG4gl63slzTEg--M22N0" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>However, we can take a shortcut and see what the Document Object Model (DOM) looks like by saving the current page as a complete web page (*.htm, *.html) using the browser UI.</p>



<p>While the static HTML served by mikerin[.]ml showed very little information (as in the earlier screenshot), the saved DOM provides a lot more useful information since it shows objects that have been rendered by the browser.</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/ZYneIZV8YnxTwaU8ngDoh4Y0Ih854FAzmMkd3Qyvm-Oh7l2f45Uy3LAt1cBMJQga3Kwbc5XkCpJL_vC9nuFB9vWD3ecOJ-veaJ0GkLj84tKnwWUNte1PVrtr9jt4Y3xBOAnLjL7yjf3GTh73pTQKXWU" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>There is a new element called “containerIframeBlog____” that refers to the money sites, which are ordinary-looking blogs with Google ads. The iframe’s properties make it so that nothing is visible to the user.</p>
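<p>For readers who want to run the same check on a saved DOM, here is a minimal Python sketch. It is not the tooling we used. It assumes the hiding is done with inline styles (display, visibility, opacity, or zero width/height) and that the “containerIframeBlog” prefix shows up in the iframe’s id, name or class; the function names are hypothetical, and the exact properties DeepStreamer sets may differ.</p>

```python
from html.parser import HTMLParser


def parse_style(style):
    """Turn an inline style string into a {property: value} dict."""
    declarations = {}
    for part in style.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            declarations[key.strip().lower()] = value.strip().lower()
    return declarations


def looks_hidden(style):
    """Assumed heuristics for 'invisible'; real pages may hide iframes differently."""
    d = parse_style(style)
    return (
        d.get("display") == "none"
        or d.get("visibility") == "hidden"
        or d.get("opacity") in ("0", "0.0")
        or d.get("width") in ("0", "0px")
        or d.get("height") in ("0", "0px")
    )


class HiddenIframeFinder(HTMLParser):
    """Collects the src of iframes that are hidden or carry the known name prefix."""

    def __init__(self, name_prefix="containerIframeBlog"):
        super().__init__()
        self.name_prefix = name_prefix
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attr = {key: (value or "") for key, value in attrs}
        named = any(
            attr.get(key, "").startswith(self.name_prefix)
            for key in ("id", "name", "class")
        )
        if named or looks_hidden(attr.get("style", "")):
            self.hits.append(attr.get("src", ""))


def find_hidden_iframes(saved_dom_html):
    """Return the src URLs of suspicious iframes found in a saved DOM snapshot."""
    finder = HiddenIframeFinder()
    finder.feed(saved_dom_html)
    return finder.hits
```

<p>Feeding the saved page source to find_hidden_iframes returns the URLs of iframes worth inspecting by hand. A style-based check like this will miss iframes hidden through external stylesheets or off-screen positioning.</p>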



<p>One way to confirm those iframes without triggering the anti-debugger code is by launching Chrome’s Task Manager:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh4.googleusercontent.com/_skHdloLgVYDWqs29Vd_ZYRK4ioDkruxKyu5t92sPhRrcyYb0EaHA29f41xh1X99_II6YY35wBqHdTu6JixAT28VRXZpCndP5fC1IJL5r9Al3pCwjliJCqAKmfWN51Mt2fcULT3TouvKz5WFRFnROZw" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<h2 class="wp-block-heading" id="5-evasion-techniques">Evasion techniques</h2>



<p>What we refer to as the money sites are WordPress sites with a number of blog articles and Google ads. At first glance, everything looks legitimate but that is simply a decoy to fool everyone.</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/CTSeNEEVFU8eS_3QOOIkITpFzxYnEdp2xhuQu9_XvhFawSaohaMRnUOJDj2sWC0FyuoB-tdGVVOu9wTOjDDPBdIgZRYo6HD6FJNTtmWHlVMkUUudy6vXwCZiyHta8q_-9AsbmiP2_tx13pRAEqGqPYg" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>We noticed that some articles are completely clean, while others contain ad fraud code. Of course, you will only get to the latter if your referer is one of the movie streaming sites.</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh4.googleusercontent.com/lAKQu0wmHzhAfUJIs8LUTpBq44_QxhONKP_DU7nc0pdNdytXTJIkfsGpp6w2q2OuiJNucP2--qQF4mBG3knWxFZyz2kSyljhpCzv8-CrSiTjYCVsgyqrznTrzaM3JKa138bK3EGwGKtySwSopo_2cKM" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>There is one problem, though. If visitors truly came from a pirated site, then ad networks would not allow their customers’ ads through.&nbsp;</p>



<p>This is where referral forging comes into play. We can see that DeepStreamer is spoofing the referer, choosing one from their own list (Google, Bing or Facebook):</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh4.googleusercontent.com/X8PqzPgdkhM42gueczwOHwZl0i0lvrrBYmueK_T5-MHqerDv5zejTt3HCeVoVOQLKOqkHm9Fqu423JqJGwmB6FxqW45XRMGG4hFHB-EOfXq_7w3X1WCLs1k50nSsawmMoURG72fdt22zVOVqRmz3Gvs" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>Another issue is that the invisible iframes will not reflect user activity, and yet it is important to pretend humans are scrolling and clicking on the articles. The next piece of code from the ad fraud script does just that:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/yJ_buU6pXN0r5IgGi3dqAAK2XK-PQoOQKUnjZMgxbHY0qKd4D_ua9CmZohPLQu8tnwhygev4aSjOEPt9gIgw8mFqB6eThcOxm2GCM47QbWBDZZB5bg2MKS2UwcUm-GET7QvOXdAn1nN6x9luHBqG2TQ" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<figure class="wp-block-image"><img decoding="async" src="https://lh4.googleusercontent.com/pUdXtDxv8WfLr8ViO0KHCBqx-ovLzZArnPlZZbur101MTriHPxEj2TzdVdzQhZxDYnXGoKYa3aK5epaxXAHec_enS46M7ZU_8yq3ElpT2kCc-0H82QNKMu3l7iPXWb-2A932tLfGvy3-Llex4gGxEQY" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>If the money site were not hidden as an iframe, this is what it would look like while performing ad fraud:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/93PtWilLKoLVUMR3X9vU__uOvlm54mI7vxB3yUt5Ffq3DYvI_12Pz4c3ZEVrjac7B1F-jtlTmnGP0gBkfoJCApzCSRSqXAIZ-ySe3oUj_xWOhU5DusRSLFAMpg3aiMQ0wofTmMS3olMJhbr72hvZY0E" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>Perhaps as a measure to avoid creating too many ad requests, these embedded pages do not often refresh ad units within the context of a single page-view. Instead, they generate a visit to a new spoofed page every 2-3 minutes, as demonstrated in this code snippet (looking at the <em>interval </em>object in particular for details on timing):</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/UVRwwl8zWB6peEabIr_Owd_fQjBEcMUDzpBG6wJTnF4rKVD9QJLjnoEeBOx4_RlHGkLIQb7Qkvmp2Z8DamVt-ijuTTU4Dve8tCZf1AyQfPQSJXedyCC3dWl4vfG1M3zJJW3U0y33lOmsvGE3FeGNV80" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>This is also confirmed by our packet captures from manually generated visits to these pirate sites; a new page is loaded every 2-3 minutes.&nbsp;</p>



<h3 class="wp-block-heading" id="6-unintended-3rd-party-measurement-evasion">(Un)intended 3rd Party Measurement Evasion</h3>



<p>One interesting side effect of embedding the money domains the way DeepStreamer has is that estimates from SimilarWeb were <strong>completely</strong> thrown off! Take, for example, the SimilarWeb results for 2 money sites that generated hundreds of millions of ad opportunities in the same measurement period (Nov ‘22 to Jan ‘23):</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh6.googleusercontent.com/dgVd_-XkkCcroVR78RkEuxUoFaCfLORfSC7cy172m1YjdfMuVeAfQ3HDBJsi77sIGuQKpvb9tdgvEGqUVv2dJ1tKjReq8aMFj8E_l1Ge5ImIcDKU8H9mS-vjSrZ3tuikxUCWRJh6YZbF32YDMPIOqgM" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>SimilarWeb has no idea these sites exist and are generating these kinds of ad traffic volumes. This suggests that SimilarWeb measures traffic only for domains that are navigated to in the browser address bar, and does not account for hidden or embedded pages. This could be both a blessing and a curse.&nbsp;</p>



<p>On the plus side: many ad exchanges check for 3rd party traffic metrics from tools like SimilarWeb before making a publisher’s inventory available, and organizations doing that basic check will protect themselves from exposure to sites like this. Put another way: a quality specialist would see that there’s no traffic to mikerin[.]com, or guiadosabor[.]com, and the sites would not be approved for the platform subsequently.&nbsp;</p>



<ul>
<li>This raises the question: how were these publishers able to sell their inventory through Google’s ad exchange? What checks and balances were in place to ensure that the traffic volumes to those sites were believable?</li>
</ul>



<p>One negative outcome of this measurement scenario is that researchers who rely on SimilarWeb insights cannot know about the “money” sites’ connections to pirate domains; the connection from source -&gt; money site is lost given the absence of SimilarWeb “related sites” data.&nbsp;</p>



<p>DeepSee’s crawl data revealed ground-truth connections between the pirate &amp; “money” sites, but it could not account for the volume of traffic directed at the “money” sites. Luckily, since these sites load every time someone visits the pirate sites, it’s possible to estimate the visit counts to the “money” domains by understanding traffic volumes to the pirate sites which embed them.</p>
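<p>That roll-up can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production pipeline: the function name, the two input dictionaries and the example domains are hypothetical, and a pirate site with no third-party traffic estimate is set aside rather than counted as zero.</p>

```python
from collections import defaultdict


def estimate_money_site_visits(source_to_money, source_visits):
    """Attribute each pirate (source) site's visits to the money domains it embeds.

    source_to_money: {pirate_domain: [money_domain, ...]} from crawl data.
    source_visits:   {pirate_domain: monthly_visits} from a traffic estimator.
    Returns (totals per money domain, pirate domains with no traffic estimate).
    """
    totals = defaultdict(int)
    unmatched = []
    for source, money_domains in source_to_money.items():
        visits = source_visits.get(source)
        if visits is None:
            unmatched.append(source)  # no estimate available; keep it visible
            continue
        for money in money_domains:
            totals[money] += visits
    return dict(totals), sorted(unmatched)


# Hypothetical example data, not taken from the investigation.
crawl_mapping = {
    "pirate-a.example": ["money-1.example"],
    "pirate-b.example": ["money-1.example", "money-2.example"],
    "pirate-c.example": ["money-2.example"],
}
traffic = {"pirate-a.example": 1000, "pirate-b.example": 250}

totals, unmatched = estimate_money_site_visits(crawl_mapping, traffic)
print(totals)
print(unmatched)
```

<p>Note that a pirate site embedding two money domains contributes its visits to both, so per-domain totals can add up to more than the number of unique visits.</p>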



<h2 class="wp-block-heading" id="7-monetization">Monetization</h2>



<h3 class="wp-block-heading" id="8-the-roster-of-embedded-sites">The Roster of Embedded Sites</h3>



<p>By working with the team at Malwarebytes, DeepSee was better able to profile the activity of a monetized site involved in the DeepStreamer scheme, and set about the task of mapping the active ones to their pirate/source domains. What we found are 14 active content domains, loaded by 250+ unique pirate sites, which cumulatively generated hundreds of millions of visits in January:</p>



<figure class="wp-block-table"><table><tbody><tr><td><strong>Money Domain</strong></td><td><strong>Unique Source Domains</strong></td><td><strong>Est. Jan 2023 SimilarWeb Visits to Source Domains</strong></td></tr><tr><td>brandingjoy[.]in</td><td>33</td><td>80,991,970</td></tr><tr><td>aitechgear[.]in</td><td>19</td><td>50,552,027</td></tr><tr><td>guiadosabor[.]com</td><td>2</td><td>37,769,619</td></tr><tr><td>mikerin[.]com</td><td>2</td><td>32,999,385</td></tr><tr><td>adorablefurnishing[.]com</td><td>194</td><td>32,100,166</td></tr><tr><td>journeywithvision[.]com</td><td>1</td><td>29,430,082</td></tr><tr><td>satishmoheyt[.]in</td><td>2</td><td>21,499,619</td></tr><tr><td>primesinfo[.]com</td><td>1</td><td>9,913,095</td></tr><tr><td>techyclub[.]in</td><td>2</td><td>4,190,218</td></tr><tr><td>streamix[.]tv</td><td>11</td><td>3,899,551</td></tr><tr><td>newsworldcity[.]com</td><td>1</td><td>3,427,093</td></tr><tr><td>pharmabeaver[.]com</td><td>1</td><td>2,948,025</td></tr><tr><td>guerytech[.]online</td><td>3</td><td>2,590,490</td></tr><tr><td>virvida[.]com</td><td>1</td><td>1,359,371</td></tr></tbody></table></figure>



<p>In order to arrive at the estimated visit statistics, we used data from Similarweb. Not every pirate domain was found in their dataset due to recent registration, or low traffic volumes.</p>



<p>Now that we had identified a sample of ad-monetized domains, we needed to make sure these ad units were actually firing off impression trackers, meaning the advertiser would be charged for presenting their ads on the page.&nbsp;</p>



<p>In order to confirm this, DeepSee analyzed data its crawlers had gathered when visiting the pirate sites in question, and compared the number of Google ad requests generated to the number of corresponding Google impression trackers fired.&nbsp;</p>



<p>This dataset, composed of 6,748 crawls performed between January 1st and February 27th, 2023, showed the following:</p>



<ul>
<li>Of the 35,269 Google ad requests measured, DeepSee measured 25,387 corresponding impression trackers, making for a fill rate of ~72%</li>



<li>The “money” sites loaded a median of 4 ad units per page load, which was confirmed by manual inspection performed by Malwarebytes</li>



<li>In DeepSee’s limited manual tests, generated by visiting the pirate sites &amp; running packet capture software, there was a measured fill rate of ~80%</li>



<li>Perhaps more troubling, ~98% of the sessions that DeepSee crawlers generated were from known data centers, performed without any attempt to cloak the IP.</li>
</ul>
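<p>The crawl-based fill rate quoted in the list above is simply impression trackers divided by ad requests. A small Python sketch of that arithmetic follows; the function and constant names are ours for illustration, and only the two counts come from the dataset described above.</p>

```python
def fill_rate(ad_requests, impression_trackers):
    """Share of ad requests that resulted in a fired impression tracker."""
    if ad_requests == 0:
        raise ValueError("fill rate is undefined when there are no ad requests")
    return impression_trackers / ad_requests


# Counts reported for the 6,748 crawls described above.
CRAWL_AD_REQUESTS = 35_269
CRAWL_IMPRESSION_TRACKERS = 25_387

rate = fill_rate(CRAWL_AD_REQUESTS, CRAWL_IMPRESSION_TRACKERS)
print(f"Measured fill rate: {rate:.1%}")  # Measured fill rate: 72.0%
```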



<p>(For more information on how to do this kind of auditing yourself, check out <a href="https://www.monetizemore.com/blog/google-ad-manager-advanced-troubleshooting-guide-for-display-ads/" target="_blank" rel="noopener">this explainer from MonetizeMore</a>)</p>



<p>With these data points in hand, we could now construct an estimate of how much advertisers might be spending on this inventory. <em>For complete insights into the dataset we used to create these estimates, alongside the complete list of Source:Money domain mappings, </em><a href="https://docs.google.com/spreadsheets/d/13KbXqbARRr9reaxoMJQirYLL21FR81-OvBXMkvfSxhU/edit?usp=sharing" target="_blank" rel="noopener"><em>check out our companion document</em></a>.</p>



<ul>
<li>After matching the pirate source domains to SimilarWeb data, and summing the visit counts, we counted <strong>221,823,394 cumulative visits generated.</strong></li>



<li>Using the visit data, and the time-on-site metrics from SimilarWeb, we arrived at a weighted average time-on-site of <strong>~7.75 minutes per visit</strong></li>



<li>Visitors cause 4 ads to load immediately upon a page load, and another 4 ads load every 2.5 minutes on average when the page reloads. Over a 7.75-minute visit, this makes for an average of <strong>16.40 ad exposures per visit</strong> (4 + 4 × 7.75 / 2.5)</li>



<li>Multiplying average exposures per user by the number of visits yielded a total of <strong>3,636,840,849 estimated ad exposures in January</strong>, but we had to add a few modifiers to this figure:
<ul>
<li><a href="https://www.statista.com/topics/3201/ad-blocking" target="_blank" rel="noopener">According to data compiled by Statista</a>, ~50% of desktop web users block ads, and that number is ~30% for mobile browser users. We chose to use the more conservative 50% figure, and removed half of the projected impressions from the pool, leaving<strong> 1,818,420,425 estimated ad exposures in January</strong></li>



<li>As we previously mentioned, DeepSee crawlers measured a fill rate of ~72% for Google ad units on the money sites during our visits.<strong> Factoring in a slightly more conservative 70% fill rate left us with 1,272,894,297 estimated ad exposures in January</strong></li>
</ul>
</li>
</ul>
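<p>The funnel above can be reproduced with a short Python sketch. The inputs are the rounded figures quoted in the list; the function and constant names are ours for illustration. Because the published totals were presumably computed from unrounded averages, this sketch lands at roughly 1.273 billion exposures, within about 0.03% of the figure quoted above.</p>

```python
# Rounded inputs quoted in the list above.
VISITS = 221_823_394
AVG_MINUTES_PER_VISIT = 7.75
ADS_PER_PAGE_LOAD = 4
MINUTES_BETWEEN_RELOADS = 2.5
AD_BLOCK_RATE = 0.50
FILL_RATE = 0.70


def exposures_per_visit(minutes, ads_per_load, reload_minutes):
    """Ads on the first load plus ads from each (fractional) reload during the visit."""
    return ads_per_load + ads_per_load * (minutes / reload_minutes)


def estimated_exposures(visits, per_visit, ad_block_rate, fill_rate):
    """Discount raw exposures by ad blocking, then by the unfilled share of ad units."""
    return visits * per_visit * (1 - ad_block_rate) * fill_rate


per_visit = exposures_per_visit(
    AVG_MINUTES_PER_VISIT, ADS_PER_PAGE_LOAD, MINUTES_BETWEEN_RELOADS
)
total = estimated_exposures(VISITS, per_visit, AD_BLOCK_RATE, FILL_RATE)

print(f"Exposures per visit: {per_visit:.2f}")
print(f"Estimated filled, unblocked exposures: {total:,.0f}")
```

<p>Changing AD_BLOCK_RATE or FILL_RATE shows how sensitive the final figure is to each assumption; both scale the result linearly.</p>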



<p>Given our final figure of <strong>1,272,894,297 estimated ad exposures in January</strong>, the advertiser spend was estimated to be <strong>between $127k and $1.27 million, </strong>depending on the average price of these advertisements, which was never disclosed to us. We broke our estimates down across several probable price points for this media:</p>
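<p>For readers who want to test other price points, here is a minimal Python sketch of the conversion from exposures to spend. It assumes spend is priced per thousand impressions (CPM). The CPM values in the loop are illustrative assumptions: $0.10 and $1.00 happen to bracket the range quoted above, and the intermediate values are arbitrary.</p>

```python
ESTIMATED_EXPOSURES = 1_272_894_297  # final January figure quoted above


def spend_usd(impressions, cpm_usd):
    """Advertiser spend when impressions are billed at cpm_usd per thousand."""
    return impressions / 1000 * cpm_usd


# Assumed average CPMs in US dollars, for illustration only.
for cpm in (0.10, 0.25, 0.50, 1.00):
    print(f"CPM ${cpm:.2f}: ${spend_usd(ESTIMATED_EXPOSURES, cpm):,.0f}")
```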



<figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/R-t_bF_bBNK3bdDMNjMxwcVyvf7Xh64UQqV7QFUnwshlQr4GqJXYkhlaZWUfZbPQkj7GN-zm1ABK7PPmVrVu2j6J4eRW3ISFRyQswmZrlBfBG2rLTLQOhkibYbUNRfiUjEp1Tzea6hbX78sw2NVvH5w" alt="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation" title="DeepStreamer: Illegal Movie Streaming Platforms Hide Lucrative Ad Fraud Operation"></figure>



<p>At this point, it was clear that advertisers were really buying this space, so we started asking around for evidence that could point us to who was selling the space.</p>



<h3 class="wp-block-heading" id="9-the-non-google-dsp-perspective">The Non-Google DSP Perspective</h3>



<p>The data in this section was provided to DeepSee by a leading DSP (<a href="https://blog.hubspot.com/marketing/what-is-dsp" target="_blank" rel="noopener">demand-side advertising platform</a>) with global reach, who agreed to participate on condition of anonymity (we’ll call them <strong>DSP “A”</strong>). They provided reporting, from their perspective, on the count of bid requests generated by the money domains dating back to 2020. Most helpfully, they also provided the supply path for each opportunity, which tells us the exchange &amp; seller name related to the opportunity.<br><br>As a disclaimer, there are a few limitations of this dataset:</p>



<ul>
<li>This is just the perspective of one DSP, and we can’t claim to know that these sellers created a similarly large share of opportunities presented to all other DSPs. We suspect they do, but without input from Google in particular, it can’t be confirmed.</li>



<li>These sites seemed to monetize extremely poorly outside of Google; fewer than 1% of requests resulted in an ad being delivered via DSP “A”.
<ul>
<li>That low fill rate was echoed by another non-Google exchange we polled, who told us that only 0.1% of opportunities they created resulted in ads being loaded</li>



<li>On the other hand, we observed that these Google ad units were filled upwards of 70% of the time. </li>
</ul>
</li>
</ul>



<p>With those limitations in mind, the table below shows the top sellers offering space on these money domains, and the ad exchange each opportunity came through.</p>



<h4 class="wp-block-heading" id="10-google-was-the-top-exchange-offering-these-opportunities-there-was-not-1-particular-seller-in-common">Google Was the Top Exchange Offering These Opportunities; There Was Not 1 Particular Seller in Common</h4>



<p><strong>Top Seller Per Domain, Ordered by Magnitude of Ad Opportunities Presented to DSP “A”</strong></p>



<figure class="wp-block-table"><table><thead><tr><th scope="col"><strong>Domain</strong></th><th scope="col"><strong>Approx. Ad Opportunities Created</strong></th><th scope="col"><strong>Top Exchange</strong></th><th scope="col"><strong>Top Seller(s)</strong></th></tr></thead><tbody><tr><td>guiadosabor[.]com</td><td>1 Billion+</td><td>Rubicon</td><td>Grumft Media</td></tr><tr><td>mikerin[.]com</td><td>1 Billion+</td><td>Google</td><td>Agency Orquidea</td></tr><tr><td>journeywithvision[.]com</td><td>100 Million &#8211; 1 Billion</td><td>Google</td><td>Publift</td></tr><tr><td>adorablefurnishing[.]com</td><td>100 Million &#8211; 1 Billion</td><td>Google</td><td>redmas.com / Entravision Latam</td></tr><tr><td>guerytech[.]online</td><td>10 Million &#8211; 100 Million</td><td>Google</td><td>GreedyGame Media</td></tr><tr><td>streamix[.]tv</td><td>10 Million &#8211; 100 Million</td><td>Google</td><td>Join Ads (joinads.me)</td></tr><tr><td>techyclub[.]in</td><td>10 Million &#8211; 100 Million</td><td>Google</td><td>GreedyGame Media</td></tr><tr><td>newsworldcity[.]com</td><td>10 Million &#8211; 100 Million</td><td>Google</td><td>Hafiz Maaz</td></tr><tr><td>virvida[.]com</td><td>1 Million &#8211; 10 Million</td><td>Google</td><td>redmas.com / Entravision Latam</td></tr><tr><td>satishmoheyt[.]in</td><td>1 Million &#8211; 10 Million</td><td>Google / Rubicon</td><td>Verizon Media Inc &amp; Cyber Media (India) Ltd.</td></tr><tr><td>pharmabeaver[.]com</td><td>&lt;1 Million</td><td>Google</td><td>GlobalSNS Titans LTD</td></tr></tbody></table></figure>



<h2 class="wp-block-heading" id="11-conclusion">Conclusion</h2>



<p>In this investigation, we uncovered a network of streaming websites and bogus domains that a threat actor we call DeepStreamer created to illicitly earn advertising revenue.</p>



<p>We were impressed by the technical complexity of the code and underlying infrastructure. The perpetrators took many steps to prevent reverse engineering, and tracking metrics did not accurately represent the scale of the abuse at play.</p>



<p>We have notified Google and other industry partners, and some actions have already been taken. Malwarebytes users are not participating in this invalid traffic scheme, which defrauds advertisers, because we already block the fraudulent domains used.</p>



<p>The active domains used for laundering traffic, as well as other key details related to the projections we made, are available in <strong><a href="https://docs.google.com/spreadsheets/d/13KbXqbARRr9reaxoMJQirYLL21FR81-OvBXMkvfSxhU/edit?usp=sharing" target="_blank" rel="noopener">the companion data workbook</a></strong>.</p>



<h2 class="wp-block-heading" id="12-indicators-of-compromise">Indicators of Compromise</h2>



<p><strong>Domains launching invisible iframes:</strong></p>



<p>adorablefurnishing[.]ml</p>



<p>awscloudfront[.]ml</p>



<p>bigcache[.]ml</p>



<p>brcache201[.]ml</p>



<p>brient[.]ml</p>



<p>cache33[.]ml</p>



<p>cdncache[.]ml</p>



<p>compactembed[.]ml</p>



<p>dbcache[.]fun</p>



<p>dcache[.]ml</p>



<p>embed123[.]ml</p>



<p>fcache[.]ml</p>



<p>filecache[.]ml</p>



<p>financeirocartao[.]ml</p>



<p>fishuflatinned[.]ml</p>



<p>fullcdn[.]ga</p>



<p>harateness[.]ml</p>



<p>honessity[.]ml</p>



<p>hypercdn[.]ml</p>



<p>hypercdn3[.]ml</p>



<p>investwell[.]ml</p>



<p>jestick[.]ml</p>



<p>journeywithvision[.]ga</p>



<p>jscache[.]live</p>



<p>kbyte[.]ml</p>



<p>livrosdereceita[.]ml</p>



<p>maxcache[.]ml</p>



<p>mbyte[.]gq</p>



<p>mcdn[.]ga</p>



<p>megacdn[.]ml</p>



<p>megacdn[.]top</p>



<p>megasearch[.]gq</p>



<p>melhoresdomomento[.]ml</p>



<p>mikerin[.]ml</p>



<p>myplayer[.]ml</p>



<p>newsworldcity[.]ml</p>



<p>poptube[.]fun</p>



<p>primesinfo[.]ml</p>



<p>satishmoheyt[.]ml</p>



<p>supercache[.]top</p>



<p>tapcache[.]ml</p>



<p>tcache[.]ml</p>



<p>tecnowebclub[.]ga</p>



<p>toptube[.]fun</p>



<p>uwatchtube[.]ml</p>



<p>video[.]your-notice[.]fun</p>



<p>videocdn[.]fun</p>



<p>videosdahora[.]fun</p>



<p>whatsappvideos[.]ml</p>



<p>wispields[.]ml</p>



<p>wpcache[.]ml</p>



<p>youbesttube[.]gq</p>



<p>yourtube[.]fun</p>



<p>ytcache[.]fun</p>



<p>pharmabeaver[.]ml</p>



<p>pharmabeaver[.]com</p>



<p>virvida[.]com</p>



<p>guiadosabor[.]com</p>



<p>techyclub[.]in</p>



<p>journeywithvision[.]com</p>



<p>newsworldcity[.]com</p>



<p>mikerin[.]com</p>



<p>primesinfo[.]com</p>



<p>investwell[.]site</p>



<p>streamix[.]tv</p>



<p>guerytech[.]online</p>



<p>brandingjoy[.]in</p>



<p>aitechgear[.]in</p>



<p>adorablefurnishing[.]com</p>



<p>satishmoheyt[.]in</p>



<p><strong>Money domains:</strong></p>



<p>brandingjoy[.]in</p>



<p>aitechgear[.]in</p>



<p>guiadosabor[.]com</p>



<p>mikerin[.]com</p>



<p>adorablefurnishing[.]com</p>



<p>journeywithvision[.]com</p>



<p>satishmoheyt[.]in</p>



<p>primesinfo[.]com</p>



<p>techyclub[.]in</p>



<p>streamix[.]tv</p>



<p>newsworldcity[.]com</p>



<p>pharmabeaver[.]com</p>



<p>guerytech[.]online</p>



<p>virvida[.]com</p>



<p><strong>Malicious JavaScript (iframe):</strong></p>



<p>1701f50afde2db48d58e6789cfa810f2fdfae74ad0b5de983ace21beb9542a4b</p>



<p>2405699d9b90c36950440d8dd0335d8da1574abda11ae9900cfb31a68f80a864</p>



<p>344550fd85db609434f9eb6838642df1e0283ce43b23c02859cb593b7331ef70</p>



<p>5f8598bdf64f2f3c7a6b9134cd80bb44ac46f546d4047d796278437b5c3485b7</p>



<p>86c160f073347d3c810a824ba90de66105882195dd607175a32fa7adffe31163</p>



<p>98d2cd6e4f3a3aa3200d53ac09750d192ca6ba546aba09a935fe4f38d878bc4c</p>



<p>af70188588c75165f919c9c155827eb458f26aed5288ef52bab532dc7bd38015</p>



<p>b6845734220755e8a163d27d30fb0470ac0aa0d6e57e52af38fe59619d4dd1fb</p>



<p>bcb9ee387efcd936e2abd1ede483fda13cfe40320af9df6462398f329e6aae1e</p>



<p>fc2006c24b6153bfeafb3e9dc6e5ffc4d239c021f1e1777265569f672b4e184b</p>



<h2 class="wp-block-heading" id="13-about-malwarebytes">About Malwarebytes</h2>



<p>Malwarebytes believes that when people and organizations are free from threats, they are free to thrive. Malwarebytes was founded in 2008, when CEO Marcin Kleczynski had one mission: to rid the world of malware. Today, Malwarebytes&#8217; award-winning endpoint protection, privacy and threat prevention solutions and its world-class team of threat researchers protect millions of individuals and thousands of businesses across the globe.</p>



<p>The effectiveness and ease-of-use of Malwarebytes solutions are consistently recognized by independent third parties including <a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=1963040584&amp;u=https%3A%2F%2Fwww.malwarebytes.com%2Fblog%2Fnews%2F2022%2F04%2Fmalwarebytes-evaluation-of-the-mitre-engenuity-attck-round-4-emulations&amp;a=MITRE+Engenuity" target="_blank" rel="noopener">MITRE Engenuity</a>, <a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=995566812&amp;u=https%3A%2F%2Fwww.malwarebytes.com%2Fblog%2Fbusiness%2F2022%2F12%2Fmalwarebytes-outperforms-competition-in-latest-mrg-ettifas-assessment&amp;a=MRG+Effitas" target="_blank" rel="noopener">MRG Effitas</a>, <a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=940814897&amp;u=https%3A%2F%2Favlab.pl%2Fen%2Fsecurity-test-of-400-malicious-samples-in-the-wild%2F&amp;a=AVLAB" target="_blank" rel="noopener">AVLAB</a>, AV-TEST (<a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=421225912&amp;u=https%3A%2F%2Fwww.av-test.org%2Fen%2Fantivirus%2Fhome-windows%2Fwindows-10%2Fjune-2022%2Fmalwarebytes-premium-4.5.8--4.5-221311%2F&amp;a=consumer" target="_blank" rel="noopener">consumer</a> and <a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=2070578983&amp;u=https%3A%2F%2Fwww.av-test.org%2Fen%2Fantivirus%2Fbusiness-windows-client%2Fwindows-10%2Fjune-2022%2Fmalwarebytes-endpoint-protection-1.2-222316%2F&amp;a=business" target="_blank" rel="noopener">business</a>), <a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=3716309128&amp;u=https%3A%2F%2Fgo.malwarebytes.com%2FVoC-Report_Gartner_01.LP.html&amp;a=Gartner+Peer+Insights" target="_blank" rel="noopener">Gartner Peer Insights</a>, <a href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=2269880369&amp;u=https%3A%2F%2Fwww.g2.com%2Fcategories%2Fendpoint-protection-suites&amp;a=G2+Crowd" target="_blank" rel="noopener">G2 Crowd</a> and <a 
href="https://c212.net/c/link/?t=0&amp;l=en&amp;o=3791949-1&amp;h=2217975717&amp;u=https%3A%2F%2Fwww.cnet.com%2Ftech%2Fservices-and-software%2Fbest-antivirus%2F&amp;a=CNET" target="_blank" rel="noopener">CNET</a>.</p>



<p>The company is headquartered in California with offices in Europe and Asia. For more information and career opportunities, visit <a href="https://www.malwarebytes.com" target="_blank" rel="noopener">https://www.malwarebytes.com</a>.</p>



<h2 class="wp-block-heading" id="14-about-deepsee">About DeepSee</h2>



<p><a href="http://deepsee.io">DeepSee</a> uses highly sophisticated crawlers, combined with rigorous network analysis, to capture the behaviors websites present when visited by actual humans and to contextualize those behaviors within the graph of the internet.</p>



<p>DeepSee uses this data to arm advertising professionals with ground-truth signals about content appropriateness, ad-density, on-page technologies, backlink makeup, and more.</p>



<p>This dataset enables the sell-side to effectively &amp; automatically moderate the quality of the inventory they offer, and empowers the buy-side to quickly generate robust blocking / targeting lists.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What is Brand Suitability?</title>
		<link>https://deepsee.io/blog/what-is-brand-suitability</link>
		
		<dc:creator><![CDATA[Antonio Torres]]></dc:creator>
		<pubDate>Tue, 20 Sep 2022 16:00:00 +0000</pubDate>
				<category><![CDATA[Educational]]></category>
		<category><![CDATA[brand suitability]]></category>
		<category><![CDATA[quality]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1435</guid>

					<description><![CDATA[Brand Suitability is a measure of the overall compatibility between an advertiser&#8217;s goals and the content an ad is displayed next to. The goal is to create positive brand associations and campaign outcomes while minimizing risk. It’s difficult for marketers to ensure that their ads will not be displayed alongside high-risk content, or &#8230; <a href="https://deepsee.io/blog/what-is-brand-suitability">Continued</a>]]></description>
										<content:encoded><![CDATA[
<p><strong>Brand Suitability</strong> is a measure of the overall compatibility between an advertiser&#8217;s goals and the content an ad is displayed next to. The goal is to create positive brand associations and campaign outcomes while minimizing risk. It’s difficult for marketers to ensure that their ads will not be displayed alongside high-risk content, or even content that is in the wrong category altogether. Balancing these constraints helps ensure that the publisher, and the content your ad is displayed alongside, are <strong>brand suitable</strong> and will yield high-quality results.</p>



<p>It’s not enough to avoid fraud or the usual <a href="https://en.wikipedia.org/wiki/List_of_fake_news_websites" target="_blank" rel="noreferrer noopener">fake news website</a>. Advertising is a numbers game, one that is played long-term. Creating strong positive associations with your brand is important, but its significance is often overlooked in favor of buying inexpensive ad space. Unfortunately, this grab-and-go strategy wastes significant parts of advertisers&#8217; budgets, and it doesn&#8217;t have to be that way.</p>



<h2 class="wp-block-heading">What You&#8217;ll Learn</h2>



<ul><li>What is brand suitability?</li><li>How does it impact my advertising strategy?</li><li>What factors impact my brand suitability?</li><li>How can I improve my brand suitability?</li></ul>



<h2 class="wp-block-heading">Suitability: More Than Brand Safety</h2>



<p>We now know that brand suitability is focused on identifying appropriate environments for your advertisements. It&#8217;s a holistic view where advertisers have to ask themselves: is this publisher someone my brand wants to associate with? Remember, who you associate with matters. When your ads are auctioned into ad units on websites with low organic traffic alongside unrelated content, you can be sure that you’ve wasted your time and money.</p>



<p>If you’ve ever used an inclusion list, this may sound simple: you curate a list of publishers you are interested in and launch your campaign. However, studies show that almost <a href="https://www.statista.com/statistics/1117195/invalid-digital-ad-traffic-share-region/" target="_blank" rel="noreferrer noopener nofollow"><strong>11% of all global ad traffic is invalid</strong></a>. Inclusion lists can certainly help reduce this problem, but re-using the same inclusion list may not be enough to stop ads from winding up in low-quality environments that create this invalid traffic. Inclusion lists also require ongoing work to ensure you can hit the right scale; with too small a list, you may end up well under your delivery goals. Prospecting for new publishers and monitoring the quality of publishers in your inclusion list are vital.</p>



<p>When your advertising strategy is focused more on allocating funds for quick and cheap impressions, it becomes a race to see which advertising space you can occupy first. Prioritizing brand suitability helps you develop a sustainable strategy that doesn’t use a wide net but, instead, a fine-tooth comb.</p>



<h2 class="wp-block-heading">How to Find Brand Suitable Publishers</h2>



<p>Different factors determine whether a publisher is brand suitable. <a href="https://deepsee.io/blog/made-for-advertising-sites-waste-ad-budget" target="_blank" rel="noreferrer noopener"><strong>Made-for-advertising</strong></a> (MFA) pages, for example, create a low-quality environment for your advertisements. These pages would not be considered brand suitable and, in fact, can even cause higher churn rates.</p>



<p>MFA sites disguise their behavior based on incoming traffic, and they’re also relatively cheap when it comes to CPM (cost per mille) values. This allows MFA sites to easily evade suspicion from anyone buying space on the page. As a result, these <a href="https://deepsee.io/blog/2-tales-one-site-how-arbitrage-sites-manipulate-metrics" target="_blank" rel="noreferrer noopener"><strong>arbitrage websites directly impact both direct and paid traffic for advertisers.</strong></a></p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="700" height="527" src="https://deepsee.io/wp-content/uploads/2022/09/screely-1663286370362.png" alt="What is Brand Suitability?" class="wp-image-1485" srcset="https://deepsee.io/wp-content/uploads/2022/09/screely-1663286370362.png 700w, https://deepsee.io/wp-content/uploads/2022/09/screely-1663286370362-300x226.png 300w" sizes="(max-width: 700px) 100vw, 700px" title="What is Brand Suitability?"><figcaption>The results can sometimes feel stranger than fiction</figcaption></figure></div>


<p>When a publisher is brand suitable, they align more with the values, ideals, and aesthetics of your brand. For example, a car manufacturer would not want to advertise on an article about a tragic incident involving a vehicle and pedestrians. Considering brand suitable publishers means evaluating the priorities of your brand and partnering with sites that match your style.</p>



<h2 class="wp-block-heading">Brand Suitability Factors Worth Considering</h2>



<p>A variety of factors make a publisher brand suitable, many of which focus on avoiding negative, inappropriate, or low-quality content. Knowing when to work with a publisher starts with evaluating the four key factors covered below.</p>



<h3 class="wp-block-heading">Content Category</h3>



<p>This is a baseline requirement and refers to broad categories such as Sports, Arts &amp; Crafts, Finance, etc. When focusing on brand suitable publishers, it’s necessary to ensure that you can accurately target the right content category on any potential ad space. This applies to inclusion and exclusion alike, as there are likely categories that, in relation to your ad, will not yield the right results. Determining the content category of a page or website at scale is tough to do, as it is a continuous process. Our <a href="https://deepsee.io/publisher-risk-portal" target="_blank" rel="noreferrer noopener">platform makes this process</a> simple by accurately categorizing every web page using a state-of-the-art machine learning model.</p>



<h3 class="wp-block-heading">Brand Association</h3>



<p>Brand association is the connection that a customer has to your brand; it’s essentially how a customer remembers you. Any level of brand association requires marketers to identify which organizations, products, and services have complementary attributes. For example, a luxury car company may target websites that focus on sophistication or expensive products, such as high-quality watches. However, this could even include a financial news website or a blog about investments. Brand association also encompasses which other brands advertise on those sites and helps to narrow down the type of potential traffic that a site may receive. Generating consistent positive associations with your brand can have both short- and long-term benefits.</p>



<h3 class="wp-block-heading">Brand Safety Risk</h3>



<p>Certain content can pose a brand safety risk to your advertisement. This includes anything that may not align directly with your brand, but it can also include inappropriate content or fake news. Mitigating this risk means ensuring that all content is safe and preferably shared by a credible publisher. One necessary step is to check disinformation databases that may list websites known to publish disinformation and exclude them from your targeting pool. Our platform provides a convenient flag on all of our results indicating if a publisher has been listed in any disinformation database.</p>



<h3 class="wp-block-heading">Fraud Risk</h3>



<p>Low-quality websites are more than just poorly built; oftentimes, these pages come with serious risks of fraud. If the publisher sources bots or incentivizes traffic to their page, there is a chance that they could be fraudulent or put users at risk. Determining if a page has a fraud risk involves learning whether or not they hide or stack ads, if they use pop-ups, and more. Using a <a href="https://deepsee.io/publisher-risk-portal/" target="_blank" rel="noreferrer noopener">platform that offers a way to view publisher risks from a high level</a> is a great way to meet campaign scale goals while increasing the strength of your brand. There is no need to choose one or the other.</p>



<h2 class="wp-block-heading">Tips for Finding Brand Suitable Publishers</h2>



<p>Start your search for brand suitable publishers within your target industry, and make sure the publishers who host your ads resonate with your product or service. Visit their pages, note how many ads they display, gauge the quality and uniqueness of their content, and verify that they provide a good user experience. These are all key indicators of a strong, brand suitable site.</p>



<p>Identifying low-quality pages at scale is a challenge for many advertisers. That’s why deepsee.io offers an efficient solution for identifying and avoiding low-quality made-for-advertising publishers in your campaigns, making it easier for you to direct your ad traffic to the appropriate locations.</p>



<p>With access to <strong><a href="https://deepsee.io/publisher-risk-portal/">DeepSee&#8217;s PublisherRisk Portal</a></strong>, advertisers and agencies can efficiently prospect for publishers and evaluate them across many of the factors mentioned in this article. Our system audits millions of websites on a daily basis to provide accurate information at scale. You can upload a delivery report and gauge the quality of a previous campaign, or start anew with our publisher discovery tools.</p>



<p>Learn more about our <a href="https://deepsee.io/publisher-risk-portal/"><strong>PublisherRisk Portal</strong></a>. Not sure where to start? We offer <a href="https://deepsee.io/media-spend-audit-ad-budget-clawbacks"><strong>a free media audit</strong></a> where we evaluate one of your previous campaigns and provide you with an inclusion list, all free of charge.</p>



<figure class="wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">Some say that programmatic advertising has taken away a brand&#39;s ability to choose who they associate their brand with.<br><br>Hoping your ad doesn&#39;t end up funding disinformation and negatively impacting your brand?<br><br>It can feel like you&#39;re grasping in the dark.<br><br>Let&#39;s try to fix it<img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f9f5.png" alt="🧵" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>&mdash; DeepSee.io (@deepsee_io) <a href="https://twitter.com/deepsee_io/status/1572261821519917056?ref_src=twsrc%5Etfw" target="_blank" rel="noopener">September 20, 2022</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div><figcaption>On Twitter? We have a thread version of this post!</figcaption></figure>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How Solitaired.com Turbocharged Viewability, and Reduced Bid Volume Without Losing Revenue</title>
		<link>https://deepsee.io/blog/how-solitaired-com-turbocharged-viewability-and-reduced-bid-volume-without-losing-revenue</link>
		
		<dc:creator><![CDATA[Emry Downinghall]]></dc:creator>
		<pubDate>Tue, 13 Sep 2022 16:01:47 +0000</pubDate>
				<category><![CDATA[Case Study]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1450</guid>

					<description><![CDATA[The following is a guest post from our friends at Unwind Media, with statistical insights provided by Edward Krueger, Chief Data Scientist at DeepSee: Solitaired.com faces a challenge familiar to many publishers. While our platform and engagement stats are strong, we are still a young company and don’t have the brand recognition from advertisers to &#8230; <a href="https://deepsee.io/blog/how-solitaired-com-turbocharged-viewability-and-reduced-bid-volume-without-losing-revenue">Continued</a>]]></description>
										<content:encoded><![CDATA[
<p><strong><em>The following is a guest post from our friends at Unwind Media, with statistical insights provided by Edward Krueger, Chief Data Scientist at DeepSee:</em></strong></p>



<p><a href="http://solitaired.com" target="_blank" rel="noreferrer noopener">Solitaired.com</a> faces a challenge familiar to many publishers. While our platform and engagement stats are strong, we are still a young company and don’t have the brand recognition from advertisers to drive meaningful direct sales. As a result, our near-term goal is to make our ad inventory as compelling as possible for SSPs and subsequently DSPs and their advertisers.</p>



<p>The concept we were interested in testing is straightforward:</p>



<p><em>In addition to requiring that our ad units are viewable when refreshing, what happens if we also ensure that ads are not shown to sessions that have gone idle?</em></p>



<p>This means that for an ad to refresh, the placement must be deemed in-view and the user must have performed an on-page action, such as a click or scroll, within the last 60 seconds.</p>



<p>In-view refresh is industry standard, so adding an engagement requirement takes that a step further. It’s a simple change that we hypothesized would increase viewability, decrease total bid requests and have minimal negative impact on topline revenue.</p>



<p>The concept for this test was driven by a focus on measurement and efficiency. While the casual gaming category has awesome websites, some have fallen under appropriate scrutiny for <a href="https://deepsee.io/blog/made-for-advertising-sites-waste-ad-budget" target="_blank" data-type="post" data-id="1337" rel="noreferrer noopener">arbitraged traffic</a> and poor ad experiences, and we wanted to create as much distance from those stereotypes as possible. This, in addition to our 99%+ direct traffic footprint and strong user engagement stats, would help us stand out.</p>



<p>We partnered with <a href="http://deepsee.io">DeepSee</a> to evaluate test design and validate results. DeepSee is an industry leader in assessing publisher ad quality and while they typically focus on identifying bad actors, this was an opportunity to shift that focus to a feature that could potentially establish best practices of high-quality ad inventory.</p>



<h2 class="wp-block-heading"><strong>About Us</strong></h2>



<p><a href="https://solitaired.com/" target="_blank" rel="noreferrer noopener">Solitaired.com is a casual gaming platform with over 500 games</a>. The most popular are classics like Solitaire and FreeCell, plus a word game called Phrazle. Stats we’re proud of include +400% YoY session growth, 7 games played per user session, an 8x monthly return rate and over 3 million games played daily.</p>



<p>We’re a small business powered by user growth and monetized by open exchange ad revenue. To balance both, we try to present an ad footprint we believe is reasonable, and we experiment to improve it. On desktop, our standard layout is two display units with a dismissible outstream (placement=5) video player; on tablet, we have two display units; and mobile web has a maximum of two units depending on device type.</p>



<h2 class="wp-block-heading"><strong>Thoughts heading into the test</strong></h2>



<p>Based on Solitaired’s direct traffic profile and engagement metrics, I was confident that we have very active users, but we had not run a similar test before, so it wasn&#8217;t clear which metrics would be affected. However, we had no reason to suspect we had idle users, and our domain-level viewability was already strong at around 84%.</p>



<p>We defined idle as no activity for 60 seconds. We settled on 60 seconds rather than 30 because we knew some of our users pause for up to 30 seconds to think about their next move or guess. Extending the window beyond our 30-second in-view refresh trigger felt warranted.</p>



<h2 class="wp-block-heading"><strong>Test setup and considerations</strong></h2>



<p>This test was set up as a before-after comparison. Although we typically favor A/B testing, we elected not to run this as an A/B test because we thought it would significantly increase complexity while muddling the positive impact of improved viewability to the buy-side, which is often measured on the macro ad unit vs. impression level.</p>



<p>While we recognized that seasonality could impact revenue, we were confident that core metrics like viewability and bid requests wouldn’t be affected, and the test wasn’t intended to drive short-term CPM.</p>



<p>This test went live on July 5th for display. We were initially hesitant to include our dismissible outstream unit since the user has the option of closing that unit at any point during gameplay (making it harder to measure). However, we elected to include it since it could be argued this feature is even more critical to video than display.&nbsp;</p>



<h2 class="wp-block-heading"><strong>Technical Implementation</strong></h2>



<p>For purposes of this test, we defined a user interaction as one of the following events: a click, a scroll, or a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/touchstart_event" target="_blank" rel="noreferrer noopener nofollow">touchstart</a>.</p>



<p>In addition to monitoring these events, an ad must meet viewability requirements (as defined by GAM via their impressionViewable event combined with our own viewability monitoring in-browser), regardless of when it is rendered or refreshed.&nbsp;</p>



<p>With each user interaction, we reset our 60-second user interaction timer. On any display refresh opportunity, we check that a user interaction happened within the last 60 seconds. If not, no calls are made to SSPs, nor are calls made to GAM to trigger a new auction and ad render.</p>



<p>Our outstream video runs on its own auction cycle timer and this is subject to the same 60-second interaction time frame as display.</p>



<p>Once a user returns to interact with the page (using any of the above events, or with a window focus event) display ads are re-enabled, and the outstream auction cycle is restarted.</p>
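<p>The gating logic described above could be sketched roughly as follows. This is an illustrative TypeScript sketch under stated assumptions, not the production implementation: the <code>IdleGate</code> class, its method names, the <code>InteractionSource</code> interface, the injected clock, and the choice to start the timer at page load are all hypothetical and introduced only for the example.</p>

```typescript
// Hypothetical sketch of the idle gate; all names here are illustrative.
const IDLE_LIMIT_MS = 60 * 1000; // the 60-second interaction window

// Events counted as a user interaction.
const INTERACTION_EVENTS: string[] = ["click", "scroll", "touchstart"];

// Minimal shape of an event target, so the sketch also runs outside a browser.
interface InteractionSource {
  addEventListener(type: string, listener: () => void): void;
}

class IdleGate {
  private readonly now: () => number;
  private lastInteraction: number;

  // The clock is injectable so the timing rule can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    // Assumption: page load counts as the first interaction.
    this.lastInteraction = this.now();
  }

  // Reset the timer; called on every qualifying interaction.
  recordInteraction(): void {
    this.lastInteraction = this.now();
  }

  // True when no interaction has happened within the last 60 seconds.
  isIdle(): boolean {
    return this.now() - this.lastInteraction > IDLE_LIMIT_MS;
  }

  // A refresh may proceed only if the unit is in view and the user is active.
  // When this returns false, no SSP or ad server calls should be made.
  mayRefresh(unitInView: boolean): boolean {
    if (!unitInView) {
      return false;
    }
    return !this.isIdle();
  }

  // Subscribe to the interaction events, plus "focus" for returning users.
  attach(source: InteractionSource): void {
    INTERACTION_EVENTS.concat(["focus"]).forEach((type) => {
      source.addEventListener(type, () => this.recordInteraction());
    });
  }
}
```

<p>In a browser, something like <code>new IdleGate().attach(window)</code> would hook the gate up, and the refresh loop would call <code>mayRefresh(...)</code> with the viewability result before starting any auction. Injecting the clock is a design choice made here so the 60-second rule can be checked without real waiting.</p>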



<h2 class="wp-block-heading"><strong>Metrics tracked</strong></h2>



<ul>
<li><strong>Total impressions</strong> &#8211; The total number of ad impressions, paid and unpaid, served to a user per game.</li>
<li><strong>Paid impressions</strong> &#8211; The total number of ad impressions monetized above price floors per game.</li>
<li><strong>Unpaid impressions</strong> &#8211; The total number of ad impressions that either didn’t receive a bid or fell below established price floors.</li>
<li><strong>Viewability</strong> &#8211; The change in viewability by device type before and after implementation.</li>
<li><strong>Ad requests</strong> &#8211; The change in total ad requests by domain before and after feature implementation.</li>
</ul>



<h2 class="wp-block-heading"><strong>Results</strong></h2>



<p>The following metrics describe the period between July 6th and August 5th, 2022:</p>


<div class="wp-block-image is-style-default">
<figure class="alignleft is-resized"><img decoding="async" loading="lazy" src="https://lh6.googleusercontent.com/7FNosPDUuj03NCCVjkhyPKMMftPCLq7OC_-eHWot3i2zXTBs9EFMGxHWWuCypY7FHgnSynbwhpicPbEkqOgye6QEsMuycB_TtZOobV20Dq2h3JLWsBOhnUv3U33U8ZX0WVFGoAWNT3QTvQm68dD_IRNT0WqgoOapq0KjERUqz2syppT5Bl3WKk1cYA" alt="Unpaid impressions decreased and daily ad viewability improved" width="822" height="512" title="How Solitaired.com Turbocharged Viewability, and Reduced Bid Volume Without Losing Revenue"><figcaption class="wp-element-caption">The impact was significant. Overall improvements despite a 20% decrease in total bid requests.</figcaption></figure></div>


<figure class="wp-block-image size-large is-style-default"><img decoding="async" loading="lazy" width="1024" height="669" src="https://deepsee.io/wp-content/uploads/2022/09/image-28-1024x669.png" alt="Bid requests decreased while viewability improved" class="wp-image-1496" srcset="https://deepsee.io/wp-content/uploads/2022/09/image-28-1024x669.png 1024w, https://deepsee.io/wp-content/uploads/2022/09/image-28-300x196.png 300w, https://deepsee.io/wp-content/uploads/2022/09/image-28-768x502.png 768w, https://deepsee.io/wp-content/uploads/2022/09/image-28-1536x1003.png 1536w, https://deepsee.io/wp-content/uploads/2022/09/image-28-2048x1338.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" title="How Solitaired.com Turbocharged Viewability, and Reduced Bid Volume Without Losing Revenue"><figcaption class="wp-element-caption">The improvement was stable over the observation period.</figcaption></figure>



<h2 class="wp-block-heading"><strong>Conclusion</strong></h2>



<p>We consider this a successful test and have elected to keep it live across 100% of our inventory. As we saw no decrease in paid impressions, this is a do-no-harm change to immediate performance that puts us in a better position to advocate for spend and inclusion in always-on PMPs and high-viewability packages.</p>



<p>Similar to the standardization the industry has seen with refresh on-view, we believe both the buy side and the sell side benefit from the adoption of refresh on-idle. The more buyers value and reward publishers for increasing viewability and request efficiency, the greater the benefit to publishers, and better performance will bring more rapid adoption.</p>



<p><strong><em>Have any thoughts after reading? Come and join the conversation with us on <a href="https://www.linkedin.com/feed/update/urn:li:activity:6975484956542205952" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://mobile.twitter.com/deepsee_io/status/1569718464407023617" target="_blank" rel="noopener">Twitter</a>!</em></strong></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Made-For-Advertising Sites Burn 12% of Your Ad Budget</title>
		<link>https://deepsee.io/blog/made-for-advertising-sites-waste-ad-budget</link>
		
		<dc:creator><![CDATA[Antonio Torres]]></dc:creator>
		<pubDate>Wed, 17 Aug 2022 12:08:00 +0000</pubDate>
				<category><![CDATA[Educational]]></category>
		<category><![CDATA[Laundering]]></category>
		<category><![CDATA[MFA]]></category>
		<category><![CDATA[pin]]></category>
		<category><![CDATA[Traffic]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1337</guid>

					<description><![CDATA[Low-quality content is a waste of your time and money and, when it comes to your advertising budget, every dollar counts. Most advertisers waste over 12% of their programmatic display ad budget on made-for-advertising sites, and that number climbs to 24% for programmatic web video. These fraudulent websites are everywhere, usually filled with spam, clickbait, and stolen &#8230; <a href="https://deepsee.io/blog/made-for-advertising-sites-waste-ad-budget">Continued</a>]]></description>
										<content:encoded><![CDATA[
<p>Low-quality content is a waste of your time and money and, when it comes to your advertising budget, every dollar counts. Most advertisers waste over <strong>12% of their programmatic display ad budget</strong> on made-for-advertising sites; that number climbs to <strong>24% for programmatic web video</strong>. These fraudulent websites are everywhere, usually filled with spam, clickbait, and stolen content, making it harder to optimize your ad budget. Although made-for-advertising (MFA) sites can be hard to spot, there’s an easy way to avoid them while strengthening your ad targeting at the same time.</p>



<h2 class="wp-block-heading">What You’ll Learn</h2>



<ul><li>What are made-for-advertising websites?</li><li>How do low-quality websites avoid detection?</li><li>What is brand suitability?</li><li>How can you avoid MFA websites?</li><li>What is an inclusion list?</li><li>How can you reduce digital advertising waste?</li></ul>



<h2 class="wp-block-heading">Understanding Made-For-Advertising Websites</h2>



<p><strong>Made-for-advertising websites</strong> rely heavily on programmatic ads that refresh constantly, changing the content they display based on the origin of the user visiting the page. Avoiding these websites can be tricky since many of them have found ways to get around SSPs (Supply Side Platforms) that check for low-quality pages, so oftentimes they go undetected for long periods of time, wasting millions of dollars in the process.</p>



<p>MFA sites behave differently depending on the traffic they get. With <strong>direct traffic</strong>, for example, a user may have a normal ad-viewing experience. However, if the site identifies <strong>paid traffic</strong>, it will overload the user with aggressive monetized ads (see our study on <a href="https://deepsee.io/blog/2-tales-one-site-how-arbitrage-sites-manipulate-metrics" data-type="post" data-id="580" target="_blank" rel="noreferrer noopener">traffic arbitrage and made-for-advertising websites</a>). This type of dynamic behavior helps MFA sites avoid scrutiny from advertising partners, allowing them to easily pass compliance checks from supply-side partners.</p>



<figure class="wp-block-video"><video autoplay loop muted src="https://deepsee.io/wp-content/uploads/2022/08/A-Typical-MFA-website.mp4"></video><figcaption><em>A typical made-for-advertising website</em></figcaption></figure>



<p>These fraudulent pages host low-quality or stolen content to appear more legitimate. Combine that with a website that is overloaded with advertisements jockeying for position in front of the user, and you end up with a chaotic and unpleasant viewing experience for anyone visiting the page.&nbsp;</p>



<p>Placement on these sites is often relatively inexpensive, making it an attractive option for advertisers looking to get the greatest number of impressions out of their budget. These pages boast high view counts and completion rates due to non-stop ads but, unfortunately, the organic traffic to these websites is usually minuscule and inconsequential.</p>



<h2 class="wp-block-heading">How MFA Websites Pose a Brand Suitability Risk</h2>



<p><strong>Brand suitability</strong> is centered on identifying a brand’s needs and increasing value through online content placement. Displaying high-quality ads alongside relevant content is the goal, but brand suitability is about finding the advertising space that fits your brand across multiple channels.&nbsp;</p>



<p><strong>Made-for-advertising</strong> websites threaten brand suitability by placing your ads in an environment that discourages organic traffic. In <a href="https://go.integralads.com/the-halo-effect.html" target="_blank" rel="noreferrer noopener nofollow">a 2019 study</a>, ads that were viewed in “high-quality mobile web environments” were seen “74% more favorably than the same advertisements seen in low-quality context.” By keeping programmatic ads on MFA sites, you reduce the quality of your brand suitability by not directly optimizing ad placement. It’s almost like throwing money away.</p>



<h2 class="wp-block-heading">Avoiding Made-For-Advertising Websites</h2>



<p>Brand suitability relies on being aware of your brand image. The clearer your vision is for your brand, the easier it is to target your advertisements to relevant websites. Putting your ad spend in the right places lets you effectively increase traffic and reduce <strong>digital advertising waste</strong>.&nbsp;</p>



<p>Part of finding a higher-quality environment for your advertisements involves developing an inclusion list that highlights the pages where your programmatic ads will succeed. Avoiding made-for-advertising websites gives you more control over where your ads are found. By narrowing in on these sites, you avoid the clickbait headlines that often drown out your ads while simultaneously creating a better space for consumers to view your content.</p>



<p>This is why deepsee.io offers a <a href="https://deepsee.io/publisher-risk-portal/" target="_blank" rel="noreferrer noopener">Publisher Risk Portal</a> for marketers who want to effectively and easily <strong>remove made-for-advertising sites</strong> from their inclusion list. Our platform gives advertisers the ability to audit websites where their ads are displayed, as shown in their campaign delivery reports, to identify possible risks and avoid any pages that might host exploitative behavior. deepsee.io also provides access to dynamic domain lists, manual domain lists, and other powerful tools to help agencies and advertisers increase the quality of their programmatic spending at scale.</p>
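


<p>If you want a rough feel for what such an audit measures, the sketch below computes the share of spend in a delivery report that landed outside an inclusion list. It is only an illustration: the domains, the spend figures, and the <code>off_list_spend_share</code> helper are hypothetical, and a real delivery report will have its own columns and formats.</p>

```python
def off_list_spend_share(rows, inclusion_list):
    """Fraction of total spend delivered to domains outside the inclusion list."""
    allowed = {domain.lower() for domain in inclusion_list}
    total = sum(spend for _, spend in rows)
    if total == 0:
        return 0.0
    off_list = sum(spend for domain, spend in rows if domain.lower() not in allowed)
    return off_list / total


# Hypothetical delivery report rows: (domain, spend in dollars).
report = [
    ("trusted-news.example", 700.0),
    ("quality-sports.example", 180.0),
    ("clickbait-gallery.example", 120.0),
]
# Hypothetical inclusion list.
inclusion = ["trusted-news.example", "quality-sports.example"]

print(f"{off_list_spend_share(report, inclusion):.0%} of spend ran off-list")
```

<p>Any domain that shows up in the report but not on the list is a candidate for review before the next flight.</p>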



<p><br>Want to learn more? <a href="https://deepsee.io/media-spend-audit-ad-budget-clawbacks" target="_blank" data-type="page" data-id="1282" rel="noreferrer noopener">Get a free media audit from us and see the impact on your KPIs first hand.</a></p>
]]></content:encoded>
					
		
		<enclosure url="https://deepsee.io/wp-content/uploads/2022/08/A-Typical-MFA-website.mp4" length="4273770" type="video/mp4" />

			</item>
		<item>
		<title>Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts</title>
		<link>https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Wed, 10 Aug 2022 15:22:05 +0000</pubDate>
				<category><![CDATA[Research & Development]]></category>
		<category><![CDATA[incentivized traffic]]></category>
		<category><![CDATA[paid traffic]]></category>
		<category><![CDATA[pin]]></category>
		<category><![CDATA[rewarded traffic]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1297</guid>

					<description><![CDATA[Introduction to Rewarded Traffic In this article we provide clarity into the practice of &#8220;rewarded traffic,&#8221; or traffic generated by users who are compensated with in-game currency in exchange for opening ad-monetized publisher pages in a webview during an ad break. There is no real industry documentation about this format for creating web traffic; it &#8230; <a href="https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat">Continued</a>]]></description>
										<content:encoded><![CDATA[<div class="ub_table-of-contents" data-showtext="show" data-hidetext="hide" data-scrolltype="auto" id="ub_table-of-contents-59279b41-1a73-4769-bba0-5ca2cf8190b2" data-initiallyhideonmobile="false"
                    data-initiallyshow="true"><div class="ub_table-of-contents-header-container"><div class="ub_table-of-contents-header">
                    <div class="ub_table-of-contents-title">Table of Contents</div></div></div><div class="ub_table-of-contents-extra-container"><div class="ub_table-of-contents-container ub_table-of-contents-1-column "><ul><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#0-introduction-to-rewarded-traffic>Introduction to Rewarded Traffic</a></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#1-what-is-incentivized-traffic>What is Incentivized Traffic?</a><ul><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#2-is-incentivized-traffic-considered-ivt>Is Incentivized Traffic Considered IVT?</a></li></ul></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#3-what-are-rewarded-ads->What Are Rewarded Ads?</a><ul><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#4-do-we-really-need-another-term-for-incentivized>Do We Really Need Another Term for &#8220;Incentivized?&#8221;</a></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#5-how-are-rewarded-ads-supposed-to-look>How Are Rewarded Ads Supposed to Look?</a></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#6-can-advertisers-identify-rewarded-inventory>Can Advertisers Identify Rewarded Inventory?</a></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#7-is-generating-rewarded-traffic-a-valid-use-of-rewarded-ad-placements->Is Generating Rewarded Traffic a Valid Use of Rewarded Ad Placements?</a></li></ul></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#8-who-generates-rewarded-traffic>Who Generates Rewarded Traffic?</a><ul><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#9-which-apps-use-the-hypermx-sdk>Which Apps Use the HyperMX 
SDK?</a></li></ul></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#10-examining-hypermx-sdk-signals-from-our-packet-captures>Examining HyperMX SDK Signals From Our Packet Captures</a><ul><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#11-programmatic-display-amp-video-supported-web-page>Programmatic Display &amp; Video Supported Web Page</a></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#12-audio-ad-supported-podcast>Audio-Ad Supported Podcast</a></li></ul></li><li><a href=https://deepsee.io/blog/rewarded-traffic-incentivized-traffic-in-a-top-hat#13-conclusion>Conclusion</a></li></ul></div></div></div>


<h2 class="wp-block-heading" id="0-introduction-to-rewarded-traffic">Introduction to Rewarded Traffic</h2>



<p>In this article we provide clarity into the practice of &#8220;rewarded traffic,&#8221; or traffic generated by users who are compensated with in-game currency in exchange for opening ad-monetized publisher pages in a webview during an ad break. There is no real industry documentation about this format for creating web traffic; it could best be considered an abuse / misapplication of rewarded ad placements.</p>



<p>For example, take this session we captured from the <a href="https://play.google.com/store/apps/details?id=com.kiloo.subwaysurf" target="_blank" rel="noopener">Subway Surfer app on Android</a>:</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Jun Rewarded InApp Web Publisher Traffic Example" width="500" height="281" src="https://www.youtube.com/embed/hZJV3WxYlXo?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div><figcaption><em>An example of a session to Young Hollywood generated via rewarded advertisement</em></figcaption></figure>



<p>At roughly the 0:15 mark you can see where a user is asked to choose between using in-game currency to re-spawn or watching an ad. The &#8220;ad&#8221; is actually a whole ad-monetized publisher page (younghollywood[.]com in this case) containing multiple display placements and a video ad. The video doesn&#8217;t load properly in this example due to the packet capture software we were using, but normally a video ad would be the first thing you see. After visiting 2 pages for ~15 seconds each, we can drop back into the game.</p>



<p>For years, millions of daily visits to ad-monetized publisher destinations have likely been generated this way. In these situations, advertisers have no idea that they are paying to reach a user who is being compensated to interact with the publisher&#8217;s content. </p>



<p>Despite a growing negative sentiment, and public comments by Google and The Trade Desk condemning incentivized traffic as invalid, rewarded traffic maintains a thin veneer of legitimacy due to the fact that users are paid using in-game currency, not gift cards or cash (despite in-game currency having a quantifiable cash value). To make matters worse, the practice remains largely unnoticed / unknown, because such traffic is hardly ever identified as rewarded/incentivized to the buyer.</p>



<p>Our hope is to make this practice more transparent to programmatic ad buyers, and inspire them to ask the questions that need to be asked in order to prevent serving ads to users without genuine intent in the content.</p>


<div class="ub_styled_list " id="ub_styled_list-609a7e5c-7b25-460f-89e9-b65e97eaa9a8"><ul class="fa-ul"><li>For additional reporting, including details &amp; perspective from the buy-side, check out <a href="https://www.marketingbrew.com/stories/2022/08/11/major-publishers-are-buying-ads-in-mobile-games-like-subway-surfers-to-juice-traffic" target="_blank" rel="noopener">Marketing Brew&#8217;s excellent coverage of this scheme</a></li></ul></div>


<h2 class="wp-block-heading" id="1-what-is-incentivized-traffic">What is Incentivized Traffic?</h2>



<p>Incentivized traffic comes from users who are paid to visit a certain web property. They may additionally be required to perform subsequent actions on the page to receive their rewards. Compensation usually comes in the form of points that can be redeemed for cash, prizes, or gift cards.</p>



<h3 class="wp-block-heading" id="2-is-incentivized-traffic-considered-ivt">Is Incentivized Traffic Considered IVT?</h3>



<p>The MRC does not consider incentivized traffic invalid, but there is growing sentiment among the largest DSPs that this traffic is invalid. A few months ago, <a href="https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too">we published research on incentivized traffic sourcing by one of the world&#8217;s largest publishers</a>. At that time, both Google and The Trade Desk made fairly unambiguous statements about such traffic being considered invalid.</p>



<p>At the time, Google offered this on-the-record statement:</p>



<blockquote class="wp-block-quote"><p><em>“Google considers invalid traffic to be ad traffic that does not represent genuine user intent or interest. This includes both incentivized traffic and traffic from pop-unders. Generally speaking, invalid traffic applies to any clicks or impressions that may artificially inflate an advertiser’s costs or a publisher’s earnings.[…]”</em></p></blockquote>



<p>This is echoed in numerous places across their publisher policies, for example in the AdSense publisher policies:</p>



<figure class="wp-block-image"><img decoding="async" loading="lazy" width="836" height="149" src="https://deepsee.io/wp-content/uploads/2022/04/image-3.png" alt="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts" class="wp-image-1080" srcset="https://deepsee.io/wp-content/uploads/2022/04/image-3.png 836w, https://deepsee.io/wp-content/uploads/2022/04/image-3-300x53.png 300w, https://deepsee.io/wp-content/uploads/2022/04/image-3-768x137.png 768w" sizes="(max-width: 836px) 100vw, 836px" title="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts"><figcaption><a href="https://support.google.com/adsense/answer/2660562?hl=en#zippy=%2Cusing-an-incentivized-traffic-source" target="_blank" rel="noreferrer noopener">https://support.google.com/adsense/answer/2660562?hl=en#zippy=%2Cusing-an-incentivized-traffic-source</a></figcaption></figure>



<p>Google&#8217;s specific mention of &#8220;genuine user intent or interest&#8221; is helpful when thinking about rewarded traffic as well, because that bar is clearly not met. As we saw in the video featured in the introduction, the user has no idea where they will be sent when they choose to watch an ad. It&#8217;s the same for users <a href="https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#6-describing-the-incentivized-traffic-with-examples">we showed to visit websites via Swagbucks in the past</a>.</p>



<h2 class="wp-block-heading" id="3-what-are-rewarded-ads-">What Are Rewarded Ads? </h2>



<h3 class="wp-block-heading" id="4-do-we-really-need-another-term-for-incentivized">Do We Really Need Another Term for &#8220;Incentivized?&#8221;</h3>



<p>A quick check of your local thesaurus may reveal that &#8220;incentivized&#8221; and &#8220;rewarded&#8221; mean largely the same thing. Unfortunately, within the universe of AdTech jargon, the terms are unlikely to merge. It&#8217;s commonly understood that &#8220;incentivized&#8221; traffic is generated by users paid in something like cash: gift cards, PayPal balance, and things of that nature are the commonly assumed end goal of a user generating incentivized traffic.</p>



<p>&#8220;Rewarded,&#8221; on the other hand, signals a user is compensated using items / currency unique to the game / app-environment they are in. These items generally can&#8217;t be turned into cash, though they can often be purchased with cash. </p>



<h3 class="wp-block-heading" id="5-how-are-rewarded-ads-supposed-to-look">How Are Rewarded Ads Supposed to Look?</h3>



<p>According to <a href="https://admanager.google.com/home/resources/feature-brief-rewarded-ads/" target="_blank" rel="noopener">Google Ad Manager&#8217;s feature brief</a>: &#8220;Ads that users can choose to view in exchange for an in-app reward — such as <strong>watching a video ad</strong> to get an extra life in a game [&#8230;] are called &#8216;rewarded ads.&#8217;&#8221; This aligns with the common understanding of what a rewarded ad creative looks like: a video or an interactive game demo, <strong>certainly not an ad-monetized web page</strong>.</p>



<p>A good example is visible at the end of our video in the intro section:<br></p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="525" height="1013" src="https://deepsee.io/wp-content/uploads/2022/08/Normal-rewarded-ad-experience.png" alt="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts" class="wp-image-1330" srcset="https://deepsee.io/wp-content/uploads/2022/08/Normal-rewarded-ad-experience.png 525w, https://deepsee.io/wp-content/uploads/2022/08/Normal-rewarded-ad-experience-155x300.png 155w" sizes="(max-width: 525px) 100vw, 525px" title="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts"><figcaption><em>A normal rewarded ad experience.</em></figcaption></figure>



<h3 class="wp-block-heading" id="6-can-advertisers-identify-rewarded-inventory">Can Advertisers Identify Rewarded Inventory?</h3>



<p>Rewarded video / interstitial ads actually can, and should, be identified using the &#8220;rwdd&#8221; attribute within the &#8220;Imp&#8221; object of a bidRequest. This attribute has appeared at least since version 2.6 of the <a href="https://iabtechlab.com/wp-content/uploads/2022/04/OpenRTB-2-6_FINAL.pdf" target="_blank" rel="noopener">Open Real-Time Bidding (ORTB) standard</a>. For those not in the know, that&#8217;s a fancy way of saying that we have standards around rewarded ads, and how they should be identified to advertisers. Bid requests are how publishers announce their ad-inventory to interested buyers, and transmit various attributes of the user &amp; the environment where an ad would be rendered. </p>



<p>However, this standard assumes a &#8220;normal&#8221; rewarded ad experience wherein the ad placement can be accurately attributed to the app a user has open. <strong>Advertisers buying inventory on web pages loaded within rewarded traffic placements generally have no idea of the context in which the page is loaded</strong>; they would see it as a mobile web impression, and the visit as organic.</p>
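


<p>To make the gap concrete, here is a minimal sketch of the &#8220;rwdd&#8221; check a buyer could run against a bid request. The sample request is hypothetical and heavily trimmed, and the <code>rewarded_impressions</code> helper is our own name for illustration; only the &#8220;imp&#8221; array and the &#8220;rwdd&#8221; attribute (0 = no reward, 1 = rewarded) come from the ORTB standard.</p>

```python
import json


def rewarded_impressions(bid_request):
    """Return the ids of impressions flagged as rewarded via imp[].rwdd."""
    flagged = []
    for imp in bid_request.get("imp", []):
        # rwdd is an integer flag: 0 (the default) = no reward, 1 = rewarded.
        if imp.get("rwdd", 0) == 1:
            flagged.append(imp.get("id"))
    return flagged


# Hypothetical, heavily trimmed bid request for illustration only.
sample_request = json.loads("""
{
  "id": "example-request",
  "imp": [
    {"id": "1", "rwdd": 1, "video": {"mimes": ["video/mp4"]}},
    {"id": "2", "banner": {"w": 320, "h": 50}}
  ]
}
""")

print(rewarded_impressions(sample_request))
```

<p>Impression 2 stands in for a placement on a page opened inside a rewarded traffic webview: nothing in the request marks it as rewarded, so a check like this would pass it through as ordinary mobile web inventory.</p>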



<p>Given what we know, there&#8217;s an uncomfortable question here that needs to be asked:</p>



<h3 class="wp-block-heading" id="7-is-generating-rewarded-traffic-a-valid-use-of-rewarded-ad-placements-"><strong>Is Generating Rewarded Traffic a Valid Use of Rewarded Ad Placements?</strong></h3>



<p>We put forth that it is <strong>not</strong>.</p>



<p>Can you think of any other case where an <em>entire website</em> would be considered a valid ad creative? If <strong><em>entire</em> <em>websites</em></strong> were valid ad-creatives, wouldn&#8217;t websites often be trafficked within display &amp; video placements all over the internet in order to juice visitor numbers? Unfortunately, we live in a society, and that would not be considered acceptable by any major DSP.</p>



<p>It&#8217;s not even just websites that are the destination of rewarded traffic; we captured evidence that users are made to listen to audio advertisements within podcasts. This results in inflated podcast listener counts &amp; audio impression volumes. For example, take this session from the <a href="https://apps.apple.com/us/app/subway-surfers/id512939461" target="_blank" rel="noopener">Subway Surfer app on iOS</a>:</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Subway Surfer   Jun Podcast Imps 1" width="500" height="281" src="https://www.youtube.com/embed/D0uLnyO1vII?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div><figcaption><em>Bloomberg/Businessweek podcast traffic generated via rewarded placement</em></figcaption></figure>



<h2 class="wp-block-heading" id="8-who-generates-rewarded-traffic">Who Generates Rewarded Traffic?</h2>



<p>One of the largest companies with skin in the game, and the one we&#8217;ve dug into for the purpose of this blog post, is the <a href="http://jungroup.com" target="_blank" rel="noopener">Jun Group</a>. Since 2013 they&#8217;ve operated <a href="https://techcrunch.com/2013/03/29/jun-group-launches-hyprmx/" target="_blank" rel="noopener">the HyperMX SDK</a>, which is integrated within hundreds of the top apps across multiple marketplaces. HyperMX is a mediation platform for video ads, and it&#8217;s also used to deliver the rewarded traffic experiences we&#8217;ve shown examples of.</p>



<h3 class="wp-block-heading" id="9-which-apps-use-the-hypermx-sdk">Which Apps Use the HyperMX SDK?</h3>



<p>A sample of the Android apps which can be observed generating Jun Group rewarded traffic <a href="https://docs.google.com/spreadsheets/d/1yanikc761TjC9_CHTWWyHh-4x5ptnH9ubFa2t9OVGT4/edit?usp=sharing" target="_blank" rel="noopener">is available here</a>. They also integrate with iOS apps, but it&#8217;s harder to detect the app in such cases due to differences in available measurement signals between iOS and Android. It&#8217;s likely the iOS counterparts of these apps also integrated the HyperMX SDK, and we can certainly confirm that for one in particular.</p>



<p>The most popular app by FAR, and the one we did most of our testing in, is the <a href="https://play.google.com/store/apps/details?id=com.kiloo.subwaysurf" target="_blank" rel="noopener">Subway Surfer app</a>. This app has over a billion installs on Android alone.</p>



<p>In the following video, Jun Group&#8217;s CEO explains how they&#8217;ve included their SDK in several hundred of the most popular apps worldwide in order to reach hundreds of millions of people back in 2013 (a figure that is certainly much higher now):</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Jun Group Grows with Mobile Apps &amp; Big Brands" width="500" height="281" src="https://www.youtube.com/embed/v-F6TLezX3g?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div></figure>



<p>It&#8217;s not terribly important you watch the whole thing to understand this blog post, but we found the following quote to confirm our understanding of how the SDK is used:</p>



<blockquote class="wp-block-quote"><p><em>&#8220;We can also bring people to any page from our placements. So, a user might have an opportunity to go see something that&#8217;s sitting on a major publisher site, and it might have a video on it, or it might just be a page.&#8221;</em></p><cite>@1:24 in the above video</cite></blockquote>



<p>Basically, this goes to show they&#8217;re not terribly coy about what their tech does. We understand why they feel that way: there are no standards around this traffic, and hardly anyone knows how it&#8217;s truly delivered.</p>



<p>That bravado is further demonstrated in their <a href="https://jungroup.com/case-studies/" target="_blank" rel="noopener">case studies</a>. For example:</p>



<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="636" src="https://deepsee.io/wp-content/uploads/2022/08/Massive-Scale-Per-Day-Case-Study-1024x636.png" alt="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts" class="wp-image-1353" srcset="https://deepsee.io/wp-content/uploads/2022/08/Massive-Scale-Per-Day-Case-Study-1024x636.png 1024w, https://deepsee.io/wp-content/uploads/2022/08/Massive-Scale-Per-Day-Case-Study-300x186.png 300w, https://deepsee.io/wp-content/uploads/2022/08/Massive-Scale-Per-Day-Case-Study-768x477.png 768w, https://deepsee.io/wp-content/uploads/2022/08/Massive-Scale-Per-Day-Case-Study.png 1337w" sizes="(max-width: 1024px) 100vw, 1024px" title="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts"><figcaption><em>This case study demonstrates the value prop to web publisher clients</em></figcaption></figure>



<p>Translated to a non-weaselese dialect of English: &#8220;the publisher oversold direct campaigns, and had no way to deliver the organic inventory to the advertisers. In order to satisfy the advertiser without any uncomfortable conversations, they paid for over 13 million inorganic visits per day from users playing mobile games, who had no attachment to the content. The advertiser and the IVT tracking vendors were none the wiser.&#8221;</p>



<p>This next one makes a lot more sense given the forced podcast visits from the example we shared:</p>



<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="582" src="https://deepsee.io/wp-content/uploads/2022/08/Increasing-Unique-Streams-Case-Study-1024x582.png" alt="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts" class="wp-image-1355" srcset="https://deepsee.io/wp-content/uploads/2022/08/Increasing-Unique-Streams-Case-Study-1024x582.png 1024w, https://deepsee.io/wp-content/uploads/2022/08/Increasing-Unique-Streams-Case-Study-300x171.png 300w, https://deepsee.io/wp-content/uploads/2022/08/Increasing-Unique-Streams-Case-Study-768x437.png 768w, https://deepsee.io/wp-content/uploads/2022/08/Increasing-Unique-Streams-Case-Study.png 1469w" sizes="(max-width: 1024px) 100vw, 1024px" title="Rewarded Traffic: The Inorganic User Engine Driving Ad Campaigns on Major Websites and Podcasts"><figcaption><em>This case study demonstrates the value prop to podcast creator clients</em></figcaption></figure>



<p>When we look at this, the words wiggle &amp; dance around until they look more like: &#8220;we forced 6 million people to listen to a couple of 15-second snippets of a podcast, and paid them 125 GuGaCoins for their trouble.&#8221;</p>



<p>Weak KPIs will always be exploited, and this traffic fills a hole.</p>



<h2 class="wp-block-heading" id="10-examining-hypermx-sdk-signals-from-our-packet-captures">Examining HyperMX SDK Signals From Our Packet Captures</h2>



<p>Over the course of the past few months, we played Subway Surfer while capturing detailed network logs from our iOS and Android devices. Feel free to skip to the end if additional technical details don&#8217;t interest you.</p>



<p>There&#8217;s a lot we could say about the flow of traffic, but we want to keep it relatively concise. Folks who want the deepest dive possible can reach out to us on <a href="https://www.linkedin.com/company/deepseeio" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://mobile.twitter.com/deepsee_io" target="_blank" rel="noopener">Twitter</a> for more details. Suffice it to say, <strong>none of the bid requests we saw from the ad-monetized web pages in the video labeled the placements as rewarded</strong>, and why would they? Almost everyone in the supply chain believes this to be a normal mobile-web experience.</p>



<p>Particularly, if you are a DSP, SSP, or exchange, and you&#8217;re interested in learning how to identify this traffic within your logs, please do reach out. We can help you enact filters for such traffic using the data you have available.</p>
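


<p>As one simple illustration of what a log filter could look like, the landing-page URLs in our captures (shown in the response later in this section) were tagged with &#8220;utm_source=jun&#8221;. The sketch below checks a page URL for that tag. Whether every campaign is tagged this way is an assumption, the page path is made up, and <code>has_jun_utm</code> is a hypothetical helper name.</p>

```python
from urllib.parse import parse_qs, urlencode, urlparse


def has_jun_utm(page_url):
    """Return True when the page URL's query string carries utm_source=jun."""
    query = parse_qs(urlparse(page_url).query)
    return "jun" in query.get("utm_source", [])


# Hypothetical page path; the UTM values mirror those in our captured response.
base = "https://younghollywood.com/videos/example.html"
tagged = base + "?" + urlencode(
    {"utm_source": "jun", "utm_medium": "cpc", "utm_campaign": "JUN15"}
)

for url in (tagged, base):
    print(url, has_jun_utm(url))
```

<p>A check like this only works where the full page URL, query string included, survives into your logs, and it should be treated as one weak signal among several, not as a complete filter.</p>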



<h3 class="wp-block-heading" id="11-programmatic-display-amp-video-supported-web-page">Programmatic Display &amp; Video Supported Web Page</h3>



<p>This first example relates to the experience from the following video (shared at the top of the article as well):</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Jun Rewarded InApp Web Publisher Traffic Example" width="500" height="281" src="https://www.youtube.com/embed/hZJV3WxYlXo?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div><figcaption><em>2 Young Hollywood pages visited in a rewarded traffic placement</em></figcaption></figure>



<p>Before we are sent anywhere, HyperMX gathers some information relevant to the bid request. That info can be seen in the following request to <strong>https://marketplace-android-*.hyprmx.com/trackings/offerImpressionAttempt</strong> (personal information removed and replaced with &#8220;[REDACTED]&#8221;):</p>



<pre class="wp-block-code"><code>{
	"placement_id": 32910,
	"offer_identifier": "&#91;REDACTED]",
	"offer_type": "web_traffic",
	"distributor_id": "1000214203",
	"uid": "&#91;REDACTED]",
	"msdkv": 316,
	"sdk_version": "6.0.1",
	"device_os_version": "8.1.0",
	"device_type": "android",
	"device_model": "&#91;REDACTED]",
	"device_fingerprint": "&#91;REDACTED]",
	"device_manufacturer": "&#91;REDACTED]",
	"device_brand": "&#91;REDACTED]",
	"device": "&#91;REDACTED]",
	"device_product": "&#91;REDACTED]",
	"device_width": 720,
	"device_height": 1404,
	"pxratio": 2,
	"connection_type": "WIFI",
	"bundle_id": "com.kiloo.subwaysurf",
	"bundle_version": "2.31.0",
	"cleartext_traffic_permitted": true,
	"target_sdk_version": 30,
	"permissions": &#91;"android.permission.WAKE_LOCK", "android.permission.INTERNET", "android.permission.ACCESS_NETWORK_STATE", "com.google.android.c2dm.permission.RECEIVE", "com.google.android.finsky.permission.BIND_GET_INSTALL_REFERRER_SERVICE", "com.google.android.gms.permission.AD_ID", "com.kiloo.subwaysurf.permission.C2D_MESSAGE", "android.permission.ACCESS_WIFI_STATE", "android.permission.RECEIVE_BOOT_COMPLETED", "android.permission.FOREGROUND_SERVICE", "com.android.vending.BILLING", "com.android.vending.CHECK_LICENSE", "android.permission.VIBRATE", "BIND_GET_INSTALL_REFERRER_SERVICE"],
	"user_permissions": {
		"camera_permission": "denied",
		"calendar_permission": "denied",
		"microphone_permission": "denied"
	},
	"gaid": "&#91;REDACTED]",
	"ad_id_opted_out": false,
	"persistent_id": "&#91;REDACTED]",
	"mobile_js_version": 120
}</code></pre>



<p>In this case, Jun Group has an interested client looking for &#8220;web_traffic&#8221;, and we get a correlated response from the <strong>https://marketplace-android-*.hyprmx.com/embedded_offers/player</strong> endpoint:</p>



<pre class="wp-block-code"><code>{
	"offer_skin_path": "boomerang_popup_explore",
	"tracking_view_html": "",
	"tracking_impression_html": "",
	"quarter_1_tracking_html": "",
	"quarter_2_tracking_html": "",
	"quarter_3_tracking_html": "",
	"quarter_4_tracking_html": "",
	"ivt_tracking_html": "https://pixel.adsafeprotected.com/jload?anId=929070\u0026advId=JunGroup\u0026campId=\u0026pubId=1000214203\u0026chanId=\u0026placementId=175484\u0026adsafe_par\u0026uId=&#91;REDACTED]\u0026impID=",
	"viewing_id": "&#91;REDACTED]",
	"token": "&#91;REDACTED]",
	"non_closable_vast": false,
	"svc_clickthrough": false,
	"is_mraid": "false",
	"enable_custom_webview": true,
	"skip_thank_you": true,
	"browser_family": "Chrome",
	"impression_attempt_complete": "",
	"third_party_tracking_provider": "IAS",
	"page_load_timeout": 8,
	"urls": &#91;"https://younghollywood.com/videos/tvfilm/up-close/high-school-musical-the-musical--the-series-cast-play-truth-or-dare.html?utm_source=jun\u0026utm_medium=cpc\u0026utm_campaign=JUN15\u0026wtu_id_h=&#91;REDACTED]", "https://younghollywood.com/videos/lifestyle/star-secrets/how-to-get-into-the-supercars-club-arabia.html?utm_source=jun\u0026utm_medium=cpc\u0026utm_campaign=JUN15\u0026wtu_id_h=&#91;REDACTED]"],
	"page_load_js": {
		"js": &#91;"function ias() {var po = document.createElement(\"script\"); po.type = \"text/javascript\"; po.async = true;po.src = \"https://pixel.adsafeprotected.com/jload?anId=929070\u0026advId=JunGroup\u0026campId=\u0026pubId=1000214203\u0026chanId=\u0026placementId=175484\u0026adsafe_par\u0026uId=8780771519\u0026impID=\";var s = document.getElementsByTagName(\"script\")&#91;0]; s.parentNode.insertBefore(po, s);};ias();", ""],
		"map": {
			"0": 0,
			"1": 1
		}
	},
	"visit_length": 15,
	"maximum_page_load_wait_time_in_seconds": 4,
	"webtraffic_proscenium_delay": 0.0,
	"is_boomeo_web_start": "false",
	"short_first_step": false,
	"is_user_choice": "false",
	"short_step_length": 5,
	"hide_referrer_url": "true",
	"reward_id": 0,
	"reward_quantity": 1,
	"reward_text": "1 reward",
	"reward_token": "&#91;REDACTED]",
	"reward_cost": 1.0e-05,
	"bid": "0.009428",
	"max_bid": "0.00952",
	"bid_throttle": 15.0,
	"step_count": 2,
	"reward_timestamp": "1648504312",
	"open_measurement": {
		"partner_name": "Jungroup",
		"client_version": "1.3.15-iab2507",
		"api_version": "android-6.0.1-316"
	},
	"player_application_origin": "https://marketplace-android-b316.hyprmx.com",
	"cec_url": "https://vast-proxy.hyprmx.com/client_error_captures",
	"redirection_url": "https://static.hyprmx.com/static_skins/boomerang_popup_explore/index.html?device_type=android\u0026distributor_id=1000214203\u0026msdkv=316\u0026offer=web_traffic-a22d044466101ca5d63207773802226e\u0026placement_id=32910\u0026trampoline=&#91;REDACTED]",
	"uid": "&#91;REDACTED]",
	"distributor_id": "1000214203",
	"offer": "web_traffic-a22d044466101ca5d63207773802226e",
	"msdkv": 316,
	"device_type": "android",
	"placement_id": 32910
}</code></pre>



<p>There are some interesting things we can see here:</p>



<ul><li>This placement is being tracked by IAS, whose code is plainly viewable in the &#8220;ivt_tracking_html&#8221; field</li><li>The URLs we will be sent to are in an array called &#8220;urls&#8221;<ul><li><code>https://younghollywood.com/videos/tvfilm/up-close/high-school-musical-the-musical--the-series-cast-play-truth-or-dare.html?<strong>utm_source=jun</strong>\u0026utm_medium=cpc\u0026<strong>utm_campaign=JUN15</strong>\u0026wtu_id_h=[REDACTED]</code></li><li><code>https://younghollywood.com/videos/lifestyle/star-secrets/how-to-get-into-the-supercars-club-arabia.html?<strong>utm_source=jun</strong>\u0026utm_medium=cpc\u0026<strong>utm_campaign=JUN15</strong>\u0026wtu_id_h=[REDACTED]</code></li></ul></li><li>For each page, the &#8220;visit_length&#8221; will be 15 seconds</li><li>&#8220;hide_referrer_url&#8221; is set to &#8220;true&#8221;; one can imagine what that might signal</li></ul>
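<p>For readers who want to pull the same fields out of their own captures, here is a minimal Python sketch. It assumes the response body has been saved as text; the function name <code>summarize_offer</code> and the keys of the summary are our own illustrative choices, and the embedded sample is a trimmed copy of the response above rather than the full payload.</p>

```python
import json

# Trimmed copy of the Android response above (most fields and the redacted
# wtu_id_h parameter are omitted). The raw string keeps the \u0026 escapes.
RESPONSE_TEXT = r"""
{
  "third_party_tracking_provider": "IAS",
  "visit_length": 15,
  "hide_referrer_url": "true",
  "urls": [
    "https://younghollywood.com/videos/tvfilm/up-close/high-school-musical-the-musical--the-series-cast-play-truth-or-dare.html?utm_source=jun\u0026utm_medium=cpc\u0026utm_campaign=JUN15",
    "https://younghollywood.com/videos/lifestyle/star-secrets/how-to-get-into-the-supercars-club-arabia.html?utm_source=jun\u0026utm_medium=cpc\u0026utm_campaign=JUN15"
  ]
}
"""


def summarize_offer(response_text: str) -> dict:
    """Pull out the fields discussed in the list above."""
    offer = json.loads(response_text)  # decodes \u0026 into a literal ampersand
    pages = offer.get("urls", [])
    seconds_per_page = offer.get("visit_length", 0)
    return {
        "tracker": offer.get("third_party_tracking_provider"),
        "landing_pages": pages,
        "seconds_per_page": seconds_per_page,
        # The flag is the string "true" in the capture, not a JSON boolean.
        "referrer_hidden": str(offer.get("hide_referrer_url")).lower() == "true",
        # Assumes visit_length applies to each page, as read above.
        "planned_seconds": seconds_per_page * len(pages),
    }


summary = summarize_offer(RESPONSE_TEXT)
print(summary["tracker"], summary["planned_seconds"], summary["referrer_hidden"])
```

<p>Note that the JSON parser turns each <code>\u0026</code> escape back into an ampersand, so the decoded URLs are ordinary query strings.</p>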



<p>Next, the first page loads, as you can see in the video. Many bid requests go out from major SSPs and exchanges, and none of them label the inventory as rewarded or as otherwise originating from within an app.</p>



<p>If you are a DSP or SSP, and you&#8217;re interested in learning how to identify this traffic within your logs, please do reach out. We can help you enact filters for such traffic using the data you have available.</p>



<h3 class="wp-block-heading" id="12-audio-ad-supported-podcast">Audio Ad-Supported Podcast</h3>



<p>Loading a full 30+ minute podcast in a rewarded ad placement seems even more egregious than loading a web page. At least with the web page, it&#8217;s conceivable a user might accidentally scroll through it in 15 seconds. </p>



<p>In the case of a podcast, the user has no real hope of engaging with the content in 15 seconds; the placement seems rather blatantly geared towards creating advertising events. Do they really expect people to stop playing a game and listen to a 30-minute podcast?</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Subway Surfer   Jun Podcast Imps 4" width="500" height="281" src="https://www.youtube.com/embed/vVOXsCzkScM?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div></figure>



<p>In this example, we see the following request to endpoint <strong>https://live.hyprmx.com/trackings/offerImpressionAttempt</strong> (now using an iPhone):</p>



<pre class="wp-block-code"><code>{
	"placement_id": 34115,
	"offer_identifier": "&#91;REDACTED]",
	"offer_type": "web_traffic",
	"sdk_version": "2.36.0",
	"bundle_id": "com.kiloo.subwaysurfers",
	"ad_id_opted_out": true,
	"msdkv": 225,
	"supported_interface_settings": &#91;"UIInterfaceOrientationPortrait"],
	"pxratio": 2,
	"device_os_version": "15.5",
	"ats_settings": {
		"NSAllowsArbitraryLoads": true,
		"NSAllowsLocalNetworking": true,
		"NSAllowsArbitraryLoadsInWebContent": true,
		"NSAllowsArbitraryLoadsForMedia": true
	},
	"screen_traits": {
		"horizontalSizeClass": "compact",
		"verticalSizeClass": "regular",
		"userInterfaceLayoutDirection": "LTR",
		"accessibilityContrast": "normal",
		"userInterfaceIdiom": "iPhone",
		"userInterfaceStyle": "dark",
		"userInterfaceLevel": "base",
		"displayScale": 2
	},
	"connection_type": "WIFI",
	"maccatalyst": false,
	"identifier_for_vendor": "&#91;REDACTED]",
	"ios_app_on_mac": false,
	"device_width": 750,
	"carrier_data": {
		"0000000100000002": {
			"allows_voip": true
		},
		"0000000100000001": {
			"allows_voip": true,
			"mobile_network_code": "260",
			"mobile_country_code": "310",
			"carrier_name": "T-Mobile",
			"cellular_radio_type": "CTRadioAccessTechnologyLTE"
		}
	},
	"xcode_version": "13A1030d",
	"supports_multiple_scenes": false,
	"uid": "&#91;REDACTED]",
	"hypr_modules": {
		"HYPRPermissions": "5"
	},
	"permissions": &#91;"NSCameraUsageDescription", "NSUserTrackingUsageDescription", "NSMotionUsageDescription", "NSPhotoLibraryAddUsageDescription", "NSCalendarsUsageDescription", "NSPhotoLibraryUsageDescription"],
	"distributor_id": "1000214202",
	"device_model": "iPhone12,8",
	"user_permissions": {
		"microphone_permission": "not_determined",
		"calendar_permission": "not_determined",
		"camera_permission": "not_determined"
	},
	"bundle_version": "2.36.0",
	"framework_type": "core_framework",
	"device_type": "iPhone",
	"persistent_id": "00000000-0000-0000-0000-000000000000",
	"device_height": 1334,
	"mobile_js_version": "137"
}</code></pre>



<p>We get the following correlated response from the <strong>https://live.hyprmx.com/embedded_offers/player</strong> endpoint:</p>



<pre class="wp-block-code"><code>{
	"offer_skin_path": "boomerang_popup_explore",
	"tracking_view_html": "",
	"tracking_impression_html": "",
	"quarter_1_tracking_html": "",
	"quarter_2_tracking_html": "",
	"quarter_3_tracking_html": "",
	"quarter_4_tracking_html": "",
	"ivt_tracking_html": "https://pixel.adsafeprotected.com/jload?anId=929070\u0026advId=JunGroup\u0026campId=\u0026pubId=1000214202\u0026chanId=\u0026placementId=191942\u0026adsafe_par\u0026uId=&#91;REDACTED]\u0026impID=",
	"viewing_id": "9249948818",
	"token": "&#91;REDACTED]",
	"coppa": 1,
	"non_closable_vast": false,
	"svc_clickthrough": false,
	"is_mraid": "false",
	"enable_custom_webview": true,
	"skip_thank_you": true,
	"browser_family": "Safari",
	"impression_attempt_complete": "",
	"third_party_tracking_provider": "IAS",
	"page_load_timeout": 8,
	"urls": &#91;"https://www.iheart.com/podcast/256-bloomberg-surveillance-30972795/episode/surveillance-market-timing-with-bitterly-podcast-98773848/?embed=true\u0026sc=widget\u0026pname=JunGroup\u0026campid=Bloomberg\u0026keyid=PageView\u0026cid=1000214202\u0026wtu_id_h=&#91;REDACTED]", "https://www.iheart.com/podcast/256-bloomberg-surveillance-30972795/episode/surveillance-recession-chances-with-hatzius-podcast-98692521/?embed=true\u0026sc=widget\u0026pname=JunGroup\u0026campid=Bloomberg\u0026keyid=PageView\u0026cid=1000214202\u0026wtu_id_h=&#91;REDACTED]"],
	"page_load_js": {
		"js": &#91;"var _jgPlayButton=undefined;var jgPlayButton=function(){var t=document.querySelectorAll(\"button\");if(typeof _jgPlayButton===\"undefined\"){for(var e=0;e\u003ct.length;e++){if(typeof _jgPlayButton===\"undefined\"\u0026\u0026t&#91;e].dataset&#91;\"test\"]===\"play-button\"){_jgPlayButton=t&#91;e];break}}}return _jgPlayButton};var jgIsPodcastPlaying=function(){var t=jgPlayButton();return typeof t!==\"undefined\"\u0026\u0026(t.getAttribute(\"aria-label\")===\"Pause\"||&#91;\"playing\",\"buffering\"].includes(t.dataset&#91;\"testState\"]))};var jgPlayVideo=function(){if(jgIsPodcastPlaying()){clearInterval(jgInterval)}else{var t=jgPlayButton();if(typeof t!==\"undefined\"){t.click()}}};var jgInterval=setInterval(jgPlayVideo,500);function ias() {var po = document.createElement(\"script\"); po.type = \"text/javascript\"; po.async = true;po.src = \"https://pixel.adsafeprotected.com/jload?anId=929070\u0026advId=JunGroup\u0026campId=\u0026pubId=1000214202\u0026chanId=\u0026placementId=191942\u0026adsafe_par\u0026uId=9249948818\u0026impID=\";var s = document.getElementsByTagName(\"script\")&#91;0]; s.parentNode.insertBefore(po, s);};ias();", "var _jgPlayButton=undefined;var jgPlayButton=function(){var t=document.querySelectorAll(\"button\");if(typeof _jgPlayButton===\"undefined\"){for(var e=0;e\u003ct.length;e++){if(typeof _jgPlayButton===\"undefined\"\u0026\u0026t&#91;e].dataset&#91;\"test\"]===\"play-button\"){_jgPlayButton=t&#91;e];break}}}return _jgPlayButton};var jgIsPodcastPlaying=function(){var t=jgPlayButton();return typeof t!==\"undefined\"\u0026\u0026(t.getAttribute(\"aria-label\")===\"Pause\"||&#91;\"playing\",\"buffering\"].includes(t.dataset&#91;\"testState\"]))};var jgPlayVideo=function(){if(jgIsPodcastPlaying()){clearInterval(jgInterval)}else{var t=jgPlayButton();if(typeof t!==\"undefined\"){t.click()}}};var jgInterval=setInterval(jgPlayVideo,500);"],
		"map": {
			"0": 0,
			"1": 1
		}
	},
	"visit_length": 10,
	"maximum_page_load_wait_time_in_seconds": "4",
	"webtraffic_proscenium_delay": 0.0,
	"is_boomeo_web_start": "false",
	"short_first_step": false,
	"is_user_choice": "false",
	"short_step_length": 5,
	"hide_referrer_url": "true",
	"reward_id": 0,
	"reward_quantity": 1,
	"reward_text": "1 reward",
	"reward_token": "&#91;REDACTED]",
	"reward_cost": 1.0e-05,
	"bid": "0.013518",
	"max_bid": 0.0153,
	"bid_throttle": 1.0,
	"step_count": 2,
	"reward_timestamp": "1656627095",
	"open_measurement": {
		"partner_name": "Jungroup",
		"client_version": "1.3.15-iab2507",
		"api_version": "ios-2.36.0-225"
	},
	"player_application_origin": "https://live.hyprmx.com",
	"cec_url": "https://vast-proxy.hyprmx.com/client_error_captures",
	"redirection_url": "https://static.hyprmx.com/static_skins/boomerang_popup_explore/index.html?device_type=iPhone\u0026distributor_id=1000214202\u0026msdkv=225\u0026offer=web_traffic-d0b3a41964ce379a7b6fd732749d4584\u0026placement_id=34115\u0026trampoline=&#91;REDACTED]",
	"uid": "&#91;REDACTED]",
	"distributor_id": "1000214202",
	"offer": "web_traffic-d0b3a41964ce379a7b6fd732749d4584",
	"msdkv": 225,
	"device_type": "iPhone",
	"placement_id": 34115
}</code></pre>



<p>This time we can see that the &#8220;urls&#8221; array contains podcast embeds from iHeartMedia.</p>



<ul><li><code>https://www.iheart.com/podcast/256-bloomberg-surveillance-30972795/episode/surveillance-market-timing-with-bitterly-podcast-98773848/?embed=true\u0026sc=widget\u0026<strong>pname=JunGroup</strong>\u0026campid=Bloomberg\u0026keyid=PageView\u0026cid=1000214202\u0026wtu_id_h=[REDACTED]</code></li><li><code>https://www.iheart.com/podcast/256-bloomberg-surveillance-30972795/episode/surveillance-recession-chances-with-hatzius-podcast-98692521/?embed=true\u0026sc=widget\u0026<strong>pname=JunGroup</strong>\u0026campid=Bloomberg\u0026keyid=PageView\u0026cid=1000214202\u0026wtu_id_h=[REDACTED]</code></li></ul>
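<p>The bolded query parameters are the kind of thing one can look for in page URLs. The sketch below is a hedged illustration only: the <code>MARKERS</code> table holds just the two values seen in the captures in this post, the helper names are hypothetical, and this is not a description of our production detection. It expects decoded URLs, so the <code>\u0026</code> escapes must already have been turned back into ampersands.</p>

```python
from urllib.parse import parse_qs, urlsplit

# Parameter values that pointed at Jun Group in the two captures in this post.
# Illustrative assumption only: other campaigns may use different markers.
MARKERS = {
    "utm_source": {"jun"},
    "pname": {"jungroup"},
}


def marker_hits(url: str) -> list:
    """Return the name=value pairs in the query string that match MARKERS."""
    query = parse_qs(urlsplit(url).query)
    hits = []
    for name, expected in MARKERS.items():
        for value in query.get(name, []):
            if value.lower() in expected:
                hits.append(f"{name}={value}")
    return hits


def is_flagged(url: str) -> bool:
    """True when at least one marker parameter is present."""
    return bool(marker_hits(url))
```

<p>Run against the decoded Young Hollywood URLs, <code>marker_hits</code> reports <code>utm_source=jun</code>; against the iHeart embeds it reports <code>pname=JunGroup</code>. A URL with neither parameter returns an empty list.</p>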



<p>We can additionally see tracking tech from IAS applied to this transaction.</p>



<p>The experience is plainly visible in the video: the podcast autoplays, and the user is dropped immediately into an advertisement. Audio ads in this case are from Triton Digital, recently acquired by iHeartMedia.</p>



<h2 class="wp-block-heading" id="13-conclusion">Conclusion</h2>



<p>Rewarded traffic exploits the lack of MRC standards around traffic sourcing, and the lack of policing around Rewarded ad formats in general. As an industry, we need to decide if we want to allow this precedent that an <strong>entire ad-monetized webpage</strong> is a valid creative. It would be insane to attempt that in any other environment outside of incentivized traffic marketplaces, so why is it accepted for rewarded ads?</p>



<p>Though it may be hard to identify, we have developed several ways to flag such traffic. We had to take multiple approaches given the differences in what certain supply chain participants are able to see in the bidstream.</p>



<p>Are we totally off base? Do you have a success story you&#8217;d like to share from your rewarded traffic campaign? A spectacular failure? Reach out to share it with us on <a href="https://www.linkedin.com/company/deepseeio" target="_blank" rel="noopener">LinkedIn</a> or <a href="https://mobile.twitter.com/deepsee_io" target="_blank" rel="noopener">Twitter</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)</title>
		<link>https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Wed, 25 May 2022 16:11:08 +0000</pubDate>
				<category><![CDATA[Research & Development]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1150</guid>

					<description><![CDATA[Introduction In this post, we’ll examine if a Tranco List without Alexa Rank data would be suitable for marketers looking for a new source of site performance data. Would they find the switch over to be jarring? Would the sites they saw on Alexa Rank lists still be present? If so, what sort of consistency &#8230; <a href="https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may">Continued</a>]]></description>
										<content:encoded><![CDATA[<div class="ub_table-of-contents" data-showtext="show" data-hidetext="hide" data-scrolltype="auto" id="ub_table-of-contents-802b9af0-a073-4327-83d1-46316b7195f0" data-initiallyhideonmobile="false"
                    data-initiallyshow="true"><div class="ub_table-of-contents-extra-container"><div class="ub_table-of-contents-container ub_table-of-contents-1-column "><ul><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#0-introduction>Introduction</a><ul><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#1-for-marketers-unconcerned-with-the-technical-findings>For Marketers Unconcerned With the Technical Findings</a></li></ul></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#2-saying-goodbye-to-alexa-ranks>Saying Goodbye to Alexa Ranks</a></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#3-what-is-the-tranco-list-and-why-do-we-use-it>What is the Tranco List, and Why do we Use It?</a></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#4-list-generation-methodology>List Generation Methodology</a></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#5-results-of-our-analysis>Results of Our Analysis</a><ul><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#6-huge-portions-of-the-list-are-sourced-exclusively-from-alexa-data>Huge Portions of the List Are Sourced Exclusively from Alexa Data</a><ul><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#7-interesting-exceptions>Interesting Exceptions</a><ul><li><a 
href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#8-domains-with-backlinks-from-gt10-unique-root-domains>Domains with backlinks from &gt;=10 unique root domains</a></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#9-sites-loading-the-google-publisher-tag-gpt-script>Sites Loading the Google Publisher Tag (GPT) Script</a></li></ul></li></ul></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#10-the-same-sites-are-ranked-quite-differently-between-alexa-and-twa>The Same Sites Are Ranked Quite Differently Between Alexa and TWA</a><ul><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#11-quantifying-rank-bucket-similarity>Quantifying Rank Bucket Similarity</a></li></ul></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#12-analyzing-relative-rank-preservation-with-an-analysis-of-site-pairs>Analyzing Relative Rank Preservation with an Analysis of Site Pairs</a></li></ul></li><li><a href=https://deepsee.io/blog/we-bid-goodbye-to-alexa-rankings-and-measure-its-contribution-to-the-tranco-list-pre-may#13-conclusions-amp-takeaways>Conclusions &amp; Takeaways</a></li></ul></div></div></div>


<h2 class="wp-block-heading" id="0-introduction">Introduction</h2>



<p>In this post, we’ll examine whether a Tranco List without Alexa Rank data would be suitable for marketers looking for a new source of site performance data.</p>



<p>Would they find the switch over to be jarring? Would the sites they saw on Alexa Rank lists still be present? If so, what sort of consistency could they expect with respect to how sites rank relative to each other?</p>



<p>We’ll also examine whether there are alternative data sources that can fill in the gaps between Alexa and Tranco.</p>



<h3 class="wp-block-heading" id="1-for-marketers-unconcerned-with-the-technical-findings">For Marketers Unconcerned With the Technical Findings</h3>



<p>If the technical &amp; academic research doesn&#8217;t appeal to you, you can head to the <a href="#13-conclusions-amp-takeaways" data-type="internal" data-id="#13-conclusions-amp-takeaways">Conclusions</a> section, where there are open questions you can help answer!</p>



<h2 class="wp-block-heading" id="2-saying-goodbye-to-alexa-ranks">Saying Goodbye to Alexa Ranks</h2>



<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="133" src="https://deepsee.io/wp-content/uploads/2022/05/image-1-1024x133.png" alt="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)" class="wp-image-1156" srcset="https://deepsee.io/wp-content/uploads/2022/05/image-1-1024x133.png 1024w, https://deepsee.io/wp-content/uploads/2022/05/image-1-300x39.png 300w, https://deepsee.io/wp-content/uploads/2022/05/image-1-768x100.png 768w, https://deepsee.io/wp-content/uploads/2022/05/image-1.png 1407w" sizes="(max-width: 1024px) 100vw, 1024px" title="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)"></figure>



<p>On May 1st, 2022, Alexa.com retired its API and web lookup tools. Most importantly, it stopped providing Alexa Rank reports, which were commonly used as a measure of website popularity.</p>



<p>The methodology for creating that rank is described at a high level <a href="https://web.archive.org/web/20220309055237/https://blog.alexa.com/marketing-research/alexa-rank/" target="_blank" rel="noopener">on the Alexa.com blog</a>, which is no longer up (a cached copy is linked):</p>



<blockquote class="wp-block-quote">
<p>&#8220;<em>Alexa rank is calculated using a proprietary methodology that combines a site’s estimated traffic and visitor engagement over the past three months. Traffic and engagement are estimated from the browsing behavior of people in our global panel, which is a sample of all Internet users.&#8221;</em></p>
<cite>-&#8220;What is <strong>Alexa Rank</strong>?&#8221; from the Alexa.com blog</cite></blockquote>



<p>For many years, researchers and marketers from a variety of disciplines relied on Alexa.com&#8217;s site rank data to inform them of trends, and to identify opportunities. Its recommended uses include:</p>



<ul>
<li>Evaluating a site’s commercial potential</li>



<li>Checking to see if a site’s traffic is rising or falling</li>



<li>Finding potential affiliates</li>
</ul>



<p>You should consider checking out <a href="https://www.sitesell.com/blog/alexa-reviews-basics-myths" target="_blank" rel="noopener">this post (&#8220;7 Ways to Use Alexa Rankings to Grow Your Business&#8221;)</a> for more insight into how marketers might have used this data to grow their businesses.</p>



<p>Fraud researchers have an additional concern: to prepare for the next exploit, it&#8217;s very important for us to understand what sites people actually visit. </p>



<h2 class="wp-block-heading" id="3-what-is-the-tranco-list-and-why-do-we-use-it">What is the Tranco List, and Why do we Use It?</h2>



<p>At deepsee.io, we profile the behaviors of websites, and the relations they have with each other, but we can&#8217;t estimate how many people visit a site from this experiential approach. For that reason, one of the few external data sources we consume is <a href="https://tranco-list.eu/" target="_blank" rel="noopener">the Tranco List</a>, &#8220;A Research-Oriented Top Sites Ranking Hardened Against Manipulation&#8221;. </p>



<p>It combines several ranking sources in order to create a single list that gives researchers a wider view of the websites people visit.</p>



<p>Prior to May 1st, 2022, the list was composed of the following three data sources:</p>



<ul>
<li><a href="https://blog.alexa.com/marketing-research/alexa-rank/" target="_blank" rel="noopener">Alexa.com</a>: based on page visits reported by a user panel and by a tracking script</li>



<li><a href="https://majestic.com/reports/majestic-million" target="_blank" rel="noopener">Majestic Million</a>: a link-based ranking system</li>



<li><a href="http://s3-us-west-1.amazonaws.com/umbrella-static/index.html" target="_blank" rel="noopener">Cisco Umbrella</a>: based on the number of IPs requesting a certain domain, sourced from Cisco&#8217;s database of DNS traffic</li>
</ul>



<h2 class="wp-block-heading" id="4-list-generation-methodology">List Generation Methodology</h2>



<p>We compared three lists that were generated using the Tranco API on April 5th, 2022. This API allows users to choose the data sources they incorporate, which was perfect for the purposes of our analysis. We are providing links to the analysis datasets for transparency:</p>



<ul>
<li>A &#8220;normal&#8221; Tranco list, composed of data from all three sources: <a href="https://tranco-list.eu/list/K2W2W/full" target="_blank" rel="noopener">https://tranco-list.eu/list/K2W2W/full</a></li>



<li>A list that included Majestic Million &#038; Cisco Umbrella data sources, and excluded Alexa.com data: <a href="https://tranco-list.eu/list/25N59/full" target="_blank" rel="noopener">https://tranco-list.eu/list/25N59/full</a>
<ul>
<li><strong>Referred to as &#8220;Tranco Without Alexa,&#8221; or TWA, in the context of this research</strong></li>
</ul>
</li>



<li>A list composed strictly of Alexa.com data: <a href="https://tranco-list.eu/list/5YXYN/full" target="_blank" rel="noopener">https://tranco-list.eu/list/5YXYN/full</a></li>
</ul>
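<p>To work with lists like these, a small loader is enough. The sketch below assumes each downloaded file is a headerless CSV with one <code>rank,domain</code> row per line; please verify that against the files you actually download. The function name <code>read_ranking</code> is ours, and the sample rows are placeholders, not real list entries.</p>

```python
import csv
import io


def read_ranking(lines) -> dict:
    """Map domain to rank from rows shaped like `rank,domain` (no header)."""
    ranking = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        ranking[domain] = int(rank)
    return ranking


# Placeholder rows standing in for a downloaded list file.
sample = io.StringIO("1,first.example\n2,second.example\n3,third.example\n")
ranking = read_ranking(sample)
print(len(ranking), ranking["second.example"])
```

<p>With a real download, an open file handle can be passed in place of the <code>StringIO</code> object.</p>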



<p><strong>Note</strong>: Tranco recently announced that DomainTools would be allowing them to <a href="https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all" target="_blank" rel="noopener">incorporate their Farsight ranking system into the Tranco List</a>. This means that there won&#8217;t be a &#8220;Tranco Without Alexa&#8221; list exactly like the one we use for forecasting purposes in this article.</p>



<p>The Tranco List will include Alexa.com data as available (once unavailable, it will decay off their list over 30 days), and is now additionally informed by Farsight passive DNS data. Still, we see the analysis of the Tranco List&#8217;s makeup pre-May as informative to those trying to understand the scope and scale of various website ranking systems. We will conduct additional analysis of the list that includes Farsight data once a sufficient sample of data is available.</p>



<h2 class="wp-block-heading" id="5-results-of-our-analysis">Results of Our Analysis</h2>



<h3 class="wp-block-heading" id="6-huge-portions-of-the-list-are-sourced-exclusively-from-alexa-data">Huge Portions of the List Are Sourced Exclusively from Alexa Data</h3>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="1000" height="696" src="https://deepsee.io/wp-content/uploads/2022/05/pie1@1x.jpg" alt="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)" class="wp-image-1260" srcset="https://deepsee.io/wp-content/uploads/2022/05/pie1@1x.jpg 1000w, https://deepsee.io/wp-content/uploads/2022/05/pie1@1x-300x209.jpg 300w, https://deepsee.io/wp-content/uploads/2022/05/pie1@1x-768x535.jpg 768w" sizes="(max-width: 1000px) 100vw, 1000px" title="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)"></figure>



<p>Of the 7,552,154 unique domains on the &#8220;normal&#8221; Tranco list, nearly 81% appeared on the Alexa list and were not present on either the Cisco Umbrella or Majestic Million lists.</p>



<p>Even though each source list contains at most 1,000,000 entries per day, over 30 days we saw many more unique entries on the Alexa list. This suggests that it&#8217;s more volatile than the Cisco Umbrella &amp; Majestic Million lists, but also that it gives us a wider view of the web&#8217;s surface.</p>
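<p>The exclusivity figure boils down to set arithmetic. The following is a minimal sketch of that calculation, not the code used for our analysis; the function name and the toy domains are hypothetical and chosen only to make the arithmetic easy to follow.</p>

```python
def share_exclusive_to(target: str, sources: dict) -> float:
    """Fraction of all unique domains that appear only in sources[target]."""
    everything = set().union(*sources.values())
    if not everything:
        return 0.0
    others = set().union(
        *(domains for name, domains in sources.items() if name != target)
    )
    exclusive = sources[target] - others
    return len(exclusive) / len(everything)


# Toy data: five unique domains, three of them seen only in the "alexa" set.
toy_sources = {
    "alexa": {"a.example", "b.example", "c.example", "d.example"},
    "majestic": {"a.example", "e.example"},
    "umbrella": {"a.example"},
}
print(share_exclusive_to("alexa", toy_sources))  # 3 of 5 domains, so 0.6
```

<p>Applied to the real data, <code>sources</code> would hold the full set of domains each provider contributed over the 30-day window.</p>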



<h4 class="wp-block-heading" id="7-interesting-exceptions">Interesting Exceptions</h4>



<p>We also repeated this analysis for various subsets of our data, and found some interesting exceptions:</p>



<h5 class="wp-block-heading" id="8-domains-with-backlinks-from-gt10-unique-root-domains">Domains with backlinks from &gt;=10 unique root domains</h5>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="1000" height="696" src="https://deepsee.io/wp-content/uploads/2022/05/pie2@1x.jpg" alt="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)" class="wp-image-1262" srcset="https://deepsee.io/wp-content/uploads/2022/05/pie2@1x.jpg 1000w, https://deepsee.io/wp-content/uploads/2022/05/pie2@1x-300x209.jpg 300w, https://deepsee.io/wp-content/uploads/2022/05/pie2@1x-768x535.jpg 768w" sizes="(max-width: 1000px) 100vw, 1000px" title="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)"></figure>



<p>In contrast to the list at large, this subset of 375,277 domains is much more likely to be found across multiple data sources. Given that the Majestic Million list is primarily based on backlinks, this result could be expected.</p>



<h5 class="wp-block-heading" id="9-sites-loading-the-google-publisher-tag-gpt-script">Sites Loading the Google Publisher Tag (GPT) Script</h5>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="1000" height="696" src="https://deepsee.io/wp-content/uploads/2022/05/pie3@1x-1.jpg" alt="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)" class="wp-image-1263" srcset="https://deepsee.io/wp-content/uploads/2022/05/pie3@1x-1.jpg 1000w, https://deepsee.io/wp-content/uploads/2022/05/pie3@1x-1-300x209.jpg 300w, https://deepsee.io/wp-content/uploads/2022/05/pie3@1x-1-768x535.jpg 768w" sizes="(max-width: 1000px) 100vw, 1000px" title="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)"></figure>



<p>Of the 118,132 unique domains we crawled that loaded the GPT.js script, the majority are sourced from multiple lists. The presence of this script signals intent to display advertisements.</p>



<h3 class="wp-block-heading" id="10-the-same-sites-are-ranked-quite-differently-between-alexa-and-twa">The Same Sites Are Ranked Quite Differently Between Alexa and TWA</h3>



<p>~96% of the Alexa top 10,000 would be present somewhere within the Tranco list without Alexa; however, they are scattered far and wide rather than concentrated at the top. To find 9,500 of the top 10k ranked sites from Alexa.com data, you’d have to expand your search to the top 1,330,000 domains in the TWA list.</p>
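<p>The &#8220;how far down do you have to search&#8221; question can be sketched as a single pass over the second list. This is an illustration under assumptions: both rankings are ordered lists of unique domains, and the function name and toy values are hypothetical.</p>

```python
def depth_needed(reference_top: list, other_ranking: list, wanted: int):
    """Smallest prefix length of other_ranking that contains `wanted` of the
    domains in reference_top, or None if it never reaches that count."""
    targets = set(reference_top)
    found = 0
    for position, domain in enumerate(other_ranking, start=1):
        if domain in targets:
            found += 1
            if found == wanted:
                return position
    return None


# Toy rankings: the second list buries two of the three reference domains.
alexa_top = ["a.example", "b.example", "c.example"]
twa = ["x.example", "a.example", "y.example", "z.example", "c.example", "b.example"]
print(depth_needed(alexa_top, twa, 2))  # second match sits at position 5
```

<p>In the terms of the paragraph above, <code>reference_top</code> would be the Alexa top 10k, <code>other_ranking</code> the full TWA list, and <code>wanted</code> 9,500.</p>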



<p>The rank reshuffling is further demonstrated by the following chart, which shows how much ranks have changed for sites that exist on both the Alexa &amp; TWA top million lists.</p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="1000" height="711" src="https://deepsee.io/wp-content/uploads/2022/05/difference-2@1x.jpg" alt="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)" class="wp-image-1200" srcset="https://deepsee.io/wp-content/uploads/2022/05/difference-2@1x.jpg 1000w, https://deepsee.io/wp-content/uploads/2022/05/difference-2@1x-300x213.jpg 300w, https://deepsee.io/wp-content/uploads/2022/05/difference-2@1x-768x546.jpg 768w, https://deepsee.io/wp-content/uploads/2022/05/difference-2@1x-280x200.jpg 280w" sizes="(max-width: 1000px) 100vw, 1000px" title="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)"></figure>



<p>For sites that appear on <strong>both</strong> the Alexa top million and the TWA top million, ~48% saw their rank move 200k or more, and ~68% saw their rank move 100k or more.</p>
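
<p>A minimal sketch of this kind of measurement, assuming each list is available as a mapping from domain to rank. The function name and sample ranks are hypothetical and only illustrate the calculation.</p>

```python
def share_moved_at_least(ranks_a, ranks_b, threshold):
    """Fraction of domains present on both lists whose rank differs by at
    least threshold positions between the two lists."""
    shared = set(ranks_a).intersection(ranks_b)
    if not shared:
        return 0.0
    moved = sum(1 for d in shared if abs(ranks_a[d] - ranks_b[d]) >= threshold)
    return moved / len(shared)


# Toy data (hypothetical), not real rankings.
alexa_ranks = {"a.com": 100, "b.com": 500_000, "c.com": 900_000}
twa_ranks = {"a.com": 150, "b.com": 250_000, "d.com": 10}
print(share_moved_at_least(alexa_ranks, twa_ranks, 200_000))  # 0.5
```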



<h4 class="wp-block-heading" id="11-quantifying-rank-bucket-similarity">Quantifying Rank Bucket Similarity</h4>



<figure class="wp-block-image size-full is-resized"><img decoding="async" loading="lazy" src="https://deepsee.io/wp-content/uploads/2022/05/top-n-sites@1x.jpg" alt="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)" class="wp-image-1198" width="840" height="584" srcset="https://deepsee.io/wp-content/uploads/2022/05/top-n-sites@1x.jpg 1000w, https://deepsee.io/wp-content/uploads/2022/05/top-n-sites@1x-300x209.jpg 300w, https://deepsee.io/wp-content/uploads/2022/05/top-n-sites@1x-768x535.jpg 768w" sizes="(max-width: 840px) 100vw, 840px" title="We Bid Goodbye to Alexa Rankings, and Measure Its Contribution to the Tranco List (Pre-May)"><figcaption class="wp-element-caption">The above table shows a comparison of the Alexa top N vs the TWA top N</figcaption></figure>



<p>This section describes the differences in the makeup of top &#8220;N&#8221; site lists (top 10k, top 100k, etc&#8230;). This is a common way that media buyers consider the popularity of sites.</p>



<p>This tells us that only 32% of the Alexa top 10k are in the TWA top 10k, and that number holds steady all the way up to a million.</p>



<p>Clearly, the rank thresholds aren’t transferable; someone targeting the top 10k Alexa rank sites couldn’t just apply the same rank threshold if they were using a Tranco list without Alexa.</p>
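
<p>The top-N comparison can be sketched as a simple set overlap. This is an illustrative example under the assumption that both lists are ordered from best rank to worst; the function name and toy lists are hypothetical.</p>

```python
def top_n_overlap(list_a, list_b, n):
    """Share of list_a's top n entries that also appear in list_b's top n."""
    top_a = set(list_a[:n])
    top_b = set(list_b[:n])
    if not top_a:
        return 0.0
    return len(top_a.intersection(top_b)) / len(top_a)


# Toy data (hypothetical domains), ordered best rank first.
alexa = ["a.com", "b.com", "c.com", "d.com"]
twa = ["a.com", "x.com", "b.com", "y.com"]
print(top_n_overlap(alexa, twa, 2))  # 0.5
```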



<h3 class="wp-block-heading" id="12-analyzing-relative-rank-preservation-with-an-analysis-of-site-pairs">Analyzing Relative Rank Preservation with an Analysis of Site Pairs</h3>



<p>Another way to compare two different sets of rankings is to analyze how sites compare to <strong>each other</strong> across the various lists being compared. </p>



<p>Consider three sites: A, B, and C. <br>The Alexa ranks for those sites are:</p>



<ul>
<li>A: 100</li>



<li>B: 250</li>



<li>C: 1,000</li>
</ul>



<p>The TWA ranks for those sites are:</p>



<ul>
<li>A: 50</li>



<li>B: 100</li>



<li>C: 10,000</li>
</ul>



<p>For both lists, A&gt;B&gt;C, so the two lists agree completely on the relative order of these 3 sites.</p>



<p>We can determine this by looking at all unique pairs that can be constructed from those 3 sites:</p>



<ul>
<li>A:B &#8211; Agreed
<ul>
<li>Alexa: A (100) is ranked better than B (250)</li>



<li>TWA: A (50) is ranked better than B (100)</li>
</ul>
</li>



<li>B:C &#8211; Agreed
<ul>
<li>Alexa: B (250) is ranked better than C (1,000)</li>



<li>TWA: B (100) is ranked better than C (10,000)</li>
</ul>
</li>



<li>A:C &#8211; Agreed
<ul>
<li>Alexa: A (100) is ranked better than C (1,000)</li>



<li>TWA: A (50) is ranked better than C (10,000)</li>
</ul>
</li>
</ul>



<p>3/3 pairs agree, so pairwise similarity is 100%</p>



<p>Now, imagine that we change the TWA rankings for those sites to the following:</p>



<ul>
<li>A: 50</li>



<li>B: 100</li>



<li><strong>C: 75</strong></li>
</ul>



<p>The comparison of all unique pairs now changes:</p>



<ul>
<li>A:B &#8211; Agreed
<ul>
<li>Alexa: A (100) is ranked better than B (250)</li>



<li>TWA: A (50) is ranked better than B (100)</li>
</ul>
</li>



<li>B:C &#8211; <strong>Disagreed</strong>
<ul>
<li>Alexa: B (250) is ranked better than C (1,000)</li>



<li>TWA: B (100) is ranked <strong>worse</strong> than C (75)</li>
</ul>
</li>



<li>A:C &#8211; Agreed
<ul>
<li>Alexa: A (100) is ranked better than C (1,000)</li>



<li>TWA: A (50) is ranked better than C (75)</li>
</ul>
</li>
</ul>



<p>This time, only 2/3 pairs agree, so pairwise similarity is ~66.7%.</p>
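
<p>The pairwise comparison walked through above can be sketched in a few lines. This is a minimal illustration, assuming ranks are stored as domain-to-rank mappings with no ties; the function name is hypothetical and this is not the exact code used for the study. The resulting share of agreeing pairs is closely related to the concordant-pair count behind Kendall's tau.</p>

```python
from itertools import combinations


def pairwise_agreement(ranks_a, ranks_b):
    """Share of unique site pairs whose relative order is the same in both
    rankings. Only sites present in both rankings are compared."""
    shared = sorted(set(ranks_a).intersection(ranks_b))
    agree = 0
    total = 0
    for x, y in combinations(shared, 2):
        total += 1
        a_prefers_x = ranks_a[x] < ranks_a[y]  # lower number = better rank
        b_prefers_x = ranks_b[x] < ranks_b[y]
        if a_prefers_x == b_prefers_x:
            agree += 1
    return agree / total if total else 0.0


# The three-site example from the text.
alexa = {"A": 100, "B": 250, "C": 1000}
twa_first = {"A": 50, "B": 100, "C": 10000}
twa_second = {"A": 50, "B": 100, "C": 75}
print(pairwise_agreement(alexa, twa_first))   # 1.0
print(pairwise_agreement(alexa, twa_second))  # 2 of 3 pairs agree
```

<p>Comparing every pair directly grows quadratically with the number of shared sites, so for lists with tens of thousands of shared sites a faster counting method may be preferable.</p>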



<p>This is the basic logic that fuels the following data points:</p>



<ul>
<li><strong>Of pairs of sites that appear on both the Alexa top 10k and the TWA top 10k, ~61% preserve the same rank order</strong>
<ul>
<li>3,177 sites were on both lists, and each was compared to the other as a pair.
<ul>
<li>Using the formula <code>N(N-1)/2</code> to find the number of unique pairs analyzed, we find that 5,045,076 pairs were analyzed.</li>
</ul>
</li>
</ul>
</li>



<li><strong>The same can be said for sites on both the Alexa top 100k and the TWA top 100k; ~61% preserve the same rank order</strong>
<ul>
<li>32,558 sites were on both lists, and each was compared to the other as a pair.
<ul>
<li>Using the formula <code>N(N-1)/2</code> to find the number of unique pairs analyzed, we find that 529,995,403 pairs were analyzed.</li>
</ul>
</li>
</ul>
</li>
</ul>
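
<p>For reference, the <code>N(N-1)/2</code> formula is the standard count of unordered pairs, which Python exposes as <code>math.comb(n, 2)</code>. The helper name below is hypothetical; the sketch simply checks the formula against the 3-site example and the 100k case.</p>

```python
from math import comb


def unique_pairs(n):
    """Number of unordered pairs that can be formed from n sites: n(n-1)/2."""
    return comb(n, 2)


print(unique_pairs(3))      # 3 (the A/B/C example: A:B, B:C, A:C)
print(unique_pairs(32558))  # 529995403
```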



<h2 class="wp-block-heading" id="13-conclusions-amp-takeaways">Conclusions &amp; Takeaways</h2>



<p>The most applicable conclusions affecting marketers:</p>



<ul>
<li>Someone targeting the top 10k Alexa rank sites couldn’t just apply the same rank threshold if they were using a Tranco list without Alexa.</li>



<li>Alexa ranks were more volatile than the Cisco Umbrella &amp; Majestic Million ranks, but also gave us a wider view of the web&#8217;s surface.</li>



<li>For researchers to stay ahead of the next threat, a source of ranking data that accounts for site-visit metrics is likely imperative</li>
</ul>



<p>As we mentioned in the introduction, there won&#8217;t be a &#8220;Tranco Without Alexa&#8221; list exactly like the one we use for forecasting purposes in this article. Due to the way that Tranco is permitted to use Farsight data, it&#8217;s not possible to create custom lists including/excluding that dataset. Basically, that means we won&#8217;t be able to see exactly how well the Farsight data fills the gaps left by Alexa.com&#8217;s exit.</p>



<p>All that said, this study has highlighted to us the importance of including a data source based on page visits reported by some user panel, or tracking script. So much of what&#8217;s known about the web&#8217;s surface would be lost without the availability of such measurement systems. </p>



<p>Realistically, there are very few companies that can match the breadth of knowledge that Alexa.com had acquired. It makes sense that Amazon, whose ad business is booming, would want to keep those insights close to the chest as a competitive advantage.</p>



<p>It is generous of Domaintools to contribute their Farsight passive DNS dataset to the open-source research efforts of the Tranco team, but it has yet to be seen how DNS requests relate to site visits. This is an admitted blind spot, as the Domaintools team puts it: </p>



<blockquote class="wp-block-quote">
<p><em><strong>&#8220;[&#8230;]since this is only seeing traffic that would go to the internet, if the organization’s nameservers already have a domain cached, that request won’t be seen in this feed.&#8221;</strong></em></p>
<cite><a href="https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all" target="_blank" rel="noopener">Mirror, Mirror, on the Wall, Who’s the Fairest (website) of Them all?</a><br>Aaron Gee-Clough, Senior Data Engineer @ Domaintools</cite></blockquote>



<p>It&#8217;s an open question as to what datasets can give researchers &amp; marketers the best view into what sites are actually getting visited the most. A few top contenders come to mind:</p>



<ul>
<li><a href="https://www.similarweb.com/corp/ourdata/" target="_blank" rel="noopener">SimilarWeb</a>: data sourced from a multiplicity of useful POVs; likely to achieve most parity with Alexa.com&#8217;s methodology
<ul>
<li>Like Alexa.com, this data is partially sourced by a panel of users with their extension installed</li>



<li>Voluntary submission of data from publishers who connect their Google Analytics accounts</li>



<li>Partnerships with ISPs and DSPs</li>
</ul>
</li>



<li><a href="https://ahrefs.com/big-data" target="_blank" rel="noopener">Ahrefs</a>: data is based on clickstream data &#038; backlinks gathered from their massive crawling efforts
<ul>
<li>The organic search data (clickstream) can be extremely helpful in determining the most used sites. </li>



<li>The backlinks data may be comparable to the Majestic data, so it could be expected that this slice of their data wouldn&#8217;t contribute many new unique sites to our view of the web&#8217;s surface.</li>
</ul>
</li>



<li><a href="https://buzzsumo.com/about/" target="_blank" rel="noopener">BuzzSumo</a>: data sourced from social media engagements
<ul>
<li>Along with search, social media is a huge engine for content discovery. </li>



<li>Someone with a view of what&#8217;s being shared &amp; engaged with on social media can contribute much to the way researchers understand the web&#8217;s surface.</li>
</ul>
</li>
</ul>



<p>Obviously, companies like Google &amp; Facebook could contribute a huge amount of knowledge to this topic, but it can&#8217;t be expected that these companies would suddenly open up their historically opaque data to the public.</p>



<p>At this point we turn to you, our readers, to help us understand what you consider the most useful measure of site popularity. Is it one we listed above, or a product we hadn&#8217;t considered? What kind of methodology do they use? How has it helped you achieve your business goals?</p>



<p>If you were using Alexa.com ranking data, how have your processes changed since moving to a new data source? </p>



<p>We&#8217;d love to hear all about it on <a href="https://twitter.com/deepsee_io" target="_blank" rel="noopener">Twitter</a> or <a href="https://www.linkedin.com/company/deepseeio" target="_blank" rel="noopener">LinkedIn</a>. Not about socials? Drop us a line at hello@deepsee.io.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Large Publishers Buy Incentivized Traffic &#038; Popups Too</title>
		<link>https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Thu, 21 Apr 2022 13:04:35 +0000</pubDate>
				<category><![CDATA[Research & Development]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=1065</guid>

					<description><![CDATA[Introduction In this post, we&#8217;ll be sharing what we know about a traffic sourcing campaign that affected multiple Yahoo properties between February 2021 (at minimum; possibly earlier) and February 2022. The sites affected were: yahoo.com &#38; its subdomains, techcrunch.com, and autoblog.com. The traffic was sourced by multiple means, and it&#8217;s hard to quantify exactly how &#8230; <a href="https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too">Continued</a>]]></description>
										<content:encoded><![CDATA[<div class="ub_table-of-contents" data-showtext="show" data-hidetext="hide" data-scrolltype="auto" id="ub_table-of-contents-1aa9b264-7ede-48fb-92f9-7e83a6c2c2aa" data-initiallyhideonmobile="false"
                    data-initiallyshow="true"><div class="ub_table-of-contents-header-container"><div class="ub_table-of-contents-header">
                    <div class="ub_table-of-contents-title">Table of Contents</div></div></div><div class="ub_table-of-contents-extra-container"><div class="ub_table-of-contents-container ub_table-of-contents-1-column "><ul><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#0-introduction>Introduction</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#1-a-quick-primer-on-incentivized-traffic>A Quick Primer on Incentivized Traffic</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#2-is-incentivized-traffic-considered-valid>Is Incentivized Traffic Considered Valid?</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#3-how-did-traffic-arrive-at-yahoo-properties-and-how-can-i-identify-it>How Did Traffic Arrive at Yahoo Properties, and How Can I Identify It?</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#4-describing-the-pop-traffic-with-examples>Describing the Pop Traffic With Examples</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#5-how-to-identify-the-pop-traffic-in-logs>How to Identify the Pop Traffic in Logs</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#6-describing-the-incentivized-traffic-with-examples>Describing the Incentivized Traffic With Examples</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#7-how-to-identify-the-incentivized-traffic-in-logs>How to Identify the Incentivized Traffic In Logs</a></li><li><a href=https://deepsee.io/blog/large-publishers-buy-incentivized-traffic-too#8-conclusion-and-takeaways>Conclusion and Takeaways</a></li></ul></div></div></div>


<h2 class="wp-block-heading" id="0-introduction">Introduction</h2>



<p>In this post, we&#8217;ll be sharing what we know about a traffic sourcing campaign that affected multiple Yahoo properties between February 2021 (at minimum; possibly earlier) and February 2022. The sites affected were: yahoo.com &amp; its subdomains, techcrunch.com, and autoblog.com.</p>



<p>The traffic was sourced by multiple means, and it&#8217;s hard to quantify exactly how much traffic came via specific paths without analyzing logs of buyers at a large scale. In this post we will describe the avenues by which we observed sourced traffic flowing, and give readers the tools to flag related impressions using their own log data.</p>



<p>If you would like to identify ANY and ALL of the invalid sourced traffic we refer to in this article, you need simply look for the presence of a “<strong>yotavid</strong>” string in the full page URLs you have relating to Yahoo advertising events.<br></p>



<h2 class="wp-block-heading" id="1-a-quick-primer-on-incentivized-traffic">A Quick Primer on Incentivized Traffic</h2>



<p>Incentivized traffic, for those unaware of this term (we will be using it quite heavily in this post), comes from users who are paid to visit a certain web property. They may additionally be required to perform subsequent actions on the page to receive their rewards. Compensation usually comes in the form of points that can be redeemed for prizes, or gift cards.</p>



<p>Many platforms offer users rewards for these kinds of activities; a very popular example is Swagbucks, owned by Prodege, which operates several other similar platforms.</p>



<p>These platforms are often leveraged in order to get prospective customers to take actions like: </p>



<ul><li>Signing up for free trials (that convert to paid membership after a month)</li><li>Filling out surveys</li><li>Installing mobile / ctv apps</li></ul>



<p>Most relevant to this post, users are also sometimes <strong>encouraged to visit ad-supported publishers</strong>, and create multiple page / ad views. In such cases, these platforms open destination pages in new tabs / windows in order to create visits that appear as direct.<br></p>



<h2 class="wp-block-heading" id="2-is-incentivized-traffic-considered-valid">Is Incentivized Traffic Considered Valid?</h2>



<p>In the past, this was more of a gray area. After all, real people are the ones who participate in incentivized traffic programs, right? The MRC has <strong>no guidelines</strong> on intentionally purchased incentivized traffic.</p>



<p>The most important thing you can take away from this article is: <strong>it does not matter that the visitors are human.</strong> <strong>The biggest ad buying platforms in the world consider this activity invalid, and will offer make-goods on this traffic.</strong></p>



<p>We shared the data from our own internal investigations with Morning Brew in order to get help quantifying the effect on the industry at large. As part of their due diligence, they asked Google about their stance on incentivized traffic being used to fulfill advertising campaigns, and Google offered this on-the-record statement:</p>



<blockquote class="wp-block-quote"><p><em>&#8220;Google considers invalid traffic to be ad traffic that does not represent genuine user intent or interest. This includes both incentivized traffic and traffic from pop-unders. Generally speaking, invalid traffic applies to any clicks or impressions that may artificially inflate an advertiser&#8217;s costs or a publisher&#8217;s earnings.[&#8230;]&#8221;</em></p></blockquote>



<p>This is echoed in numerous places across their publisher policies, for example in the AdSense publisher policies:</p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="836" height="149" src="https://deepsee.io/wp-content/uploads/2022/04/image-3.png" alt="Large Publishers Buy Incentivized Traffic &amp; Popups Too" class="wp-image-1080" srcset="https://deepsee.io/wp-content/uploads/2022/04/image-3.png 836w, https://deepsee.io/wp-content/uploads/2022/04/image-3-300x53.png 300w, https://deepsee.io/wp-content/uploads/2022/04/image-3-768x137.png 768w" sizes="(max-width: 836px) 100vw, 836px" title="Large Publishers Buy Incentivized Traffic &amp; Popups Too"><figcaption><a href="https://support.google.com/adsense/answer/2660562?hl=en#zippy=%2Cusing-an-incentivized-traffic-source" target="_blank" rel="noopener">https://support.google.com/adsense/answer/2660562?hl=en#zippy=%2Cusing-an-incentivized-traffic-source</a></figcaption></figure>



<p>When Morning Brew asked The Trade Desk for their thoughts on incentivized &amp; pop traffic, The Trade Desk said:</p>



<blockquote class="wp-block-quote"><p><em>&#8220;We consider popunders and incentivized traffic as IVT</em>&#8221;</p></blockquote>



<p>Additionally, they said:</p>



<blockquote class="wp-block-quote"><p><em>&#8220;Since our founding, The Trade Desk has been committed to ensuring our clients, brands and their agencies, are buying the most valuable digital media available.</em></p><p><em>Our Marketplace Quality team has long focused on a transparent supply chain, including working on industrywide initiatives that have improved the entire ecosystem such as sellers.json and ads.txt. They work closely with partners and researchers to ensure that we only buy legitimate inventory and clean up the supply chain. Our work here has made fraud reports and discoveries less prominent than many years ago, and our team continues to stay at the forefront of new technologies and ways to maintain a transparent and fair marketplace.</em></p><p><em>Our publisher team focuses on building an efficient supply chain, with close relationships with publishers and their SSP partners, which ensures we are closely integrated with the biggest media companies in the world and are directly connected through them. This is even more evident in initiatives such as OpenPath, where we buy directly from the publisher themselves.</em></p><p><em>The Trade Desk has led the way on these initiatives and will continue to do so in service of a transparent, competitive and trusted market for advertising on the open internet, and both teams continue to evolve to stay at the forefront of the marketplace.&#8221;</em></p></blockquote>



<p><strong>Readers: be sure to ask your DSP their policy on incentivized traffic, and the means they have in place to detect it</strong>. You should not be paying for ads served to visitors without genuine interest in visiting whatever site they&#8217;re on.<br></p>



<h2 class="wp-block-heading" id="3-how-did-traffic-arrive-at-yahoo-properties-and-how-can-i-identify-it">How Did Traffic Arrive at Yahoo Properties, and How Can I Identify It?</h2>



<p>As mentioned in the intro, multiple means were leveraged in order to create visits to Yahoo properties. The most observable means were:</p>



<ul><li>Pop Traffic, coming from a vendor called &#8220;Viral Sparks&#8221;<ul><li>This was identified during Q4 of 2021, but may have been occurring earlier</li></ul></li><li>Incentivized Traffic, coming from Prodege properties (and perhaps others, but it hasn&#8217;t been observed directly)<ul><li>Happened as early as February 2021, likely detectable earlier as well</li></ul></li></ul>



<h2 class="wp-block-heading" id="4-describing-the-pop-traffic-with-examples"><br>Describing the Pop Traffic With Examples</h2>



<p>In previous articles, we&#8217;ve covered how pop traffic is leveraged to create &#8220;clean&#8221; impressions that take advantage of real users on real devices; take the below links as examples.</p>


<div class="ub_styled_list " id="ub_styled_list-ba7662a4-4dfc-4379-b8cb-a5b58ee0e5bb"><ul class="fa-ul"><li><a href="https://deepsee.io/blog/turning-bad-news-into-ad-views" target="_blank" rel="noreferrer noopener">Turning Bad News Into Ad Views: How Some Publishers Get Laundered Traffic from Controversial Content</a></li><li><a href="https://deepsee.io/blog/how-pop-traffic-fuels-affiliate-fraud" target="_blank" rel="noreferrer noopener">How Pop Traffic Fuels Affiliate Fraud</a></li></ul></div>


<p>In typical fashion, when visitors to porn, piracy, and other such sites interacted with the page, they were sometimes taken to Yahoo properties.</p>



<p>This is exemplified in this video from November 2021:</p>



<figure class="wp-block-video"><video controls src="https://deepsee.io/wp-content/uploads/2022/04/Yahoo-pops-among-others-from-nov-2021.mp4"></video></figure>



<p>In this video, it seems the pop campaign destination leads to a redirect domain meant to leech off bit.ly&#8217;s reputation, &#8220;getbitly.pro;&#8221; again, this is not a domain operated by bit.ly, the popular link shortening service.<br></p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="919" height="36" src="https://deepsee.io/wp-content/uploads/2022/04/image.png" alt="Large Publishers Buy Incentivized Traffic &amp; Popups Too" class="wp-image-1073" srcset="https://deepsee.io/wp-content/uploads/2022/04/image.png 919w, https://deepsee.io/wp-content/uploads/2022/04/image-300x12.png 300w, https://deepsee.io/wp-content/uploads/2022/04/image-768x30.png 768w" sizes="(max-width: 919px) 100vw, 919px" title="Large Publishers Buy Incentivized Traffic &amp; Popups Too"></figure>



<p>Subsequently, users are sent to another redirect domain: viralsparks.io<br></p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="925" height="27" src="https://deepsee.io/wp-content/uploads/2022/04/image-1.png" alt="Large Publishers Buy Incentivized Traffic &amp; Popups Too" class="wp-image-1074" srcset="https://deepsee.io/wp-content/uploads/2022/04/image-1.png 925w, https://deepsee.io/wp-content/uploads/2022/04/image-1-300x9.png 300w, https://deepsee.io/wp-content/uploads/2022/04/image-1-768x22.png 768w" sizes="(max-width: 925px) 100vw, 925px" title="Large Publishers Buy Incentivized Traffic &amp; Popups Too"></figure>



<p>In these examples, we can see that the query string parameters are preserved; the user is simply bounced to anonymize upstream sources of traffic.</p>
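
<p>One way to check for this kind of parameter preservation in your own redirect logs is to compare the query strings of two hops. The sketch below is illustrative only: the hop URLs use placeholder <code>.example</code> hosts and a made-up <code>n</code> parameter, and the function name is hypothetical. Only the <code>cid=vs_yv</code> value is taken from the cases described in this article.</p>

```python
from urllib.parse import urlsplit, parse_qs


def preserved_params(first_url, later_url):
    """Return the query parameters that appear with identical values in
    both URLs, i.e. those carried unchanged across the redirect."""
    first = parse_qs(urlsplit(first_url).query)
    later = parse_qs(urlsplit(later_url).query)
    return {key: values for key, values in first.items() if later.get(key) == values}


# Placeholder hops (hypothetical hosts and paths).
hop_one = "https://hop-one.example/r?cid=vs_yv&n=1"
hop_two = "https://hop-two.example/land?cid=vs_yv&n=2"
print(preserved_params(hop_one, hop_two))  # {'cid': ['vs_yv']}
```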



<p>We believe that Viral Sparks (the entity in the domain field above) is the one who purchased the pop traffic sent to Yahoo properties. This is fairly evident when you consider that viralsparks.io sent users to <a href="https://webcache.googleusercontent.com/search?q=cache:vrKGUTHwMqMJ:https://viralsparks.com/+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=us#/about" data-type="URL" data-id="https://webcache.googleusercontent.com/search?q=cache:vrKGUTHwMqMJ:https://viralsparks.com/+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=us#/about" target="_blank" rel="noopener">viralsparks.com</a> (cached version linked) until late Q4 2021. </p>



<p>Likely they realized their fraud activity would become too obvious if they used their corporate domain to redirect pop traffic, so they later changed the redirect domain to adverify.cloud while still using the same query string params / values during the act of redirecting pop traffic to yahoo properties; more on this later.</p>



<p>Ultimately, the user lands on a yahoo page with a video ad (this is a consistent observation from both the pop &amp; incentivized traffic).</p>



<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="23" src="https://deepsee.io/wp-content/uploads/2022/04/image-4-1024x23.png" alt="Large Publishers Buy Incentivized Traffic &amp; Popups Too" class="wp-image-1087" srcset="https://deepsee.io/wp-content/uploads/2022/04/image-4-1024x23.png 1024w, https://deepsee.io/wp-content/uploads/2022/04/image-4-300x7.png 300w, https://deepsee.io/wp-content/uploads/2022/04/image-4-768x17.png 768w, https://deepsee.io/wp-content/uploads/2022/04/image-4-1536x34.png 1536w, https://deepsee.io/wp-content/uploads/2022/04/image-4.png 1812w" sizes="(max-width: 1024px) 100vw, 1024px" title="Large Publishers Buy Incentivized Traffic &amp; Popups Too"></figure>



<p>Again, in this url we can see preserved query string parameters from the original &#8220;getbitly.pro&#8221; request. This is one way that we are able to flag the sourced traffic, and this lets us give readers the means to identify traffic in their own data.</p>



<p>The behavior is quite similar for pops that took users to other Yahoo family properties, like techcrunch.com:</p>



<figure class="wp-block-video"><video controls src="https://deepsee.io/wp-content/uploads/2022/04/Techcrunch-Pop-session.mp4"></video></figure>



<p>The methodology in this case is nearly identical, though the entity redirecting the traffic has changed its URL from viralsparks.io to adverify.cloud.<br></p>



<h2 class="wp-block-heading" id="5-how-to-identify-the-pop-traffic-in-logs">How to Identify the Pop Traffic in Logs</h2>



<p>For anyone who has access to full page URLs on their programmatic buys, this will be easy. The pop visits are signified by the presence of several possible query string parameters relating to Viral Sparks. These filters should only be applied to the following domains &amp; their subdomains: yahoo.com, techcrunch.com, autoblog.com</p>



<ul><li>Look for the presence of a query string parameter called &#8220;aolapubid&#8221; which has a value starting with &#8220;vsp&#8221; (vsp likely signals <strong>V</strong>iral <strong>Sp</strong>arks)<ul><li>Example:  <strong>aolapubid=vsp</strong>237bedbaa9038610fe9b2c34c8d50432</li></ul></li><li>Look for the presence of a query string parameter called &#8220;cid&#8221; which has a value starting with &#8220;vs&#8221; (vs likely signals <strong>V</strong>iral <strong>S</strong>parks)<ul><li>Example: <strong>cid=vs</strong>_yv</li></ul></li></ul>
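
<p>The two checks above can be sketched as a small filter over full page URLs. This is a minimal example, assuming your logs contain absolute URLs (with a scheme) and that parameter names are lowercase as in our examples; the function names and the sample URL paths are hypothetical. Note that the &#8220;vs&#8221; prefix on <code>cid</code> is short, so review matches before acting on them.</p>

```python
from urllib.parse import urlsplit, parse_qs

# Domains named in this article; subdomains are included by the suffix check.
AFFECTED_DOMAINS = ("yahoo.com", "techcrunch.com", "autoblog.com")


def on_affected_domain(url):
    """True if the URL's host is one of the affected domains or a subdomain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in AFFECTED_DOMAINS)


def has_viral_sparks_indicator(url):
    """True if an affected-domain URL carries aolapubid=vsp... or cid=vs..."""
    if not on_affected_domain(url):
        return False
    params = parse_qs(urlsplit(url).query)
    aolapubid_hit = any(v.startswith("vsp") for v in params.get("aolapubid", []))
    cid_hit = any(v.startswith("vs") for v in params.get("cid", []))
    return aolapubid_hit or cid_hit


# Sample URLs: parameter values come from the examples above, paths are made up.
print(has_viral_sparks_indicator(
    "https://www.yahoo.com/news/story.html?aolapubid=vsp237bedbaa9038610fe9b2c34c8d50432"
))  # True
print(has_viral_sparks_indicator("https://techcrunch.com/?cid=other"))  # False
```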



<p>For anyone looking for more examples, we&#8217;ve compiled hundreds of cases from November 2021: <a href="https://docs.google.com/spreadsheets/d/1xxAIqVBHandZdLWCmprHSVsg53sCnr_GBrZTJ-CSEi4/edit?usp=sharing" target="_blank" rel="noopener">https://docs.google.com/spreadsheets/d/1xxAIqVBHandZdLWCmprHSVsg53sCnr_GBrZTJ-CSEi4/edit?usp=sharing</a></p>



<p>In the examples, the &#8220;from_domain&#8221; is the domain we visited, and which prompted the pop / redirect activity. The &#8220;to_domain&#8221; is the ultimate destination of that pop / redirect activity. The full urls in the &#8220;examples&#8221; tab are cases from which we were able to reverse engineer the identification strategy.</p>



<h2 class="wp-block-heading" id="6-describing-the-incentivized-traffic-with-examples">Describing the Incentivized Traffic With Examples</h2>



<p>In February 2021 we decided it would be in the best interest of the industry if we checked into who was currently buying traffic from some of the top incentivized traffic platforms. We started by checking into the active campaigns on: mypoints.com, swagbucks.com, prizerebel.com, zoombucks.com, and inboxdollars.com.</p>



<p>We found that yahoo had active campaigns on swagbucks.com, mypoints.com, and inboxdollars.com. They weren&#8217;t the only site purchasing traffic, but we did find it particularly noteworthy due to the fact that yahoo inventory is purchased across many campaigns worldwide. Their volume is very high, and we were curious to know how much was supported by incentivized traffic.</p>



<p>It was additionally noteworthy due to the time period across which traffic was purchased. Between these sites, there was at least one active campaign until mid-February 2022, which makes at least a year-long incentivized traffic campaign.</p>



<p>The following video shows the flow of traffic from swagbucks.com to yahoo.com:</p>



<figure class="wp-block-video"><video controls src="https://deepsee.io/wp-content/uploads/2022/04/Prodege-Yahoo-flow.mp4"></video></figure>



<p>As we can see, participation in this program requires enabling pop-ups in your browser. This is likely to obfuscate the true source of traffic.</p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="763" height="497" src="https://deepsee.io/wp-content/uploads/2022/04/image-5.png" alt="Popups required to participate in incentivized traffic platforms" class="wp-image-1090" srcset="https://deepsee.io/wp-content/uploads/2022/04/image-5.png 763w, https://deepsee.io/wp-content/uploads/2022/04/image-5-300x195.png 300w" sizes="(max-width: 763px) 100vw, 763px" title="Large Publishers Buy Incentivized Traffic &amp; Popups Too"></figure>



<p>As we can see, users are required to view multiple pages for 30 seconds+ each in order to claim their swagbucks / points. This amounts to a blatant manipulation of bounce &amp; time-on-page metrics; no way around it.</p>



<p>Here&#8217;s another video showing how the experience appears to potential swagbucks users; they don&#8217;t even need to be tabbed-in to the content, they can be browsing in another tab and still reap the rewards:</p>



<figure class="wp-block-video"><video controls src="https://deepsee.io/wp-content/uploads/2022/04/Swagbucks-Yahoo-Jan-session-multiple-page-views-documented.mp4"></video></figure>



<p><br></p>



<h2 class="wp-block-heading" id="7-how-to-identify-the-incentivized-traffic-in-logs">How to Identify the Incentivized Traffic In Logs</h2>



<p>As we mention in the intro, and as exemplified below, the traffic can be characterized by the presence of a query string value like &#8220;<strong>yotavid</strong>&#8221;:</p>



<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="926" height="134" src="https://deepsee.io/wp-content/uploads/2022/04/image-7.png" alt="Large Publishers Buy Incentivized Traffic &amp; Popups Too" class="wp-image-1094" srcset="https://deepsee.io/wp-content/uploads/2022/04/image-7.png 926w, https://deepsee.io/wp-content/uploads/2022/04/image-7-300x43.png 300w, https://deepsee.io/wp-content/uploads/2022/04/image-7-768x111.png 768w" sizes="(max-width: 926px) 100vw, 926px" title="Large Publishers Buy Incentivized Traffic &amp; Popups Too"></figure>



<p>In every case of incentivized activity, this query string value appears, and it is confirmed to not appear during organically generated sessions. The exact meaning behind it is not as clear as the Viral Sparks related query string parameters.</p>



<p>Additionally, the pop traffic we identified ALSO carries the &#8220;yotavid&#8221; string in the full page URL. Basically, you can use that information in the following way (only for yahoo, techcrunch, and autoblog urls):</p>



<ul><li>To find incentivized traffic that was NOT a popup, look for the following combination of factors:<ul><li>Full page URL contains &#8220;yotavid&#8221;</li><li>URL does NOT contain the <a href="#5-how-to-identify-the-pop-traffic-in-logs" data-type="internal" data-id="#5-how-to-identify-the-pop-traffic-in-logs">Viral Sparks indicators</a></li></ul></li></ul>
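
<p>Putting the indicators together, a log classifier could look like the sketch below. It assumes absolute URLs and a plain substring match for &#8220;yotavid&#8221;; the function name, the labels, and the <code>example_param</code> parameter in the sample URLs are hypothetical and only stand in for whatever field carries the string in your data.</p>

```python
from urllib.parse import urlsplit, parse_qs

AFFECTED_DOMAINS = ("yahoo.com", "techcrunch.com", "autoblog.com")


def classify(url):
    """Label a full page URL using the indicators described in this article."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not any(host == d or host.endswith("." + d) for d in AFFECTED_DOMAINS):
        return "out_of_scope"
    params = parse_qs(parts.query)
    # Viral Sparks indicators: aolapubid starting with "vsp", cid starting with "vs".
    is_pop = any(v.startswith("vsp") for v in params.get("aolapubid", [])) or any(
        v.startswith("vs") for v in params.get("cid", [])
    )
    if is_pop:
        return "pop"
    if "yotavid" in url:
        return "incentivized_non_pop"
    return "not_flagged"


# Sample URLs with made-up paths and a placeholder parameter name.
print(classify("https://www.yahoo.com/a?example_param=yotavid1"))            # incentivized_non_pop
print(classify("https://www.yahoo.com/a?cid=vs_yv&example_param=yotavid1"))  # pop
```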



<h2 class="wp-block-heading" id="8-conclusion-and-takeaways">Conclusion and Takeaways</h2>



<p>If you would like to identify ANY and ALL of the invalid sourced traffic we refer to in this article, you simply need to look for the presence of a &#8220;<strong>yotavid</strong>&#8221; string in the full page URLs you have relating to Yahoo advertising events.</p>



<p>However, in the course of researching the effects of this scheme on the industry at large, we found that it&#8217;s very common for buyers to not even be privy to the full page URLs they serve ads on. This can sometimes be related to the technological limitations of measuring the top level URL from within the context of an ad-unit, but often it&#8217;s just information the SSP / DSP doesn&#8217;t deem relevant to include in reporting.</p>



<p>Incentivized traffic is a huge industry, and it touches all device types and ad experiences. One of the best, if not the ONLY, ways to identify such traffic is by analyzing full page URLs inclusive of query strings. Informed media buying platforms need access to this information, so be sure to request it of all supply partners you work with. Unwillingness to provide such information can be a big red flag, so tread carefully.</p>



<p>Thus-far-unreleased findings of ours indicate that this scheme is just a drop in the bucket, so look forward to learning about the even larger incentivized traffic events we are currently tracking.</p>



<p><strong>Remember</strong>: the line for invalid traffic has officially been drawn at the <strong>user&#8217;s intent</strong> to visit an ad-supported publisher page.</p>
]]></content:encoded>
					
		
		<enclosure url="https://deepsee.io/wp-content/uploads/2022/04/Yahoo-pops-among-others-from-nov-2021.mp4" length="3839602" type="video/mp4" />
<enclosure url="https://deepsee.io/wp-content/uploads/2022/04/Techcrunch-Pop-session.mp4" length="17935087" type="video/mp4" />
<enclosure url="https://deepsee.io/wp-content/uploads/2022/04/Prodege-Yahoo-flow.mp4" length="10812353" type="video/mp4" />
<enclosure url="https://deepsee.io/wp-content/uploads/2022/04/Swagbucks-Yahoo-Jan-session-multiple-page-views-documented.mp4" length="21060226" type="video/mp4" />

			</item>
		<item>
		<title>A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us</title>
		<link>https://deepsee.io/blog/a-case-study-in-monetizing-piracy</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Fri, 29 Oct 2021 15:24:10 +0000</pubDate>
				<category><![CDATA[Research & Development]]></category>
		<category><![CDATA[News & Events]]></category>
		<category><![CDATA[Laundering]]></category>
		<category><![CDATA[pin]]></category>
		<category><![CDATA[Piracy]]></category>
		<category><![CDATA[Traffic]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=852</guid>

					<description><![CDATA[Even pirates need a retirement plan]]></description>
										<content:encoded><![CDATA[<div class="ub_table-of-contents" data-showtext="show" data-hidetext="hide" data-scrolltype="auto" id="ub_table-of-contents-54bdefe3-73b0-4f7a-b5f3-8afa8f580111" data-initiallyhideonmobile="false"
                    data-initiallyshow="true"><div class="ub_table-of-contents-header-container"><div class="ub_table-of-contents-header">
                    <div class="ub_table-of-contents-title"><strong>Table of Contents</strong></div></div></div><div class="ub_table-of-contents-extra-container"><div class="ub_table-of-contents-container ub_table-of-contents-1-column "><ul><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#0-introduction>Introduction</a><ul><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#1-maintaining-amp-monetizing-a-piracy-site>Maintaining &amp; Monetizing a Piracy Site</a></li><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#2-teasing-out-the-grey-areas>Teasing Out the Grey Areas</a></li></ul></li><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#3-introducing-the-players-amp-explaining-the-evidence>Introducing the Players &amp; Explaining the Evidence</a><ul><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#4-the-players>The Players</a></li><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#5-the-evidence>The Evidence</a><ul><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#6-there-is-a-common-owner-between-chessmobaus-and-mangaowl>There Is a Common Owner Between Chessmoba.us and MangaOwl</a></li><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#7-when-mangaowl-users-are-reading-pirated-amp-adult-content-advertisers-think-they-are-on-chessmobaus>When MangaOwl Users Are Reading Pirated &amp; Adult Content, Advertisers Think They Are On Chessmoba.us</a></li><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#8-piracy-is-the-primary-reason-anyone-ends-up-on-these-content-sites>Piracy is the Primary Reason Anyone Ends Up On These Content Sites</a></li></ul></li></ul></li><li><a href=https://deepsee.io/blog/a-case-study-in-monetizing-piracy#9-conclusion>Conclusion</a></li></ul></div></div></div>


<h2 class="wp-block-heading" id="0-introduction">Introduction</h2>



<p>In this research we dig into traffic sourcing on chessmoba.us, a site creating well over a hundred million daily impression opportunities. It is one of a handful of sites which have acted as cover for MangaOwl (a site which links readers to pirated web cartoons), and the extended MangaOwl family of web properties.</p>



<p>Using code snippets captured from live traffic, and other publicly available data sources, we will show how the vast majority of visits to chessmoba.us &amp; its cohorts are actually laundered visits of users who are reading pirated manga / web cartoons (often sexually explicit in nature). </p>



<p>What&#8217;s of additional interest here is the laundering mechanism; users get an experience free of pop-unders &amp; redirects, but advertisers still get zero visibility into the true source of traffic.</p>



<h3 class="wp-block-heading" id="1-maintaining-amp-monetizing-a-piracy-site">Maintaining &amp; Monetizing a Piracy Site</h3>



<p>Pirated content is, and always has been, one of the top things people will search for on the internet. Due to the questionable legal status of the content, most supply-side platforms (SSPs; aggregators of publisher inventory) have rules against pirated content appearing on their member publishers&#8217; sites, and some advertisers try to avoid this content.</p>



<p>Google, the world&#8217;s largest aggregator of publisher inventory, has <a href="https://support.google.com/adsense/answer/10502938?hl=en" target="_blank" rel="noreferrer noopener">policies which specifically forbid pirated &amp; sexually explicit content</a>, both of which are in the scope of our conversation today. Without access to Google ad products, it becomes much harder to monetize at a large scale, especially if you are an individual, or small team.</p>



<p>Depending on the number of copyright claims made against these sites, they may have to constantly rotate domain names so that users do not arrive at a site that has been disabled by the hosting provider. This is more likely if the site hosts links to content produced by big Hollywood studios, because these studios have the resources to programmatically hunt down &amp; make copyright complaints against each new site.</p>



<p>Sites hosting pirated content often turn to less-reputable ad networks which have no policy against their content. These 2nd / 3rd string monetization partners usually litter the page with near pornographic ads (adult sites are limited by which networks they can run their ads with; example below), or they create tons of pop-unders / redirects that take users off-site. </p>



<figure class="wp-block-image size-large nsfw"><img decoding="async" loading="lazy" width="1024" height="950" src="https://deepsee.io/wp-content/uploads/2021/10/example-anime-streaming-site-2-1024x950.png" alt="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us" class="wp-image-1032" srcset="https://deepsee.io/wp-content/uploads/2021/10/example-anime-streaming-site-2-1024x950.png 1024w, https://deepsee.io/wp-content/uploads/2021/10/example-anime-streaming-site-2-300x278.png 300w, https://deepsee.io/wp-content/uploads/2021/10/example-anime-streaming-site-2-768x713.png 768w, https://deepsee.io/wp-content/uploads/2021/10/example-anime-streaming-site-2.png 1080w" sizes="(max-width: 1024px) 100vw, 1024px" title="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us"><figcaption class="wp-element-caption"><em>Example of advertisements on one of the most popular anime streaming sites: gogoanime</em></figcaption></figure>



<p>As we&#8217;ve written about in the past, these pop/redirect ad networks play a large part in the global cookie stuffing &amp; traffic laundering marketplace:</p>


<ul class="ub_styled_list" id="ub_styled_list-75dbbccc-eaf9-45dd-8e9c-676d1d1d6c4b">
<li class="ub_styled_list_item"><a href="https://deepsee.io/blog/how-pop-traffic-fuels-affiliate-fraud" target="_blank" rel="noreferrer noopener">How Pop Traffic Fuels Affiliate Fraud</a></li>

<li class="ub_styled_list_item"><a href="https://deepsee.io/blog/turning-bad-news-into-ad-views" target="_blank" rel="noreferrer noopener">Turning Bad News Into Ad Views: How Some Publishers Get Laundered Traffic from Controversial Content</a></li>
</ul>


<p>If you&#8217;re looking for further reading into this topic at a high level, the <a href="https://www.digitalcitizensalliance.org/clientuploads/directory/Reports/Breaking-Bads-Report.pdf" target="_blank" rel="noreferrer noopener">Digital Citizens Alliance and Whitebullet have also done significant research</a> into who supports piracy sites with on-page advertising (our writing today focuses more on the laundering aspect of the problem).</p>



<h3 class="wp-block-heading" id="2-teasing-out-the-grey-areas">Teasing Out the Grey Areas</h3>



<p>There exists a moral / technical grey area here when it comes to advertising to users on sites with pirated content: advertisers want to reach real people, and real people consume pirated content in high volume. As a consequence, there are billions of laundered impressions generated every day.<br><br>Advertisers often optimize their campaigns towards human users &amp; viewable placements; both these things are often true of impressions generated from laundered traffic.</p>



<p>One of the main reasons that we built DeepSee is to suss out these opaque traffic laundering marketplaces. The laundering process tends to remove the true source a visitor was sent from, and creates destination-site visits that appear direct. This is clearly visible when you use the free version of tools like Alexa or SimilarWeb to look at the inbound traffic mix for sites that buy a lot of laundered traffic, but more on that later.</p>



<p>Honestly, we wouldn&#8217;t think as poorly of these traffic acquisition channels if advertisers were alerted about the nature of the traffic sourcing in subsequent bid requests generated by the destination-site (the &#8220;clean&#8221; content site). This never happens though, and we won&#8217;t hold our breath.</p>



<p>We&#8217;re very interested to hear the opinions of our readers when it comes to the value of laundered traffic; please drop us a line on Twitter or LinkedIn to voice your opinion!</p>



<h2 class="wp-block-heading" id="3-introducing-the-players-amp-explaining-the-evidence">Introducing the Players &amp; Explaining the Evidence</h2>



<p>There are 2 <em>seemingly</em> separate factions involved here. In truth, they are all owned and operated by the same entity, allowing for some unique publisher fraud opportunities that would be very difficult to detect from within an ad creative.</p>



<p>On the other hand, the user experience here is one of the best you can find when it comes to reading pirated comics. As opposed to sites that monetize with pop-unders &amp; redirects, users aren&#8217;t routed through dozens of nefarious intermediaries; instead, users stay in a single environment, controlled end-to-end by the MangaOwl developer.</p>



<h3 class="wp-block-heading" id="4-the-players">The Players</h3>



<ol>
<li>A group of piracy sites centered around MangaOwl, including (but not limited to):
<ul>
<li><strong>mangaowl.net</strong> &#8211; Alexa rank ~17.8k</li>



<li><strong>mangaowl.com </strong>&#8211; Alexa rank ~193k</li>



<li><strong>mangaowls.com</strong> &#8211; Alexa rank ~51k</li>



<li><strong>animeow.me</strong> &#8211; Alexa rank ~1.1mil</li>



<li><strong>animeowl.net</strong> &#8211; Alexa rank ~525k </li>
</ul>
</li>



<li>A group of content sites headed by chessmoba.us in terms of visit volume (all bid volume projections made using Xandr&#8217;s forecasting API):
<ul>
<li><strong>chessmoba.us</strong> &#8211; Estimated daily bid volume of over 100 million (forecast date: 10-27-2021)</li>



<li><strong>mostraveller.com</strong> &#8211; Estimated daily bid volume of over 30 million  (forecast date: 10-22-2021; seemingly discovered &amp; cut-off by 10-27-2021)</li>



<li><strong>fromyourinside.com</strong> &#8211; Estimated daily bid volume of over 10 million (forecast date: 10-27-2021)</li>



<li><strong>chill-game.com</strong> &#8211; Estimated daily bid volume of just a few thousand (forecast date: 10-27-2021)</li>



<li><strong>portablegamingdude.com</strong> &#8211; Estimated daily bid volume  currently ~0 (forecast date: 10-27-2021) </li>
</ul>
</li>
</ol>



<p>The piracy sites are most obviously linked by footer links which are clearly visible on mangaowl.net, and also by naming convention.</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" loading="lazy" src="https://deepsee.io/wp-content/uploads/2021/10/mangaowl-family.png" alt="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us" class="wp-image-917" width="840" height="316" srcset="https://deepsee.io/wp-content/uploads/2021/10/mangaowl-family.png 985w, https://deepsee.io/wp-content/uploads/2021/10/mangaowl-family-300x113.png 300w, https://deepsee.io/wp-content/uploads/2021/10/mangaowl-family-768x289.png 768w" sizes="(max-width: 840px) 100vw, 840px" title="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us"><figcaption class="wp-element-caption"><em>Footer on mangaowl.net showing the link to animeow &amp; animeowl</em></figcaption></figure>



<p>The content sites are not as clearly linked, but we argue that they behave in such a specific &amp; atypical way that the only plausible explanation is that they were configured by the same entity to do the same thing.</p>



<h3 class="wp-block-heading" id="5-the-evidence">The Evidence</h3>



<h4 class="wp-block-heading" id="6-there-is-a-common-owner-between-chessmobaus-and-mangaowl">There Is a Common Owner Between Chessmoba.us and MangaOwl</h4>



<p>Let&#8217;s get this out of the way first: chessmoba.us is absolutely registered by the operator of MangaOwl, and the proof is surprisingly easy to acquire.</p>



<p>It&#8217;s particularly easy to prove who the owner of chessmoba.us is, because <a href="https://www.about.us/faqs" target="_blank" rel="noreferrer noopener">you can&#8217;t register a domain with a .US suffix using WHOIS privacy protection</a>. You can even check this right now by typing in the domain name on <a href="https://www.whois.us/" target="_blank" rel="noreferrer noopener">the .US WHOIS lookup service</a>.</p>



<p>Here is a summary of what you&#8217;ll see:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><img decoding="async" loading="lazy" src="https://deepsee.io/wp-content/uploads/2021/10/Chessmoba-registration.png" alt="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us" class="wp-image-908" width="654" height="472" srcset="https://deepsee.io/wp-content/uploads/2021/10/Chessmoba-registration.png 654w, https://deepsee.io/wp-content/uploads/2021/10/Chessmoba-registration-300x217.png 300w" sizes="(max-width: 654px) 100vw, 654px" title="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us"><figcaption class="wp-element-caption"><em>Summary of information gathered from a chessmoba.us whois lookup</em></figcaption></figure></div>


<p>The <strong>&#8220;owl&#8221;</strong> in the registrant email &#8220;zz<strong>owl</strong>099@gmail.com&#8221; is a good hint, but we can additionally see that this email is linked to the moderator of MangaOwl by doing a quick Google search for the email address.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="1317" height="776" src="https://deepsee.io/wp-content/uploads/2021/10/MangaOwl-Moderator.png" alt="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us" class="wp-image-909" srcset="https://deepsee.io/wp-content/uploads/2021/10/MangaOwl-Moderator.png 1317w, https://deepsee.io/wp-content/uploads/2021/10/MangaOwl-Moderator-300x177.png 300w, https://deepsee.io/wp-content/uploads/2021/10/MangaOwl-Moderator-1024x603.png 1024w, https://deepsee.io/wp-content/uploads/2021/10/MangaOwl-Moderator-768x453.png 768w" sizes="(max-width: 1317px) 100vw, 1317px" title="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us"><figcaption class="wp-element-caption"><em>MangaOwl Moderator &#8220;ZZ&#8221; lists the email zzowl099@gmail.com as the inbox where copyrighted materials should be sent</em></figcaption></figure></div>


<p>If you go through that process, you&#8217;ll end up on the above post before reaching the 2nd page of the Google search results. It shows a message from a MangaOwl moderator named &#8220;<strong>ZZ</strong>&#8221; who instructs members to send their copyrighted materials to &#8220;<strong>zz</strong>owl099@gmail.com&#8221; in order to have them shared with the community.</p>



<p>The other domains in our &#8220;content-site&#8221; group are registered as sites with a .COM suffix, and they are WHOIS privacy protected such that you can&#8217;t as easily infer ownership. However, we will show they are linked by means of behavior, in such a way that they could ONLY be managed by the same person.</p>



<h4 class="wp-block-heading" id="7-when-mangaowl-users-are-reading-pirated-amp-adult-content-advertisers-think-they-are-on-chessmobaus">When MangaOwl Users Are Reading Pirated &amp; Adult Content, Advertisers Think They Are On Chessmoba.us</h4>



<p>The following video demonstrates how a typical mangaowl.net reader generates laundered impressions to chessmoba.us:</p>


<ul class="ub_styled_list" id="ub_styled_list-123ac2a2-dcdc-4950-a37e-5e58ad55b182">
<li class="ub_styled_list_item">Note: <em>Due to the fact that many of our readers may be watching this while at work, or in a professional setting, we chose to showcase a comic that is safe for work. However, you will have no trouble finding sexually explicit content on MangaOwl if you go looking for it.</em></li>
</ul>


<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Pirate Site Impression Laundering" width="500" height="281" src="https://www.youtube.com/embed/2OuUIw-t3Cg?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div><figcaption class="wp-element-caption"><em>Video showing specifics of the exploit; best viewed full screen</em></figcaption></figure>



<p>We asked our CTO to break down the activity into a summary for those who can&#8217;t watch the video, or just need more elaboration. In particular, we asked him to elaborate on whether it&#8217;s even possible for this to happen without collaboration between the piracy &amp; &#8220;clean&#8221; content sites:</p>



<blockquote class="wp-block-quote">
<p>The behavior kicks off when a user clicks a link from mangaowl, and is navigated to a reader URL over at chessmoba.us such as (https://chessmoba.us/reader/reader/73165/1461957/{&#8230;}). If the advertiser saw this URL, the whole scheme would become obvious to them, because this is the URL which takes you to the manga content.<br><br>This reader page loads an image based slideshow viewer, presenting the user with the content they intended to browse to, however, upon page load there is a somewhat obfuscated script that constructs a new, advertising friendly URL (https://chessmoba.us/team/{&#8230;}) that replaces the reader one using <a href="https://developer.mozilla.org/en-US/docs/Web/API/History/replaceState" target="_blank" data-type="URL" data-id="https://developer.mozilla.org/en-US/docs/Web/API/History/replaceState" rel="noreferrer noopener">History.replaceState()</a>.</p>



<p>The important detail is that the containing site does not react to this change by redirecting to the advertiser friendly page. This allows chessmoba to render ad impressions attributed to a URL that shows different content if you browse to it directly. Furthermore, History.replaceState() only works to modify the URL the browser displays if the new URL belongs to the same origin as the one housing the script making the change.</p>



<p>That, along with other signals (like a verified SSL certificate) point to the fact that, barring an exploit and a very (to put it lightly) relaxed system administrator at chessmoba, this would definitely require cooperation from the operators of mangaowl and chessmoba.</p>



<p>Moreso there is evidence of referrer spoofing, and the presence of chessmoba&#8217;s google tag manager account IDs on the fraudulent reader page signals that chessmoba is 100% in on it. Another tidbit that shows intent is the fact that the script that updates the URL to the advertiser friendly one removes itself after 500 milliseconds.</p>
<cite>-Antonio Torres, CTO &amp; Co-Founder @ deepsee.io</cite></blockquote>



<p>For anyone interested, he also did a <a href="https://pastebin.com/gyuENnCx" target="_blank" rel="noreferrer noopener">breakdown of the code</a> which manipulates the URL that advertisers would detect.</p>
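


<p>The same-origin constraint described in the quote above can be illustrated with a short sketch. It is a simplified model of the check a browser applies before accepting a URL passed to History.replaceState(), not the browser&#8217;s actual implementation, and the helper names <code>origin</code> and <code>replace_state_allowed</code> are our own. The URLs in the usage lines are limited to the path prefixes quoted above.</p>

```python
from urllib.parse import urlsplit

# Default ports, so that an explicit :443 and an implicit one compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def replace_state_allowed(current_url, new_url):
    """Simplified model: the displayed URL may only change within one origin."""
    return origin(current_url) == origin(new_url)


# A reader page can present itself as another path on the same site...
print(replace_state_allowed("https://chessmoba.us/reader/reader/", "https://chessmoba.us/team/"))
# ...but a script running on a different site could not claim that URL.
print(replace_state_allowed("https://mangaowl.net/", "https://chessmoba.us/team/"))
```

<p>Under this model, the swap to the advertiser friendly URL can only be performed by a script served from the content site itself, which is why the quote treats it as a sign of cooperation.</p>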



<p>As you may recall from the video, this reader code is recycled across all of the content sites that show manga, further confirming the shared purpose / ownership assigned to each. You can take a &#8220;reader/reader/&#8221; link from chessmoba.us, replace the host with mostraveller.com, or fromyourinside.com, and the link still works the exact same way. Essentially, the content sites are interchangeable; just fancy fronts for the advertising business of MangaOwl.</p>
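


<p>The host swap described above is easy to reproduce when reviewing logs. The following is a minimal sketch: the host list comes from the content sites named in this article, while the numeric path segments in the usage line are made-up placeholders, not real reader IDs.</p>

```python
from urllib.parse import urlsplit, urlunsplit

# Content-site hosts named in this article as serving the same reader paths.
CONTENT_HOSTS = ("chessmoba.us", "mostraveller.com", "fromyourinside.com")


def swap_host(url, new_host):
    """Return the same URL with only the host replaced."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_host))


def equivalent_reader_links(url, hosts=CONTENT_HOSTS):
    """List the same reader path on every host in the group."""
    return [swap_host(url, host) for host in hosts]


# Placeholder IDs, for illustration only.
for link in equivalent_reader_links("https://chessmoba.us/reader/reader/123/456/"):
    print(link)
```

<p>If the resulting links all resolve to the same reader experience, that supports treating the hosts as one group for blocking or reporting purposes.</p>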



<p>The anime video laundering fronts aren&#8217;t as interchangeable, but work in basically the same way to obfuscate the source of traffic. For example:</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Pirate Site Impression Laundering Pt. 2" width="500" height="281" src="https://www.youtube.com/embed/eWtUS976ti4?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div><figcaption class="wp-element-caption"><em>Video showing how the video side of the equation operates</em>;  <em>best viewed full screen</em> </figcaption></figure>



<h4 class="wp-block-heading" id="8-piracy-is-the-primary-reason-anyone-ends-up-on-these-content-sites">Piracy is the Primary Reason Anyone Ends Up On These Content Sites</h4>



<p>Using publicly available information, such as Alexa traffic flows, we can confirm that over 50% of visits to these content sites are precipitated by a visit to one of the MangaOwl sites.<br><br>For example, take chessmoba.us&#8217; traffic flow:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="697" height="475" src="https://deepsee.io/wp-content/uploads/2021/10/chessmoba-alexa-site-flow.png" alt="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us" class="wp-image-952" srcset="https://deepsee.io/wp-content/uploads/2021/10/chessmoba-alexa-site-flow.png 697w, https://deepsee.io/wp-content/uploads/2021/10/chessmoba-alexa-site-flow-300x204.png 300w" sizes="(max-width: 697px) 100vw, 697px" title="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us"><figcaption class="wp-element-caption"><em>The site flow report clearly shows that 55%+ of visits to chessmoba.us came from mangaowl sites</em></figcaption></figure></div>


<p>Or, take the mostraveller.com flow:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="703" height="426" src="https://deepsee.io/wp-content/uploads/2021/10/mostraveller-alexa-site-flow.png" alt="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us" class="wp-image-953" srcset="https://deepsee.io/wp-content/uploads/2021/10/mostraveller-alexa-site-flow.png 703w, https://deepsee.io/wp-content/uploads/2021/10/mostraveller-alexa-site-flow-300x182.png 300w" sizes="(max-width: 703px) 100vw, 703px" title="A Case Study in Monetizing Piracy: MangaOwl and Chessmoba.us"><figcaption class="wp-element-caption"> <em>The site flow report clearly shows that 58%+ of visits to mostraveller.com came from mangaowl sites</em> </figcaption></figure></div>


<p>This exercise can be repeated ad nauseam for each of the content sites we list, but we don&#8217;t want to bloat the page. Suffice it to say, 50% is a conservative estimate of how much traffic is spoofed to each of these sites. We project the number is closer to 80-100% depending on the site.</p>



<h2 class="wp-block-heading" id="9-conclusion">Conclusion</h2>



<p>This novel approach to impression laundering is only made possible by shared management between the content sites and the pirate sites. It succeeds, because:</p>



<ol>
<li>The URLs that advertisers &amp; verification services detect take you to a real content page, with a completely &#8220;safe&#8221; look &amp; feel.</li>



<li>The pirate sites don&#8217;t have any advertising on them; they were very careful to keep the advertising aspect of the business separate from the pirated content indexing.
<ul>
<li>This means there is no chance for advertisers, or any ad-tech org whose primary source of data comes from scripts appended to ad creatives, to leverage user cookies to detect that a user was on MangaOwl before they went to chessmoba. </li>



<li>Tools like Alexa are able to infer the traffic flow because they track users outside of the context of advertising.</li>
</ul>
</li>



<li>Users get a clean browsing experience, and don&#8217;t really have a reason to complain. For this same reason, the site doesn&#8217;t set off malware / malvertising flags.</li>
</ol>



<p>Based on this research, and using our robust historical data on how sites relate to each other, we have developed the tools to flag such sites dynamically as they appear. If you are looking for an audit of your media spend, don&#8217;t hesitate to reach out! We are happy to look over delivery reports with you, and point out any risky patterns &amp; publishers.</p>



<p>If you would like to continue the conversation, or if you have any questions that are not covered here, please reach out to us on <a href="https://twitter.com/deepsee_io" target="_blank" rel="noreferrer noopener">Twitter</a>, or on <a href="https://www.linkedin.com/company/65591952" target="_blank" rel="noreferrer noopener">LinkedIn</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>DeepSee Targeting Options: Now Available Through Peer39&#8217;s Contextual Marketplace!</title>
		<link>https://deepsee.io/blog/deepsee-targeting-and-blocking-now-available-through-peer39</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Wed, 04 Aug 2021 00:00:35 +0000</pubDate>
				<category><![CDATA[News & Events]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=780</guid>

					<description><![CDATA[Introducing the newest way to easily protect your campaigns from ad fraud &#038; sites with bad user experiences]]></description>
										<content:encoded><![CDATA[<p>Earlier this week, <a href="https://twitter.com/deepsee_io/status/1419710713992466434" target="_blank" rel="noopener">on our Twitter</a>, we revealed a new way to protect your campaigns from ad fraud and suspicious publishers: targeting and blocking using <a href="https://peer39.com/" target="_blank" rel="noopener">Peer39</a> segments!</p>
<p>To quote their <a href="https://www.peer39.com/product/" target="_blank" rel="noopener">product page</a>, Peer39 is &#8220;[t]he industry’s largest and most accurate pre-bid contextual category library to find the most relevant, suitable, safe, and quality inventory to target,&#8221; and we are super excited to be a part of that!</p>
<p>The benefit to us, and to you readers out there, is that Peer39 is already integrated into many of the world&#8217;s leading ad-buying platforms!</p>
<p>To that marketplace, we are adding the following lists which can be positively or negatively targeted (we detail the intended purpose in the <strong>Purpose</strong> column):</p>
<table class="targeting-table" dir="ltr" border="1" cellspacing="0" cellpadding="0">
<colgroup>
<col width="471" />
<col width="144" />
<col width="498" /></colgroup>
<tbody>
<tr>
<th data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Category Name&quot;}">Category Name</th>
<th data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Purpose&quot;}">Purpose</th>
<th data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Description&quot;}">Description</th>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Best User Experience - Ads Txt Enabled - English Language Focus&quot;}">Best User Experience &#8211; Ads Txt Enabled &#8211; English Language Focus</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Targeting&quot;}"><span class="targeting-category"><strong>Targeting</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have low-medium Bloat &amp; Resource Hog scores. This list is filtered only to sites with COM, NET, TV, CO.UK, CA, AU, and ORG TLDs&quot;}">These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have low-medium Bloat &amp; Resource Hog scores. This list is filtered only to sites with COM, NET, TV, CO.UK, CA, AU, and ORG TLDs</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Best User Experience - Ads Txt Enabled - International&quot;}">Best User Experience &#8211; Ads Txt Enabled &#8211; International</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Targeting&quot;}"><span class="targeting-category"><strong>Targeting</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have low-medium Bloat &amp; Resource Hog scores.&quot;}">These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have low-medium Bloat &amp; Resource Hog scores.</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Lowest Risk - Ads Txt Enabled - English Language Focus&quot;}">Lowest Risk &#8211; Ads Txt Enabled &#8211; English Language Focus</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Targeting&quot;}"><span class="targeting-category"><strong>Targeting</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have risk scores &lt;=40%. This list is filtered only to sites with COM, NET, TV, CO.UK, CA, AU, and ORG TLDs&quot;}">These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have risk scores &lt;=40%. This list is filtered only to sites with COM, NET, TV, CO.UK, CA, AU, and ORG TLDs</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Lowest Risk - Ads Txt Enabled - International&quot;}">Lowest Risk &#8211; Ads Txt Enabled &#8211; International</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Targeting&quot;}"><span class="targeting-category"><strong>Targeting</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have risk scores &lt;=40%.&quot;}">These sites host a valid ads.txt file, making it possible to verify domain spoofing events, and they have risk scores &lt;=40%.</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Negative - Suspicious Ranking History or Newly Registered - Ads Txt Enabled&quot;}">Negative &#8211; Suspicious Ranking History or Newly Registered &#8211; Ads Txt Enabled</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Blocking&quot;}"><span class="blocking-category"><strong>Blocking</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites have climbed or fallen in rank at a conspicuous pace, or they have just been registered in the past 180 days. They also host valid ads.txt files, showing an intent to advertise&quot;}">These sites have climbed or fallen in rank at a conspicuous pace, or they have just been registered in the past 180 days. They also host valid ads.txt files, showing an intent to advertise</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Negative - Arbitrage Sites - Chumbox Destinations&quot;}">Negative &#8211; Arbitrage Sites &#8211; Chumbox Destinations</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Blocking&quot;}"><span class="blocking-category"><strong>Blocking</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites source nearly all of their traffic from sponsored content boxes, and their sole purpose is to make money from advertising revenue. Sites in this category are more likely to take advantage of users in order to make money.&quot;}">These sites source nearly all of their traffic from sponsored content boxes, and their sole purpose is to make money from advertising revenue. Sites in this category are more likely to take advantage of users in order to make money.</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Negative - Forced Navigation Risk - Ads Txt Enabled&quot;}">Negative &#8211; Forced Navigation Risk &#8211; Ads Txt Enabled</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Blocking&quot;}"><span class="blocking-category"><strong>Blocking</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites take advantage of users by forcing them to visit any number of destination sites. This category also includes sites that receive a significant portion of their traffic from users who are forcibly navigated. They also host valid ads.txt files, showing an intent to advertise&quot;}">These sites take advantage of users by forcing them to visit any number of destination sites. This category also includes sites that receive a significant portion of their traffic from users who are forcibly navigated. They also host valid ads.txt files, showing an intent to advertise</td>
</tr>
<tr>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Negative - Misleading Content Format - Hidden Slideshows&quot;}">Negative &#8211; Misleading Content Format &#8211; Hidden Slideshows</td>
<td style="text-align: center;" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Blocking&quot;}"><span class="blocking-category"><strong>Blocking</strong></span></td>
<td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;These sites source nearly all of their traffic from sponsored content boxes, and their sole purpose is to make money from advertising revenue. They also represent themselves as sites with Single-Page format content, when most users are actually served Slideshow format content.&quot;}">These sites source nearly all of their traffic from sponsored content boxes, and their sole purpose is to make money from advertising revenue. They also represent themselves as sites with Single-Page format content, when most users are actually served Slideshow format content.</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>Today you can find our targeting options in Xandr, and soon they&#8217;ll be available wherever Peer39 is available. Until the end of the year, we are making these options available at the price of <strong>$0.11 CPM</strong> (cost per thousand impressions), which is extremely competitive for this space!</p>
<p>Our goal, as it has always been, is to show buyers just how much better their campaigns can perform when advertising on sites with great user experiences. This integration makes it easier than ever to do just that!</p>
<p>Please reach out to us at hello@deepsee.io with any questions, or visit our socials:<br />
Twitter: <a href="https://twitter.com/deepsee_io" target="_blank" rel="noopener">@deepsee_io</a><br />
LinkedIn: <a href="https://www.linkedin.com/company/deepseeio" target="_blank" rel="noopener">linkedin.com/company/deepseeio</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats</title>
		<link>https://deepsee.io/blog/2-tales-one-site-how-arbitrage-sites-manipulate-metrics</link>
		
		<dc:creator><![CDATA[Rocky Moss]]></dc:creator>
		<pubDate>Thu, 15 Jul 2021 00:00:52 +0000</pubDate>
				<category><![CDATA[Research & Development]]></category>
		<guid isPermaLink="false">https://deepsee.io/?p=580</guid>

					<description><![CDATA[Ad arbitrage is the practice of buying web traffic, then selling ad space on your website for more than you paid to acquire the user. It's not a new practice, not by any means, but we've detected a new strategy used to keep it profitable.]]></description>
										<content:encoded><![CDATA[<div class="blog-toc">
<h2><strong>Table of Contents:</strong></h2>
<ul>
<li style="list-style-type: none;">
<ul>
<li><strong><a href="#Introduction_to_Ad_Arbitrage">Introduction to Ad Arbitrage</a></strong></li>
<li><a href="#A_Layman's_Approach_to_Identifying_Arbitrage_Sites"><strong>A Layman&#8217;s Approach to Identifying Arbitrage Sites</strong></a></li>
<li><a href="#How_Does_One_End_Up_Visiting_An_Arbitrage_Site?"><strong>How Does One End Up Visiting An Arbitrage Site?</strong><br />
</a></p>
<ul>
<li><a href="#Checking_the_Inbound_Link_Profile">Checking the Inbound Link Profile</a></li>
<li><a href="#Analyzing_Inbound_Traffic_Channels">Analyzing Inbound Traffic Channels</a></li>
</ul>
</li>
<li><a href="#Two_Tales_of_One_Website"><strong>Two Tales of One Website</strong><br />
</a></p>
<ul>
<li><a href="#Why_Two_Formats?">Why Show Two Formats? Why Not Just Embrace the Slideshow?</a></li>
</ul>
</li>
<li><a href="#Scaling_Our_Insights_With_Data"><strong>Scaling Our Insights With Data</strong><br />
</a></p>
<ul>
<li><a href="#page_activity_levels">Page Activity Levels &#8211; Direct Link vs Paid</a></li>
<li><a href="#ads_above_fold">Ads Units Above The Fold &#8211; Direct Link vs Paid</a></li>
<li><a href="#total_ad_frames">Ads Units on Page &#8211; Direct Link vs Paid</a></li>
</ul>
</li>
<li><a href="#Conclusion"><strong>Conclusion</strong><br />
</a></li>
</ul>
</li>
</ul>
</div>
<h2 id="Introduction_to_Ad_Arbitrage">Introduction to Ad Arbitrage</h2>
<p>Ad arbitrage is the practice of buying web traffic, then selling ad space on your website for more than you paid to acquire the user. It&#8217;s not a new practice, not by any means, but we&#8217;ve detected a new strategy used to keep it profitable.</p>
<p>Many doubt that arbitrage is still a lucrative strategy; take this poster on BlackHatWorld (a forum for marketers to share how they game the system to make $$$):</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-585 size-full bordered" src="https://deepsee.io/wp-content/uploads/2021/06/so-2011png.png" alt="Is Arbitrage Still Happening?" width="568" height="290" srcset="https://deepsee.io/wp-content/uploads/2021/06/so-2011png.png 568w, https://deepsee.io/wp-content/uploads/2021/06/so-2011png-300x153.png 300w" sizes="(max-width: 568px) 100vw, 568px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>In short: yes, they still do it. You need significant volume to make it work, and the tools have changed a little from days past, but it is still happening at a staggering scale. In this article, we focus on arbitrage that starts from native ad placements, aka sponsored content boxes.</p>
<p>If you&#8217;re not familiar with the term, you&#8217;re probably familiar with the look of these placements (highlighted red below):<br />
<img decoding="async" loading="lazy" class="aligncenter wp-image-770 size-large bordered" src="https://deepsee.io/wp-content/uploads/2021/07/sponsored-content-box-example-1024x825.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="825" srcset="https://deepsee.io/wp-content/uploads/2021/07/sponsored-content-box-example-1024x825.png 1024w, https://deepsee.io/wp-content/uploads/2021/07/sponsored-content-box-example-300x242.png 300w, https://deepsee.io/wp-content/uploads/2021/07/sponsored-content-box-example-768x619.png 768w, https://deepsee.io/wp-content/uploads/2021/07/sponsored-content-box-example.png 1381w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>When buying traffic from native ad providers with the intention of performing ad arbitrage, the key factors you have to optimize are:</p>
<ol>
<li><strong>CPC (cost-per-click) when acquiring a visitor</strong>
<ul>
<li>This goes down as CTR (click-through rate) increases, because people buying space in these boxes pay on a per-impression basis.</li>
<li>There are tons of levers to pull here, including:
<ul>
<li>the geos you target</li>
<li>the sites your ads show up on</li>
<li>the user&#8217;s device make/model</li>
<li>the keywords you target</li>
<li>your creative / destination article headline</li>
<li>much more!</li>
</ul>
</li>
</ul>
</li>
<li><strong>RPM (revenue-per-thousand-impressions), or how much you make from your visitors</strong>
<ul>
<li>Someone focused on long-term stability might focus on organic user growth by writing unique &amp; helpful content.
<ul>
<li>The more return users you have, the more predictable your revenues will be</li>
<li>Reputable content providers get access to better affiliate deals, and private programmatic marketplace opportunities</li>
</ul>
</li>
<li>Someone focused on getting as much money as possible from a user who arrives on their site might:
<ul>
<li>Add more ad units, increasing the clutter of the page
<ul>
<li>Particularly, adding a bunch of placements above the fold guarantees you will have coveted viewable placements that you can charge for.</li>
</ul>
</li>
<li>Paginate the content so that users are always triggering new page loads, and digital ad auctions. Paginated content is also known as &#8220;Slideshow&#8221; content.</li>
<li>Put ad placements on an aggressive refresh schedule.</li>
</ul>
</li>
</ul>
</li>
</ol>
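<p>To make the two levers above concrete, here is a minimal Python sketch of the per-visitor arithmetic. The function names and every number used with them are illustrative assumptions, not figures from our data. It assumes the buyer pays a flat CPM for native impressions, so the effective CPC is that CPM divided by the expected clicks per thousand impressions.</p>

```python
def effective_cpc(cpm_paid: float, ctr: float) -> float:
    """Cost per click when traffic is bought per thousand impressions.

    cpm_paid: hypothetical price paid per 1,000 native ad impressions.
    ctr: click-through rate as a fraction, e.g. 0.005 for 0.5%.
    """
    if not ctr > 0:
        raise ValueError("ctr must be positive")
    clicks_per_thousand = 1000 * ctr
    return cpm_paid / clicks_per_thousand


def revenue_per_visitor(rpm: float, pageviews_per_visit: float) -> float:
    """Ad revenue from one visitor, given page RPM and pages viewed."""
    return rpm / 1000 * pageviews_per_visit


def profit_per_visitor(cpm_paid: float, ctr: float,
                       rpm: float, pageviews_per_visit: float) -> float:
    """Revenue earned from a visitor minus the cost of acquiring them."""
    return (revenue_per_visitor(rpm, pageviews_per_visit)
            - effective_cpc(cpm_paid, ctr))
```

<p>With a hypothetical $2.00 CPM and a 0.5% CTR, each click costs about $0.40; doubling the CTR to 1% halves that to about $0.20. On the revenue side, a $20 RPM across a 25-page slideshow yields about $0.50 per visitor, which is why pagination and extra ad units matter so much to the margin.</p>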
<p>In this article we seek to educate the reader on the lengths these publishers go to in order to milk readers for revenue, but we also want to answer a question that&#8217;s been tumbling around in our heads while writing: <strong>is the activity we are profiling even of concern to advertisers?</strong></p>
<p>Anyone who claims that content quality correlates with campaign performance will be skeptical of arbitrage sites, and has probably already added such sites to their domain block list. However, there are many marketers who still prefer to buy inventory on these sites. After all, the arbitrage sites won&#8217;t make money if their users are found to be fakes; they specialize in bringing real humans to their sites with eye-catching headlines &amp; common-denominator content.</p>
<p>The real question is: <strong>do advertisers get good value for their ad dollars when they buy space on these sites?</strong> We&#8217;d love to hear your answer to this after reading.</p>
<h2 id="A_Layman's_Approach_to_Identifying_Arbitrage_Sites"><strong>A Layman&#8217;s Approach to Identifying Arbitrage Sites</strong></h2>
<p>Arbitrage practitioners tend to choose some niche, or vertical, and design their site around that. It&#8217;s important to note that the content on these sites is often non-unique, and shared between any number of other sites. Because of this, the content rarely lines up with what you&#8217;d expect from the name of the site.</p>
<p>Take the site parentingfactor[.]com for instance:</p>
<p><img decoding="async" loading="lazy" class="wp-image-588 size-large aligncenter blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-front-page-1024x668.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="668" srcset="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-front-page-1024x668.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-front-page-300x196.png 300w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-front-page-768x501.png 768w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-front-page.png 1497w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>This is the landing page, and you can see that they are trying to cultivate the image of being a source for family / parenting content. However, the links that are promoted do not necessarily fit that mold.</p>
<p>The top promoted link for this site that we detected in June led users to the following page:</p>
<p><img decoding="async" loading="lazy" class="wp-image-589 size-large aligncenter blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-1024x677.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="677" srcset="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-1024x677.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-300x198.png 300w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-768x508.png 768w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link.png 1501w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>There are quite a few alarming things about this image, but before we dive deep into that, let&#8217;s reflect on the content a bit.</p>
<p>We grabbed a couple of random lines of text from the article above, and plugged them into Google:</p>
<p><img decoding="async" loading="lazy" class="wp-image-590 size-full aligncenter blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-text-search-2.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="641" height="405" srcset="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-text-search-2.png 641w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-text-search-2-300x190.png 300w" sizes="(max-width: 641px) 100vw, 641px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>It seems affluenttimes[.]com has an article sharing the same exact text.</p>
<p><img decoding="async" loading="lazy" class="wp-image-591 size-full aligncenter blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-text-search-e1625180298379.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="678" height="384" srcset="https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-text-search-e1625180298379.png 678w, https://deepsee.io/wp-content/uploads/2021/06/parenting-times-native-link-text-search-e1625180298379-300x170.png 300w" sizes="(max-width: 678px) 100vw, 678px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>It seems magellantimes[.]com ALSO has an article sharing the same exact text.</p>
<p>Let&#8217;s take a look at the design for those sites:</p>
<p><img decoding="async" loading="lazy" class="wp-image-594 size-large aligncenter blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/affluenttimes-att-girl-article-1024x666.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="666" srcset="https://deepsee.io/wp-content/uploads/2021/06/affluenttimes-att-girl-article-1024x666.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/affluenttimes-att-girl-article-300x195.png 300w, https://deepsee.io/wp-content/uploads/2021/06/affluenttimes-att-girl-article-768x499.png 768w, https://deepsee.io/wp-content/uploads/2021/06/affluenttimes-att-girl-article.png 1497w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p><img decoding="async" loading="lazy" class="wp-image-595 size-large aligncenter blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/magellan-times-att-girl-article-1024x667.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="667" srcset="https://deepsee.io/wp-content/uploads/2021/06/magellan-times-att-girl-article-1024x667.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/magellan-times-att-girl-article-300x195.png 300w, https://deepsee.io/wp-content/uploads/2021/06/magellan-times-att-girl-article-768x500.png 768w, https://deepsee.io/wp-content/uploads/2021/06/magellan-times-att-girl-article.png 1513w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>As we can see, the sites&#8217; headers use the exact same font. It may be too small for you to see, but the articles even share the same author: David Rule.</p>
<p>Confirming what we can tell for ourselves based on the design &amp; authorship, the footer of these sites shows they all share the same corporate owner.</p>
<p>Arbitrage sites tend to be created &amp; managed this way; the owners either diversify their portfolio by creating site templates for a variety of verticals (which really share a lot of content between each other), or they specialize in a certain vertical that gets a lot of attention, like: pets, fitness, finance, pop-culture, etc&#8230;</p>
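<p>The copy-and-search check from earlier in this section can be approximated in code. The following is a rough Python sketch, not the method we used: it assumes you have already extracted the article text from two pages, and it compares them using word shingles and Jaccard similarity. The function names and the 5-word shingle size are arbitrary choices for illustration.</p>

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if size > len(words):
        # Text shorter than one shingle: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 5) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / len(a.union(b))
```

<p>Two pages carrying the same article body should score close to 1.0, while unrelated articles should score close to 0.0. Any cutoff for calling two pages &#8220;the same content&#8221; would need to be tuned on real data.</p>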
<h2 id="How_Does_One_End_Up_Visiting_An_Arbitrage_Site?"><strong>How Does One End Up Visiting An Arbitrage Site?</strong></h2>
<p>Unless you just woke up with a burning desire to find out why Lily from the AT&amp;T ads is causing a stir, chances are you clicked an ad in a sponsored content box, or on a social media platform.</p>
<p>It used to be that arbitrage experts would accomplish the whole feat using Google products. In the before times, you could cost-effectively buy traffic using the AdWords PPC advertising platform, and direct users to your blog monetized with AdSense. These days, not so much. The <a href="https://support.google.com/adspolicy/answer/6368661?hl=en" target="_blank" rel="noopener">Google Ads destination requirements</a> state that &#8220;Destination content that is designed for the primary purpose of showing ads&#8221; is forbidden.</p>
<p>ibmjango, another user of BlackHatWorld, describes the process fairly succinctly:</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-613 size-full blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/ibmjango-explains.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1039" height="681" srcset="https://deepsee.io/wp-content/uploads/2021/06/ibmjango-explains.png 1039w, https://deepsee.io/wp-content/uploads/2021/06/ibmjango-explains-300x197.png 300w, https://deepsee.io/wp-content/uploads/2021/06/ibmjango-explains-1024x671.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/ibmjango-explains-768x503.png 768w" sizes="(max-width: 1039px) 100vw, 1039px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>There are some key statements here that are relevant to the rest of the article that we want to reiterate:</p>
<div class="blog-quote">
<h2>&#8220;most of people are buying native traffic for adsense website[s,] as it is profitable process. you can use networks like taboola, outbrain, even bing ads if you know how to buy cheap traffic on these networks&#8221;</h2>
</div>
<p>This will be confirmed later based on our own research, and insights from Similarweb.</p>
<div class="blog-quote">
<h2>&#8220;you cant make profit with adsense only, you need to put more ad network ads[;] just visit any news site in your country, you will see how many sidebar or content ads they are posting along with adsense&#8221;</h2>
</div>
<p>The sites we are exploring today are being monetized by many different platforms/networks. This isn&#8217;t a small-potatoes blog site; we&#8217;re talking billions of avails per-day across hundreds of well-ranked arbitrage sites.</p>
<div class="blog-quote">
<h2>&#8220;slide show content generate more revenue for arbitrage so if you are really going for this, create any engaging slide show content, then run FB traffic, thats how people still do it.&#8221;</h2>
</div>
<p>Having a content format that exploits every available inch of space for advertising is key to bringing up revenue-per-page. This strategy is tried and true, but off-putting to SSPs (Supply-Side Platforms; publishers apply to these platforms to get access to programmatic demand).</p>
<p>It&#8217;s common for these sites to have articles on the front page that have a single-page format, and few ad placements. This gives the illusion of quality, while the most clicked links are actually stuffed to the gills with ad units that refresh aggressively (more on this later).</p>
<p>Let&#8217;s test these claims with the example family of sites we have identified: parentingfactor[.]com, magellantimes[.]com, and affluenttimes[.]com. There are a couple of ways we can go about determining how users are ending up there.</p>
<h3 id="Checking_the_Inbound_Link_Profile">Checking the Inbound Link Profile</h3>
<p>Using data from our crawlers, we checked for any sites that loaded a link to these 3 domains, and checked if those links were from paid sponsored content placements belonging to Revcontent, Taboola, or Outbrain:</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-621 blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/Inbound-Link-Domains-Minimum-June-2021-Sites-With-Paid-Native-Ads-Links-and-Native-Ads-Rate1.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1096" height="163" srcset="https://deepsee.io/wp-content/uploads/2021/06/Inbound-Link-Domains-Minimum-June-2021-Sites-With-Paid-Native-Ads-Links-and-Native-Ads-Rate1.png 713w, https://deepsee.io/wp-content/uploads/2021/06/Inbound-Link-Domains-Minimum-June-2021-Sites-With-Paid-Native-Ads-Links-and-Native-Ads-Rate1-300x45.png 300w" sizes="(max-width: 1096px) 100vw, 1096px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>We can see that there are hundreds of inbound links for each site that we&#8217;ve discovered in just the past month, but almost 100% of them are from sponsored content boxes. That is to say, they have no real reputational authority.</p>
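<p>As a rough illustration of this kind of tally, the Python sketch below computes the share of inbound links that come from sponsored content placements. The record layout (a <code>target</code> domain plus a <code>provider</code> field that is empty for non-sponsored links) and the sample rows are hypothetical; they are not our crawler&#8217;s actual schema or data.</p>

```python
from collections import Counter

# Sponsored content providers checked in the analysis above.
NATIVE_AD_PROVIDERS = {"revcontent", "taboola", "outbrain"}


def paid_link_rate(links):
    """Map each target domain to (total inbound links, share that are paid).

    links: iterable of dicts with a "target" domain and a "provider"
    value (None when the link is not a sponsored content placement).
    """
    totals, paid = Counter(), Counter()
    for link in links:
        totals[link["target"]] += 1
        if link.get("provider") in NATIVE_AD_PROVIDERS:
            paid[link["target"]] += 1
    return {target: (n, paid[target] / n) for target, n in totals.items()}


# Hypothetical sample rows, for illustration only.
sample = [
    {"target": "example-arbitrage.test", "provider": "taboola"},
    {"target": "example-arbitrage.test", "provider": "outbrain"},
    {"target": "example-arbitrage.test", "provider": "revcontent"},
    {"target": "example-arbitrage.test", "provider": None},
]
rates = paid_link_rate(sample)
```

<p>A site whose paid share sits near 1.0 across hundreds of inbound links is earning almost none of its links organically, which is the pattern described above.</p>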
<p>Another way to go about this would be to analyze the referral values for all the users visiting these sites.</p>
<h3 id="Analyzing_Inbound_Traffic_Channels">Analyzing Inbound Traffic Channels</h3>
<p>Since we don&#8217;t base our insights on user-generated data at DeepSee, we sometimes turn to Similarweb for second opinions on these matters.</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-623 size-large blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/parenting-factor-traffix-mix-similar-1024x437.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="437" srcset="https://deepsee.io/wp-content/uploads/2021/06/parenting-factor-traffix-mix-similar-1024x437.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/parenting-factor-traffix-mix-similar-300x128.png 300w, https://deepsee.io/wp-content/uploads/2021/06/parenting-factor-traffix-mix-similar-768x328.png 768w, https://deepsee.io/wp-content/uploads/2021/06/parenting-factor-traffix-mix-similar.png 1109w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>Echoing our own analysis, Similarweb shows the vast majority of parentingfactor[.]com&#8217;s traffic comes from display ads. Let&#8217;s see how it looks for the other two as well:</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-624 size-large blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/affluent-times-traffix-mix-similar-1024x411.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="411" srcset="https://deepsee.io/wp-content/uploads/2021/06/affluent-times-traffix-mix-similar-1024x411.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/affluent-times-traffix-mix-similar-300x120.png 300w, https://deepsee.io/wp-content/uploads/2021/06/affluent-times-traffix-mix-similar-768x308.png 768w, https://deepsee.io/wp-content/uploads/2021/06/affluent-times-traffix-mix-similar.png 1118w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-625 size-large blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/magellan-times-traffix-mix-similar-1024x435.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="435" srcset="https://deepsee.io/wp-content/uploads/2021/06/magellan-times-traffix-mix-similar-1024x435.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/magellan-times-traffix-mix-similar-300x128.png 300w, https://deepsee.io/wp-content/uploads/2021/06/magellan-times-traffix-mix-similar-768x327.png 768w, https://deepsee.io/wp-content/uploads/2021/06/magellan-times-traffix-mix-similar.png 1117w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>Further solidifying the point that ibmjango of BlackHatWorld made, we can see that all 3 sites rely heavily on paid display, and social traffic. Similarweb doesn&#8217;t separate out paid social from organic in this chart, but judging by the extremely high rate of paid display, it&#8217;s not a far leap to assume that the social is paid as well.</p>
<p><strong>This is extremely relevant, because our research shows that many arbitrage sites behave <em>VERY</em> differently when accessed via paid link vs directly.</strong></p>
<h2 id="Two_Tales_of_One_Website"><strong>Two Tales of One Website</strong></h2>
<p><figure id="attachment_634" aria-describedby="caption-attachment-634" style="width: 1920px" class="wp-caption alignnone"><img decoding="async" loading="lazy" class="wp-image-634 blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1.png" alt="The same site visited directly vs visited by paid link" width="1920" height="905" srcset="https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1.png 5176w, https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1-300x141.png 300w, https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1-1024x483.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1-768x362.png 768w, https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1-1536x724.png 1536w, https://deepsee.io/wp-content/uploads/2021/06/worryboutit-1-2048x965.png 2048w" sizes="(max-width: 1920px) 100vw, 1920px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"><figcaption id="caption-attachment-634" class="wp-caption-text">The same site visited directly vs visited by paid link</figcaption></figure></p>
<p>Now we get to the core of the &#8220;Misleading Content Formats&#8221; issue that we warn about in the title. Believe it or not, the image above shows the same page on the same site; the image on the left is what you&#8217;d see if you visited directly, and the one on the right is how it looks when visiting via paid link.</p>
<p>On the left we see zero ads above the fold; on the right, there are 6 display ads plainly visible, with an out-stream video placement out of view of the screenshot. It&#8217;s not clearly visible in the image, but the paid page is actually a slideshow (compared to the direct link, which is single-page format).</p>
<p>Let&#8217;s recall something we mentioned in the intro:</p>
<ul>
<li><em>Someone focused on getting as much money as possible from a user who arrives on their site might:</em>
<ul>
<li><em>Add more ad units, increasing the clutter of the page</em>
<ul>
<li><em>Particularly, adding a bunch of placements above the fold guarantees you will have coveted viewable placements that you can charge for.</em></li>
</ul>
</li>
<li><em>Paginate the content so that users are always triggering new page loads, and digital ad auctions. Paginated content is also known as &#8220;Slideshow&#8221; content. </em></li>
<li><em>Put ad placements on an aggressive refresh schedule. </em></li>
</ul>
</li>
</ul>
<p>We know the paid version of the site has way more ad units, and paginated content, but it&#8217;s not clear from the image if there is an aggressive refresh schedule. We can clear that up with the following video, which shows the different ways this site presents content:</p>
<p><iframe loading="lazy" title="YouTube video player" src="https://www.youtube.com/embed/YBz92r1JsjU" width="100%" height="432" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
<p>To summarize:</p>
<ul>
<li>We saw ~30 second refresh timing on the slideshow ad placements.
<ul>
<li>Not particularly aggressive, but still, refresh on slideshows seems entirely unnecessary
<ul>
<li>Slideshow readers are expected to click to a page every 30 secs to 1 minute, and adding refresh on top of that is just another way to milk ~10 ad loads out of a user in between slide clicks.</li>
</ul>
</li>
<li>This isn&#8217;t captured in the video, but out-of-view placements did not refresh on the single-page style version of the site.</li>
</ul>
</li>
<li>The single-page format had 1-2 ads per content block, while the slideshow format had 10+ placements per content block.</li>
</ul>
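<p>To make the refresh arithmetic concrete, here is a minimal sketch. It assumes every placement loads once when the slide opens and reloads each time a full refresh interval passes; the function name and the 45 second dwell time are our own illustrative choices, not measurements.</p>

```python
import math


def ad_loads_per_slide(placements: int, dwell_seconds: float, refresh_seconds: float) -> dict:
    """Estimate ad loads generated while a reader stays on one slide.

    Assumes each placement loads once on arrival and reloads every time a
    full refresh interval elapses (a simplification for illustration).
    """
    refreshes_per_placement = math.floor(dwell_seconds / refresh_seconds)
    initial_loads = placements
    refresh_loads = placements * refreshes_per_placement
    return {
        "initial": initial_loads,
        "from_refresh": refresh_loads,
        "total": initial_loads + refresh_loads,
    }


# Hypothetical reader: 10 placements on the slide, 45 seconds before the
# next click, and the ~30 second refresh timing observed in the video.
estimate = ad_loads_per_slide(placements=10, dwell_seconds=45, refresh_seconds=30)
print(estimate)
```

<p>Under these assumptions, one refresh cycle across 10 placements accounts for the ~10 extra ad loads per slide mentioned above, on top of the 10 that fire when the slide first opens.</p>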
<p>This video shows a single site, but the pattern is typical of SO many more sites.</p>
<h3 id="Why_Two_Formats?">Why Show Two Formats? Why Not Just Embrace the Slideshow?</h3>
<p>This is a tough question to answer, and we&#8217;re going to venture into the realm of educated guesswork here while trying to answer.</p>
<p>There are certainly many sites which are upfront about their slideshow content, and manage to monetize it to some extent. Why hide the format that your content predominantly appears in?</p>
<p>Think back to what our spirit guide, ibmjango, says:</p>
<div class="blog-quote">
<h2>&#8220;you cant make profit with adsense only, you need to put more ad network ads&#8221;</h2>
</div>
<p>As we mentioned earlier, slideshow content can be off-putting to SSPs, because their reputation is staked on the quality of their publisher network. Having content that makes SSPs uneasy limits the amount of demand you can get access to. For modern arbitrage operations to work, they need access to multiple SSPs / ad-networks, and perhaps this is the reason we find so many sites with a disconnect between direct &amp; paid visits.</p>
<p>As part of the onboarding process at each SSP, there is likely to be a human reviewing the sites of each publisher who applies. While each SSP has different publisher requirements, it hardly seems a stretch to imagine that reputable SSPs don&#8217;t want sites with slideshows that are 80% ads / 20% content.</p>
<p>So, in order to protect themselves, the arbitrage specialists design sites in such a way that advertising analysts who click around their home page wouldn&#8217;t find anything objectionable.</p>
<p>Once they do make it into a reputable ad-network, that&#8217;s when the problems begin. We spoke off the record to someone responsible for ensuring publisher quality at a major SSP, and they told us &#8220;that&#8217;s the problem we have; it&#8217;s not straight fraud, and there&#8217;s a huge demand for this inventory.&#8221;</p>
<p>This is confirmed by a contact of ours on the demand side, who described the inventory as &#8220;crack for advertisers&#8221; due to the high viewability, and the high availability of inventory (due to the inflated placement count per-page).</p>
<p>Thus back to our initial question: <strong>do advertisers get a good value for their ad-dollars when they buy space on these sites?</strong></p>
<p>One might venture a guess to say these ones did not:</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-660 size-large blog-img-wrap bordered" src="https://deepsee.io/wp-content/uploads/2021/06/mortgage-fanatic-cray-1024x618.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="1024" height="618" srcset="https://deepsee.io/wp-content/uploads/2021/06/mortgage-fanatic-cray-1024x618.png 1024w, https://deepsee.io/wp-content/uploads/2021/06/mortgage-fanatic-cray-300x181.png 300w, https://deepsee.io/wp-content/uploads/2021/06/mortgage-fanatic-cray-768x463.png 768w, https://deepsee.io/wp-content/uploads/2021/06/mortgage-fanatic-cray-1536x926.png 1536w, https://deepsee.io/wp-content/uploads/2021/06/mortgage-fanatic-cray.png 1633w" sizes="(max-width: 1024px) 100vw, 1024px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<h2 id="Scaling_Our_Insights_With_Data"><strong>Scaling Our Insights With Data</strong></h2>
<p>The following research is based on analysis of 111 domains with Misleading Content Formats we discovered during the last week in June (available <a href="https://docs.google.com/spreadsheets/d/1kTfSj6N21IXXq3TWYV7VFMlU4ovNHm2jzf1h8xfExcE/edit?usp=sharing" target="_blank" rel="noopener">here</a>). To arrive at this sample, we visited thousands of sites that had high representation in sponsored content boxes, and <strong>compared how they performed depending on whether the visit was direct or paid</strong>. This comparison included a lot of manual verification while we built the capability for automated detection.</p>
<p>While many sites buy most of their visitors from sponsored content boxes, it is unusual for a site to completely transform its layout, and this is the activity we highlight in this study. That&#8217;s how we ultimately arrived at this sample; these sites, of all the sites buying visitors for arbitrage purposes, showed anomalous behavior when visited directly vs via a sponsored content box.</p>
<p>You may notice the list of publishers we provide has more than 111 entries; this is because we discovered additional sites in the same publishing groups as Misleading Content Format sites by carefully analyzing ads.txt &amp; sellers.json records. In order to give you maximum protection, we proactively flag these sites.</p>
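<p>As a rough illustration of how ads.txt records can hint at shared ownership, the sketch below looks for other sites that list the same DIRECT seller account as a flagged site. The domains, account IDs, and function names are hypothetical, and this is a simplified stand-in rather than our production logic. A shared DIRECT account is only a lead; checking the matching sellers.json entry is what would confirm who operates it.</p>

```python
def parse_ads_txt(text: str) -> set:
    """Return (ad_system_domain, seller_account_id) pairs marked DIRECT."""
    sellers = set()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # blank lines and variables such as "contact=..."
        ad_system, account_id, relationship = fields[0], fields[1], fields[2]
        if relationship.upper() == "DIRECT":
            sellers.add((ad_system.lower(), account_id))
    return sellers


def related_sites(known_site: str, ads_txt_by_site: dict) -> dict:
    """Map other sites to the DIRECT seller accounts they share with known_site."""
    known_accounts = parse_ads_txt(ads_txt_by_site[known_site])
    shared = {}
    for site, text in ads_txt_by_site.items():
        if site == known_site:
            continue
        overlap = known_accounts & parse_ads_txt(text)
        if overlap:
            shared[site] = overlap
    return shared


# Hypothetical ads.txt contents keyed by site.
ADS_TXT = {
    "flagged-site.example": "ssp-a.example, 1001, DIRECT\nssp-b.example, 2002, RESELLER  # via partner\n",
    "sister-site.example": "# same owner?\nssp-a.example, 1001, DIRECT, abc123\n",
    "unrelated.example": "ssp-a.example, 9999, DIRECT\nssp-b.example, 2002, RESELLER\n",
}
matches = related_sites("flagged-site.example", ADS_TXT)
print(matches)
```

<p>RESELLER lines are ignored on purpose here, since many unrelated publishers share the same resellers.</p>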
<p>For the purpose of this article, we compared three things:</p>
<ol>
<li>The activity level of the page; this is approximated by the count of document, script, and XHR requests intercepted.</li>
<li>Ad units above the fold</li>
<li>Ad units on page</li>
</ol>
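<p>The activity level metric can be sketched as a simple count over intercepted requests. The record layout below (a <code>resource_type</code> field on each request) and the sample URLs are assumptions made for illustration; real crawler output will look different.</p>

```python
from collections import Counter

# Request types that count toward the activity level, per the list above.
COUNTED_TYPES = {"document", "script", "xhr"}


def activity_level(requests: list) -> int:
    """Count intercepted requests whose type is document, script, or XHR."""
    counts = Counter(request["resource_type"] for request in requests)
    return sum(counts[resource_type] for resource_type in COUNTED_TYPES)


# Hypothetical request logs for the same page, loaded two ways.
direct_visit = [
    {"url": "https://site.example/article", "resource_type": "document"},
    {"url": "https://site.example/app.js", "resource_type": "script"},
    {"url": "https://site.example/hero.png", "resource_type": "image"},
]
paid_visit = direct_visit + [
    {"url": "https://ads.example/tag.js", "resource_type": "script"},
    {"url": "https://ads.example/bid", "resource_type": "xhr"},
    {"url": "https://ads.example/bid", "resource_type": "xhr"},
]

print(activity_level(direct_visit), activity_level(paid_visit))
```

<p>Images and other asset types are left out of the count, so the number tracks page and script behavior rather than page weight.</p>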
<p>Many thanks to data scientist <a href="https://www.linkedin.com/in/edkrueger/" target="_blank" rel="noopener">Edward Krueger</a>, who assisted us with visualizations and statistical analysis for this research.</p>
<h3 id="page_activity_levels">Page Activity Levels &#8211; Direct Link vs Paid</h3>
<p>This chart shows how much busier these sites are when accessed from the sponsored content box vs loaded directly.</p>
<p><figure id="attachment_758" aria-describedby="caption-attachment-758" style="width: 880px" class="wp-caption aligncenter"><img decoding="async" loading="lazy" class="blog-img-wrap wp-image-758 size-full bordered" src="https://deepsee.io/wp-content/uploads/2021/07/Avg-Events-Per-Page.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="880" height="526" srcset="https://deepsee.io/wp-content/uploads/2021/07/Avg-Events-Per-Page.png 880w, https://deepsee.io/wp-content/uploads/2021/07/Avg-Events-Per-Page-300x179.png 300w, https://deepsee.io/wp-content/uploads/2021/07/Avg-Events-Per-Page-768x459.png 768w" sizes="(max-width: 880px) 100vw, 880px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"><figcaption id="caption-attachment-758" class="wp-caption-text">For a one-tailed (greater) paired t-test (N = 111) the difference between the number of events paid (M = 472.90, SD = 343.86) and number of events direct (M = 228.39, SD = 134.89) was found to be statistically significant (p &lt; 0.001).</figcaption></figure></p>
<p>Simple but effective, the takeaway is pretty clear from this bar chart: we can see that paid visits are over twice as active on average!</p>
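<p>For readers who want to see what a paired comparison involves, here is a minimal sketch of the paired t statistic using only the Python standard library. The five sample values are made up for illustration; the study itself used 111 paired measurements. Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which a statistics library such as SciPy provides.</p>

```python
import math
import statistics


def paired_t_statistic(paid: list, direct: list) -> tuple:
    """Return (t, degrees_of_freedom) for the paired differences paid - direct."""
    if len(paid) != len(direct):
        raise ValueError("paired samples must have the same length")
    diffs = [p - d for p, d in zip(paid, direct)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical event counts for five sites, each measured both ways.
paid_events = [410, 520, 300, 610, 450]
direct_events = [200, 260, 210, 240, 230]
t_stat, dof = paired_t_statistic(paid_events, direct_events)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")

# Ratio of the means reported in the chart caption above.
ratio_of_means = 472.90 / 228.39
print(f"paid / direct = {ratio_of_means:.2f}")
```

<p>The test is paired because each site contributes both a direct and a paid measurement, so the comparison is made on per-site differences rather than on two independent groups.</p>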
<h3 id="ads_above_fold">Ad Units Above the Fold &#8211; Direct Link vs Paid</h3>
<p>Ad units that are &#8220;above the fold&#8221; are visible right when you land on a page. These are coveted, because they tend to be marked viewable by the measurement tech most advertisers employ, and this gives advertisers more confidence the ads were seen.</p>
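<p>One simple way to decide whether an ad unit counts as above the fold is to check how much of it sits inside the viewport before any scrolling. The sketch below only considers vertical position, and the 50% cutoff is an assumption we picked for illustration, not necessarily the threshold a given measurement vendor applies.</p>

```python
def visible_fraction(top: float, height: float, viewport_height: float) -> float:
    """Fraction of an element's height inside the initial viewport (no scrolling)."""
    if height <= 0:
        return 0.0
    visible_top = max(top, 0.0)
    visible_bottom = min(top + height, viewport_height)
    return max(visible_bottom - visible_top, 0.0) / height


def is_above_fold(top: float, height: float, viewport_height: float,
                  min_fraction: float = 0.5) -> bool:
    """True when at least min_fraction of the ad unit is visible on arrival."""
    return visible_fraction(top, height, viewport_height) >= min_fraction


# Hypothetical ad units as (top offset, height) in pixels, on a 900px viewport.
ad_units = [(100, 250), (800, 250), (1500, 250)]
above_fold_count = sum(is_above_fold(top, height, 900) for top, height in ad_units)
print(above_fold_count)
```

<p>In this made-up layout only the first unit qualifies: the second has 40% of its height in view, and the third is entirely below the fold.</p>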
<p><figure id="attachment_757" aria-describedby="caption-attachment-757" style="width: 868px" class="wp-caption aligncenter"><img decoding="async" loading="lazy" class="blog-img-wrap wp-image-757 size-full bordered" src="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Pctls.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="868" height="525" srcset="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Pctls.png 868w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Pctls-300x181.png 300w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Pctls-768x465.png 768w" sizes="(max-width: 868px) 100vw, 868px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"><figcaption id="caption-attachment-757" class="wp-caption-text">For a one-tailed (greater) paired t-test (N = 111) the difference between the average number of ads above the fold paid (M = 3.62, SD = 2.66) and the average number of ads above the fold direct (M = 0.66, SD = 1.04) was found to be statistically significant (p &lt; 0.001).</figcaption></figure></p>
<p>Right away, we can see there is a stark difference in the median number of ad placements above the fold here. Remember, we are talking about the <em>same exact sites &amp; pages</em> here; the only difference is how the user arrives.</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-755 size-full bordered" src="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Violin.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="868" height="525" srcset="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Violin.png 868w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Violin-300x181.png 300w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-ATF-Violin-768x465.png 768w" sizes="(max-width: 868px) 100vw, 868px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>This violin plot is another way to visualize the typical number of ads above the fold for direct vs paid visitors. The fat bottom of the &#8220;Direct&#8221; plot shows that many sites in this group never display ads above the fold. It&#8217;s clear that much of the &#8220;Paid&#8221; figure&#8217;s area lies at 2 or above, with a big bulge around 4 ad units. These are completely different shapes.</p>
<h3 id="total_ad_frames">Ad Units on Page &#8211; Direct Link vs Paid</h3>
<p>Similar to the last chart, this one shows the difference in the total number of ad units encountered per-page.</p>
<p><figure id="attachment_756" aria-describedby="caption-attachment-756" style="width: 871px" class="wp-caption aligncenter"><img decoding="async" loading="lazy" class="blog-img-wrap wp-image-756 size-full bordered" src="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Pctls.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="871" height="526" srcset="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Pctls.png 871w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Pctls-300x181.png 300w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Pctls-768x464.png 768w" sizes="(max-width: 871px) 100vw, 871px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"><figcaption id="caption-attachment-756" class="wp-caption-text">For a one-tailed (greater) paired t-test (N = 111) the difference between the average number of ads document paid (M = 6.54, SD = 3.41) and the average number of ads document direct (M = 4.27, SD = 4.74) was found to be statistically significant (p &lt; 0.001).</figcaption></figure></p>
<p>The maximum figure here is especially interesting, because it shows the difference in limitations between single-page and slideshow design formats. The single-page format can hypothetically go on forever, so the upper bound on ad units per page is higher. On a slide, where most of the content is above the fold, there are only so many ad units you can cram into the page before it fills up.</p>
<p><img decoding="async" loading="lazy" class="aligncenter wp-image-754 size-full bordered" src="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Violin.png" alt="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats" width="871" height="526" srcset="https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Violin.png 871w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Violin-300x181.png 300w, https://deepsee.io/wp-content/uploads/2021/07/Ad-Units-on-Page-Violin-768x464.png 768w" sizes="(max-width: 871px) 100vw, 871px" title="Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats"></p>
<p>While the differences visible in the violin chart here are not quite as stark as the same visualization for &#8220;above the fold&#8221; ad units, there is a clear takeaway: the paid visits result in many more ad units loading on the page. The higher maximum value in the &#8220;Direct&#8221; plot shows that there are exceptions to the previous statement, but they appear as outliers.</p>
<h2 id="Conclusion"><strong>Conclusion</strong></h2>
<p>Publishers with Misleading Content Formats straddle the line between valid &amp; invalid activity. Unlike <a href="https://support.google.com/adspolicy/answer/6368661?hl=en" target="_blank" rel="noopener">Google Ads</a>, the tech companies providing these sponsored content boxes do not forbid sending visitors to &#8220;[d]estination content that is designed for the primary purpose of showing ads&#8221;.</p>
<p>In this situation, any violations that exist likely occur between the publisher and the SSPs / ad-networks they are a part of. Such organizations may have specific publisher content requirements that forbid the behaviors displayed by the hidden, hyperactive versions of these sites.</p>
<p>Now, having made it to the end of the article, <strong>do you think advertisers get a good value for their ad-dollars when they buy space on these sites?</strong></p>
<p>Is there something we&#8217;re missing here? Feel we are off base? Please keep the conversation going, and drop us a line on <a href="https://twitter.com/deepsee_io" target="_blank" rel="noopener">Twitter</a> @deepsee_io, or on <a href="https://www.linkedin.com/company/deepseeio" target="_blank" rel="noopener">LinkedIn</a>!</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
