Peter Elst

Flash Platform Consultant

Update on SWF Indexing Issues

6 07 2008

After Tuesday’s announcement about SWF now getting fully indexed I thought I’d do a little experiment and put up a few test SWF files.

It’s difficult to deduce exactly what is happening, but I thought I’d write down what I’ve tried and what the results are so far.

 

What did I use?

I created a Flash 9 SWF exported from Flash CS3, added some component instances in a variety of ways, added an input text field and set up a function that triggers a PHP script on my server and subsequently sends me an email with the value of the input text field.

Embed methods

I embedded the SWF three ways: with plain object/embed tags, with SWFObject, and with the standard publish output from Flash CS3 (i.e. AC_FL_RunContent).

Result: all three SWF files got hits from the Google search bot and triggered an email. No arguments were sent to the script through either POST or GET.

What is getting indexed?

I set up four Button component instances: one added manually on the Stage, one added programmatically to the DisplayList, one instantiated but never added to the DisplayList, and one added to the DisplayList outside the Stage bounds so it is not visible to the user.

Result: of these four, only two got a MouseEvent.CLICK triggered: the one added manually and the one added programmatically within the visible bounds of the Stage.

Trace statements

Added some trace statements throughout the code to see if those would get picked up.

Result: trace statements do not appear to be getting indexed.

 

Preliminary conclusion

I didn’t have a lot of static text and no dynamically loaded text to be indexed in my test SWF. I’m working on an updated version of the test SWF to put up and look into what exactly is happening with that, see what and how it gets indexed.

This morning I got a comment on my previous blog post from my brother Kristof, saying he noticed Google was now indexing URLs to photographs and music files from his band that he referenced from his Flash content. From what I can see, Google followed a reference to an XML file and indexed that file, which contains the URLs.

This is what Google says: “We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource..”

http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html

 
Why? Adobe, please tell me why this is a good thing and how this would help SEO of Flash content. It makes no sense whatsoever to index calls to .xml files and server-side scripts referenced from an SWF and link to those URLs.

Just to make this clear: if you do a filetype:swf search in Google, no dynamically loaded data will show up. What happens instead is that the URLs you use in your SWF get crawled separately. You’ll increasingly start seeing the .xml, .php, etc. files used by your SWF show up in search results, linking not to the SWF that uses them but to the .xml, .php, etc. file itself.

In short:

- Google follows URLRequest links and indexes XML and other files referenced in your SWF that return text (though not in the context of the SWF, i.e. it links to those URLs directly rather than to the SWF that uses them)
- Only instances added to the DisplayList and visible on stage are getting triggered
- When using URLVariables, no values seem to get sent along with the URLRequest

 

Remaining issues

These are two things I’ve seen happen that could be troublesome:

- URLs to files loaded by SWF content are getting exposed in search results (and not in reference to the SWF that uses them)
- Server-side scripts referenced in the SWF are getting hits from search bots, potentially causing unwanted behavior.
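Since the bot hits in my tests arrived without any POST or GET arguments, one possible stopgap for the second issue is to have the script refuse to act unless the expected field is present and the caller does not look like a known crawler. Below is a minimal sketch of that check in Python (my test script was PHP, so treat this as an illustration only). The field name `message` and the list of user-agent substrings are assumptions made up for the example, and the list is not exhaustive.

```python
import re

# Hypothetical, non-exhaustive list of crawler user-agent substrings.
KNOWN_BOT_PATTERN = re.compile(r"googlebot|slurp|msnbot|bingbot", re.IGNORECASE)


def is_probable_bot(user_agent):
    """Return True when the user-agent string matches a known crawler name."""
    return bool(user_agent) and bool(KNOWN_BOT_PATTERN.search(user_agent))


def should_send_email(user_agent, params):
    """Decide whether a request should be allowed to trigger the email.

    `params` is a dict of the submitted GET/POST values. The key
    "message" is a hypothetical name for the input text field value.
    """
    if is_probable_bot(user_agent):
        return False
    # Crawler hits observed in the tests carried no arguments at all,
    # so an empty or missing field is treated as "do nothing".
    message = params.get("message", "").strip()
    return bool(message)
```

A user-agent check alone is easy to get around, so the argument check is the part that reflects what the tests actually showed.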

 
I really want to see Adobe, Google and Yahoo! urgently come out with additional information for developers on how to prevent unwanted files from getting indexed, how the indexing works for the various search engines, and how they individually handle things like following URLRequests.
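One mechanism that already exists is robots.txt, which also comes up in the comments. I have not confirmed how it interacts with the new SWF crawling, and it only helps for domains where you control the file. As a rough sketch, the rules below would ask crawlers to stay away from a data directory and a mail script while leaving the SWF itself crawlable. The paths and the example.com domain are made up for illustration, and Python's standard `urllib.robotparser` module is used only to show how a well-behaved crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are examples, not real files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
Disallow: /scripts/sendmail.php
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# XML loaded by the SWF and the mail script are disallowed...
print(parser.can_fetch("Googlebot", "http://example.com/data/gallery.xml"))
print(parser.can_fetch("Googlebot", "http://example.com/scripts/sendmail.php"))
# ...while the SWF itself stays crawlable.
print(parser.can_fetch("Googlebot", "http://example.com/main.swf"))
```

This only expresses a request to crawlers. Whether a given search engine applies it to resources it discovers through a SWF is exactly the kind of detail that still needs documenting.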

 




Informations

  • Date : 6 July 2008
  • Categories : General

23 responses to “Update on SWF Indexing Issues”

6 07 2008
Savvas Malamas (12:18:59) :

It looks so messed up..
Thanks for your efforts

6 07 2008
SEO en Flash y el miedo de que el contenido sea público | eleZeta - Lucas Zallio (17:16:57) :

[...] a couple of very interesting articles on the subject that are worth reading, the first on Peter Elst’s blog and the other on Aral’s blog [...]

6 07 2008
John Dowdell (22:42:29) :

Howdy Peter, thanks for the tests.

“Why? Adobe, please tell me why this is a good thing and how this would help SEO of Flash content.”

I think this is in Google’s realm. Adobe created a “headless Player”, presumably with some automation and logging APIs. But the spidering policy itself, like the databasing and ranking aspects, is something that each search engine itself determines.

There’s still a lot I don’t understand about the Google implementation (why worry about “copyright” text, eg?). This decision on dynamic files and data seems a tricky one — Ajax apps would face similar questions — I’d defer to Google on this one.

jd/adobe

6 07 2008
Peter (23:34:06) :

Thanks for your thoughts John, I agree: the issue here is essentially with Google. I wish Adobe had worked with them to publish some documentation before this new indexing behavior went live.

Google did a blog post on Tuesday so I assume the Flash Player team already knew how things were going to be handled.

The issue of Google indexing URLs and linking to those instead of the SWF (even if that is just temporary) is pretty serious, and I think it’s fair to say that as a result there is not much, if any, improvement in the ability to do SEO for Flash content at this point.

Is there any resource where we can flag possible issues at Google? I would guess it’s out of the hands of Adobe at this point.

Hope this issue doesn’t repeat itself with Yahoo! with possible different indexing behavior to have to deal with.

6 07 2008
Erki Esken (23:59:13) :

First, thanks for doing this research. We’re planning on doing some SWF indexing tests ourselves as well so it’s good to know what has already been tried.

And your remark on “…come out with additional information for developers on how to prevent unwanted files getting indexed…”, well there’s standard robots.txt file that all search engine spiders adhere to. Use robots.txt to disallow stuff spiders shouldn’t touch.

7 07 2008
Peter (00:15:05) :

Thanks Erki, I wanted to raise the issue of robots.txt as well — if you have an SWF that references XML or scripts on a different domain you will essentially need to exclude your SWF from being indexed if you don’t want those scripts to show up.

That means it won’t do the old static text indexing behavior either and that Flash content won’t show up at all in search results.

7 07 2008
John Dowdell (01:49:12) :

“Google did a blog post on Tuesday so I assume the Flash Player team already knew how things were going to be handled.”

That wasn’t the impression I got the day before the announcement, although I imagine someone at Adobe saw a draft of the Google description before it was later released.

The “deep-linking” aspect is still too complex for me to have an opinion on… I’d do better with looking at specific searches, figuring how people actually try to find applications, and whether they want to be dumped to a specific state within that application or whether they just need to find that application itself. More clients worry about SEO for discoverability, I’d wager, rather than for data-extraction. But there are many different SEO priorities within different constituencies — the meaning of the word “seo” varies with the ear which hears it. Difficult area.

(The most interesting angle for me in this whole thing is actually about the Headless Player, its automation, and what we might be able to do with such capabilities in the future.)

jd/adobe

7 07 2008
Aran (02:08:21) :

Peter.

Thanks for your investigations. Your article is the 1st decent write up I have seen with actual results.

I have asked the Adobe engineers on a certain beta program to comment on any specifics they can give about what is/isn’t possible with the player (e.g. can the player index swf content embedded with js libraries such as swfobject). I am hoping for at least a bit of enlightenment…

7 07 2008
Evan Mullins (16:48:08) :

Thanks! This is the first place I’ve actually seen someone trying to diagnose what Google is doing, rather than just complain about lazy “security by obscurity” flash dev…

I’m really interested in your testing results and hoping if Google doesn’t ever come out and tell us the how or why about what they’re doing, with more tests and community efforts we can create some guidelines for designing swf to receive correct indexing.

Hoping as well that Google realizes that just giving out our back end xml and php files they’re not really giving better data in searches, but out of context code that will not really help users. (except of course those who want to sniff around for files and weren’t informed enough to be using existing tools already)

Please keep us informed on your research.

7 07 2008
Ryan Stewart (19:09:13) :

Hey Peter, great post. I’m also curious about robots.txt. Specifically in your brother’s case, I wonder if he could modify the robots.txt file so it can’t look in his music directory or the XML files. I wonder if that would keep them from being indexed.

=Ryan
rstewart@adobe.com

7 07 2008
Ryan Stewart - Rich Internet Application Mountaineer » Blog Archive » Announcing the Flex SEO Contest (19:37:31) :

[...] Elst has a really good post up (with some concerns) about SWF Indexing in some of his Flash files. And it’s clear that a lot is changing and we’re not sure what’s going to be [...]

7 07 2008
Benny (20:52:09) :

This is what Google says: “We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource..”

Why? Adobe, please tell me why this is a good thing and how this would help SEO of Flash content.

Well, it’s a good thing if your Flash site maintains state (deep linking). If every dynamically loaded PHP file were seen as part of the main movie, then we would lose the state benefits.

As I see things Google & Adobe are on the right track, now we - as developers/designers - have to do our bit:

1. (If needed) prevent the indexing of XML, images, etc. with robots.txt
2. Wrap the XML files in a server-side scripting page such as PHP and call that from our movie
3. Always implement deep linking (and of course Flash detection)
4. Give Google and other SEs a hand by providing a sitemap.xml

Now I guess there will be a (small) number of sites that don’t have (or don’t want to use) the availability of server side scripting. I think they should still implement 1,3 and 4. Instead of step 2 they could still do SEO the ‘old’ way by using something like SWFObject (although I would prefer a better solution, see following proposal).

Maybe if the SE’s would process the link info delivered by the SE-FlashPlayer as proposed next, then we all could be happy:

1) The SE-Flash player reports a XML file being loaded in.
2) The SE checks if that XML is excluded for direct access in robots.txt
-> if so, then index the content as being an integral part of the loading SWF
-> if not, then index the linked page separately
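The rule proposed above could be sketched roughly as follows. This is only an illustration of the proposal, not how any search engine is known to behave. The function name and the two return labels are invented for the example, and Python's standard `urllib.robotparser` stands in for the search engine's own robots.txt handling.

```python
from urllib.robotparser import RobotFileParser


def classify_loaded_resource(robots_txt, crawler_name, resource_url):
    """Apply the proposed rule to one resource reported by the SE Flash player.

    Returns "index_as_part_of_swf" when robots.txt excludes the resource
    from direct access, otherwise "index_separately". Both labels are
    hypothetical names used only in this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch(crawler_name, resource_url):
        # Not excluded: the resource is indexed as its own page.
        return "index_separately"
    # Excluded from direct access: treat its content as part of the SWF.
    return "index_as_part_of_swf"
```

For example, with a robots.txt that disallows `/data/`, a call for `http://example.com/data/content.xml` would fall into the "part of the SWF" branch, while a page outside that directory would be indexed separately.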

7 07 2008
Benny (21:22:59) :

I was just thinking of another option to get a bit of control over what should and what shouldn’t be indexed. In HTML, in addition to robots.txt, we have the robots meta tag. If we had an equivalent of that in Flash/Flex that would be passed on to or followed by the special Search Engine (Proxy) Flash Player, then we would have much more control. We could even tell the Proxy Player to report the loaded data as internal data, e.g. “index_internal”, besides the already standard “noindex”, “nofollow”, etc.

Where to add the robots info? I think there are several options, e.g. in AS3 we could extend URLRequest with a property “robots:Array” or we could add it to URLRequest.data or maybe as a URLRequest.requestHeader, …

?

8 07 2008
Gerd Kamp (07:52:41) :

We are also in urgent need of some place to put robots info / crawling restrictions.

At work we make use of remote calls to xml files e.g. for our live event tickers . In addition to our reporting we also include near-time stats and position data. This data is licensed to us under the condition that it is not (easily) publicly available.

The application and the data is then pushed to our customers sites (e.g. around 100 newspapers, publishers and portals for EURO2008). Since these sites are not ours we do not have access to the robots.txt files. Using META tags is also not an option since we are using XML files.

Switching over to serving the data files only from servers that are under our control is (currently) not an option.

So where do we put the robots info? Google/Adobe? I guess there are more people having the exact same problem. Some clarification would definitely help.

All this would not be an issue if, instead of indexing the XML file etc. separately, Google used the information to link back to the originating SWF file.

8 07 2008
A Thought About Flash Indexing « Dead Ink Vinyl (19:15:37) :

[...] has announced the Flex SEO Contest to encourage the community to establish best practices for Flash/Flex indexing. The rules are straightforward and the top prize is CS4 Master Collection. I like the challenge and [...]

11 07 2008
savvas.malamas » Blog Archive » Saving {my} twitter.. (21:35:16) :

[...] Just because I use it everyday it is obvious that sometimes I tweet things that I might don’t want to be available to the public timeline (Flex doesn’t have a timeline) and especially to search engines (Google please leave my twitter account and try to use Adobe’s help for swf indexing). [...]

12 07 2008
More on the Flex SEO contest | switch for SWFObject to Object/Embed tags | zedia flash blog (03:40:09) :

[...] around the fact the Google would not reference website that use javascript to embed the flash file. Peter Elst seems to think the contrary but I think it is worth a try. I was previously using SWFObject to [...]

12 07 2008
Hans Van de Velde (22:56:03) :

When you say that Google will start indexing XML files, I wonder how Google will be able to interpret the text. For example, what I experienced is that a title tag or h1, h2, h3,… tags carry more weight when your content gets indexed, and ul/ol tags are simply lists of things. But when you don’t use HTML, there is no more standard meaning in your content (!)

To prove my point, I also have a funny (quite nicely working) example/experiment I’d like to share here. For one particular website, I used XHTML pages to put my content in (and not XML files as usual). Surprisingly, parsing the XHTML was easier than I thought.
Result: the whole site is fully indexed by Google (yippie !).
You can check this example by typing “site:www.alternativ.be” in Google (even all the images have been indexed too).

When structuring dynamic content it’s important to have tags that are semantically correct. For example, a title in your Flash site should be treated differently than body-text and so on, and so on…

Side note: the Alternativ website itself was built in ActionScript 2 and the content management system with Flex 2 with ActionScript 3 and I must say that with E4X, extracting content from any XHTML page is a breeze.

14 07 2008
Flex SEO Contest - my two first entries | Cyril Hanquez (15:05:07) :

[...] the comments and concerns regarding SEOing Flash and Flex files, Ryan Stewart decided to launch a contest to [...]

16 07 2008
Google Indexes SWFs and external content | Fleximagically Searchable | Ryan Stewart's Flex SEO Contest | Circle Cube Studio (01:55:21) :

[...] information: I’ve found that’s helpful at Peter Elst’s post. And Ryan explains Google and Flash’s relationship development here. Here and here is what [...]

18 07 2008
Mitchell Thomas (01:23:34) :

Nice article Peter. I wonder what is the use of indexing a swf file that is dependent upon the data (i.e. flashvars) that get passed to it? I’m specifically talking about flash video players and widgets here - but even a flash website that is programmed to respond to deep linking, could potentially have the same problem. It seems that there are a lot of swf files out there that are not very useful outside of the context of the HTML page they are embedded in. And even if the swf can stand by itself one of the reasons we embed them in HTML pages is so the swf can be viewed at the size it was designed to be viewed at. I realize you can set the scaleMode to noScale in actionscript - but what if your application requires scaling?

As for your tests - have you tried filling in the title/description in the Document properties to see how that affects indexing? The title and description are part of the compiled SWF, so they should be readable by the Googlebot.

4 08 2008
Like A Fish Needs A Bicycle: RIA and SEO | Visualrinse | Design and Development by Chad Udell (05:50:23) :

[...] and searchable. Much written, with no real concrete answers. Lots of tests (A very good one done by Peter Elst), a contest held by Ryan Stewart (Flexmagically Searchable, FTW!), and some well formed opinions [...]

18 08 2008
Like A Fish Needs A Bicycle: RIA and SEO | Iona.LABS (05:07:29) :

[...] and searchable. Much written, with no real concrete answers. Lots of tests (A very good one done by Peter Elst), a contest held by Ryan Stewart (Flexmagically Searchable, FTW!), and some well formed opinions [...]
