<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><!-- generator="wordpress/2.0.4" --><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>HOMEPAGE [GREY HAT SEO BLOG]</title>
	<link>http://en.kerouac3001.com</link>
	<description />
	<pubDate>Tue, 01 May 2007 14:30:01 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.4</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/GreyHatSeoBlog" /><feedburner:info uri="greyhatseoblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Keyword Difficulty Tool</title>
		<link>http://feedproxy.google.com/~r/GreyHatSeoBlog/~3/_qcIR8iyvTc/keyword-difficulty-tool-11.htm</link>
		<comments>http://en.kerouac3001.com/keyword-difficulty-tool-11.htm#comments</comments>
		<pubDate>Wed, 25 Apr 2007 19:06:34 +0000</pubDate>
		<dc:creator>kerouac3001</dc:creator>
		
	<category>Tools</category>
		<guid isPermaLink="false">http://en.kerouac3001.com/keyword-difficulty-tool-11.htm</guid>
		<description><![CDATA[This tool estimates the difficulty level of a keyword. Unlike other scripts, it can also estimate the difficulty of a keyword for a specific language/country. For example, the keyword hotel is easier on Italian search engines (google.it) than on English ones.

]]></description>
			<content:encoded><![CDATA[<p>This tool estimates the difficulty level of a keyword. Unlike other scripts, it can also estimate the difficulty of a keyword for a specific language/country. For example, the keyword hotel is easier on Italian search engines (google.it) than on English ones.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://en.kerouac3001.com/keyword-difficulty-tool-11.htm/feed/</wfw:commentRSS>
		<feedburner:origLink>http://en.kerouac3001.com/keyword-difficulty-tool-11.htm</feedburner:origLink></item>
		<item>
		<title>Regex Tutorial</title>
		<link>http://feedproxy.google.com/~r/GreyHatSeoBlog/~3/9Sf-mS1-AV0/regex-tutorial-8.htm</link>
		<comments>http://en.kerouac3001.com/regex-tutorial-8.htm#comments</comments>
		<pubDate>Thu, 11 Jan 2007 15:10:13 +0000</pubDate>
		<dc:creator>kerouac3001</dc:creator>
		
	<category>Inside The Engine</category>
		<guid isPermaLink="false">http://en.kerouac3001.com/regex-tutorial-8.htm</guid>
		<description><![CDATA[INTRODUCTION
Regex (regular expressions) are very useful for programmers. With them you can describe any string that follows a certain regular pattern.
We don&#8217;t want to talk about formal languages or formal grammars; instead, we are going to give you some examples that show how regex work.
Think about having a web page with a [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.kerouac3001.com/hipowl.gif" alt="Regular Expressions" align="left" /><strong>INTRODUCTION</strong></p>
<p><strong>Regex</strong> (<em>regular expressions</em>) are very useful for programmers. With them you can describe any string that follows a certain regular pattern.</p>
<p>We don&#8217;t want to talk about <em>formal languages</em> or <em>formal grammars</em>; instead, we are going to give you some examples that show how regex work.</p>
<p>Imagine a web page containing a form with the following fields:</p>
<ul>
<li>- <strong>Name</strong></li>
<li>- <strong>Surname</strong></li>
<li>- <strong>E-mail</strong></li>
<li>- <strong>Phone number</strong></li>
</ul>
<p>Once the form has been filled in and the data sent to the script, it&#8217;s very important to check that the data are correct.</p>
<p>First you need to define what each field may contain:</p>
<blockquote><ul>
<li><strong>Name</strong>: a single word made only of alphabetical letters. In our example this field is optional.</li>
<li><strong>Surname</strong>: one or more words made only of alphabetical letters and apostrophes.</li>
<li><strong>Email</strong>: made of 3 parts. The first consists of alphanumeric characters, underscores (_) and periods (.). After the @ sign comes the second, which consists of alphanumeric characters and dashes. It is followed by a period, which is always followed by 2 to 4 alphabetical letters. This field is compulsory.</li>
<li><strong>Phone number</strong>: two groups of digits separated by a dash.</li>
</ul>
</blockquote>
<p><strong>Each of these fields follows a specific regular pattern, and each pattern can be described by an expression</strong>. These are the expressions:</p>
<blockquote><ul>
<li><strong>name</strong>: [a-zA-Z]*</li>
<li><strong>surname</strong>: [a-zA-Z' ]+</li>
<li><strong>email</strong>: [a-zA-Z0-9_\.]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,4}</li>
<li><strong>phone number</strong>: [0-9]+\-[0-9]+</li>
</ul>
</blockquote>
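<p>The four expressions above can be tried out with a few lines of code. The following is only a minimal sketch in Python (our choice of language, the tutorial itself is tool-independent); the function name <strong>validate</strong> and the sample form values are made up for illustration, and the email suffix uses {2,4} to follow the &#8220;2 to 4 letters&#8221; rule given in the field description.</p>

```python
import re

# Patterns taken from the list above. Assumption: the email suffix is
# written {2,4}, as the field description asks for 2 to 4 letters.
FIELD_PATTERNS = {
    "name": r"[a-zA-Z]*",
    "surname": r"[a-zA-Z' ]+",
    "email": r"[a-zA-Z0-9_\.]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,4}",
    "phone": r"[0-9]+\-[0-9]+",
}


def validate(form):
    """Return the names of the fields whose value does not match its pattern."""
    failed = []
    for field, pattern in FIELD_PATTERNS.items():
        # fullmatch requires the whole value to match, not just a part of it.
        if re.fullmatch(pattern, form.get(field, "")) is None:
            failed.append(field)
    return failed


# Hypothetical sample data, not taken from the article.
good = {"name": "Phil", "surname": "Mc Donald",
        "email": "phil_99@example.com", "phone": "0004-578907"}
bad = {"name": "Phil2", "surname": "",
       "email": "phil@example", "phone": "0004578907"}

print(validate(good))
print(validate(bad))
```

<p>Matching the whole value matters here: a plain search would accept any value that merely contains a valid part.</p>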
<p align="center">- - -</p>
<p><strong>CLASSES</strong></p>
<p>The <strong>[]</strong> operator is made of two square brackets. Inside this <a href="http://en.wikipedia.org/wiki/Metacharacter">metacharacter</a> you can insert several characters. <strong>It matches a single occurrence of any one of the characters listed inside it</strong>, whether they are written as normal characters or as constants: the set of characters defined through this operator is called a <strong>class</strong>. For example, the class <strong>[a]</strong> represents a single occurrence of the character <strong>a</strong>: it lets you check whether that character appears in a string and, if so, operate on it. Likewise, the class <strong>[abcd]</strong> represents a single occurrence of any one of the four characters inside it: it lets you check whether any of them appears in a string and operate on them.</p>
<p align="center">- - -</p>
<p><strong>RANGE OPERATOR</strong></p>
<p><strong>-</strong> is an operator that identifies a range inside a class, for example:</p>
<blockquote><ul>
<li><strong>a-z</strong> for all the lower case letters</li>
<li><strong>A-Z</strong> for all the upper case letters</li>
<li><strong>0-9</strong> for all the digits.</li>
</ul>
</blockquote>
<p>Apart from these classic ranges, you can always define your own. For example, <strong>a-f</strong> contains all the lower case letters from <strong>a</strong> to <strong>f</strong>, which is very useful for describing hexadecimal numbers.<br />
The class <strong>[a-fA-F0-9]</strong> matches all the <strong>digits</strong> and the letters from <strong>a</strong> to <strong>f</strong> (lower case and upper case), that is, all the characters that can appear in a hexadecimal number.</p>
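<p>To see that a class on its own matches one character at a time, here is a small Python sketch. The sample text is invented for the example.</p>

```python
import re

# The hexadecimal class from the paragraph above: one character per match.
HEX_DIGIT = re.compile(r"[a-fA-F0-9]")

# Hypothetical sample text.
text = "Color #1aF is hex"

# findall returns every single character that belongs to the class.
found = HEX_DIGIT.findall(text)
print(found)
```

<p>Note that the class also picks up letters such as the C of &#8220;Color&#8221; and the e of &#8220;hex&#8221;: a class knows nothing about context, it only checks one character at a time.</p>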
<p align="center">- - -</p>
<p><strong>CLASS REPETITION</strong></p>
<p>Now we are going to describe the <strong>class repetition operators</strong>.</p>
<p>The first one we are going to look at is the star <strong>*</strong>. It matches a class repeated any number of times in a string (zero included) and selects all the consecutive occurrences.<br />
For example, the regular expression <strong>[a-z]*</strong> selects all the consecutive occurrences of lower case letters in a string, as shown here:</p>
<p>I <strong>have got</strong> 7 <strong>telephone numbers</strong>, <strong>but this is my cell</strong>-<strong>phone</strong>: 0004578907</p>
<p>This operator also accepts an empty match as a valid result, so we use it to validate the <strong>NAME</strong> field: it can be empty, but if it is not, it must be made of one word only. The regular expression for it is: <strong>[a-zA-Z]*</strong></p>
<p>That expression covers all the alphabetical letters, <strong>a-z</strong> and <strong>A-Z</strong>.<br />
Very similar to the star is the plus <strong>+</strong> operator. It works in the same way, but it requires the class to be repeated one or more times. We use it for the <strong>SURNAME</strong> field, which can contain one or more words separated by spaces. The regular expression for it is: <strong>[a-zA-Z' ]+</strong></p>
<p>Another operator is made of two braces <strong>{}</strong>, which can contain a single number, as in <strong>{3}</strong>, or a numerical range, as in <strong>{12,58}</strong>. The first matches exactly 3 repetitions of characters that belong to the class. The second matches from 12 to 58 repetitions of characters that belong to the class.</p>
<p>For example, <strong>[0-9]{3,4}\-[0-9]{7}</strong> matches all the telephone numbers made of an area code of 3 or 4 digits and a suffix of 7 digits.</p>
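<p>The three repetition operators can be compared side by side. This is a minimal Python sketch; the phone numbers are invented, and <strong>+</strong> is used instead of <strong>*</strong> for the word search only to keep empty matches out of the result list.</p>

```python
import re

sentence = "I have got 7 telephone numbers, but this is my cell-phone: 0004578907"

# + needs at least one character, so only real runs of lower case letters
# are returned (the upper case "I" and the digits are skipped).
words = re.findall(r"[a-z]+", sentence)

# * also accepts zero characters, so an empty value is a valid match.
empty_ok = re.fullmatch(r"[a-zA-Z]*", "") is not None

# {3,4} and {7} bound the number of repetitions.
PHONE = r"[0-9]{3,4}\-[0-9]{7}"
valid_phone = re.fullmatch(PHONE, "0004-5789071") is not None
short_prefix = re.fullmatch(PHONE, "04-5789071") is not None

print(words)
print(empty_ok, valid_phone, short_prefix)
```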
<p align="center">- - -</p>
<p><strong>BACKSLASH</strong></p>
<p>The last example also used another operator, the backslash \. Placed before an operator, it makes the regex treat that operator as a normal character; placed before a letter, it turns the letter into a constant. The dash is used to indicate a range, so if we want to use it as a character we have to write it this way: <strong>\-</strong></p>
<p>Now you can understand the regex that we used to verify the email:</p>
<blockquote><ul>
<li><strong>[a-zA-Z0-9_\.]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,4}</strong></li>
</ul>
</blockquote>
<p>And the one for the telephone number:</p>
<blockquote><ul>
<li><strong>[0-9]+\-[0-9]+</strong></li>
</ul>
</blockquote>
<p align="center">- - -</p>
<p><strong>REPETITION OPERATOR&#8217;S SPECIFIC CHARACTERISTIC</strong></p>
<p>One characteristic of the repetition operators is that they select as much text as the expression allows. This can sometimes be counterproductive. Suppose we want to remove all the tags from an HTML page and try the following regular expression:</p>
<blockquote><ul>
<li><strong>&lt;.+&gt;</strong></li>
</ul>
</blockquote>
<p>This regex selects a consecutive series of characters inside a string: a <strong>&lt;</strong>, followed by one or more characters of any kind, followed by a <strong>&gt;</strong>. Applied to the following string, it therefore selects everything shown in bold:</p>
<blockquote><ul>
<li><strong>&lt;HTML&gt;&lt;HEAD&gt; &lt;TITLE&gt; REGULAR EXPRESSIONS &lt;/TITLE&gt; &lt;/HEAD&gt; &lt;BODY&gt; &lt;/BODY&gt; &lt;/HTML&gt;</strong></li>
</ul>
</blockquote>
<p>Within a line, it takes everything between the first <strong>&lt;</strong> character and the last <strong>&gt;</strong> character.</p>
<p>If this is not what we want, we need to use one of the following methods:</p>
<blockquote><ul>
<li>1. <strong>&lt;.+?&gt;</strong></li>
<li>2. <strong>&lt;[^&lt;&gt;]+&gt;</strong></li>
</ul>
</blockquote>
<p>The first one makes the repetition operator less greedy, so that it stops at the first closing character it finds.</p>
<p>The second matches a series of characters that starts with <strong>&lt;</strong>, followed by any characters other than <strong>&lt;</strong> and <strong>&gt;</strong>, followed by a <strong>&gt;</strong>.</p>
<p>Applied to the previous string, either regex selects each tag separately:</p>
<blockquote><ul>
<li><strong>&lt;HTML&gt;&lt;HEAD&gt; &lt;TITLE&gt;</strong> REGULAR EXPRESSIONS <strong>&lt;/TITLE&gt; &lt;/HEAD&gt; &lt;BODY&gt; &lt;/BODY&gt; &lt;/HTML&gt;</strong></li>
</ul>
</blockquote>
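<p>The difference between the greedy expression and the two alternatives is easy to check in code. The sketch below uses Python and the same sample string as above; the second alternative is written without any space inside the class, so that <strong>^</strong> negates it.</p>

```python
import re

html = ("<HTML><HEAD> <TITLE> REGULAR EXPRESSIONS </TITLE> </HEAD> "
        "<BODY> </BODY> </HTML>")

# Greedy: .+ runs to the last > in the string, so there is one big match.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first > it meets.
lazy = re.findall(r"<.+?>", html)

# Negated class: anything except < and > between the two delimiters.
negated = re.findall(r"<[^<>]+>", html)

# Removing the tags with the negated-class version leaves only the text.
stripped = re.sub(r"<[^<>]+>", "", html)

print(len(greedy), len(lazy), len(negated))
print(" ".join(stripped.split()))
```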
<p align="center">- - -</p>
<p><strong>CLASS NEGATION</strong></p>
<p>Let&#8217;s focus on a different problem. Suppose we have a story and we need to identify all the sentences in it. If the period is used only at the end of sentences, negating a class makes it easy to identify a sentence.</p>
<blockquote><ul>
<li><strong>[^\.]+</strong></li>
</ul>
</blockquote>
<p>When the <strong>^</strong> sign is placed immediately after the opening bracket of a class, it negates the class. In our case the regex therefore matches a consecutive run of characters that are not periods. In other words, it matches a sentence.</p>
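<p>A short Python sketch of the idea follows. The story text is invented, and the call to strip() is an extra step of ours to drop the space that follows each period.</p>

```python
import re

# Hypothetical story in which periods only end sentences.
story = "The dog barks. The cat sleeps. Nobody moves."

# Each match is a run of characters that are not periods.
sentences = [s.strip() for s in re.findall(r"[^\.]+", story)]

print(sentences)
```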
<p align="center">- - -</p>
<p><strong>THE PERIOD</strong></p>
<p>The period is a constant: when it is inserted in a regex, it is equivalent to a class that contains every character except the &#8220;new line&#8221;.</p>
<p>Here is an example to better understand how the period works:</p>
<blockquote><ul>
<li><strong>c.s.</strong></li>
</ul>
</blockquote>
<p>This regex matches every 4-character sequence that starts with a c, followed by any character, followed by an s, followed by any character. It matches combinations such as:</p>
<blockquote><ul>
<li><strong>case</strong></li>
<li><strong>casa</strong></li>
<li><strong>cosa</strong></li>
<li><strong>cose</strong></li>
<li><strong>c%s9</strong></li>
<li><strong>c£sl</strong></li>
</ul>
</blockquote>
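<p>The behaviour of the period, including the new line exception, can be verified with a few candidates. This is only a Python sketch; the last two candidates are ours and are there to show what does not match.</p>

```python
import re

# The last two entries are counter-examples added for illustration.
candidates = ["case", "casa", "cosa", "cose", "c%s9", "cars", "c\nse"]

# "cars" fails because its third character is not an s;
# "c\nse" fails because the period does not match a new line.
matches = [word for word in candidates if re.fullmatch(r"c.s.", word)]

print(matches)
```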
<p align="center">- - -</p>
<p><strong>ALTERNANCY OPERATOR</strong></p>
<p>Another very useful operator is the pipe <strong>|</strong>, which works as a logical OR. For example, the regex <strong>george|stuart</strong> matches the word george or the word stuart inside a string:</p>
<blockquote><ul>
<li>Both <strong>george</strong> and <strong>stuart</strong> are famous SEOs, but <strong>george</strong> has a forum, <strong>stuart</strong> has a web agency.</li>
</ul>
</blockquote>
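<p>In Python (used here only as an example language) the same alternation returns the matches in the order in which they appear in the sentence:</p>

```python
import re

text = ("Both george and stuart are famous SEOs, "
        "but george has a forum, stuart has a web agency.")

# Each match is whichever alternative is found at that position.
found = re.findall(r"george|stuart", text)

print(found)
```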
<p align="center">- - -</p>
<p><strong>ANCHORS</strong></p>
<p>Another problem arises when you need to modify one or more elements inside a CSV (comma-separated values) database, a textual database in which fields are separated by commas and records by new lines. The following database is an example that records the daily AdSense earnings of three friends.</p>
<blockquote><ul>
<li>12€,50€,70€</li>
<li>30€,46€,68€</li>
<li>15€,52€,73€</li>
<li>16€,30€,85€</li>
</ul>
</blockquote>
<p>If one day one of the friends were banned from AdSense, his data would no longer be useful and it might be necessary to remove them. This example contains very little data, so a manual change would be easy. With thousands of rows, a regex would be the fastest solution. If the banned friend&#8217;s data are in the third column, the fastest way to remove them is to delete every match of the following regex:</p>
<blockquote><ul>
<li><strong>,[0-9]*€$</strong></li>
</ul>
</blockquote>
<p>The <strong>$</strong> character does not match any character but a position: the end of a line. The regex above therefore finds every series of consecutive characters that starts with a comma, followed by some digits, followed by the €, followed by the end of a line.</p>
<p>Similarly, you can identify the beginning of a line with the <strong>^</strong> character. It has to be used carefully, because inside a class it negates the class: always remember to use it outside a class. The <strong>$</strong> operator must also be used outside a class; inside a class it is treated as a normal character.</p>
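<p>Here is how the third column could be removed in Python. This is a sketch under two assumptions: the rows contain no space after the commas, and the whole database is held in one string, which is why the MULTILINE flag is needed (without it, Python&#8217;s <strong>$</strong> only matches at the end of the whole string).</p>

```python
import re

# Assumption: fields are separated by a bare comma, one record per line.
csv_data = "12€,50€,70€\n30€,46€,68€\n15€,52€,73€\n16€,30€,85€"

# With re.MULTILINE, $ matches at the end of every line, so the last
# field of each record (comma included) is deleted.
cleaned = re.sub(r",[0-9]*€$", "", csv_data, flags=re.MULTILINE)

print(cleaned)
```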
<p align="center">- - -</p>
<p><strong>GROUPS</strong></p>
<p>We can treat a series of characters as a single <strong>group</strong> and <strong>operate on it using some of the regex operators</strong>. Suppose we need to find in a text a code of unknown length, made of 5 digits followed by a letter, followed by 5 more digits and another letter, and so on, until it ends at a new line. To find this code we need a group. In this example the group is made of a class of digits repeated five times, followed by a class of letters. The group has to be repeated at least once and must end at the end of the line. It can be written as:</p>
<blockquote><ul>
<li><strong>([0-9]{5}[a-zA-Z])+$</strong></li>
</ul>
</blockquote>
<p>Applied to the following lines, the regex selects the parts in bold:</p>
<blockquote><ul>
<li>My secret code is <strong>12345T45345R12343F34567j</strong></li>
<li>Phil&#8217;s secret code is <strong>34526g54638j92725K63723H72829D12345I</strong></li>
<li>12345T45345R12343F34567j is not Phil&#8217;s code.</li>
</ul>
</blockquote>
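<p>The three lines above can be checked with a short Python sketch. The variable names are ours; each line is searched on its own, so <strong>$</strong> stands for the end of that line.</p>

```python
import re

CODE = re.compile(r"([0-9]{5}[a-zA-Z])+$")

lines = [
    "My secret code is 12345T45345R12343F34567j",
    "Phil's secret code is 34526g54638j92725K63723H72829D12345I",
    "12345T45345R12343F34567j is not Phil's code.",
]

selected = []
for line in lines:
    match = CODE.search(line)
    # group(0) is the whole selected text; None means nothing was selected.
    selected.append(match.group(0) if match else None)

print(selected)
```

<p>The third line gives no match because the code is not at the end of the line.</p>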
<p align="center">- - -</p>
<p><strong>BACKREFERENCES</strong></p>
<p>We may need to rearrange different parts of the text inside a string. For example, suppose we have a CSV database of 5 columns and 10000 rows with an error: the second and fourth columns have been swapped. Fixing this manually would take hours and hours, but with a regex we can solve the problem in less than 5 seconds.</p>
<p><strong>One property of groups is that the text each of them selects is stored in a variable, so that it can be used in a substitution phase</strong>. For example, we need to create 5 groups that select the fields in each row of our CSV. Let&#8217;s assume the database is structured as follows:</p>
<blockquote><ul>
<li>1,45,589,phil,bob</li>
<li>2,56,79,mary,bob</li>
<li>3,57,89,phil,frank</li>
<li>..,..,..,..,..</li>
</ul>
</blockquote>
<p>We can use the following regex to select each of the single fields inside a row:</p>
<blockquote><ul>
<li><strong>([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)$</strong></li>
</ul>
</blockquote>
<p>With this regex each field is stored in a variable: the first variable holds the first field, the second holds the second field, and so on. We only need to replace the selected text with a new structure (1,4,3,2,5) to get the result we want.</p>
<p>There is a small complication, though: the way to retrieve the variables differs from tool to tool.</p>
<p><strong>Htaccess</strong>, <strong>dreamweaver</strong> and <strong>PERL</strong> retrieve the variables using the <strong>$</strong> character. Example: <strong>$1</strong> retrieves the first one, <strong>$2</strong> retrieves the second one. Furthermore, <strong>$0</strong> usually retrieves the match of the whole regex (Perl itself uses <strong>$&amp;</strong> for that). With the previous example we would use this row as the replacement:</p>
<blockquote><ul>
<li><strong>$1,$4,$3,$2,$5</strong></li>
</ul>
</blockquote>
<p><strong>EditPad pro</strong> and <strong>PowerGREP</strong> retrieve the variables using the <strong>\</strong> character. Example: <strong>\1</strong> retrieves the first one, <strong>\2</strong> retrieves the second one. Furthermore, <strong>\0</strong> retrieves the match of the whole regex. With the previous example we would use the following expression as the replacement:</p>
<blockquote><ul>
<li><strong>\1,\4,\3,\2,\5</strong></li>
</ul>
</blockquote>
<p><strong>.NET</strong>, <strong>Javascript</strong>, <strong>PHP</strong>, <strong>etc.</strong> each retrieve the variables in their own way, and we advise you to read their guides.</p>
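<p>As one concrete case, Python&#8217;s re module uses the backslash form in the replacement text. The sketch below swaps the second and fourth columns of the sample rows. Two details are our own adaptation: the classes also exclude the new line and the pattern starts with <strong>^</strong>, so that each match stays within one row when the whole database is a single string.</p>

```python
import re

rows = "1,45,589,phil,bob\n2,56,79,mary,bob\n3,57,89,phil,frank"

# Five groups, one per field. Excluding \n keeps a field from running
# over a line break (an adaptation for multi-line input).
ROW = r"^([^,\n]+),([^,\n]+),([^,\n]+),([^,\n]+),([^,\n]+)$"

# \1 ... \5 retrieve the stored fields; 2 and 4 are swapped.
fixed = re.sub(ROW, r"\1,\4,\3,\2,\5", rows, flags=re.MULTILINE)

print(fixed)
```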
<p><strong>WARNING</strong>: if you use repetition to repeat a whole group, the variable refers to a single repetition of the group and not to the whole repeated sequence. If you use the regex<strong> ([0-9]{5}[a-zA-Z])+$</strong> to select the code in this text</p>
<p>My secret code is 12345T45345R12343F<strong>34567j</strong></p>
<p>variable 1 will hold only the last repetition of the group (in bold) and not the whole code. This happens because the repetition is outside the backreference. To solve the problem, make the repeated group a group without backreference (so that it is not saved) and set up the backreference around the whole repetition:</p>
<blockquote><ul>
<li><strong>((?:[0-9]{5}[a-zA-Z])+)$</strong></li>
</ul>
</blockquote>
<p>In this regex you can notice the particular structure <strong>(?: )</strong>, which contains the two classes. This structure is a group without backreference; we can apply a repetition to it and store the result with the outer group.<br />
Now variable 1 holds the following code (in bold):</p>
<p>My secret code is <strong>12345T45345R12343F34567j</strong></p>
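<p>The effect is easy to observe in Python, where a repeated group keeps only its last repetition. The sketch below compares the plain repeated group with a version in which a group without backreference is wrapped in an outer group.</p>

```python
import re

text = "My secret code is 12345T45345R12343F34567j"

# Repetition outside the group: variable 1 keeps one repetition only.
repeated = re.search(r"([0-9]{5}[a-zA-Z])+$", text)

# Non-capturing inner group, capturing outer group: variable 1 is the
# whole repeated sequence.
whole = re.search(r"((?:[0-9]{5}[a-zA-Z])+)$", text)

print(repeated.group(0), repeated.group(1))
print(whole.group(1))
```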
<p align="center">- - -</p>
<p><strong>QUESTION MARK</strong></p>
<p>In groups, the question mark can be used to avoid storing the match. We have also seen that the question mark can be used to restrict repetitions. Now we will see that this simple character has many other functions.</p>
<p>The first function makes a group optional, as you can see in the following example:</p>
<blockquote><ul>
<li><strong>michael( owen)?</strong></li>
</ul>
</blockquote>
<p>In this regex the group <strong>( owen)</strong> is made optional, so the regex matches both the single word <strong>michael</strong> and the pair of words <strong>michael owen</strong>.</p>
<p>The second function is to act as an <strong>anchor</strong>. As we have already seen, operators such as <strong>^</strong> and <strong>$</strong> are anchors: they match a position inside a string. The question mark can also be used in groups to build an anchor, so that the group matches a position inside the text. Example:</p>
<blockquote><ul>
<li><strong>michael(?= owen)</strong></li>
</ul>
</blockquote>
<p>This regex selects the word <strong>michael</strong> only if it is followed by the group <strong>( owen)</strong>, which is not itself selected. Examples:</p>
<blockquote><ul>
<li><strong>michael</strong> owen</li>
<li>michael</li>
<li>today owen scored a goal</li>
<li>yesterday <strong>michael</strong> owen didn&#8217;t score</li>
</ul>
</blockquote>
<p>You can also use the question mark to check that something is absent at a position. For example, the following regex selects the word <strong>michael</strong> only if it&#8217;s not followed by the group <strong>( owen)</strong>:</p>
<blockquote><ul>
<li><strong>michael(?! owen)</strong></li>
</ul>
</blockquote>
<p>Example:</p>
<blockquote><ul>
<li>michael owen</li>
<li><strong>michael</strong></li>
<li>today owen scored a goal</li>
<li>yesterday michael owen didn&#8217;t score</li>
</ul>
</blockquote>
<p>The two structures we have just described work only when the anchor follows the text (or group or class) that has to be selected. If the anchor is placed before the selected text, we have to use two other structures, the first to verify the presence of the anchor, the second to verify its absence:</p>
<blockquote><ul>
<li><strong>(?&lt;=owen )michael</strong></li>
<li><strong>(?&lt;!owen )michael</strong></li>
</ul>
</blockquote>
<p>Basically the character <strong>&lt;</strong> is inserted after the question mark.
</p>
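<p>The four anchor structures can be tested on the example lines. This is a minimal Python sketch; the space is written inside each anchor so that it lines up with the text, and the two short strings used for the &#8220;before&#8221; versions are invented.</p>

```python
import re

lines = [
    "michael owen",
    "michael",
    "today owen scored a goal",
    "yesterday michael owen didn't score",
]

# Anchor after the text: present (?= ) and absent (?! ).
followed = [bool(re.search(r"michael(?= owen)", line)) for line in lines]
not_followed = [bool(re.search(r"michael(?! owen)", line)) for line in lines]

# Anchor before the text: present (?<= ) and absent (?<! ).
after_owen = re.search(r"(?<=owen )michael", "owen michael")
blocked = re.search(r"(?<!owen )michael", "owen michael")
allowed = re.search(r"(?<!owen )michael", "hello michael")

print(followed)
print(not_followed)
print(after_owen.group(0), blocked, allowed.group(0))
```

<p>In every case only the word michael is selected; the anchor text is checked but never becomes part of the match.</p>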
]]></content:encoded>
			<wfw:commentRSS>http://en.kerouac3001.com/regex-tutorial-8.htm/feed/</wfw:commentRSS>
		<feedburner:origLink>http://en.kerouac3001.com/regex-tutorial-8.htm</feedburner:origLink></item>
		<item>
		<title>The Grey</title>
		<link>http://feedproxy.google.com/~r/GreyHatSeoBlog/~3/AwHtGV7m6gs/the-grey-7.htm</link>
		<comments>http://en.kerouac3001.com/the-grey-7.htm#comments</comments>
		<pubDate>Thu, 30 Nov 2006 12:19:24 +0000</pubDate>
		<dc:creator>kerouac3001</dc:creator>
		
	<category>Grey Hat Blogs</category>
		<guid isPermaLink="false">http://en.kerouac3001.com/the-grey-7.htm</guid>
		<description><![CDATA[As Whitman&#8217;s words suggest, this blog will take an impartial look at the SEO world, setting aside any moral implication, looking at evil and at its opposite, staying indifferent to both of them. No prejudice, no intention to respect other people&#8217;s rules, no intention to break the rules unless necessary, no shame to [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" title="Grey hat seo" alt="Grey hat seo" src="http://www.kerouac3001.com/grey_hat_seo.jpg" />As <strong>Whitman</strong>&#8217;s words suggest, this blog will take an impartial look at the SEO world, setting aside any moral implication, <em>looking at evil and at its opposite, staying indifferent to both of them</em>. No prejudice, no intention to respect other people&#8217;s rules, no intention to break the rules unless necessary, no shame in investigating by whatever means. <strong>This blog is made to study</strong>. Everything discussed in these pages has the sole aim of investigating the mystery of the search engines: no profit goals, nobody wants to cheat users.</p>
<p>For these reasons I have chosen the title <strong>Grey Hat SEO</strong>: a grey SEO, equidistant from good and evil, <em>indifferent to both</em>.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://en.kerouac3001.com/the-grey-7.htm/feed/</wfw:commentRSS>
		<feedburner:origLink>http://en.kerouac3001.com/the-grey-7.htm</feedburner:origLink></item>
		<item>
		<title>Markov Chains [Spam that Search Engines like - Pt 1]</title>
		<link>http://feedproxy.google.com/~r/GreyHatSeoBlog/~3/3HGfw0jxMgI/markov-chains-spam-that-search-engines-like-pt-1-5.htm</link>
		<comments>http://en.kerouac3001.com/markov-chains-spam-that-search-engines-like-pt-1-5.htm#comments</comments>
		<pubDate>Thu, 02 Nov 2006 15:21:26 +0000</pubDate>
		<dc:creator>kerouac3001</dc:creator>
		
	<category>Grey Hat Blogs</category>
		<guid isPermaLink="false">http://en.kerouac3001.com/markov-chains-spam-that-search-engines-like-pt-1-5.htm</guid>
		<description><![CDATA[With this post I am starting a series of articles dealing with topics related to spam, as seen from a Grey Hat SEO&#8217;s point of view.
Let this be quite clear: I am not going to provide you with code to develop spam engines, but I am going to use what I&#8217;ve learnt through these engines [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" title="Oh no, we're being spammed!" alt="Oh no, we're being spammed!" src="http://www.kerouac3001.com/spam.jpg" />With this post I am starting a series of <strong>articles</strong> dealing with topics related to <strong>spam</strong>, as seen from a Grey Hat SEO&#8217;s point of view.</p>
<p>Let this be quite clear: I am not going to provide you with code to develop <strong>spam engines</strong>, but I am going to use what I&#8217;ve learnt through these engines <strong>in order to provide you with theories and/or information </strong>that can be implemented both in white hat and black hat sites.</p>
<p>It&#8217;s up to you how to make best use of this knowledge and it&#8217;s also up to you to prove me wrong whenever you want to.</p>
<p>Enough of digressions, let&#8217;s start&#8230;</p>
<p>(<em>please read slowly or you&#8217;ll have to learn about <a href="http://en.wikipedia.org/wiki/Markov_chains">Markov Chains</a> on Wikipedia</em>)</p>
<p><strong><img align="right" title="Giorgio Taverniti" alt="Giorgio Taverniti" src="http://www.kerouac3001.com/gtfluxx.jpg" />Let&#8217;s suppose we are in a town of 100,000 inhabitants where everyone meets, during his/her life, only a certain number of people from that town, each of them at a different frequency. And let&#8217;s suppose that I am an inhabitant of this strange place and that one day I will meet by chance a good friend of mine called Giorgio.</strong></p>
<p><strong><img align="left" title="Stefano Gorgoni" alt="Stefano Gorgoni" src="http://www.kerouac3001.com/unclesem.jpg" />This is nothing new</strong>; on the contrary, I have to say that I <strong>do meet</strong> this friend of mine <strong>very often</strong>, <strong>almost once a day </strong>to be precise. But today something has happened that <strong>does not happen very often</strong>: we met in front of a bar at 8am and there was no-one there. So we decided to go to <strong>Stefano&#8217;s place</strong>, a friend of ours, who lives not far away from this bar. <strong>I do not see him very often, but Giorgio knows him very well</strong>: he has known him for a couple of years and he says they meet up at least <strong>five times a week</strong>. Now that I think about it, I have never happened to meet Stefano alone: <strong>the few times I&#8217;ve seen him Giorgio was almost always with us</strong>.</p>
<p><img align="right" title="Stuart Delta" alt="Stuart Delta" src="http://www.kerouac3001.com/stustu.jpg" />Anyway, we get to his place in a couple of minutes and ring the intercom. We are told to wait a little, as Stefano is getting ready; he also says we have to go and see a certain <strong>Stuart Delta</strong>, a friend of his who called a few minutes earlier to say he&#8217;s moving and to ask whether Stefano could help him. Stefano says hello and that he will be ready right away, and I ask Giorgio who this Stuart is. He replies that he has only just got to know him, adding that <strong>he is a very good friend of Stefano&#8217;s</strong>, that they are almost always together and that the three of them go out every Saturday night: <strong>he&#8217;s the person Giorgio sees the most when he goes out with Stefano.</strong><br />
But here comes Stefano, apologising for taking so long. We are told that Stuart lives just a bit further on and that it would be nice of us to help him move.</p>
<p><img align="left" title="Enrico Altavilla" alt="Enrico Altavilla" src="http://www.kerouac3001.com/low-rayban.png" />We get to his place and a few minutes later <strong>Enrico, a very good friend of Stefano&#8217;s and Stuart&#8217;s</strong>, joins us: he has come to help. To my great surprise I find out that he is an <strong>old friend of mine </strong>and I tell him I was not expecting to see him there. He says &#8220;Hi&#8221; and asks me how come I am at Stuart&#8217;s  and then, without waiting for a reply, turns around and <strong>shakes hands with Giorgio, introducing himself</strong>.</p>
<p><strong><em>Markov chains rely on one or more starting parameters</em></strong> (Giorgio and Me) <strong><em>to obtain one or more final parameters, according to the probability that the starting parameters are connected to the final parameter</em></strong> (Stefano): <em>I do not know Stefano very well, Giorgio often sees him, I almost always meet him when Giorgio is with us: Me -/ Giorgio -/ Stefano.</em></p>
<p>Giorgio does not know Stuart very well, Stefano often sees him, <strong>that&#8217;s why he and Giorgio are meeting Stuart today</strong>, because, as I said, <strong>he is the person they see the most when they are together</strong>, so: <em>Giorgio -/ Stefano -/ Stuart.</em></p>
<p>And this is why I have met Stuart today, because: Me -/ Giorgio -/ Stefano -/ Stuart.</p>
<p>And, finally, <strong>the fact that I know Enrico is not decisive for him coming to Stuart&#8217;s place</strong>. He actually did not even take too much notice of me. <strong>Giorgio had never met him, anyway</strong>.</p>
<p><img align="right" title="Jacopo Gonzales" alt="Jacopo Gonzales" src="http://www.kerouac3001.com/jacopo.jpg" />And yet I might meet <strong>Jacopo</strong> tomorrow: a friend of mine, of Giorgio&#8217;s and Stefano&#8217;s. We might meet him just because we are together (Giorgio, Stefano and Me). Or maybe one day we decide to meet him, only Giorgio and Me. Or maybe I will meet him alone.</p>
<p><strong>With a Markov chain, it is not necessary to have a certain number of starting parameters to obtain a final parameter. The final parameter, though, is likely to be generated by the starting parameters, and the probability depends on these.</strong></p>
<p>Don&#8217;t worry if it&#8217;s not all clear: I will give more examples, and as we go further into the issue you will find out why it is interesting both for Black Hat SEOs and for White Hat SEOs.</p>
<p><strong><img align="left" title="Markov Chains Graph" alt="Markov Chains Graph" src="http://www.kerouac3001.com/grafo-markov.jpg" />Let&#8217;s now suppose we have a start set </strong>(it was the town in the above example) <strong>where the two following cases are established: an ordered relationship among the elements </strong>(in the above example it was: Giorgio and Me; Stefano and Me, Enrico and Me; Stuart and Me; Giorgio and Me; Giorgio and Stefano; Giorgio and Stuart; Giorgio and Enrico; ..), <strong>and an ordered relationship between an ordered subset of elements and one element</strong> (In the town example it was: [Me and Giorgio] -/ Stefano is different from [Giorgio and Me] -/ Stefano).</p>
<p><strong>Also, let&#8217;s suppose that every kind of relationship is coupled with a frequency ranging from 0 to 1.</strong></p>
<p>Example (forget about Me, Stefano, Giorgio and so on.. let&#8217;s now use letters only and let&#8217;s define a new type of relationship):</p>
<ol>
<li>I -/ G = 1/2;</li>
<li>G -/ I = 1/3;</li>
<li>I -/ S = 1/7;</li>
<li>S -/ G = 1/4;</li>
<li>G -/ S = 1/6;</li>
<li>[G -/ I] -/ S = 1/20;</li>
<li>[I -/ G] -/ S = 1/10</li>
</ol>
<p>As you can see now, unlike in the town example,<strong> the frequency depends on the order of the elements, too</strong>. But don&#8217;t worry, let&#8217;s get rid of letters and give an example with words.</p>
<p><img align="left" title="Percentage Graph" alt="Percentage Graph" src="http://www.kerouac3001.com/markov.png" />Let&#8217;s suppose we have a text document dealing with dogs. <strong>This document is our set, whereas the words composing it are the elements</strong>.</p>
<p><strong>Each word in the document will be followed by another with a certain frequency </strong>(the number of cases of the ordered couple divided by the number of cases of the first word). For example, the word <strong>the</strong> is followed in the text by the word <strong>dog</strong> with a frequency of 1/20, as there are 200 cases of the word <strong>the</strong> and 10 of the ordered couple <strong>the dog</strong>. To make clear what I mean by <em>ordered couple</em>: <strong>the dog</strong> is a couple formed by <strong>the</strong> and by <strong>dog</strong>, but it is different from the couple <strong>dog the</strong>; in fact, the couple <strong>the dog</strong> can have a different <em>frequency</em> from the couple <strong>dog the</strong>.</p>
<p>Also, in our document <strong>every ordered couple of words is followed by another word with a certain frequency</strong>; for example, in our document only these words follow the ordered couple <strong>the dog</strong>:</p>
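<p>To make the calculation concrete, here is a minimal Python sketch of that frequency. The function name <em>follow_frequency</em> and the sample sentence are my own illustration, not taken from any real tool; the only thing assumed from the text is the formula (cases of the ordered couple divided by cases of the first word).</p>

```python
from collections import Counter
from fractions import Fraction


def follow_frequency(words, first, second):
    """Frequency with which `second` directly follows `first` in `words`."""
    # Count every ordered couple of adjacent words.
    pair_counts = Counter(zip(words, words[1:]))
    # Count every single word.
    word_counts = Counter(words)
    if word_counts[first] == 0:
        return Fraction(0)
    return Fraction(pair_counts[(first, second)], word_counts[first])


# Hypothetical sample text, just for illustration.
words = "the dog is white and the cat is black".split()
the_dog = follow_frequency(words, "the", "dog")
dog_the = follow_frequency(words, "dog", "the")
print(the_dog, dog_the)
```

<p>In this tiny sample the couple <strong>the dog</strong> and the couple <strong>dog the</strong> get different frequencies, which is exactly why the order matters.</p>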
<ol>
<li>is</li>
<li>white</li>
<li>Snoopy</li>
</ol>
<p><img align="right" title="Snoopy" alt="Snoopy" src="http://www.kerouac3001.com/snoopy.gif" />The word <strong>is</strong> follows the couple <strong>the dog</strong> with a frequency of 1/2; in fact, there are 5 cases of the ordered triple <strong>the dog is</strong> in the document, out of a total of 10 cases of the couple <strong>the dog</strong>. The word <strong>white</strong> follows the couple <strong>the dog</strong> with a frequency of 2/5 and the word <strong>Snoopy</strong> follows the couple  <strong>the dog</strong> with a frequency of 1/10.</p>
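<p>The same arithmetic can be sketched in Python. The counts below are the ones from the example (5, 4 and 1 out of 10 cases of <strong>the dog</strong>); the helper names <em>build_table</em> and <em>frequencies</em> are hypothetical, chosen just for this sketch. Note that I divide by the total number of followers, which equals the number of cases of the couple unless the couple is the very last thing in the text.</p>

```python
from collections import Counter, defaultdict
from fractions import Fraction


def build_table(words):
    """Map each ordered couple to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)][third] += 1
    return table


def frequencies(followers):
    """Turn raw follower counts into frequencies between 0 and 1."""
    total = sum(followers.values())
    return {word: Fraction(count, total) for word, count in followers.items()}


# Follower counts for the couple "the dog", as given in the example.
followers = Counter({"is": 5, "white": 4, "Snoopy": 1})
freqs = frequencies(followers)
for word, freq in freqs.items():
    print(word, freq)
```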
<p>Let&#8217;s suppose we know the <strong>frequency linking every couple of the document to each word of the document</strong>; we are talking about an ordered type of link. We will store this information in an array and define a <strong>start couple</strong>, randomly chosen among all couples.</p>
<p>If we have a big array available we can get a fairly long and correct text, even if it&#8217;s nonsense to us humans.</p>
<p>Let&#8217;s suppose our start couple is <strong>the dog</strong>.</p>
<p>If we look for it in the array we will find out that it can be followed by 3 words (<strong>is</strong>; <strong>white</strong>; <strong>Snoopy</strong>), each of them with a certain frequency. One of these words is chosen at random (<strong>is</strong>) and we continue.</p>
<p><img align="left" title="Black Hat &#038; White Hat" alt="Black Hat &#038; White Hat" src="http://www.kerouac3001.com/black-white-seo.jpg" />We now have the couple <strong>dog is</strong>; we look for it in the array and find out that it can be followed by 2 words (<strong>white</strong>, <strong>black</strong>). One of these two words is chosen randomly, based on the frequency (the word <strong>white</strong> is chosen), and we go on to look for the couple <strong>is white</strong> in the array &#8230;</p>
<p>Clearly, the text produced will reach a dead end in the following cases only:</p>
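<p>The walk just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the array is a dictionary from couples to follower counts, and the names <em>build_table</em> and <em>generate</em> and the <em>max_words</em> limit are hypothetical. The random choice is weighted by the counts, so a word that follows the couple more often is picked more often.</p>

```python
import random
from collections import Counter, defaultdict


def build_table(words):
    """Map each ordered couple to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)][third] += 1
    return table


def generate(table, start, max_words, rng=random):
    """Random walk over the table, starting from the couple `start`."""
    couple = start
    output = list(couple)
    while len(output) < max_words:
        followers = table.get(couple)
        if not followers:
            break  # no word follows this couple
        candidates = list(followers)
        weights = [followers[word] for word in candidates]
        # Pick the next word at random, weighted by how often it follows.
        next_word = rng.choices(candidates, weights=weights, k=1)[0]
        output.append(next_word)
        # Slide the window: the new couple is (second word, chosen word).
        couple = (couple[1], next_word)
    return output


# Hypothetical sample text, just for illustration.
source = "the dog is white the dog is black the dog Snoopy is white".split()
table = build_table(source)
print(" ".join(generate(table, ("the", "dog"), max_words=12)))
```

<p>The <em>max_words</em> limit is there only so that the walk always stops, even when the table contains a loop.</p>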
<ol>
<li>we look up a couple that is not associated with any following word (if it exists, there is only one such couple in each array)</li>
<li>we look up a couple A that is associated with a single word only, and the couple formed by A&#8217;s second word and that word is in turn associated with one word only, and so on, until we reach either a couple not associated with any word or a couple among those that necessarily follow couple A. In this second case, we are basically creating a loop.</li>
</ol>
<p><img align="right" title="White Hat SEO" alt="White Hat SEO" src="http://www.kerouac3001.com/whitehat.jpg" />OK, I have now explained to you what a Markov chain is, or at least I have explained it from the point of view of random text production. At this point spammers can stop reading. <strong>This is what my White Hat SEO friends might be interested in</strong>:</p>
<p>The question to be asked is: <strong>why do spammers use these texts?</strong> Because they are lazy? Or just because they want to reach many rankings right away and for many keywords without having to write the contents?<br />
Sure! 99% do it for that reason, but the remaining 1% are wondering: <strong>why do search engines &#8220;like&#8221; these texts?</strong></p>
<p>My idea is that these texts, if they are <strong>produced correctly</strong>, can interpret the <strong>trend</strong> of a SERP both from the point of view of the <strong>topic</strong> and of the <strong>language</strong>. This way, they climb up the very SERP, because <strong>search engines consider them appropriate.</strong></p>
<p><strong>Pages of this kind can serve as a good probe for roughly understanding the trend of a SERP and for working out how to create clean pages based on the results of the Markov pages.</strong></p>
<p>Ranking processes most likely rely on the mathematical/geometric idea of <strong>similarity.</strong> Therefore a search engine will consider appropriate a page that is <em>similar</em> to all the other pages that have been considered interesting inside a certain SERP and that is, at the same time, <em>different</em> enough from each one of these pages to get past the duplicate-content filters.</p>
<p>It can be deduced that Search Engines Trend can actually exist, at least as an ideal <strong>model</strong> a web page tends to, <strong>determining its ranking in a certain SERP</strong>.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://en.kerouac3001.com/markov-chains-spam-that-search-engines-like-pt-1-5.htm/feed/</wfw:commentRSS>
		<feedburner:origLink>http://en.kerouac3001.com/markov-chains-spam-that-search-engines-like-pt-1-5.htm</feedburner:origLink></item>
		<item>
		<title>Search Engines Trend</title>
		<link>http://feedproxy.google.com/~r/GreyHatSeoBlog/~3/sL2zA1IZk4s/search-engines-trend-4.htm</link>
		<comments>http://en.kerouac3001.com/search-engines-trend-4.htm#comments</comments>
		<pubDate>Thu, 02 Nov 2006 12:26:48 +0000</pubDate>
		<dc:creator>kerouac3001</dc:creator>
		
	<category>Inside The Engine</category>
		<guid isPermaLink="false">http://en.kerouac3001.com/search-engines-trend-4.htm</guid>
		<description><![CDATA[Since there is no accounting for taste, the search engine needs to rely on what people like in order to understand what is beautiful.
Therefore, it should come natural to talk about traffic and links, but I prefer to surprise you and divert the topic onto something less visible, less tangible and perhaps, something existing only [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" title="Faye Wong" alt="Faye Wong" src="http://www.kerouac3001.com/faye_wong.jpg" />Since there is no accounting for taste, the search engine needs to rely on what people like in order to understand what is beautiful.</p>
<p>Therefore, it should come natural to talk about <strong>traffic </strong>and <strong>links</strong>, but I prefer to surprise you and divert the topic onto something less visible, less tangible and perhaps, something existing only in my perverse mind: <strong>search engines trend</strong>.</p>
<p>Let&#8217;s start right away, with a little example.<br />
<strong>Let&#8217;s suppose that today we are creating a site dealing with a nonexistent subject and based on nonexistent keywords</strong>. Even though Google is not a semantic search engine, it generally establishes a <strong>correlation between keywords of the same sector</strong> in order to understand what a site is talking about and whether or not this site deserves to be at the top of a certain SERP. It is basically the same principle as the <strong><a target="_blank" href="http://homepages.inf.ed.ac.uk/srenals/pubs/1999/esca99-thisl/node6.html">query expansion</a></strong> that has been discussed for a while. <em>Giorgio Taverniti</em> explained it very well, not very technically perhaps, but in a way that is clear to inexperienced SEOs, too. It should be specified, though, that <strong>query expansion is not a tool meant for SEOs</strong>; it is rather a tool allowing search engines to correlate more topics.</p>
<p>As in the case of BackLinks, we can reasonably assume the existence of both a <strong>visible</strong> <strong>correlation </strong>(the one we get by using the ~ operator before the query) and an <strong>invisible correlation</strong>. In any case, both the former and the latter must establish the <strong>degree of relevance a web page has to a certain query</strong>, or at least to a certain subject.</p>
<p><img align="left" alt="Query Expansion with Clusty" title="Query Expansion with Clusty" src="http://www.kerouac3001.com/clusty.jpg" />Just to avoid empty theories and provide some tools useful for ranking, I recommend that all query expansion lovers use <strong><a target="_blank" href="http://clusty.com/">clusty.com</a></strong>. For those who don&#8217;t know what I am talking about, <strong>clusty is a search engine based on keyword clustering. </strong>Therefore, unlike Google &#038; Co., it stands a bit closer to a semantic search engine, at least as far as the consistency of its results is concerned.</p>
<p>I personally use this search engine for query expansion. I will now explain how to do it, but, since it is very simple, I reckon you will grasp it immediately. Search for the keyword you are interested in with clusty.com (for example, I am searching for the word Fashion) and look to your left. <strong>Clusty will provide a list of Topics, believed to be correlated to that specific query</strong>.</p>
<p>In my case, it thought appropriate to correlate the query fashion to the following keywords (Subjects):</p>
<ul>
<li>Fashion Week (32)</li>
<li>Clothing (35)</li>
<li>Magazine (27)</li>
<li>Photography (22)</li>
<li>Beauty (18)</li>
<li>Schools, Design (14)</li>
<li>Women&#8217;s (11)</li>
<li>Models (11)</li>
<li>Fashion industry (10)</li>
<li>Dress (9)</li>
</ul>
<p>Every Topic can be expanded and the related keywords can be further extended: <strong>we are talking about a very useful tool, if used properly and especially if used together with a tool for <a target="_blank" href="http://tools.seobook.com/general/keyword/">Keyword Suggestion</a> such as SEO Book&#8217;s and a classic dictionary of synonyms and antonyms</strong>.</p>
<p>So, now that I have provided you with something practical you can use as you like, let&#8217;s go back to the main topic: what does this have to do with <strong>Search Engines Trend</strong>?</p>
<p>In the first example, you were told to assume you are creating a site dealing with a <strong>totally new </strong>subject, with <strong>new keywords </strong>and a slang slightly different from the one used by other sites in the same language: it is possible to think that this first site will <strong>affect the Trend of the new SERPS </strong>it is creating, by suggesting to the search engine the opportunity of correlating certain keywords with others belonging to the same sector.</p>
<p>Therefore, it is possible to suppose there are algorithms whose task is managing Search Engines Trend. Likewise, we can also assume it is these algorithms that provide Google with much information, such as information about <strong>topic</strong>, <strong>language</strong> and &#8220;<strong>level of spam</strong>&#8221;. We are talking about information created both <em>onPage</em> and <em>offPage</em>, through both page contents and the links to the page. <strong><em>In other words, the ideas of language, topic and spam are not fixed in the search engine; they may actually undergo slight changes through onPage and offPage factors</em></strong>. So, after the creation of the word <strong>SEO</strong>, this term is now part of various languages and certain subjects such as: <em>marketing</em>, <em>google</em>, <em>msn</em>, <em>yahoo</em>, <em>webmaster</em>, <em>internet</em>, <em>blog </em>and so on.</p>
<p><img align="left" alt="Theory of Chaos" title="Theory of Chaos" src="http://www.kerouac3001.com/farfalla.jpg" />What I am stating is that a search engine should not be thought of as a classifying tool based on a certain idea of language, topic, and so on. <strong>A search engine is a tool undergoing deep changes, as Trends change</strong>. So for example developing new technologies may upset your SERPs because it generally involves both an interest shift and the development of new related terms.</p>
<p>Besides: The flapping of a butterfly&#8217;s wing in Brazil can set off a tornado in Texas.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://en.kerouac3001.com/search-engines-trend-4.htm/feed/</wfw:commentRSS>
		<feedburner:origLink>http://en.kerouac3001.com/search-engines-trend-4.htm</feedburner:origLink></item>
		<item>
		<title>Passing On Information through Links</title>
		<link>http://feedproxy.google.com/~r/GreyHatSeoBlog/~3/AmjK9gaqOKw/passing-on-information-through-links-3.htm</link>
		<comments>http://en.kerouac3001.com/passing-on-information-through-links-3.htm#comments</comments>
		<pubDate>Fri, 27 Oct 2006 11:03:31 +0000</pubDate>
		<dc:creator>kerouac3001</dc:creator>
		
	<category>Inside The Engine</category>
		<guid isPermaLink="false">http://en.kerouac3001.com/passing-on-information-through-links-3.htm</guid>
		<description><![CDATA[I have been talking about it for months, I have been actually going thoroughly into the issue for months. That makes most of the ideas I have been expressing obsolete, which is why my theories about links and languages need be revised.
In this article, I aim at explaining to myself and to those who follow [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" title="Graph theory" alt="Graph theory" src="http://www.kerouac3001.com/grafoaula.jpg" />I have been talking about it for months, I have been actually going thoroughly into the issue for months. That makes most of the ideas I have been expressing obsolete, which is why my <strong>theories about links and languages </strong>need be revised.</p>
<p>In this article, I aim at explaining to myself and to those who follow me what I think about <strong>passing on information</strong>, and I would like to stress that I will be analyzing only how it is connected to links. Therefore, I am not going to talk about all other information useful for search engines, such as: <strong><em>whois</em></strong>, <em><strong>structure </strong></em>and <strong><em>traffic</em></strong>.</p>
<p><strong><img align="left" title="Force: length, direction, orientation" alt="Force: length, direction, orientation" src="http://www.phy6.org/stargaze/Sfigs/Sloop.gif" /></strong><strong>Links are crucial both for the SEO</strong> (in their mind the idea that links give the site a boost is inculcated ever since they were children), and <strong>for the search engine</strong>. Through this system, the engine can get to know the different aspects of one site, simply by connecting it to the different sites it is linked by and to the sites it links to.</p>
<p><strong>However, the link is not to be considered as a mere boost, as it is a sort of vectorial container of information. </strong>It does make more sense to think of it as a <strong>Complete Force</strong> (needless to say, as Forces always have this nature, despite the average SEO being convinced of the contrary), <strong>a force featuring magnitude, direction and orientation</strong>.</p>
<p>Let&#8217;s suppose that every link is determined by these three factors only and let&#8217;s see how links are far more complex than one would think.</p>
<p>We shall define the magnitude of a link as the value indicating the boost it will have on the linked page; we shall define orientation and direction as what links head for in a multidimensional space.</p>
<p>It does not matter what these dimensions actually are in the search engines (they may be languages, topics or other things). Now what matters is that we understand <strong>that any link has a boost determined by magnitude, but it also features a direction and an orientation this magnitude is applied to</strong>, moving our object (the page) towards a specific place.</p>
<p>As SEOs, we do not want to just move the page, but we do want to move it aiming at a <strong>specific target</strong>.</p>
<p>So, only by considering the link as a force endowed with magnitude, direction and orientation is it possible to grasp 3 main ideas:</p>
<ol>
<li><em><strong>Linking a page in order to reach a good ranking is not enough</strong></em>, it is necessary to give the link a specific direction and orientation, otherwise we may end up further away from our goal.</li>
<li><em><strong>An absolute penalty for search engines does not exist, but relative penalties do exist: </strong></em> that is, penalties due to failure to reach a goal. Introducing a link that moves us towards the wrong direction is a relative penalty, as it moves us further away from the goal we had targeted, but it does take us closer to something else. Nevertheless, this &#8220;something else&#8221; might be not only totally useless to us, but even harmful.</li>
<li>Also, it is clear that the &#8220;penalty&#8221; is not due to the magnitude of the link, but to its direction. In order to reach our goal, <strong><em>the direction we are heading for is more important than how strongly we are heading for that direction</em></strong>.</li>
</ol>
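<p>The three ideas above can be sketched with a toy model in Python. Everything here is an assumption made only for illustration: the space has just two dimensions, the page and the target are points in it, and the names <em>apply_link</em> and <em>distance</em> are hypothetical. It is certainly not how a search engine actually works; it only shows why direction matters more than magnitude.</p>

```python
import math


def apply_link(page, direction, magnitude):
    """Move `page` by `magnitude` along `direction` (normalized to length 1)."""
    norm = math.hypot(*direction)
    return tuple(p + magnitude * d / norm for p, d in zip(page, direction))


def distance(a, b):
    """Straight-line distance between two points of the toy space."""
    return math.dist(a, b)


page = (0.0, 0.0)     # where the page currently sits
target = (10.0, 0.0)  # the ranking goal we are aiming at

# A weak link pointing towards the target.
weak_right = apply_link(page, (1.0, 0.0), magnitude=2.0)
# A strong link pointing somewhere else entirely.
strong_wrong = apply_link(page, (0.0, 1.0), magnitude=8.0)

print(distance(page, target))          # starting distance
print(distance(weak_right, target))    # smaller: we got closer
print(distance(strong_wrong, target))  # larger: a relative penalty
```

<p>The strong link moves the page much further than the weak one, but away from the goal: that is the relative penalty, and it depends on the direction, not on the magnitude.</p>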
<p>The task of a skilled webmaster is not to link a web page, but to understand where the link will take us.</p>
<p>Now forget about links as vector forces. I&#8217;ll make myself clearer: <em>what search engines use is probably the idea of links as vector forces, but restricting ourselves to that might be  counterproductive.</em> What currently matters to us most is the information behind links, rather than their mathematical side.</p>
<p><img align="left" title="SEOs News" alt="SEOs News" src="http://www.nuovofilmstudio.it/RLimg/logo-informazione.gif" />So let&#8217;s concentrate on the links and continue to define them as vectors, but devoid of direction and orientation: <strong>vectors featuring magnitude and information only</strong>.</p>
<p><strong>Magnitude is, as usual, the simplest side and it is always possible to define it as the boost a page gets</strong>.</p>
<p>The <strong>information content, on the contrary, becomes our direction and our orientation and indicates where we are moving to</strong>.</p>
<p>As many of you know, the information I prefer to talk about is <strong>language</strong>. This is in fact the information we can most easily observe, for example by doing a simple test such as the one by <a target="_blank" title="Doopcircus Reverse Test" href="http://www.giorgiotave.it/forum/gara-test-fattori-arcani/8892-esperimento-doopcircus-al-contrario.html">Tagliaerbe</a> [Link in Italian]. It is generally easy to verify that a page with content in language A, linked mainly and massively by pages with content (and/or BackLinks) in language B, will be considered by the search engine as a page written in language B. <strong>Also, it is clear how this is not an absolute penalty, but rather a penalty relative to our goal</strong>. This is because, in general, the goal of all sites in language A is to rank for terms in language A and, above all, in the search engine for language A, whereas ranking for terms in language B is totally useless to these sites.</p>
<p>However, this is not due to the search engine. <strong>It is our fault for having passed the wrong information</strong>.</p>
<p>A different type of information considered by the search engines and passed on (also) through links is the <strong>topic</strong>. The topic characterizes our page and brings it closer to or moves it away from specific keywords. I doubt there is much to be said about this, as it is easy to see why this information matters to the search engine. It is perhaps more difficult to understand how the search engine manages to determine how close to or how far from a certain topic we are.</p>
<p>As far as this is concerned, I only have vague theories, therefore I suggest you don&#8217;t trust what I&#8217;ll be saying and <em>I hope someone will provide a better explanation</em>.</p>
<p>I believe all information passed on through links is organized in a multidimensional space that considers the page as a point in this space. <strong>Therefore, a single piece of information does not exist, for example</strong>:</p>
<ol>
<li>Language: English</li>
<li>Topic: cooking</li>
</ol>
<p><strong>More probably, information composed of all possible entries and of the extent to which each entry is true does exist (through <a target="_blank" title="Fuzzy Logic" href="http://en.wikipedia.org/wiki/Fuzzy_logic">Fuzzy logic</a>).</strong></p>
<p>The objection that may be raised in this case is:</p>
<p><em>if the logic of search engines is not boolean but fuzzy, then why do links in english to an italian page make it lose positions in italian search engines? These data on the italian language, in fact, do not get lost, neither are they less than before: so, if my page was at the first place on a SERP and had 50 italian links and 0 english links and it is now at the 57<sup>th</sup> place in the SERP and has 67 italian links and 565 english links, why has a &#8220;penalty&#8221; occurred? My page is not less italian than before, it is only more english!</em></p>
<p><em>Remembering that I do not know the truth about these issues</em>, I will suggest my theory: <strong>the language factor is composed of weights that are distributed in proportion to the main language</strong>. So, in the above example, it is true that the web page did not lose any italian links (on the contrary, it has got some more compared to before), but these links <strong>move the page more to the english sector and away from the italian one, </strong>due to their being less than the english links.</p>
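<p>With the numbers of the objection, my theory can be sketched like this in Python. It is only my assumption that the weight of each language is simply its share of the links; the function name <em>language_shares</em> is hypothetical, and real search engines surely weigh links in more subtle ways.</p>

```python
def language_shares(link_counts):
    """Share of the inbound links written in each language."""
    total = sum(link_counts.values())
    if total == 0:
        return {language: 0.0 for language in link_counts}
    return {language: count / total for language, count in link_counts.items()}


# Link counts taken from the objection above.
before = language_shares({"it": 50, "en": 0})
after = language_shares({"it": 67, "en": 565})

print(before["it"])  # the page is entirely "italian"
print(after["it"])   # the italian share has dropped
print(after["en"])   # the english share now dominates
```

<p>The page has more italian links than before (67 against 50), yet its italian share falls from 50/50 to 67/632, roughly a tenth: it is not less italian in absolute terms, only in proportion.</p>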
<p>The same thing happens in the case of the topic.</p>
<p>There is more information passed on through the link: the information usually defined as the <strong>trust </strong>of a site that basically gives a <strong>consistency value</strong> both to the site and to us when we are linked.</p>
<p>I am definitely lacking knowledge on this subject, but based on the opinion I formed, <strong>the trust may very well be passed on as a single fuzzy value and not as a value composed of the different information contents that have created it</strong>. This value, however, will contain all those data defining the consistency of a site and moving it closer to or away from spam, that is information about <strong>whois</strong>, <strong>structure </strong>and probably <strong>traffic</strong>.</p>
<p><img align="left" title="Graph Theory" alt="Graph Theory" src="http://www.silab.dsi.unimi.it/~ss377827/grafi/images/tutte2.gif" />I had better talk about those next time. Also, I will be talking in more detail about <strong>Graph theory: I preferred not to mention it</strong>, despite its being strictly connected to the ideas expressed in this article, as it is not crucial for defining the link as a carrier of information, but it is <strong>essential in passing on the information.</strong>
</p>
]]></content:encoded>
			<wfw:commentRSS>http://en.kerouac3001.com/passing-on-information-through-links-3.htm/feed/</wfw:commentRSS>
		<feedburner:origLink>http://en.kerouac3001.com/passing-on-information-through-links-3.htm</feedburner:origLink></item>
	</channel>
</rss>
