
search engines - submissions advice

[please note -- this page has not had a major update since 2004 -- however much of the advice it contains is still relevant]

common sense
search engines vs directories
how search engines rank the results of a query
preparing web pages
theme-based indexing
purchasing query terms
paying for search engine inclusion
Yahoo! and dmoz (The Open Directory)
gateway (doorway/bridge) and hallway pages
popularity-based indexing
so what do i know about it?

common sense

the best search engines are the ones which return the most relevant results for search queries - when people use search engines they are looking for the best site(s) on the subject they are interested in - if a search engine doesn't return relevant results, then people won't use it - therefore it follows that, other things being equal, the most reliable way of getting to the top of the results page on the major search engines for a particular query term is to have the best page on that subject

search engines vs directories

search engines index individual pages - directories list sites

search engines use programs (called robots, spiders or crawlers) to actively search the web, downloading page content for indexing as they go - people looking for sites on search engines typically enter a query term (one or more words, or a phrase), and after a few seconds a list of links to sites containing the query term, ordered by relevance, is displayed

the information held by directories is entered by their visitors, who supply details of either their own site, or sites they wish to be included in the directory's database - web sites are arranged by category, and are typically found by following a series of hierarchical links until the appropriate sub-category is found, where the sites will be listed alphabetically

search engines contain links to large numbers of pages, but queries can return results many of which are not relevant to the search request - directories contain links to a smaller number of pages (usually restricting the number of submitted pages from any one site), and sites are often reviewed to ensure categories only contain relevant sites

in practice, most directories have an internal search facility and use back-up info from one or more of the search engine indexes; and many search engines incorporate site details from one or more of the directory databases

how search engines rank the results of a query

one or more of the following are used -
  • analysis of text content of the web page (page-based)
  • analysis of text content of all pages on the site (theme-based)
  • paying to appear higher up in the results table (paid placement)
  • analysis of quantity and quality of links pointing to the page (popularity)
  • analysis of clickthrus re previous appearances in search results (popularity)

preparing web pages

the most important part of the page re indexing by search engines is the <title>.....</title> tag - place it immediately after the <head> tag - not more than 7 (or 8?) words - search engines will display between 60 and 115 characters of the title - use different titles on different pages - the title should contain the (one or) two query terms relevant to the page content that are most likely to be used by people looking to find the page via search engines (usually this is not a company name!) - words or phrases used in the title should also appear in the page text
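for example (a hypothetical seed catalogue site, invented here purely for illustration), the top of a page might look like this -

```html
<head>
<!-- the title goes immediately after the head tag - under 8 words,
     built from the query terms most likely to be used by searchers -->
<title>Organic Vegetable Seeds - Heritage Varieties by Mail Order</title>
</head>
```

the same words ("organic vegetable seeds", "heritage varieties") would then also appear in the page text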

potential query terms (keywords) that people may use for finding sites via search engines shd be scattered throughout the text, especially the first 25 words - worth ensuring that they also occur within header tags - eg - <H1>query terms here</H1> - but overuse of the same keyword in a small amount of page text may be penalised (not more than 7 times?) - at least some of the pages on the site should be rich in plain text (200-600 words is sometimes recommended)
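continuing the hypothetical seed catalogue example, the header tag and opening text might repeat the chosen query terms like this -

```html
<h1>organic vegetable seeds</h1>
<p>we sell organic vegetable seeds and heritage varieties by mail order,
with free growing advice for every variety in the catalogue</p>
<!-- the query terms appear in the header tag and within the first 25 words,
     without being repeated more than a handful of times -->
```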

description meta tags are worth adding - even when not used directly by the search engines, the text is sometimes indexed as tho it were part of the page - often used intact as a description of the site when it appears on the results page for a search engine query - optimum number of characters 100, max number of chars 150
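a sketch of a description meta tag for the same hypothetical site (the content text here is roughly 100 characters) -

```html
<meta name="description" content="organic vegetable seeds and heritage varieties by mail order - free catalogue and growing advice">
```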

keywords meta tag almost defunct, because this provides a way of attempting to spam the search engines, meta keywords now only used by Inktomi & Teoma - most of the following info is probably no longer relevant - first letter of keywords possibly worth capitalising, so that it can be found by a query term starting with either a lower or upper case character (but trend now for search engines not to distinguish - only AltaVista is fully case-sensitive) - max no of characters 1024 minus the no of chars in meta description - may be better to separate individual words or phrases with commas (rather than with spaces, the other option) as phrases may then be indexed as phrases rather than individual words (??) - ensure everything on 1 line (no line breaks) - query terms at beginning of list are likely to be considered more important than those at end - it is possible that more than 3 occurrences of the same keyword may be penalised (tho this may depend on how it is used... eg whether it is just repeated, or whether it is used in conjunction with phrases, as with hyphenated words) - rumoured that keywords may sometimes be counter-productive if not also encountered in page text, but also possible that keywords not in the page text will be indexed (Oct '02)
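for completeness, a keywords meta tag following the advice above (comma-separated, everything on one line, most important terms first - again using the hypothetical seed catalogue site) might look like -

```html
<meta name="keywords" content="organic vegetable seeds,heritage seeds,vegetable seeds by mail order,seed catalogue">
```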

alt text (as used with image/sound files) is usually indexed as tho it were standard page text - comments text may be indexed by some search engines (but not by Google or Alta Vista)
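an alt attribute on an image file, again using the hypothetical catalogue site (the file name is invented) -

```html
<img src="seed-packets.jpg" alt="packets of organic vegetable seeds" width="200" height="150">
```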

potential query terms can be used as file names, directory names or domain names, but domain names are not given much weight (far less than titles and page text - if using query terms for file names then hyphens are better than run-ins or underscores eg animal-farm.com is better than animalfarm.com or animal_farm.com) - domain names (excluding suffixes such as ".com") best kept to less than 55 characters (Jun '02)

the amount of text on a page affects search engine rankings (sometimes a maximum of 250 words is recommended for the home page) - each page shd be centred around 2 or 3 potential query terms, and these shd each be repeated 3 times if this can be done without disturbing the flow and sense of the text

submissions are most effective when they are for simple static text-based html pages - frames and redirection pages may not be indexed - it is best to avoid refreshes and redirects - if redirects have to be used, they shd not be at domain name level

javascript is best placed in a separate plain text file using the .js file extension - otherwise it shd be moved to the bottom of the page - this ensures that keyword rich page text is the first thing the search engine robots come across
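a sketch of the separate file arrangement (menu.js is a hypothetical file name) -

```html
<!-- inline script moved out to a separate plain text file -
     the robots now reach the keyword rich page text sooner -->
<script type="text/javascript" src="menu.js"></script>
```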

pages optimised for search engines shd be linked to directly from the home page and if possible be placed in the root directory

dynamic URLs containing a "?" "=" and other query strings may not be indexed by some search engines - for ways round this see Market Position, NetMechanic.net, ASP 101, Apache, High Rankings, and Digital Web Magazine
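one common way round this (assuming an Apache server with mod_rewrite enabled - the file and parameter names here are hypothetical) is to serve static-looking URLs which are rewritten onto the real dynamic script -

```apache
# /catalogue/runner-beans.html is served by /catalogue.php?page=runner-beans
RewriteEngine On
RewriteRule ^catalogue/([a-z-]+)\.html$ /catalogue.php?page=$1 [L]
```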

dynamic page content (eg from a database) is best generated from a page with a static URL, however spiders will only index one set of information per visit

the noscript tag shd be used for dhtml menus - also a text version of any Flash info can be placed in an ALT tag within the noscript tags
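a sketch of the noscript arrangement described above (dhtml-menu.js and the link targets are invented for illustration) -

```html
<script type="text/javascript" src="dhtml-menu.js"></script>
<noscript>
<!-- plain text links the robots can follow when the script is ignored -->
<a href="catalogue.html">catalogue</a> | <a href="contact.html">contact</a>
</noscript>
```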

in an attempt to stop people spamming their index, Google introduced new algorithms in November 2003 which removed or penalised many sites with previously high rankings - over-optimising a site may result in pages getting lower rankings than if they were not optimised - two of the signs of over-optimisation are thought to be filenames which include a particular key word/phrase for which a page has been optimised, and overusing a key word/phrase - the solution is to design the page content for human visitors rather than for search engine robots but not to do anything which might prevent robots from indexing the page appropriately

theme-based indexing

there have been suggestions that some search engines may take into account the content of all the pages of a site when indexing individual pages (reducing the essence of the site to a couple of query terms) - if so, then important keywords would need to appear on all the pages of the site, and any links would need to point to sites which have a similar theme (that is, contain the same keywords) - but current opinion does not support the idea that search engines make use of theme-based indexing.


most of the popular english language search engines and directories are listed at http://www.wussu.com/search/ - Google, MSN, Teoma, Yahoo and Gigablast are the biggest ones - a good search engine produces relevant results in a clear format, has a fast response time, keeps cached copies of pages, and provides a translation facility

all the emboldened ones in the list are either search engines generally worth submitting to, or sites of special interest and relevance

those which have an "add URL **" link are very easy (ie quick) to submit to; those which have "add URL *" are fairly easy to submit to; those which just have "add URL" can take quite a while to enter all the details requested and/or find the appropriate categories for entries to be placed - those sites where the "add URL" is not linked have no general page for submissions but (eg) might require a category to be selected before the submission procedure can begin

the average time after submission before sites appear in indexes varies - except for paid listings, it's best to allow 2 months for submitted sites to appear in the search engines results

most search engines claim that they will reindex sites automatically (sooner or later) if content changes - it is also sometimes claimed that the more often page content changes, the more frequently will robots revisit the page to reindex it - resubmitting may speed up the process, but if a site already has a good ranking the resubmission may result in a reevaluation and the site's position might drop

if changing the domain for a site, some search engine consultants think it is best to ensure the old site's pages are first removed from the search engines' databases before resubmitting (removing pages is usually done by resubmitting pages which no longer exist) - otherwise if the same page is present in a search engine under 2 URLs, the search engine might think it is being spammed, and drop both pages

purchasing query terms

also called "paid placement", "paid listings", "featured listings" & "sponsored listings" - "ppc" = pay per clickthru

query terms can be purchased from some search engines - payment is so much per clickthru with a minimum monthly spend - results are ranked on how much people are prepared to pay to have their sites at the top of the results table when their query term is matched exactly - results that have been paid for (and which therefore appear at or near the top) are sometimes indicated together with the amount paid

a list of ppc search engines where keywords can be purchased is available from PayPerClickSearchEngines.com - Overture has been the most popular, supplying results (especially its top 3 paid listings) to several other search engines

info re popularity of various keywords can be obtained from WordTracker and Google (tho Google info has a cut-off, only giving details of keywords appearing more than 200(?) times a day) - there's a good article re Wordtracker at 1st Search Engine Ranking - it now has a database of 350M keywords taken from the last 60 days' queries from Alta Vista, Dogpile and MetaCrawler (Jun 01)

Google provides an AdWords service where query terms are bid on to determine frequency of occurrence and position within the featured listings (which are kept separate from their main results)

KeywordSpy and NicheBot also provide keyword lookup services, but do not offer a free trial

Yahoo! and dmoz (The Open Directory)

Yahoo operates a policy of "paid review" - commercial sites are charged annually for submissions to the main (US) directory - however payment is only for the site to be reviewed and does not guarantee that the site will be accepted

there is free submission to non-commercial categories and to local Yahoo directories (at least in theory)

each site is manually reviewed to ensure it is worthy of being included in their database (hmmm)... only "high quality" sites are accepted (top level domains given preference) -- most (all?) sites using the free submissions procedure are rejected, according to Yahoo because they are not of good enough quality; others say it is because Yahoo doesn't have time to review them -- however this may be a deliberate policy to try and ensure the paid review option is used

commercial sites may need to have a postal address displayed before being accepted

if submission not successful after 1 month, then resubmit, and if not found after another month complain politely by email to url-support@yahoo-inc.com - include the URL, but categories and previous submission dates not needed (?) (Jan 99) (previously it was suggested that all initial submission data be included)

to change Yahoo entries a) submit change, b) wait 10 days, c) email them with URL + date of the change request, d) repeat from b) until they make the change [flying pigs link to go here]

dmoz, also known as The Open Directory, does not charge for submissions - it uses volunteer editors and has built up a database of over 3M sites - it supplies many other search engines with their first listed result(s) and being listed there helps get good rankings on Google -- however submissions to categories which don't have editors (or where editors are not so conscientious) may take longer to be processed, and requests for changes are not always dealt with speedily - on the other hand it is fast and clean to use and it's straightforward to submit to

paying for search engine inclusion

the advantage is speedy entry into databases and regular reindexing of content - however if a site is already well ranked, then it is likely to be counter-productive to pay for inclusion, because factors contributing to the high ranking which are built up over time (such as linkage and popularity) may no longer be taken into account

paid inclusion programs are likely to have their own spiders -- when the subscription ends the URL may be dropped from the search engine results unless/until it occurs in the default database (fed by the standard spiders)

paid inclusion can be used to get search engines to index dynamic URLs - it is also possible to take advantage of the regular reindexing facility to tweak pages on a trial & error basis in order to get better search engine rankings for particular query terms

AltaVista, Ask Jeeves/TEOMA, FAST and Inktomi offer one or more means of paying for individual URLs to be spidered


need to check at intervals to see if submissions have been successful

newer sites often take a few months to climb the rankings because link analysis is used, and newer sites usually have fewer links pointing to them

gateway (doorway/bridge) and hallway pages

some submission agents try and ensure high rankings in search engine searches by writing different entry pages (often called gateway pages) for each of the major search engines - this takes account of the fact that search engines use different indexing methods - each entry page is then adjusted and resubmitted until it comes top of the rankings for specific query terms - the rankings are then continually monitored to accommodate changes in search engine indexing algorithms (which are continually changing in an effort to prevent people trying to circumvent them), and also to prevent the entry pages being usurped by newly submitted sites (which also may be using gateway pages)

for most sites there will be several relevant query terms which potential visitors might use, in which case it would be necessary to provide a separate gateway page for each query word for each search engine!

in some cases, pages may be ranked higher if they are found indirectly by the search engine robot via a link from a submitted page - sometimes a hallway page is submitted, a specially designed page containing links to gateway or content pages

however, there is an ongoing debate about the validity of using these techniques, and search engines prefer to index what the visitor to the page sees rather than a page specifically customised for their own robots

popularity-based indexing

following the success of Google, search engines are increasingly using popularity to influence the results of searches - by -

  • analysing how many times the link to a page in their index has been clicked on when it's been displayed as a result of previous searches - eg at HotBot, which is now powered by Direct Hit as well as the Inktomi database - Yahoo also thought to use click popularity - the length of time between clicking on a search engine result and returning to the search engine may also be taken into account

  • analysing how many other pages are known to link to the indexed page - eg Google keeps track of over a billion links, and uses these to determine which sites have the most links pointing to them, and this is what determines their ranking in the search results when more than one site is found which contains the query term - Inktomi & AltaVista also take into account external links to a page (Jul 00). AltaVista first determines whether the link text is relevant.

Google, Inktomi and HotBot require at least one external link to a site otherwise they will not add it to their indexes

generally, the more links to a site the better, but links from Web sites which have a lot of sites linking to them may be of greater importance than links from less popular sites - keywords included in link text may boost the ranking of the linked page for that keyword especially if the link text is the same as the title of the page being linked to - links to domain names (rather than subdirectories) more likely to be taken into account (?)

further info from Search Engine Watch

Free For All (FFA) pages are long lists of links often in submission date order - Google, Inktomi and Fast have all stated that they disapprove of them - Google bans sites which are believed to be members of link exchanges or are involved in artificial ways of boosting links to themselves - presumably this includes some forms of reciprocated linking - Google is reported to have banned sites which offer incentives for other sites to link to them


search engines take steps to prevent people fooling them by using various spamming techniques - it may be possible to fool the search engines for a while using a variation of one of the many tricks already tried, but the major search engines change their ways of ranking sites almost daily, and they penalise sites which deliberately flout either the letter or spirit of their rules of fair play

a simple text-based page containing high quality original content with key-words appearing once in the title and two or three times in the page text is still the best free long-term method for getting high rankings in search engines which analyse page content


some reasons why submitted pages don't appear in search engine results -
  • submissions can take up to 6 months to be indexed
  • text in image format and java applets not indexed
  • frames may not be indexed (use non-frames intro page or <noframes> tag)
  • robots are blocked by registration and password requirements
  • robots blocked by robots meta tag or robots.txt file
  • dynamic pages (using "&" or "?" in URL) may not be indexed
  • flash sites may not be indexed
  • pages using redirects and meta refresh may not be indexed
  • pages take too long to download (slow connection or pages too big)
  • spiders may only follow internal links one level down
  • spider visits site when web server down (will remove previously indexed pages)
  • spamming techniques penalised (eg same colour text/background, hidden links etc)
  • site submissions to directories may be rejected after manual review
  • sites hosted on servers offering free web space may not be indexed
  • pages not linked to from the home page may not be indexed
  • some search engines limit the number of page submissions per day from a domain
  • non-working links may prevent a site being accepted
  • search engines sometimes accidentally drop previously indexed sites
  • search engines may drop older pages in favour of newer ones
  • over-optimised pages may be penalised by Google
  • page links to affiliates and FFAs may lead to a submission being rejected
  • search engines may refuse submissions from a 'banned' web space provider
re the last point, some web hosting companies host several sites using the same IP number (virtual hosting) - if other sites using the same IP number are involved in spamming the search engines then all the sites using that IP number may be banned - also, IP numbers which are numerically close to an offending number may be banned

if all else fails, a new domain should be used, at the same time moving one's hosting to a new service provider (who will have a different set of IP numbers)
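re the robots.txt point in the list above, it is worth checking that the file is not accidentally blocking all spiders - a permissive robots.txt looks like this (the cgi-bin path is just an example of a directory one might choose to exclude) -

```
User-agent: *
Disallow: /cgi-bin/
```

whereas "Disallow: /" under "User-agent: *" would block every compliant robot from the whole site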


the following is from research at Penn State (June '03)

users typically visit only the first three results from a search query - when they have reached a web page, one in five searchers stays for 60 seconds or less - 40% of searchers will have left the pages within three minutes

54% of users view only one page of results in each session - 19% went on to the second page, and 10% looked at the third page of results - about 55% of users look at one result only - more than 80% stop after looking at three of the listed web pages

the description of the site in the search engine results needs to be as clear as possible about the purpose of the site - pages need to be well-designed, easy to load and relevant to a searcher's needs

so what do i know about it?


comments to weed@wussu.com
revised 2 January 2011
URL http://www.wussu.com/search/advice.htm