
title: Optimizing a large dynamically generated website for search engine crawling and ranking
author: Johan Köhne
published in: 2006
appeared as: Master of Science thesis, Man-machine interaction group, Delft University of Technology

Abstract
In the past decade, businesses have started to use the internet as a sales channel. Naturally, the
number of online visitors drives the number of conversions and therefore sales. With the evolution of
search engines, the large amount of traffic they generate has become very important for online
sales channels.
Fredhopper is a company that developed a software solution for large online sales channels. Their
Fredhopper Access Server allows fast and intuitive navigation and search within any set of products.
Clients may include this software in their own web front-end, using it as a web service, or they may
allow users to browse the catalogue directly on the Fredhopper Access Server. In the latter case, where
the catalogue is openly exposed to the internet, search engines were found to have trouble indexing it
as presented, resulting in minimal incoming traffic from search engines: a missed opportunity with a
financial impact.
The literature describes the inability of search engines to effectively crawl and index dynamically
generated websites as a common issue that is hard to solve on the search engine side.
The work presented in this thesis analyzes the underlying causes of the issue and provides design
guidelines for possible solutions, both of which are universally applicable to (large) dynamically
generated websites. The analysis consists of a broad review of crawler techniques and search engine
ranking algorithms. Proposed improvements and design heuristics include techniques such as URL
rewriting, site structure simulation, and keyword optimization. A Fredhopper-specific solution is
derived and implemented for a representative live case. The effectiveness of the solution is
substantiated by the results of empirical studies, which show a large improvement in page
crawlability and demonstrate a high potential for improved ranking.