title: Optimizing a large dynamically generated website for search engine crawling and ranking
author: Johan Köhne
published in: 2006
appeared as: Master of Science thesis
Man-machine interaction group
Delft University of Technology


In the past decade, businesses have started to use the internet as a sales channel. Naturally, the number of online visitors, together with the conversion rate, determines sales. As search engines have evolved, the traffic they generate has become very important for online sales channels.
Fredhopper is a company that has developed a software solution for large online sales channels. Its Fredhopper Access Server allows fast and intuitive navigation and search within any set of products. Clients may integrate this software into their own web front-end, using it as a web service, or they may allow users to browse the catalogue directly on the Fredhopper Access Server. In the latter case, with the catalogue openly exposed to the internet, search engines were found to have trouble indexing it as presented, resulting in minimal incoming traffic from search engines: a missed opportunity with a financial impact.
The literature describes the inability of search engines to effectively crawl and index dynamically generated websites as a common issue that is hard to solve on the search engine side. The work presented in this thesis analyzes the underlying causes of the issue and provides design guidelines for possible solutions; both the analysis and the guidelines are universally applicable to (large) dynamically generated websites. The analysis consists of a broad review of crawler techniques and search engine ranking algorithms. Proposed improvements and design heuristics include techniques such as URL rewriting, site structure simulation and keyword optimization. A Fredhopper-specific solution is derived and implemented for a representative live case. The effectiveness of the solution is substantiated by the results of empirical studies, which show a large improvement in page crawlability and indicate a high potential for improved ranking.
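As an illustration of the URL rewriting technique named among the proposed improvements, the following minimal sketch maps a query-string URL onto a path-style URL and back. It is not the scheme implemented in the thesis or in the Fredhopper Access Server: the function names, the `/catalog` base path and the example parameters are all hypothetical, and parameters with empty values are simply dropped. Sorting the parameters is a design choice of this sketch, so that the same selection of filters always yields the same URL and a crawler does not encounter duplicates of one page.

```python
from urllib.parse import urlsplit, parse_qsl, quote, unquote


def to_static_path(dynamic_url: str) -> str:
    """Rewrite '/catalog?b=2&a=1' as '/catalog/a/1/b/2' (hypothetical scheme)."""
    parts = urlsplit(dynamic_url)
    # Sort the key/value pairs so equivalent queries share one canonical URL.
    pairs = sorted(parse_qsl(parts.query))
    # Percent-encode every character that is not safe inside a path segment.
    segments = [f"{quote(k, safe='')}/{quote(v, safe='')}" for k, v in pairs]
    base = parts.path.rstrip("/")
    return base + "/" + "/".join(segments) if segments else parts.path


def to_query_params(static_path: str, base: str) -> dict:
    """Recover the original parameters from a path produced above."""
    rest = static_path[len(base):].strip("/")
    tokens = rest.split("/") if rest else []
    # Even positions hold keys, odd positions hold the matching values.
    return {unquote(k): unquote(v) for k, v in zip(tokens[0::2], tokens[1::2])}


if __name__ == "__main__":
    path = to_static_path("/catalog?color=red&brand=acme")
    print(path)
    print(to_query_params(path, "/catalog"))
```

In a deployment along these lines, the web server would apply the reverse mapping to incoming requests, so that the application still receives its usual parameters while crawlers only see the path-style URLs.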

