Title: Optimizing a large dynamically generated website for search engine crawling and ranking
Master of Science thesis
Man-machine interaction group
Delft University of Technology
In the past decade, businesses have started to use the internet as a sales channel. Naturally, the
number of online visitors determines conversion rates and therefore sales. With the evolution of
search engines, the large amount of traffic generated by these has become very important for online
Fredhopper is a company that developed a software solution for large online sales channels. Their Fredhopper Access Server allows fast and intuitive navigation and search within any set of products. Clients may include this software in their own web front-end, using it as a web service, or they may allow users to browse the catalogue directly on the Fredhopper Access Server. In the latter case, where the catalogue is openly exposed to the internet, search engines were found to have trouble indexing it as presented, resulting in minimal incoming traffic from search engines: a missed opportunity with a financial impact.
The literature describes the inability of search engines to effectively crawl and index dynamically generated websites as a common issue that is hard to solve on the search engine side. The work presented in this thesis analyzes the underlying causes of the issue and provides design guidelines for possible solutions, both of which are universally applicable to (large) dynamically generated websites. The analysis consists of a broad review of crawler techniques and search engine ranking algorithms. Proposed improvements and design heuristics include techniques such as URL rewriting, site structure simulation and keyword optimization. A Fredhopper-specific solution is derived and implemented for a representative live case. The effectiveness of the solution is substantiated by the results of empirical studies, which show a large improvement in page crawlability and demonstrate a high potential for improved ranking.