Webmaster Resources: Search Engine Optimization Information

From Corpora to Matching
Making effective use of the Internet is increasingly about creating better and more intelligent applications and search engines. Here is a brief introduction to how search engines work.

01) Define the corpus (the search space or data). Each document in a corpus (database) is described by a set of keywords called index terms. We assign weights to index terms according to their relevance (frequency of occurrence, for instance). This is how we create the index that we can then search.

Corpus preparation:

Extract terms of interest:

Build term-by-document matrix: Each document becomes a column vector and each row represents a term. Each row records the frequency of one term across the analysed corpus. At first we simply build the matrix by counting the terms in each document.

Compress the matrix:

Normalise the matrix: Each document vector of term frequencies is scaled to unit length. The normalisation is applied because the semantic content of a document is generally determined by the relative frequency of terms.

Singular Value Decomposition:

A geometric interpretation: The term-by-document matrix is then decomposed to calculate eigenvalues and eigenvectors (strictly, those of the matrix multiplied by its transpose; the singular values are their square roots). The eigenvectors represent a new Cartesian coordinate frame spanning the same search space, but they indicate the most important dimensions/axes along which the documents mainly lie. The eigenvalues quantify the spread of documents along these new axes.

Queries:

© I am the website administrator of the Wandle industrial museum (http://www.wandle.org). It was established in 1983 by local people determined to ensure that the history of the valley was no longer neglected and that awareness of its heritage was enhanced for the use and benefit of the community.
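The corpus-preparation steps described above (counting terms into a term-by-document matrix, normalising each document vector, and finding the main axis along which documents lie) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, terms are extracted by simple whitespace splitting, and a basic power iteration stands in for a full Singular Value Decomposition, so it recovers only the single most important axis.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix); matrix[i][j] counts term i in document j.

    Assumption: terms are lower-cased, whitespace-separated words.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def normalise_columns(matrix):
    """Scale every document (column) vector to unit Euclidean length."""
    n_docs = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_docs)]
    return [
        [row[j] / norms[j] if norms[j] else 0.0 for j in range(n_docs)]
        for row in matrix
    ]


def dominant_axis(matrix, iterations=100):
    """Power iteration on A * A^T (A = term-by-document matrix).

    Returns (eigenvalue, unit eigenvector in term space). The eigenvalue
    is the square of the largest singular value of A.
    """
    n_terms, n_docs = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_terms)] * n_terms
    eigenvalue = 0.0
    for _ in range(iterations):
        # w = A^T v: one entry per document
        w = [sum(matrix[i][j] * v[i] for i in range(n_terms)) for j in range(n_docs)]
        # u = A w = (A A^T) v: one entry per term
        u = [sum(matrix[i][j] * w[j] for j in range(n_docs)) for i in range(n_terms)]
        eigenvalue = math.sqrt(sum(x * x for x in u))
        if eigenvalue == 0.0:
            break
        v = [x / eigenvalue for x in u]
    return eigenvalue, v


# Hypothetical three-document corpus, for illustration only.
docs = ["search engine index", "search engine query", "museum history valley"]
terms, counts = term_document_matrix(docs)
unit = normalise_columns(counts)
eigenvalue, axis = dominant_axis(unit)
for term, weight in zip(terms, axis):
    print(f"{term:8s} {weight:.3f}")
```

In this toy corpus the two documents that share "search" and "engine" define the main axis, while the terms of the unrelated third document receive a weight close to zero on it. A production system would more likely use a library SVD routine to obtain all axes at once.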