Ranking Algorithm

Introduction

Ranking is an integral part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising.

Training data consists of queries and the documents matching them, together with a degree of relevance for each match. It may be prepared manually by human assessors, who check the results returned for a set of queries and judge the relevance of each result (Liu, 2009). Because it is not practical to assess the relevance of every document, a technique called pooling is typically used: only the top few documents retrieved by several existing ranking models are assessed. Alternatively, training data may be derived automatically, for example by analyzing click-through logs or query chains.
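The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_pool, the depth parameter, and the document identifiers are hypothetical and do not come from any particular evaluation toolkit.

```python
def build_pool(rankings, depth):
    """Collect the documents that human assessors would be asked to judge.

    rankings: list of ranked document-id lists, one per existing ranking model
    depth:    how many top documents to take from each model (an assumed parameter)
    """
    pool = set()
    for ranked_docs in rankings:
        # Only the top `depth` results of each model enter the pool.
        pool.update(ranked_docs[:depth])
    return pool


# Two hypothetical ranking models answering the same query.
model_a = ["d3", "d1", "d7", "d2"]
model_b = ["d1", "d5", "d3", "d9"]
pool = build_pool([model_a, model_b], depth=2)
```

With a depth of 2, only d1, d3 and d5 would be sent to the assessors; documents outside the pool are simply left unjudged.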

Discussion

The development and marketing of Boolean systems dates back about 30 years, to a time when computing power was minimal. These systems therefore required users to place adequate syntactic restrictions in their queries in order to limit the number of documents retrieved, and the retrieved documents were not ranked in any order with respect to the user's query. Boolean systems offer powerful online search capabilities to librarians and other trained intermediaries, but they serve end-users poorly (Xia et al., 2008). End-users are usually familiar with the terminology of the data set they are searching, but they lack the training and practice needed to get good results from a Boolean system because of its complicated query syntax. The ranking approach to retrieval is oriented much more toward these end-users. It lets a user enter a simple query, such as a phrase or a sentence, and retrieve a list of documents ranked by relevance.

The ranking/natural language approach is popular because it is effective for end-users: every term in the query is used for retrieval, and the results are ranked by the co-occurrence of query terms in each document. This approach removes the need for Boolean syntax and still returns results even when some of the query terms are wrong. It also suits complex queries that users would find difficult to express in Boolean logic.
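The contrast between the two approaches can be illustrated with a small Python sketch. It assumes a deliberately simple scoring rule (the number of distinct query terms that occur in a document) and whitespace tokenization; the function names and the sample documents are hypothetical.

```python
def boolean_and(docs, query):
    """Strict Boolean AND: a document matches only if it contains every query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(text.lower().split())]


def rank_by_term_overlap(docs, query):
    """Rank documents by how many distinct query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(text.lower().split()))
        if score:  # documents sharing no term with the query are left out
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    "d1": "ranking algorithms for document retrieval",
    "d2": "ranking with boolean systems",
    "d3": "cooking recipes",
}
query = "document retreival ranking"  # "retreival" is misspelled on purpose

strict_hits = boolean_and(docs, query)
ranked_hits = rank_by_term_overlap(docs, query)
```

Because of the misspelled term, the strict Boolean query matches nothing, while the overlap ranking still returns d1 ahead of d2.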

The Vector Space Model

The sample document and query vectors discussed in section 14.2 can be regarded as vectors in an n-dimensional vector space, where n is the number of unique terms in the data set. A vector-matching operation based on the cosine correlation, which measures the cosine of the angle between two vectors, can then be used to compute the similarity between a document and a query (Bradley, 1997):

\[ \mathrm{sim}(d_j, q_k) = \frac{\sum_{i=1}^{n} td_{ij} \, tq_{ik}}{\sqrt{\sum_{i=1}^{n} td_{ij}^{2} \cdot \sum_{i=1}^{n} tq_{ik}^{2}}} \]

Here,

td_{ij} = the i-th term in the vector for document j

tq_{ik} = the i-th term in the vector for query k

n = the number of unique terms within the data set
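As a rough illustration of the cosine correlation described above, the following Python sketch scores a handful of documents against a query. It assumes that each document and the query are already represented as equal-length lists of term weights; the function names and the toy binary vectors are hypothetical.

```python
import math


def cosine_similarity(doc_vec, query_vec):
    """Cosine of the angle between a document vector and a query vector."""
    if len(doc_vec) != len(query_vec):
        raise ValueError("vectors must have one entry per unique term")
    dot = sum(td * tq for td, tq in zip(doc_vec, query_vec))
    norm = math.sqrt(sum(td * td for td in doc_vec) *
                     sum(tq * tq for tq in query_vec))
    # An all-zero vector has no direction; treat its similarity as 0.
    return dot / norm if norm else 0.0


def rank_documents(doc_vectors, query_vec):
    """Return document ids ordered from most to least similar to the query."""
    return sorted(doc_vectors,
                  key=lambda doc_id: cosine_similarity(doc_vectors[doc_id], query_vec),
                  reverse=True)


# Toy data set with n = 3 unique terms and binary term weights.
doc_vectors = {"d1": [1, 1, 0], "d2": [0, 1, 1], "d3": [1, 0, 0]}
query_vec = [1, 1, 0]
ranking = rank_documents(doc_vectors, query_vec)
```

In this toy case d1 points in the same direction as the query and ranks first, followed by d3 and then d2.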

This model has been used as the basis for many ranking retrieval experiments, most notably the SMART system experiments conducted by Salton and his associates. These experiments began in 1964 at Harvard University and moved to Cornell University in 1968, and they form a large part of the research in information retrieval. They cover many areas of information retrieval, including phrases, synonyms, clustering, and relevance feedback.

Probabilistic Models

Maron and Kuhns proposed and tested a model based on probabilistic indexing in 1960, but the most widely used probabilistic model was developed by Robertson and Sparck Jones in 1976. It is based on the premise that terms appearing in documents previously judged relevant to a given query should be weighted more highly than terms that did not appear in those relevant documents. The following table shows the distribution of term t across the relevant and non-relevant documents for query q (Brazdil et al., 2003).

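Documents	Relevant	Non-relevant	Total
Containing term t	r	n - r	n
Not containing term t	R - r	N - n - R + r	N - n
Total	R	N - R	N

N = the number of documents within the collection

R = the number of relevant documents for query q

n = the number of documents that contain term t

r = the number of relevant documents that contain term t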

This table is used to derive four formulas that express the relative distribution of a term across the relevant and non-relevant documents and that can be used for term weighting.
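The text does not list the four formulas. The Python sketch below assumes they are the four weighting functions commonly attributed to Robertson and Sparck Jones (1976), labelled F1 to F4 here, with r taken as the number of relevant documents containing term t. The function name, the use of the natural logarithm, and the sample counts are assumptions made for illustration.

```python
import math


def rsj_weights(N, R, n, r):
    """Four relevance weights for a term, computed from the contingency counts.

    N: documents in the collection      R: relevant documents for the query
    n: documents containing the term    r: relevant documents containing the term
    """
    # Every cell of the contingency table must be positive, otherwise a
    # ratio below would divide by zero or take the log of zero.
    if min(r, R - r, n - r, N - n - R + r) <= 0:
        raise ValueError("all contingency-table cells must be positive")
    f1 = math.log((r / R) / (n / N))
    f2 = math.log((r / R) / ((n - r) / (N - R)))
    f3 = math.log((r / (R - r)) / (n / (N - n)))
    f4 = math.log((r / (R - r)) / ((n - r) / (N - n - R + r)))
    return f1, f2, f3, f4


# Hypothetical counts: 100 documents, 10 relevant, term in 20 documents, 8 of them relevant.
f1, f2, f3, f4 = rsj_weights(N=100, R=10, n=20, r=8)
```

For these counts the ratios inside the logarithms are 4, 6, 16 and 26, so a term concentrated in the relevant documents receives a positive weight under all four formulas. Practical implementations usually add a small constant to each cell to cope with zero counts, which this sketch does not do.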

References

Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.

Brazdil, P. B., Soares, C., & Da Costa, J. P. (2003). Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning, 50(3), 251-277.

Liu, T. Y. (2009). Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval, 3(3), 225-331.

Xia, F., Liu, T. Y., Wang, J., Zhang, W., & Li, H. (2008, July). Listwise approach to learning to rank: Theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning (pp. 1192-1199). ACM.
