Searching for Information with Meet and Join Operators
The inner product can be written as

$$\langle x \mid y \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i\, y_j\, \langle t_i \mid t_j \rangle, \qquad (7)$$
where $\langle t_i \mid t_j \rangle = 1$ if and only if both the document and the query belong to the set corresponding to the intersection of the sets $t_i$ and $t_j$.
The choice of a function that assigns a weight (e.g., $x_i$ or $y_j$) can only be made empirically. With the aim of finding the best weight function, a series of experiments led to the conclusion that some weight functions, such as Term Frequency (TF) × Inverse Document Frequency (IDF) (TFIDF), are for the most part more effective than others [21]. The weighting schemes utilized by the retrieval functions of the VSM are perhaps the most important component of a retrieval system; indeed, the mere occurrence of terms in documents is insufficient to achieve high levels of effectiveness. In mathematical terms, the strength of expressions like (5) and (6) is provided by the x's and the y's rather than by the basis vectors $|t\rangle$, and the ability of inner products like (7) to approximate relevance lies in the products $x y$, since the inner products $\langle t_i \mid t_j \rangle$ are trivially either 0 or 1.
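As a concrete illustration, with an orthonormal term basis the inner product (7) collapses to a dot product of weight vectors. The following sketch computes TFIDF weights over a hypothetical toy corpus; the documents, query, and the particular TFIDF variant are assumptions for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not the chapter's data.
docs = [
    "quantum theory of information retrieval".split(),
    "information retrieval with vector spaces".split(),
    "quantum machine learning".split(),
]
vocab = sorted({t for d in docs for t in d})

def idf(term):
    # Inverse Document Frequency over the toy corpus.
    df = sum(term in d for d in docs)
    return math.log(len(docs) / df)

def tfidf_vector(tokens):
    tf = Counter(tokens)
    return [tf[t] * idf(t) for t in vocab]

def inner_product(x, y):
    # With an orthonormal term basis, <t_i|t_j> = 1 iff i = j, so (7)
    # reduces to the dot product of the weight vectors.
    return sum(xi * yi for xi, yi in zip(x, y))

query = tfidf_vector("quantum retrieval".split())
scores = [inner_product(tfidf_vector(d), query) for d in docs]
# The first document matches both query terms and ranks highest.
```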
A comparison between the notation of a model based on sets and the notation of a model based on vector spaces is summarized in Table 1, which introduces the notions of projector and space, since each space corresponds to one and only one projector. No analogous correspondence exists between sets and weight functions. The strength of the correspondence between spaces and projectors is that a projector can be represented by a matrix, thus providing an algorithmic implementation of checking whether a vector belongs to a space; it is indeed sufficient to compute two inner products and check whether the result is 1. Thanks to the correspondence between projector and subspace, a subspace of H can be viewed as a set of vectors where the projector plays the role of the mechanism that checks whether a vector belongs to the subspace.
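The membership check described above can be sketched numerically. The subspace, projector, and test vectors below are illustrative assumptions.

```python
import numpy as np

# A projector onto a subspace of R^3, built from an orthonormal
# basis B of the subspace (the subspace itself is an assumption).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])       # subspace spanned by the first two axes
A = B @ B.T                       # projector: A = B B^T, so A^2 = A = A^T

x_in = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # unit vector in the subspace
x_out = np.array([0.0, 0.0, 1.0])               # unit vector outside it

# Membership check from the text: compute <x|A|x> and compare with 1.
inside = x_in @ A @ x_in     # 1.0 for a unit vector in the subspace
outside = x_out @ A @ x_out  # 0.0 for a vector orthogonal to it
```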
The traditional VSM for IR ignores lattice operators like meet and join; it only exploits inner products and represents documents and queries by using only one basis, unless Latent Semantic Analysis (LSA) is utilized. One reason for this limitation might be the greater focus of VSM-based system designers on (a) the least expert end users rather than on users who are expert in their specific knowledge domain, on the one hand, and (b) the simple and short queries submitted
Table 1 Comparison between sets and vector spaces

Set                S             Vector space       H
Subset             a             Subspace           a
Set element        x             Vector             |x⟩
Weight function    W             Projector          A
Ranking function   W(x, a)       Projection size    ⟨x|A|x⟩
Membership         W(x, a) = 1   Projection         ⟨x|A|x⟩ = 1
E. Di Buccio and M. Melucci |
to find specific resources, on the other hand. Although the least expert users would perceive little benefit from advanced vector operators, an IR system may still be equipped with algorithms and data structures implementing these operators.
The role played by probabilistic models has become important in IR because the Boolean model lacks ranking capabilities, forcing the end user to face null output and output overload. The VSM succeeded in improving the user's experience because it provides some ranking, but finding the coefficients of the linear combinations remained an open problem for a long time and was mostly addressed through empirical investigations.
While weights are oftentimes provided by empirical investigations within the VSM, probabilistic models, by contrast, provide weight functions with a sound theoretical basis such as Maximum Likelihood Estimation (MLE). A probabilistic model is currently a principled approach for providing coordination-level weights, of which BM25 is the most striking example. For instance, the so-called BIR owes its effectiveness to the Term Relevance Weight (TRW) function, a log-likelihood ratio from which BM25 was derived [20]. Statistical independence was further addressed by many authors, for example, in [6]. Similar and additional weight functions can be derived within the Language Modelling (LM) framework [7].
The probabilistic models organize an application domain as sets of single occurrences (elementary events) of a process or phenomenon. Elementary events are then grouped in subsets through random variables, and a probability measure maps a subset of events, i.e., random values, to the unit range [0, 1] in order to obtain a probability. In general, the elementary events are documents and the events correspond to logical combinations of terms, which are sets.
Suppose we are given n terms. There are $2^n$ combinations of term occurrences, each corresponding to a subset of documents. Let x be one of these subsets. The probability $p(x) = P(d \in x)$ that a relevant document d belongs to x can be estimated under the assumption of conditional independence of term occurrence, thus providing that
$$p(x) = \prod_{i=1}^{n} p_i^{x_i} (1 - p_i)^{1 - x_i}, \qquad (8)$$
where $x_i \in \{0, 1\}$ denotes the occurrence of term $t_i$ and $p_i$ is the probability that $t_i$ occurs in a relevant document.
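A minimal numerical sketch of (8); the per-term probabilities and the occurrence pattern are hypothetical.

```python
# Sketch of Eq. (8) under conditional independence.
p = [0.8, 0.3, 0.5]   # assumed P(t_i occurs | relevant)
x = [1, 0, 1]         # term-occurrence pattern defining the subset

prob = 1.0
for pi, xi in zip(p, x):
    # Each term contributes p_i if it occurs, (1 - p_i) otherwise.
    prob *= pi**xi * (1 - pi)**(1 - xi)
# prob = 0.8 * 0.7 * 0.5 = 0.28
```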
Suppose that not only is occurrence observed, but a random variable $S_i(d) \in [0, 1]$ is also measured for each term $t_i$ and document d. In this context, x is an n-dimensional subset of $[0, 1]^n$. A probability distribution of $S_i(d)$ can thus be defined
as follows:

$$B(s_i(d))^{-1}\, p_i^{s_i(d)} (1 - p_i)^{1 - s_i(d)}, \quad B(s) = \mathrm{beta}(1 - s,\, s + 1) - \mathrm{beta}(1 - s,\, s + 2). \qquad (9)$$
Consider the probability distribution of $S_i(d)$ when d is non-relevant:

$$B(s_i(d))^{-1}\, q_i^{s_i(d)} (1 - q_i)^{1 - s_i(d)}. \qquad (10)$$
The log-likelihood of the hypothesis testing relevance versus non-relevance is

$$\log \frac{P(d \in x \mid d \text{ is relevant})}{P(d \in x \mid d \text{ is not relevant})} = \sum_{i=1}^{n} s_i(d) \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \qquad (11)$$
which is actually the BM25 scoring of d when $s_i(d)$ is the saturation of $t_i$ in d. The advent of BM25 and its effective term weighting scheme have made probabilistic models the state of the art.
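The log-likelihood ratio (11) can be sketched as follows; the probabilities and saturation values are hypothetical numbers for illustration.

```python
import math

# Sketch of the log-likelihood ratio (11).
p = [0.6, 0.7]        # assumed P(t_i occurs | relevant)
q = [0.2, 0.4]        # assumed P(t_i occurs | non-relevant)
s = [1.0, 0.5]        # saturation s_i(d) of each term in document d

def llr(s, p, q):
    # Each term contributes s_i(d) times its Term Relevance Weight,
    # log[p_i (1 - q_i) / (q_i (1 - p_i))].
    return sum(si * math.log(pi * (1 - qi) / (qi * (1 - pi)))
               for si, pi, qi in zip(s, p, q))

score = llr(s, p, q)   # positive: the evidence favors relevance
```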
Even though the logical, vectorial, and probabilistic approaches are three pillars of IR modeling, a strong relationship exists between them. In summary:
- The Boolean logic model views documents and queries as members of sets corresponding to terms. The Boolean operators allow the end user to compose complex queries and express more elaborate concepts than those expressed by terms.
- The VSM ensures that terms correspond to basis vectors and adds the inner product between the vectors representing the sets of the Boolean model to provide a ranking function of the documents with respect to a certain set of query terms.
- The BM25 scoring enriches the inner product with weights given by the MLE of the p and q parameters of a Beta-like probability function of the saturation factor.
Not only can projectors be combined as explained in Sect. 2, but they can also be combined by operators called meet and join, which significantly differ from the traditional set operators implemented by projectors. Consider a vector space V and two subspaces U, W thereof; we have the following definitions.
Definition 5 (Meet) The meet of U and W is the largest subspace included by both U and W .
Definition 6 (Join) The join of U and W is the smallest subspace including both U and W .
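Under these definitions, meet and join can be computed numerically for subspaces given by basis columns. The sketch below uses SVD-based rank and null-space computations, an implementation choice not prescribed by the chapter, and illustrates that the meet of two distinct planes in R^3 is a ray while their join is the whole space.

```python
import numpy as np

def join_basis(U, W):
    # Join: the smallest subspace including both, i.e. the column
    # space of the concatenated bases [U | W].
    M = np.hstack([U, W])
    Q, s, _ = np.linalg.svd(M, full_matrices=False)
    rank = int((s > 1e-10).sum())
    return Q[:, :rank]

def meet_basis(U, W):
    # Meet: the largest subspace included in both. A vector lies in
    # both subspaces iff U a = W b, i.e. [U | -W] (a; b) = 0.
    M = np.hstack([U, -W])
    _, s, Vt = np.linalg.svd(M)          # full V catches the null space
    rank = int((s > 1e-10).sum())
    null = Vt[rank:].T                   # columns are stacked (a; b)
    if null.shape[1] == 0:
        return np.zeros((U.shape[0], 0)) # trivial meet: {0}
    vecs = U @ null[:U.shape[1]]         # the common vectors U a
    Q, sv, _ = np.linalg.svd(vecs, full_matrices=False)
    return Q[:, :int((sv > 1e-10).sum())]

# Two distinct planes in R^3 sharing the e2 axis.
U = np.array([[1., 0.], [0., 1.], [0., 0.]])   # span{e1, e2}
W = np.array([[0., 0.], [1., 0.], [0., 1.]])   # span{e2, e3}

ray = meet_basis(U, W)     # one column: the shared ray along e2
full = join_basis(U, W)    # three columns: all of R^3
```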
Meet and join only resemble the intersection and union of sets. In fact, some properties of set operators no longer hold for meet and join; for example, the distributive law holds for sets, but it does not for vector spaces.
From the point of view of information access, an interpretation of meet and join is necessary. The interpretation provided in this chapter for these two operators starts from the notion of basis. A basis vector mathematically represents a basic concept corresponding to data such as keywords or terms. In the case of a canonical basis, the basis vectors represent the starting vocabulary through which the content of documents and queries is produced and matched. When a document and a query, or two documents, share a concept, their respective mathematical representations share a basis vector with non-null weight.
Consider the meet of two planes. The result of meeting two distinct planes is a ray, that is, a one-dimensional subspace. A one-dimensional subspace is spanned by a vector, and any vector can belong to a basis; indeed, the vector spanning the ray is the only vector of the basis of this subspace. As a basis vector can be a mathematical representation of a basic concept, the meet of two planes can be a mathematical representation of a basic concept. The planes meeting at the basis vector represent information sharing one concept, i.e., the concept represented by the meet, since the vector resulting from the meet of two planes may be a basis vector for both planes, provided that each plane is spanned by a basis including the meet and another independent vector.
Consider the join of two distinct rays. The result of joining two rays is a plane, that is, a two-dimensional subspace spanned by two vectors. The subspace resulting from joining two rays is spanned by the vectors spanning the rays. The plane resulting from the join of two rays represents information based on two concepts, i.e., the concept represented by the basis vector of one ray and the concept represented by the basis vector of the other ray. Indeed, the vectors belonging to the plane resulting from the join of two rays are expressed by two basis vectors, each representing one individual concept.
However, it is safe to state that meet and join are only a mathematical representation, and nothing can be argued about the meaning of what these two operators represent. We can nevertheless argue that if the planes meeting at a basis vector, or the rays joined into a plane, represent information sharing one concept or consisting of two concepts, respectively, then the vector resulting from the meet of the two planes, or the basis resulting from the join of the two rays, may be viewed as a sensible mathematical representation of complex concepts.
This section introduces the building blocks of a QTL. First, features and terms are introduced in Sect. 4.1. Then, Sect. 4.2 presents themes that are further exploited to rank documents as explained in Sect. 4.3. Finally, the composition of themes by
Table 2 Notations used in this chapter

Symbol   Meaning     Comment
|w⟩      Feature     Textual keyword and other modality depending on media
|t⟩      Term        Unigrams, bigrams, trigrams, etc., such as information retrieval and quantum theory
|τ⟩      Theme       Expressions like information retrieval ∧ quantum theory or information retrieval ∨ quantum theory
|φ⟩      Document    Webpages, news, posts, etc.
using meet and join is described in Sect. 4.4. Table 2 summarizes the notation used in this chapter.
Consider the features extracted from a collection of documents; for example, a word is a textual feature, the gray level of a pixel or a code word of an image is a visual feature, and a chroma-based descriptor for content-based music representation is an audio feature. Despite their differences, the features extracted from a collection of multimedia or multimodal documents can coexist in the same vector space if each feature is represented by a canonical basis vector. Consider k distinct features and the following.
Definition 7 (Term) Given the canonical basis¹ $|e_1\rangle, \ldots, |e_k\rangle$ of a subspace over the real field, a term is defined as

$$|t\rangle = \sum_{i=1}^{k} a_i\, |e_i\rangle = (a_1, \ldots, a_k), \qquad a_i \in \mathbb{R},$$
where the a's are the coefficients with respect to the basis. Therefore, terms are a combination of features; for example, if k = 2 textual features, say "information" and "retrieval," then "information retrieval" is a term represented by
$$|\text{information retrieval}\rangle = a_{\text{information}} |\text{information}\rangle + a_{\text{retrieval}} |\text{retrieval}\rangle.$$
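Definition 7 can be sketched as follows; the coefficients are assumed for illustration, and the second term vector merely shows that term vectors built on orthonormal features need not be mutually orthogonal.

```python
import numpy as np

# Sketch of Definition 7 with k = 2 features; the coefficients a_i are
# hypothetical. Feature vectors are canonical (orthonormal) basis vectors.
k = 2
e_information = np.eye(k)[0]
e_retrieval = np.eye(k)[1]

a = {"information": 0.6, "retrieval": 0.8}      # assumed coefficients
t_ir = a["information"] * e_information + a["retrieval"] * e_retrieval

# Two term vectors built on the same orthonormal features can be
# linearly independent yet not orthogonal: their inner product is non-zero.
t_other = 0.8 * e_information + 0.6 * e_retrieval
overlap = float(t_ir @ t_other)   # 0.6*0.8 + 0.8*0.6 = 0.96
```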
The main difference between features and terms lies in orthogonality, since the feature vectors assume mutual orthogonality whereas the term vectors only assume mutual independence. Non-orthogonal independence also distinguishes the QTL from the VSM, since term vectors might not be (and often are not) orthogonal, whereas keyword vectors are usually assumed orthogonal; for example,
¹The i-th canonical basis vector has k − 1 zeros and a 1 at the i-th component.