Here is an example set of parameters for the ShrinkageBeliefNode in the retrieval model that supports NEXI queries. Experiments with some of the smoothing techniques available through this parameter specification format are described in this paper. Note that the parameter estimation presented in the paper is not yet part of the toolkit, but the ranking model described in Equation 1 of Section 2 is supported by the code.
<parameters>
  <rule>method:linear,lambda:0.5</rule>
  <rule>node:ShrinkageBelief,parentWeight:0.1,docWeight:0.4,recursive:false</rule>
  <rule>node:ShrinkageBelief,field:p,weight:0.2,length:true</rule>
  <rule>node:ShrinkageBelief,field:st,weight:0.4,length:true</rule>
</parameters>
The first rule specifies that the statistical language models for components should be estimated as a linear interpolation with the collection model, with lambda=0.5. The other three rules specify how that language model is then combined with the language models of other components in the document hierarchy.
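The second rule states that the component should be estimated as

0.1 P(w|parent) + 0.4 P(w|document) + 0.5 (a P(w|component) + sum_children[ b_child P(w|child) ])

Here 0.1 is the parentWeight, 0.4 is the docWeight, and the remaining 0.5 = 1 - 0.1 - 0.4 is the weight on the component and its children.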
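The combination in the second rule can be sketched in a few lines of Python. This is an illustration only, not toolkit code: the function names `linear_smooth` and `shrinkage_estimate` are hypothetical, the probabilities are made-up numbers, the sketch assumes that lambda is the weight placed on the collection model, and the values of a and b_child are placeholders.

```python
def linear_smooth(p_ml, p_collection, lam=0.5):
    # Assumption: lam is the weight given to the collection model.
    return (1.0 - lam) * p_ml + lam * p_collection


def shrinkage_estimate(p_parent, p_doc, p_component, p_children, a, b_children,
                       parent_weight=0.1, doc_weight=0.4):
    # Weight left over for the component and its children (0.5 in the example).
    own_weight = 1.0 - parent_weight - doc_weight
    own = a * p_component + sum(b * p for b, p in zip(b_children, p_children))
    return parent_weight * p_parent + doc_weight * p_doc + own_weight * own


# Made-up unsmoothed probabilities of one word w, for illustration only.
p_collection = 0.01
p_parent = linear_smooth(0.02, p_collection)
p_doc = linear_smooth(0.03, p_collection)
p_component = linear_smooth(0.05, p_collection)
p_children = [linear_smooth(0.07, p_collection)]

# a and b_child are placeholder values here; the field rules determine them.
estimate = shrinkage_estimate(p_parent, p_doc, p_component, p_children,
                              a=0.8, b_children=[0.2])
print(estimate)
```

Every probability passed to `shrinkage_estimate` has already been interpolated with the collection model, which mirrors the order in which the rules are applied.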
Note that P(w|parent), P(w|document), P(w|component), and P(w|child) have all been smoothed with the collection model according to the first smoothing rule. This is a little different from the presentation in the paper, where the smoothing with the collection model is an explicitly stated component of the combination formula. The two models are equivalent, but you must be careful with your parameter specification. If the paper's parameters are

lambda_C = 0.5
lambda_D = 0.2
lambda_P = 0.1
lambda_O = 0.2

then in the parameter file you should specify
<rule>method:linear,lambda:0.5</rule>
<rule>node:ShrinkageBelief,parentWeight:0.2,docWeight:0.4,recursive:false</rule>

since

parentWeight = lambda_P / (1 - lambda_C)
docWeight = lambda_D / (1 - lambda_C)
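As a worked check of this conversion, the following Python sketch maps the paper's weights to parameter-file values and formats the corresponding rule. The helper name `paper_to_toolkit` is hypothetical, and the check that the four lambdas sum to 1 is an assumption of this sketch (it holds for the example values) rather than something the toolkit is stated to enforce.

```python
def paper_to_toolkit(lambda_c, lambda_d, lambda_p, lambda_o):
    """Map the paper's lambda weights to parameter-file values."""
    total = lambda_c + lambda_d + lambda_p + lambda_o
    # Assumption of this sketch: the paper's four weights sum to 1.
    if abs(total - 1.0) > 1e-9:
        raise ValueError("lambdas should sum to 1, got %r" % total)
    if lambda_c >= 1.0:
        raise ValueError("lambda_C must be below 1")
    remaining = 1.0 - lambda_c
    return {
        "lambda": lambda_c,
        "parentWeight": lambda_p / remaining,
        "docWeight": lambda_d / remaining,
    }


weights = paper_to_toolkit(lambda_c=0.5, lambda_d=0.2, lambda_p=0.1, lambda_o=0.2)
rule = ("<rule>node:ShrinkageBelief,parentWeight:{parentWeight:g},"
        "docWeight:{docWeight:g},recursive:false</rule>").format(**weights)
print(rule)
```

With the example values, the weight left for the component and its children is 1 - 0.2 - 0.4 = 0.4, which matches lambda_O / (1 - lambda_C) = 0.2 / 0.5.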
The recursive:false part of the rule specifies that P(w|parent), P(w|document), and P(w|child) should be taken from the pre-shrinkage estimates. Setting recursive:true would instead smooth recursively, combining language models that have already been smoothed along the hierarchy.
The last two rules specify how the a and b_child weights are set. In

<rule>node:ShrinkageBelief,field:p,weight:0.2,length:true</rule>
<rule>node:ShrinkageBelief,field:st,weight:0.4,length:true</rule>

length:true states that the weights are applied relative to the lengths of the child components, which results in the combination:
a = length(component) / z_component
b_child = alpha_type(child) * length(child) / z_component
z_component = length(component) + sum_children[ alpha_type(child) * length(child) ]
In the above, alpha_type(child) is the weight corresponding to the type of the child. For example, if the child is a "p", then alpha_type(child) = 0.2. This method of smoothing effectively allows the user to place extra weight on words occurring in child nodes of certain types. If no rule is specified for a field type in the parameter file, its weight defaults to 0.
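The length-relative weights can be computed as in the Python sketch below. The helper `length_weights`, the example lengths, and the "fig" child type are made up for illustration. How the toolkit measures length is not stated here, so the lengths are simply treated as given numbers.

```python
def length_weights(component_length, children, alpha):
    """Return (a, [b_child, ...]) for the length:true case.

    children is a list of (field_type, length) pairs; alpha maps a field
    type to its weight, and a type without a rule gets weight 0.
    """
    z = component_length + sum(alpha.get(t, 0.0) * n for t, n in children)
    if z == 0:
        raise ValueError("component and weighted children have zero length")
    a = component_length / z
    b = [alpha.get(t, 0.0) * n / z for t, n in children]
    return a, b


# Weights from the two field rules in the example parameter file.
alpha = {"p": 0.2, "st": 0.4}

# Hypothetical component of length 100 with three children; "fig" has no rule.
a, b = length_weights(100, [("p", 50), ("st", 25), ("fig", 10)], alpha)
print(a, b)
```

Because every weight is divided by the same z_component, a and the b_child values always sum to 1, and a child whose type has no rule contributes nothing.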
If length:false is set, then the combination is of the form:

a = 1 - sum_children[ b_child ]
b_child = alpha_type(child)

and the code will fail if sum_children[ b_child ] > 1. This allows the user to place a fixed weight on child components of certain types.
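The fixed-weight case is simple enough to sketch directly in Python. The helper `fixed_weights` is hypothetical, and raising `ValueError` only stands in for the failure mentioned above; how the toolkit itself reports the error is not specified here.

```python
def fixed_weights(child_types, alpha):
    """Return (a, [b_child, ...]) for the length:false case."""
    # Each child gets the weight of its type; types without a rule get 0.
    b = [alpha.get(t, 0.0) for t in child_types]
    total = sum(b)
    if total > 1.0:
        raise ValueError("child weights sum to %r, which exceeds 1" % total)
    return 1.0 - total, b


alpha = {"p": 0.2, "st": 0.4}

# One "p" child and one "st" child: the component keeps about 0.4.
a, b = fixed_weights(["p", "st"], alpha)
print(a, b)

# Three "st" children would need a total weight of about 1.2, so this fails.
try:
    fixed_weights(["st", "st", "st"], alpha)
    failed = False
except ValueError:
    failed = True
print(failed)
```

Since the weight is fixed per child rather than per type, a component with many children of a weighted type can exceed the limit even when each individual weight looks small.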
The Lemur Project
Last modified: Wednesday, 14-Dec-2005 10:21:53 EST