Query Log Analysis

The Behavior Characterization Framework (BCF; cf. Duarte, Hiemstra, & Serdyukov, 2010b; Duarte & Weber, 2011) was developed to provide information on the queries that children submit to search services. The BCF offers a set of metrics from which end-users may select those appropriate to their service. Query logs can be analysed at two levels of detail: the single action level and the session level. At the single action level, queries and web clicks are analysed in isolation. At the session level, sequences of actions are grouped using a time window. The table shows different characteristics of the BCF which have been developed and tested within the PuppyIR evaluations.

Query related metrics

Length of queries

Younger children often submit shorter queries consisting of fewer tokens and shorter words (Duarte & Weber, 2011). Analyzing the length of queries submitted to a search service can therefore provide information on the demographics of the children using it. Global statistics, such as the number of unique tokens across all submitted queries, can indicate the vocabulary size of the user group.

Metrics:

  • Average number of characters used in queries
  • Average number of tokens used in queries
  • Query length distribution across all queries
  • Number of unique tokens used in queries
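
The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the PuppyIR scripts: the function name query_length_metrics, the sample queries, and the use of lowercased whitespace tokenisation are all assumptions.

```python
from collections import Counter


def query_length_metrics(queries):
    """Compute length statistics for a non-empty list of query strings."""
    if not queries:
        raise ValueError("at least one query is required")
    # Assumption: tokens are lowercased and split on whitespace.
    tokenised = [q.lower().split() for q in queries]
    n = len(queries)
    return {
        "avg_chars": sum(len(q) for q in queries) / n,
        "avg_tokens": sum(len(t) for t in tokenised) / n,
        # Maps a query length (in tokens) to the number of such queries.
        "length_distribution": Counter(len(t) for t in tokenised),
        "unique_tokens": len({tok for t in tokenised for tok in t}),
    }


# Hypothetical sample data for illustration only.
sample_queries = ["cat", "funny cat games", "cat games"]
length_metrics = query_length_metrics(sample_queries)
```

Comparing these values across user groups, or against a general web search log, is one way such figures might be put to use.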

Click related metrics

For PuppyIR services that provide ranked list output, standard click distribution metrics can be used to understand where on the search results page users have clicked.

Metrics:

  • Rank position distribution
  • Number of distinct pages clicked
  • Number of distinct URLs and domains clicked (for web search)
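
As a rough sketch of these click metrics, the following Python assumes each click is logged as a (rank, url) pair with 1-based ranks. The record layout, the function name click_metrics, and the sample URLs are hypothetical and not taken from the PuppyIR log format.

```python
from collections import Counter
from urllib.parse import urlparse


def click_metrics(clicks):
    """clicks: iterable of (rank, url) tuples; rank is assumed 1-based."""
    clicks = list(clicks)
    if not clicks:
        raise ValueError("at least one click is required")
    total = len(clicks)
    rank_counts = Counter(rank for rank, _ in clicks)
    urls = {url for _, url in clicks}
    # The domain is taken to be the network location part of the URL.
    domains = {urlparse(url).netloc for url in urls}
    return {
        # Fraction of all clicks that landed on each rank position.
        "rank_distribution": {r: c / total for r, c in sorted(rank_counts.items())},
        "distinct_urls": len(urls),
        "distinct_domains": len(domains),
    }


# Hypothetical sample data for illustration only.
sample_clicks = [
    (1, "http://example.org/a"),
    (1, "http://example.org/b"),
    (2, "http://example.com/"),
    (1, "http://example.org/a"),
]
click_stats = click_metrics(sample_clicks)
```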

Query – Click pairs

Click entropy can be used to understand the variation in result pages clicked for a query (Weber & Jaimes, 2011). This metric can help determine whether queries are ambiguous, which may indicate that users need more support through query suggestions. It can also help identify the type of query; for example, Duarte & Weber (2011) show how the metric can detect navigational queries.
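
To make the metric concrete, the sketch below computes click entropy as the Shannon entropy (in bits) of the distribution of clicked URLs per query. This is the usual formulation and is assumed here rather than taken from the cited papers; the function name click_entropy and the sample pairs are hypothetical.

```python
import math
from collections import Counter, defaultdict


def click_entropy(pairs):
    """pairs: iterable of (query, clicked_url). Returns {query: entropy in bits}."""
    clicks_per_query = defaultdict(Counter)
    for query, url in pairs:
        clicks_per_query[query][url] += 1

    entropy = {}
    for query, url_counts in clicks_per_query.items():
        total = sum(url_counts.values())
        # H(q) = sum over URLs of p(u|q) * log2(1 / p(u|q))
        entropy[query] = sum(
            (count / total) * math.log2(total / count)
            for count in url_counts.values()
        )
    return entropy


# Hypothetical sample data for illustration only.
sample_pairs = [
    ("youtube", "http://youtube.com/"),
    ("youtube", "http://youtube.com/"),
    ("jaguar", "http://example.org/animal"),
    ("jaguar", "http://example.org/car"),
]
entropies = click_entropy(sample_pairs)
```

Under this formulation, a query whose clicks all land on one page has entropy 0, which is the pattern expected of navigational queries, while clicks spread evenly over two pages give an entropy of 1 bit.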

Sessions

Session-based metrics, calculated by grouping consecutive user actions within a time window, can be useful for understanding what users are trying to achieve during a search session.
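
One common way to group actions is to start a new session whenever the gap between two consecutive actions of the same user exceeds a threshold. The sketch below takes that approach; the 30-minute threshold, the (timestamp, kind, value) record layout, and the function name split_sessions are assumptions, not the BCF's documented settings.

```python
from datetime import datetime, timedelta


def split_sessions(actions, gap=timedelta(minutes=30)):
    """Split one user's actions into sessions.

    actions: list of (timestamp, kind, value) tuples in any order.
    A new session starts when the time since the previous action exceeds gap.
    """
    sessions = []
    for action in sorted(actions, key=lambda a: a[0]):
        previous_time = sessions[-1][-1][0] if sessions else None
        if previous_time is not None and action[0] - previous_time <= gap:
            sessions[-1].append(action)
        else:
            sessions.append([action])
    return sessions


# Hypothetical sample data for illustration only.
start = datetime(2011, 5, 1, 10, 0)
sample_actions = [
    (start, "query", "cats"),
    (start + timedelta(minutes=2), "click", "http://example.org/cats"),
    (start + timedelta(minutes=50), "query", "dogs"),
]
sample_sessions = split_sessions(sample_actions)
```

With these sample actions, the first two fall into one session and the third, arriving 48 minutes after the click, starts a second one.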

Metrics:

  • Session activity: the number of entries in the session (queries submitted, clicks made), which indicates the level of user activity within a search session.
  • Session duration: the total amount of time spent searching.
  • Query/click re-finding (Tyler & Teevan, 2010): the amount of repeat behavior, i.e. issuing the same query or revisiting the same page within a session. Such looping behavior has been seen as a common orienteering device by children (Bilal & Kirby, 2002).
  • Click duration distribution: can indicate how long children spend reading information as opposed to creating and modifying queries (Hassan, Jones, & Klinkner, 2010).
  • Query reformulation patterns: investigating reformulation patterns (adding terms to queries, removing terms, changing terms, etc.) can provide information on where children need support in modifying queries or refining information needs.
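
Several of the session metrics above can be sketched as follows. The code assumes a session is a time-ordered list of (timestamp, kind, value) tuples, where kind is "query" or "click". The function names, the record layout, and the reformulation labels ("add", "remove", "change", "repeat", "new") are hypothetical, and the set-based term comparison is only one simple way to classify reformulations.

```python
from datetime import datetime, timedelta


def session_metrics(session):
    """session: non-empty, time-ordered list of (timestamp, kind, value)."""
    queries = [value for _, kind, value in session if kind == "query"]
    clicks = [value for _, kind, value in session if kind == "click"]
    return {
        "activity": len(session),
        "duration": session[-1][0] - session[0][0],
        # Re-finding: occurrences beyond the first of a query or page.
        "repeated_queries": len(queries) - len(set(queries)),
        "repeated_clicks": len(clicks) - len(set(clicks)),
    }


def reformulation_type(previous, current):
    """Classify how a query changed, comparing whitespace-separated terms."""
    old, new = set(previous.lower().split()), set(current.lower().split())
    if old == new:
        return "repeat"
    if old < new:
        return "add"      # all old terms kept, new terms added
    if new < old:
        return "remove"   # some old terms dropped, none added
    if old & new:
        return "change"   # some terms kept, others swapped
    return "new"          # no terms in common


# Hypothetical sample data for illustration only.
start = datetime(2011, 5, 1, 10, 0)
sample_session = [
    (start, "query", "cats"),
    (start + timedelta(minutes=1), "click", "http://example.org/cats"),
    (start + timedelta(minutes=3), "query", "funny cats"),
    (start + timedelta(minutes=4), "click", "http://example.org/cats"),
    (start + timedelta(minutes=5), "query", "cats"),
]
session_stats = session_metrics(sample_session)
```

Applying reformulation_type to each pair of consecutive queries in a session and counting the labels would give a simple reformulation pattern profile.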

Download sample of query log analysis scripts