Select one of the examples. The corresponding XML file will be rendered. Each XML file
is associated with a secret XPath expression that selects some nodes (elements, text nodes,
comments, processing instructions, and/or attributes).
Enter an XPath expression such as /* to make a first guess.
To select namespaced elements with a prefix, such as xsl:for-each in the XSLT example, you can enter //xsl:for-each, //Q{http://www.w3.org/1999/XSL/Transform}for-each, or //*:for-each. Elements without a prefix can be selected by their local name alone if they are in the same namespace as the top-level element. If a prefix is declared on any element, you can also use this prefix in the expression. An example is the Atom/CAP feed in the 2022-06-24 daily challenge, where you can use cap:*.
The prefix xs, bound to http://www.w3.org/2001/XMLSchema, is always available, whether or not it is declared in the document.
You cannot select //namespace-node() (well, you can, but the namespace nodes won’t be displayed). Nor can you select processing instructions or comments that are outside the top-level element.
After your guess is evaluated, the number of selected items is displayed, so you can see whether at least the item count matches. But you are not finished until you have selected exactly the same items as the secret expression, or until you run out of attempts.
To show how well the items selected by your guess match, they are highlighted in the rendered XML (you might need to click into the rendering area and scroll down or sideways). The match quality is indicated by a color code and by a tooltip. The tooltip contains both the hierarchical XPath to the item and its distance to the closest target item.
The number of highlighted items is limited to min((max((2 * count($secret-items), 4)), 20)) (that is, normally twice the secret count, with a minimum of 4 items and a maximum of 20). Otherwise you could select every node and quickly come up with an XPath expression that selects only the green ones. Only once your guess selects the same items as the secret expression is the number of highlighted items no longer limited.
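As an illustration only, the cap can be restated in Python. The name highlight_limit is hypothetical, and the function mirrors the XPath formula above rather than the game's actual XSLT code.

```python
def highlight_limit(secret_count: int) -> int:
    """Python restatement of min((max((2 * count($secret-items), 4)), 20))."""
    # Normally twice the number of secretly selected items, but at least 4 ...
    doubled_with_floor = max(2 * secret_count, 4)
    # ... and never more than 20.
    return min(doubled_with_floor, 20)


if __name__ == "__main__":
    for n in (1, 3, 7, 33):
        print(n, "->", highlight_limit(n))
```

With one secret item, at most 4 guessed items are highlighted; with 33 secret items, as in the transpect landing page case discussed further down, the limit is 20.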
If you check the “Format & indent” box, the input document will be serialized with indent=true and parsed again. This might produce a rendition that is easier to grasp. You can try this feature on the Saxonica Blog Atom feed.
Be aware that checking/unchecking the format & indent box after changing the XPath guess will increase your guess count. If you don’t change the guess, the guess count won’t change when you check/uncheck the box (or when you hit Submit).
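As a rough analogy, the indent-and-reparse round trip behind that checkbox can be imitated with Python's xml.etree.ElementTree.indent (Python 3.9 or later). This is only a stand-in, not the game's XSLT serialization: it shows that indenting adds whitespace-only text to the tree, which then becomes part of the reparsed document.

```python
import xml.etree.ElementTree as ET


def format_and_reparse(xml_text: str) -> ET.Element:
    """Indent a document, serialize it, and parse the serialized text again."""
    root = ET.fromstring(xml_text)
    # Adds newline + indentation whitespace to .text/.tail, modifying root in place.
    ET.indent(root, space="  ")
    serialized = ET.tostring(root, encoding="unicode")
    # The reparsed tree now contains that whitespace as ordinary text content.
    return ET.fromstring(serialized)


if __name__ == "__main__":
    tree = format_and_reparse("<feed><entry><title>A</title></entry></feed>")
    print(ET.tostring(tree, encoding="unicode"))
```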
The distance calculation scales with the number of secretly selected items times the number of items you guess. Therefore it can take a couple of seconds, depending on your computer, if these counts are around 30 or higher. All calculations are carried out by XSLT 3.0 in the browser (see the links at the bottom of the page).
The distance “metric” (which is not a metric, see below) works as follows for a given pair of items:
The following diagram shows several paths between nodes and indicates the corresponding distances using the same color.
The last requirement makes this a non-metric because it violates the triangle inequality. Suppose you have a list with 5 items. If it weren’t for the last requirement, the distance between the first and the last item would be 2: go up one step to the list element, then go down one step to the fifth item. Applying the last requirement, you won’t go up, because the first item is already the penultimate element on its ancestor-or-self axis. Instead, you go sideways to the fifth item, and the distance is 4. The individual distance from each list item to the list element is 1, so the sum of both distances is 2. This violates the triangle inequality, which states that the direct distance between two points is at most the sum of their distances to any third point.
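To make the arithmetic of the five-item example explicit, here is a small Python check. It does not implement the game's distance rules; it only plugs in the three distances named above (4 sideways, 1 up, 1 down) and tests the triangle inequality d(a, b) <= d(a, c) + d(c, b).

```python
def satisfies_triangle_inequality(d_ab: int, d_ac: int, d_cb: int) -> bool:
    """True if the direct distance a->b is no longer than the detour a->c->b."""
    return d_ab <= d_ac + d_cb


# Distances from the five-item list example
# (a = first item, b = fifth item, c = the list element).
FIRST_TO_FIFTH = 4  # sideways, because going up is not allowed here
FIRST_TO_LIST = 1   # one step up
LIST_TO_FIFTH = 1   # one step down

if __name__ == "__main__":
    print(satisfies_triangle_inequality(FIRST_TO_FIFTH, FIRST_TO_LIST, LIST_TO_FIFTH))
    # A plain tree-path distance would give 2 for the direct route instead.
    print(satisfies_triangle_inequality(2, FIRST_TO_LIST, LIST_TO_FIFTH))
```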
This non-metric was chosen because it seemed counter-intuitive that, for example, the first and the last paragraph of a novel would have a distance of 4: go up to the common ancestor, which might be body, passing a chapter element, then go down through another chapter to the last paragraph of the novel. On the other hand, using the number of intermediate nodes as a metric would make guessing a secretly selected item much easier: you take your first guess and go that many nodes back or forth, as in
let $guess := //p[10] return (//node()[$guess << .])[24]
where your initial guess was //p[10], the reported distance was 24, and you move 24 nodes forward to select an item of the secret set.
The “metric” applied here seeks to strike a balance between being intuitive and being ambiguous enough to avoid easy wins.
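For illustration, the hypothetical shortcut described above can be sketched in Python with the standard library's ElementTree. Two assumptions are built in: ElementTree's iter() yields elements only (no text nodes or comments), so steps are counted over elements rather than all nodes; and the document of 40 p elements is made up.

```python
import xml.etree.ElementTree as ET


def step_forward(root: ET.Element, guess: ET.Element, steps: int) -> ET.Element:
    """Return the element `steps` positions after `guess` in document order.

    A rough analogue of (//node()[$guess << .])[steps], restricted to elements.
    """
    in_document_order = list(root.iter())  # pre-order traversal = document order
    return in_document_order[in_document_order.index(guess) + steps]


if __name__ == "__main__":
    paragraphs = "".join(f'<p n="{i}"/>' for i in range(1, 41))
    body = ET.fromstring(f"<body>{paragraphs}</body>")
    guess = body.findall("p")[9]  # //p[10]; XPath positions start at 1
    print(step_forward(body, guess, 24).get("n"))  # prints 34
```

If each following node counted as one step, a reported distance of 24 would lead straight from the 10th to the 34th paragraph, which is the kind of easy win the game's “metric” avoids.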
Querying the document without selecting any nodes yields a warning, but doesn’t count as an attempt.
Try this expression to get a list of element names and their frequencies, with name and frequency separated by ~, sorted by frequency:
might yield (when applied to this HTML document):
Warning (code XPathle03): path~31 text~29 tspan~29 g~25 p~19 rect~17 code~15 li~10 span~7 a~7 div~6 h3~5 ellipse~4 ul~3 script~2 html~1 head~1 title~1 meta~1 link~1 body~1 h1~1 details~1 summary~1 svg~1 defs~1 input~1
is not a node from the document.
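As a rough Python counterpart to that expression (an assumption for illustration, not the game's code), the sketch below counts element local names with collections.Counter and prints name~count pairs, most frequent first. ElementTree writes namespaced names as {uri}local, so the URI part is stripped.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def name_frequencies(xml_text: str) -> str:
    """Return 'name~count' pairs for all elements, most frequent first."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.tag.rsplit("}", 1)[-1]   # strip a leading '{namespace-uri}' if present
        for el in root.iter()
        if isinstance(el.tag, str)  # skip comment/PI nodes if a parser kept them
    )
    # most_common() sorts by count; equal counts keep first-seen (document) order.
    return " ".join(f"{name}~{n}" for name, n in counts.most_common())


if __name__ == "__main__":
    print(name_frequencies("<ul><li/><li><a href='#'/></li><li/></ul>"))
```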
As stated above, at most twice as many items as the secret expression selects will be highlighted (at least 4, at most 20).
Suppose that, as in the “HTML landing page for the transpect documentation” case, 33 items have been secretly selected. Then you can get hints for at most 20 guessed items. You can select 20 elements below the document node that are evenly spaced in terms of position():
In that expression, replace 20 with twice the number of secret items if there are fewer than 10 of them.
Once you identify a specific element as the common ancestor of the candidates, you can set $start to that element. In the landing page example, this common ancestor might be //body, but this wouldn’t be a significant improvement over /.
In other documents, the situation may be different.
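Here is a Python sketch of the scatter idea under stated assumptions: start stands in for the document node, so the root element itself is counted; positions are 1-based as in XPath; the spacing rule position mod dist = dist idiv 2 is borrowed from the discussion of the modified scatter expression; and the result is cut off at n items because that rule can overshoot when the element count is not a multiple of n. The function name scatter is hypothetical.

```python
import xml.etree.ElementTree as ET


def scatter(start: ET.Element, n: int = 20) -> list[ET.Element]:
    """Pick about n evenly spaced elements from start's subtree, in document order."""
    elements = list(start.iter())      # start itself plus all its descendants
    dist = max(len(elements) // n, 1)  # like $count idiv n, but never 0
    picked = [
        el
        for pos, el in enumerate(elements, start=1)  # 1-based, like position()
        if pos % dist == dist // 2                   # middle of each group of dist
    ]
    return picked[:n]


if __name__ == "__main__":
    items = "".join(f'<item n="{i}"/>' for i in range(1, 101))
    root = ET.fromstring(f"<root>{items}</root>")
    print([el.get("n") for el in scatter(root)])
```

To narrow the search as described above, pass the common ancestor you found as start instead of the root element; note that this sketch then also counts start itself, unlike a descendant-only selection.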
The scatter method doesn’t work well in large documents with few secret items. If there is just one secret item, you can pick an arbitrary element about a third of the way into the document and another one at about two thirds. Then again, a scatter guess might be as good a first guess as anything else.
If the document contains many siblings on the same level, for example xs:element, xs:simpleType, and xs:complexType in the XSD of the 2022-06-23 daily challenge, you can select as many of them as can be highlighted and see whether some of them are closer to target items than others.
A modified scatter expression might look like this:
In this expression, we want to skip the initial xs:annotation and xs:import elements. Therefore we don’t use mod $dist = $dist idiv 2, which would select the item in the middle of each group of $dist evenly spaced items. Specifying mod $dist = 0 instead causes the first $dist - 1 elements to be skipped, so that the first selected element is the $dist-th one.
Also note that $dist := $count idiv 4, although the secret expression selects only one item. We didn’t use twice this count, which would amount to just two items, because the minimum number of highlighted items is 4 (see above).
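A Python sketch of the modified variant, again under assumptions that the text does not spell out: it scatters over the children of xs:schema, counts positions from 1, keeps positions where position mod dist = 0 with dist = count idiv 4, and stops after 4 picks to match the highlight minimum. The function name and the mini-schema in the usage example are invented.

```python
import xml.etree.ElementTree as ET


def scatter_children(schema: ET.Element, groups: int = 4) -> list[ET.Element]:
    """Pick every dist-th child of schema, where dist = child count idiv groups."""
    children = list(schema)
    dist = max(len(children) // groups, 1)
    # position() mod $dist = 0 selects positions dist, 2*dist, ...,
    # so the first dist - 1 children (e.g. xs:annotation, xs:import) are skipped.
    picked = [c for pos, c in enumerate(children, start=1) if pos % dist == 0]
    return picked[:groups]


if __name__ == "__main__":
    decls = "".join(f'<xs:element name="e{i}"/>' for i in range(1, 15))
    schema = ET.fromstring(
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        f"<xs:annotation/><xs:import/>{decls}</xs:schema>"
    )
    print([c.get("name") for c in scatter_children(schema)])
```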
Once you have found the candidate item with the shortest distance among all evenly scattered candidates, you can check whether one of its preceding or following siblings is even closer, like so:
Then you can either reassign $start to that candidate and apply the same procedure to its children or grandchildren, or you can enter a new expression that addresses the candidate more directly, such as:
Then you are only 2 items away from the solution (in that example).
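The sibling check can also be sketched in Python. The names with_neighbours and k are hypothetical, and because ElementTree has no parent axis, the parent element has to be passed in explicitly.

```python
import xml.etree.ElementTree as ET


def with_neighbours(parent: ET.Element, candidate: ET.Element, k: int = 2) -> list[ET.Element]:
    """Return candidate plus up to k preceding and k following siblings, in document order."""
    siblings = list(parent)
    i = siblings.index(candidate)  # identity match: Element does not override ==
    return siblings[max(i - k, 0): i + k + 1]


if __name__ == "__main__":
    parent = ET.fromstring("<list>" + "".join(f'<li n="{i}"/>' for i in range(1, 11)) + "</list>")
    print([el.get("n") for el in with_neighbours(parent, parent[4])])
```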
Line breaks in attribute values, for example in @select attributes, might not be reproduced correctly. This is because XML parsers are required to turn newlines in attributes into plain spaces, which is a pity. As a heuristic, if there are at least 5 spaces in a row in an attribute value,
it is assumed that the first space used to be a newline prior to parsing, and it will be converted to
a newline in the rendered XML. Of course this heuristic may fail if indentation was done using tabs,
if there were spaces before the newlines, or if there was no newline at all.
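A Python approximation of that heuristic (not the game's XSLT implementation): in every run of five or more spaces, the first space is turned back into a newline.

```python
import re

# Five or more consecutive spaces; only the first one is assumed to have been a newline.
RUN_OF_SPACES = re.compile(r" {5,}")


def restore_newlines(attribute_value: str) -> str:
    """Replace the first space of each run of at least 5 spaces with a newline."""
    return RUN_OF_SPACES.sub(lambda m: "\n" + m.group(0)[1:], attribute_value)


if __name__ == "__main__":
    print(restore_newlines("for $x in //item      return $x/@id"))
```

As noted above, this guess goes wrong when tabs were used for indentation, when spaces preceded the newline, or when a long run of spaces never contained a newline.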