Javascript XPath
Latest revision as of 12:12, 12 September 2024
General
An XPath describes a path in an XML element.
XPath items are nodes in the XML tree.
Each item selects one or more XML nodes (element, attribute, text), starting from the result of the previous item.
An absolute path starts with a slash (/). The first item selector applies to the root element.
A relative path (not starting with a slash) behaves as if the XML root element were already selected. The first item selector applies to it.
String constants (used with operators or in function parameters) MUST be enclosed in ' or " characters, e.g. 'some_string' or "some_string".
Delimiters MUST be escaped when present inside the string value (e.g. some'_''string):
- String literal (used with operators). Delimiter characters MUST be repeated: 'some''_''''string'
- Function parameters. Delimiters MUST be XML escaped: 'some&apos;_&apos;&apos;string'
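The two escaping rules can be sketched in plain JavaScript. The helper name quoteXPathString is hypothetical and is not part of the XPath object. The set of XML-escaped characters (& < > " ') is an assumption: the text above only requires the delimiter to be escaped.

```javascript
// Hypothetical helper illustrating the two escaping rules.
// It is NOT the library's XPath.escapeString().
function quoteXPathString(value, quot, literal) {
    if (quot === undefined)
        quot = "'";
    if (literal === undefined)
        literal = true;
    if (quot !== "'" && quot !== '"')
        throw new Error("Delimiter must be ' or \"");
    var body;
    if (literal) {
        // String literal: repeat every delimiter character
        body = value.split(quot).join(quot + quot);
    } else {
        // Function parameter: XML escape (assumed character set)
        var entities = {
            "&": "&amp;", "<": "&lt;", ">": "&gt;",
            '"': "&quot;", "'": "&apos;"
        };
        body = value.replace(/[&<>"']/g, function (c) {
            return entities[c];
        });
    }
    return quot + body + quot;
}

console.log(quoteXPathString("some'_''string"));
// 'some''_''''string'
console.log(quoteXPathString("some'_''string", "'", false));
// 'some&apos;_&apos;&apos;string'
```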
Node selector
- Tag. E.g. book. Reserved char * matches any element tag name
- Attribute (starting with '@'). E.g. @type
- Text: text()
- Child text: child::text()
Predicates
A node selector can be followed by one or more predicates enclosed in square brackets: [].
Predicates add more filtering criteria.
The maximum number of predicates is 10. Parsing fails if the number of predicates exceeds this limit.
Predicate types:
- Index. Numeric, decimal digits
  - 1-based index of the node in its parent. Allowed interval: [1..0xffffffff] (4-byte unsigned integer, excluding 0)
  - The index is handled per selector type: attribute indexes and child indexes do not overlap
  - Keyword last(). Identifies the last node (by type: element, attribute, text) in the parent
- Comparison: VARIABLE<OPERATOR>VALUE
  - Operators:
    - = Equality operator. VARIABLE is the same as VALUE
    - != Inequality operator. VARIABLE is not the same as VALUE
  - VALUE: string literal
  - VARIABLEs. A VARIABLE defines the contents to be compared with VALUE
    - @attr_name. Use the attribute named attr_name of the current node. Fails if there is no attribute with this name
    - text(). Use the current node's XML text. Fails if there is no text
    - Else. VARIABLE is handled as an XML tag. Check for an XML child element with the given tag and use its XML text. Fails if there is no such child or there is no text in it
- Variable presence
  - @attr_name. Matches if the current node has an attribute named attr_name
  - text(). Matches if the current node has a non-empty XML text
  - Else. The value is handled as an XML tag. Matches if the current node has a child with the given tag
- Functions (match if true is returned)
  - matches(VARIABLE,regexp[,flags]), notMatches(VARIABLE,regexp[,flags]). Regular expression (NOT) match
    Matches if the current node has the requested VARIABLE and the function returns true
    Parameters:
    - VARIABLE: Value to check. See VARIABLEs in the comparison description
    - regexp: String constant describing the regular expression. Delimiter MUST be XML escaped
    - flags: String constant with regular expression flags: i (case insensitive), b (Basic POSIX regular expression). Delimiter MUST be XML escaped
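As an illustration of the predicate limits described above, here is a minimal sketch in plain JavaScript. The functions splitStep and parseIndex are hypothetical and only model the documented rules (at most 10 predicates, index in [1..0xffffffff]). They are not the parser used by the XPath object, which may behave differently in edge cases.

```javascript
// Hypothetical model of the documented predicate rules.
var MAX_PREDICATES = 10;
var MAX_INDEX = 0xffffffff;

// Split one path step into its node selector and predicate bodies.
// Quoted strings are skipped, so a ']' inside a string constant
// does not close the predicate.
function splitStep(step) {
    var predicates = [];
    var selectorEnd = -1;
    var i = 0;
    while (i < step.length) {
        if (step[i] !== "[") {
            if (selectorEnd >= 0)
                throw new Error("Unexpected character after predicate at " + i);
            i++;
            continue;
        }
        if (selectorEnd < 0)
            selectorEnd = i;
        var quote = null;
        var j = i + 1;
        for (; j < step.length; j++) {
            var ch = step[j];
            if (quote) {
                if (ch === quote)
                    quote = null;
            } else if (ch === "'" || ch === '"')
                quote = ch;
            else if (ch === "]")
                break;
        }
        if (j >= step.length)
            throw new Error("Unterminated predicate");
        predicates.push(step.slice(i + 1, j));
        i = j + 1;
    }
    if (predicates.length > MAX_PREDICATES)
        throw new Error("Too many predicates: " + predicates.length);
    return {
        selector: selectorEnd < 0 ? step : step.slice(0, selectorEnd),
        predicates: predicates
    };
}

// Return the numeric index of an index predicate, null if the
// predicate is not made of decimal digits only.
function parseIndex(predicate) {
    if (!/^[0-9]+$/.test(predicate))
        return null;
    var n = Number(predicate);
    if (n < 1 || n > MAX_INDEX)
        throw new RangeError("Index out of range: " + predicate);
    return n;
}

var parsed = splitStep("book[@category='a]b'][2]");
console.log(parsed.selector);      // book
console.log(parsed.predicates);    // [ "@category='a]b'", '2' ]
console.log(parseIndex("2"));      // 2
console.log(parseIndex("last()")); // null
```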
NOTE: For performance reasons it is recommended to use compiled XPath objects (already parsed).
Constructor
- new XPath(str[,flags])
Build an XPath from a string description and parse flags.
Parameters:
str String description
flags Flags used by the parser
Flags:
- XPath.StrictParse Enable strict path parsing.
  - Path parsing fails in some conditions (e.g. spaces found where not expected, or a duplicate index in a predicate)
  - The following fail if this flag is set:
    - book/author[1][1]
    - book/author [1]
- XPath.IgnoreEmptyResult Ignore (do not check) empty results when parsing steps.
  - Unless this flag is set, path parsing fails if a step would not select anything (e.g. the previous step selects an XML text: there is nothing after it)
  - The following fail if this flag is not set:
    - book/text()/author - An XML text can't have a child
    - book/author[1][2] - An XML child can't be in both first and second position
- XPath.NoXmlNameCheck Do not check XML element tag or attribute names for valid XML characters.
  - Path parsing fails if this flag is not set and an invalid character is found in the string
  - The following fails if this flag is not set:
    - book/&author
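To show what a name check of this kind rejects, the sketch below tests a tag or attribute name against an ASCII-only approximation of the XML 1.0 Name production. The function isPlausibleXmlName is hypothetical. The actual check performed by the parser is not described here and may accept the wider Unicode ranges allowed by XML.

```javascript
// ASCII-only approximation of the XML 1.0 Name production.
// Assumption: the real parser may accept more characters than this.
var XML_NAME_ASCII = /^[A-Za-z_:][A-Za-z0-9_:.-]*$/;

// Hypothetical helper: true if 'name' looks like a valid XML name.
function isPlausibleXmlName(name) {
    return XML_NAME_ASCII.test(name);
}

console.log(isPlausibleXmlName("author"));  // true
console.log(isPlausibleXmlName("&author")); // false
```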
- new XPath(strOrXPath)
Build an XPath from string description or XPath object.
Parameters:
strOrXPath String description or XPath object (copy held XPath description only)
Static Methods
- escapeString(str[,quot[,literal=true]])
Escape a string to be used in an XPath expression.
This function should be used when building an XPath from pieces.
Parameters:
str String to escape
quot Optional string quoting (enclosing) character. Allowed: ' or ". Default: "
literal True if the string is going to be used as a literal (e.g. in a comparison), false if it is going to be used in an XML string match (the string will be XML escaped)
Return: Escaped string
var str = "\"Literal\"<XML>";
var literal = XPath.escapeString(str);
var xml = XPath.escapeString(str,undefined,false);
// literal: """Literal""<XML>"
// xml: "&quot;Literal&quot;&lt;XML&gt;"
Methods
- valid()
Check if path is valid.
Return true if path is valid, false if not (parse failed).
- absolute()
Check if path is absolute.
Return true if path is an absolute one, false if not.
- getPath()
Retrieve the path string description.
Return string if path is valid, null if not.
- getItems([escape])
Retrieve the path items (steps).
Parameters:
escape Boolean. True to escape strings, false to return unescaped strings. Default: true
Return array of strings if path is valid, null if path is not valid.
var x = new XPath("book[@attr='''Literal']/author[matches(text(),'<XML>')]");
var escaped = x.getItems();
// ["book[@attr='''Literal']","author[matches(text(),'&lt;XML&gt;')]"]
var unescaped = x.getItems(false);
// ["book[@attr=''Literal']","author[matches(text(),'<XML>')]"]
- getError()
Retrieve an object describing the path parse error.
Return object if path is not valid, undefined if path is valid.
Properties:
status Integer. Internal failure code
errorItem Integer. Index of failed path item
error String. Error description. May not be present
- describeError()
Retrieve a string describing the path parse error.
Return string if path is not valid, undefined if path is valid.
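The validity and error methods combine naturally with the recommendation to reuse compiled XPath objects. The sketch below caches one parsed object per path string and reports parse failures through describeError(). The function makePathCache is hypothetical. It receives the constructor as a parameter so that it can also run outside the engine, and it relies only on the valid() and describeError() methods documented above. Whether the script engine supports every construct used here (closures, Object.create) is an assumption.

```javascript
// Hypothetical helper: build each XPath once and reuse it.
// 'XPathCtor' is expected to behave like the XPath constructor:
// new XPathCtor(str) returns an object with valid() and describeError().
function makePathCache(XPathCtor) {
    var cache = Object.create(null);
    return function getPath(str) {
        var path = cache[str];
        if (!path) {
            path = new XPathCtor(str);
            if (!path.valid())
                throw new Error("Invalid XPath '" + str + "': " + path.describeError());
            cache[str] = path;
        }
        return path;
    };
}
```

A possible use, assuming the XPath and XML objects are available: create the accessor once with var getPath = makePathCache(XPath); and then call xml.getChildrenByPath(getPath("book/author")) wherever the path is needed, so the string is parsed only on first use.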
Static Properties
- FindXml, FindText, FindAttr, FindAny
Flags to be used in XML.getAnyByPath() function.
- StrictParse, IgnoreEmptyResult, NoXmlNameCheck
Parser flags to be used when building an XPath from string
Examples
The code below assumes a common initialization:
var xml = new XML("bookstore");
// Fill children ...
var path = new XPath(sample_path);
- *
- /*/*
Match all children of root element.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = xml.getChildren();
- /bookstore/*
Match all children of root element if root element tag is bookstore.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = null;
if ("bookstore" == xml.getTag())
    arr = xml.getChildren();
- book
Match all children having the tag book.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = xml.getChildren("book");
- *[1]
Match first child element.
XML with XPath function:
var child = xml.getChildByPath(path);
XML function(s):
var child = xml.getChild();
- *[2]
Match second child element.
XML with XPath function:
var child = xml.getChildByPath(path);
XML function(s):
var child = null;
var arr = xml.getChildren();
// Make sure a second child exists before using it
if (arr && arr.length > 1)
    child = arr[1];
- *[last()]
Match last child element.
XML with XPath function:
var child = xml.getChildByPath(path);
XML function(s):
var child = null;
var arr = xml.getChildren();
if (arr.length)
    child = arr[arr.length - 1];
- book[2][last()]
Match the second child element with the book tag, only if it is the last one.
XML with XPath function:
var child = xml.getChildByPath(path);
XML function(s):
var child = null;
// Only children with the book tag are counted
var arr = xml.getChildren("book");
if (2 == arr.length)
    child = arr[1];
- book[author]
Match all book children having an author child with non empty text.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var ch of children) {
        // Look for author children of the current book
        var authors = ch.getChildren("author");
        if (authors.length) {
            for (var author of authors) {
                if (author.getText()) {
                    arr.push(ch);
                    break;
                }
            }
        }
    }
}
- book/author
Match all author children of all book children.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var ch of children) {
        // Collect the author children of the current book
        var authors = ch.getChildren("author");
        if (authors.length)
            arr = arr.concat(authors);
    }
}
- book[@category='generic'][author='Some Name'][year='2000']
Match all book children having a category=generic attribute, an author child with the specified text value and a year child with the specified value.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var ch of children) {
        if ("generic" != ch.getAttribute("category"))
            continue;
        // The author and year predicates both apply to the book element
        var hasAuthor = false;
        var authors = ch.getChildren("author");
        if (authors.length) {
            for (var author of authors) {
                if ("Some Name" == author.getText()) {
                    hasAuthor = true;
                    break;
                }
            }
        }
        if (!hasAuthor)
            continue;
        var years = ch.getChildren("year");
        if (years.length) {
            for (var year of years) {
                if ("2000" == year.getText()) {
                    arr.push(ch);
                    break;
                }
            }
        }
    }
}
- book[@category]
Match all book children having a category attribute.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var ch of children) {
        // Keep every book that has the attribute, not only the first one
        if (null !== ch.getAttribute("category"))
            arr.push(ch);
    }
}
- book[@category='web']
Match all book children having a category attribute with web value.
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var ch of children) {
        // Keep every matching book, not only the first one
        if ("web" == ch.getAttribute("category"))
            arr.push(ch);
    }
}
- book[matches(@category,'^WeB$','i')]
Match all book children having a category attribute with web value (case insensitive).
XML with XPath function:
var arr = xml.getChildrenByPath(path);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    var rex = /^WeB$/i;
    for (var ch of children) {
        // Keep every matching book, not only the first one
        if (rex.test(ch.getAttribute("category")))
            arr.push(ch);
    }
}
- book/child::text()
- book/*/text()
Match the text of all children of all book children.
XML with XPath function:
var arr = []; xml.getAnyByPath(path,arr,XPath.FindText);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var book of children) {
        // Collect the text of every child of the current book
        var chs = book.getChildren();
        if (chs.length) {
            for (var ch of chs)
                arr.push(ch.getText());
        }
    }
}
- book/author/text()
Match the text of all author children of all book children.
XML with XPath function:
var arr = []; xml.getAnyByPath(path,arr,XPath.FindText);
XML function(s):
var arr = [];
var children = xml.getChildren("book");
if (children.length) {
    for (var book of children) {
        // Collect the text of every author child of the current book
        var authors = book.getChildren("author");
        if (authors.length) {
            for (var author of authors)
                arr.push(author.getText());
        }
    }
}
References
- https://www.w3.org/TR/xpath-30/
- https://www.w3schools.com/xml/xpath_intro.asp
- https://en.wikipedia.org/wiki/XPath