In this Java XPath tutorial, we will learn what XPath is, what data types it defines, and how to write XPath expressions to retrieve information from an XML file or document. This information can be XML nodes, XML attributes, or even comments.
Table of Contents
1. What is XPath?
2. XPath Data Model
3. XPath Data Types
4. XPath Syntax
5. XPath Expressions
6. Recommended reading
We will use this XML in running various XPath examples in this tutorial.
inventory.xml
<?xml version="1.0" encoding="utf-8" ?>
<inventory>
    <!--Test is test comment-->
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>
    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>
</inventory>
1. What is XPath
XPath is a syntax used to describe parts of an XML document. With XPath, you can refer to the first element, any attribute of an element, all elements that contain some specific text, and many other variations. An XSLT style sheet uses XPath expressions in the match and select attributes of various elements to indicate how a document should be transformed.
XPath can also be useful when testing web services that use XML to send requests and receive responses.
XPath uses a syntax much like what we already know. It is a mix of basic programming language expressions (such as $x*6), wildcards (such as *), and Unix-like path expressions (such as /inventory/book/author).
In addition to the basic syntax, XPath provides a set of useful functions (such as count() or contains(), much like utility function calls) that allow you to search for various data fragments inside the document.
2. XPath Data Model
XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model i.e. DOM tree, so if you’re familiar with the DOM, you will easily get some understanding of how to build basic XPath expressions.
There are seven kinds of nodes in the XPath data model:
- The root node (Only one per document)
- Element nodes
- Attribute nodes
- Text nodes
- Comment nodes
- Processing instruction nodes
- Namespace nodes
2.1. Root Node
The root node is the XPath node that contains the entire document. In our example, the root node contains the <inventory> element. In an XPath expression, the root node is specified with a single slash ('/').
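As a minimal sketch of the difference between the root node and the element it contains, the following Java snippet evaluates '/' and '/*' against a cut-down copy of the sample XML. The class name RootNodeDemo and the helper selectNode are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RootNodeDemo {
    // Cut-down stand-in for inventory.xml so the sketch needs no file on disk.
    static final String XML =
        "<inventory><book year=\"2000\"><title>Snow Crash</title></book></inventory>";

    // Hypothetical helper: evaluates an expression that selects a single node.
    static Node selectNode(String expression) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Node root = selectNode("/");             // the root node itself
        Node documentElement = selectNode("/*"); // the element the root node contains
        System.out.println(root.getNodeType() == Node.DOCUMENT_NODE); // true
        System.out.println(documentElement.getNodeName());            // inventory
    }
}
```

With a DOM source, the XPath root node corresponds to the DOM Document object, not to the <inventory> element.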
2.2. Element Nodes
Every element in the original XML document is represented by an XPath element node.
For example, the following are element nodes in our sample XML, in addition to the <inventory> element itself:
- book
- title
- author
- publisher
- isbn
- price
2.3. Attribute Nodes
At a minimum, an element node is the parent of one attribute node for each attribute in the XML source document. These nodes describe features of a particular element node.
For example, in our XML fragment, "year" is an attribute node.
2.4. Text Nodes
Text nodes are refreshingly simple. They contain text from an element. If the original text in the XML document contained entity or character references, they are resolved before the XPath text node is created.
The text node is text, pure and simple. A text node is required to contain as much text as possible. Remember that the next or previous node of a text node can’t be another text node.
For example, all values in our XML fragment are text nodes, e.g. "Snow Crash" and "Neal Stephenson".
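The point about references being resolved can be checked with a small sketch. The one-element document below is hypothetical (it is not part of inventory.xml), as are the class name TextNodeDemo and the helper evalString.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextNodeDemo {
    // Hypothetical document: the text is written with entity and character references.
    static final String XML = "<title>Q&amp;&#65; &lt;draft&gt;</title>";

    // Hypothetical helper: returns the string value of what the expression selects.
    static String evalString(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The text node holds the resolved characters, not the escaped source form.
        System.out.println(evalString("/title/text()")); // Q&A <draft>
    }
}
```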
2.5. Comment Nodes
A comment node is also very simple—it contains some text. Every comment in the source document becomes a comment node. The text of the comment node contains everything inside the comment, except the opening <!-- and the closing -->.
For example:
<!--Test is test comment-->
2.6. Processing Instruction Nodes
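A processing instruction node has two parts, a name (returned by the name() function) and a string value. The string value is everything after the name and the white space that follows it, up to but not including the ?> that closes the processing instruction.
Note that the XML declaration at the top of our sample XML (<?xml version="1.0" encoding="utf-8" ?>) looks like a processing instruction but is not one, so it has no node in the XPath data model. A true processing instruction would be, for example, a stylesheet reference (hypothetical here, since inventory.xml does not contain one):
<?xml-stylesheet type="text/xsl" href="inventory.xsl"?>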
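The name and string value of a processing instruction can be read with a short sketch like the one below. The sample XML has no processing instruction, so the document here adds a hypothetical xml-stylesheet instruction with a made-up href. ProcessingInstructionDemo and eval are also hypothetical names. Note that the XML declaration is not counted: XPath does not treat it as a processing instruction.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ProcessingInstructionDemo {
    // Hypothetical document: an XML declaration, one processing instruction, one element.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"inventory.xsl\"?>"
        + "<inventory/>";

    // Hypothetical helper: evaluates the expression and returns the requested type.
    static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Only the xml-stylesheet instruction is a node; the XML declaration is not.
        System.out.println(eval("count(/processing-instruction())", XPathConstants.NUMBER)); // 1.0
        System.out.println(eval("name(/processing-instruction())", XPathConstants.STRING));  // xml-stylesheet
        // Prints: type="text/xsl" href="inventory.xsl"
        System.out.println(eval("string(/processing-instruction())", XPathConstants.STRING));
    }
}
```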
2.7. Namespace Nodes
Namespace nodes are almost never used in XSLT style sheets; they exist primarily for the XSLT processor’s benefit.
Remember that the declaration of a namespace (such as xmlns:auth="http://www.authors.net"), even though it is technically an attribute in the XML source, becomes a namespace node, not an attribute node.
3. XPath Data Types
In Java, an XPath expression may return one of the following data types:
- node-set – Represents a set of nodes. The set can be empty, or it can contain any number of nodes.
- node (supported by the Java API) – Represents a single node. The result can be empty, or it can be a node with any number of child nodes.
- boolean – Represents the value true or false. Be aware that the strings "true" and "false" have no special meaning or value in XPath; for example, boolean('false') is true, because any non-empty string converts to true.
- number – Represents a floating-point number. All numbers in XPath and XSLT are implemented as floating-point numbers; the integer (or int) datatype does not exist in XPath and XSLT. Specifically, all numbers are implemented as IEEE 754 double-precision floating-point numbers, the same standard used by the Java float and double primitive types (an XPath number corresponds to a Java double). In addition to ordinary numbers, there are five special values for numbers: positive and negative infinity, positive and negative zero, and NaN, the special symbol for anything that is not a number.
- string – Represents zero or more characters, as defined in the XML specification.
These datatypes are usually simple, and with the exception of node-sets, converting between types is usually straightforward. We won’t discuss these datatypes in any more detail here; instead, we’ll discuss datatypes and conversions as we need them to do specific tasks.
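To make the mapping to Java concrete, here is a minimal sketch that requests each of the five types through the XPathConstants fields of the standard javax.xml.xpath API. The class name XPathReturnTypesDemo, the helper eval, and the cut-down XML string are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathReturnTypesDemo {
    // Cut-down copy of inventory.xml, inlined so the sketch runs without a file.
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><price>5.99</price></book>"
        + "</inventory>";

    // Hypothetical helper: evaluates the expression and returns the requested type.
    static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        NodeList books = (NodeList) eval("//book", XPathConstants.NODESET);     // node-set
        Node firstTitle = (Node) eval("//book[1]/title", XPathConstants.NODE);  // node
        Boolean anyCheap = (Boolean) eval("//book/price < 6", XPathConstants.BOOLEAN);
        Double bookCount = (Double) eval("count(//book)", XPathConstants.NUMBER);
        String secondTitle = (String) eval("//book[2]/title", XPathConstants.STRING);

        System.out.println(books.getLength());           // 2
        System.out.println(firstTitle.getTextContent()); // Snow Crash
        System.out.println(anyCheap);                    // true
        System.out.println(bookCount);                   // 2.0
        System.out.println(secondTitle);                 // Burning Tower

        // Conversions: a non-numeric string becomes NaN, a non-empty string becomes true.
        System.out.println(eval("number('abc')", XPathConstants.NUMBER));    // NaN
        System.out.println(eval("boolean('false')", XPathConstants.BOOLEAN)); // true
    }
}
```

Note that count() comes back as a Double (2.0), which is why the tutorial's own example calls intValue() before printing a count.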
4. XPath Syntax
XPath uses a path notation similar to UNIX file-system paths, combined with wildcards and predicates.
4.1. Select nodes with xpath
EXPRESSION | DESCRIPTION |
nodename | Selects all nodes with the name "nodename" |
/ | Selects from the root node |
// | Selects nodes in the document from the current node that match the selection no matter where they are |
. | Selects the current node |
.. | Selects the parent of the current node |
@ | Selects attributes |
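The expressions in this table can be tried out with a sketch like the following, which evaluates each one either against the whole document or against the first <book> element as the current node. PathExpressionDemo, parse and names are hypothetical names, and the XML is a cut-down copy of the sample file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathExpressionDemo {
    // Cut-down copy of inventory.xml without whitespace between the elements.
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title></book>"
        + "<book year=\"2005\"><title>Burning Tower</title></book>"
        + "</inventory>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    // Hypothetical helper: names of the nodes selected relative to the context node.
    static List<String> names(Node context, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        Node firstBook = doc.getDocumentElement().getFirstChild(); // first <book>

        System.out.println(names(doc, "/inventory/book")); // [book, book]
        System.out.println(names(doc, "//title"));         // [title, title]
        System.out.println(names(doc, "//@year"));         // [year, year]
        System.out.println(names(firstBook, "title"));     // [title]
        System.out.println(names(firstBook, "."));         // [book]
        System.out.println(names(firstBook, ".."));        // [inventory]
        System.out.println(names(firstBook, "@year"));     // [year]
        // A leading // always starts at the root; .// starts at the current node.
        System.out.println(names(firstBook, "//title"));   // [title, title]
        System.out.println(names(firstBook, ".//title"));  // [title]
    }
}
```

The last two calls show a common pitfall: to search only below the current node, write .// rather than //.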
4.2. Use predicates with xpath
Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
We will learn how to use them in the next section.
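As a quick preview, here is a minimal sketch of the most common predicate forms: by position, by attribute value, by child value, and several predicates chained together. PredicateDemo and values are hypothetical names, and the XML is a cut-down copy of the sample file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateDemo {
    // Cut-down copy of inventory.xml (publisher and isbn omitted).
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title>"
        + "<author>Neal Stephenson</author><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title>"
        + "<author>Larry Niven</author><author>Jerry Pournelle</author><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title>"
        + "<author>Neal Stephenson</author><price>7.50</price></book>"
        + "</inventory>";

    // Hypothetical helper: text content of every node the expression selects.
    static List<String> values(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Positional predicates (positions start at 1, not 0).
        System.out.println(values("//book[2]/title"));              // [Burning Tower]
        System.out.println(values("//book[last()]/title"));         // [Zodiac]
        System.out.println(values("//book[position() < 3]/title")); // [Snow Crash, Burning Tower]
        // Predicates on an attribute and on a child element, chained together.
        System.out.println(values("//book[@year >= 2000][price < 10]/title")); // [Burning Tower]
        // Predicates can also call functions.
        System.out.println(values("//book[count(author) > 1]/title")); // [Burning Tower]
        System.out.println(values("//book[not(@year = 2005)]/title")); // [Snow Crash, Zodiac]
    }
}
```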
4.3. Reaching unknown nodes with xpath
XPath wildcards can be used to select unknown XML elements.
WILDCARD | DESCRIPTION |
* | Matches any element node |
@* | Matches any attribute node |
node() | Matches any node of any kind |
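The difference between the three wildcards shows up when a comment is among the children, as in this small sketch. WildcardDemo and names are hypothetical names, and the XML is a cut-down copy of the sample file that keeps its comment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WildcardDemo {
    // Cut-down copy of inventory.xml, keeping the comment, without whitespace.
    static final String XML =
          "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><price>5.99</price></book>"
        + "</inventory>";

    // Hypothetical helper: DOM node names of every node the expression selects.
    static List<String> names(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // * matches element children only, so the comment is skipped.
        System.out.println(names("/inventory/*"));      // [book, book]
        // node() matches children of any kind, including the comment.
        System.out.println(names("/inventory/node()")); // [#comment, book, book]
        System.out.println(names("//book[1]/*"));       // [title, price]
        // @* matches every attribute of the selected element.
        System.out.println(names("//book[1]/@*"));      // [year]
    }
}
```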
4.4. XPath Axes
An axis defines a node-set relative to the current node. The following axes are defined in XPath.
AXIS NAME | RESULT |
ancestor | Selects all ancestors (parent, grandparent, etc.) of the current node |
ancestor-or-self | Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself |
attribute | Selects all attributes of the current node |
child | Selects all children of the current node |
descendant | Selects all descendants (children, grandchildren, etc.) of the current node |
descendant-or-self | Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself |
following | Selects everything in the document after the closing tag of the current node |
following-sibling | Selects all siblings after the current node |
namespace | Selects all namespace nodes of the current node |
parent | Selects the parent of the current node |
preceding | Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes |
preceding-sibling | Selects all siblings before the current node |
self | Selects the current node |
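An axis is written as axisname::nodetest inside a location step. The sketch below tries a few of the axes from the table on a cut-down copy of the sample XML. AxisDemo and values are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisDemo {
    // Cut-down copy of inventory.xml (publisher and isbn omitted).
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title>"
        + "<author>Neal Stephenson</author><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title>"
        + "<author>Larry Niven</author><author>Jerry Pournelle</author><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title>"
        + "<author>Neal Stephenson</author><price>7.50</price></book>"
        + "</inventory>";

    // Hypothetical helper: text content of every node the expression selects.
    static List<String> values(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String tower = "//book[title='Burning Tower']";
        System.out.println(values(tower + "/following-sibling::book/title")); // [Zodiac]
        System.out.println(values(tower + "/preceding-sibling::book/title")); // [Snow Crash]
        System.out.println(values(tower + "/descendant::author")); // [Larry Niven, Jerry Pournelle]
        System.out.println(values("//price[.='5.99']/ancestor::book/title")); // [Burning Tower]
        System.out.println(values("//title[.='Snow Crash']/following::title")); // [Burning Tower, Zodiac]
        // child:: and attribute:: are the long forms of "name" and "@name".
        System.out.println(values("//book[1]/child::author"));                     // [Neal Stephenson]
        System.out.println(values("//title[.='Zodiac']/parent::book/attribute::year")); // [1995]
    }
}
```

The abbreviated syntax from section 4.1 maps onto these axes: a bare name is child::name, @year is attribute::year, . is self::node(), and .. is parent::node().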
4.5. XPath Operators
Below is a list of xpath operators that can be used in XPath expressions:
OPERATOR | DESCRIPTION | EXAMPLE | RETURN VALUE |
| | Computes the union of two node-sets | //book | //title | Returns a node-set with all book and title elements |
+ | Addition | 6 + 4 | 10 |
- | Subtraction | 6 - 4 | 2 |
* | Multiplication | 6 * 4 | 24 |
div | Division | 8 div 4 | 2 |
= | Equal | price=9.80 | true if price is 9.80 false if price is 9.90 |
!= | Not equal | price!=9.80 | true if price is 9.90 false if price is 9.80 |
< | Less than | price<9.80 | true if price is 9.00 false if price is 9.80 |
<= | Less than or equal to | price<=9.80 | true if price is 9.00 false if price is 9.90 |
> | Greater than | price>9.80 | true if price is 9.90 false if price is 9.80 |
>= | Greater than or equal to | price>=9.80 | true if price is 9.90 false if price is 9.70 |
or | or | price=9.80 or price=9.70 | true if price is 9.80 false if price is 9.50 |
and | and | price>9.00 and price<9.90 | true if price is 9.80 false if price is 8.50 |
mod | Modulus (division remainder) | 5 mod 2 | 1 |
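The operators can be checked directly from Java, as in this minimal sketch. OperatorDemo, number and bool are hypothetical names, and the two-book XML is a cut-down copy of the sample file.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OperatorDemo {
    // Cut-down copy of inventory.xml with two books.
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><price>5.99</price></book>"
        + "</inventory>";

    private static Object eval(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    // Hypothetical helpers: evaluate an expression as an XPath number or boolean.
    static double number(String expression) {
        return (Double) eval(expression, XPathConstants.NUMBER);
    }

    static boolean bool(String expression) {
        return (Boolean) eval(expression, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        // Arithmetic: every result is a floating-point number.
        System.out.println(number("6 + 4"));   // 10.0
        System.out.println(number("6 - 4"));   // 2.0
        System.out.println(number("6 * 4"));   // 24.0
        System.out.println(number("8 div 4")); // 2.0
        System.out.println(number("5 mod 2")); // 1.0
        // Union: two book elements plus two title elements.
        System.out.println(number("count(//book | //title)")); // 4.0
        // Comparisons and boolean operators.
        System.out.println(bool("//book[1]/price = 14.95"));  // true
        System.out.println(bool("//book[1]/price != 14.95")); // false
        System.out.println(bool("//book[1]/price > 9 and //book[1]/price < 20")); // true
        System.out.println(bool("//book[1]/price < 9 or //book[2]/price < 9"));   // true
    }
}
```

Division is spelled div because / is already the path separator. It is also safest to put spaces around -, since a hyphen is a legal character inside XML names.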
5. XPath Expressions
Let's try to retrieve different parts of XML using XPath expressions and given data types.
XPathTest.java
package xml;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathTest {
    public static void main(String[] args) throws Exception {
        // Build DOM
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("inventory.xml");

        // Create XPath
        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();

        System.out.println("\n//1) Get book titles written after 2001");
        XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//2) Get book titles written before 2001");
        expr = xpath.compile("//book[@year<2001]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//3) Get book titles cheaper than 8 dollars");
        expr = xpath.compile("//book[price<8]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//4) Get book titles costlier than 8 dollars");
        expr = xpath.compile("//book[price>8]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//5) Get book titles added in first node");
        expr = xpath.compile("//book[1]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//6) Get book title added in last node");
        expr = xpath.compile("//book[last()]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//7) Get all writers");
        expr = xpath.compile("//book/author/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//8) Count all books titles ");
        expr = xpath.compile("count(//book/title)");
        result = expr.evaluate(doc, XPathConstants.NUMBER);
        Double count = (Double) result; // XPath numbers are returned as Double
        System.out.println(count.intValue());

        System.out.println("\n//9) Get book titles with writer name start with Neal");
        expr = xpath.compile("//book[starts-with(author,'Neal')]");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i)
                    .getChildNodes()
                    .item(1) // index 0 is a whitespace text node, so <title> is at index 1
                    .getTextContent());
        }

        System.out.println("\n//10) Get book titles with writer name containing Niven");
        expr = xpath.compile("//book[contains(author,'Niven')]");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i)
                    .getChildNodes()
                    .item(1) // index 0 is a whitespace text node, so <title> is at index 1
                    .getTextContent());
        }

        System.out.println("\n//11) Get book titles written by Neal Stephenson");
        expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//12) Get count of book titles written by Neal Stephenson");
        expr = xpath.compile("count(//book[author='Neal Stephenson'])");
        result = expr.evaluate(doc, XPathConstants.NUMBER);
        count = (Double) result;
        System.out.println(count.intValue());

        System.out.println("\n//13) Reading comment node ");
        expr = xpath.compile("//inventory/comment()");
        result = expr.evaluate(doc, XPathConstants.STRING);
        String comment = (String) result;
        System.out.println(comment);
    }
}
Program output:
Console
//1) Get book titles written after 2001
Burning Tower

//2) Get book titles written before 2001
Snow Crash
Zodiac

//3) Get book titles cheaper than 8 dollars
Burning Tower
Zodiac

//4) Get book titles costlier than 8 dollars
Snow Crash

//5) Get book titles added in first node
Snow Crash

//6) Get book title added in last node
Zodiac

//7) Get all writers
Neal Stephenson
Larry Niven
Jerry Pournelle
Neal Stephenson

//8) Count all books titles
3

//9) Get book titles with writer name start with Neal
Snow Crash
Zodiac

//10) Get book titles with writer name containing Niven
Burning Tower

//11) Get book titles written by Neal Stephenson
Snow Crash
Zodiac

//12) Get count of book titles written by Neal Stephenson
2

//13) Reading comment node
Test is test comment
I hope that this XPath tutorial has been informative for you. It will help you in executing XPath expressions with Java. The above Java XPath example reads its XML from a file and will run successfully in Java 8 as well.
If you have some suggestions then please leave a comment.
Happy Learning !!
6. Recommended Reading
http://www.w3.org/TR/xpath-full-text-10-use-cases
http://en.wikipedia.org/wiki/XPath
http://oreilly.com/catalog/xmlnut/chapter/ch09.html