
[Java] XPath Example – XPath Tutorial

by KEI NETWORK 2019. 9. 9.

In this Java XPath tutorial, we will learn what XPath is, what the XPath data types are, and how to write XPath expressions that retrieve information from an XML file or document. That information can be XML elements, XML attributes, or even comments.

Table of Contents

  1. What is XPath?
  2. XPath Data Model
  3. XPath Data Types
  4. XPath Syntax
  5. XPath Expressions
  6. Recommended reading

We will use this XML in running various XPath examples in this tutorial.

inventory.xml

<?xml version="1.0" encoding="utf-8" ?>
<inventory>
    <!--Test is test comment-->
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>
    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>
</inventory>

 

1. What is XPath

XPath is a syntax used to describe parts of an XML document. With XPath, you can refer to the first element, any attribute of an element, all elements that contain some specific text, and many other variations. An XSLT stylesheet uses XPath expressions in the match and select attributes of various elements to indicate how a document should be transformed.

XPath can also be useful when testing web services that use XML to send requests and receive responses.

XPath uses a syntax much like what we already know. It is a mix of basic programming-language expressions (such as $x*6) and Unix-like path expressions (such as /inventory/book/author).

In addition to the basic syntax, XPath provides a set of useful functions (such as count() or contains(), much like utility function calls) that allow you to search for various data fragments inside the document.
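
As a minimal sketch of how a path expression and a function call look from Java, the following evaluates both against a shortened inline copy of the inventory. The class name XPathQuickStart, the evalString helper, and the inline XML string are assumptions made for illustration; only the standard javax.xml.xpath and javax.xml.parsers APIs are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathQuickStart {
    // Shortened inline copy of inventory.xml (an assumption, to keep the sketch self-contained).
    static final String XML =
        "<inventory>"
      + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author></book>"
      + "</inventory>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // A Unix-like path expression with a positional predicate.
        System.out.println(evalString(XML, "/inventory/book[1]/title"));
        // The count() function returns a number.
        System.out.println(evalString(XML, "count(//book)"));
        // The contains() function used inside a predicate.
        System.out.println(evalString(XML, "//book[contains(author, 'Niven')]/title"));
    }
}
```

Returning a string keeps the helper small; when a list of nodes is needed, the three-argument evaluate method with XPathConstants.NODESET is the usual choice.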

 

2. XPath Data Model

XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model i.e. DOM tree, so if you’re familiar with the DOM, you will easily get some understanding of how to build basic XPath expressions.

There are seven kinds of nodes in the XPath data model:

  1. The root node (Only one per document)
  2. Element nodes
  3. Attribute nodes
  4. Text nodes
  5. Comment nodes
  6. Processing instruction nodes
  7. Namespace nodes

2.1. Root Node

The root node is the XPath node that contains the entire document. In our example, the root node contains the <inventory> element. In an XPath expression, the root node is specified with a single slash ('/').
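
The difference between the root node and the <inventory> element can be checked from Java. This is a rough sketch only: the class name RootNodeDemo, the describe helper, and the small document with a leading comment are hypothetical and are not part of inventory.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RootNodeDemo {
    // Hypothetical document: a comment placed before the <inventory> element.
    static final String XML = "<!--stock list--><inventory><book/></inventory>";

    // Returns a short description of what "/" and related expressions select.
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // "/" selects the root node, which the DOM exposes as the Document itself.
            Node root = (Node) xpath.evaluate("/", doc, XPathConstants.NODE);
            boolean isDocument = root.getNodeType() == Node.DOCUMENT_NODE;

            // "/*" selects the single element child of the root node.
            String documentElement = xpath.evaluate("name(/*)", doc);

            // "/node()" selects every child of the root node: the comment and <inventory>.
            String childCount = xpath.evaluate("count(/node())", doc);

            return "rootIsDocument=" + isDocument
                    + ", documentElement=" + documentElement
                    + ", rootChildren=" + childCount;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(XML));
    }
}
```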

2.2. Element Nodes

Every element in the original XML document is represented by an XPath element node.

For example, the following are element nodes in our sample XML:

  • inventory
  • book
  • title
  • author
  • publisher
  • isbn
  • price

2.3. Attribute Nodes

At a minimum, an element node is the parent of one attribute node for each attribute in the XML source document. These nodes are used to define the features about a particular element node.

For example, in our XML fragment, "year" is an attribute node.
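
A small sketch of reading attribute nodes from Java follows. The class name AttributeDemo, the attributeValues helper, and the shortened inline inventory are assumptions for illustration; the "year" attribute itself comes from the sample XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeDemo {
    // Shortened inline copy of inventory.xml (an assumption, to keep the sketch self-contained).
    static final String XML =
        "<inventory>"
      + "<book year=\"2000\"><title>Snow Crash</title></book>"
      + "<book year=\"2005\"><title>Burning Tower</title></book>"
      + "<book year=\"1995\"><title>Zodiac</title></book>"
      + "</inventory>";

    // Collects the value of every attribute node selected by the expression.
    static List<String> attributeValues(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                values.add(attrs.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Every year attribute, in document order.
        System.out.println(attributeValues(XML, "//book/@year"));
        // The year attribute of one specific book.
        System.out.println(attributeValues(XML, "//book[title='Zodiac']/@year"));
    }
}
```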

2.4. Text Nodes

Text nodes are refreshingly simple. They contain text from an element. If the original text in the XML document contained entity or character references, they are resolved before the XPath text node is created.

The text node is text, pure and simple. A text node is required to contain as much text as possible. Remember that the next or previous node of a text node can’t be another text node.

For example, all values in our XML fragment are text nodes, e.g. "Snow Crash" and "Neal Stephenson".
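
The point about entity references being resolved can be illustrated with a short sketch. The class name TextNodeDemo, the evalString helper, and the one-element document are hypothetical and are not part of inventory.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextNodeDemo {
    // Hypothetical document whose text contains the entity reference &amp;.
    static final String XML = "<title>Tom &amp; Jerry</title>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The text node holds the resolved character '&', not the five characters "&amp;".
        System.out.println(evalString(XML, "/title/text()"));
        // The text on both sides of the reference belongs to one text node.
        System.out.println(evalString(XML, "count(/title/text())"));
    }
}
```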

2.5. Comment Nodes

A comment node is also very simple—it contains some text. Every comment in the source document becomes a comment node. The text of the comment node contains everything inside the comment, except the opening <!-- and the closing -->.

For example:

<!--Test is test comment-->

2.6. Processing Instruction Nodes

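A processing instruction node has two parts: a name (returned by the name() function) and a string value. The name is the target that follows the opening <?. The string value is everything after the name and the white space that follows it, up to but not including the ?> that closes the processing instruction.

For example, a typical stylesheet instruction has the name xml-stylesheet:

<?xml-stylesheet type="text/css" href="style.css"?>

Note that the XML declaration (<?xml version="1.0" encoding="utf-8"?>) looks like a processing instruction but is not one, so it does not appear as a node in the XPath data model.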
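
A minimal sketch of reading a processing instruction from Java is shown here. The class name ProcessingInstructionDemo, the evalString helper, and the xml-stylesheet instruction in the inline document are assumptions for illustration. The document also starts with an XML declaration, which XPath does not report as a processing instruction node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ProcessingInstructionDemo {
    // Hypothetical document: an XML declaration, one processing instruction, one element.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
      + "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?>"
      + "<inventory/>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Only the xml-stylesheet instruction is counted; the XML declaration is not a node.
        System.out.println(evalString(XML, "count(/processing-instruction())"));
        // name() returns the target of the instruction.
        System.out.println(evalString(XML, "name(/processing-instruction())"));
        // The string value is the data that follows the target.
        System.out.println(evalString(XML, "string(/processing-instruction())"));
    }
}
```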

2.7. Namespace Nodes

Namespace nodes are almost never used in XSLT style sheets; they exist primarily for the XSLT processor’s benefit.

Remember that the declaration of a namespace (such as xmlns:auth="http://www.authors.net"), even though it is technically an attribute in the XML source, becomes a namespace node, not an attribute node.
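
The following sketch shows this with the XPath implementation bundled in the JDK: the attribute axis skips the xmlns:auth declaration and returns only the ordinary attribute. The class name NamespaceNodeDemo, the evalString helper, the sample element, and its id attribute are hypothetical; behavior of other XPath implementations is not covered here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceNodeDemo {
    // Hypothetical element with one namespace declaration and one ordinary attribute.
    static final String XML =
        "<auth:author xmlns:auth=\"http://www.authors.net\" id=\"a1\">Neal Stephenson</auth:author>";

    // Parses the XML text (namespace-aware), evaluates the expression, and returns a string.
    static String evalString(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The attribute axis sees only "id"; the xmlns:auth declaration is not an attribute node.
        System.out.println(evalString(XML, "count(/*/@*)"));
        System.out.println(evalString(XML, "name(/*/@*)"));
        // The declaration still determines the namespace URI of the element.
        System.out.println(evalString(XML, "namespace-uri(/*)"));
    }
}
```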

 

3. XPath Data Types

In Java, an XPath expression may return one of the following data types:

  1. node-set – Represents a set of nodes. The set can be empty, or it can contain any number of nodes.
  2. node (supported by Java) – Represents a single node. This can be empty, or it can contain any number of child nodes.
  3. boolean – Represents the value true or false. Be aware that the strings "true" and "false" have no special meaning or value in XPath; any non-empty string, including "false", converts to the boolean value true.
  4. number – Represents a floating-point number. All numbers in XPath and XSLT are implemented as floating-point numbers; the integer (or int) datatype does not exist in XPath and XSLT. Specifically, all numbers are implemented as double-precision IEEE 754 floating-point numbers, the same format used by the Java double primitive type. In addition to ordinary numbers, there are five special values for numbers: positive and negative infinity, positive and negative zero, and NaN, the special symbol for anything that is not a number.
  5. string – Represents zero or more characters, as defined in the XML specification.

These datatypes are usually simple, and with the exception of node-sets, converting between types is usually straightforward. We won’t discuss these datatypes in any more detail here; instead, we’ll discuss datatypes and conversions as we need them to do specific tasks.
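
In the Java API, the desired return type is requested through a constant from XPathConstants. The sketch below asks for each of the five types in turn; the class name DataTypeDemo, the eval helper, and the shortened inline inventory are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DataTypeDemo {
    // Shortened inline copy of inventory.xml (an assumption, to keep the sketch self-contained).
    static final String XML =
        "<inventory>"
      + "<book year=\"2000\"><title>Snow Crash</title><price>14.95</price></book>"
      + "<book year=\"2005\"><title>Burning Tower</title><price>5.99</price></book>"
      + "<book year=\"1995\"><title>Zodiac</title><price>7.50</price></book>"
      + "</inventory>";

    // Evaluates the expression and converts the result to the requested XPath type.
    static Object eval(String xml, String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        NodeList books = (NodeList) eval(XML, "//book", XPathConstants.NODESET);
        Node first = (Node) eval(XML, "//book[1]", XPathConstants.NODE);
        Boolean anyCheap = (Boolean) eval(XML, "//book/price < 8", XPathConstants.BOOLEAN);
        Double bookCount = (Double) eval(XML, "count(//book)", XPathConstants.NUMBER);
        String title = (String) eval(XML, "//book[1]/title", XPathConstants.STRING);
        // The string 'false' is non-empty, so it converts to the boolean value true.
        Boolean fromString = (Boolean) eval(XML, "boolean('false')", XPathConstants.BOOLEAN);

        System.out.println("node-set size: " + books.getLength());
        System.out.println("node name:     " + first.getNodeName());
        System.out.println("boolean:       " + anyCheap);
        System.out.println("number:        " + bookCount);
        System.out.println("string:        " + title);
        System.out.println("boolean('false') is " + fromString);
    }
}
```

Note that count() comes back as a Double even though it is a whole number, which is why the main example in section 5 calls intValue() before printing.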

 

4. XPath Syntax

XPath path expressions look like UNIX file paths, extended with wildcards and bracketed predicates for filtering.

4.1. Select nodes with xpath

EXPRESSION   DESCRIPTION

nodename     Selects all nodes with the name "nodename"
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
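
The expressions in the table can be tried out with a short sketch such as the following. The class name PathSelectionDemo, the evalString helper, and the shortened inline inventory are assumptions for illustration. The document itself is the context node, so a bare name such as book matches nothing, because book is not a direct child of the root node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PathSelectionDemo {
    // Shortened inline copy of inventory.xml (an assumption, to keep the sketch self-contained).
    static final String XML =
        "<inventory>"
      + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "<book year=\"2005\"><title>Burning Tower</title>"
      + "<author>Larry Niven</author><author>Jerry Pournelle</author></book>"
      + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author></book>"
      + "</inventory>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // nodename: relative to the document, only <inventory> is a direct child.
        System.out.println(evalString(XML, "name(inventory)"));
        System.out.println(evalString(XML, "count(book)"));
        // "/" starts an absolute path from the root node.
        System.out.println(evalString(XML, "count(/inventory/book)"));
        // "//" finds matching nodes at any depth.
        System.out.println(evalString(XML, "count(//author)"));
        // "." is the current node, ".." its parent, "@" an attribute.
        System.out.println(evalString(XML, "//title[. = 'Zodiac']/../@year"));
    }
}
```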

4.2. Use predicates with xpath

Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
We will learn how to use them in the next section.

4.3. Reaching unknown nodes with xpath

XPath wildcards can be used to select unknown XML elements.

WILDCARD     DESCRIPTION

*            Matches any element node
@*           Matches any attribute node
node()       Matches any node of any kind
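
A quick sketch of the three wildcards follows. The class name WildcardDemo, the evalString helper, and the small document (including its cd element and lang attribute) are hypothetical and chosen only so that each wildcard gives a different count.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WildcardDemo {
    // Hypothetical document: <inventory> has a comment child and two element children.
    static final String XML =
        "<inventory>"
      + "<!--note-->"
      + "<book year=\"2000\" lang=\"en\"><title>Snow Crash</title></book>"
      + "<cd/>"
      + "</inventory>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "*" matches element children only: <book> and <cd>.
        System.out.println(evalString(XML, "count(/inventory/*)"));
        // "node()" also matches the comment.
        System.out.println(evalString(XML, "count(/inventory/node())"));
        // "@*" matches every attribute of <book>.
        System.out.println(evalString(XML, "count(//book/@*)"));
        // Wildcards combine with predicates: the second element child, whatever its name.
        System.out.println(evalString(XML, "name(/inventory/*[2])"));
    }
}
```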

4.4. XPath Axes

An axis defines a node-set relative to the current node. Following are axes defined by default.

AXIS NAME            RESULT

ancestor             Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self     Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute            Selects all attributes of the current node
child                Selects all children of the current node
descendant           Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self   Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following            Selects everything in the document after the closing tag of the current node
following-sibling    Selects all siblings after the current node
namespace            Selects all namespace nodes of the current node
parent               Selects the parent of the current node
preceding            Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling    Selects all siblings before the current node
self                 Selects the current node
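
Axes are written as axisname::nodetest inside a path step. The sketch below exercises a few of them; the class name AxisDemo, the evalString helper, and the shortened inline inventory are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AxisDemo {
    // Shortened inline copy of inventory.xml (an assumption, to keep the sketch self-contained).
    static final String XML =
        "<inventory>"
      + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
      + "<book year=\"2005\"><title>Burning Tower</title>"
      + "<author>Larry Niven</author><author>Jerry Pournelle</author></book>"
      + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author></book>"
      + "</inventory>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String middle = "//book[title='Burning Tower']";
        // Siblings after and before the middle book.
        System.out.println(evalString(XML, middle + "/following-sibling::book/title"));
        System.out.println(evalString(XML, middle + "/preceding-sibling::book/title"));
        // Ancestors of a <title>: its <book> and <inventory>.
        System.out.println(evalString(XML, "count(//title[.='Zodiac']/ancestor::*)"));
        // parent:: and attribute:: are the long forms of ".." and "@".
        System.out.println(evalString(XML, "//author[.='Larry Niven']/parent::book/attribute::year"));
        // All <author> descendants of <inventory>.
        System.out.println(evalString(XML, "count(/inventory/descendant::author)"));
    }
}
```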

4.5. XPath Operators

Below is a list of xpath operators that can be used in XPath expressions:

OPERATOR   DESCRIPTION                    EXAMPLE                      RETURN VALUE

|          Union of two node-sets         //book | //cd                A node-set with all book and cd elements
+          Addition                       6 + 4                        10
-          Subtraction                    6 - 4                        2
*          Multiplication                 6 * 4                        24
div        Division                       8 div 4                      2
=          Equal                          price=9.80                   true if price is 9.80, false if price is 9.90
!=         Not equal                      price!=9.80                  true if price is 9.90, false if price is 9.80
<          Less than                      price<9.80                   true if price is 9.00, false if price is 9.80
<=         Less than or equal to          price<=9.80                  true if price is 9.00, false if price is 9.90
>          Greater than                   price>9.80                   true if price is 9.90, false if price is 9.80
>=         Greater than or equal to       price>=9.80                  true if price is 9.90, false if price is 9.70
or         Logical or                     price=9.80 or price=9.70     true if price is 9.80, false if price is 9.50
and        Logical and                    price>9.00 and price<9.90    true if price is 9.80, false if price is 8.50
mod        Modulus (division remainder)   5 mod 2                      1
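
Several of these operators can be checked directly from Java, as in the following sketch. The class name OperatorDemo, the evalString helper, and the two-item document (one book and one cd, with made-up prices matching the table's examples) are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OperatorDemo {
    // Hypothetical document with one <book> and one <cd>.
    static final String XML =
        "<inventory>"
      + "<book><price>14.95</price></book>"
      + "<cd><price>9.80</price></cd>"
      + "</inventory>";

    // Parses the XML text, evaluates the expression, and converts the result to a string.
    static String evalString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Arithmetic operators work on numbers.
        System.out.println(evalString(XML, "6 + 4"));
        System.out.println(evalString(XML, "8 div 4"));
        System.out.println(evalString(XML, "5 mod 2"));
        // "|" builds the union of two node-sets.
        System.out.println(evalString(XML, "count(//book | //cd)"));
        // Comparisons and logical operators produce booleans.
        System.out.println(evalString(XML, "//cd/price = 9.80"));
        System.out.println(evalString(XML, "//book/price < 9.80"));
        System.out.println(evalString(XML, "//cd/price >= 9.80 and //book/price > 9.80"));
    }
}
```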

 

5. XPath Expressions

Let's try to retrieve different parts of XML using XPath expressions and given data types.

XPathTest.java

package xml;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathTest
{
    public static void main(String[] args) throws Exception
    {
        //Build DOM

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("inventory.xml");

        //Create XPath

        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();

        System.out.println("\n//1) Get book titles written after 2001");

        // 1) Get book titles written after 2001
        XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//2) Get book titles written before 2001");

        // 2) Get book titles written before 2001
        expr = xpath.compile("//book[@year<2001]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//3) Get book titles cheaper than 8 dollars");

        // 3) Get book titles cheaper than 8 dollars
        expr = xpath.compile("//book[price<8]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//4) Get book titles costlier than 8 dollars");

        // 4) Get book titles costlier than 8 dollars
        expr = xpath.compile("//book[price>8]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//5) Get book titles added in first node");

        // 5) Get book titles added in first node
        expr = xpath.compile("//book[1]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//6) Get book title added in last node");

        // 6) Get book title added in last node
        expr = xpath.compile("//book[last()]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//7) Get all writers");

        // 7) Get all writers
        expr = xpath.compile("//book/author/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//8) Count all books titles ");

        // 8) Count all books titles
        expr = xpath.compile("count(//book/title)");
        result = expr.evaluate(doc, XPathConstants.NUMBER);
        Double count = (Double) result;
        System.out.println(count.intValue());

        System.out.println("\n//9) Get book titles with writer name start with Neal");

        // 9) Get book titles with writer name start with Neal
        expr = xpath.compile("//book[starts-with(author,'Neal')]");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i)
                                .getChildNodes()
                                .item(1)                // index 0 is a whitespace text node, so <title> is at index 1
                                .getTextContent());
        }

        System.out.println("\n//10) Get book titles with writer name containing Niven");

        // 10) Get book titles with writer name containing Niven
        expr = xpath.compile("//book[contains(author,'Niven')]");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i)
                                .getChildNodes()
                                .item(1)                // index 0 is a whitespace text node, so <title> is at index 1
                                .getTextContent());
        }

        System.out.println("//11) Get book titles written by Neal Stephenson");

        // 11) Get book titles written by Neal Stephenson
        expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("\n//12) Get count of book titles written by Neal Stephenson");

        // 12) Get count of book titles written by Neal Stephenson
        expr = xpath.compile("count(//book[author='Neal Stephenson'])");
        result = expr.evaluate(doc, XPathConstants.NUMBER);
        count = (Double) result;
        System.out.println(count.intValue());

        System.out.println("\n//13) Reading comment node ");

        // 13) Reading comment node
        expr = xpath.compile("//inventory/comment()");
        result = expr.evaluate(doc, XPathConstants.STRING);
        String comment = (String) result;
        System.out.println(comment);
    }
}

Program output:

Console

//1) Get book titles written after 2001
Burning Tower

//2) Get book titles written before 2001
Snow Crash
Zodiac

//3) Get book titles cheaper than 8 dollars
Burning Tower
Zodiac

//4) Get book titles costlier than 8 dollars
Snow Crash

//5) Get book titles added in first node
Snow Crash

//6) Get book title added in last node
Zodiac

//7) Get all writers
Neal Stephenson
Larry Niven
Jerry Pournelle
Neal Stephenson

//8) Count all books titles
3

//9) Get book titles with writer name start with Neal
Snow Crash
Zodiac

//10) Get book titles with writer name containing Niven
Burning Tower
//11) Get book titles written by Neal Stephenson
Snow Crash
Zodiac

//12) Get count of book titles written by Neal Stephenson
2

//13) Reading comment node
Test is test comment

I hope this XPath tutorial has been informative and helps you run XPath queries from Java. The example above also runs on Java 8.

If you have some suggestions then please leave a comment.

Happy Learning !!


Recommended Reading:

http://www.w3.org/TR/xpath-full-text-10-use-cases
http://en.wikipedia.org/wiki/XPath
http://oreilly.com/catalog/xmlnut/chapter/ch09.html
