Java, J2EE, Hybris E-commerce Suite, Apache OFBiz, Database, E-commerce Domain Experts: XML parsing in Java

XML (Extensible Markup Language) is a markup language for structured, platform-independent data that can be exchanged between different applications and platforms. XML and Java are often used together to develop web services and the applications that access them. XML also supports Unicode encoding. These qualities made XML popular very quickly. Because XML is used so widely, Java developers need to know how to read, access, and parse XML documents in their programs.
To work with XML and an XML parser in Java, a developer needs basic knowledge of XML documents, in particular the terms tag, element, attribute, and node.
Tags : A tag is the text between the left (<) and right (>) angle brackets. Starting tags are written as <tagname> and ending tags as </tagname>.
Elements : An element is the starting tag, the ending tag, and everything in between them. Elements can have children, which may be other elements, text nodes, or a combination of both. In the sample books XML below, <book>, <title>, <author>, and <year> are elements.
Attributes : XML elements can have attributes, which provide additional information about an element. For example, in <book id="954">, id is an attribute.
Nodes : Everything in an XML document is a node, and elements are only one type of node. The whole document is the document node, each XML element is an element node, the text inside an element is a text node, every attribute is an attribute node, and every comment is a comment node. In the books XML document below, books is the root element; it has two book nodes, each book node has three child elements (title, author, and year), and each of those contains a text node.

The following sample XML document represents the book details.

    <?xml version="1.0" encoding="UTF-8"?>
    <books>
        <book id="954">
            <title>Effective Java</title>
            <author>Joshua Bloch</author>
            <year> 2009 </year>
        </book>
        <book id="777">
            <title>Effective Java</title>
            <author>Scott Meyers</author>
            <year> 2010 </year>
        </book>
    </books>
[Figure: tree representation of the books XML above]

Another sample XML document, representing employee details with salary:

    <EmpDetails>
        <Employee>
            <Name>ABC</Name>
            <Designation>ABC</Designation>
            <Scale>50000-200000</Scale>
            <Salary>
                <Basic>121199.00</Basic>
                <HRA>20000.00</HRA>
                <TA>10000.00</TA>
            </Salary>
        </Employee>
        <Employee>
            <Name>XYZ</Name>
            <Designation>ABC</Designation>
            <Scale>12100-100000</Scale>
            <Salary>
                <Basic>21199.00</Basic>
                <HRA>2000.00</HRA>
                <TA>1000.00</TA>
            </Salary>
        </Employee>
    </EmpDetails>

An XML parser checks the syntax of a document, that is, whether it is well-formed. An XML document can also reference an optional Document Type Definition (DTD) or an XML Schema, which defines the structure of the document. If the XML document adheres to that structure, it is valid.
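The syntax check can be seen in a minimal sketch like the one below. The class name WellFormedCheck and the method isWellFormed are made up for this illustration; the sketch only tests well-formedness and does not validate against a DTD or schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original tutorial.
class WellFormedCheck {

    // Returns true if the text parses as well-formed XML, false if the parser rejects it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to System.err.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser found a syntax error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be created or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<book><title>Effective Java</title></book>")); // true
        System.out.println(isWellFormed("<book><title>Effective Java</book>"));         // false
    }
}
```

The second call fails because the title element is never closed before the closing book tag.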

Now let us see how to access and use an XML document from the Java programming language. There are several ways:
1. Through parsers, using the Java API for XML Processing (JAXP). Two kinds of parser are available through this API:
i) Simple API for XML (SAX)
ii) Document Object Model (DOM)
2. Through the Java Architecture for XML Binding (JAXB) API.
3. Using JDOM, an open-source API.
4. Using Apache Xerces.
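For contrast with the DOM approach used in the rest of this tutorial, here is a minimal SAX sketch. SAX reports elements one at a time through callbacks instead of building a tree. The class name SaxBookIds and the method bookIds are illustrative names only; the sketch assumes input shaped like the books XML above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original tutorial.
class SaxBookIds {

    // Collects the id attribute of every <book> element, in document order.
    static List<String> bookIds(String xml) {
        final List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once for each starting tag as the parser reads the input.
                if ("book".equals(qName)) {
                    ids.add(attrs.getValue("id"));
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<books><book id=\"954\"/><book id=\"777\"/></books>";
        System.out.println(bookIds(xml)); // [954, 777]
    }
}
```

Because no tree is kept in memory, this style suits large documents that are read once; DOM is more convenient when the program needs to move around the document or modify it.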

In this tutorial, we are going to parse the books and employees XML files above using the DOM parser.
Java developers can use the DOM parser in an application through the JAXP API. The DOM parser creates a tree of objects that represents the data in the whole document and keeps that tree in memory. The program can then traverse the tree to access or modify the data.
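As a rough illustration of modifying the in-memory tree, the sketch below parses a document, changes the text of the first year element, and reads it back. The parsing calls are the ones explained in the steps that follow. The class InMemoryEdit and the method changeFirstYear are hypothetical names, and the sketch assumes the input contains at least one year element. Only the tree in memory changes; nothing is written back to a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original tutorial.
class InMemoryEdit {

    // Replaces the text of the first <year> element and returns the value now stored in the tree.
    static String changeFirstYear(String xml, String newYear) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node year = doc.getElementsByTagName("year").item(0); // assumes a <year> exists
            year.setTextContent(newYear);                          // modifies the in-memory tree only
            // Look the node up again to show that the tree itself was updated.
            return doc.getElementsByTagName("year").item(0).getFirstChild().getNodeValue();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book id=\"954\"><year> 2009 </year></book>";
        System.out.println(changeFirstYear(xml, "2018")); // 2018
    }
}
```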

Steps to Parse XML file using DOM Parser

1. Create a document builder factory, which is used to obtain a parser that produces DOM object trees from XML documents.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); creates a new factory instance.

2. Create a document builder to obtain DOM Document instances from an XML document.
DocumentBuilder db = dbf.newDocumentBuilder(); creates a new DocumentBuilder instance. XML can be parsed using this instance.

3. Get the DOM Document object by parsing the content of the given XML file.
Document dom = db.parse(books); returns the DOM Document object, where books is a File that points to the XML file. The parse method also accepts an InputStream, a URI string, or a SAX InputSource.
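Steps 1 to 3 can be put together as in the sketch below. To keep it self-contained, it parses from a String through a SAX InputSource rather than from a File; the class name ParseSteps and the method parse are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original tutorial.
class ParseSteps {

    static Document parse(String xml) {
        try {
            // Step 1: create the factory.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Step 2: create the builder (the parser).
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Step 3: parse the input into a DOM Document.
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse("<books><book id=\"954\"><title>Effective Java</title></book></books>");
        System.out.println(dom.getDocumentElement().getNodeName()); // books
    }
}
```

With a file on disk, the third step would instead be db.parse(new File("books.xml")), as in the full programs later in this tutorial.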

4. Access / manipulate the XML document using various methods

Some useful methods for getting node lists, elements, nodes, node values, and so on:

To get the node list of all elements in the document with a given tag name:
NodeList nodes = dom.getElementsByTagName("book"); where book is the tag name. The nodes are returned in document order, that is, in the order of a preorder traversal of the document tree.

To get a single node from the above node list (items in a NodeList are accessed by index, starting from 0):
Node node = nodes.item(index);

To treat the node as an element node (valid only when the node's type is Node.ELEMENT_NODE):
Element element = (Element) node;

To get all child nodes of the first element with a particular tag inside the above element:
NodeList titleChildren = element.getElementsByTagName("title").item(0).getChildNodes(); where title is the tag.
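The calls above can be combined as in the following sketch, which lists each book's id attribute and title. The class BookTitles and the method describeBooks are names chosen for this example, and the sketch assumes every book element has a title child containing text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original tutorial.
class BookTitles {

    // Returns one "id: title" line per <book> element.
    static List<String> describeBooks(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = dom.getElementsByTagName("book"); // all book elements
            for (int index = 0; index < nodes.getLength(); index++) {
                Node node = nodes.item(index);
                Element element = (Element) node; // safe: getElementsByTagName returns elements
                String id = element.getAttribute("id");
                // First <title> inside this book, then its first child (the text node).
                NodeList titleChildren = element.getElementsByTagName("title").item(0).getChildNodes();
                lines.add(id + ": " + titleChildren.item(0).getNodeValue().trim());
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<books>"
                + "<book id=\"954\"><title>Effective Java</title></book>"
                + "<book id=\"777\"><title>Effective Java</title></book>"
                + "</books>";
        System.out.println(describeBooks(xml)); // [954: Effective Java, 777: Effective Java]
    }
}
```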

To get the children of the root node:
Element root = dom.getDocumentElement(); // gets the root element
NodeList children = root.getChildNodes(); // returns the children of the root

To get information about a node, use getNodeName() and getNodeValue(). For an element node, getNodeValue() returns null; the text is held by the text node inside the element.
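A small sketch of what these two methods return for the children of a root element is shown below. The class NodeInfo and the method describeChildren are illustrative names. Note that whitespace between tags shows up as additional text nodes, which is why the later examples check the node type.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original tutorial.
class NodeInfo {

    // Returns "name = value" for each direct child of the root element.
    static List<String> describeChildren(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = dom.getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                lines.add(child.getNodeName() + " = " + child.getNodeValue());
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // An element node has a name but no value.
        System.out.println(describeChildren("<book><title>Effective Java</title></book>")); // [title = null]
        // A text node is named #text and carries the value.
        System.out.println(describeChildren("<title>Effective Java</title>")); // [#text = Effective Java]
        // Whitespace between tags becomes extra text nodes.
        System.out.println(describeChildren("<book> <title>Effective Java</title> </book>").size()); // 3
    }
}
```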

The tree structure can be traversed easily using recursion. The following code traverses the entire books XML document recursively and prints every node name along with its value, if one exists. The same code can traverse any XML document.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.*;

    public class DOMParser1 {
        public static void main(String args[]) {
            try {
                File books = new File("books.xml");
                DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
                Document doc = dBuilder.parse(books);
                doc.getDocumentElement().normalize();
                Element root = doc.getDocumentElement(); // gets the root element
                bookxml_traverse_DOM(root);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }

        // Prints the given node, then visits each of its children in turn (preorder traversal).
        private static void bookxml_traverse_DOM(Node element) {
            System.out.println(element.getNodeName() + " = " + element.getNodeValue());
            for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
                bookxml_traverse_DOM(child);
            }
        }
    }

The following partial code finds the value (text node) of the title, author, and year tags in books.xml.

    NodeList bookNodes = doc.getElementsByTagName("book"); // all book nodes
    for (int i = 0; i < bookNodes.getLength(); i++) {
        Node bookNode = bookNodes.item(i);
        if (bookNode.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) bookNode;
            System.out.println("Book Title: " + getTextNode("title", element));
            System.out.println("Author Name: " + getTextNode("author", element));
            System.out.println("Year of Publishing: " + getTextNode("year", element));
        }
    }

    // getTextNode method
    private static String getTextNode(String tag, Element element) {
        // child nodes of the first element with the given tag
        NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = nodes.item(0);
        return node.getNodeValue(); // returns the value of the only text node
    }

Now let us parse the employees XML above and calculate the total salary of each employee, this time without using recursion.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.*;

    public class employees_DOM {
        public static void main(String[] args) {
            try {
                File employees = new File("emp.xml");
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = dbf.newDocumentBuilder();
                Document doc = db.parse(employees);
                doc.getDocumentElement().normalize();

                // get all the employee nodes
                NodeList employeeNodeList = doc.getElementsByTagName("Employee");
                for (int i = 0; i < employeeNodeList.getLength(); i++) {
                    double total_salary = 0.0;
                    Node employeeNode = employeeNodeList.item(i); // one employee node at a time
                    NodeList employeeChildNodeList = employeeNode.getChildNodes(); // all child nodes of the employee node
                    for (int j = 0; j < employeeChildNodeList.getLength(); j++) {
                        Node employeeChildNode = employeeChildNodeList.item(j); // Name, Designation, Scale, Salary ...
                        if (employeeChildNode.getNodeType() == Node.ELEMENT_NODE) {
                            String employeeChildNodeName = employeeChildNode.getNodeName();
                            Node employeeTextNode = employeeChildNode.getFirstChild(); // the only text node
                            // number of children inside Name, Designation, ..., Salary;
                            // Salary has more than one child node
                            int no_of_childs = employeeChildNode.getChildNodes().getLength();
                            if (no_of_childs == 1) {
                                System.out.println(employeeChildNodeName + " = " + employeeTextNode.getNodeValue().trim());
                            } else {
                                // same thing for the Salary node
                                NodeList salaryChildNodeList = employeeChildNode.getChildNodes(); // all child nodes of the Salary node
                                for (int k = 0; k < salaryChildNodeList.getLength(); k++) {
                                    Node salaryChildNode = salaryChildNodeList.item(k); // Basic, HRA, TA ...
                                    if (salaryChildNode.getNodeType() == Node.ELEMENT_NODE) {
                                        String salaryChildNodeName = salaryChildNode.getNodeName();
                                        Node salaryTextNode = salaryChildNode.getFirstChild(); // the only text node
                                        System.out.println(salaryChildNodeName + " = " + salaryTextNode.getNodeValue().trim());
                                        total_salary = total_salary + Double.parseDouble(salaryTextNode.getNodeValue().trim());
                                    }
                                }
                            }
                        }
                    }
                    System.out.println("Total Salary = " + total_salary + "\n");
                }
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }

Some of the important uses of XML :
XML is used everywhere; some examples are given below.
1. XML plays a big role in web services through messaging. XML and the web services architecture allow applications written in different languages on different platforms to exchange information with each other in a standards-based way.
2. XML helps to create interactive pages that customers can customize; the XML data can be translated to XHTML using an XSL stylesheet. With XML, the data is stored once and can be rendered for different viewers in different styles, based on a stylesheet that is processed by XSLT (Extensible Stylesheet Language Transformations).
3. Ajax can use XML to communicate with server applications.
4. Many configuration files used by application servers and frameworks are XML.
5. XML helps to keep the data separate from your HTML.
