Web 2.0 to Web 3.0 tutorial


We have all become used to the term “Web 2.0”. It has been a great marketing brand, initially coined by O’Reilly Media to group and highlight the key technologies changing user habits and the usage of the internet across the world. There has been a huge increase in the number of users on the internet, leading to a vast range of social, business, education and governance related functions being exposed through the internet.
Responding to the wide range of user patterns on the internet, a wide range of innovative solutions was released to the user communities. Contrary to the reaction received by the initial “internet bubble”, these innovative solutions were well received by the user community and converted the initial investments into a multi-billion dollar industry.
Some of the common usage patterns for which Web 2.0 provides generic solutions are:

·         Social Networking
·         Blogging
·         Personalization / Aggregation
·         Filtering and Tracking
·         Collaboration amongst communities
·         Office online
·         Content sharing
·         Relevant Search
·         Notifications
·         Learning

Some of the popular terms which all of us come across countless times a day now are:
Blogs
Simple webpage consisting of brief paragraphs of opinion, information, personal diary entries, or links, called posts, arranged chronologically with the most recent first, in the style of an online journal. Eg.
http://wordpress.org/
http://www.blogger.com/
Wikis
Collaborative tool that facilitates the production of a group work. Eg.
http://meta.wikimedia.org/wiki/MediaWiki
http://www.socialtext.com/products/overview
http://www.twiki.org/
Tags / Bookmarking
A tag is a keyword that is added to a digital object (e.g. a website, picture or video clip) to describe it, but not as part of a formal classification system.
Social bookmarking systems allow users to create lists of ‘bookmarks’ or ‘favourites’, to store these centrally on a remote service (rather than within the client browser) and to share them with other users of the system. Eg.
http://www.digg.com/
Folksonomy
Folksonomy is collaborative tagging: a method of collaboratively creating and managing tags to annotate and categorize content. Here, metadata is generated not only by experts but also by the creators and consumers of the content.
Multimedia sharing
The biggest revolution in Web 2.0 has been triggered by the power of content sharing, where users can share pictures and multimedia content for personal as well as business objectives. Eg.
http://www.youtube.com/
Podcasting
Digital media files that can be downloaded from the internet and are used to share talks, lectures, and interviews. Eg. Apple iTunes, Microsoft Zune.
RSS Feeds
RSS is a family of formats which allow users to find out about updates to the content of RSS-enabled websites, blogs or podcasts without actually having to go and visit the site. Instead, information from the website is collected within a feed (which uses the RSS format) and ‘piped’ to the user in a process known as syndication.
Mashups
A mashup is a web page or application that combines data or functionality from two or more external sources to create a new service. The term mashup implies easy, fast integration, frequently using open APIs and data sources to produce results that were not the original reason for producing the raw source data.
Widgets
A web widget is a portable chunk of code that can be installed and executed within any separate HTML-based web page by an end user without requiring additional compilation. It provides a way to expose or integrate an application or service into various websites.

There were obvious reasons why this paradigm was so popular and successful amongst users:
1.      Patterns identified were based on the actual usage and behavior.
2.      Most of the solutions were open source which triggered innovative ideas, solutions and participation from the user community.
3.      It was for the user community.
1.1.1.                 Technologies behind Web 2.0
Web 2.0 paradigm was widely successful in leveraging both internal and external synergies in the market place. Organizations were able to tie up their content, knowledge, information, people and processes better with Web 2.0. In the same way many new organizations have come out with commercial offerings with Web 2.0 paradigm redefining channels to deliver service and value to their customers.
All this has happened mainly due to the wide involvement of the open source paradigm. Open source has given Web 2.0 an agility that was not possible through product companies. Wide acceptability against very little investment is only possible with open source software, due to community-wide collaboration.
Key standards and technologies, mainly sourced from the open source world, which powered the implementation of Web 2.0 are:
Standards: RSS, RDF, ATOM, SOAP / REST
Technologies: XML, Ajax, DOJO, Flex, Silverlight
Having discussed the Web 2.0 paradigm, it becomes interesting to define the expectations from Web 3.0, the next version of the web. Though different flavors of Web 2.0 were launched, e.g. Government 2.0, Enterprise 2.0 etc., the basic drawback identified in the model was its “static” nature. While Web 2.0 was successful in standardizing and integrating the services provided over the web, it lacked a framework for the semantic use of these services for a logical need. For example, if we are looking for a house to rent and have a certain list of features to look for, very few individual web sites offer such metadata for refining the results. Over and above this, the metadata definition is not standardized in the current set of services being offered. Currently, a “search” is mostly based on keyword matching with a basic sense of relevance and priority built into it.
So users need an element of intelligence, or a layer providing semantics as a placeholder for that intelligence, in the next version of the web. This layer will be abstract and dynamic in nature, with elements of artificial intelligence built into it. In simpler terms, Web 3.0 can be defined as:
“It will not be a separate web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Tim Berners-Lee, 2001)

1.2.1.                 What it will take
In the introduction of Web 3.0 we have identified a few terms which will become the basis of this new paradigm, i.e.:
·         Metadata
·         Abstract and dynamic nature
·         Artificial Intelligence.
In this section we will define how these elements are being addressed by the industry to fulfill the paradigm of Web 3.0.

Metadata

Metadata, in layman’s terms, is data about data. In the same way, for the services and infrastructure available on the web to participate in a semantic outcome, there is a need to capture data about those services in a standardized form. The Resource Description Framework (RDF) is a metadata data model used for describing the information available across the World Wide Web. It is comparable to classical modeling approaches such as the E-R or OO approach and is based on subject-predicate-object expressions, e.g. “http://www.example.org/index.html has a creator whose value is John Smith”. In this statement the subject is the web page “http://www.example.org/index.html”, the predicate is “creator” and the object is “John Smith”.
More details about the example mentioned above can be represented in RDF as a graph of such triples. An RDF/XML serialization of one such statement is as follows:
 <?xml version="1.0"?>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:exterms="http://www.example.org/terms/">
 
   <rdf:Description rdf:about="http://www.example.org/index.html">
       <exterms:creation-date>August 16, 1999</exterms:creation-date>
   </rdf:Description>
 
 </rdf:RDF>
As we have seen in the example above, RDF is mostly used to represent data; it uses URI references to identify the resources and properties being described.
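As an illustration of how such a statement can be created and serialized programmatically, the sketch below uses Apache Jena (an assumed RDF library for Java; package names follow Jena 3.x). It is only a minimal sketch that rebuilds the triple from the listing above and writes it out as RDF/XML:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class RdfTripleDemo {
    public static void main(String[] args) {
        String exterms = "http://www.example.org/terms/";

        // Build an in-memory RDF model and add a single statement (triple).
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("exterms", exterms);

        Resource page = model.createResource("http://www.example.org/index.html");
        Property creationDate = model.createProperty(exterms, "creation-date");
        page.addProperty(creationDate, "August 16, 1999");

        // Serialize the model as RDF/XML, similar to the listing above.
        model.write(System.out, "RDF/XML-ABBREV");
    }
}
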
While defining data, and to build generic representations of the data around us, we will definitely need the capability to define a “type” of data, or in OO terms a “class”, to characterize data. This will help us build a vocabulary to represent the data and information around us more structurally. The RDF Vocabulary Description Language, also known as RDF-Schema, provides facilities to describe classes and their properties in RDF. RDF-Schema adds the “type” semantic to RDF. An example taken from the RDF Primer illustrates RDF-Schema as shown below.
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF   
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xml:base="http://example.org/schemas/vehicles">
 
<rdf:Description rdf:ID="MotorVehicle">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</rdf:Description>
 
<rdf:Description rdf:ID="PassengerVehicle">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>
 
<rdf:Description rdf:ID="Truck">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>
 
<rdf:Description rdf:ID="Van">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>
 
<rdf:Description rdf:ID="MiniVan">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#Van"/>
  <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
</rdf:Description>
 
</rdf:RDF>

Abstract and dynamic behavior

Using RDF and RDF-Schema we have given structure and shape to the data we want to work upon. The abstract and dynamic behavior of the web, i.e. the semantic web, is a critical factor because of the variety of services available as well as the varied usage patterns and demographics involved. Considering the huge amount of static information stored on the World Wide Web, we need semantic tools to make some logical sense out of it. We will take an example before we get into the details of this concept.
Consider the next few lines as an example scenario. Assume the system already has the following set of data encoded in it using standards like RDF and RDF-Schema:
·         Sandeep is a team member for “Genesis” Account.
·         Raman is the project manager of “Genesis” account.
·         Team member reports to Project manager of a project.
If we want to know “Whom does Sandeep report to?”, we should be able to derive the answer “Sandeep reports to Raman” from the information we have.
Two languages which are predominantly used to build these semantics over the data are:
·         OWL: Web Ontology Language
·         SPARQL: RDF Query language
OWL, the Web Ontology Language, provides the basic infrastructure for computer programs to make inferences based on the semantics defined in it. OWL is used to define an ontology for the data present on, or to be represented by, the web. An ontology defines entities as well as the relationships between them. OWL is inherently based on the RDF and RDF-Schema specifications, which use URI references for such representations. OWL encodes basic rules based on which new inferences can be made from data that exists within one document or multiple distributed documents across the system.
OWL as a language supports the following constructs for the ontologies defined with it (a short sketch follows this list):
·         Property and data types
o        Transitive
o        Symmetric
o        Functional
o        Inverse of
o        Inverse functional
·         Property restrictions
o        All Values
o        Some Value
o        Cardinality
o        Has value
·         Equivalence between classes and properties
·         Difference between classes
·         Complex classes
o        Intersection of
o        Union of
o        Complement of
·         Enumeration
·         Disjoint classes
·         Versioning
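As a hedged illustration of two of the property characteristics listed above (transitive and symmetric), the sketch below builds a tiny ontology with Apache Jena's ontology API; the ex namespace and the ancestorOf / siblingOf properties are invented purely for this example:

import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.ontology.SymmetricProperty;
import org.apache.jena.ontology.TransitiveProperty;
import org.apache.jena.rdf.model.ModelFactory;

public class OwlPropertySketch {
    public static void main(String[] args) {
        String ex = "http://example.org/family#";

        // Ontology model backed by a rule reasoner that understands OWL property axioms.
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);

        OntClass person = m.createClass(ex + "Person");
        TransitiveProperty ancestorOf = m.createTransitiveProperty(ex + "ancestorOf");
        SymmetricProperty siblingOf = m.createSymmetricProperty(ex + "siblingOf");

        Individual anita = m.createIndividual(ex + "Anita", person);
        Individual bala  = m.createIndividual(ex + "Bala", person);
        Individual charu = m.createIndividual(ex + "Charu", person);

        anita.addProperty(ancestorOf, bala);
        bala.addProperty(ancestorOf, charu);
        anita.addProperty(siblingOf, charu);

        // Transitivity should let the reasoner infer anita -> charu for ancestorOf,
        // and symmetry should let it infer charu -> anita for siblingOf.
        System.out.println(anita.hasProperty(ancestorOf, charu)); // expected: true
        System.out.println(charu.hasProperty(siblingOf, anita));  // expected: true
    }
}
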
SPARQL, the query language for RDF, is another tool that helps in building semantic rules over data stored as RDF. SPARQL is like SQL for RDF: it supports many of the features offered by SQL, e.g. value testing and constraints.
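Returning to the earlier “Genesis” example, the hedged sketch below (again assuming Apache Jena, with an invented ex vocabulary containing memberOf and managerOf properties) shows how a SPARQL query can answer “Whom does Sandeep report to?” from plain triples:

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class ReportingQuery {
    public static void main(String[] args) {
        String ex = "http://example.org/org#";
        Model model = ModelFactory.createDefaultModel();

        Property memberOf  = model.createProperty(ex, "memberOf");
        Property managerOf = model.createProperty(ex, "managerOf");

        // Encode two of the scenario's facts as RDF triples.
        Resource genesis = model.createResource(ex + "Genesis");
        model.createResource(ex + "Sandeep").addProperty(memberOf, genesis);
        model.createResource(ex + "Raman").addProperty(managerOf, genesis);

        // The third fact, "a team member reports to the project manager of the same
        // project", is expressed here as the query's graph pattern.
        String sparql =
            "PREFIX ex: <" + ex + "> " +
            "SELECT ?manager WHERE { ex:Sandeep ex:memberOf ?project . " +
            "                        ?manager ex:managerOf ?project }";

        Query query = QueryFactory.create(sparql);
        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println("Sandeep reports to " + row.getResource("manager").getLocalName());
            }
        }
    }
}

With the facts above loaded, the query is expected to print “Sandeep reports to Raman”.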

Artificial Intelligence

There has been debate on whether Web 3.0 is just another implementation of artificial intelligence or something entirely disjoint from it. So far we have discussed the need for the semantic web, ways to document data and its metadata, and how to build the capability of making new inferences from that data. Artificial intelligence capabilities will obviously be required by the entities that will be the eventual users of this data. Organizations, businesses, governments, educational institutions as well as communities will be thriving on this data, and since each has its own ontologies, there will be a need to integrate these ontologies. AI-powered inference engines / agents will be required to process the data documented in RDF, RDF-Schema and OWL ontologies to serve millions of users over the World Wide Web.

1.2.2.                 Web 3.0 implementations
Some of the popular implementations of Web 3.0 are provided below:
The Friend of a Friend (FOAF) project is creating a web of machine-readable pages describing people, the links between them and the things they create and do; it is a contribution to the linked information system known as the Web. FOAF defines an open, decentralized technology for connecting social web sites and the people they describe.
OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.

Related web sites:
·         SPARQL specification
·         Web 3.0 introduction
·         OWL features
·         Resources for RDF
·         RDF-Schema specification
·         RDF primer


Re-engineering vs. Refactoring in software development


Enterprise legacy software systems tend to be large and complex, so the analysis of system architecture becomes a difficult task. To solve the problem, it helps if the legacy software architecture can be decomposed to reduce the complexity associated with analyzing large scale architecture artifacts. Architecture decomposition is an efficient way to limit the complexity and risk associated with the re-engineering activities of a large legacy system. It divides the system into a collection of meaningful modular parts with low coupling, high cohesion and minimal interfaces, thus facilitating an incremental approach to the progressive software re-engineering process.

Legacy code can be considered as any code that was written before today. The traditional approach is to make changes in a guarded manner, because it is not clear what will really happen when a data structure is changed or a variable is updated; typically this means adding a wrapper on top of existing code, or copying code from another place that already works. In such cases the code will bloat, and maintainability, testability and understandability will become a big problem in the future. For people who deal with it day in and day out, "legacy code" is a Pandora’s box: sleepless nights and anxious days poring through bad structure and code that works in some incomprehensible way. Martin Fowler defines refactoring as “a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior”.

Most of the tasks in the evolution and servicing phases require program comprehension, understanding how and why a software program functions in order to work with it effectively. Effective comprehension requires viewing a legacy program not simply as a product of inefficiency or stupidity, but instead as an artifact of the circumstances in which it was developed. This information can be an important factor in determining appropriate strategies for the software program's transition from the evolution stage to the servicing or phase-out stage.

This article discusses the definitions of re-engineering and refactoring and also presents the situations in which each process can be used effectively.

Refactoring is the process of changing a software system in such a way that the external behavior of the code is unchanged but its internal structure and architecture are improved. It is a behavior-preserving source code transformation.

Programmers hold onto software designs long after they have become unwieldy. Legacy code stays alive only as long as the product is running at a customer site. We reuse code that is no longer maintainable because it still works in some way, and we are a bit scared to modify it. But is it really cost effective to do so? When we remove redundancy, eliminate unused functionality and rejuvenate obsolete designs, we are refactoring code that is no longer maintainable.
Refactoring throughout the entire project life cycle saves time and improves quality. During this phase, a series of questions will arise for programmers, such as:
·         “Changing the design/code might break the system!”
o        Solution: Use tests to prove behavior preservation (see the sketch after this list)
·         “I don't understand how it works now!”
o        Solution: Learn through the process and build documentation as you refactor and simplify
·         “I don't have the time to refactor!”
o        Solution: Refactoring will pay for itself later
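One hedged way to prove behavior preservation is to pin down the current behavior with a characterization test before touching the code. The sketch below assumes JUnit 5, and the InvoiceCalculator class is invented purely for illustration:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class InvoiceCalculator {
    // Stand-in for the legacy method whose behavior we want to preserve.
    double totalWithTax(double amount, double taxRate) {
        return amount + amount * taxRate;
    }
}

class InvoiceCalculatorCharacterizationTest {

    // Characterization test: record what the legacy code does today,
    // so that any refactoring that changes this output is caught immediately.
    @Test
    void totalWithTaxMatchesCurrentBehavior() {
        InvoiceCalculator calculator = new InvoiceCalculator();

        double total = calculator.totalWithTax(100.00, 0.18);

        // The expected value is whatever the legacy code currently returns,
        // captured before the refactoring begins.
        assertEquals(118.00, total, 0.001);
    }
}
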
The graph below depicts the cost savings when continuous refactoring is followed during the lifecycle of a project/product [Reference: http://www.jacoozi.com/blog/?p=11 ]:

[Figure: cost savings from continuous refactoring across the project/product lifecycle]

But it is important for programmers to understand that the refactoring process helps to improve readability, flexibility, extensibility, understandability and performance. Refactoring can be applied during application development, maintenance, testing, coding and framework development. The section below explains the refactoring cycle that can be used for refactoring code under maintenance:
·         Program source code should go through expansion and contraction phases.
o        Expansion phase: code is added to meet functional requirements
o        Contraction phase: code is removed and refactored to better address those requirements and plan for the future.
·         This cycle will be repeated many times during a program's lifetime.
The objective of refactoring is to keep the design simple as time goes on and to avoid clutter and complexity in the legacy code. Refactoring is a process that helps in cleaning up code so that it is easier to understand, modify and extend. In the longer run, it grooms a system that is well defined and more maintainable.
There is a certain amount of Zen to refactoring. It is hard at first because the design that was envisioned and working has to be set aside in favor of the design that is serendipitously identified while refactoring. It is important to accept that the design originally envisioned was competent but is now obsolete. Before starting this process, it is better to let go of notions about what the system should or should not be, and to watch the new design emerge as the code changes take place.
The number of refactorings that would be beneficial to any code base is practically infinite. Some of the refactoring techniques used in Java development are (a before/after sketch follows this list):
·         Organize imports
·         Rename {field, method, class, package}
·         Move {field, method, class}
·         Extract {method, local variable, interface}
·         Inline {method, local variable}
·         Reorder method parameters
·         Push members down
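As a hedged illustration of the “Extract method” technique, the sketch below (with an invented Order class) pulls a cohesive block of code out of a longer method into a well-named helper without changing the observable behavior:

import java.util.List;

class Order {
    private final List<Double> itemPrices;

    Order(List<Double> itemPrices) {
        this.itemPrices = itemPrices;
    }

    // Before the refactoring, the total calculation was inlined in printReceipt().
    // After "Extract method", it lives in its own helper, which can be reused and
    // understood in isolation; the printed output stays exactly the same.
    void printReceipt() {
        System.out.println("*** Receipt ***");
        System.out.println("Total: " + calculateTotal());
    }

    private double calculateTotal() {
        double total = 0.0;
        for (double price : itemPrices) {
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        new Order(List.of(12.50, 7.25)).printReceipt();
    }
}
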
Legacy systems might have been written with different architectures, which in turn were written in different computer languages. The key issues are the maintenance and the integration of these systems. Companies that optimize business processes must often change legacy information systems to support the new processes. The required changes can involve new features, porting, performance optimization, or bug fixes. Changes to legacy systems often require replacement not only of the existing code but also of supporting tools (e.g. compilers and editors) and development processes (testing and version control).
Such a change requires discarding part or all of the existing system, modifying existing parts, writing new parts and purchasing new or improved parts from external vendors. Based on these criteria, the change takes one of two different forms, as mentioned below:
·         If the change is accomplished primarily by discarding the existing system and buying or building new parts, the project is termed a rewrite or redevelopment.
·         If the change is accomplished primarily by modifying the existing system, the project is termed a reengineering project.
Rewriting and reengineering are the extremes along a spectrum of strategies for change but in reality most major upgrades are accomplished by some combination of the two.
Reengineering and refactoring might look quite similar at first; however, reengineering deals with the examination and alteration of a system to reconstitute it in a new form, and the subsequent implementation of that new form.
The primary benefits of reengineering include:
·         Reduced operating and maintenance costs caused by overheads of older applications.
·         Improved application maintainability, even if there is limited application knowledge, high staff turnover, lack of qualified resources, outdated documentation, or obsolete application platform support.
·         Improved access to legacy applications in case of a merger or organizational change.
Based on the process followed in a reengineering project, the lifecycle involves two major steps, as described below:
Forward engineering: Forward engineering starts with the system specification and involves the design and implementation of a new system.
[Diagram: forward engineering, from the system specification to a new system]

Reverse engineering: Reverse engineering is the process of analyzing a subject system to identify the system’s components and their interrelationships, and to create representations of the system in another form or at a higher level of abstraction.
[Diagram: reverse engineering, from the existing legacy system to the re-engineered system]

Comparing the two from different perspectives:
Scope
·         Reengineering always affects the entire system or a part of it (in the latter case a hybrid approach is taken).
·         Refactoring typically has (many) local effects.
Process
·         Reengineering follows a disassembly / reassembly approach in the technical domain.
·         Refactoring is a behavior-preserving, structure-transforming process.
Result
·         Reengineering can create a whole new system, with a different structure and possibly different behavior.
·         Refactoring improves the structure of an existing system while leaving its behavior unchanged.
Cost
·         Reengineering cost is higher when compared with refactoring.
·         Continuous refactoring decreases the total cost of ownership.

Below are some of the scenarios in which reengineering is suitable:
·         System’s documentation is missing or obsolete.
·         Team has only limited understanding of the system, its architecture and implementation
·         Bug fix in one place causes bugs in other places.
·         New system level requirements and functions cannot be addressed or integrated appropriately.
·         Code is becoming ‘brittle’ and difficult to update.


Legacy software systems are an ongoing challenge for software developers. Refactoring, according to Martin Fowler, is a controlled technique for improving the design of an existing code base. It is important to keep the code healthy, through refactoring, for better maintainability.

Developing a custom-built system requires a lot of effort and cost. Hence, organizations need to maintain their old systems in order to reduce cost and increase the lifetime of those systems. For this purpose, re-engineering becomes a useful way to convert old, obsolete systems into efficient and streamlined ones. The intent of reengineering is to create versions of existing programs that are of higher quality and easier to maintain.

StAX and XML Accelerators tutorial


XML technology is gradually becoming the standard for data interchange. Most organizations in the world use XML in some form or the other. XML forms the basis of many future inventions in the field of information technology.
In spite of all these very lucrative advantages, the very basis of the technology is under threat due to the reduced performance that solutions have to live with, owing to the very nature of the parsing and processing technologies.
In the world of Java, there are primarily three options provided for parsing XML structures, namely DOM (Document Object Model), SAX (Simple API for XML) and StAX (Streaming API for XML).
DOM and SAX have traditionally been used for parsing XML structures. StAX is a relatively newer member of the XML parsing technology in the Java world.
StAX is built upon the concept of a pull model, in which an application queries the parser for the next parsing event but never surrenders control to the parser during the process. Stated differently, StAX essentially turns the SAX processing model upside down: instead of the parser controlling the application's flow and the application reacting to parsing events, it is the application that controls the flow by pulling events from the parser.

The pull parsing model in StAX allows for:
a)      Control over the parsing engine
b)     Greater programmatic control over the XML data structure
c)      A reduced memory footprint compared with the heavy footprint required by DOM parsing techniques
d)     A simple processing model, such as that used with SAX
e)      Event-based processing control (known as pipelining) on XML documents
f)       A cursor model that parses XML very efficiently, since it provides a natural interface by which the parser can compute values lazily (see the sketch after this list)
g)      Better optimization for speed and performance in comparison to DOM and SAX
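The sketch below is a minimal illustration of the StAX cursor API (javax.xml.stream) pulling events from the parser; the orders/order element names in the sample XML are invented for the example:

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCursorDemo {
    public static void main(String[] args) throws XMLStreamException {
        String xml = "<orders><order id=\"1\">Book</order><order id=\"2\">Pen</order></orders>";

        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        // The application pulls events from the parser; the parser never takes control of the flow.
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT && "order".equals(reader.getLocalName())) {
                String id = reader.getAttributeValue(null, "id");
                String text = reader.getElementText(); // advances to the matching END_ELEMENT
                System.out.println("order " + id + " = " + text);
            }
        }
        reader.close();
    }
}
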

In spite of the advent of StAX as a member of Java technology, a lot of debate still exists around the adoption of XML technologies, mainly due to performance overheads.

XML accelerators are the newest mechanism appearing in the industry. Currently there are primarily three options available for improving XML performance:
a)      Microprocessor-based acceleration: This option takes into account the fact that faster microprocessors will process XML data faster than slower ones.
b)     Standalone XML accelerator engines: These devices hook into individual applications and reduce the XML data being transmitted across applications. What they do not attempt to do is improve the performance of XML processing within an individual application.
c)      PCI hardware boards for XML acceleration: These hardware boards separate XML processing from the application itself, thereby improving performance.

StAX is definitely a much better implementation option compared to DOM and SAX. However, the use of XML accelerator solutions to boost XML performance is still evolving. Meanwhile, choosing PCI hardware based XML accelerators today may be a good option to enhance XML processing and implement the much-needed SOA solutions.