hg/stx-goodies-xmlsuite: changeset 201:c40bcc1de216

--- a/doc/papers/Dateso2010/xquery-dateso2010.tex	Mon Feb 08 10:55:20 2010 +0000
+++ b/doc/papers/Dateso2010/xquery-dateso2010.tex	Mon Feb 08 14:39:17 2010 +0000
@@ -1,3 +1,270 @@
+<<<<<<< .mine
+\documentclass{llncs}
+\usepackage{amssymb}
+\usepackage{url}
+\usepackage{graphicx}
+\usepackage{color} % specialni makra pro formatovani DP a BP
+\input{unity}
+
+\definecolor{light-red}{rgb}{1, 0.2, 0.2}
+\newcommand{\todo}[1]{\colorbox{light-red}{TODO:} \textbf{#1}}
+
+\title{Deferred node-copying scheme for XQuery processors}
+\author{Jan Kur\v{s} and Jan Vran\'{y}}
+\institute{Software Engineering Group, FIT \v{C}VUT,
+\\ Kolejní 550/2, 160 00, Prague, Czech Republic
+\\ \email{kurs.jan@post.cz, jan.vrany@fit.cvut.cz}}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+XQuery is generic, widely adopted language for querying and 
+manipulating XML data. Many of currently available native 
+XML databases are using XQuery as its primary query language.
+%
+The XQuery specification requires each node to belong to exactly 
+one tree. This may lead into excessive and unnecessary data copying 
+in case of a new XML document with query results is to be created.
+%
+In this paper, we present a new node copying scheme that deferrs 
+copy operation unless necessary. We will show that this 
+schemes significantly reduces copy operations required during the 
+query processing. 
+
+\end{abstract}
+
+{\small {\bf Keywords:} XML, XQuery, Smalltalk}
+
+
+\section{Introduction}
+
+    %describe the problem 
+
+    
+    
+    According to the XQuery specification every node belongs to exactly one 
+    tree \cite{xdm_spec}. \todo{full citation, quote it}. 
+    
+    This rule requires copying of whole tree whenever the node is placed 
+    into new node hierarchy. Consider the query 
+    \ref{fig:query:simple-document-creating-query} is to be evaluated
+    and its output is to be serialized to an output file. 
+      
+    \begin{figure}[h]
+      \centering
+        \todo{fill in the query}
+        \begin{verbatim}
+            some condition to show that copied data are not
+            used..
+        \end{verbatim}
+      \caption{Simple document-creating query}
+      \label{fig:query:simple-document-creating-query}
+    \end{figure}
+    
+    In this case, a whole XML subtree that matches \todo{xpath}
+    must be copied out before output serialization. This may
+    lead into excessive node copying even though most of them
+    are never touched. Consequently, such unnecessary copy operations
+    leads into higher memory (or permanent storage) consumption 
+    and thus badly affects overall performance. 
+
+    In this paper we will describe an efficient node-copying 
+    scheme that avoids unnecessary copying while preserving 
+    XQuery semantics. We will show how this scheme is implemented
+    in CellStore/XQuery processor as well as some benchmark results.
+
+    The paper is organized as follows: section \ref{sec:deferred-copying} 
+    give an overall description of the node-copying scheme mentioned above. 
+    Section \ref{sec:impl} gives more detailed description of implementation
+    of this scheme in CellStore/XQuery processor -- an open source XQuery
+    processor written in Smalltalk. Section \ref{sec:discussion} discusses
+    experimental results based on running XMark benchmarks. Section 
+    \ref{sec:related} provides a brief overview of related work. 
+    Section \ref{sec:conclusion} concludes by summarizing presented
+    work. 
+ 
+    % \begin{itemize}
+    %      \item CellStore 
+    %      \item Related XQuery Requirements 
+    % \end{itemize}
+
+
+
+\section{Deferred copying}
+\label{sec:deferred-copying}
+
+    
+
+
+
+
+    The copy of the model is required, at any time the 
+    node is placed to the new XML structure. 
+
+Even thought the data are only
+    duplicated and there is no difference in the data model. This may lead
+
+
+
+
+    The CellStore XQuery interpreter can prevent the unnecessary duplication
+    of the data. The copy is executed only if it is inevitable. Consider 
+    following example.
+   
+    \begin{verbatim}
+doc.xml:
+    <root>
+        <name>Name1</name>
+        <name>Name2</name>
+    </root>
+    \end{verbatim}
+
+    \begin{verbatim}
+let $a:= doc("doc.xml") return
+return <myroot>{$b/name[0]/text()}</myroot>
+    \end{verbatim}
+
+    The \texttt{myroot} element contains exactly the same text as the first
+    \texttt{name} element in the \texttt{doc.xml} file. So the \texttt{Name1}
+    text could be shared in both trees.
+
+\begin{figure}[ht]
+\begin{center}
+\includegraphics[scale=0.3]{shared_node}
+\caption{Two XML trees sharing one node}
+\label{fig:shared_node}
+\end{center}
+\end{figure}
+    
+
+\subsubsection{Copy If Change}
+    It is necessary to copy the node if data has changed.    
+    
+    \begin{verbatim}
+let $a:= doc("doc.xml") return
+return <myroot>{$b/name[0]/text()} is the first</myroot>
+    \end{verbatim}
+
+    During execution of this XQuery command two important thinks happens:
+    \begin{enumerate}
+        \item The \texttt{Name1} (a result of \texttt{\$b/name[0]/text()}
+            XPath) is added to the \texttt{myroot} element. In this moment
+            the structure is the same like in the picture 
+            \ref{fig:shared_node}.
+        \item The \textit{" is the first"} is appended to the to the
+            \texttt{myroot} element text. This changes the text node with
+            the \textit{"Name1"} text, so the text is copied and the value
+            is changed to the \textit{"Name1 is the first"}
+    \end{enumerate}
+
+\subsubsection{Copy On Focus} 
+    \todo{Jak se nazyva aktivni mnozina uzlu???}
+    
+    Consider the example in the picture \ref{fig:shared_node}. The 
+    \texttt{myroot} is actual node and the \texttt{//text()} XPath
+    is executed. The \texttt{Name1} text node is active (\todo{}). 
+    If the \texttt{/parent::*} XPath is executed the \texttt{name} 
+    element is obtained instead of \texttt{myroot}. The data model
+    of the \texttt{Name1} text node contains old reference to its
+    parent.
+
+    That is why the \texttt{Name1} text node has to be copied and the
+    parent node has to be updated if it becomes the focused node.
+
+    The transitions between node kinds are depicted in the picture
+    \ref{fig:transitions}
+\begin{figure}[ht]
+\begin{center}
+\includegraphics[scale=0.3]{hybrid_kinds_transitions}
+\caption{Kinds Transitions}
+\label{fig:kinds_transitions}
+\end{center}
+\end{figure}
+
+
+\subsection{Real results}
+    The optimized node processing has been checked tested on XMark benchmark 
+    queries. The number of copied nodes has been compared with the enabled
+    and disabled optimization.
+
+    \begin{tabular}{|c|c|c|r|}\hline
+		Query No. & Optimization copy & No optimization copy & \%  \\
+	\hline\hline
+			Q1 & 0 & 0 & 0 \\
+			Q2 & 0 & 106 & 100 \\
+			Q3 & N/A & N/A & N/A \\
+			Q4 & 0 & 106 & 100 \\
+			.. & .. & .. & .. \\
+			Q4 & 25 & 50 & 50 \\
+	\hline
+	\end{tabular}
+
+\section{Implementation of  CellStore/XQuery processor}
+\label{sec:impl}
+
+Description of CellStore/XQuery implementation...will see 
+how much detailed, we should not exceed say 10 pages in 
+final PDF. 
+
+
+\begin{itemize}
+        \item XDM Adaptor mechanism 
+        \item "Copy on demand" mechanism description 
+        \item Numbers from some XQuery benchmark queries (maybe) 
+\end{itemize}
+
+\subsection{XDM Adaptor}
+    XDMAdaptor is an interface providing set of accessors as defined
+    in XDM specification \cite{xdm_spec}. The whole XQuery interpretation
+    is based on the XDMAdaptor interface hence the XQuery could be executed
+    over any model providing the XDM accessors.
+
+\subsubsection{Accessing the Document}
+    According to the data source, the appropriate XDMAdaptor is selected
+    and the XQuery is executed. There is no need to import the document
+    to the internal representation.
+
+    \todo{What's more?}
+
+\section{Discussion}
+
+    Benchmarks and discussion
+
+
+\section{Related Work}
+    \begin{itemize}
+        \item Saxon Implementation
+        \item eXist Implementation
+        \item Should be mentioned another DBs (Berkley DB, ...)???
+    \end{itemize}
+
+\subsection{Saxon Implementation}
+    \begin{itemize}
+        \item XDM corresponds to the NodeInfo interface 
+        \item While accessing the document using fn:doc() function, the XML 
+            parser (typically the SAXParser) is executed and whole document 
+            transformed to the inner Java object representation. 
+    \end{itemize}
+
+\subsection{eXist Implementation}
+    \begin{itemize}
+        \item The DOM3 is used as a data model 
+        \item While accessing the document using fn:doc() function, the XML 
+            parser SAXParser is executed and whole document transformed to the 
+            own implementation of the DOM3 representation. 
+        \item The particular document nodes are stored in an arrays carrying 
+            required values.
+    \end{itemize}
+    
+
+\section{Conclusion and Future Work}
+
+\begin{thebibliography}{99}
+\end{thebibliography}
+
+\end{document}
+=======
 \documentclass{llncs}
 \usepackage{amssymb}
 \usepackage{url}
@@ -249,3 +516,4 @@
 
 \end{document}
 
+>>>>>>> .r201
author	kursj1a
	Mon, 08 Feb 2010 14:39:17 +0000
changeset 201	c40bcc1de216
parent 200	ebd44b69ee94
child 202	4c8990f8d9c4