--- a/doc/papers/Dateso2010/xquery-dateso2010.tex Mon Feb 08 10:55:20 2010 +0000
+++ b/doc/papers/Dateso2010/xquery-dateso2010.tex Mon Feb 08 14:39:17 2010 +0000
@@ -1,3 +1,270 @@
+<<<<<<< .mine
+\documentclass{llncs}
+\usepackage{amssymb}
+\usepackage{url}
+\usepackage{graphicx}
+\usepackage{color} % specialni makra pro formatovani DP a BP
+\input{unity}
+
+\definecolor{light-red}{rgb}{1, 0.2, 0.2}
+\newcommand{\todo}[1]{\colorbox{light-red}{TODO:} \textbf{#1}}
+
+\title{Deferred node-copying scheme for XQuery processors}
+\author{Jan Kur\v{s} and Jan Vran\'{y}}
+\institute{Software Engineering Group, FIT \v{C}VUT,
+\\ Kolejní 550/2, 160 00, Prague, Czech Republic
+\\ \email{kurs.jan@post.cz, jan.vrany@fit.cvut.cz}}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+XQuery is generic, widely adopted language for querying and
+manipulating XML data. Many of currently available native
+XML databases are using XQuery as its primary query language.
+%
+The XQuery specification requires each node to belong to exactly
+one tree. This may lead into excessive and unnecessary data copying
+in case of a new XML document with query results is to be created.
+%
+In this paper, we present a new node copying scheme that deferrs
+copy operation unless necessary. We will show that this
+schemes significantly reduces copy operations required during the
+query processing.
+
+\end{abstract}
+
+{\small {\bf Keywords:} XML, XQuery, Smalltalk}
+
+
+\section{Introduction}
+
+ %describe the problem
+
+
+
+ According to the XQuery specification every node belongs to exactly one
+ tree \cite{xdm_spec}. \todo{full citation, quote it}.
+
+ This rule requires copying of whole tree whenever the node is placed
+ into new node hierarchy. Consider the query
+ \ref{fig:query:simple-document-creating-query} is to be evaluated
+ and its output is to be serialized to an output file.
+
+ \begin{figure}[h]
+ \centering
+ \todo{fill in the query}
+ \begin{verbatim}
+ some condition to show that copied data are not
+ used..
+ \end{verbatim}
+ \caption{Simple document-creating query}
+ \label{fig:query:simple-document-creating-query}
+ \end{figure}
+
+ In this case, a whole XML subtree that matches \todo{xpath}
+ must be copied out before output serialization. This may
+ lead into excessive node copying even though most of them
+ are never touched. Consequently, such unnecessary copy operations
+ leads into higher memory (or permanent storage) consumption
+ and thus badly affects overall performance.
+
+ In this paper we will describe an efficient node-copying
+ scheme that avoids unnecessary copying while preserving
+ XQuery semantics. We will show how this scheme is implemented
+ in CellStore/XQuery processor as well as some benchmark results.
+
+ The paper is organized as follows: section \ref{sec:deferred-copying}
+ give an overall description of the node-copying scheme mentioned above.
+ Section \ref{sec:impl} gives more detailed description of implementation
+ of this scheme in CellStore/XQuery processor -- an open source XQuery
+ processor written in Smalltalk. Section \ref{sec:discussion} discusses
+ experimental results based on running XMark benchmarks. Section
+ \ref{sec:related} provides a brief overview of related work.
+ Section \ref{sec:conclusion} concludes by summarizing presented
+ work.
+
+ % \begin{itemize}
+ % \item CellStore
+ % \item Related XQuery Requirements
+ % \end{itemize}
+
+
+
+\section{Deferred copying}
+\label{sec:deferred-copying}
+
+
+
+
+
+
+ The copy of the model is required, at any time the
+ node is placed to the new XML structure.
+
+Even thought the data are only
+ duplicated and there is no difference in the data model. This may lead
+
+
+
+
+ The CellStore XQuery interpreter can prevent the unnecessary duplication
+ of the data. The copy is executed only if it is inevitable. Consider
+ following example.
+
+ \begin{verbatim}
+doc.xml:
+ <root>
+ <name>Name1</name>
+ <name>Name2</name>
+ </root>
+ \end{verbatim}
+
+ \begin{verbatim}
+let $a:= doc("doc.xml") return
+return <myroot>{$b/name[0]/text()}</myroot>
+ \end{verbatim}
+
+ The \texttt{myroot} element contains exactly the same text as the first
+ \texttt{name} element in the \texttt{doc.xml} file. So the \texttt{Name1}
+ text could be shared in both trees.
+
+\begin{figure}[ht]
+\begin{center}
+\includegraphics[scale=0.3]{shared_node}
+\caption{Two XML trees sharing one node}
+\label{fig:shared_node}
+\end{center}
+\end{figure}
+
+
+\subsubsection{Copy If Change}
+ It is necessary to copy the node if data has changed.
+
+ \begin{verbatim}
+let $a:= doc("doc.xml") return
+return <myroot>{$b/name[0]/text()} is the first</myroot>
+ \end{verbatim}
+
+ During execution of this XQuery command two important thinks happens:
+ \begin{enumerate}
+ \item The \texttt{Name1} (a result of \texttt{\$b/name[0]/text()}
+ XPath) is added to the \texttt{myroot} element. In this moment
+ the structure is the same like in the picture
+ \ref{fig:shared_node}.
+ \item The \textit{" is the first"} is appended to the to the
+ \texttt{myroot} element text. This changes the text node with
+ the \textit{"Name1"} text, so the text is copied and the value
+ is changed to the \textit{"Name1 is the first"}
+ \end{enumerate}
+
+\subsubsection{Copy On Focus}
+ \todo{Jak se nazyva aktivni mnozina uzlu???}
+
+ Consider the example in the picture \ref{fig:shared_node}. The
+ \texttt{myroot} is actual node and the \texttt{//text()} XPath
+ is executed. The \texttt{Name1} text node is active (\todo{}).
+ If the \texttt{/parent::*} XPath is executed the \texttt{name}
+ element is obtained instead of \texttt{myroot}. The data model
+ of the \texttt{Name1} text node contains old reference to its
+ parent.
+
+ That is why the \texttt{Name1} text node has to be copied and the
+ parent node has to be updated if it becomes the focused node.
+
+ The transitions between node kinds are depicted in the picture
+ \ref{fig:transitions}
+\begin{figure}[ht]
+\begin{center}
+\includegraphics[scale=0.3]{hybrid_kinds_transitions}
+\caption{Kinds Transitions}
+\label{fig:kinds_transitions}
+\end{center}
+\end{figure}
+
+
+\subsection{Real results}
+ The optimized node processing has been checked tested on XMark benchmark
+ queries. The number of copied nodes has been compared with the enabled
+ and disabled optimization.
+
+ \begin{tabular}{|c|c|c|r|}\hline
+ Query No. & Optimization copy & No optimization copy & \% \\
+ \hline\hline
+ Q1 & 0 & 0 & 0 \\
+ Q2 & 0 & 106 & 100 \\
+ Q3 & N/A & N/A & N/A \\
+ Q4 & 0 & 106 & 100 \\
+ .. & .. & .. & .. \\
+ Q4 & 25 & 50 & 50 \\
+ \hline
+ \end{tabular}
+
+\section{Implementation of CellStore/XQuery processor}
+\label{sec:impl}
+
+Description of CellStore/XQuery implementation...will see
+how much detailed, we should not exceed say 10 pages in
+final PDF.
+
+
+\begin{itemize}
+ \item XDM Adaptor mechanism
+ \item "Copy on demand" mechanism description
+ \item Numbers from some XQuery benchmark queries (maybe)
+\end{itemize}
+
+\subsection{XDM Adaptor}
+ XDMAdaptor is an interface providing set of accessors as defined
+ in XDM specification \cite{xdm_spec}. The whole XQuery interpretation
+ is based on the XDMAdaptor interface hence the XQuery could be executed
+ over any model providing the XDM accessors.
+
+\subsubsection{Accessing the Document}
+ According to the data source, the appropriate XDMAdaptor is selected
+ and the XQuery is executed. There is no need to import the document
+ to the internal representation.
+
+ \todo{What's more?}
+
+\section{Discussion}
+
+ Benchmarks and discussion
+
+
+\section{Related Work}
+ \begin{itemize}
+ \item Saxon Implementation
+ \item eXist Implementation
+ \item Should be mentioned another DBs (Berkley DB, ...)???
+ \end{itemize}
+
+\subsection{Saxon Implementation}
+ \begin{itemize}
+ \item XDM corresponds to the NodeInfo interface
+ \item While accessing the document using fn:doc() function, the XML
+ parser (typically the SAXParser) is executed and whole document
+ transformed to the inner Java object representation.
+ \end{itemize}
+
+\subsection{eXist Implementation}
+ \begin{itemize}
+ \item The DOM3 is used as a data model
+ \item While accessing the document using fn:doc() function, the XML
+ parser SAXParser is executed and whole document transformed to the
+ own implementation of the DOM3 representation.
+ \item The particular document nodes are stored in an arrays carrying
+ required values.
+ \end{itemize}
+
+
+\section{Conclusion and Future Work}
+
+\begin{thebibliography}{99}
+\end{thebibliography}
+
+\end{document}
+=======
\documentclass{llncs}
\usepackage{amssymb}
\usepackage{url}
@@ -249,3 +516,4 @@
\end{document}
+>>>>>>> .r201