The Extensible Markup Language (XML) was designed as a markup language for structuring,
storing and transporting data on the World Wide Web. The focus of XML is on
data content; arbitrary markup is used to describe data. This versatile, self-describing
data representation has established XML as the universal data format and the de facto
standard for information exchange on the Web. This has gradually given rise to the
need for efficient storage and querying of large XML repositories. To that end, we
propose a new model for building a native XML store which is based on a generalisation
of vertical decomposition. Nodes of a document that satisfy the same label-path
are extracted and stored together in a single container, a Stripe. Stripes make use of
a labelling scheme allowing us to maintain full structural information. Over this new
representation, we introduce various evaluation techniques, which allow us to handle
a large fragment of XPath 2.0. We also focus on the optimisation opportunities that
arise from our decomposition model during any query evaluation phase. During query
validation, we present an input minimisation process that exploits the proposed model
to identify only those Stripes that are relevant to the given query. We
also define query equivalence rules for query rewriting over our proposed model. Finally,
during query optimisation, we deal with whether and under which circumstances
certain evaluation algorithms can be replaced by others having lower I/O and/or CPU
cost. We propose three storage schemes under our general decomposition technique.
The schemes differ in the compression method imposed on the structural part of the
XML document. The first storage scheme imposes no compression. The second storage
scheme exploits structural regularities of the document to minimise storage and, thus,
I/O cost during query evaluation. Finally, the third storage scheme performs structure-agnostic
compression of the document structure, which results in minimised storage
regardless of the actual XML structure. We experiment on XML repositories of varying
size, recursion and structural regularity. We consider query input size, execution plan
size and query response time as metrics for our experimental results. We process query
workloads by applying each of the proposed optimisations in isolation and then all of
their combinations. In addition, we apply the same execution pipeline for all proposed
storage schemes. As a reference point for our proposed query evaluation pipeline, we use
the current state-of-the-art system for XML query processing. Our results demonstrate
that:
• Our proposed data model provides the infrastructure for efficiently selecting the parts of the document that are relevant to a given query.
• The application of query rewriting, combined with input minimisation, reduces
query input size as well as the number of physical operators used. In addition,
when evaluation algorithms are specialised to the decomposition method, query
response time is further reduced.
• Query evaluation performance is largely affected by the storage schemes, which
are closely related to the structural properties of the data. The achieved compression
ratio greatly affects storage size and, therefore, query response times.
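
To make the decomposition idea concrete, the following is a minimal sketch, not the storage engine described above. It groups the element nodes of a small document by label-path into one container per path, and then selects only the containers whose path matches a simple query. The function names `build_stripes` and `relevant_stripes`, the Dewey-style tuple labels, and the restriction to child-axis steps with a `*` wildcard are all assumptions made for illustration; the actual labelling scheme and supported XPath fragment are not specified here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_stripes(xml_text):
    """Group element nodes by label-path (hypothetical helper).

    Each entry is (label, text). The label is a Dewey-style tuple, an
    assumed stand-in for the labelling scheme: dropping its last
    component yields the parent's label, so structure is recoverable.
    """
    root = ET.fromstring(xml_text)
    stripes = defaultdict(list)

    def visit(node, parent_path, label):
        path = parent_path + "/" + node.tag
        stripes[path].append((label, (node.text or "").strip()))
        # Children are numbered from 1 in document order.
        for position, child in enumerate(node, start=1):
            visit(child, path, label + (position,))

    visit(root, "", (1,))
    return dict(stripes)


def relevant_stripes(stripes, query_path):
    """Keep only stripes whose label-path matches the query (hypothetical).

    Supports child-axis steps and the '*' wildcard only; this is an
    illustrative simplification of input minimisation.
    """
    steps = query_path.strip("/").split("/")
    selected = {}
    for path, nodes in stripes.items():
        labels = path.strip("/").split("/")
        if len(labels) == len(steps) and all(
            step in ("*", name) for step, name in zip(steps, labels)
        ):
            selected[path] = nodes
    return selected


doc = (
    "<lib>"
    "<book><title>A</title></book>"
    "<book><title>B</title><year>2001</year></book>"
    "</lib>"
)
stripes = build_stripes(doc)
titles = relevant_stripes(stripes, "/lib/book/title")
```

In this sketch a query over `/lib/book/title` touches a single container and never reads the `year` nodes, which is the intuition behind reducing query input size at the granularity of Stripes.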