This wiki service has now been shut down and archived

Tuesday 16th March

From EscienceEnvoyWiki

Presentations and other material used at the DIR workshop

Data-Intensive Research Workshop: Tuesday 16th March 2010

Topic: The challenge of increasing scales: data volume, data sources, data users

Day organiser: Stratis Viglas


Data and scale are overloaded terms and mean different things to different practitioners depending on their perspectives and goals.

In terms of data, the meaning usually depends on the application domain. To someone working on text mining it might be a corpus of unstructured text; to a geologist, a time series of sensor measurements; to a physicist, a multitude of input/output parameters; to a database researcher, a fully structured relational schema. The possibilities are as endless as the domains. At the end of the day, data is what all our systems process and produce.

Scale also comes in several flavours. Is it the disk capacity needed to store the data? Is it the rate at which data is produced? Is it the number of complex interactions across different types of data? Or is it the number of concurrent parallel processes we can throw at a data-intensive problem? What we want is for our systems to scale along several dimensions, some of which we may not yet have identified, let alone understood.

There is a saying that “a craftsman is only as good as his tools.” To some extent, we are all craftsmen, and we have to pick the right tool for the job. Tools differ across crafts, and no two jobs within the same craft are the same. The goal of the day is to be exposed to different “crafts”, to hear about the jobs they involve and the tools best suited to them. Most likely the saying will be confirmed once more. What will hopefully come out of this exposition, though, is knowledge of a tool that another craftsman has been using for a job similar to one we face in our own craft.

To that end, we will have a series of talks in the morning:

  • Astronomers produce vast quantities of data; Alan Heavens (University of Edinburgh) will talk about the challenges in analysing large sky datasets, and how the statistical techniques developed for doing so are finding applications in other domains.
  • However, we do not need to look to the endless sky for data challenges: similar quantities of data and complex interactions can be found here on Earth. Torild van Eck (Royal Netherlands Meteorological Institute) will talk about the data challenges in earthquake seismology.
  • Keith Haines (University of Reading) will talk about less catastrophic elements of our planet: climate and oceanographic data and how these can be integrated into a single view.
  • Relational databases are arguably one of the de facto mechanisms for data storage and manipulation. Martin Kersten (CWI) is the godfather of one of the most impressive open-source database systems around, MonetDB. He will present the challenges in developing database support for generic data processing.
  • A collection of data usually adheres to at least a rudimentary data model. But is that always true? Jonty Rougier (University of Bristol) will talk about the challenges of uncertainty in data modelling.
  • In addition to long-term storage and long-running processing of data, we might be interested in real-time analysis. Beth Plale (Indiana University) will present the challenges and methods in this area.
  • Finally, Michael Batty (University College London) will talk about the use of complex geo-spatial data to aid long-term planning.

Potential topics for discussion during the breakout sessions include:

  • Database query languages vs. data-flow languages: when do we need one and when do we need the other?
  • Differences between real-time analytics and long-running processing.
  • Horizontal and vertical scaling: are both always necessary, or can we get away with one?

However, none of these topics is set in stone; indeed, they are expected to be substantially reformulated after the morning’s talks.


Time | Session | Speaker | Talk title | Materials
09:00 | Astronomy's data challenges | Alan Heavens, Institute for Astronomy, University of Edinburgh | Dealing with large data sets in astronomy and medical imaging (by throwing almost everything away) | PDF (8MB), Video
09:30 | Seismology's data challenges | Torild van Eck, The Royal Netherlands Meteorological Institute | Data challenges in Earthquake Seismology | PPT (13.9MB), PDF (2.8MB), Video
10:00 | Integrating climate and oceanographic data | Keith Haines, Reading e-Science Centre, University of Reading | Making the most of Earth-system data | PPT (23.5MB), PDF (5.3MB), Video
10:30 | Coffee break | | |
11:00 | Foundations for scientific data | Martin Kersten, CWI | Scientific Databases: the story behind the scene | PPT (5.9MB), PDF (1.5MB), Video
11:30 | Uncertainty in Climate Science data | Jonty Rougier, University of Bristol | Model limitations: sequential data assimilation with uncertain static parameters | PDF (832KB), Video
12:00 | Handling Earth-Systems data in real time applications | Beth Plale, Indiana University | Earth-Systems data in real time applications: low latency, metadata, and preservation | PDF (2MB), Video
12:30 | Planning & geo-spatial data | Michael Batty, University College London | Challenges in Large Scale GeoSpatial Data Analysis: Mapping, 3D and the GeoSpatial Web | PPT (18.4MB), PDF (3.9MB), Video
13:00 | Lunch | | |
14:00 | Breakout groups | | Messy Data; Data Intensive Research: Data Analysis | Messy Data: PPTX (41KB), PDF (89KB); Data Analysis: PDF (540KB)
16:00 | Plenary session: report back from working groups; consolidation of observations, vision and recommendations | | |
17:00 | e-Infrastructure for Data-Intensive Research | Carole Goble, School of Computer Science, University of Manchester | Providing an environment where every data-intensive researcher will thrive | PPT (13.1MB), PDF (3.6MB), Video

Discussion Pages

You can read more and discuss Tuesday's activities at: Tuesday's talks and Break-out sessions

This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.