This wiki service has now been shut down and archived

Monday 15th March

From EscienceEnvoyWiki

Webcast of plenary talks, presentations and other material used at the DIR workshop


Data-Intensive Research Workshop: Monday 15th March 2010

Topic: Introduction - the interplay between challenges and technology in data-intensive research

Day organiser: Malcolm Atkinson

Introduction

The use of data in research is growing rapidly; the digital revolution generates more and more data, and policies encourage more data to be published. Expectations for openness, repeatability and evidence quality increase the data-use imperative.

Data use spans the data-curation lifecycle: from creation or collection, through cleaning, integration, analysis, annotation, citation and presentation, to preservation or disposal. It also includes finding data; developing an understanding of a body of data (often drawn from multiple sources); working out what newly available collections of data may now enable; deciding how to extract evidence from data, possibly in combination with other data; and presenting results so that the evidence is understood, trusted and used.

There are many challenges across the wide range of data uses: for example, coping with complexity, with variable or poor data quality, with high volumes, with sophisticated analytic and presentation requests, with high data rates, with heterogeneity, and with user numbers and diversity. These challenges are addressed through multiple forms of iteration: progressively discovering and understanding data, developing and understanding analytic methods, refining the processes used to obtain particular forms of evidence, improving the software, and adapting the computational platforms. Communities of data users build expertise around data and, as they do, they change requirements, patterns of use and modes of acceptable behaviour.

Data-intensive research is both research in any domain that has to pay serious attention to the ways in which it uses data in order to succeed, and research that improves our ability to use data. These co-evolve.

Monday’s programme explores that co-evolution. Talks will start from an understanding of a class of research challenges that need data-intensive methods and report how those methods are developing and exploiting advances in data-intensive technology. Other talks will introduce an approach to data-intensive methods and show how that approach is developing power and delivering results in multiple disciplines.

During the Research Village, groups that have been making progress with any aspect of data-intensive research will display their wares. Participants will move around the village to attend 15-minute presentations. After each presentation there will be a 5-minute interval to choose a new presentation at a different stand. This should provide a good opportunity for meeting fellow researchers, for finding potentially useful ideas, methods and software, and for starting new working relationships. Use the coffee breaks and the reception to follow up leads.

We also introduce three of the many ways of thinking about data-intensive problems: (1) programming paradigms, which include workflows and bespoke high-level languages; (2) database paradigms, which include parallel and distributed query as a framework for research methods; and (3) data-analysis paradigms, which combine statistical and algorithmic methods to extract knowledge from data. This list is by no means comprehensive, and other ways of thinking about data use will arise in the sessions and breakouts. The tracks are also not independent: a given technology might be suitable for more than one track. For example, MapReduce is both a programming paradigm and a data-analysis paradigm. We hope the tracks will stimulate thought but not inhibit other lines of thinking.

The day, which is open to all (tell your friends), will form an introduction to data-intensive research, to current work and to key projects and products. It will form a framework for the rest of the week’s discussions.

The day will conclude with a drinks reception to give you a chance to have follow-up conversations and to start new ones.

Timetable

Time | Session | Speaker | Talk Title
09:30 | Registration and coffee | |
10:30 | Welcome | Dave Robertson, Head of School of Informatics, University of Edinburgh |
10:40 | Introduction | Malcolm Atkinson, School of Informatics, University of Edinburgh | Setting the agenda for data-intensive research PPT (10.2MB) PDF (3.5MB) Video
10:45 | The challenge of big data | Alex Szalay, Johns Hopkins University | Strategies for exploiting large data PPTX (8.5MB) Video
11:30 | Biology and data | Douglas Kell, BBSRC & Manchester University | Motivation and strategies for data-intensive biology PPT (5MB) Video
12:15 | Complex Data | Thore Graepel, Microsoft Research Cambridge | Learning from Data in Online Advertising and Games PPT (9.5MB) PDF (4MB) Video
13:00 | Lunch and Research Village | David Meredith, Stephen Crouch, Peter Turner, Gerson Galang, Ming Jiang, Hung Nguyen | Towards a loosely coupled and scalable component set for scheduling bulk data copying across different storage resources as fault tolerant batch jobs PPT (1.3MB) PDF (428KB)
 | | Edinburgh Data-Intensive Research Group | Talks Available on YouTube
 | | | DTS Batch Job Quicktime movie
 | | Martin Kersten, Milena Ivanova, Niels Nes, Rómulo Gonçalves and Arjen de Rijke | Scientific Data Management: Why not let the databases in? PDF (246KB)
 | | Neil Chue Hong | OMII-UK - Delivering Software and Social Platforms for Successful Research PPT (3.7MB) PDF (1.5MB)
 | | Savvas Petrou | SPRINT: A Simple Parallel R INTerface PPT (2MB) PDF (574KB)
15:45 | Data Analysis Strategies | Chris Williams, School of Informatics, University of Edinburgh | Data Intensive Research: Data analysis PDF (540KB) Video
16:05 | Databases & Scientific Data | Stratis Viglas, School of Informatics, University of Edinburgh | Database paradigms for large-scale data processing PDF (934KB) Video
16:25 | Programming Paradigms | Geoffrey Fox, Indiana University | Introduction of plan, people, questions and ideas for programming paradigms PPTX (1.2MB) Video
16:45 | Experience with SEASR & Meandre | Xavier Llorà and Bernie Ács, NCSA | Soaring through clouds with Meandre PPTX (8.3MB) Video
17:30 | Reception | |

Discussion Pages

You can read more and discuss Monday's activities at these places: Monday's talks and the Research Village

This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.