Monday 15th March
Webcast of plenary talks, presentations and other material used at the DIR workshop
Data-Intensive Research Workshop: Monday 15th March 2010
Topic: Introduction - the interplay between challenges and technology in data-intensive research
Day organiser: Malcolm Atkinson
Introduction
The use of data in research is growing rapidly; the digital revolution generates more and more data, and policies encourage more data to be published. Expectations for openness, repeatability and evidence quality increase the data-use imperative.
Data use includes the data-curation lifecycle: from creation or collection, through cleaning, integration, analysis, annotation, citation and presentation, to preservation or discard. It includes finding data; developing an understanding of a body of data (often drawn from multiple sources); working out how to get evidence from data, or what the newly available collections of data may now enable; deciding how to extract evidence from data, possibly in combination with other data; and presenting results so that evidence is understood, trusted and used.
There are many challenges in the wide range of data uses: for example, coping with complexity, with variable or poor data quality, with high volumes, with sophisticated analytic and presentation requests, with high data rates, with heterogeneity, and with user numbers and diversity. These challenges are addressed through multiple forms of iteration: progressively discovering and understanding data, progressively developing and understanding analytic methods, progressively refining the processes used to obtain particular forms of evidence, progressively improving the software, progressively adapting the computational platforms, and so on. Communities of data users build expertise around data and, as they do, they change requirements, patterns of use and modes of acceptable behaviour.
Data-intensive research is both research in any domain that has to pay serious attention to the ways in which it uses data in order to succeed, and research that improves our ability to use data. These co-evolve.
Monday’s programme explores that co-evolution. Talks will start from an understanding of a class of research challenges that need data-intensive methods and report how those methods are developing and exploiting advances in data-intensive technology. Other talks will introduce an approach to data-intensive methods and show how that approach is developing power and delivering results in multiple disciplines.
During the research village, groups that have been making progress with any aspect of data-intensive research will display their wares. Participants will move around the village for 15-minute presentations. After each 15 minutes there will be a 5-minute interval to choose a new presentation at a different stand. This should provide a good opportunity for meeting fellow researchers, finding potentially useful ideas, methods and software, and starting new working relationships. Use the coffee breaks and the reception to follow up leads.
We also introduce three of the many ways of thinking about data-intensive problems: (1) programming paradigms, including workflows and bespoke high-level languages; (2) database paradigms, including parallel and distributed query as a framework for research methods; and (3) data-analysis paradigms, combining statistical and algorithmic methods to extract knowledge from data. This list is by no means comprehensive, and other ways of thinking about data use will arise in the sessions and breakouts. The tracks are also not independent: a given technology might be suitable for more than one track, e.g. MapReduce is both a programming paradigm and a data-analysis paradigm. We hope they will stimulate thought but not inhibit other lines of thinking.
The day, which is open to all (tell your friends), will form an introduction to data-intensive research, to current work and to key projects and products. It will form a framework for the rest of the week’s discussions.
The day will conclude with a drinks reception to give you a chance to have follow-up conversations and to start new ones.
Timetable
| Time | Session | Speaker | Talk Title |
|---|---|---|---|
| 09:30 | Registration and coffee | | |
| 10:30 | Welcome | Dave Robertson, Head of School of Informatics, University of Edinburgh | |
| 10:40 | Introduction | Malcolm Atkinson, School of Informatics, University of Edinburgh | Setting the agenda for data-intensive research PPT (10.2MB) PDF (3.5MB) Video |
| 10:45 | The challenge of big data | Alex Szalay, Johns Hopkins University | Strategies for exploiting large data PPTX (8.5MB) Video |
| 11:30 | Biology and data | Douglas Kell, BBSRC & Manchester University | Motivation and strategies for data-intensive biology PPT (5MB) Video |
| 12:15 | Complex Data | Thore Graepel, Microsoft Research Cambridge | Learning from Data in Online Advertising and Games PPT (9.5MB) PDF (4MB) Video |
| 13:00 | Lunch and Research Village | David Meredith, Stephen Crouch, Peter Turner, Gerson Galang, Ming Jiang, Hung Nguyen | Towards a loosely coupled and scalable component set for scheduling bulk data copying across different storage resources as fault tolerant batch jobs PPT (1.3MB) PDF (428KB) |
| | | Edinburgh Data-Intensive Research Group Talks | Available on YouTube |
| | | DTS Batch Job Quicktime movie | |
| | | Martin Kersten, Milena Ivanova, Niels Nes, Rómulo Gonçalves and Arjen de Rijke | Scientific Data Management: Why not let the databases in? PDF (246KB) |
| | | Neil Chue Hong | OMII-UK - Delivering Software and Social Platforms for Successful Research PPT (3.7MB) PDF (1.5MB) |
| | | Savvas Petrou | SPRINT: A Simple Parallel R INTerface PPT (2MB) PDF (574KB) |
| 15:45 | Data Analysis Strategies | Chris Williams, School of Informatics, University of Edinburgh | Data Intensive Research: Data analysis PDF (540KB) Video |
| 16:05 | Databases & Scientific Data | Stratis Viglas, School of Informatics, University of Edinburgh | Database paradigms for large-scale data processing PDF (934KB) Video |
| 16:25 | Programming Paradigms | Geoffrey Fox, Indiana University | Introduction of plan, people, questions and ideas for programming paradigms PPTX (1.2MB) Video |
| 16:45 | Experience with SEASR & Meandre | Xavier Llorà and Bernie A'cs, NCSA | Soaring through clouds with Meandre PPTX (8.3MB) Video |
| 17:30 | Reception | | |
Discussion Pages
You can read more and discuss Monday's activities at these places: Monday's talks and the Research Village