
‘Triage [selection] and handover’ session at JISC MRD and DCC IE workshop, Nottingham

The ‘Triage and Handover’ session (session 3B) of the JISC Managing Research Data programme progress and DCC institutional engagements workshop (24 – 25 October 2012) differed in structure from the other sessions: it was less about project experiences and more about sharing the expertise of people working specifically in this area, and generating discussion among the attending projects in response.

For the sessions, we note-takers were tasked with establishing: a) what is working? b) challenges and lessons learned, and c) what the MRD programme or the DCC can do to help.  Whilst the structure of this session didn’t lend itself as well to this task as some other sessions did, I hope this summary will supply the salient points.

Angus Whyte (DCC) began this session by acknowledging the difficulties of the area.  Because there is no way of knowing which digital objects will be useful in the future, there is no one foolproof way to decide which data should be retained for handover at project end to institutional data management services, and which can be disposed of.

‘Triage’ here is used in the business sense rather than the medical sense: it is meant to imply the existence of a process of decision-making which can determine resource allocation.  ‘Selection’ suggests an either/or decision, which is useful to consider, but Angus makes the point that for institutions the greater need is to define a range of decisions. One of these will be disposal. Others might range from showcasing high-value data online to keeping low-value data on tape back-up.

As a co-author of the DCC ‘How to’ guide on appraisal and selection of data for curation, Angus has spent some time considering various models that are used by data centres and archives to guide their decision-making.  He described the basic records management approach to this:

1. Define a policy, i.e. criteria and range of decisions

2. Archive management applies criteria: select the significant, dispose of the rest

However, he argues, there are a few complications for this model when it comes to dealing with research data, i.e.:

  • Research processes may be more complex (need more explanation) than administrative processes
  • Data purpose may change
  • Needs more effort to make re-usable
  • Complex relationships and rich contexts
  • Originators should be engaged but may not have capacity to be
  • Others may need to be involved too
  • More than keep / dispose choice – need to prioritise attention and effort to make data fit for re-use.

So, for research data:

  • First, characterise.  What is this data?  What are the relationships within it and what are the significant aspects of the context in which it was created?
  • Appraisal criteria should establish: who has the duty of care? How accessible is the data?  What is its re-use value, and what costs are involved?
  • Categorise the responses to these criteria or questions, i.e. combinations of high or low ratings.  These are your triage levels: levels of effort and cost attached to making data accessible and discoverable, balanced against the likely range of re-use cases and benefits.
  • An important factor will be whether there are other natural homes for the data and, if so, whether there are benefits from retaining a copy with the institution.
  • A tiered approach to data value could in theory map to a tiered approach to resource costs, e.g. for discoverability, access management, storage performance, preservation actions.
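The appraisal-and-categorisation process described above could be sketched, purely as a hypothetical illustration, as a small scoring routine: rate a dataset against each criterion, then map the combination of high/low ratings to a tier of curation effort.  All criterion names, tier names and thresholds below are invented for the sake of the example and are not part of the DCC guide:

```python
# Hypothetical sketch of a triage routine (all names and thresholds
# are invented): rate a dataset 'high' or 'low' against appraisal
# criteria, then map the combination of ratings to a triage level.

VALUE_CRITERIA = ["duty_of_care", "accessibility", "reuse_value"]

def triage(ratings):
    """ratings maps each criterion (plus 'cost') to 'high' or 'low'."""
    value = sum(1 for c in VALUE_CRITERIA if ratings.get(c) == "high")
    costly = ratings.get("cost") == "high"
    if value == 3:
        return "showcase"        # high-value data: publish and promote online
    if value >= 1 and not costly:
        return "managed store"   # retain with standard access management
    if value >= 1:
        return "tape backup"     # some value, but cheap retention only
    return "dispose"             # disposal is one of the range of decisions

print(triage({"duty_of_care": "high", "accessibility": "high",
              "reuse_value": "high", "cost": "low"}))
```

The point of even a toy version like this is that the output is a range of decisions (showcase, managed store, tape backup, dispose) rather than a binary keep/discard, echoing Angus’s distinction between ‘selection’ and ‘triage’.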

Clearly, some effort is required here.  This may make senior management, as well as the researchers themselves, ask, ‘why not just keep it all?’  Well, in the arguments for selection, costs are a significant issue.  The digital storage required has grown exponentially in the last few years: much of this is research data, but of course other types of digital material can also be useful in the research process.

David Rosenthal, in his frequently-mentioned blogpost of 14 May 2012, estimated how much it would cost to ‘keep everything forever in the cloud’.  He speculated that, based on current cost trajectories, keeping 2018’s data in S3 (Amazon’s cloud storage service) will ‘consume more than the entire GWP [Gross World Product] for the year’.  Whilst the DC/DP/RDM community may argue over the specifics of Rosenthal’s position, his argument does help to demonstrate that storage costs – never mind those for curation – are real even though they have long been invisible to researchers, and that clarity here can help us to price curation (including storage) realistically and responsibly.
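The shape of Rosenthal’s argument is simple compound arithmetic: if the volume of data stored grows faster each year than the cost per byte falls, the total annual bill compounds upwards rather than staying flat.  The figures below are invented for illustration and are not Rosenthal’s:

```python
# Illustrative arithmetic only (growth and cost figures are invented,
# not Rosenthal's): if stored volume grows faster than cost-per-byte
# falls, the annual storage bill compounds instead of staying flat.

data_growth = 1.60   # assume stored volume grows 60% per year
cost_decline = 0.75  # assume cost per byte falls to 75% of last year's

bill = 100.0  # arbitrary starting annual bill, in any currency unit
for year in range(1, 6):
    bill *= data_growth * cost_decline  # net factor of 1.2 per year
    print(f"year {year}: relative bill {bill:.0f}")
```

Under these made-up assumptions the bill grows 20% a year; over decades that compounding, not any single year’s price, is what makes ‘keep it all forever’ untenable.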

Selection presumes description.  You can’t value what you don’t know about.  Angus argued researchers can’t afford not to spend effort on minimal metadata description and organisation, because costs of retention will be much higher if they don’t.  Description makes data affordable – is citation potential a concrete enough reward?

To summarise, we must identify what datasets are created and where they are, and differentiate priorities.

Marie-Therese Gramstadt then outlined the activity of the JISC MRD KAPTUR project relating to selection and retention.  KAPTUR is aware of previous JISC MRD work in training.  One of the main questions addressed by KAPTUR is how to select and appraise research data.  In their approach, they have referred to the DCC paper on this topic, and held an event earlier this year to further explore the issues.  The event discussed the following aspects of research data in the creative arts and how to select it for management:

  • Value and context, including scientific and historical value;
  • Value creation;
  • Ethics and legal issues;
  • Enabling use and reuse;
  • Enabling long-term access.

(More information on this KAPTUR event, including the presentations, is available at http://kapturmrd01.eventbrite.co.uk/.)

Veerle Van den Eynden of the UKDA then presented a data centre view of the issue, as opposed to an institution-level view.  She described the current process that applies to deposit in the ESRC-funded UK Data Service, including the data review form, the work of the acquisitions committee which evaluates applications for deposit, and the acceptance criteria they apply.

The acquisitions committee will give one of three decisions about a dataset offered:

  • accept data into main ESDS collection for curation and longer-term preservation;
    • processing determined: either A, B or C
  • accept data into self-archive system, the ESRC data store, for short-term management and access; or,
  • unable to accept data.

This is a useful reminder that selection for management (including preservation) need not be a binary yes/no decision but can consist of a range of possible management solutions.

Acceptance criteria include:

  • Within scope
  • Long-term value and re-use potential
  • Data requested (by ESDS advisory committee, users)
  • Data from ESRC-funded research
  • Viable for preservation (acceptable file format, well documented)

Common reasons for non-acceptance:

  • Value of data in publications
  • Legal obstacles (copyright, IPR)
  • Ethical constraints (consent, anonymisation)
  • Depositor requests unnecessarily stringent access conditions

About 5-10% of data offered currently falls into these categories of non-acceptance.

There are currently some draft categories for the data collections accepted by UKDS.

  • Data collections selected for long term curation
  • Data collections selected for ‘short term’ management
  • Data collections selected for ‘delivery’ only
  • Data collections selected for ‘discovery’ only.

The Data Service has a Collections Development Policy currently in draft.  This addresses factors such as

  • Relevance
  • Scientific or historical value
  • Uniqueness
  • Usability
  • Replication data and resources (materials required for replicating research)

Even if other projects and services don’t have the same levels of experience and capacity as the UK Data Service, these aspects of Data Service policy and structure provide an example of a functional approach to ‘triage’ and selection of research data.

Veerle also mentioned the repository engagement project, to support institutional data management / repository managers in their local role as ESRC data curators.  Through this, they aim to provide guidance and training for IR staff in appraising social science research data, along with other good practice.  This is helpful in the current environment, where funders increasingly expect institutions to take more responsibility for archiving data.  You can see Veerle’s presentation here.

Marie-Therese then briefly showed material from Sam Peplar of NERC who was unable to attend at short notice.  This described the development of the NERC data value checklist which aims to make selection better, more consistent and more objective.  It emerged from consultancy in the research sector and has been modified in response to user feedback.

NERC funding requires an outline DMP at proposal stage with a detailed DMP when funding is agreed.  The data value checklist is intended to be useful when preparing this full DMP but, Sam’s material cautioned, the checklist should not be expected to give some authoritative or definitive response to whether the data should be retained.  Rather, it supplies questions on which to reflect around aspects of the data such as storage, access, formats, origin, conditions, etc.  Sam is clear that there are not neat solutions for selecting data; objective rules are not possible.  He is also clear that scientists are not generally prepared to do the selection alone – this is an area of RDM which requires support.

The group feedback included various pertinent questions, and concluded that whilst there is no single methodology for discerning the future value of data, it is important for institutions to understand where they fit into the current landscape in terms of their responsibility to assist researchers in the responsible selection and deposit of data.  Veerle confirmed that funders expect data to go to the IR where available, and a data centre if not.  In either case, it is massively helpful if acceptance criteria are public: this can help researchers and research support staff to discern the most appropriate data for selection.

What are your main challenges in selecting and disposing of research data?  What could the JISC MRD programme or the DCC do to help?  Tell us in the comments.

‘Institutional Policies, Strategies, Roadmaps’ session at JISC MRD and DCC IE workshop, Nottingham

The ‘Components of Institutional Research Data Services’ event on 24 October 2012 brought together the ongoing JISC MRD infrastructure projects as well as the institutions with which the Digital Curation Centre is running an ‘institutional engagement’.

The ‘Institutional policies, strategies, roadmaps’ session (session 1A) reflected this nicely, with two speakers from MRD projects ‘Admire’ and ‘Research360’, and two from DCC IEs, St Andrews and Edinburgh.

What is working?

Tom Parsons from Nottingham’s Admire project described further connections across this set of institutions, acknowledging the 2011 aspirational Edinburgh data policy (more on this later) as the inspiration for Nottingham’s own.  He underlined the importance of being aware not only of the requirements of your institution’s major funders but also of the institutional policies which already exist: these need to be found, understood, and worked with to give a coherent message to researchers and support staff about RDM.  This can be done, as he noted, by reflecting these existing messages in your data policy, but also by strengthening the data management aspects of the existing policies themselves, and so making the most of any credibility they already have with university staff.

At Bath, RCUK funders are also important influences on progress.  Cathy Pink from Research360 has established that the biggest funder of research at her institution is the EPSRC, so Research360’s roadmap, published earlier this year, responds particularly to the EPSRC’s expectations.  Bath has looked to the Monash University work to guide its policy formation, particularly to inform strategic planning for RDM and to make a clear connection between the university’s work to advance RDM and its existing strategic aims: an intelligent way to garner senior management buy-in.

Cathy noted that the DAF and Cardio tools from DCC were both useful in ascertaining the existing situation at Bath: these measures are important to take both in order to identify priorities for action, and also in order to be able to demonstrate the improvements (dare I say impact?) brought about by your work in policy formulation and / or training and guidance provision.

To be taken seriously at the institution and to promote awareness and buy-in, Cathy urged institutions to incorporate feedback from a wide range of relevant parties at the university: research support office, the library, IT support and the training support office where available.  This promotes a coherent approach from all these stakeholders as well as a mutually well-informed position on what each of these areas can contribute to successful RDM.

Birgit Plietzch from St Andrews also found DAF and Cardio useful for ascertaining the current data management situation at her institution, but felt the two processes could usefully be merged.  Birgit’s team again started by finding out who was funding research at the university (400+ funders!) and then built up their understanding of these funders’ RDM requirements to create a solid base for policy work.  Again, the Monash University work in this area was useful at her institution, and when the EPSRC roadmap work was completed it, as at Bath, helped to demonstrate the relevance of RDM to diverse areas of institutional activity.

Edinburgh’s Stewart Lewis, too, described the value of creating relationships not only with senior management champions for RDM but also between the university mission statement or strategic aims, and RDM policy.  Stewart acknowledged that the aspirational policy published by Edinburgh in 2011 is a useful way to both instigate and lead on improved RDM at the university, but that action is also crucial.  The aspirational mode of policy gives a stable, high-level statement which is then enacted through supporting, and more volatile, documents.  So whilst action is devolved from the top-level document, it is still intrinsically important if culture change is to happen.  To this end, they have created various levels of implementation groupings to carry through specific actions.  Infrastructure specified by their policy work includes a minimum storage amount and training provision.

In accordance with the Grindley Theory of Four Things (see the – fittingly – 4th bullet point of https://mrdevidence.jiscinvolve.org/wp/2012/11/05/research-data-management-programme-training-strand-kick-off-workshop-london-26-october/), Edinburgh is concentrating on four high level  areas: planning, infrastructure, stewardship and, lastly, support across these three.   These areas were chosen in order to meaningfully move forward the RDM work at Edinburgh whilst still making sense to the researcher population.

Challenges and lessons learned

Tom shared some findings gathered by Admire from their survey of the institution’s researcher population which shows around 230 projects are currently funded and so storage requirements are substantial.  Most of these projects are funded by RCUK funders, and so the expectations for a well-organised approach to RDM are also pretty substantial.  When c. 92% of researchers surveyed at the institution report having had no RDM training, we can understand the need for (and scale of) Admire’s work!

Cathy echoed Tom’s point: don’t attempt to simply lift one institution’s work and hope to apply it to yours.  The tailoring required is significant if a set of policies is going to work in your own context.

The first attempt at the RDM policy for Bath was rejected by the senior management group.  Inspirationally, Cathy recognised this as a great opportunity to refine the work and improve the policy using the feedback received.  It also helped clarify the team’s ambitions for the policy and resolved them to do better than ‘just good enough’, tempered, of course, by the support infrastructure the institution could realistically deliver – a similar situation to Nottingham’s.

Cathy emphasised the point that good quality consultation across the institution is time-consuming but well worthwhile if you aim to build genuinely useful and effective policy or other resources.

Birgit also faced challenges in getting a wider acceptance of some promising RDM policy work.  The institutional environment, including a recent reshuffle of IT provision, had contributed problems to the smooth progress of their IE and senior management, once again, needed compelling evidence to understand the benefits of improved RDM for the institution.

Birgit also found that academics were overextended and found it difficult to make the time to participate in the research that her team needed to undertake to develop policy in this area, but when they realised the relevance they were keen to be involved in the process and to access RDM training.  The notion of the aspirational (as opposed to the highly-specified) mode of RDM policy is popular with researchers at her institution.

Next steps for Stewart and the team at Edinburgh include attaching costs, both in terms of person-time and financial, to the actions specified under their EPSRC roadmap, which will be published soon.  The team will also soon run focus groups using the DCC’s DMP Online tool, run a pilot of Datashare, establish what is needed by researchers in addition to storage, and run training for liaison librarians; these activities, however, need resources: the next challenge to meet.

Discussion picked up the balance between universities offering trustworthy storage appropriate for research data and the motivation of researchers to bid for these resources elsewhere: researchers bidding for this type of funding not only helps the university to concentrate resources in other useful areas but also helps to give a clear message to funders that if they want improved RDM, they have to be prepared to contribute financially towards it.

Costing was a popular topic: Graham Pryor (DCC) noted with interest that no speaker had said they’d attached costs.  Sometimes explicitly identifying costs means the work becomes unacceptable to senior management on financial grounds.  Paul Stainthorpe at Lincoln agreed that you can spend a lot of time on policy, but it won’t be accepted unless there’s a business case.  Other institutions agreed, but added that senior management want some illustrative narrative in addition to the hard figures, to tell them why this really matters.

Birgit added that there is also the problem of unfunded research, particularly in the arts.  Her team has been receiving an increasing number of enquiries relating to this area, and it’s an area also being considered by Newcastle’s Iridium project, who have looked at research information management systems and discovered they only track funded work, leaving unfunded research as ‘a grey area’, even though it may be generating high impact publications.  At UAL, a partner in the KAPTUR project, lots of researchers do a lot of work outside the institution and not funded by it and so for the purposes of the project, they’re being explicit about managing funded work.

UAL has recently launched their RDM policy as a result of their KAPTUR work and stakeholders are happy with it in principle, but the challenge now is how to implement it: John Murtagh noted that engagement and understanding mean work must continue beyond the policy launch.  I mentioned the importance of training here as an element which has to be developed at the institution alongside policy and technical infrastructure.  This was agreed by Wendy White of Southampton: policy needs to be an ongoing dialogue and the challenge is to integrate these elements.

What could the MRD programme or the DCC do to help?

  • DCC: advise on whether funders are going to move the goalposts, and how realistic the risks are of this happening;
  • DCC: advise on what public funding can be used to support RDM policy work;
  • help with costing work;
  • DCC: mediation between universities and the research councils, clarifying requirements and sharing universities’ experiences, etc.;
  • DCC: providing briefings on current issues, e.g. PVCs valued briefings re. open access.

Research Data Management programme Training Strand kick-off workshop, London, 26 October

This one-day event provided an overview of the JISC MRD programme training strand, its aims and context; a description of the DaMSSI-ABC support initiative for the training strand and the various pieces of work it hopes to complete, particularly in terms of making outputs easier to find and use; and recognition that the activities of the four small training materials projects of the JISC Digital Preservation programme correspond with those of the RDMTrain02 projects.

The four RDMTrain02 projects each talked about their approach, activities, challenges and progress, giving us an idea of the subject areas or staff groups they are specifically addressing with the RDM training materials they develop:

  • RDMrose, Sheffield (Andrew Cox): ‘information professionals’ (which I understand to be, in this context, academic librarians)
  • Research Data Management Training for the whole project lifecycle in Physics & Astronomy research (RDMTPA), Hertfordshire (Joanna Goodger): PG students and ECRs in the physical sciences
  • Sound Data Management Training (SoDaMaT), Queen Mary University of London (Steve Welburn): postgraduate research students, researchers and academics working in the area of digital music and audio research
  • TraD: Training for Data Management at UEL (Gurdish Sandhu and Stephen Grace): PGR students in psychology and in computer science.

The afternoon session consisted of an introduction to a set of description and evaluation criteria which have been developed by the Research Information Network through its Research Information and Digital Literacies coalition.  These criteria are in an advanced draft form and participants were asked to read and feedback on them.  They are intended to help with 1. specifying what the training resource or event is meant to do and who it is for, and 2. assessing the success of the training against those specifications.  As such, it’s potentially a very useful tool to suggest to and remind those developing training of useful measures they can take and factors that should be considered in order to create a genuinely useful training resource, whilst also providing a framework for review and impact.

Some participants were perhaps not entirely clear on the potential benefits of the criteria, and profited from a chance to discuss the document with members of the DaMSSI-ABC team.  Those who had a clear grasp of the aim and structure of the document – usually by replacing ‘information literacy’ with ‘research data management’ for ease of use in their particular context – agreed it looked very useful and provided a structure that may clarify what they’re trying to do.

Detailed feedback and questions on the criteria were sought, and will still be received gratefully by Stéphane Goldstein at stephane.goldstein AT researchinfonet.org.

Discussion was a good opportunity for projects to ask questions and share experiences.   Points included:

  • Culture change in an institution can’t be expected to happen during a short project lifespan.  But projects can be a catalyst to inspire change and start the process.
  • Important to remember that changing culture in one area or institution can influence other players, e.g. researcher practice and requirements can influence the behaviour of publishers if messages are clear enough.
  •  Support – including admin – staff are an important population in institutions: in universities, they are over 50% of staff.  They also have to manage data and information.  Datasafe (Bristol) has been considering their needs as well as those of researchers.
  • Simplification of models can sometimes help engagement.  As JISC’s Neil Grindley pointed out, many initiatives have simplified models such as the DCC lifecycle model into four main areas; e.g. the four digital preservation projects have collaborated on a leaflet which reduces DC activities to: start early, explain, store, share.  This will henceforth be known as the Grindley Theory of Four Things.
  • Short (5 – 10 min) resources lend themselves to easier re-use and can more easily be slipped into institutional training that isn’t about RDM.  This means we can raise awareness more widely than just preaching to the converted.  For example, it would make sense to include RDM in induction training, or in training for researchers on bidding for funding.
  • Terminology is still an issue: ‘digital preservation’ and even ‘data’ are problematic in some training contexts.
  • People in institutions are already doing training in disparate ways in areas connected to RDM.  It’s important to find out if this is happening in your institution, if they are aware of your project and if you’re giving consistent messages across the institution.
  • Even simple measures can be valuable when you’re trying to quantify the benefits of improved RDM.  Sometimes a quantity is useful, sometimes a story.
  • Need for generic as well as discipline-specific training and resources.
  • Need to work across campus and involve all relevant areas such as research office, library, IT services (both local and central computing services), staff development services, legal office.
  • Librarian role is valuable for various reasons, but an important one is the ability to use links across campus.
  • Whilst researchers often appear to have higher loyalty to their discipline than to their institution, and researchers are a mobile population, a discipline by its nature doesn’t often have agreed rules, representatives, funded infrastructure or membership.  So knowledge can be passed through informal networks, but there is little in the way of actually engaging with ‘a discipline’ as a whole.  It’s still institutions who provide the infrastructure, policy framework and training.  DaMSSI-ABC is keen to work with professional bodies, where these exist, to try to address this situation.
  • This strand of projects, as well as fellow travellers (e.g. www.le.ac.uk/researchdata), is happy to build on prior work – e.g. JISC Incremental (www.glasgow.ac.uk/datamanagement), UKDA, Sudamih – in the ‘four things’ approach to building online guidance.
  • Is there a role for organisations such as UKCGE, HEA?

Links:

JISC MRD training strand (RDMTrain02): http://www.jisc.ac.uk/whatwedo/programmes/di_researchmanagement/managingresearchdata/research-data-management-training.aspx

DaMSSI-ABC: http://www.researchinfonet.org/infolit/damssi-abc/

RIN Research Information and Digital Literacies Coalition: http://www.researchinfonet.org/infolit/ridls/

RIN Criteria for Describing and Assessing Training: http://www.researchinfonet.org/infolit/ridls/strand2/