
Oxford digital infrastructure to support research workshop

The University of Oxford have impressively attempted to marshal the diverse projects ranging across disparate areas of expertise in research data management at the university. I attended a DaMaRo workshop today to review the digital infrastructure required to meet the challenges of the multidisciplinary and institutional research landscape as it pertains to Oxford.

First and foremost, this is no mean feat in a university as diverse and dispersed as Oxford, and Paul Jeffreys and colleagues are to be congratulated for the work to date. It’s hard enough attempting to join things up in a smaller, albeit research-intensive, university such as Leicester, where the road is long and at times tortuous, never mind potentially at odds with established university structures and careers…

I particularly liked the iterative approach taken during the workshop: present key challenges to the various stakeholders present; provide an opportunity to reflect; then vote with your feet (ok, post-it notes in traffic light colours) on which areas should be prioritised. At the very least this is useful, even if we may argue over which stakeholders are present or not. In this case the range was quite good, but inevitably you don’t get so many active researchers (at least in terms of publishing research papers) at this kind of meeting.

In assessing the potential research services, it was pointed out where a charging model would be required if a service was not funded by the institution or externally. It turns out that here at Oxford the most popular choice was the proposed DataFinder service (hence no weblink yet!), which would act as a registry of data resources in the university and could be linked to wider external search. I remember during the UK Research Data Service pathfinder project that there was a clearly identified need for a service of this kind. Jean Sykes of LSE, who helped steer the UKRDS through choppy waters, was present and told me she is about to retire in a couple of months. Well done Jean, and I note that UKRDS launched many an interesting and varied flower now blossoming in the bright lights of ‘data as a public good’ – an itch was more than scratched.

I also note in passing that it was one of the clear achievements of the e-science International Virtual Observatory Alliance movement, developed for astronomical research between 2000 and 2010, that it became possible to search datasets, tools and resources in general via community-agreed metadata standards. It takes medium to long term investment but it can be done. Don’t try it at home and don’t try to measure it by short term research impact measures alone… even the Hubble Space Telescope required a decade plus before it was possible to clearly demonstrate that the number of journal papers resulting from secondary reuse of data overtook those from the originally proposed work. Watch it climb ever upwards after that though…

Back to the workshop: we identified key challenges around helpdesk-type functionality to support research data services, and around who to charge, how and when, in the absence of institutional funding. I should highlight some of the initiatives gaining traction here at Oxford, but it was also pointed out that in-house services must always be designed to work with appropriate external services. Whether in-house or external, such tools must be interoperable with research information management systems where possible.

Neil Jefferies described the DataBank service for archiving, available from Spring 2013, which provides an open ended commitment to preservation. The archiving is immutable (can’t be altered once deposited) but versioned, so that it is possible to step back to an earlier version. Meanwhile Sally Rumsey described a proposed Databank Archiving & Manuscript Submission Combined (DAMASC) model for linking data & publications. Interestingly there is a serious attempt to work with a university spin-off company providing the web 2.0 Colwiz collaboration platform, which should link to appropriate Oxford services where applicable. It was noted that to be attractive to researchers a friendly user interface is always welcome. The launch date is September 2012 and, by the way, the service will be free to anyone, in or out of Oxford.
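For readers who like to see an idea in code, the "immutable but versioned" behaviour can be pictured with a small Python sketch. This is only an illustration of the general principle and not how DataBank itself is built; the class name `VersionedDeposit` and its methods are hypothetical names made up for this example.

```python
from types import MappingProxyType
from typing import Optional


class VersionedDeposit:
    """Hypothetical append-only store: a deposit is never altered, only superseded."""

    def __init__(self):
        self._versions = []

    def deposit(self, files: dict) -> int:
        # Keep a read-only snapshot so later edits to `files` cannot leak in.
        self._versions.append(MappingProxyType(dict(files)))
        return len(self._versions)  # version numbers start at 1

    def get(self, version: Optional[int] = None):
        # Default to the latest version; earlier ones stay retrievable.
        if not self._versions:
            raise LookupError("nothing deposited yet")
        index = len(self._versions) if version is None else version
        if not 1 <= index <= len(self._versions):
            raise LookupError(f"no such version: {version}")
        return self._versions[index - 1]


archive = VersionedDeposit()
v1 = archive.deposit({"results.csv": b"a,b\n1,2\n"})
v2 = archive.deposit({"results.csv": b"a,b\n1,3\n"})  # a correction becomes version 2
earlier = archive.get(v1)  # stepping back to the first deposit still works
```

A correction never overwrites anything here: it simply becomes the next version, and trying to assign into a stored snapshot raises a `TypeError`.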

Meanwhile, for research work in progress the DataStage project offers secure storage at the research group level while allowing the addition of simple metadata as the data is stored, making that step up to reusability all the easier down the line. It’s about building good research data management practice into normal research workflows and, of course, making data reusable.
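To make "simple metadata as the data is stored" a little more concrete, here is a minimal Python sketch of the general idea: every file saved gets a small descriptive record written next to it. It is an assumption-laden illustration and not DataStage's actual mechanism; the function name `store_with_metadata`, the `.meta.json` sidecar convention and the field names are all invented for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_with_metadata(folder: Path, name: str, data: bytes, metadata: dict) -> Path:
    """Hypothetical helper: write `data` to folder/name plus a JSON sidecar describing it."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / name).write_bytes(data)
    record = dict(metadata)  # copy, so the caller's dict is left untouched
    record["filename"] = name
    record["sha256"] = hashlib.sha256(data).hexdigest()  # lets later users verify the file
    sidecar = folder / (name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Usage with a throwaway directory standing in for a research group's shared storage.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = store_with_metadata(
        Path(tmp) / "group-share",
        "run42.csv",
        b"time,value\n0,1.5\n",
        {"title": "Pilot run 42", "creator": "A. Researcher"},
    )
    stored = json.loads(sidecar.read_text(encoding="utf-8"))
```

The point of the design is that the description is captured at the moment of saving, when the researcher still remembers what the file is, rather than reconstructed years later.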

Andrew Richards described the family of supercomputing services at Oxford. Large volumes of at-risk storage are available for use on-the-fly but are not backed up. You’d soon run into major issues trying to store large amounts of this kind of data longer term. There is also very little emphasis on metadata in the supercomputing context other than where supplied voluntarily by researchers. I raised the issue of sustainability of the software & associated parameters in this context, where a researcher may need to be able to regenerate the data if required.

James Wilson of OUCS described the Oxford Research Database Service (ORDS), which will launch around November 2012 and again be run on a cost recovery basis. The service is targeted at hosting the smaller databases used by the vast majority of researchers who don’t have in-house support or appropriate disciplinary services available to them. It has been designed to be hosted in a cloud environment over the JANET network in the same way as biomedical research database specific applications will be provided by Leicester’s BRISSkit project.

Last but not least, Sian Dodd showed the Oxford Research Data Management website which includes contact points for a range of research data lifecycle queries. It is so important to the often isolated researcher that there is a single place to go and find out more information and point to the tools needed for the job at hand.  Institutions in turn need to be able to link data management planning tools to in-house resources & costing information. To that end, the joint Oxford and Cambridge X5 project (named after the bus between the two) will go live in February 2013 and provide a tool to enable research costing, pricing & approval.

DARTS3, The Third Discover Academic Research Training & Support Conference. Dartington Hall, Devon: 28 – 29 June 2012

Whilst storms swept much of the rest of the country, the sleepy peace of bucolic Devonshire was barely disturbed by the arrival of several dozen librarians (plus a couple of ‘fellow travellers’) to dreamy Dartington.

Anna Dickinson from HEFCE’s REF team (which has only five people!) kicked off the first day with a very informative overview of the 2014 REF expectations, process, staff selection, timescales, the test submission system, the assessment of the research environment and how the panels work, with particular advice on areas where research support staff may be involved.

Judith Stewart of UWE and Gareth Cole of Exeter, in separate presentations, both described the work and findings of their current JISC MRD-funded research data management projects (UWE’s project, ‘Managing Research Data’ is at http://www1.uwe.ac.uk/library/usingthelibrary/servicesforresearchers/datamanagement/managingresearchdata.aspx; the Open Exeter project is at http://blogs.exeter.ac.uk/openexeterrdm/).

Each also positioned library staff members as key to improved research data management across the university, as part of partnership working with other relevant research support professionals.  Both presenters also reminded us that library staff members are well-placed to instigate research data management activity if this is not already an activity within an institution: whilst the research data management challenge may require new skills, librarians are already skilled in information management, bibliometrics, and other relevant areas of expertise, and are experienced in working across the institution, free from inter-faculty or inter-discipline politics.  These skills equip them well to work towards supporting researchers with better management of research data.

Miggie Pickton of Northampton pushed this relationship between library staff and research activity further, arguing there are strong benefits for library staff to wade into research activity for themselves.  Drawing a division between ‘academic’ and ‘practitioner’ research, Miggie encouraged library staff to consider either but particularly argued the case for the value of ‘practitioner’ research, which she defined as taking a pragmatic approach to a current problem or need, as opposed to curiosity-driven work intended to make REF impact.

Through a very interactive session, Miggie encouraged the audience to identify the benefits of library staff undertaking research for the individual librarian, the institution, and the library profession as a whole, and provided some examples of suitable topics for investigation.  Inspiring!

Jennifer Coombs (N’ham) and Elizabeth Martin (De Montfort) described their experiences of creating, alongside colleagues from Loughborough and Coventry, a collaborative online tutorial to teach researchers about research promotion (www.emrsg.org.uk).

Jez Cope of the Research360 project at Bath (http://blogs.bath.ac.uk/research360/) shared the benefits for researchers of several social media applications.  Despite the earlier assertions of doubt about Twitter by the event chair, Jez managed to get a few more delegates onto the service and interacting with other delegates as well as more remote followers of the event hashtag.

As always, it was apparent that institutions vary widely in their cultures, sizes and experience with RDM, but we learned a great deal about what librarians are already doing to support researchers, some new tools and techniques that might be useful for their work in this area, and some powerful arguments for expansion into the research data management and research practice areas.

Delegates to this event may find it interesting to explore the research data management training materials made by five projects of the first MRD programme, available at http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmtrain.aspx (follow the link for each project at the bottom of the page).  These materials are freely available for use and reuse, and will be supplemented by a further four projects in the second MRD programme, starting this summer, some of which will be delivering training materials specifically for research support professionals including library staff.

Here’s hoping there will be a DARTS4!

 

Discuss, Debate, Disseminate – PhD and Early Career Researcher data management workshop, University of Exeter, 22 June 2012

Jill and Hannah of the Open Exeter project have not been holding back with their user requirements research – not content with attracting hundreds of responses to their survey of Exeter postgraduates, they’re also augmenting this with their own research as well as running events like Friday’s, in an admirably thorough approach to gathering information on what postgraduate students and early career researchers at their institution need, how they work and where the gaps are in the current infrastructure provision.

Twenty enthusiastic participants turned up on 22 June, happily from across the sciences and humanities, and contributed with gusto to group discussion, intensive one-to-one conversations and a panel session.  The project has recruited six PhD students – Stuart from Engineering; Philip from Law; Ruth from Film Studies; Lee from Sport Sciences and Duncan from Archaeology, plus one more currently studying abroad – to help bridge the gap between project staff and their PhD peers.  These six are working intensively with the project team to sort out common PhD-level data management issues and activities in the context of their own work, which allows them to not only improve their own practice but also to share their experiences and tips with other PhD students and ECRs in their own disciplines at Exeter.  (You can see more about this at http://blogs.exeter.ac.uk/openexeterrdm/)

One of the most interesting aspects of working on this programme, for me, is understanding the nuts and bolts of research data management in a specific disciplinary context, in a particular institution.  In other words, the same context in which each researcher is working.  Although funders are increasingly calling the shots with requirements and expectations for research data management, the individual researcher still has to find a way to put these requirements into practice with the infrastructure they have to hand.  That means it’s all very well for the EPSRC or AHRC or whoever to require you to do something, and you may even understand why and want to do it, but who do you ask in IT to help?  Why isn’t it OK to just put data on Dropbox?  What to do with data after you finish your PhD or project?  And what is metadata anyway?

Despite the generally-held view by researchers that their RDM requirements are unique to their discipline, these questions – and others like them – are actually fairly consistent across institutions when researchers are sharing concerns in an open and relaxed environment.  And this was one of the achievements of today’s event: by keeping things friendly, low-key and informal, the team got some very useful information about what PhDs and ECRs are currently doing with RDM, the challenges they’re encountering and what Exeter needs to provide to support well-planned and sustainable RDM.

Some additional detail from the event:

–       Jill offered a working definition of ‘data’ for the purposes of the workshop: “What we mean by data is all inclusive.  It could be code, recordings, images, artworks, artefacts, notebooks – whatever you feel is information that has gone into the creation of your research outputs.”  This definitely seemed to aid discussion and meant we didn’t spend time in semantic debate about the nature of the term.

–       Types of data used by participants:
o       Paper, i.e. printouts of experiment
o       Word documents
o       Excel spreadsheets
o       Interview transcripts
o       Audio files (recordings of interviews)
o       Mapping data
o       PDFs
o       Raw data in CSV form
o       Post-processed data in text files
o       Graphs
o       Tables for literature review
o       Search data for systematic review
o       Interviews and surveys: audio files, word transcripts
o       Photographs
o       Photocopies of documents from the archives
o       NVivo files
o       STATA files

–       Common RDM challenges included: the best way to back up; use of central university storage; number of passwords; complexity of working online (which can make free cloud services more attractive); lack of support with queries or uncertainty about who to contact; selection and disposal; uncertainty over who owns the data.

–       Sources of help identified during the event: subject librarians; departmental IT officers; during the life of the project, Open Exeter staff; and existing online resources such as guides from the Digital Curation Centre (http://www.dcc.ac.uk) and the Incremental project (http://www.gla.ac.uk/datamanagement and http://www.lib.cam.ac.uk/preservation/incremental/).

The future of the past: closing workshop for the Data Management Planning projects

It always provokes mixed feelings to attend a closing event marking the end of a project or raft of projects.  On the one hand, it’s melancholy to say goodbye to people, or to know that there will be no more interesting outputs coming from a particular project.  On the other, there is (hopefully) the sense of achievement that comes with having finished a piece of work.  Having something finished, ready to show, then getting ready for the next activity, preparing for the future.  It was useful and thought-provoking to see the findings and outputs of the ‘strand B’ or data management planning projects of the MRD02 programme at the Meeting Challenges in Research Data Planning workshop in London on 23 March.  This event marked the closing of these projects, and gave them an opportunity to share what they’d been doing.  Data management planning by definition is about considering the future, and there was a sense of energy and enthusiasm from the projects on the day which suggested we could easily have met for longer and talked more.  And yet, some elements of the discussion made me think about the past.

Back in MRD01 (2009-11), there were a few projects such as Oxford’s Sudamih and Glasgow-Cambridge’s Incremental project which performed institution-specific scoping work about what researchers need to improve both their understanding and practice of RDM.  As one of the Incremental team, I felt at the time that, to be honest, a lot of it seemed to be stating the blooming obvious, but we recognised the value of gathering original data on these issues in order (1) to check that our suspicions were correct; and (2) to wave in front of those making decisions about whether and how to fund RDM infrastructure.

You can read the full report of Sudamih here and Incremental here, but the main ideas we found evidence for were things like: researchers are almost always more interested in doing their research than spending time on data management, so engagement relies on guidance being short and situated in one obvious, easy-to-navigate place; there are lots of guidance resources at institutions already but they’re scattered and not well advertised; lots of researchers in the arts and humanities don’t consider their material as ‘data’ and so the terminology of RDM doesn’t engage them or may actively alienate them; researchers may be party to multiple data expectations from their institution and / or their funder, but a lot of them are not aware of that fact, never mind what these are and where to find them in writing.  Also, different disciplines have different data sharing conventions and protocols, which affect researcher behaviour; some researchers can be quite willing to practice good data management, but they need to know who to call or email about it at their own place; guidance written by digital curation specialists is great and fine, but often needs translating into non-specialist language, and there are lots of researchers who are just not going to engage with a policy document.  All that kind of thing.  Readers of this blog will possibly be amazed that such fundamental ideas are not more widely understood out there in the wider research community, but that in itself probably just confirms the knowledge gap between RDM people and the general researcher population.

So back at the event on 23 March, we heard from, amongst others, Richard Plant of the DMSPpsych project explaining the importance of local guidance for the institution’s researchers, and Norman Gray of MaRDI-Gross explaining the influence of the data sharing culture in big science on its researchers (although I never did get around to asking him if the project did indeed reach ‘the broad sunlit uplands of magnificently-managed big-science data’, as promised in the project blog).

History DMP from Hull charmed with an appearance by one of their tame researchers, who came along to give a brief account of his experience with the project.  He was happy not being familiar with RDM terminology or principles or, as he put it,

‘This process has been very straightforward for me.  I don’t understand the technical elements but I don’t need to.’

The benefits of easier remote access to and confidence in the security of his data storage were the pay-off for him, and left everyone feeling optimistic.

Reward at UCL/Ubiquity Press did many interesting things whilst aiming to lower the barriers to good RDM and shared a deluge of findings echoing those of Incremental / Sudamih, including the value of drawing together institutional RDM-related resources to provide a single point of access; the effect of discipline-specific protocols on researcher behaviour (specifically data sharing); the value of clarifying benefits of good RDM to motivate researchers; the lack of current awareness about IPR, licensing and data protection; the reluctance to discard data; the need for training about RDM and particularly long term preservation of data; and many other points.

So what occurred to me on 23 March was that it felt good to hear several of the MRD02 strand B projects reiterating our findings from their own experiences at their own institutions.  It reminded me of Heather Piwowar’s notion of ‘broad shoulders’.  It wasn’t that they were agreeing with us – I’m more than happy for my research to be challenged constructively.  It was that what we’d done in MRD01 seemed to be useful to some extent, allowing the MRD02 projects to extend and refine user requirements in RDM, and share what they found, which benefits us all.

Commonalities & Differences: Requirements & Disciplines

Within our remit to identify themes and trends in the JISCMRD Programme and to enable collaboration and synergies between its projects, exploring commonalities and differences is a key area with a multitude of angles. Diverse endeavours, domains, institutions and scopes on the project side entail a number of approaches, methods, user communities, research practices & cultures, data life cycles, workflows and therefore actual needs, requirements, benefits, data infrastructures and policies. Knowledge transfer in the programme is crucial in order not to re-invent the wheel (at least not every time), to learn from previous experiences, discuss emerging topics, collaborate and hence (mutually) benefit from all those differences and commonalities.

In the weeks and months to come I shall focus on commonalities and differences on this blog under different aspects, starting with the requirements and disciplinary angle (although I am aware that a lot of areas overlap: requirements gathering involves methods as well as research practice and perceived benefits, which again have an impact on costs, et cetera et cetera). The idea would be to start a discourse, get feedback and input from projects and people, gather documentation and discussion topics, facilitate and provide support. A workshop at some later stage might be an activity arising from that, if deemed useful.

My own project-related hat is that of the user liaison & researcher, e.g. gathering requirements, including looking into research practice and benefits of diverse communities at the University of Manchester previously in the MaDAM project (JISCMRD phase 1; see here for outputs) and now in MiSS (JISCMRD02; see resources section). Our requirements approach in both projects is user-driven, iterative and based on close collaboration between RDM specialists, users/researchers, other stakeholders (high-level buy-in is especially important) and the project team/developers. In MaDAM we were focussing on pilot users from the Biomedical domain – in MiSS the RDMI will have to cater for the whole of the University with the challenge of establishing a balance between a generic, easy-to-use eInfrastructure and providing a system open enough for discipline specific needs (plug-in points). We have user champions in each faculty: Life Sciences (Core Facilities and MIB – large and diverse data), Engineering and Physical Science (Henry Mosley Centre, Material Sciences & MIB – large data), Medical and Human Sciences (sensitive data!) and Humanities (CCSR, applied quantitative social research – data service and diversity) and will also open up a user committee to the wider University for input and feedback in a few weeks. We have just completed our baseline requirements phase, so please watch out on this channel for more details and the report!

But back to you, the JISCMRD projects’ fields of interests and needs:

How do you approach your requirements process?

What are particular challenges, e.g. in specific disciplines?

What are particularly enthralling lessons learned (already)?

How to achieve benefits and synergies between projects?

What would be your ideas on how we could facilitate any exchange on such issues? Any ideas are welcome!

Meik Poschen  <meik.poschen@manchester.ac.uk>
Twitter:  @MeikPoschen