We all know what a nightmare it is trying to recruit participants for research studies. So if you only have to get hold of some routine data that’s just sitting there, well, that’s going to be much simpler, isn’t it? You’d think...
The plan for my PhD was to look at inequalities in cancer care by linking cancer registry and Hospital Episode Statistics data for lung cancer, and also linking to audit data. These are routine data that have already been collected, so I naively assumed it was just a case of getting ethical approval to access the data, finding someone to cobble the data together, and off we go. I wrote an optimistic project timetable in which I would get my hands on the data about seven months into the PhD. Eighteen months in, I’ve finally got hold of some unlinked data and I’m still waiting for the rest.
I don't work for News International, so what's the problem? Photo: Christian Sinibaldi
I think my first mistake was assuming that just because the data was there it would be easy to get hold of. There are a lot of hoops you have to jump through first.
I thought that what I wanted to do was simple but it turns out that it’s not. This is apparently the most complicated linkage that the cancer registry has undertaken and the bottom line was that nobody wanted to do it. I spent a lot of time begging people to speak to me and basically being fobbed off, in the nicest possible way. Luckily I eventually found a newly-joined analyst who was willing to give it a go. I’m not sure that she’s thanking me now...
The issue then arose of whether the data I wanted might be identifiable. Variables such as date of birth and date of death are classed as identifiable, and individual records are ‘potentially-identifiable’ even if they don’t include identifiable information (which is an excellent catch-22: they are identifiable even though they are not identifiable!).
Finally it seemed like it was all coming together. I’d agreed with the registry that they would supply me with anonymised data containing ages rather than dates, I’d made it through ethics, and I’d got some data. But, on checking, not exactly the data I wanted. So, currently I am discussing with the registry how it will be possible for me to calculate survival time if they won’t allow me to have data on the number of days from diagnosis to death. Survival from lung cancer is short and rounding to the nearest year isn’t going to identify survival differences with any degree of accuracy.
The sticking point is that although they are not supplying me with date of death I could theoretically work it out from this information and that makes the data (aaagh!) ‘identifiable’. However, as I don’t have an NHS number, date of birth, or place of death, I don’t know how I would identify anyone from the 140,000 records I have. Plus I’m a researcher, not a News International journalist, and I’m not interested in anyone individually, so I’m not going to attempt to do this.
Can’t I just sign something to that effect and have the data I need please?