Monday, 28 May 2012

Routine Secondary Data

Posted by Lynne Forrest

We all know what a nightmare it is trying to recruit participants for research studies. So if you only have to get hold of some routine data that’s just sitting there, well, that’s going to be much simpler, isn’t it? You’d think...

The plan for my PhD was to look at inequalities in cancer care by linking cancer registry and Hospital Episode Statistics data for lung cancer, and also linking to audit data. This is routine data that has already been collected and so I naively assumed it was just a case of getting ethical approval to access the data, finding someone to cobble the data together and off we go. I wrote an optimistic project timetable where I would get my hands on the data about seven months into the PhD. Eighteen months in I’ve finally got hold of some unlinked data and I’m still waiting for the rest.

I don't work for News International, so what's the problem? Photo: Christian Sinibaldi

So, what went wrong?

I think my first mistake was assuming that just because the data was there it would be easy to get hold of. There are a lot of hoops you have to jump through first.

I thought that what I wanted to do was simple but it turns out that it’s not. This is apparently the most complicated linkage that the cancer registry has undertaken and the bottom line was, nobody wanted to do it. I spent a lot of time begging people to speak to me and basically being fobbed off, in the nicest possible way. Luckily I eventually found a newly-joined analyst who was willing to give it a go. I’m not sure that she’s thanking me now...

The question then arose of whether the data I wanted might be identifiable. Variables such as date of birth and death are classed as identifiable and individual records are ‘potentially-identifiable’, even if they don’t include identifiable information (which is an excellent Catch-22 – they are identifiable even though they are not identifiable...)!

Finally it seemed like it was all coming together. I’d agreed with the registry that they would supply me with anonymised data containing ages rather than dates, I’d made it through ethics, and I’d got some data. But, on checking, not exactly the data I wanted. So, currently I am discussing with the registry how it will be possible for me to calculate survival time if they won’t allow me to have data on the number of days from diagnosis to death. Survival from lung cancer is short and rounding to the nearest year isn’t going to identify survival differences with any degree of accuracy.
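To put rough numbers on the rounding problem, here is a minimal sketch in Python. The survival times are invented purely for illustration (they are not registry data), and the group names and the helper functions `to_nearest_year` and `to_completed_months` are hypothetical. Coarsening to months is shown only as one possible middle ground, not as anything the registry has agreed to.

```python
from statistics import median

# Hypothetical survival times in days from diagnosis to death.
# Invented for illustration only; NOT real registry data.
group_a = [45, 90, 120, 150, 170]
group_b = [100, 140, 160, 175, 180]


def to_nearest_year(days):
    """Round a survival time in days to the nearest whole year."""
    return int(days / 365.25 + 0.5)


def to_completed_months(days):
    """Coarsen a survival time in days to completed months (30.44-day months)."""
    return int(days // 30.44)


# Median survival at three levels of precision, as (group A, group B).
median_days = (median(group_a), median(group_b))
median_years = (
    median([to_nearest_year(d) for d in group_a]),
    median([to_nearest_year(d) for d in group_b]),
)
median_months = (
    median([to_completed_months(d) for d in group_a]),
    median([to_completed_months(d) for d in group_b]),
)

print("median survival in days:  ", median_days)
print("median survival in years: ", median_years)
print("median survival in months:", median_months)
```

With these made-up figures everyone dies within six months of diagnosis, so every record rounds to zero years and the two groups become indistinguishable, whereas days (and, to a lesser extent, months) still show a gap.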

The sticking point is that although they are not supplying me with date of death, I could theoretically work it out from the survival time in days, and that makes the data (aaagh!) ‘identifiable’. However, as I don’t have an NHS number, date of birth, or place of death, I don’t know how I would identify anyone from the 140,000 records I have. Plus I’m a researcher, not a News International journalist, and I’m not interested in anyone individually, so I’m not going to attempt to do this.

Can’t I just sign something to that effect and have the data I need please?

7 comments:

  1. This is a great article and sounds very familiar.

    It appears increasingly difficult to work with any data as a researcher. I think you make a really good point about the fact that we are not journalists; we are academic researchers with supposed integrity.

    My PhD work looks at tobacco and cannabis use in young people and the data I am using (both routine datasets and my own data collection) carries the risk of the identification of schools or small geographical areas where consumption of these substances is high. I have had to spend a vast amount of time going to great lengths to make sure my participants are not identified in any reporting and agreeing not to do anything unscrupulous. I am happy to do this because it means I have a strong research design and importantly I don’t want my research to have any negative consequences for my participants.

    Ethics departments duly concern themselves with the possibility of any 'leakage' and in my case the identification of schools where cannabis and tobacco use is high would have severe consequences.

    Fortunately my research area is health behaviours rather than health outcomes, so I have no need to use any hospital or other NHS data. I don’t profess to have the same difficulties in accessing data as you, but I can certainly identify with the feeling of being treated like a whistle-blowing journalist when I’m trying to conduct ethical research which aims to benefit people’s lives.

    Replies
    1. Thanks for your comments. I agree that it's important that, as researchers, we don't inadvertently identify anybody but, as you say, I'd like to think I have some academic integrity and would never do this. I'm hoping to work out a way of getting non-exact dates of death that may work around my issue. Still, I do feel the chances of identifying anyone within my dataset are incredibly small and it all seems a bit unnecessary! I guess this is all part of the learning experience of the PhD...

  2. I can relate. More than I wish that I could. Wishing you lots of luck!

  3. Did you consider seeking Section 251 / NIGB exemption? This allows you to handle identifiable information without the requirement for consent.

    Replies
    1. Yes, I did look at applying for NIGB approval but after discussion with the Registry we thought the data were suitably anonymised and that this wasn't necessary. With hindsight though, maybe it would have been better to have done this (and I may yet have to do this).
      I guess I hadn't realised how strictly the whole 'potentially identifiable' rule would be enforced and thought that as I didn't have the actual date of death it would be ok. I don't know if a more experienced researcher would have done things differently?

  4. The effort that you are putting in towards the problem is really commendable. I feel that a manual calculative approach would certainly bring out the raw data, but processing it would be the real pain. I can understand what you must be going through as my friends have faced the same terror during their research. Good luck!
