Data Management Scenario 1 of more to come.....

A data management scenario:

- Participants in a study are assigned a unique participant identifier (UPD). [This is just a number and I just "invented" the acronym :) ]
- You have participants with UPD 1001 - 1100 enrolled and all have come for up to visit 4 of the study
- The visit 5 UPD for 1096 was "accidentally" recorded on the CRF / Questionnaire as 1069 and has now been submitted to the data centre for processing.

Data for participant with UPD 1096 is going to be recorded under UPD 1069! This is BAD.....!

As data managers, what might we do / advise to minimise the possibility of this happening?



  • mbwinga mbwinga 4 Jan 2011

    Dear Clemens. This is an interesting problem...nice acronym, I think it might stick!

    I am sure this situation must happen and never be picked up. So what to do? Have a secondary system to check? Put it on the monitor's check-list, since this is an error that should be spotted on the CRF by the monitor?

    Perhaps it needs to be more water-tight? How about a number and letter scheme.... other ideas please?

  • jagawal jagawal 28 Jan 2011

    Greetings all. I second the suggestion of an added level of review. Would some form of built-in check be possible? I am not very familiar with this area.

  • Naomi Naomi Waithira 28 Jan 2011

    Hi all,

    It is a nightmare for any data manager to realize participants' data has been mixed at data entry/transcription. I agree with what my colleagues have suggested and building on that I would suggest

    1. Having a letter + numbering scheme and building into the database a validation check that requires one to enter the complete subject ID (for example UPD1096Y, UPD1069M) while entering the data into the database. In that case it is quite unlikely that you would enter data for 1096 under 1069, as the letter would be different for each (hopefully!).

    2. Using pre-printed stickers with the subject ID. When a participant comes for a visit, the pre-printed sticker is put on the CRF page as the data is being transcribed. If a sticker is placed on the wrong CRF, at data entry the data will fail the validation check as in 1 above.

    3. In your database you can build in additional checks such as: date of visit must be X days after randomisation, weight at visit 5 should not be significantly different from weight at visit 4, etc. You can then pick out outliers and possibly detect the error.
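
    The additional checks in point 3 could be sketched roughly as below. This is only an illustration in Python, not the syntax of any particular database: the function name plausibility_flags and both thresholds are assumptions made up for the example, and a real study would take its limits from the protocol.

```python
from datetime import date

# Hypothetical thresholds; a real study would take these from its protocol.
MIN_DAYS_AFTER_RANDOMISATION = 1
MAX_WEIGHT_CHANGE_FRACTION = 0.10  # flag a change of more than 10% between visits


def plausibility_flags(randomised_on, visit_date, previous_weight_kg, weight_kg):
    """Return a list of human-readable warnings for one visit record."""
    flags = []
    days = (visit_date - randomised_on).days
    if days < MIN_DAYS_AFTER_RANDOMISATION:
        flags.append(f"visit is {days} days after randomisation")
    change = abs(weight_kg - previous_weight_kg) / previous_weight_kg
    if change > MAX_WEIGHT_CHANGE_FRACTION:
        flags.append(f"weight changed by {change:.0%} since previous visit")
    return flags
```

    A record that raises a flag is not necessarily wrong, but it is a good candidate for checking whether the data really belongs to that participant.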

    Hope that's useful.


  • cpmasesa Clemens Masesa 3 Feb 2011

    Hi All!

    Thanks for taking the time to "solve" the data management scenario.

    Yes, a letter + number scheme (check character) would be very useful in ensuring that UPD1096 is not entered as UPD1069 since the check character will / should be different.

    When I started working in data management I "re-invented the wheel", so to speak, as I used Excel VBA to create my own little algorithm / function to generate these check characters. It took me a while, 5 hours, to perfect, as I had to make sure that the check character for 1096 is not the same as for 1069 (single digit/character transposition) and so forth and so on.

    I used this function of mine in three different studies and for generating lab IDs (that's another story altogether....) and was very proud of myself for having developed such "advanced error catching solutions" LOL!

    Then I came to learn that there are functions out there to generate these check characters - very brainy guys had already "been there" before me and they came up with these super sophisticated algorithms. So I now use a "modified" version of the mod-10 check digit - I want check characters (A - Z, excluding I, O) instead of check digits (0-9). Have a look at, and / or search for, "check digit" on the web and you should find lots of info.

    The same algorithm used to generate the check characters will be used in the database to recalculate the check character and compare it with what has been entered. Knowing how to code the algorithm in the database you use is essential in order to have this "feature". (Do data managers have to have programming skills.....?!)
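
    For anyone who wants to experiment, here is a minimal sketch of one way to generate and re-check such a check character. It is not the modified mod-10 function described above (the details of that are not given here); it swaps in a plain weighted sum taken modulo 23, which catches every single-digit error and every swap of two adjacent digits. The names check_char, with_check_char and is_valid are made up for the example, so the letters it produces will not match the example IDs quoted in this thread.

```python
# Letters A-Z without I and O, which are easily confused with 1 and 0.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"
MODULUS = 23  # prime, so only the first 23 letters (A-Y) are ever used


def check_char(digits: str) -> str:
    """Return the check character for a string of decimal digits."""
    if not digits.isdecimal() or len(digits) > 20:
        raise ValueError("expected 1 to 20 decimal digits")
    # Weights 2, 3, 4, ... are applied from the rightmost digit leftwards.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=2))
    return ALPHABET[total % MODULUS]


def with_check_char(number: int) -> str:
    """Build the full identifier, e.g. for printing on stickers."""
    digits = str(number)
    return f"UPD{digits}{check_char(digits)}"


def is_valid(upd: str) -> bool:
    """Recompute the check character at data entry and compare with what was typed."""
    if not upd.startswith("UPD") or len(upd) < 5:
        return False
    digits, typed = upd[3:-1], upd[-1]
    try:
        return check_char(digits) == typed
    except ValueError:
        return False
```

    With this sketch, with_check_char(1096) gives UPD1096X and with_check_char(1069) gives UPD1069U, so typing UPD1069X at data entry fails is_valid. A prime modulus larger than 10 is what makes the transposition guarantee hold; the price is that one of the 24 available letters (Z) is never used.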

    The use of pre-printed stickers helps a lot in minimizing transcription errors. BUT it is possible for the sticker for UPD1096Y to be "accidentally" put on the CRF / source documents for UPD1069H! We should call this trans-sticking :)

    This is very difficult to catch as the fancy check-digit thingie will say "OK, valid check digit"! because it actually is valid!

    I suppose there are various procedural approaches that can be taken to minimise this: check the sticker against the participant's ID card; pre-pack blank source documents with all the stickers needed for a (scheduled) visit in preparation for that visit; check the data as frequently as possible for gaps (came for visits 1, 2, 3 and 5 - missed 4?) and duplicates (has visits 1, 2, 3, 4, 4 and 5 - yep, visit 4 twice!); etc.
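
    The gap / duplicate check is easy to automate. A rough Python sketch, assuming you can pull the list of visit numbers recorded for each participant out of the database (the function name visit_problems and its arguments are made up for the example):

```python
from collections import Counter


def visit_problems(visits, latest_expected):
    """Return (missing, duplicated) visit numbers for one participant."""
    counts = Counter(visits)
    missing = [v for v in range(1, latest_expected + 1) if counts[v] == 0]
    duplicated = sorted(v for v, n in counts.items() if n > 1)
    return missing, duplicated
```

    For the two examples above, visit_problems([1, 2, 3, 5], 5) returns ([4], []) and visit_problems([1, 2, 3, 4, 4, 5], 5) returns ([], [4]). In the original scenario the same report would show 1096 missing visit 5 and 1069 with visit 5 twice, which points straight at the mix-up.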

    I would love to hear what you guys do in terms of procedures / workflows to minimise the possibility of such a mix up happening or catching errors sooner rather than later.

    In the meantime stay tuned for another data management scenario..... (You should also post yours!)


  • pdiaz Dr P Diaz 13 Jun 2011

    I have just been reading about your approaches and case studies in data management. These are good indeed. We need a data management system, so we shall try OpenClinica. Please send more of these example situations, as they are good practice for our data management teams - thank you all for your help.

  • Hi Clemens, sorry I've only just seen this discussion. I come from an IT background and here is what we would normally do:
    1) Definitely use a check digit. But I wonder why you chose to use a character and not another digit. Using a digit (0-9) will make it easier for the data capture clerk as they will probably be using the numeric keypad. Also, many systems expect a number as the identifier - what will happen if your lab wants to store the ID in their own database and they can only accept numbers? I strongly recommend you change to use a digit.
    2) To catch the 'trans-sticking' error, you should capture some other attribute relating to the participant. The date of birth would be good and must be recorded on the form, not printed on the sticker. Then it will be easy to check. I understand that GCP requires you to identify the participant, so you can easily ask their birth date.
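
    The date-of-birth cross-check in 2) might look something like this in Python. It is only a sketch: the ENROLMENT_DOB lookup and the dates in it are invented for illustration, and in practice the register would live in the study database.

```python
from datetime import date

# Hypothetical enrolment register: UPD -> date of birth recorded at enrolment.
ENROLMENT_DOB = {
    "1069": date(1985, 3, 14),
    "1096": date(1990, 11, 2),
}


def dob_matches(upd, dob_on_form):
    """True only if the UPD is enrolled and the handwritten DOB agrees with the register."""
    return ENROLMENT_DOB.get(upd) == dob_on_form
```

    If the sticker for 1096 ends up on a form filled in for 1069, the handwritten birth date will not agree with the register entry for 1096 and the record can be queried before it is accepted.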

    Hope this helps

  • clarissam Clarissa Moreira 22 Aug 2011

    Hi Clemens,
    Very interesting discussion here. It seems I missed the chance to help solve this data management scenario - great suggestions by everyone.
    I look forward to the next one and might even post my own soon.

  • GHN_Editors The Editorial Team 20 Nov 2012

    Hi Clemens. Can we have another of your excellent data management scenarios? Meanwhile, have you seen this course in the e-learning centre?
