The main challenge I am tackling at work at the moment is trying to determine accurate citation counts for publications by our senior academic staff.

The first hurdle, identifying their publications, is fairly well covered by the data I have gathered over the past couple of years. Since, for the particular exercise at hand, I have to go back to 2002, I am sure I have some gaps, but I am reasonably confident in my records. The difficult parts are, first, getting lists of which subsequent scientific papers refer back to each article and, second, being able to discount self-citations (if any of the authors of the original paper is also an author of a later paper that cites it, the reference is counted as a self-citation).
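The self-citation rule itself is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample author lists are hypothetical, and it assumes author names can be matched after a crude normalisation, which real bibliographic data (initials, name variants, namesakes) would almost certainly not allow without further disambiguation.

```python
def normalise(name):
    """Crude normalisation (an assumption): lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def is_self_citation(original_authors, citing_authors):
    """True if any author of the original paper is also an author of the citing paper."""
    original = {normalise(a) for a in original_authors}
    citing = {normalise(a) for a in citing_authors}
    return not original.isdisjoint(citing)


def citation_counts(original_authors, citing_papers):
    """Return (total citations, citations excluding self-citations).

    citing_papers is a list of author lists, one per citing paper.
    """
    total = len(citing_papers)
    self_cites = sum(
        is_self_citation(original_authors, authors) for authors in citing_papers
    )
    return total, total - self_cites


# Hypothetical example data, not real publications.
original = ["Smith, J", "Jones, A"]
citing = [
    ["Brown, K", "Green, L"],   # independent citation
    ["jones, a", "White, P"],   # shares an author, so a self-citation
    ["Black, M"],               # independent citation
]

total, independent = citation_counts(original, citing)
print(total, independent)
```

The counting is the easy part; the sketch assumes the list of citing papers and their authors is already in hand, which is exactly the information that is hard to obtain.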

I have been brushing up on bibliometrics and am now relatively well informed in the field (certainly enough to recognise that there is debate about whether citations are a good measure of scientific impact, whether self-citations are really a significant problem and whether this simple binary inclusion / exclusion is a good model to follow) but have not come across any easy solutions. I know where to find the information (Thomson Reuters’ Web of Science seems the best source) but, although I have uncovered some ways to ease the process, I cannot find a route by which I can extract it automatically.

Still, it is a stimulating challenge, even if it looks as though I will soon be spending a few intense days acting as a manual conduit for the information flow.

