Activity Data

Updated: 2017-03-13T20:10:49.497-07:00


Adieu from JISC AD


The launch of our final report website this week brings the work of the Activity Data synthesis team to a close and, therefore, this blog will now be mothballed and there are no further updates planned.

You can browse the final report using the web interface (expertly designed for us by Dan Moat) or download a PDF of the final report.

There is still activity happening on the individual project blogs and via the twitter hashtag #jiscad which you can keep up to date with by creating a Five Filters news digest on the fly whenever you get the urge.

So for now it's a heartfelt adieu from me and the rest of the project synthesis team. < waves >

Notes and Photos from Library Camp 2011


Back in October I joined a gathering of library workers, geeks, advocates and enthusiasts in Birmingham for the Library Camp 2011 unconference.

There were a few unusual things about the event from my point of view ...

To start with, it was the first time I'd been at a library event with such a mixed crowd - public libraries folks rubbing shoulders with folks from the academic libraries doesn't happen as frequently as it should do.

Secondly, I have never seen so much cake.

Thirdly, it was the first unconference event I've been to where everyone introduced themselves at the start of the day.

Fourthly, I've never seen more sessions get proposed than the slots available.

Fifthly, there was a poet-in-residence which is, again, a first for me (though strangely I've been at another event since then which had a poet-in-residence too).

The idea of 150ish people introducing themselves one by one at the start of an event might seem like lunacy but I have to say I found it very moving and uplifting to hear everyone's reasons for being in the same place.

Here are some of the reasons I managed to scribble down as folks introduced themselves:
  • "... to capture the libgeist."
  • "I'm here to start the revolution."
  • "... lured by cake and curiosity."
  • "I'm looking for library lovers."
  • "... critique, collaboration and revolution."
  • "... to steal people's enthusiasm, passion and, hopefully, anger."
  • "Gratuitous hugging."
  • "Libation in the library."
  • "To show the rage and passion for libraries."

Dave Pattern kindly agreed to be my wing man and ran a session on Activity Data and Recommender Services with me. Dave shared the good work he’s been involved with at Huddersfield University and I talked a bit about this programme and also shared some of the great (open source) applications that have come out of the JISC MOSAIC and Discovery developer competitions.

Hopefully my interpretative dance representation of Alex Parker's Book Galaxy serendipitous search interface persuaded a few of those present to take a look at whether they can exploit any of the applications that are sitting there waiting to be plundered for a good cause.

Part of the discussion in our session was around the challenge facing libraries that don’t have developers on their staff and so struggle to take advantage of opportunities like those. One of the solutions we discussed was to check who else is using the same library systems as your institution and to look for opportunities to form alliances around shared development goals.

All in all it was an invigorating day full of positive conversations and rapidly shared ideas. My only regrets are that a) I couldn't stay on into the evening to continue the conversations and b) I didn't have a large tupperware box with me to take some of the cake home :) [...]

A round-up of recent JISC Activity Data activity


The synthesis team have been busy honing the content of our final deliverable for this programme, namely a one stop shop which gathers together the projects' collected wisdom on identifying, collecting, managing and sharing activity data within UK HE. The amount of content we have to share means we have a challenge to present it in an intuitive way with easily navigable routes into (and out of) the information for end users but we're hopeful that it's achievable and you'll be able to judge the fruits of our labour next month when we'll be launching it.

Earlier this month we ran a pre-conference workshop at ALT-C which (talking of navigation issues) seemed to go very well once the initial challenge of finding the room itself was conquered. The workshop was entitled 'Improving processes by using activity data' and featured the following sessions:
- Introduction to activity data [presented by Tom Franklin]
- Challenges raised by activity data [presented by Mark van Harmelen]
- [Case Study] Leeds Met STAR-Trak: NG - using activity data in support of student success [presented by Rob Moores]
- Discussion of potential use case [facilitated by David Kay]
- [workshop session] Building a business case [facilitated by Tom Franklin and myself]
- [workshop session] Working with activity data - technical discussion of value and challenges [facilitated by Mark van Harmelen and David Kay]

I've combined (i.e. wrestled) all the slides from the workshop into one presentation and uploaded it as a PDF which you can view below:

The Tabbloid digest for that week captures the tweets from the workshop:

Here are some of my twitter highlights from the day:

Hidden amongst all the workshop tweets is a gem from the AGtivity project: a blogpost sharing what they've worked out about handling timestamps between UNIX, GNUplot and Excel during the course of their project.

Another gem came from the direction of AGtivity in the shape of Martin Turner bringing CritterVRE to my attention. Martin used it to capture the twitter activity from the day and it looks like a useful tool to add to my event amplification / capture toolbox. The service was developed for use alongside Access Grid sessions but it looks useful for other purposes too (if that's allowed, I haven't had a go yet).

September was a busy month on the Exposing VLE Activity Data project blog as their project extension period drew to a close:
- A guide to their Perl analysis tool.
- A guide to using Gephi to visualise a bipartite network of users and websites [including a discussion of their approach and a technical recipe/guide].
- Analysis of their VLE event logs.
- A discussion of releasing anonymised VLE event log data [including a link to the dataset they've released (4GB download)].

The LIDP project have now released data as well as a Library Impact Data toolkit (both of which are published under open licences). [...]
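For anyone facing the same timestamp juggling mentioned in the AGtivity post above, here is a minimal sketch of the arithmetic involved. It is not the AGtivity team's recipe; the function names are made up for illustration, and it assumes UTC timestamps and Excel's default 1900 date system (where 1 January 1970 is serial day 25569).

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
# Serial number of 1970-01-01 in Excel's 1900 date system.
EXCEL_SERIAL_OF_UNIX_EPOCH = 25569


def unix_to_excel(unix_seconds: float) -> float:
    """Convert seconds since 1970-01-01 UTC to an Excel serial date."""
    return unix_seconds / SECONDS_PER_DAY + EXCEL_SERIAL_OF_UNIX_EPOCH


def excel_to_unix(serial: float) -> float:
    """Convert an Excel serial date back to Unix seconds (treated as UTC)."""
    return (serial - EXCEL_SERIAL_OF_UNIX_EPOCH) * SECONDS_PER_DAY


def unix_to_text(unix_seconds: float) -> str:
    """Format a Unix timestamp as text that a plotting tool can parse.

    The layout used here is an assumption; in GNUplot it would need a
    matching 'set timefmt' line.
    """
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")
```

Excel has no notion of time zones, so it is worth deciding once whether your log times are UTC or local time and sticking with that choice throughout.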

Draft Guide: 'Legal issues relating to sharing data'


The problem:
If you want to share activity data with others then you have to make sure that you have the right to do so, that you share it in an appropriate way and that the terms under which you share it are appropriate.

In practice, having the right to share data means ensuring that you hold appropriate intellectual property rights (IPR) in the data. If the data subjects might be identifiable (i.e. you are releasing full data rather than statistical data) then they need to have been informed that sharing could happen when they agreed to the data being collected (and to have had a real ability to opt out of this). Finally you will need to select an appropriate licence under which to release the data.

The options:

Intellectual property rights (IPR)
It is likely that you will own the data from any systems that you are running, though it may be necessary to check the licence conditions in case the supplier is laying any claim to the data. However, if the system is externally hosted then it is also possible that the host may lay some claim to the log-file data, and again you may need to check with them.
  • JISC Legal has a section addressing copyright and intellectual property rights law

Data protection
Data protection, which addresses what one may do with personal data, is covered by the Data Protection Act 1998, and there is much advice available including:
  • JISC Legal has a section on data protection
  • Edinburgh University has produced a useful set of definitions
An alternative approach to addressing the needs of data protection is to anonymise the data.

Licensing the data
Any data automatically comes with copyright, and therefore you need to license the data in order for other people to use it legitimately. There is a wide variety of licences that you can use, though the most common is likely to be some form of Creative Commons licence.

Guidance is available from a wide variety of places including:
  • JISC OSS Watch has a section on IPR and licensing.
  • Licensing Open Data: A Practical Guide by Naomi Korn and Professor Charles Oppenheim
  • The JISC-sponsored IPR and licensing module. Within that you might be particularly interested in:
    - Introduction to licensing and IPR
    - Creative Commons license: [...]

Tabbloid: 31 August 2011


A couple of updates from the project blogs this week:
In the wider world I stumbled across a mention that 'big data' has now made it onto the Gartner Hype Cycle for the first time, which seems significant even if, like me, you find yourself wondering where the Gartner Hype Cycle itself falls on their chart.

Next week the synthesis team will be reunited when we head to Leeds to run one of the ALT-C pre-conference workshops, 'Improving processes by using activity data', where we'll be joined by the geographically convenient Rob Moores who'll be sharing knowledge and experience from the Leeds Met STAR-Trak project with those who attend.

Tabbloid: 24 August 2011


This week there's an interesting post over on the EVAD project blog about the problem of finding the right 'data munging' tool and how they ended up developing their own custom Perl script instead. They've publicly released the Perl script so it will be interesting to watch and see whether their custom built script suits the needs of another project or whether a new bespoke tool needs to be fashioned for every project going.

The LIDP project have been presenting to, and in attendance at, the Performance Measurement in Libraries and Information Services conference which is a week-long event taking place at York University [#pm9york]. Word on the twittersphere is that the LIDP toolkit will be released next week so I'll probably be linking to that next week.

The OpenURL Router Data project launched their article recommender prototype and it's just as well that I don't have an Athens log-in because I was quickly drawn into all sorts of intriguing looking material, including an article entitled 'Getting a Grip on Strangles'.

Out in the wider world there have been relevant links flying into my twitterstream from unexpected quarters which suggests to me that either a tipping point is coming our way in terms of a wider awareness of activity data, or I am getting more creative in my interpretation of what is relevant to the programme. In any case here are a few highlights that I've picked out of this week's Tabbloid:
  • this visualisation tool for the Department of Health's public health dataset is impressive but (to mine eyes) not altogether intuitive or open.
  • there have been a couple of interesting reads in 'the media':
    - a mildly doom-ridden article on the potential omnipotence of algorithms on the BBC website.
    - a similarly toned piece on the Guardian website about digital serendipity, or the impending lack thereof [they get bonus points for talking about *the filter bubble* without mentioning it by name].
  • Lorcan Dempsey picked up on a job advert for a 'bibliometrician' at the University of Leicester which struck me as interesting until I realised that bibliometrician doesn't quite mean what I think it does (i.e. it's more about content of research than activity data) ... but it still seems reasonably pertinent if you employ the same 'magic eye' technique I use to look at the world.
  • On a related note, Graham Stone picked up on a conversation at #pm9york about whether every library will need a 'Data Jockey' ... if that cool job title becomes widely used then I can see a whole new generation of young people getting unknowingly lured into a career as a shambrarian :) [...]

Five Filters news digest: 10 August 2011


A Tabbloid did in fact wend its way into my inbox this morning but it was a little bereft of life so I've turned to the trusty Five Filters website to create this week's blog digest. As before, you can generate a digest on the fly but I'll also be sending it out via email.

Just a couple of project updates this week:
  • the UCIAD project published their final project blogpost, including a video which gives a demo of the UCIAD platform, with an accompanying written commentary nestled below the video [and I can confirm that it's in with a good chance of winning both the 'techiest video I've watched' and 'longest video without a soundtrack' awards in my imaginary video award ceremony at the end of the year]. It's a shame we haven't got any more online exchanges planned because it would have been a good opportunity to get Mathieu to talk through the demo. I'll be interested to hear the results of the user feedback that the project plans to gather as part of their post-JISC project activity.
News from the twittersphere:
News from the synthesis team is that we've finalised the programme for the pre-conference ALT-C ['Improving processes by using activity data'] workshop which we're running on 5 September in Leeds. The workshop is free, includes lunch, and you don't need to be going to ALT-C in order to attend.

Tabbloid: 3 August 2011


Some more final blogposts have emerged this week:
  • AEIOU project
  • AGtivity project who are inviting any institutions who run an Access Grid node in the UK and who are interested in having their node's data analysed to get in touch.
  • SALT project who have been refining their thinking on how they can evaluate the value of the long tail as a result of their user evaluations and have also written a blogpost on how they've significantly improved the efficiency of their recommender API by refining how they process their data.

Some other project blogposts worth visiting if you're interested in the more technical side of what they've achieved:
  • UCIAD have shared their technical architecture and their thoughts on some of the tools they've been using. They've also been blogging about their project's wins and fails, anonymisation and licensing of software and data, and the benefits of UCIAD to end users and the institution. Interesting stuff, particularly the link into consumer exploitation of their data to monitor and improve their performance.
  • OU RISE shared their updated technical approach which is also worth reading for the discussion in the comments re: anonymisation (albeit from earlier in the year).

A couple of other interesting reads I saw flying at high velocity around the twittersphere today:
  • An article on search and how university libraries are getting it wrong by putting blocks in the way of student curiosity.
  • An article by Paul Miller on keeping the user experience in mind when using semantic technologies to join up the dots in our data. His point about allowing the user to feel like they are the one controlling the experience strikes me as key. [...]

Draft Guide: 'Dealing with Activity Data'


[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:
A project that aims to make use of activity data from sources such as those in the Identifying Activity Data draft Guide can’t avoid the fact that the team will inevitably have to roll their collective sleeves up and get hands on with various data sources. It is likely that the data you hope to extract and manipulate will be hard to reach, unwieldy, incompatible, incomplete, downright uncooperative or all of the above. This guide shares some helpful hints from the experiences of the JISC Activity Data projects and the wider world of library data hacking.

The solution:
Dealing with activity data relies on embracing a pioneering mindset, requiring equal measures of experimentation and hacking, together with a sixth sense of how far down one route you should go before accepting that a different solution is needed. Unfortunately there are no hard and fast rules you can follow but here are helpful principles and pointers that have come out of the JISC AD projects and beyond:
  • Tony Hirst blogged about his tactics for dealing with large CSV files in the course of playing with the OpenURL Router project data.
  • Make the most of existing tools and resources to avoid reinventing the wheel. The AEIOU project was able to make use of resources and expertise from the PIRUS2 project.
  • Sometimes data extraction will go much more smoothly than you dared imagine – cherish these moments and share any triumphs with the wider world.

Taking it further:
If you are releasing open data with the hope that people outside of the project and the institution will do something with that data, it’s worth taking steps to remove any unnecessary barriers. Many of those barriers will be the same things that made it a challenge for you to deal with the data in the first place:
  • create small sample files that enable potential end-users to get a feel for the scope and structure of the data you’re sharing.
  • use lowest common denominator/widely accepted formats e.g. CSV
  • publish the scripts you yourself used to manipulate the data. If you adapted someone else’s script/code then share what you’ve done with them to create a virtuous cycle of iterative improvements.

Additional resources:
  • Tony Hirst’s Online Exchange presentation covers some of the issues mentioned in the section above. Tony’s blog is also a robust source of further information.
  • This twinset of AEIOU project blogposts was the initial inspiration for this guide:
    - Hunting and Gathering data
    - Consuming and Querying data
  • The EVAD project is handling a vast dataset and have blogged about the data and also published a Guide to Using Pivot Tables in Open Office. They’ve also shared their thoughts around taking a user-centric approach to their data.
  • The OU RISE project documented their thoughts about how they could most usefully format the recommender data they plan to release: [...]
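The advice above about publishing small sample files alongside a large CSV release can be sketched in a few lines of Python. This is only an illustration, not a script from any of the projects: the function name and file paths are hypothetical, and it assumes a UTF-8 file whose first row is a header.

```python
import csv
import itertools


def write_sample(source_path: str, sample_path: str, n_rows: int = 100) -> int:
    """Copy the header plus the first n_rows data rows into a sample file.

    The source is streamed row by row, so a multi-gigabyte file never has
    to fit in memory. Returns the number of data rows written.
    """
    with open(source_path, newline="", encoding="utf-8") as src, \
            open(sample_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source file: nothing to sample
        writer.writerow(header)
        written = 0
        for row in itertools.islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written
```

Taking the first rows is the simplest choice but may not be representative if the file is sorted by date or user; a random sample would be a reasonable alternative where that matters.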

Tabbloid: 27 July 2011


In true tabloid vernacular I think it's fair to say that this week's Tabbloid is 'a whopper' and I can tell that the OU RISE project team are certainly back from their holidays.

Some of the projects have posted their official 'final blogpost' {wipes tear from corner of eye} but I have a feeling that we will continue to see further blogposts from them in the weeks to come. Here are the projects' final blogposts, no doubt there will be another flurry of them before the week is out:
  • Leeds Met STAR-Trak
  • Open University RISE [I particularly like the way they've sneaked in an extra lesson learnt by prefacing it with "If we were allowed a Lesson 4".]
  • Huddersfield LIDP

Other newsworthy news (ahem) this week:
  • The OU RISE team released their code for all to plunder as they see fit - I'm guessing there will be no way of tracking who makes use of it (is there?) but as time goes by it will be interesting to see whether anyone takes advantage of the work done by the RISE team in developing their own applications (I really wanted to use the phrase 'long tail' there but managed to stop myself).
  • The AGtivity team published a few gems this week:
    - Recipe for 'Plotting Calendar Data with GNUPlot'
    - Case Study 2: Testing, Testing, Testing [analysing their data to identify and evaluate the effectiveness of their QA service]
    - Case Study 3: To Book or Not To Book
    - Case Study 4: CO2, Loads Of It [about quantifying the CO2 savings from holding meetings using their video conference facilities]
  • I know I use the phrase 'thought provoking' a lot in these synthesis posts but that is the perfect descriptor for Leeds Met STAR-Trak's post on the domain knowledge chasm that they discovered in the course of running feedback workshops with students and staff.
  • Both the OU RISE and the SALT projects have been thinking deep thoughts about licensing this week (which is handy for me as I'm just finalising the draft guide on that very topic).

And finally, some other links of interest regarding activity data within academia and in the wider world beyond:
  • Tony Hirst recommended a book on Tesco's early use of customer data
  • Ben Showers tweeted a link to the Ariadne article 'Looking for the Link between Library Usage and Attainment' authored by none other than Graham Stone et al from Huddersfield (published earlier this month).
  • Google's public data visualiser thingymajig [...]

Online Exchange #4: Event Recording [21 July 2011]


The fourth, and most likely final, Online Exchange took place last week and the topic this time was data visualisation (or 'visualization' depending on which side of the pond you reside).

The session was an opportunity for the JISC AD projects to share information about the data that they're wrangling as part of their project and their thoughts on/experience of the challenge of presenting that data visually. The main attraction though was a presentation from Tony Hirst who gave a very useful (or should I say 'OUseful' {nice pun Helen!}) overview of the tools and techniques you can use to create data visualisations.

You can play back the whole session by following the link below. [Note that you'll need to run the Java application that launches in order to watch it] The playback is slightly crackly on my machine but hopefully it won't detract from your listening pleasure:
You can see Tony's accompanying slides below and the good news is that he hopes to build and openly release a data viz 'uncourse' along the same lines later this year:

Tony's tour of the various data visualisation tools was great and brought the tools to life in a very engaging way with lots of examples showing how Tony's used them with real data. Personally speaking, the really interesting part for me was listening to Tony talk about the purpose and process of data visualisation. Tony is the first to admit that he is not a statistician and when he describes the process of using visualisation tools as 'having a conversation with your data' and 'exposing the hidden shapes, stories and messages within the data' it strikes me that working with data in this way requires an artistic / poetic / craftsperson mind-set as much as it does an analytic skill-set. I'll be mining Tony's talk to improve the data visualisation Draft Guide we've written but please do add your thoughts and tips below.

Tabbloid: 20 July 2011


It's been a fairly busy week on the project blogs and no doubt will continue in that manner over the next few weeks as the projects publish their final blogposts.

The AGtivity team in particular have been busy and are producing some interesting stuff, including a couple of hot off the press posts that aren't included in this week's Tabbloid:
  • Ahead of tomorrow's Online Exchange on the subject of Data Visualisation there's a timely post on the different ways that activity data can be visualised and the challenge that presents when choosing which visualisation to show the end user.
  • A breakdown of the numbers of data items the project has processed.
  • A first pass at writing up the project's Wins and Fails - no doubt the various data headaches they've had to deal with will chime strongly with a fair few of the other projects.
  • The 'Tale of Two Rooms' case study the team have compiled gives a good insight into the stories that the AGtivity data can tell - it also demonstrates how important contextual information is for making sensible interpretations of the data.

The LIDP project have been delving further into the data behind *that* graph (you'll recognise it when you see it) and have come up with the interesting conclusion that the differentiating behaviour is replicated year by year. It's got me wondering about what type and scale of intervention would be needed to buck the trend. I'm also wondering whether the students with higher outcomes might also be going to the library earlier in each term (and therefore having a wider choice of books) than their course mates.

These sorts of wonderings are some of the things that the LIDP team have been discussing while they've been out on the road sharing the project outcomes so far.

On their blog there's also a (slightly stolen) guest post from one of the LIDP project partners - Paul Stainthorp looks back at what they had to do to get at their data, how they wrangled it into one giant .csv file and how they discovered one of their datasets was missing.

On Twitter, Amber Thomas shared a link to an interesting article about how some of the for-profit universities in the US, such as Kaplan, APUS and Phoenix, are surprisingly open to the idea of sharing data on student success with their not-for-profit competitors. [...]

Online Exchange #3: Event Recording [13 July 2011]


Last week we held the third of our Online Exchange sessions. This time we opted for Elluminate as our conferencing weapon of choice and it served us well.

You can play back the whole session by following the link below. Note that you'll need to run the Java application that launches in order to watch it:

Ross MacIntyre introduced us to the Journal Usage Statistics Portal (JUSP) service and gave a live demo of the JUSP portal itself.

Nicole Harris talked about Cardiff University's Raptor (JISC-funded) project and their recently launched software. Nicole's slides are below.


Tabbloid: 6 and 14 July 2011


It's a double header blog round up as I look back over the past two weeks of blog and twitter activity within the Activity Data programme. Amazingly there's a Tabbloid for both weeks (wonders will never cease!). It's been a busy couple of weeks for the synthesis team with multiple events in Milton Keynes, plus our third Online Exchange session - I'll talk more about those events in separate posts.

6 July update:
The UCIAD project are continuing to do some deep thinking about user-centric activity data and have drawn up some concept diagrams which show (I think) that the organisational-centric activity data is simply an aggregation of user-centric data. Which means that an organisational-centric approach shouldn't preclude the potential that exists for releasing activity data to individual users too. It's got me thinking about what would happen if users were fed metrics about their usage such as '87% of the books/resources you've borrowed are off the reading list' or '24% of your returns have been x days late' - would it feed into a sense of self-responsibility or have a negative impact on under-achieving students? Would students welcome the additional data?

14 July update:
This week's issue might more accurately be called the OU RISE Weekly, since nearly all of the content comes from their blog:
  • Reflecting on the analytics data they have available and whether it can be used to measure how successful the OU RISE project has been.
  • Pondering the pros and cons of making recommendations based on EZProxy data.
  • A step-by-step guide to getting recommendations from an EZProxy logfile.
  • An exploration of Expected benefits vs Achieved benefits of the RISE project.

In addition to these posts on the OU RISE project blog, Richard Nurse was also pondering activity data and open metadata over on his personal blog.

Some other items of (leftfield) interest that I've stumbled across in the last couple of weeks:
  • Google launched their dedicated search blog a couple of months ago. There are already some interesting discussions there, including on the importance of authorship as a signifier of quality.
  • Brian Kelly's blog has had some interesting posts, particularly around gathering quantitative evidence and the news that the government will be forcing universities to publish more data.
  • If you are in a particularly philosophical mood and have half an hour to spare then Brian Holmes' dystopian essay on cybernetics, analytics and tools of liberal control is worth a gander. [warning: even more leftfield than usual] [...]

Draft Guide: 'Anonymising data'


The problem:
Data protection requirements mean that we cannot release personal data to other people without the data subjects' permission. Much of the activity data that is collected and used contains information which can identify the person responsible for its creation. It may contain their username, the IP address from which they were working or other information including patterns of behaviour that can identify them.

Therefore, where information is to be released as open data for anyone to use, consideration needs to be given to anonymising the data. This may also be required for sharing data with partners in a closed manner, depending on the reasons for sharing and the nature of the data, together with any consent provided by the user.

The options:
Two main options exist if you want to share data.

The first is to share only statistical data. As the Information Commissioner recently wrote:
"Some data sharing doesn’t involve personal data, for example where only statistics that cannot identify anyone are being shared. Neither the Data Protection Act (DPA), nor this code of practice, apply to that type of sharing."

The second is to anonymise the personal data so that it cannot be traced back to an individual. This can take a number of forms. For instance, some log files store user names while other log files may store IP addresses; where a user uses a fixed IP address, this could be traced back to them. Anonymising the user name or IP address through some algorithm would prevent this. A further problem may arise where rare data could be used to identify an individual. For instance, a pattern of accessing some rare books could point to someone with a particular research interest.
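As one illustration of what 'some algorithm' might look like, here is a minimal sketch that replaces user names and IP addresses with keyed hashes. It is an assumption-laden example rather than a recommended or project-endorsed method: the function names and the 'username' and 'ip' field names are hypothetical, and the secret key must be kept private (or destroyed after release), since anyone holding it could rebuild the mapping.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Return a stable token for an identifier using HMAC-SHA256.

    A keyed hash is used because a plain hash of an IP address or a short
    user name can be reversed simply by hashing every possible value.
    """
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256)
    return digest.hexdigest()[:length]


def anonymise_rows(rows, secret_key: bytes, fields=("username", "ip")):
    """Yield copies of log rows (dicts) with the named fields replaced.

    The default field names are illustrative assumptions only.
    """
    for row in rows:
        cleaned = dict(row)
        for field in fields:
            if cleaned.get(field):
                cleaned[field] = pseudonymise(cleaned[field], secret_key)
        yield cleaned
```

Because the same person always maps to the same token, patterns of behaviour remain linkable, so this addresses the user name and IP address problem but not the rare data problem.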

Taking it further:
If you want to take it further then you will need to consider the following as a starting point:
  • Does the data you are considering releasing contain any personal information?
  • Are the people that you are sharing the data with already covered by the purpose the data was collected for (e.g. a student’s tutor)?
  • Is the personal information directly held in the data (user name, IP address)?
  • Does the data enable one to deduce who used that data (only x could have borrowed those two rare books – so what else have they borrowed)?
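The last question in the list above can be partly checked before release. The sketch below flags items that were used by fewer than k distinct users and drops events involving them. The function names and the threshold are assumptions for illustration, and passing this check does not by itself show that a dataset is anonymous.

```python
from collections import defaultdict


def rare_items(events, k: int = 5):
    """Return the set of items used by fewer than k distinct users.

    events is an iterable of (user_token, item) pairs.
    """
    users_per_item = defaultdict(set)
    for user, item in events:
        users_per_item[item].add(user)
    return {item for item, users in users_per_item.items() if len(users) < k}


def suppress_rare(events, k: int = 5):
    """Drop every event that involves a rarely used item."""
    events = list(events)
    rare = rare_items(events, k)
    return [(user, item) for user, item in events if item not in rare]
```

For example, with k=2 a book borrowed by only one person is removed from the release, while a book borrowed by two or more different people is kept.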
Additional resources:

Draft Guide: 'Developing a Business Case'


[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:

Getting senior management buy-in for projects which use activity data to enhance the user experience or the management of facilities is key if projects are to get the go-ahead in the first place and become a sustainable service in the long term. There is a lack of persuasive business cases to refer to in the public realm. This guide gives some high-level advice for the effective development of a solid business case.

In the current programme, activity data is being used to enhance the learner experience by recommending additional material, to manage resources effectively and to increase student success by helping students improve their online practices. Each of these is a powerful strategic benefit.

The solution:

The most important thing to remember when developing a business case is that its purpose is to persuade someone to release resources (primarily money or staff time) for the proposed activity. The person who has to make the decision faces a wide variety of competing requests and demands on the available resources, so what they need to know is how the proposed project will benefit them.

The answer to this question should be that it helps them move towards their strategic goals. So the first thing that you need to find out is what their strategic goals are. Typically these are likely to include delivering cost savings, improving the student experience or making finite resources go further. You should then select one (or at most two) of these goals and explain how the project will help to meet this goal (or goals). Aligning the project to many goals risks diluting each of them and having less impact than a strong case for a single goal.

Structure of a business case:

  • Title
  • Intended audience
  • Brief description
  • Alternative options
  • Return on investment
  • Costs
  • Project plan
  • Risks
  • Recommendation

Do not 'over egg the pudding' by understating the costs and risks or overstating the benefits. If the costs or benefits are not credible then the business case may be rejected because it does not appear to offer realistic alternatives.

The benefits should be realistic and, wherever possible, quantified in monetary terms. This allows the decision maker to compare the benefits with the costs (which can usually be expressed in monetary terms), and so clearly see the return on investment, and to compare this business case with other calls on their funding and staff.

Taking it further:

If the sector is to build a higher level picture of the business cases for exploiting activity data, and also for pursuing the path towards open data, then it is important to share knowledge of what works in convincing key decision makers to give sustained support to using activity data.

The programme has produced some example business cases which can be used to understand the type of information that it is sensible to include, and which may form the basis for your business case. However, the business case must relate to the local circumstances in which you are writing it, and the audience for which you are writing it.

Additional resources:

Guidance and templates[...]
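
As a rough illustration of the 'return on investment' point in the guide above, here is a minimal Python sketch. The function name and the figures in the usage note are hypothetical and are not taken from any of the programme's business cases; it simply expresses net benefit as a fraction of cost, which is one common way of stating ROI.

```python
def return_on_investment(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost.

    Both arguments are assumed to be monetary estimates in the same
    currency, covering the same period.
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost
```

For example, with an illustrative cost of 20,000 and an estimated benefit of 30,000, `return_on_investment(30_000, 20_000)` gives 0.5, i.e. a 50% return. A decision maker can set that single figure against other calls on the same budget.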

JISC online consultation


JISC is currently undertaking a consultation exercise and wrote:

As part of our institutional engagement work, the JISC Organisation and User Technologies Team is carrying out an online consultation (using moodle) to identify emerging issues and concerns in UK Higher Education that we may, in the future, be looking to develop programmes of activity around. There are five top level areas, each with a discussion forum attached; please feel free to either post a new concern or issue or respond to someone else’s post.

The site is a moodle site, so there is a quick and simple 2 part registration before you post.

Anything you can contribute will be helpful in shaping our future plans.

I have added the following post on analytics - you may wish to comment or add others...

One of the key factors for both students and universities will be student success, though at times they may have different definitions of what this means. There are two key areas here: retention and outcome (loosely the result, but also that the student has achieved what they set out to do). Retention is already good by international standards, but this does not give grounds for complacency, and there is much that can be (and is being) done to improve it. It is arguable that student success is also one of the factors in the student experience.

In this posting I want to look at one tool that can be used to enhance student success, where JISC is already doing some work, but where much more could be done and would have a very positive return on investment for institutions. This is data analytics to support student success.

Universities and colleges are already collecting vast amounts of data about their students, but making very little use of it. Every time a student logs on to the VLE, undertakes a search of the library resources, accesses an e-journal, or swipes their card through the library turnstile or at a lecture theatre, the event is recorded in logs on servers at the university. Most of the time this information simply sits there gathering electronic dust until it is archived or deleted.

However, there is much valuable information that could be used to help students to help themselves. For example, there are patterns of behaviour which may give early indications that a student is at risk of dropping out (non-attendance, declining use of the VLE perhaps), where early intervention to support students may help them to achieve the results that they wanted. Similarly, there are patterns of behaviour which may indicate that students are not studying as effectively as they might, again where early intervention could be of great assistance to the student.

There are a number of areas where intervention at the national level would be of great value to the sector. These include:

  • Understanding the information that universities have available to them
  • Identifying patterns associated with success and failure. Note that these are likely to be discipline dependent: some disciplines make much more use of the library than others. They are also likely to be institution dependent as, for instance, some universities make much more use of VLEs than others
  • Developing algorithms to identify students at risk or with sub-optimal study patterns
  • Researching methods of intervention that actually support students to succeed. There is evidence that some approaches may be counter-productive

These methods can form part of the way in which to enhance student learning and success, and where national support will enable all universities and colleges to achieve more than they could by developing the tools an[...]

Tabbloid: 22 June 2011


As you'll see from the Tabbloid digest, the synthesis team have had a busy week here on the blog. We've shared the first draft of the recommendations we've submitted to JISC. We've also published the following draft Guides:

  • Strategies for collecting and storing activity data
  • Identifying activity data in the library service
  • Enabling student success
  • Bringing activity data to life [data visualisation]

Your comments on the draft guides and recommendations are very much welcomed between now and the end of August, when we will be submitting final versions of them to JISC.

The projects have been busy too:

  • The SALT project has been demoing their prototype web API service
  • LIDP announced that they have now received data from all the project partners, and the team at Huddersfield were inspired by DMU's blogpost on the data they submitted to reflect on their own data and to ponder what data they could include in the future. DMU also blogged about the focus groups they held just before Easter and noted that the £10 print credit appears to have been an effective incentive for recruiting volunteers.
  • UCAID have also been pondering data and how ontology technologies will support their user-centric approach to activity data [side note: the mention of "traces of activities around a user" has the artistic side of me wondering whether there might be an opportunity for some rather beautiful data visualisations]

A retweet by Dave Pattern about Derek Rodriguez's article on 'Understanding library impacts on student learning' led me off on a small trail of oblique serendipity:

  • The Association of College and Research Libraries are considering how they follow up their Value of Academic Libraries study. [side note: at the end of last year ACRL launched a paid subscription service for online access to their academic library statistics called ACRL Metrics]
  • ACRL's blogpost about 'social hacking of the library' is a reminder of the anecdotal stories of usage and abusage that lie [somewhat buried] beneath the surface of activity data.
  • That got me thinking about ethnography and led me to the Ethnographic Research in Illinois Academic Libraries (ERIAL) project and their Ethnographic Research in Academic Libraries Toolkit

And, in other news, a couple of things relating to anonymisation hit our radar this week:

  • 'Dispelling the Myths Surrounding De-identification: Anonymization Remains a Strong Tool for Protecting Privacy' - report published by the Information and Privacy Commissioner of Ontario
  • On Panopticon's information law blog there's a useful discussion of legal cases which are currently adding to our understanding of what constitutes personal data when dealing with anonymised data.[...]

Activity Data Synthesis Project: Recommendations


The following are the recommendations that we have submitted to JISC. Your comments would be most welcome by both JISC and us.

Introduction

This is an informal report outlining the likely recommendations from the Activity Data projects to help JISC to determine future work in the area. This is not intended as a public document, rather to stimulate discussion and lead to a more formal document at a later stage.

There are two things to note at this stage.

  • Activity data can serve a wide variety of different functions, as exemplified by the range of projects in this programme. However, the greatest impact (and return on investment) will be from supporting student success.
  • We suggest that the next call explicitly funds other universities to pick up the techniques and / or software systems that have been developed in this programme in order to see if they are useful beyond the initial institution and, in the process, discover what the issues may be in making effective use of the techniques and / or systems. However, this may not be in accordance with JISC’s standard practice and is not an essential part of the recommendations.

The recommendations appear under the following topic areas:

  • Student success
  • Student and researcher experience
  • Collection management

Student success

“It is a truth universally acknowledged that”[1] early identification of students at risk and timely intervention must[2] lead to greater success. It is believed that some of the patterns of behaviour that can be identified through activity data will indicate students who are at risk and could be supported by early intervention. It has also been demonstrated in work in the US that it can help students in the middle to improve their grades[3].

Recommendations:

In year 2, JISC should fund research into what is needed to build effective student success dashboards. Work is needed at least in the following areas:

  • Determination of the most useful sources of data that can underpin the analytics
  • Identification of effective and sub-optimal study patterns that can be found from the above data
  • Design and development of appropriate algorithms to extract this data. We advise that this should include statisticians with experience in relevant areas such as recommender systems.
  • Watching what others are doing in the area of learning analytics, including developments by VLE developers

At this stage it is not clear what the most appropriate solutions are likely to be; therefore, it is recommended that this is an area where we need to “let a thousand flowers bloom”. However, it also means that it is essential that projects collaborate in order to ensure that projects, and the wider community, learn any lessons.

In year 2 or 3, JISC should pilot some of the systems developed under the current programme:

Student and researcher experience

This area is primarily concerned with using recommender systems to help students and (junior) researchers locate useful material that they might not otherwise find, or would find much harder to discover.

Recommendations:

It is recommended that in year 2, JISC fund additional work in the area of recommender systems for resource discovery. In particular, work is needed in the following areas:

  • Investigation of the issues and tradeoffs inherent in developing institutional versus shared services recommender systems. For instance there are likely to[...]

Online Exchange #2: Event Recording [2 June 2011]


On the 2nd June we held the second of our Activity Data Virtual Meetings using the Webex online conferencing tool. The hour-long session can be downloaded or streamed using the following links:
We heard from Richard Nurse who talked us through the Open University RISE project and shared the progress they've made so far. [Richard's presentation starts at the 11min mark]

Rise presentation for jisc online mtg 2011 06-02 [Slideshare slides]

We also heard from Sheila Fraser who presented an overview of EDINA's Using OpenURL Activity Data project and touched on how the data might be used, as well as inviting participants to suggest ideas and discuss the issues around using other institutions' data. [Sheila's presentation starts at the 20min, 20secs mark]

Using OpenURL Activity Data - Activity Data Online Exchange Event [Slideshare slides]

We also had speakers lined up to share information and experience about the Journal Usage Stats Portal (JUSP), Metridoc and the RAPTOR project but unfortunately a technical glitch in Webex meant that we had to postpone their contributions to a future session.

[NB: you can view the in session chat box by selecting 'View' >> 'Chat' from the menu at the top of the Webex playback window]

Draft Guide: 'Bringing activity data to life'


[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:

Activity and attention data is typically large scale and may combine data from a variety of sources (e.g. learning, library, access management) and events (turnstile entry, system login, search, refine, download, borrow, return, review, rate, etc). It needs methods to make it amenable to analysis.

It is easy to think of visualisation simply as a tool to help our audiences (e.g. management) ‘see’ the messages (trends, correlations, etc) that we wish to highlight from our datasets. However, experience with ‘big’ data indicates that visualisation and simulation tools are equally important for the expert, assisting in the formative steps of identifying patterns and trends to inform further investigation, analysis and ultimately the development of measures such as Performance Indicators.

The options:

Statisticians and scientists have a long history of using computer tools, which can be complex to drive. At the other extreme, spreadsheets such as Excel have popularised basic graphical display for relatively small data sets. However, a number of drivers (ranging from cloud processing capability to software version control) have led to a recent explosion of high quality visualisation tools capable of working with a wide variety of data formats and therefore accessible to all skill levels (including the humble spreadsheet user).

Taking it further:

Youtube is a source of introductory videos for tools in this space, ranging from Microsoft Excel features to the cloud based processing from Google and IBM to tools such as Gephi, which originated in the world of version control. Here are some tools recommended by people like us:

  • Excel Animated Chart
  • Excel Bubble Chart
  • Motion Chart
  • Many Eyes
  • Many Eyes at Desktop

Additional resources:

To grasp the potential, watch Hans Rosling famously using Gapminder in his TED talk on third world myths.

UK-based Tony Hirst (@psychemedia) has posted examples of such tools in action – see his Youtube channel. Posts include Google Motion Chart using Formula 1 data, Gource using Edina OpenURL data and a demo of IBM Many Eyes.

A wide ranging introduction to hundreds of visualisation tools and methods is provided at [...]

Draft Guide: 'Enabling student success'


[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:

Universities and colleges are focused on supporting students both generally and individually to ensure retention and to assure success. The associated challenges are exacerbated by large student numbers and as teaching and learning become more ‘virtualised’. Institutions are therefore looking for indicators that will assist in the timely identification of groups such as ‘at risk’ learners, so that they can be proactively engaged with the appropriate academic and personal support services.

The options:

Whilst computer enabled systems may be part of the problem, they can certainly contribute significantly to the solution through identification of patterns of learning and associated activity that highlight ‘danger signs’ and sub-optimal practice, and by the automation of ‘alarms’ (e.g. traffic light indicators, alerts) triggered by one or more indicators. This approach forms part of the field of ‘learning analytics’, which is increasingly popular in North America.

Well-chosen indicators do not necessarily imply a cause and effect relationship, but they do provide a means to single out individuals using automatically collected activity data, typically combining a bundle of indicators (e.g. students who do not visit the library in Term 1 may be at risk; students who also do not download content from the VLE are highly likely to be at risk).

Taking it further:

Institutions wishing to develop these capabilities may be assisted by this checklist:

  • Consider how institutions have developed thinking and methods in initiatives such as the JISC Activity Data programme - see resources below
  • Identify where log information about learning-related systems ‘events’ is already collected (e.g. learning, library, turnstile and logon / authentication systems)
  • Understand the standard guidance on privacy and data protection relating to the processing and storage of such data
  • Engage the right team, likely to include key academic and support managers as well as IT services; a statistician versed in analytics may also be of assistance as this is relatively large scale data
  • Decide whether to collect data relating to a known or suspected indicator (like the example above) or to analyse the data more broadly to identify whatever patterns exist
  • Run a bounded experiment to test a specific hypothesis

Additional resources:

  • Three projects in the JISC Activity Data programme investigated these opportunities at Cambridge, Huddersfield and Leeds Met universities.
  • See Activity Data Guide on ‘Data Strategies’ to maximise your potential to identify and track indicators
  • More about Learning Analytics in the 2011 Educause Horizon Report
  • Analytics: The Uses of Management Information and Technology in Higher Education, Goldstein P and Katz R, ECAR, 2005 -[...]
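
The 'traffic light' idea in the guide above could be sketched along the following lines. This is a minimal Python illustration only: the `TermActivity` record, its field names and the rule for the amber case are assumptions made for the example, not something taken from the programme's projects. It mirrors the guide's bundle of two indicators (library visits and VLE downloads in a term).

```python
from dataclasses import dataclass


@dataclass
class TermActivity:
    """Hypothetical per-student activity counts for one term."""
    student_id: str
    library_visits: int
    vle_downloads: int


def traffic_light(activity: TermActivity) -> str:
    """Combine two indicators into a red / amber / green flag.

    Red: no library visits AND no VLE downloads (the guide's
    'highly likely to be at risk' case).
    Amber: exactly one of the two indicators is zero. Treating
    'no VLE downloads' alone as amber is an assumption of this sketch.
    Green: some activity on both.
    """
    no_library = activity.library_visits == 0
    no_vle = activity.vle_downloads == 0
    if no_library and no_vle:
        return "red"
    if no_library or no_vle:
        return "amber"
    return "green"
```

A flag like this is an indicator rather than evidence of cause and effect, so in practice it would only prompt a member of staff to look more closely, and any thresholds would need testing against local data in the kind of bounded experiment the checklist suggests.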

Draft Guide: 'Identifying activity data in the library service'


[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'Additional Resources' section]

The problem:

Libraries use a range of software systems through which users interact with premises, services and resources. The LMS is far from the only source, with the OPAC and the LMS circulation module representing increasingly partial views of user attention, activity and usage in a changing world. So libraries wishing to build a picture of user interactions face the challenge of identifying the appropriate data. This depends on their purpose, which may range from collection management (clearing redundant material, building ‘short loan’ capacity) to providing student success performance indicators (if correlation can be established), to developing recommender services (students who used this also used that, searched for this retrieved that, etc).

Let’s break the problem down. In this guide we consider the variety of sources available within library services, a list to which you may add more. In other guides we consider strategies for deriving intelligence from ‘anything that moves’ as well as from targeted data extraction and aggregation with reference to specific goals.

The options:

Libraries already working with activity data have identified a range of sources and purposes – Collection Management, Service Improvement, Student Success and Recommender Services. Potential uses of data will be limited where the user is not identified in the activity (‘No attribution’). Here are some key examples:

Data Source | What can be counted | Value of the intelligence
Turnstile | Visits to library | Service improvement, Student success
Website | Virtual visits to library (no attribution) | Service improvement
OPAC | Searches made, search terms used, full records retrieved (no attribution) | Recommender system, Student success
Circulation | Books borrowed, renewed | Collection management, Recommender system, Student success
URL Resolver | Accesses to e-journal articles | Recommender system, Collection management
Counter Stats | Downloads of e-journal articles | Collection management
Reading Lists | Occurrence of books and articles – a proxy for recommendation | Recommender system
Help Desk | Queries received | Service improvement

Taking it further:

Here are some important questions to ask before you start to work with user activity data:

  • Can our systems generate that data?
  • Are we collecting it? Sometimes these facilities exist but are switched off
  • Is there enough of it to make any sense? How long have we been collecting data and how much data is collected per year?
  • Will it serve the analytica[...]

Draft Guide: 'Strategies for collecting and storing activity data'


[This is a draft Guide that will be published as a deliverable of the synthesis team's activities. Your comments are very much welcomed and will inform the final published version of this Guide. We are particularly interested in any additional examples you might have for the 'References' section]

The problem:

Activity data typically comes in large volumes that require processing to be useful. The challenge is where to start and at what stage to become selective (e.g. analyse student transactions and not staff) and to aggregate (add transactions together – e.g. 1 record per day for books borrowed).

If we are being driven by information requests or existing Performance Indicators, we will typically manipulate (select, aggregate) the raw data early. Alternatively, if we are searching for whatever the data might tell us, then maintaining granularity is essential (e.g. if you aggregate by time period, by event or by cohort, you may be burying vital clues). However, there is the added dimension of data protection: raw activity datasets probably contain links to individuals, and therefore aggregation may be a good safeguard (though only a partial one, as you may still need to throw away low incidence groupings that could betray individual identity).

The options:

It is therefore important to consider the differences between two approaches before you start burning bridges by selection / aggregation or unnecessarily filling terabytes of storage.

Approach 1 - Start with a pre-determined performance indicator or other statistical requirement and therefore selectively extract, aggregate and analyse a subset of the data accordingly; for example:

  • Analyse library circulation trends by time period or by faculty or …
  • Analyse VLE logs to identify users according to their access patterns (time of day, length of session)

Approach 2 - Analyse the full set (or sets) of available data in search of patterns using data mining and statistical techniques. This is likely to be an iterative process involving established statistical techniques (and tools), leading to cross-tabulation of discovered patterns, for example:

  • Discovery 1 – A very low proportion of lecturers never post content in the VLE
  • Discovery 2 – A very low proportion of students never download content
  • Discovery 3 – These groups are both growing year on year
  • Pattern – The vast majority of both groups are not based in the UK (and the surprise is very low subject area or course correlation between the lecturers and the students)

Additional resources:

  • Approach 1 – The Library Impact Data Project (#LIDP) had a hypothesis and went about collecting data to test it
  • Approach 2 - The Exposing VLE Data project (#EVAD) was faced with the availability of around 40 million VLE event records covering 5 years and decided to investigate the patterns
  • Recommender systems (a particular form of data mining used by organisations such as supermarkets and online stores) typically adopt Approach 2, looking for patterns using established statistical techniques - and [...]
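
To make the aggregation and 'low incidence groupings' points in the guide above more concrete, here is a minimal Python sketch. The event layout (day, faculty, student id), the function names and the threshold of 5 borrowers are all assumptions made for illustration; they are not drawn from any of the projects, and a suitable threshold would depend on local data protection guidance.

```python
from collections import defaultdict


def aggregate_loans(events):
    """Aggregate raw loan events to one record per (day, faculty).

    `events` is an iterable of (day, faculty, student_id) tuples.
    Returns {(day, faculty): (loan_count, distinct_borrower_count)}.
    """
    loans = defaultdict(int)
    borrowers = defaultdict(set)
    for day, faculty, student_id in events:
        key = (day, faculty)
        loans[key] += 1
        borrowers[key].add(student_id)
    return {key: (loans[key], len(borrowers[key])) for key in loans}


def suppress_small_groups(summary, min_borrowers=5):
    """Drop groupings with too few distinct borrowers to be safe to share.

    Returns {(day, faculty): loan_count} for the groups that remain.
    """
    return {
        key: loan_count
        for key, (loan_count, borrower_count) in summary.items()
        if borrower_count >= min_borrowers
    }
```

Note that the aggregated output no longer carries student identifiers, which is the safeguard the guide describes. It also no longer supports the pattern-hunting of Approach 2, which is why the choice between the two approaches needs making before the raw data is discarded.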

Tabbloid news digest: 13 June 2011


Lo and behold, my Tabbloid resuscitation skills have worked this week.

Some of the projects have reported on unexpected project hiccups which will no doubt resonate with anyone who has worked on a similar project:

  • A 'regime change' at Leeds Met means that they're having to regain buy-in for the project, and the project team have been asked to submit a paper to the Vice Chancellors Group containing a proposal for an extended trial of STAR-Trak.
  • The EVAD team have been retrieving archived data and having to deal with corrupt and missing data. As they say, their experiences "illustrates the problems of dealing with data that’s collected but not looked at very often", which reminded me of things I've seen around digital preservation and 'data rot', the idea that stored data becomes less reliable as the ability to store it increases. Unfortunately it's only when we find a use for that data that we discover whether the data we think we've been collecting is actually there at all, and how intact it is.

One of the news highlights last week was the release of OpenURL data, and it's good to see that an initial exploration of that data has already happened. Tony Hirst shared how he's been using nothing more than the command line to explore OpenURL's hefty dataset. Mark van Harmelen was inspired by Tony's efforts to have a play with the data himself and selected Ruby as his data digging weapon of choice. What struck me as interesting was that both Tony's and Mark's curiosity was piqued by the fact that there was data on Mendeley (which makes me wonder how long it will be before one of the guys at Mendeley gets tempted into digging around in the data themselves). Also of interest to me was the fact that, because more than one person was delving into the data and publishing what they found, they could cross-check their findings against each other's results - very useful!

Tony Hirst has also been using a tool called Gource to create hypnotically watchable videos of OpenURL data visualisations [see post one and post two on Tony's blog for further information]. It certainly puts a new spin on the 'let a thousand flowers bloom' phrase that I hear so often in the world of open data.

A couple of other highlights from the last week of JISC AD project blogs:

  • The AGtivity project published a 'recipe' for producing a basic activity diary report. They also shared their thoughts on users, serendipity and use cases.
  • The AEIOU project reported on the suggestions which came out of their first focus group.[...]