Planet Code4Lib

Code4Lib: Seeking a Local D.C. Host

Mon, 22 Jan 2018 21:02:16 +0000

Looking for any D.C. locals who could host a guest for the 2018 C4L. I'll be traveling from Philadelphia by car and am hoping to bike or take the Metro to the Omni Shoreham Hotel. I have a potential registration that could be transferred to me if I can find boarding. Our travel budget is tight, so I need to reduce costs as much as possible. Thanks all!

Lucidworks: Keeping Retail Sites Up 24x7x365

Mon, 22 Jan 2018 19:21:48 +0000

In a global economy there is no downtime. Even for national or local retailers, downtime has become a thing of the past. People buy at odd hours, and customers who have a bad experience have other options that are up 24x7. Search is the core of the Internet, and even more so the core of how people buy things on the Internet.

24x7 Core Infrastructure

Despite this, many e-commerce platforms and even dedicated search solutions treat availability as an afterthought. One well-known search solution actually requires you to store everything on, and scale, a NAS in order to achieve availability. Not only will that not scale, but it isn't very dependable. Lucidworks Fusion is built on a Solr core architecture. This architecture is tried and true, and is the same general approach the Internet giants use to scale to great size while maintaining what used to be uncommon uptime. Meanwhile, the world is becoming more volatile. Whether it is climate change making weather patterns more erratic, violence around the world, or just plain old fiber cuts, you need to make sure you can handle a data center or cloud "Availability Zone" going down. This capability goes by many names, such as Disaster Recovery, WAN replication, and CDCR. The bottom line is you need to stay up, no matter what!

24x7x365 Analytics

Recently everyone is getting into the personalization and analytics/data science business. Supposedly you profile your customers, send the data off to a team of data scientists who load it into notebooks, and they send you back something actionable. In practice, you mostly hear crickets. With Fusion Insights you can see this kind of customer profile information out of the box, in real time, whenever you want. Combined with our advanced AI technology, you can also automate most of what you would otherwise do with this data by hand. From promotions to recommendations, you can automatically match users with exactly what they want. And yes, if you want you can just pull the user events into plain old Excel (or your favorite analytics tool) or query them with plain old SQL.

24x7x365 Updates

Stuff is happening; inventory is being received. You need a system that doesn't need to go down for batch updates, and an architecture that can be updated in real time. If you have other systems that operate in batch, you need to be able to share the load and get it done as soon as humanly possible. If not, you're losing real money. Fusion's Solr architecture is architected like the Internet: it can take streams of data at real-time speeds and make them readily available to your customers. Meanwhile, Fusion's ingestion architecture can take and transform data using distributed computing technology, so that as many nodes as possible are involved in getting the work done. This means your data is updated and ready as fast as you need it to be.

24x7x365 Changes

Data sources change, customers change, and therefore systems change. A modern retailer is constantly tweaking its algorithms for product search relevance, customer personalization, and everything in between. A modern e-commerce search solution needs to be open to change at any time. Fusion's Query Workbench lets your search developers see what they're doing and test it. Fusion's Index Workbench even lets you change the way you import data and see what those changes will mean before they go live. Fusion's Experiments engine allows you to do real A/B testing, letting you see which version of a query or customer-targeting method yields more sales.

24x7x365 Search UI Development

Supposedly, every time you want to make a change you're supposed to have a JavaScript developer wind through a mountain of code and add some new functionality (type-ahead, recommendations, promotions, whatever). This is the way we've always done it in the Interweb era. However, it is a lot slower than the client-server days, when you could drag and drop in a WYSIWYG interface. Besides, surely someone has developed nearly everything we can think of b[...]

Evergreen ILS: 2018 Evergreen Conference Schedule is Now Up!

Mon, 22 Jan 2018 17:25:38 +0000

Thank you to all the presenters who volunteered for this year’s Evergreen International Conference!

This year, we are using an event scheduling tool, Sched, for the conference schedule. This online tool will allow you to create your own unique conference schedule once you register for the conference and create your account. You can access the schedule in several ways:

- when you register for the conference through Eventbrite
- on the website, by doing an event search.

Thanks, too, to all of the members of this year’s Programming subcommittee:
Garry Collum, Kenton County Public Library
Terran McCanna, PINES
Debbie Luchenbill, MOBIUS
Jessica Woolford, Bibliomation

Debbie Luchenbill
Maegan Bragg
Donna Bacon
The whole 2018 Conference Planning Crew

Open Knowledge Foundation: Announcing the Open Data Day 2018 mini-grants scheme

Mon, 22 Jan 2018 09:14:53 +0000

If data is freed into the open, but no one uses it, can we consider it open data? This is one of the questions we need to ask ourselves if we want to promote data use. And what better day to promote data use than Open Data Day (ODD)?

So what is Open Data Day? ODD is the yearly event where we gather to reach out to new people and build new solutions to issues in our communities using open data. 2018 marks the 8th edition of ODD. In these eight years, the community has grown and evolved greatly. Last year we registered more than 300 events around the world! To make sure some of those events had everything they needed to be great for their communities, we had the support of Hivos, Article19, SPARC and Hewlett to provide mini-grants for their organizing. This effort resulted in more than 200 applications for mini-grants that followed one of the four tracks we supported last year. This year we want to have an even bigger pool of events (we already have almost 40 events registered). To achieve this, we have partnered again with Hivos to support Follow the Money and Open Contracting events, and with SPARC to support Open Science and Open Research Data events, and we are introducing a new, exciting partnership with Mapbox to support events with Open Mapping at their core.

One of last year's funded events in Ghana: "Following the money: Tracking money flows on development projects in the Accra Metropolis"

Cool, so what are the mini-grants? A mini-grant is a small fund of between $200 and $400 for groups to organize Open Data Day events. Since last year, we have given these grants to events that focus on specific issues around open data. This year, as we have already mentioned, we have three tracks: Follow the Money and Open Contracting; Open Science and Open Research Data; and Open Mapping.

There are some important things to be aware of. For all grants: we cannot fund government applications, whether national or local, since we support civil society actions. We encourage governments to find their local groups and engage with them! For Tracking public money flows: groups from developing countries will have priority. Event organisers can only apply once and for just one category, so choose well.

What is the timeline for the mini-grants? Applications are open starting today (January 22nd) through Sunday, February 4th 2018, and the selected grantees will be announced on Monday, February 19th 2018. However, it is important to note that all payments will be made to the teams after ODD, when they submit their blog reports and a copy of their expenses. If you need to have the payment processed before March 3, we will consider it on a case-by-case basis.

Need some guidance on how to organise Open Data Day events? Check our Open Data Day Organiser's Guide. Need some inspiration for your Open Data Day events? The OKI staff curated some ideas for you! Are you all set? If you are all set to organize your ODD event, apply for a mini-grant HERE. [...]

DuraSpace News: VIVO Updates, Jan 22 — Slack, Zoom, New Meeting Times, Webinars

Mon, 22 Jan 2018 00:00:00 +0000

From Mike Conlon, VIVO Project Director

Slack and Zoom.  With the new year come new technologies for use by the VIVO project.

David Rosenthal: Web Advertising and the Shark, revisited (and updated)

Sun, 21 Jan 2018 16:26:01 +0000

There's a lot to add to Has Web Advertising Jumped The Shark? (which is a violation of Betteridge's Law). Follow me below the fold for some of it.

First, I should acknowledge that, as usual, Maciej Cegłowski was ahead of the game. He spotted this more than two years ago and described it in The Advertising Bubble, based on a talk he gave in Sydney. The short version is: "There's an ad bubble. It's gonna blow."

(Image: Money flows in ad ecosystem)

The longer version is worth reading, but here is a taste:

"Right now, all the ad profits flow into the pockets of a few companies like Facebook, Yahoo, and Google. ... You'll notice that the incoming and outgoing arrows in this diagram aren't equal. There's more money being made from advertising than consumers are putting in. The balance comes out of the pockets of investors, who are all gambling that their pet company or technology will come out a winner. They provide a massive subsidy to the adtech sector. ... The only way to make the arrows balance at this point will be to divert more of each consumer dollar into advertising (raise the ad tax), or persuade people to buy more stuff. ... The problem is not that these companies will fail (may they all die in agony), but that the survivors will take desperate measures to stay alive as the failure spiral tightens. ... The only way I see to avert disaster is to reduce the number of entities in the swamp and find a way back to the status quo ante, preferably through onerous regulation. But nobody will consider this."

(Images: What Doc Searls Saw; What Ev Williams Saw)

Cegłowski was right that things would get bad. Last December Doc Searls, in After Peak Marketing, reported on the ads he and Ev Williams saw on Facebook when they read this post from one Mark Zuckerberg:

"Of all the content on Facebook, more than 99% of what people see is authentic. Only a very small amount is fake news and hoaxes. The hoaxes that do exist are not limited to one partisan view, or even to politics. Overall, this makes it extremely unlikely hoaxes changed the outcome of this election in one direction or the other."

Searls points out that, despite Zuckerberg's "99% authentic" claim, all four ads are flat-out frauds, in up to four ways apiece:

- All are lies (Tiger isn't gone from golf, Trump isn't disqualified, Kaepernick is still with the Niners, Tom Brady is still playing), violating truth-in-advertising law.
- They were surely not placed by ESPN and CNN. This is fraud.
- All four of them violate copyright or trademark laws by using another company's name or logo. (One falsely uses another's logo. Three falsely use another company's Web address.)
- All four stories are bait-and-switch scams, which are also illegal. (Both of mine were actually ads for diet supplements.)

Mark Zuckerberg announced changes to Facebook's News Feed to de-prioritize paid content, but Roger McNamee is skeptical of the effect: "Zuckerberg's announcement on Wednesday that he would be changing the Facebook News Feed to make it promote 'meaningful interactions' does little to address the concerns I have with the platform." So am I. Note that the changes "will de-prioritize videos, photos, and posts shared by businesses and media outlets, which Zuckerberg dubbed 'public content', in favor of content produced by a user's friends and family." They don't address the ads that Searls and Williams saw. But they do have the effect of decreasing traffic to publishers' content: "Publishers, on the other hand, were generally freaked out. Many have spent the past 5 years or so desperately trying to 'play the Facebook game.' And, for many, it gave them a decent boost in traffic (if not much revenue). But, in the process, they proceeded to lose their direct connection to many readers. People coming to news sites from Facebook don't tend to be loyal readers. They're drive-bys." And thus they divert advertising dollars to Facebook from other sites. The other sites have been[...]

Karen G. Schneider: Keeping Council

Sat, 20 Jan 2018 15:52:25 +0000

Editorial note: Over half of this post was composed in July 2017. At the time, this post could have been seen as politically neutral (where ALA is the political landscape I'm referring to) but tilted toward change and reform. Since then, Events Have Transpired. I revised this post in November, but at the time hesitated to post it because Events Were Still Transpiring. Today, in January 2018, I believe even more strongly in what I write here, but take note that the post didn't have a hidden agenda when I wrote it, and, except where noted, it still reflects my thoughts from last July, regardless of ensuing events. My agendas tend to be fairly straightforward. — KGS

Original Post, in which Councilors are Urged to Council

Edits in 2018 noted with bolding. As of July 2017, I am back on ALA Council for my fifth (non-consecutive) term since joining the American Library Association in 1991. In June I attended Council Orientation, and though it was excellent–the whole idea that Councilors would benefit from an introduction to the process is a beneficial concept that emerged over the last two decades–it did make me reflect on what I would add if there had been a follow-on conversation with sitting Councilors called "sharing the wisdom." I was particularly alerted to that by comments during Orientation which pointed up a traditional view of the Council process where ALA's largest governing body is largely inactive for over 350 days a year, only rousing when we prepare to meet face to face. Take or leave what I say here, or boldly contradict me, but it does come from an abundance of experience.

You are a Councilor year-round

Most newly-elected Councilors "take their seats" immediately after the annual conference following their election — a factoid with significance. Council, as a body, struggles with being a year-round entity that takes action twice a year during highly-condensed meetings during a conference with many other things happening. I have written about this before, in a dryly wonky post from 2012 that also addresses Council's composition and the role of chapters. I proposed that Council meet four times a year, in a solstice-and-equinox model. Two of those meetings (the "solstice" meetings) could be online. (As far back as 2007 I was hinting around about the overhead and carbon footprint of Midwinter.) I doubt Midwinter will go to an online format even within the next decade–it's a moneymaker for ALA, if less so than before, and ALA's change cycle is glacial–but the proposal was intended to get people thinking about how Council does, and doesn't, operate. In lieu of any serious reconsideration of Council, here are some thoughts.

First, think of yourself as a year-round Councilor, even if you do not represent a constituency such as a state chapter or a division that meets and takes action outside of ALA. Have at least a passing familiarity with the ALA Policy Manual. Bookmark it and be prepared to reference it. Get familiar with ALA's financial model through the videos that explain things such as the operating agreement. Read and learn about ALA. Share news. Read the reports shared on the list, and post your thoughts and your questions. Think critically about what you're reading. It's possible to love your Association, believe with your heart that it has a bright future, and still raise your eyebrows about pat responses to budget questions, reassurances that membership figures and publishing revenue will rebound, and glib responses about the value of units such as the Planning and Budget Assembly.

Come to Council prepared. Read everything you can in advance, speak with other Councilors, and apply solid reflection, and research if needed, before you finish packing for your trip. Preparation requires an awareness that you will be deluged with reading just as you are struggling to button up work at your library and pr[...]

Alf Eaton, Alf: Indexing Semantic Scholar's Open Research Corpus in Elasticsearch

Sat, 20 Jan 2018 07:57:26 +0000

Semantic Scholar publishes an Open Research Corpus dataset, which currently contains metadata for around 20 million research papers published since 1991.

  1. Create a DigitalOcean droplet using a "one-click apps" image for Docker on Ubuntu (3GB RAM, $15/month) and attach a 200GB data volume ($20/month).
  2. SSH into the instance and start an Elasticsearch cluster running in Docker.
  3. Install esbulk:
     VERSION=0.4.8; curl -L${VERSION}/esbulk_${VERSION}_amd64.deb -o esbulk.deb && dpkg -i esbulk.deb && rm esbulk.deb
  4. Fetch, unzip and import the Open Research Corpus dataset (inside the zip archive is a license.txt file and a gzipped, newline-delimited JSON file):
     VERSION=2017-10-30; curl -L${VERSION}/papers-${VERSION}.zip -o && unzip && rm && esbulk -index scholar -type paper -id id -verbose -purge -z < papers-${VERSION}.json.gz && rm papers-${VERSION}.json.gz
  5. While importing, index statistics can be viewed at http://localhost:9200/scholar/_stats?pretty
  6. After indexing, optimise the Elasticsearch index by merging into a single segment: curl -XPOST 'http://localhost:9200/scholar/_forcemerge?max_num_segments=1'
  7. (recommended) Use ufw to prevent external access to the Elasticsearch service and put a web service (e.g. an Express app) in front of it, mapping routes to Elasticsearch queries.
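The query side of step 7 can be sketched without committing to a particular web framework: the front-end route just builds a search body and forwards it to the firewalled Elasticsearch instance. A minimal sketch in Python, assuming the `scholar` index from step 4 and field names (`title`, `abstract`, `year`) taken from the corpus description rather than its exact schema:

```python
import json

def build_search_body(text, size=10, from_year=None):
    """Build an Elasticsearch query body for the 'scholar' index.

    Matches the input text against title and abstract fields,
    optionally filtering by publication year.
    """
    query = {
        "bool": {
            "must": [
                {"multi_match": {"query": text, "fields": ["title", "abstract"]}}
            ]
        }
    }
    if from_year is not None:
        query["bool"]["filter"] = [{"range": {"year": {"gte": from_year}}}]
    return {"size": size, "query": query}

# A web route would POST this body to http://localhost:9200/scholar/_search,
# keeping port 9200 itself blocked from external access with ufw.
body = build_search_body("citation network analysis", from_year=2010)
print(json.dumps(body, indent=2))
```

Keeping the body-building pure (no network calls) also makes the route easy to test independently of a running cluster.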

Alf Eaton, Alf: Formatting a LaCie external drive for Time Machine

Sat, 20 Jan 2018 07:56:47 +0000

  1. Plug in the drive and open Disk Utility.
  2. If only the 256MB setup volume is visible rather than the whole 2TB drive, select View > Show All Devices.
  3. Select the 2TB device and press "Erase".
  4. Choose a name, select "Mac OS Extended (Journaled, Encrypted)" (Time Machine doesn’t support APFS as it needs hard links to directories) and "GUID Partition Map" (Time Machine prefers GUID Partition Map, MBR is for Windows, Apple Partition Map is for old PowerPC Macs), then press "Erase".
  5. When Time Machine pops up, check "Encrypt backups" and accept the dialog.

Eric Hellman: GitHub Giveth; Wikipedia Taketh Away

Fri, 19 Jan 2018 16:06:45 +0000

One of the joys of administering Free-Programming-Books, the second most popular repo on GitHub, has been accepting pull requests (edits) from new contributors, including contributors who have never contributed to an open source project before. I always say thank you. I imagine that these contributors might go on to use what they've learned to contribute to other projects, and perhaps to start their own projects. We have some hoops to jump through: there's a linter run by Travis CI that demands alphabetical order, even for Cyrillic and CJK names whose "alphabetical" order I'm not entirely sure of. But I imagine that new and old contributors get some satisfaction when their contribution gets "merged into master", no matter how much that sounds like yielding to the hierarchy.

Contributing to Wikipedia is a different experience. Wikipedia accepts whatever edits you push to it, unless the topic has been locked down. No one says thank you. It's a rush to see your edit live on the most consulted and trusted site on the internet. But then someone comes and reverts or edits your edit. And instantly the emotional state of a new Wikipedia editor changes from enthusiasm  to bitter disappointment and annoyance at the legalistic (and typically white male) Wikipedian.

Psychologists know that rewards are more effective motivators than punishments, so maybe the workflow used on GitHub is kinder than that used on Wikipedia. Vandalism and spam are difficult problems for truly open systems, and contention is even harder. Wikipedia wastes a lot of energy on contentious issues. The GitHub workflow simplifies the avoidance of contention and vandalism, but sacrifices a bit of openness by depending heavily on the humans with merge privileges. There are still problems: every programmer has had the horrible experience of a harsh or petty code review, but at least there are tools that facilitate and document discussion.

The saving grace of the GitHub workflow is that if the maintainers of a repo are mean or incompetent, you can just fork the repo and try to do better. In Wikipedia, controversy gets pushed up a hierarchy of privileged clerics. The Wikipedia clergy does an amazingly good job, considering what they're up against, and their workings are in the open for the most part, but the lowly wiki-parishioner rarely experiences joy when they get involved. In principle, you can fork Wikipedia, but what good would it do you?

The miracle of Wikipedia has taught us a lot; as we struggle to modernize our society's methods of establishing truth, we need to also learn from GitHub.

Update 1/19: It seems this got picked up by Hacker News. The comment by @avian is worth noting. The flip side of my post is that Wikipedia offers immediate gratification, while a poorly administered GitHub repo can let contributions languish forever, resulting in frustration and disappointment. That's something repo admins need to learn from Wikipedia!

District Dispatch: Mother Teresa and Margaret Sanger do not mix

Fri, 19 Jan 2018 14:56:28 +0000

The American Library Association gets hundreds of calls a year from libraries tackling book challenges and other forms of censorship. Heck, we even celebrate with Banned Books Week. Our Office for Intellectual Freedom (OIF) takes these calls and advises librarians on their options. One library director in the small town of Trumbull, Connecticut, called OIF when people objected to a painting on display at the Trumbull Public Library. It was part of a series of works by Robin Morris called the Great Minds Collection. Richard Resnick, a citizen of Trumbull, commissioned the works and gave the collection of 33 artworks to the library to exhibit. One painting in the collection, Onward We March, depicts several famous women at a rally. Mother Teresa is there, representing the Missionaries of Charity, along with Gloria Steinem, Clara Barton, Susan B. Anthony and others, including Margaret Sanger, the founder of Planned Parenthood.

The citizen complained about the juxtaposition of Mother Teresa and Margaret Sanger. Their argument was that Mother Teresa would never march with the likes of Sanger; it was offensive. The Missionaries of Charity, the organization founded by Mother Teresa, said the painting had to be removed because they held intellectual property rights to the image of Saint Teresa. The Library Board of Trustees stood firm and maintained that the painting should remain. They noted the library's support of free expression and diversity of opinion. They also noted that the copyright infringement claim seemed dubious. Was it just an excuse for removing the painting?

Enter the attorneys, religious leaders, the ACLU, and First Selectman Tim Herbst. Herbst, who has political ambitions, struggled with a decision he thought was his to make. (The Library Board of Trustees thought it was theirs.) Despite the bogus copyright claims, the story about potential liability for the town was a convenient excuse to remove the painting from the library. Against the decision of the Library Board of Trustees to keep the painting on display, Herbst removed it from the exhibit, saying: "After learning that the Trumbull Library Board did not have the properly written indemnification for the display of privately owned artwork in the town's library, and also being alerted to allegations of copyright infringement and unlawful use of Mother Teresa's image, upon the advice of legal counsel, I can see no other respectful and responsible alternative than to temporarily suspend the display until the proper agreements and legal assurances."

Less than a week later, the painting was back up after Richard Resnick, against the advice of his attorney, signed a document stating that he would take responsibility if the library or town was sued. Herbst announced his decision to restore the painting at a town library meeting. While he was giving his remarks, there was a loud commotion in the library room next door. When people ran to look at what was happening, they saw that a woman had defaced the painting, using a black marker to cross out the face of Margaret Sanger. The woman fled the scene. Police were called and people were questioned, but the culprit was never found. Those at the meeting agreed that, in spite of their differences of opinion, none of them wanted the painting vandalized.

Since then, the library has tried to put the situation behind it. The Great Minds Collection is still being exhibited, with the restored Onward We March painting alongside it. Robin Morris's art has grown in recognition and popularity; images of her work are now available on cups, posters, shirts and shopping bags. The post Mother Teresa and Margaret Sanger do not mix appeared first on District Dispatch.[...]

Alf Eaton, Alf: JANICE: a prototype re-implementation of JANE, using the Semantic Scholar Open Research Corpus

Fri, 19 Jan 2018 13:50:37 +0000


For many years, JANE has provided a free service to users who are looking to find experts on a topic (usually to invite them as peer reviewers) or to identify a suitable journal for manuscript submission.

The source code for JANE has recently been published, and the recommendation process is described in a 2008 paper: essentially the algorithm takes some input text (title and/or abstract), queries a Lucene index of PubMed metadata to find similar papers (with some filters for recency, article type and journal quality), then calculates a score for each author or journal by summing up the relevance scores over the most similar 50 articles.

JANE produces a list of the most relevant authors of similar work, and does some extra parsing to extract their published email addresses. As PubMed doesn't disambiguate authors (apart from the relatively recent inclusion of ORCID identifiers), the name is used as the key for each author, so it's possible (but unusual) that two authors with the same name could be combined in the search results.

Semantic Scholar

The latest release of Semantic Scholar's Open Research Corpus contains metadata for just over 20 million journal articles published since 1991, covering computer science and biomedicine. The metadata for each paper includes title, abstract, year of publication, authors, citations (papers that cited this paper) and references (papers that were cited by this paper). Importantly, authors and papers are each given a unique ID.


JANICE is a prototype re-implementation of the main features of JANE: taking input text and finding similar authors or journals. It runs a More Like This query with the input text against an Elasticsearch index of the Open Research Corpus data, retrieves the 100 most similar papers (optionally filtered by publication date), then calculates a score for each author or journal by summing up their relevance scores.
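The scoring step described above reduces to a simple aggregation over the retrieved hits. A sketch of that aggregation, assuming a simplified hit shape (each hit already flattened to a relevance `_score`, a list of author names, and a journal title; real Elasticsearch hits nest these under `_source`):

```python
from collections import defaultdict

def score_candidates(hits, key="authors", top_n=10):
    """Sum relevance scores per author (or journal) across similar papers.

    hits: iterable of dicts like {"_score": float, "authors": [...], "journal": str}
    key: "authors" to rank candidate reviewers, "journal" to rank venues.
    """
    totals = defaultdict(float)
    for hit in hits:
        values = hit[key] if key == "authors" else [hit[key]]
        for value in values:
            totals[value] += hit["_score"]
    # Highest aggregate relevance first
    return sorted(totals.items(), key=lambda kv: -kv[1])[:top_n]

hits = [
    {"_score": 2.5, "authors": ["A. Author", "B. Author"], "journal": "J1"},
    {"_score": 1.0, "authors": ["B. Author"], "journal": "J2"},
]
print(score_candidates(hits))  # B. Author outranks A. Author (3.5 vs 2.5)
```

Because the corpus assigns unique IDs to authors, a production version would aggregate on author ID rather than name, avoiding the name-collision problem JANE has with PubMed.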

The results of this algorithm are promising: using open peer review data from one of PeerJ's published articles, JANICE returned a list of suggested reviewers containing 2 of the 3 actual reviewers within the top 10; the third reviewer was missing from the list only because, although they had authored a relevant paper, that paper happened not to use the same keywords as the input text (using word vectors would help here).


This prototype was built as part of the development of xpub, a journal platform produced by the Collaborative Knowledge Foundation and partner organisations.

Ed Summers: Desire

Fri, 19 Jan 2018 05:00:00 +0000

I recently reviewed an article draft that some EDGI folks were putting together that examines their work to date. The draft is quite useful if you are interested in how EDGI's work to archive potentially at-risk environmental scientific data fits in with related efforts such as Data Rescue, Data Refuge and Data Together. The article is also quite interesting because it positions their work by thinking of it in terms of an emerging framework for environmental data justice. Environmental data justice is a relatively new idea that sits at the intersection of environmental justice and critical data studies (note I didn't link to the Wikipedia entry because it needs quite a bit of improvement IMHO). I think it could be useful for ideas of environmental data justice to also draw on a long strand of thinking about archives as the embodiment-of and a vehicle-for social justice (Punzalan & Caswell, 2016), which goes back some 40 years. I think it could also be useful to think of it in terms of emerging ideas around data activism that are popping up in activities such as the Responsible Data Forum.

At any rate, this post wasn't actually meant to be about any of that; it was just meant to be a note to myself about a reference in the EDGI draft to a piece by Eve Tuck entitled Suspending Damage: A Letter to Communities (Tuck, 2009). In this open letter, published in the Harvard Educational Review, Tuck calls on researchers to put a moratorium on what she calls damage-centered research:

"In damage-centered research, one of the major activities is to document pain or loss in an individual, community, or tribe. Though connected to deficit models—frameworks that emphasize what a particular student, family, or community is lacking to explain underachievement or failure–damage-centered research is distinct in being more socially and historically situated. It looks to historical exploitation, domination, and colonization to explain contemporary brokenness, such as poverty, poor health, and low literacy. Common sense tells us this is a good thing, but the danger in damage-centered research is that it is a pathologizing approach in which the oppression singularly defines a community. Here's a more applied definition of damage-centered research: research that operates, even benevolently, from a theory of change that establishes harm or injury in order to achieve reparation."

Instead Tuck wants to re-orient research around a theory of change that documents desire instead of damage:

"As I will explore, desire-based research frameworks are concerned with understanding complexity, contradiction, and the self-determination of lived lives … desire-based frameworks defy the lure to serve as 'advertisements for power' by documenting not only the painful elements of social realities but also the wisdom and hope. Such an axiology is intent on depathologizing the experiences of dispossessed and disenfranchised communities so that people are seen as more than broken and conquered. This is to say that even when communities are broken and conquered, they are so much more than that; so much more that this incomplete story is an act of aggression."

Tuck points out that she isn't suggesting that desire-based research should replace damage-centered research, but that instead it is part of an epistemological shift: how knowledge is generated and understood, or how we know what we know. This is a subtle point, but Tuck does a masterful job of providing real examples in this piece, so it's well worth a read if this sounds at all interesting. I was kind of surprised that Tuck draws on assemblage theory and the work of Deleuze & Guattari (1987) in developing this idea of desire-based research: Poststructuralist theoris[...]

Library of Congress: The Signal: January Innovator-in-Residence Update: Experiments with Jer Thorp

Thu, 18 Jan 2018 21:11:00 +0000

We’ve been delighted to have Library of Congress Innovator-in-Residence Jer Thorp with us since October. During the first three months of his residency he has connected with staff, visited collections, and explored forms of data to make better sense of the inner workings of the Library. Jer has been weaving together those threads with experiments and other works in progress.  Turning the Process on its Ear  Jer has made a record of his activity from the start via interviews with Library staff and from within the Library of Congress main reading room and stacks, while reflecting on what he has encountered. The result is the podcast “Artist in the Archive.” It’s on a roll with two episodes so far. The podcast follows a format that includes detailed discussions with National Digital Initiatives Chief Kate Zwaard; Curator of the Jay I. Kislak Collection of the Archaeology and History of the Early Americas, John Hessler; and Director for Acquisitions and Bibliographic Access, Beacher Wiggins. These longer discussions are framed by segments with Library of Congress curators and archivists such as Meg McAleer and Todd Harvey sharing vignettes of unique collections, from Sputnik’s launch to the folk revival in New York City, bringing the perspectives of the past to life. Listen to the first two episodes of “Artist in the Archive” and share your thoughts and questions with Jer. You can also find transcripts for episode one and episode two, as well as finding aids with images of objects described in episodes one and two.  Arranging Appellations  Sometimes the language of the Library can be on the tip of your tongue; other times, you’d need a glossary to define the experience. For example, are you on the hunt for Hapax legomenon? Misplaced your best Volvelle? Learn more about unique and obscure terminology from the library world in this crowdsourced glossary Jer compiled in October. Finding your favorite library term missing? Let Jer know in the comments or on Twitter. 
Experiments and Exploring Collections  In October we shared details of Jer’s Library of Names app here on the Signal. Built with the name authority files from the Library of Congress MARC records, the Library of Names carves out the first names of authors at five-year intervals; exploring with the app allows one to imagine the mix of creators across time. “A person of encyclopedic learning” according to Merriam-Webster; a polymath is an individual whose expertise spans diverse subject matter or disciplines. If you’ve listened to episode two of Jer’s podcast, you’ll have learned that the subject matter expertise of creators is captured in the name authority field in MARC records. What can these records tell us about the careers of authors? Armed with this data and a handful of questions, it is possible to probe the edges and overlaps of expertise; such as the painter-pianist-composer Ann Wyeth McCoy (sister of artist Andrew Wyeth). While gathering stories from within collections here at the Library of Congress, Jer has also been making queries within the 25 million MARC records. Jer created a network map by taking approximately 9 million name authority files from the MARC records as a starting exercise. Next, he returned to those same people and calculated their movement across the map. He shared this exercise with reflections on its promise and potential problems with this approach on Twitter, along with the code. See this Twitter thread for more details and examples, such as a poet-diplomat-composer (Oswald von Wolkenstein) and a soldier-shoemaker-postmaster-teacher-surveyor-civil engineer-photographer-deacon in one Samuel Chase Hodgman.    Polymaths mapped from name authority fil[...]

Terry Reese: MarcEdit 7: The great [Normalization] escape

Thu, 18 Jan 2018 19:12:09 +0000

working out some thoughts here — this will change as I continue working through some of these issues. If you follow MarcEdit development, you’ll know that last week, I posted a question in a number of venues about the effects of Unicode Normalization and its potential impacts for our community.  I’ve been doing a little bit of work in MarcEdit, having a number of discussions with vendors and folks that work with normalizations regularly – and have started to come up with a plan.  But I think there is a teaching opportunity here as well, an opportunity to discuss how we find ourselves having to deal with this particular problem, where the issue is rooted, and the impacts that I see right now in ILS systems and for users of tools like MarcEdit.  This isn’t going to be an exhaustive discussion, but hopefully it helps folks understand a little bit more what’s going on, and why this needs to be addressed. Background So, let’s start at the beginning.  What exactly are Unicode normalizations, and why is this something that we even need to care about…. Unicode Normalizations are, in my opinion, largely an artifact of our (the computing industry’s) transition from a non-Unicode world to Unicode, especially in the way that the extended Latin character sets ended up being supported. So, let’s talk about character sets and code pages.  Character sets define the language that is utilized to represent a specific set of data.  Within the operating system and programming languages, these character sets are represented as code pages. For example, Windows provides support for a long list of code pages. Essentially, code pages are lists of numeric values that tell the computer how to map a representation of a letter to a specific byte.  So, let’s use a simple example, “A”.  In ASCII and UTF8 (and other) code pages, the A that we read is actually represented as a byte of data.  This byte is 0x41.  
When the browser (or word processor) sees this value, it checks the value against the defined code page, and then provides the appropriate value from the font being utilized.  This is why, in some fonts, some characters will be represented as a “?” or a block.  These represent bytes or byte sequences that may (or may not) be defined within the code page, but are not available in the font. Prior to Unicode implementations, most languages had their own code pages.  In Windows, the U.S. English code page would default to 1252.  In Europe, if ISO-8859 was utilized, the code page would default to 28591.  In China, the code page could be one of many.  Maybe “Big-5”, or code page 950, or what is referred to as Simplified Chinese, or code page 936.  The gist here is that prior to the Unicode standard, languages were represented by different values, and the keyboards, fonts, and systems would take the information about a specific code page and interpret the data so that it could be read.  Today, this is why catalogers may still encounter confusion if they get records from Asia, where the vendor or organization makes use of “Big-5” as the encoding.  When they open the data in their catalog (or editor), the data will be jumbled.  This is because MARC doesn’t include information about the record code page – rather, it defines values as Unicode, or something else.  So, it is on catalogers and systems to know the character set being utilized, and to utilize tools to convert the byte points from a character encoding that they might not be able to use to one that is friendly for their systems. So, let’s get back to this idea of Norm[...]
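The composed-versus-decomposed distinction at the heart of Unicode normalization is easy to see in a few lines of Python; this is a standalone sketch, not MarcEdit code:

```python
import unicodedata

# "é" can be stored two different ways: as one precomposed code point,
# or as a plain "e" followed by a combining acute accent.
composed = "\u00e9"     # é, the NFC (composed) form
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT, the NFD (decomposed) form

# They display identically, but they are different code point sequences...
print(composed == decomposed)  # False

# ...until both are normalized to the same form:
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Two records can therefore carry visually identical headings that fail a byte-for-byte comparison, which is exactly the kind of matching and indexing problem systems have to reconcile.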

Brown University Library Digital Technologies Projects: Python/Django warnings

Thu, 18 Jan 2018 18:35:51 +0000

I recently updated a Django project from 1.8 to 1.11. In the process, I started turning warnings into errors. The Django docs recommend resolving any deprecation warnings with the current version before upgrading to a new version of Django. In this case, I didn’t start my upgrade work by resolving warnings, but I did run the tests with warnings enabled for part of the process.

Here’s how to enable all warnings when you’re running your tests:

  1. From the CLI
    • use -Werror to raise Exceptions for all warnings
    • use -Wall to print all warnings
  2. In the code
    • import warnings; warnings.filterwarnings('error') – raise exceptions on all warnings
    • import warnings; warnings.filterwarnings('always') – print all warnings

If a project runs with no warnings on a Django LTS release, it’ll (generally) run on the next LTS release as well. This is because Django intentionally tries to keep compatibility shims until after an LTS release, so that third-party applications can more easily support multiple LTS releases.

Enabling warnings is nice because you see warnings from python or other packages, so you can address whatever problems they’re warning about, or at least know that they will be an issue in the future.

Jonathan Rochkind: attachment filename downloads in non-ascii encodings, ruby, s3

Thu, 18 Jan 2018 18:23:44 +0000

You tell the browser to force a download, and pick a filename for the browser to ‘save as’, with a Content-Disposition header that looks something like this: Content-Disposition: attachment; filename="filename.tiff" Depending on the browser, it might open up a ‘Save As’ dialog with that being the default, or might just go ahead and save to your filesystem with that name (Chrome, I think). If you’re having the user download from S3, you can deliver an S3 pre-signed URL that specifies this header — it can be a different filename than the actual S3 key, and even different for different users, for each pre-signed URL generated. What if the filename you want is not strictly ascii? You might just stick it in there in UTF-8, and it might work just fine with modern browsers — but I was doing it through the S3 content-disposition download, and it was resulting in S3 delivering an XML error message instead of the file, with the message “Header value cannot be represented using ISO-8859-1.response-content-disposition”. Indeed, my filename in this case happened to have a Φ (Greek phi) in it, and indeed this does not seem to exist as a codepoint in ISO-8859-1. (How do I know? In Ruby, try `"Φ".encode("ISO-8859-1")`.) ISO-8859-1 is perhaps the (standard? de facto?) default for HTTP headers, and it is what S3 expects. If it were unicode that could be trans-coded to ISO-8859-1, would S3 have done that for me? Not sure. But what’s the right way to do this?  
Googling/Stack-overflowing around, I got different answers, including “There’s no way to do this, HTTP headers have to be ascii (and/or ISO-8859-1)”, “Some modern browsers will be fine if you just deliver UTF-8 and change nothing else” [maybe so, but S3 was not], and a newer form that looks like filename*=UTF-8''#{uri-encoded utf8} [no double quotes allowed, even though they ordinarily are in a content-disposition filename] — but which will break older browsers (maybe just leading to them ignoring the filename rather than actually breaking hard?). The golden answer appears to be in this stackoverflow answer — you can provide a content-disposition header with both a filename=$ascii_filename (where $ascii_filename is ascii, or maybe can be ISO-8859-1?), followed by a filename*=UTF-8'' sub-header. Modern browsers will use the UTF-8 one, and older browsers will use the ascii one. At this point, are any of these “older browsers” still relevant? Don’t know, but why not do it right. Here’s how I do it in ruby, taking input and preparing a) a version that is straight ascii, replacing any non-ascii characters with _, and b) a version that is UTF-8, URI-encoded. ascii_filename = filename.encode("US-ASCII", undef: :replace, replace: "_") utf8_uri_encoded_filename = URI.encode(filename) something["Content-Disposition"] = "attachment; filename=\"#{ascii_filename}\"; filename*=UTF-8''#{utf8_uri_encoded_filename}" Seems to work. S3 doesn’t complain. I admit I haven’t actually tested this on an “older browser” (not sure how old one has to go, IE8?), but it does the right thing (includes the “Φ” in the filename) on every modern browser I tested on MacOS, Windows (including IE10 on Windows 7), and Linux.[...]
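For comparison, the same two-part header can be sketched in Python; the function name and the underscore replacement rule are mine, not from the post:

```python
from urllib.parse import quote

def content_disposition(filename):
    # Plain-ASCII fallback for older browsers: swap non-ASCII chars for "_".
    ascii_name = "".join(c if ord(c) < 128 else "_" for c in filename)
    # RFC 5987 form for modern browsers: percent-encoded UTF-8, no quotes.
    utf8_name = quote(filename, safe="")
    return f"attachment; filename=\"{ascii_name}\"; filename*=UTF-8''{utf8_name}"

print(content_disposition("Φ-report.tiff"))
# attachment; filename="_-report.tiff"; filename*=UTF-8''%CE%A6-report.tiff
```

As in the Ruby version, browsers that understand the `filename*` parameter prefer it, while everything else falls back to the plain `filename`.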

Evergreen ILS: Evergreen 3.0.3 and 2.12.9 released

Thu, 18 Jan 2018 14:43:52 +0000

The Evergreen community is pleased to announce two maintenance releases of Evergreen 3.0.3 and 2.12.9.

Evergreen 3.0.3 has the following changes improving on Evergreen 3.0.2:

  • Fixes several issues related to the display of located URIs and records with bib sources in search results.
  • Setting opac_visible to false for a copy location group now hides only the location group itself, rather than also hiding every single copy in the group.
  • Fixes a bug that prevented the copy editor from displaying the fine level and loan duration fields.
  • The “Edit Items” grid action in the Item Status interface will now open in the combined volume/copy editor in batch. This makes the behavior consistent with the “Edit Selected Items” grid action in the copy buckets interface.
  • Staff members are now required to choose a billing type when creating a bill on a user account.
  • The Web client now provides staff users with an alert and option to override when an item with the Lost and Paid status is checked in.
  • Fixes a bug where the Web client offline circ interface was not able to set its working location.
  • Fixes an issue that prevented the ADMIN_COPY_TAG permission from being granted.
  • The MARC editor in the Web staff client now presents bib sources in alphabetical order.
  • Both circulation and grocery bills are now printed when a staff user selects a patron account and clicks “Print Bills”.
  • Fixes an issue in the XUL serials interface that prevented the “Receive move/selected” action from succeeding.
  • Fixes a typo in the user password testing interface.

Please note that the upgrade script for 3.0.3 contains a post-transaction command to forcibly update the visibility attributes of all bibs that make use of Located URIs or bib sources. This script may take a while to run on large datasets. If it is running too long, it can be canceled, and administrators can use a psql command detailed in the Release Notes to perform the same action serially over time without blocking writes to bibs.

Evergreen 2.12.9 has a fix that installs Node.js from source, allowing the web staff client to build without failure.

Please visit the Evergreen downloads page to download the upgraded software and to read full release notes. Many thanks to everyone who contributed to the releases!

Open Knowledge Foundation: Publication: A Field Guide to “Fake News” and Other Information Disorders

Thu, 18 Jan 2018 10:41:29 +0000

This blog post has been reposted. Last week saw the launch of A Field Guide to “Fake News” and Other Information Disorders, a new free and open access resource to help students, journalists and researchers investigate misleading content, memes, trolling and other phenomena associated with recent debates around “fake news”. The field guide responds to an increasing demand for understanding the interplay between digital platforms, misleading information, propaganda and viral content practices, and their influence on politics and public life in democratic societies. It contains methods and recipes for tracing trolling practices, the publics and modes of circulation of viral news and memes online, and the commercial underpinnings of this content. The guide aims to be an accessible learning resource for digitally-savvy students, journalists and researchers interested in this topic. The guide is the first project of the Public Data Lab, a new interdisciplinary network to facilitate research, public engagement and debate around the future of the data society – which includes researchers from several universities in Europe, including King’s College London, Sciences Po Paris, Aalborg University in Copenhagen, Politecnico of Milano, INRIA, École Normale Supérieure of Lyon and the University of Amsterdam. It has been undertaken in collaboration with First Draft, an initiative dedicated to improving skills and standards in the reporting and sharing of information that emerges online, which is now based at the Shorenstein Center on Media, Politics, and Public Policy at the John F. Kennedy School of Government at Harvard University. Claire Wardle, who leads First Draft, comments on the release: “We are so excited to support this project as it provides journalists and students with concrete computational skills to investigate and map these networks of fabricated sites and accounts. 
Few people fully recognize that in order to understand the online disinformation ecosystem, we need to develop these computational mechanisms for monitoring this type of manipulation online. This project provides these skills and techniques in a wonderfully accessible way.” A number of universities and media organisations have been testing, using and exploring a first sample of the guide, which was released in April 2017. Earlier in the year, BuzzFeed News drew on several of the methods and datasets in the guide in order to investigate the advertising trackers used on “fake news” websites. The guide is freely available on the project website (direct PDF link here), as well as on Zenodo. It is released under a Creative Commons Attribution license to encourage readers to freely copy, translate, redistribute and reuse the book. A translation into Japanese is underway. All the assets necessary to translate and publish the guide in other languages are available on the Public Data Lab’s GitHub page. Further details about contributing researchers, institutions and collaborators are available on the website. The project is being launched at the Digital Methods Winter School 2018 organised by the Digital Methods Initiative at the University of Amsterdam, a year after we first started working on the project at the Winter School 2017. We are also in discussion with Sage about a book drawing on this project. [...]

DuraSpace News: DuraSpace Board of Directors Changes Leadership

Thu, 18 Jan 2018 00:00:00 +0000

As we begin a new year, DuraSpace welcomes new leadership to our Board of Directors. The Board helps set DuraSpace’s priorities to ensure that our digital heritage is widely discoverable and accessible over the long term with a community-based open source technology portfolio.

DuraSpace News: Telling VIVO Stories at The Marine Biological Laboratory Woods Hole Oceanographic Institution (MBLWHOI) with John Furfey

Thu, 18 Jan 2018 00:00:00 +0000

VIVO is member-supported, open source software and ontology for representing scholarship.

“Telling VIVO Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing VIVO implementation details for the VIVO community and beyond. The following interview includes personal observations that may not represent the opinions and views of the Marine Biological Laboratory / Woods Hole Oceanographic Institution Library (MBLWHOI) or the VIVO Project.

HangingTogether: NEW: The Realities of Research Data Management: Part Three Now Available!

Wed, 17 Jan 2018 21:43:51 +0000

A new year heralds a new RDM report! Check out Incentives for Building University RDM Services, the third report in OCLC Research’s four-part series exploring the realities of research data management. Our new report explores the range of incentives catalyzing university deployment of RDM services. Our findings in brief: RDM is not a fad, but instead a rational response by universities to powerful incentives originating from both internal and external sources. The Realities of Research Data Management, an OCLC Research project, explores the context and choices research universities face in building or acquiring RDM capacity. Findings are derived from detailed case studies of four research universities: University of Edinburgh, University of Illinois at Urbana-Champaign, Monash University, and Wageningen University and Research. Previous reports examined the RDM service space, and the scope of the RDM services deployed by our case study partners. Our final report will address sourcing and scaling choices in acquiring RDM capacity. Incentives for Building University RDM Services continues the report series by examining the factors that motivated our four case study universities to supply RDM services and infrastructure to their affiliated researchers. We identify four categories of incentives of particular importance to RDM decision-making: compliance with external data mandates; evolving scholarly norms around data management; institutional strategies related to researcher support; and researcher demand for data management support. Our case studies suggest that the mix of incentives motivating universities to act in regard to RDM differs from university to university. Incentives, ultimately, are local. RDM is both an opportunity and a challenge for many research universities. Moving beyond the recognition of RDM’s importance requires facing the realities of research data management. 
Each institution must shape its local RDM service offering by navigating several key inflection points: deciding to act, deciding what to do, and deciding how to do it. Our Realities of RDM report series examines these decisions in the context of the choices made by the case study partners. Visit the Realities of Research Data Management website to access all the reports, as well as other project outputs.     [...]

LITA: Jobs in Information Technology: January 17, 2018

Wed, 17 Jan 2018 19:25:30 +0000

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

City of El Segundo, Library Services Director, El Segundo, CA

New York University, Division of Libraries, Metadata Librarian, New York, NY

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Library of Congress: The Signal: Digital Scholarship Resource Guide: So now you have digital data… (part 3 of 7)

Wed, 17 Jan 2018 17:46:52 +0000

This is part three of our Digital Scholarship Research Guide created by Samantha Herron. See parts one about digital scholarship projects and two about how to create digital documents. So now you have digital data… Great! But what to do? Regardless of what your data are (sometimes it’s just pictures and documents and notes, sometimes it’s numbers and metadata), storage, organization, and management can get complicated. Here is an excellent resource list from the CUNY Digital Humanities Resource Guide that covers cloud storage, password management, note storage, calendar/contacts, task/to-do lists, citation/reference management, document annotation, backup, conferencing & recording, screencasts, posts, etc. From the above, I will highlight: Cloud-based secure file storage and sharing services like Google Drive and Dropbox. Both services offer some storage space free, but increased storage costs a monthly fee. With Dropbox, users can save a file to a folder on their computer, and access it on their phone or online. Dropbox folders can be collaborative, shared and synced. Google Drive is a web-based service, available to anyone with a Google account; any file can be uploaded, stored, and shared with others through Drive. Drive will also store Google Documents and Sheets that can be written in the browser and collaborated on in real time. Zotero, a citation management service. Zotero allows users to create and organize citations using collections and tags. Zotero can sense bibliographic information in the web browser, and add it to a library with the click of a button. It can generate citations, footnotes, endnotes, and in-text citations in any style, and can integrate with Microsoft Word. If you have a dataset: Here are some online courses from School of Data about how to extract, clean, and explore data. OpenRefine is one popular tool for working with and organizing data. It’s like a very fancy Excel sheet. [Screenshot of the OpenRefine tool.] 
Here is an introduction to OpenRefine from Owen Stephens, produced on behalf of the British Library in 2014. Programming Historian also has a tutorial for cleaning data with OpenRefine. Some computer-y basics A sophisticated text editing program is good to have. Unlike a word processor like Microsoft Word, text editors are used to edit plaintext–text without other formatting like font, size, page breaks, etc. Text editors are important for writing code and manipulating text. Your computer probably has one preloaded (e.g. Notepad on Windows computers), but there are more robust ones that can be downloaded for free, like Notepad++ for Windows, TextWrangler for Mac OS X, or Atom for either. The command line is a way of interacting with a computer program through text instructions (commands), instead of point-and-click GUIs (graphical user interfaces). For example, instead of clicking on your Documents folder and scrolling through to find a file, you can type text commands into a command prompt to do the same thing. Knowing the basics of the command line helps you understand how a computer thinks, and can be a good introduction to code-ish things for those who have little experience. This Command Line Crash Course from Learn Python the Hard Way gives a quick tutorial on how to use the command line to move through your computer’s file structure. Codecademy has free, interactive lessons in many different coding languages. Python seems to be the c[...]

LITA: #LITAchat – LITA at ALA Midwinter 2018

Wed, 17 Jan 2018 17:37:53 +0000

Attending the 2018 ALA Midwinter conference? Curious about what LITA is up to?

Join us on Friday, January 26, 1:00-2:00pm EST on Twitter to discuss and ask questions about the LITA events, activities, and more happening at this year’s 2018 ALA Midwinter Meeting in Denver, CO, February 9-13.

To participate, launch your favorite Twitter mobile app or web browser, search for the #LITAchat hashtag, and select “Latest” to follow along and reply to questions asked by the moderator or other participants. When replying to the discussion or asking questions, add or incorporate the hashtags #alamw18 and #litachat.

See you there!

District Dispatch: Bridging the Spectrum symposium at CUA/LIS highlights public policy directions in Washington

Wed, 17 Jan 2018 16:57:58 +0000

On Friday, February 2, The Catholic University of America (CUA) Library and Information Sciences Department will host its Tenth Annual Bridging the Spectrum: A Symposium on Scholarship and Practice in Library and Information Science (LIS). A one-day event, Bridging the Spectrum provides attendees with a knowledge-sharing forum and meeting place for practitioners, students, and faculty in Library and Information Sciences and Services to share work and to foster unexpected connections across the spectrum of the information professions. Dr. Alan Inouye will be the keynote speaker at CUA’s 10th annual Bridging the Spectrum symposium on February 2, 2018. The keynote address this year will be given by American Library Association Washington Office Director Dr. Alan Inouye. In Making Sense of the Headlines: Advancing Public Policy for the LIS Community, Dr. Inouye looks at the interplay between forming national policy on LIS issues (such as net neutrality, federal funding for libraries, and education policy) and larger trends in government, technology, commerce and society, asking, “What is the more fundamental change taking place? What is really happening beneath the surface and over time—policy-wise? And how can the library and information science community best influence policy and move our interests higher on the political agenda—or at least defend ourselves as much as possible?” This year, Bridging the Spectrum continues this tradition with a varied program that covers and discusses a diverse set of trends and challenges faced within the LIS fields. Both the morning and afternoon sessions feature presentations and speakers focusing on topics from the impact of digitization and establishing credible news sources, to conducting outreach to minority groups and reinventing programming for the digital natives of the Millennial Generation and Generation Z. 
Beyond this, the symposium also features a poster lightning round, with posters discussing emerging trends and pedagogy in archival and librarian services. “Since 2009, Catholic University of America has been proud to have established a community of learning and knowledge-sharing through our annual Bridging the Spectrum: Symposium on Scholarship and Practice,” says Dr. Renate Chancellor, Associate Professor and University Pre-Law Advisor. Chancellor, who serves on the Symposium Committee, went on to say that the impetus for the symposium was to create an opportunity for practitioners, students and faculty in LIS to come together to showcase the wide range of research taking place throughout the DC/VA/MD region. “It’s exciting to know that we are celebrating our 10th anniversary and all of the wonderful speakers, panels, and poster sessions we have seen over the years,” says Chancellor, “and to know that we have been instrumental in fostering a forum for dialogue on the important issues relevant to the LIS community.” Bridging the Spectrum: A Symposium on Scholarship and Practice in Library and Information Science is open to the public, and will be held in the Great Room of the Pryzbala Student Center on CUA’s campus. For more information and to register to attend, please visit the symposium website. We look forward to seeing you there! This guest post was contributed by Babak Zarin, an LIS candidate at CUA and research assistant for Dr. Renate Chancellor. The post Bri[...]

Open Knowledge Foundation: Educators ask for a better copyright

Wed, 17 Jan 2018 11:28:42 +0000

This blog has been reposted from the Open Education Working Group page.   Today we, the Open Education Working Group, publish a joint letter initiated by Communia Association for the Public Domain that urgently requests improvements to the education exception in the proposal for a Directive on Copyright in the Digital Single Market (DSM Directive). The letter is supported by 35 organisations representing schools, libraries and non-formal education, and also individual educators and information specialists.   In September 2016 the European Commission published its proposal of a DSM Directive that included an education exception that aimed to improve the legal landscape. The technological age has created new possibilities for educational practice. We need copyright law that enables teachers to provide the best education they are capable of and that fits the needs of teachers in the 21st century. The Directive could improve copyright. However, the proposal does not live up to the needs of education. In the letter we explain the changes needed to facilitate the use of copyrighted works in support of education. Education communities need an exception that covers all relevant providers, and which permits a diversity of educational uses of copyrighted content. We listed four main problems with the Commission’s proposal: #1: A limited exception instead of a mandatory one The European Commission proposed a mandatory exception, which can be overridden by licenses. As a consequence, the educational exception will still be different in each Member State. Moreover, educators will need help from a lawyer to understand what they are allowed to do. #2: Remuneration should not be mandatory Currently most Member States have exceptions for educational purposes that are completely or largely unremunerated. Mandatory payments will change the situation of those educators (or their institutions), which will have to start paying for materials they are now using for free. 
#3: Excluding experts The European Commission’s proposal does not include all important providers of education, as only formal educational establishments are covered by the exception. We note that the European lifelong-learning model underlines the value of informal and non-formal education conducted in the workplace. All of these are excluded from the education exception. #4: Closed-door policy The European Commission’s proposal limits digital uses to secure institutional networks and to the premises of an educational establishment. As a consequence, educators will not be able to develop and conduct educational activities in other facilities such as libraries and museums, and they will not be able to use modern means of communication, such as email and the cloud. To endorse the letter, send an email. If you want to receive updates on the developments around copyright and education, sign up for Communia’s newsletter Copyright Untangled. You can read the full letter in this blog on the Open Education website or download the PDF. [...]

DuraSpace News: Registration Open for Fedora Camp at NASA

Wed, 17 Jan 2018 00:00:00 +0000

Fedora is the robust, modular, open source repository platform for the management and dissemination of digital content. Fedora 4, the latest production version of Fedora, features vast improvements in scalability, linked data capabilities, research data support, modularity, ease of use and more. Fedora Camp offers everyone a chance to dive in and learn all about Fedora.
The Fedora team will offer a Camp from Wednesday, May 16 through Friday, May 18, 2018, at the NASA Goddard Space Flight Center in Greenbelt, Maryland, outside of Washington, D.C.

Library of Congress: The Signal: From Code to Colors: Working with the JSON API

Tue, 16 Jan 2018 21:26:37 +0000

The following is a guest post by Laura Wrubel, software development librarian with George Washington University Libraries, who has joined the Library of Congress Labs team during her research leave. The Library of Congress website has an API ("application programming interface") which delivers the content for each web page. What's kind of exciting is that in addition to providing HTML for the website, all of that data, including the digitized collections, is available publicly in JSON format, a structured format that you can parse with code or transform into other formats. With an API, you can do things like:

- build a dataset for analysis, visualization, or mapping
- dynamically include content from a website in your own website
- query for data to feed a Twitter bot

This opens up the possibility for a person to write code that sends queries to the API in the form of URLs or "requests," just like your browser makes. The API returns a "response" in the form of structured data, which a person can parse with code. Of course, if there were already a dataset available to download, that would be ideal. David Brunton explains how bulk data is particularly useful in his talk "Using Data from Historical Newspapers." Check out LC for Robots for a growing list of bulk data currently available for download. I've spent some of my time while on research leave creating documentation for the JSON API. It's worth keeping in mind that the JSON API is a work in progress and subject to change. But even though it's unofficial, it can be a useful access point for researchers. I had a few aims in this documentation project: make more people aware of the API and the data available from it, remove some of the barriers to using it by providing examples of queries and code, and demonstrate some ways to use it for analysis.
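As a sketch of the kind of "queries in the form of URLs" described here, the request URL can be composed in Python before being fetched. The `photos` endpoint path and the `fo=json` parameter are assumptions drawn from the post's description of the unofficial API, which is subject to change:

```python
# Build a loc.gov request URL that asks for JSON instead of HTML.
# The endpoint path and the fo=json parameter are assumptions based on
# the post; the API is unofficial and may change.
from urllib.parse import urlencode

def loc_query_url(endpoint: str, **params: str) -> str:
    """Return a loc.gov URL with fo=json appended to the query string."""
    params["fo"] = "json"
    return "https://www.loc.gov/" + endpoint.strip("/") + "/?" + urlencode(params)

url = loc_query_url("photos", q="baseball cards")
print(url)  # https://www.loc.gov/photos/?q=baseball+cards&fo=json
```

The resulting URL can then be fetched with `urllib.request.urlopen` or the `requests` library, and the body parsed with `json.load` for analysis.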
I approached this task keeping in mind a talk I heard at PyCon 2017, Daniele Procida's "How documentation works, and how to make it work for your project" (also available as a blog post), which classifies documentation into four categories: reference, tutorials, how-to, and explanation. This framing can be useful in making sure your documentation is best achieving its purpose. The JSON API documentation is reference documentation, and points to Jupyter notebooks for Python tutorials and how-to code. If you have ideas about what additional "how-to" guides and tutorials would be useful, I'd be interested to hear them! At the same time that I was digging into the API, I was working on some Jupyter notebooks with Python code for creating image datasets, for both internal and public use. I became intrigued by the possibilities of programmatic access to thumbnail images from the Library's digitized collections. I've had color on my mind as an entry point to collections since I saw Chad Nelson's DPLA Color Browse project at DPLAfest in 2015. So as an experiment, I created Library of Congress Colors.

[Image: view of colors derived from the Library of Congress Baseball Cards digital collection]

The app displays six color swatches, based on cluster analysis, from each of the images in selected collections. Most of the collections have thousands of images, so it's striking to see the patterns that emerge as you scroll through t[...]
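The six-swatch display rests on clustering an image's pixel colors. A minimal, illustrative sketch of that idea, using a tiny pure-Python k-means over RGB tuples (this is not the app's actual code, just the general technique it describes):

```python
# Toy k-means over RGB tuples, illustrating how dominant color swatches
# can be derived from an image's pixels. Not the Library of Congress
# Colors app's actual implementation.

def kmeans_colors(pixels, k=6, iterations=10):
    """Cluster (r, g, b) tuples and return k centroid colors."""
    # Seed centroids with the first k distinct colors (assumes the
    # image has at least k distinct colors).
    centroids = []
    for p in pixels:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in pixels:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((p[d] - centroids[i][d]) ** 2 for d in range(3)),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = tuple(
                    sum(c[d] for c in members) // len(members) for d in range(3)
                )
    return centroids

# Toy "image": mostly red and blue pixels
pixels = [(250, 10, 10)] * 50 + [(10, 10, 240)] * 50
print(kmeans_colors(pixels, k=2))  # [(250, 10, 10), (10, 10, 240)]
```

In a real pipeline each thumbnail would be downloaded, downsampled, and its pixels fed to a clustering step like this before rendering the centroids as swatches.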

District Dispatch: UPDATE: 50 Senators support CRA to restore Net Neutrality

Tue, 16 Jan 2018 17:59:12 +0000

Senate legislation to restore 2015's strong, enforceable net neutrality rules now has bipartisan support from 50 of 100 senators and would be assured of passage if just one more Republican backs the effort. The bill is a Congressional Review Act (CRA) resolution from Sen. Ed Markey (D-MA), which would block the Federal Communications Commission's (FCC) December repeal of net neutrality rules.

The measure is backed by all 49 members of the Senate Democratic caucus, including 47 Democrats and two independents who caucus with Democrats. Sen. Susan Collins (R-ME) is the only Republican to support the bill so far, and supporters are trying to secure one more Republican vote. A successful CRA vote, in this case, would invalidate the FCC’s net neutrality repeal and prevent the FCC from issuing a similar repeal in the future. But the Senate action needs a counterpart in the House, and this Congressional action would be subject to Presidential approval.

ALA is working with allies to encourage Congress to overturn the FCC’s egregious action. Email your members of Congress today and ask them to use a Joint Resolution of Disapproval under the CRA to repeal the December 2017 FCC action and restore the 2015 Open Internet Order protections.

We will continue to update you on the activities above and other developments as we continue to work to preserve a neutral internet.

The post UPDATE: 50 Senators support CRA to restore Net Neutrality appeared first on District Dispatch.

pinboard: Availability Calendar - Kalorama Guest House

Tue, 16 Jan 2018 17:55:55 +0000


David Rosenthal: Not Really Decentralized After All

Tue, 16 Jan 2018 16:00:39 +0000

Here are two more examples of the phenomenon that I've been writing about ever since Economies of Scale in Peer-to-Peer Networks more than three years ago, centralized systems built on decentralized infrastructure in ways that nullify the advantages of decentralization:

Open Knowledge Foundation: A lookback on 2017 with OK Brazil

Tue, 16 Jan 2018 09:30:56 +0000

This blog has been written by Natalia Mazotte and Ariel Kogan, co-directors of Open Knowledge Brazil (OKBR). It has been translated from the original version by Juliana Watanabe, volunteer of OKBR. For us at Open Knowledge Brazil (OKBR), the year 2017 was filled with multiple partnerships, support of and participation in events, and projects and campaigns for mobilisation. In this blog we selected some of these highlights. Furthermore, some news about the team: the journalist Natália Mazotte, who was already leading Escola de Dados (School of Data) in Brazil, became co-director alongside Ariel Kogan (executive director since July 2016).

[Photo: Engin_Akyurt / Creative Commons CC0]

Mobilisation
At the beginning of the year, OKBR and several other organizations introduced the Manifest for Digital Identification in Brazil. The purpose of the Manifest is to be a tool for society to take a stand on the privacy and safety of citizens' personal data and to turn digital identification into a safe, fair and transparent process. We also monitored one of the main challenges in the city of São Paulo and contributed to the mobilisation around it: along with other civil society organisations, we urged the City Hall of São Paulo for transparency regarding mobility. The reason: on 25 January 2017, the first day of the new increase to the speed limits on Marginais Pinheiros and Tietê, we noticed that several news items about the decrease in traffic accidents linked to the policy of reducing speed in certain parts of the city were unavailable on the site of the Traffic Engineering Company (CET). For a few months, we conducted a series of webinars called the OKBR Webinar Series, about open knowledge around the world. We had the participation of the following experts: Bart Van Leeuwen, entrepreneur; Paola Villareal, Fellow of the Berkman Klein Center, designer/data scientist; Fernanda Campagnucci, journalist and analyst of public policies; and Rufus Pollock, founder of Open Knowledge International.
We took part in a major victory for society! Along with the Movimento pela Transparência Partidária (Movement for Partisan Transparency), we conducted a mobilisation against the proposal for political reform by its rapporteur, congressman Vicente Cândido (PT-SP), concerning hidden campaign contributions, and the result was very positive. Besides us, a variety of organisations and movements took part in this initiative against hidden donations: we published and handed out a public statement. The impact was huge: as a consequence, the rapporteur announced the withdrawal of secret donations. We also participated in #NãoValeTudo, a collective effort to discuss the correct use of technology for electoral purposes, along with AppCívico, Instituto Update, and Instituto Tecnologia e Equidad.

Projects
We performed two cycles of OpenSpending. The first cycle initiated in January and involved 150 municipalities. In July, we published the report of cycle 1. In August, we started the second cycle of the game with something new: Guaxi, a robot which served as the digital assistant to competitors. It is an expert bot developed with innovative chatbot techn[...]

Ed Summers: Programmed Visions

Tue, 16 Jan 2018 05:00:00 +0000

I’ve been meaning to read Wendy Hui Kyong Chun for some time now. Updating to Remain the Same is on my to-read list, but I recently ran across a reference to Programmed Visions: Software and Memory in Rogers (2017), which I wrote about previously, and thought I would give it a quick read beforehand. Programmed Visions is a unique mix of computing history, media studies and philosophy that analyzes the ways in which software has been reified or made into a thing. I’ve begun thinking about using software studies as a framework for researching the construction and operation of web archives, and Chun lays a theoretical foundation that could be useful for critiquing the very idea of software, and for investigating its performative nature. Programmed Visions contains a set of historical case studies that it draws on as sites for understanding computing. She looks at early modes of computing involving human computers (ENIAC) which served as a prototype for what she calls “bureaucracies of computing” and the psychology of command and control that is built into the performance of computing. Other case studies involving the Memex, the Mother of All Demos, and John von Neumann’s use of biological models of memory as metaphors for computer memory in the EDVAC are described in great detail, and connected together in quite a compelling way. The book is grounded in history but often has a poetic quality that is difficult to summarize. On the meta level, Chun’s use of historical texts is quite thorough and it’s a nice example of how research can be conducted in this area. There are two primary things I will take away from Programmed Visions. The first is how software, the very idea of source code, is itself achieved through metaphor, where computing is a metaphor for metaphor itself.
Using higher level computer programming languages gives software the appearance of commanding the computer; however, the source code is deeply entangled with the hardware itself: it is interpreted and compiled by yet more software, and ultimately reduced to fluctuations of voltage in circuitry. The source code and software cannot be extracted from this performance of computing. The separation of software from hardware is an illusion that was achieved in the early days of computing. Any analysis of software must include the computing infrastructures that make the metaphor possible. Chun chooses an interesting passage from Dijkstra (1970) to highlight the role that source code plays:

In the remaining part of this section I shall restrict myself to programs written for a sequential machine and I shall explore some of the consequences of our duty to use our understanding of a program to make assertions about the ensuing computations. It is my (unproven) claim that the ease and reliability with which we can do this depends critically upon the simplicity of the relation between the two, in particular upon the nature of sequencing control. In vague terms we may state the desirability that the structure of the program text reflects the structure of the computation. Or, in other terms, “What can we do to shorten the conceptual gap between the static[...]

David Rosenthal: The Internet Society Takes On Digital Preservation

Mon, 15 Jan 2018 16:01:00 +0000

Another worthwhile initiative comes from The Internet Society, through its New York chapter. They are starting an effort to draw attention to the issues around digital preservation. Shuli Hallack has an introductory blog post entitled Preserving Our Future, One Bit at a Time. They kicked off with a meeting at Google's DC office labeled as being about "The Policy Perspective". It was keynoted by Vint Cerf with respondents Kate Zwaard and Michelle Wu. I watched the livestream. Overall, I thought that the speakers did a good job despite wandering a long way from policies, mostly in response to audience questions.

Vint will also keynote the next event, at Google's NYC office February 5th, 2018, 5:30PM – 7:30PM. It is labeled as being about "Business Models and Financial Motives" and, if that's what it ends up being about, it should be very interesting and potentially useful. I hope to catch the livestream.

Ed Summers: Delete Notes

Mon, 15 Jan 2018 05:00:00 +0000

I recently finished reading Delete by Viktor Mayer-Schönberger and thought I would jot down some brief notes for my future self, since it is a significant piece of work for my interests in web archiving. If you are interested in memory and information and communication technologies (ICT) then this book is a must read. Mayer-Schönberger is a professor at the Oxford Internet Institute where he has focused on issues at the intersection of Internet studies and governance. Delete is a particularly good tonic for the widespread idea that electronic records are somehow fleeting, impermanent artifacts–a topic that Kirschenbaum (2008) explored so thoroughly a year earlier from a materialist perspective. Delete functions largely in two modes. The first (and primary) is to give an overview of how our ideas of permanence have shifted with the wide availability of computer storage and the Internet. The focus isn’t so much on these technologies themselves, but on the impact that storage capabilities and wide distribution have had on privacy and more generally on our ability to think. Mayer-Schönberger observes that for much of human history remembering has been difficult and has required concerted effort (think archival work here). The default was to forget information. But today’s information technologies allow the default to be set to remember, and it now requires effort to forget. Examining the potential impacts upon cognition and our ability to think are where this book shines brightest. If the default is set to remember how does this shape public discourse? How will ever present and unlimited storage with a surveillance culture work to cement self-censorship that will suppress free expression and identity formation? The book contends that ICT allow Bentham’s Panopticon to be extended not only in space, but also in time, recalling Orwell’s popular quote, as used by Samuels (1986): Who controls the past controls the future. Who controls the present controls the past.
It is easy to see this mechanism at work in large social media companies like Google and Facebook, where efforts to delete our data can get interpreted instead as deactivate, which simply renders the content inaccessible to those outside the corporate walls. The data that these companies collect is core to their business, and extremely valuable. They often aren’t deleting it, even when we tell them to, and are incentivized not to. What’s more, the parties that the data has been sold to, or otherwise shared with, probably aren’t deleting it either. While it’s true that computer storage has greatly enabled the storage of information, it has also, at the same time, accelerated our ability to generate information. I feel like Mayer-Schönberger could have spent more time addressing this side of the equation. There are huge costs associated with remembering at the scales that Google, Facebook and other companies are working at. There are also large costs associated with preserving information for the long term (Rosenthal et al., 2012). Are large technology companies invested in saving data for the long term? Or are[...]

Terry Reese: MarcEdit Unicode Question [also posted on the listserv]

Fri, 12 Jan 2018 22:37:11 +0000

** This was posted on the listserv, but I'm putting this out there broadly ** ** Updated to include a video demonstrating how Normalization currently impacts users ** Video demonstrating the question at hand: So, I have an odd Unicode question and I'm looking for some feedback. I had someone working with MarcEdit and looking for é. This character (and a few others) presents some special problems when doing replacements because it can be represented by multiple codepoints: as a letter + diacritic (like you'd find in MARC8) or as a single code point. Here's the rub. In Windows 10, if you do a find and replace using either type of normalization (.NET supports 4 major normalizations), the program will find the string and replace the data. The problem is that it replaces the data in the normalization that is presented; meaning that if your file contains data where your system provides multiple codepoints (the traditional standard with MARC21, what is called the KD normalization) and your replacement uses a single code point, the replacement will replace the multiple code points with a single code point. This is apparently a Windows 10 behavior. But I find this behaves differently on Mac systems (and Linux), which is problematic and confusing. At the same time, most folks don't realize that characters like é have multiple representations, and MarcEdit can find them but won't replace them unless they are ordinally equivalent (unless you do a case insensitive search). So the tool may tell you it has found fields with this value, and report replacements having been made, while no data is actually changed (because ordinally, they are *not* the same). So, I've been thinking about this. There is something I could do. In the preferences, I allow users to define which Unicode normalization they want to use when converting data to Unicode.
This value is only used by the MarcEngine. However, I could extend this to the editing functions. Using this method, I could force data that comes through the search to conform to the desired normalization. But you would still have cases where, say, you are looking for data that is normalized in Form C, you've told me you want all data in Form KD, and so again é may not be found because, ordinally, the strings are not equivalent. The other option, and this seems the least confusing, though it has other impacts, would be to modify the functions so that the tool tests the Find string and, based on the data present, normalizes all data so that it matches that normalization. This way, replacements would always happen appropriately. Of course, this means that if your data started in KD notation, it may end up (would likely end up, if you enter these diacritics from a keyboard) in C notation. I'm not sure what the impact would be for ILS systems, as they may expect one notation and get another. They should support all Unicode notations, but g[...]
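The one-codepoint versus two-codepoint behavior described above is easy to see with Python's unicodedata module, where MARC21's letter-plus-diacritic convention corresponds to the decomposed forms (D/KD) and the single code point to the composed form (C):

```python
import unicodedata

composed = "\u00e9"      # é as a single code point (Form C)
decomposed = "e\u0301"   # e + combining acute accent (Form D / KD style)

# The strings render identically but are not ordinally equivalent,
# which is why a find-and-replace can match data in one form yet fail
# to change data stored in the other.
print(composed == decomposed)  # False

# Normalizing both sides to the same form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

.NET's four normalizations (FormC, FormD, FormKC, FormKD) behave analogously, so the same comparison logic applies to the MarcEdit scenario above.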

District Dispatch: Tax season is here: How libraries can help communities prepare

Fri, 12 Jan 2018 14:52:04 +0000

This blog post, written by Lori Baux of the Computer & Communications Industry Association, is one in a series of occasional posts contributed by leaders from coalition partners and other public interest groups that ALA's Washington Office works closely with. Whatever the policy – copyright, education, technology, to name just a few – we depend on relationships with other organizations to influence legislation, policy and regulatory issues of importance to the library field and the public. It's hard to believe, but as the holiday season comes to an end, tax season is about to begin. For decades, public libraries have been unparalleled resources in their communities, far beyond their traditional, literary role. Libraries assist those who need it most by providing free Internet access and offering financial literacy classes, job training, employment assistance and more. And for decades, libraries have served as a critical resource during tax season. Each year, more and more Americans feel as though they lack the necessary resources to confidently and correctly file their taxes on time. This is particularly true for moderate and lower-income individuals and families who are forced to work multiple jobs just to make ends meet. The question is: where is help available? Libraries across the country are stepping up their efforts to assist local taxpayers in filing their taxes for free. Many libraries offer in-person help, often serving as a Volunteer Income Tax Assistance (VITA) location or AARP Tax-Aide site. But appointments often fill up quickly, and many communities are without much, if any, free in-person tax assistance. There is an option for free tax prep that libraries can provide, and with little required from already busy library staff.
The next time that a local individual or family comes looking for a helping hand with tax preparation, libraries can guide them to a free online tax preparation resource: IRS Free File. Through the Free File Program, those who earned $66,000 or less last year—over 70 percent of all American taxpayers—are eligible to use at least one of 12 brand-name tax preparation software products to file their Federal (and in many cases, state) taxes completely free of charge. Free File starts on January 12, 2018. Free File complements local VITA programs, where people can get in-person help from IRS-certified volunteers. There are over 12,000 VITA programs across the country to help people in your community maximize their refund and claim all the credits that they deserve, including the Earned Income Tax Credit (EITC). Any individual making under $54,000 annually may qualify. More information about AARP Tax-Aide can be found here. With help from libraries and volunteers across the nation, we can work together to ensure that as many taxpayers as possible have access to the resources and assistance that they need to file their returns. The Computer & Communications Industry Ass[...]

Open Knowledge Foundation: New edition of Data Journalism Handbook to explore journalistic interventions in the data society

Fri, 12 Jan 2018 09:48:49 +0000

This blog has been reposted from its original source. The first edition of The Data Journalism Handbook has been widely used and widely cited by students, practitioners and researchers alike, serving as both textbook and sourcebook for an emerging field. It has been translated into over 12 languages – including Arabic, Chinese, Czech, French, Georgian, Greek, Italian, Macedonian, Portuguese, Russian, Spanish and Ukrainian – and is used for teaching at many leading universities, as well as teaching and training centres around the world. A huge amount has happened in the field since the first edition in 2012. The Panama Papers project undertook an unprecedented international collaboration around a major database of leaked information about tax havens and offshore financial activity. Projects such as The Migrants' Files, The Guardian's The Counted and ProPublica's Electionland have shown how journalists are not just using and presenting data, but also creating and assembling it themselves in order to improve data journalistic coverage of the issues they are reporting on. The Migrants' Files saw journalists in 15 countries work together to create a database of people who died in their attempt to reach or stay in Europe. Changes in digital technologies have enabled the development of new formats for storytelling, interactivity and engagement, with the assistance of drones, crowdsourcing tools, satellite data, social media data and bespoke software tools for data collection, analysis, visualisation and exploration. Data journalists are not simply using data as a source; they are also increasingly investigating, interrogating and intervening around the practices, platforms, algorithms and devices through which it is created, circulated and put to work in the world. They are creatively developing techniques and approaches adapted to very different kinds of social, cultural, economic, technological and political settings and challenges.
Five years after its publication, we are developing a revised second edition, which will be published as an open access book with an innovative academic press. The new edition will be significantly overhauled to reflect these developments. It will complement the first edition with an examination of the current state of data journalism which is at once practical and reflective, profiling emerging practices and projects as well as their broader consequences.

[Image: "The Infinite Campaign" by Sam Lavigne (New Inquiry) repurposes ad creation data in order to explore "the bizarre rubrics Twitter uses to render its users legible".]

Contributors to the first edition include representatives from some of the world's best-known newsrooms and data journalism organisations, including the Australian Broadcasting Corporation, the BBC, the Chicago Tribune, Deutsche Welle, The Guardian, the Financial Times, Helsingin Sanomat, La Nacion, the New York Times, ProPublica, the Washington Post, the Texas Tribune, Verdens Gang, Wales O[...]

LITA: This is Jeopardy! Or, How Do People Actually Get On That Show?

Thu, 11 Jan 2018 20:55:25 +0000

This past November, American Libraries published a delightful article on librarians who have appeared on the iconic game show Jeopardy! It turns out one of our active LITA members also recently appeared on the show. Here's her story… On Wednesday, October 18th, one of my lifelong dreams will come true: I'll be a contestant on Jeopardy! It takes several steps to get onto the show: first, you must pass an online exam, but you don't really learn the results unless you make it to the next stage: the invitation to audition. This step is completed in person, comprising a timed, written test, playing a mock game with other aspiring players in front of a few dozen other auditionees, and chatting amiably in a brief interview, all while being filmed. If you make it through this gauntlet, you go into "the pool", where you remain eligible for a call to be on the show for up to 18 months. Over the course of one year of testing and eligibility, around 30,000 people take the first test, around 1,500 to 1,600 people audition in person, and around 400 make it onto the show each season. For me, the timeline was relatively quick. I tested online in October 2016, auditioned in January 2017, and thanks to my SoCal address, I ended up as a local alternate in February. Through luck of the draw, I was the leftover contestant that day. I didn't tape then, but was asked back directly to the show for the August 3rd recording session, which airs from October 16th to October 20th. The call is early – 7:30am – and the day's twelve potential contestants take turns with makeup artists while the production team covers paperwork, runs through those interview stories one-on-one, and pumps up the contestants to have a good time. Once you're in, you're sequestered. There's no visiting with family or friends who accompanied you to the taping and no cellphones or internet access allowed.
You do have time to chat with your fellow contestants, who are all whip smart, funny, and generally just as excited as you are to get to be on this show. There’s also no time to be nervous or worried: you roll through the briefing onto the stage for a quick run-down on how the podiums work (watch your elbows for the automated dividers that come up for Final Jeopardy!), how to buzz in properly (there’s a light around the big game board that you don’t see at home that tells you when you can ring in safely), and under no circumstances are you to write on the screen with ANYTHING but that stylus! Next, it’s time for your Hometown Howdy, the commercial blurb that airs on the local TV station for your home media market. Since I’d done it before when I almost-but-not-quite made it on the air in February, I knew they were looking for maximum cheese. My friends and family tell me that I definitely delivered. Immediately before they let in the live studio audience for seating, contestants run through two quick dress rehearsal games to get out any final nerves, test the equipment for the stage crew, and pr[...]

Islandora: Islandora Camp - Call for Proposals

Thu, 11 Jan 2018 18:42:52 +0000

Doing something great with Islandora that you want to share with the community? Have a recent project that the world just needs to know about? Send us your proposals to present at Islandora Camp! Presentations should be roughly 20-25 minutes in length (with time after for questions) and deal with Islandora in some way. Want more time or to do a different format? Let us know in your proposal and we'll see what we can do. You can see examples of previous Islandora Camp sessions on our YouTube channel. The Call for Proposals for iCampEU in Limerick will be open until March 1st.

Islandora: Islandora Camp EU 2018 - Registration

Thu, 11 Jan 2018 18:41:02 +0000

Islandora Camp is heading to Ireland June 20 - 22, 2018, hosted by the University of Limerick. Early Bird rates (€360,00) are available until March 1st, 2018, after which the rate will increase to €399,00. Registration includes a choice of two curriculum tracks. Admin: for repository and collection managers, librarians, archivists, and anyone else who deals primarily with the front-end experience of Islandora and would like to learn how to get the most out of it, or developers who would like to learn more about the front-end experience. Developer: for developers, systems people, and anyone dealing with Islandora at the code level, or any front-end Islandora users who are interested in learning more about the developer side.

David Rosenthal: It Isn't About The Technology

Thu, 11 Jan 2018 16:10:40 +0000

A year and a half ago I attended Brewster Kahle's Decentralized Web Summit and wrote: "I am working on a post about my reactions to the first two days (I couldn't attend the third) but it requires a good deal of thought, so it'll take a while." As I recall, I came away from the Summit frustrated. I posted the TL;DR version of the reason half a year ago in Why Is The Web "Centralized"?:

"What is the centralization that decentralized Web advocates are reacting against? Clearly, it is the domination of the Web by the FANG (Facebook, Amazon, Netflix, Google) and a few other large companies such as the cable oligopoly. These companies came to dominate the Web for economic not technological reasons. Yet the decentralized Web advocates persist in believing that the answer is new technologies, which suffer from the same economic problems as the existing decentralized technologies underlying the "centralized" Web we have. A decentralized technology infrastructure is necessary for a decentralized Web but it isn't sufficient. Absent an understanding of how the rest of the solution is going to work, designing the infrastructure is an academic exercise."

It is finally time for the long-delayed long-form post. I should first reiterate that I'm greatly in favor of the idea of a decentralized Web based on decentralized storage. It would be a much better world if it happened. I'm happy to dream along with my friend Herbert Van de Sompel's richly-deserved Paul Evan Peters award lecture entitled Scholarly Communication: Deconstruct and Decentralize?. He describes a potential future decentralized system of scholarly communication built on existing Web protocols. But even he prefaces the dream with a caveat that the future he describes "will most likely never exist". I agree with Herbert about the desirability of his vision, but I also agree that it is unlikely.
Below the fold I summarize Herbert's vision, then go through a long explanation of why I think he's right about the low likelihood of its coming into existence. Herbert identifies three classes of decentralized Web technology and explains that he decided not to deal with these two:
  • Distributed file systems. Herbert is right about this. Internet-scale distributed file systems were first prototyped in the late 90s with Intermemory and Oceanstore, and many successors have followed in their footsteps. None have achieved sustainability or Internet platform scale. The reasons are many, the economic one of which I wrote about in Is Distributed Storage Sustainable? Betteridge's Law applies, so the answer is "no".
  • Blockchains. Herbert is right about this too. Even the blockchain pioneers have to admit that, in the real world, blockchains have failed to deliver any of their promised advantages over centralized systems. In particular, as we see with Bitcoin, maintaining decentralization against economies of scale is a fundamental, unsolved problem: Trying by technical means to[...]

District Dispatch: ALA to Congress in 2018: Continue to #FundLibraries

Thu, 11 Jan 2018 15:10:48 +0000

2017 was an extraordinary year for America’s libraries. When faced with serious threats to federal library funding, ALA members and library advocates rallied in unprecedented numbers to voice their support for libraries at strategic points throughout the year. Tens of thousands of phone calls and emails to Congress were registered through ALA’s legislative action center. ALA members visited Congress in Washington and back home to demonstrate the importance of federal funding. The challenge to #FundLibraries in 2018 is great: not only is Congress late in passing an FY 2018 budget, it’s time to start working on the FY 2019 budget. ALA members have a lot to be proud of. Thanks to library advocates, Congress did not follow the administration’s lead in March 2017, when the president made a bold move to eliminate the Institute of Museum and Library Services (IMLS) and virtually all federal library funding. In every single state and congressional district, ALA members spoke up in support of federal library funding. We reminded our senators and representatives how indispensable libraries are for the communities they represent. And our elected leaders listened. By the time FY 2018 officially began in October 2017, the Appropriations Committees of both chambers of Congress had passed bills that maintained (and in the Senate, increased by $4 million) funding for libraries. Despite our strong advocacy, however, we have not yet secured library funding for FY 2018. We’re more than three months into the fiscal year, and the U.S. government still does not have an FY 2018 budget. Because the House and Senate have not reconciled their FY 2018 spending bills, the government is operating under a “continuing resolution” (CR) of the FY 2017 budget. What happens when that CR expires on January 19, 2018 is a matter of intense speculation; options include a bipartisan budget deal, another CR, or a possible government shutdown.
While government may seem to be paralyzed, this is no time for library advocates to take a break. The challenge in 2018 is even greater than in 2017: not only is Congress late in passing an FY 2018 budget, it’s time to start working on the FY 2019 budget. The president is expected to release his FY 2019 budget proposal in February, and we have no reason to believe that libraries have moved up on the list of priorities for the administration. 2018 is a time for all of us to take our advocacy up a notch. Over the coming weeks, ALA’s Washington Office will roll out resources to help you tell your library story and urge your members of Congress to #FundLibraries. In the meantime, here’s what you can do: Stay informed. The U.S. budget and appropriations process is more dynamic than ever this year. There is a strong chance that we will be advocating for library funding for FY 2018 and FY 2019 at the same time. Regularly visit the Washington Office b[...]

Open Knowledge Foundation: 2017: A Year to Remember for OK Nepal

Thu, 11 Jan 2018 09:24:35 +0000

This blog has been cross-posted from the OK Nepal blog as part of our blog series of Open Knowledge Network updates. Best wishes for 2018 from OK Nepal to all of the Open Knowledge family and friends! The year 2017 was one of the best years for Open Knowledge Nepal. We started our journey by registering Open Knowledge Nepal as a non-profit organization under the Nepal Government, and as we reflect on 2017, it has been “A Year to Remember”. We were able to achieve many things, and we promise to continue our hard work to improve the State of Open Data in South Asia in 2018 as well. Some of the key highlights of 2017 are: Organizing Open Data Day 2017: For the 5th time in a row, the Open Knowledge Nepal team led the effort of organizing International Open Data Day at Pokhara, Nepal. This year it was a collaborative effort of Kathmandu Living Labs and Open Knowledge Nepal. It was also the first official Open Knowledge Nepal event held outside the Kathmandu Valley. Launching Election Nepal Portal: On 13th April 2017 (31st Chaitra 2073), a day before Nepalese New Year 2074, we officially released the Election Nepal Portal in collaboration with Code for Nepal and made it open for contribution. Election Nepal is a crowdsourced citizen engagement portal that includes the Local Elections data. The portal has three major focus areas: visualizations, datasets, and Twitter feeds. Contributing to Global Open Data Index: On May 2nd, 2017, Open Knowledge International launched the 4th edition of the Global Open Data Index (GODI), a global assessment of open government data publication. Nepal has been part of this global assessment continuously for four years, with lots of ups and downs, and we have been leading it since the very beginning. With 20% openness, Nepal was ranked 69th in the 2016 Global Open Data Index.
Also, this year we helped Open Knowledge International by coordinating for the South Asia region, and for the first time we were able to get contributions from Bhutan and Afghanistan. Launching Local Boundaries: To help journalists and researchers visualize the geographical data of Nepal on a map, we built Local Boundaries, where we share shapefiles of Nepal's federal structure and more. Local Boundaries brings the detailed geodata of administrative units, or maps of all administrative boundaries defined by the Nepal Government, in an open and reusable format, free of cost. The local boundaries are available in two formats (TopoJSON and GeoJSON) and can be easily reused to map local authority data with OpenStreetMap, Google Maps, Leaflet, or Mapbox interactively. Launching Open Data Handbook Nepali Version: After a year of work, followed by a series of discussions and consultations, on 7 August 2017 Open Knowledge Nepal launched the first version of the Nepali Open Data Handbook – an introductory guidebook used by [...]
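Because GeoJSON is plain JSON, the boundary files Local Boundaries publishes can be inspected without any GIS tooling. A minimal sketch, assuming a FeatureCollection whose features carry a district name in their properties (the property key `DISTRICT` and the sample data below are illustrative assumptions, not taken from the real files):

```python
import json

def district_names(geojson: dict, name_key: str = "DISTRICT") -> list:
    """Collect the value of `name_key` from each feature's properties.

    `name_key` is a hypothetical property name; real Local Boundaries
    files may use a different key.
    """
    return [f["properties"].get(name_key) for f in geojson["features"]]

# A tiny inline FeatureCollection standing in for a downloaded file.
# With a real file you would use: geojson = json.load(open("nepal.geojson"))
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"DISTRICT": "Kaski"},
         "geometry": {"type": "Point", "coordinates": [83.98, 28.21]}},
        {"type": "Feature",
         "properties": {"DISTRICT": "Kathmandu"},
         "geometry": {"type": "Point", "coordinates": [85.32, 27.71]}},
    ],
}

print(district_names(sample))  # -> ['Kaski', 'Kathmandu']
```

The same structure is what Leaflet's `L.geoJSON()` or Mapbox layers consume directly, which is why publishing in GeoJSON makes the data so easy to reuse.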

Terry Reese: MarcEdit Updates (All versions)

Thu, 11 Jan 2018 05:35:34 +0000

I’ve posted updates for all versions of MarcEdit, including MarcEdit MacOS 3.

MarcEdit 7 (Windows/Linux) changelog:
  • Bug Fix: Export Settings: Export was capturing both MarcEdit 6.x and MarcEdit 7.x data.
  • Enhancement: Task Management: added some continued refinements to improve speed and processing
  • Bug Fix: OCLC Integration: Corrected an issue occurring when trying to post bib records using previous profiles.
  • Enhancement: Linked Data XML Rules File Editor completed
  • Enhancement: Linked Data Framework: Formal support for local linked data triple stores for resolution

One of the largest enhancements is the updated editor to the Linked Data Rules File and the Linked Data Framework. You can hear more about these updates here:

MarcEdit MacOS 3:

Today also marks the availability of MarcEdit MacOS 3. You can read about the update here: MarcEdit MacOS 3 has Arrived!

If you have questions, please let me know.


Terry Reese: MarcEdit MacOS 3 has Arrived!

Thu, 11 Jan 2018 05:01:22 +0000

MarcEdit MacOS 3 is the latest branch of the MarcEdit 7 family. MarcEdit MacOS 3 represents the next generational update for MarcEdit on the Mac and is functionally equivalent to MarcEdit 7. MarcEdit MacOS 3 introduces the following features:
  • Startup Wizard
  • Clustering Tools
  • New Linked Data Framework
  • New Task Management and Task Processing
  • Task Broker
  • OCLC Integration with OCLC Profiles
  • OCLC Integration and search in the MarcEditor
  • New Global Editing Tools
  • Updated UI
  • More

There are also a couple of things that are currently missing that I’ll be filling in over the next couple of weeks. Presently, the following elements are missing in the MacOS version:
  • OCLC Downloader
  • OCLC Bib Uploader (local and non-local)
  • OCLC Holdings update (update for profiles)
  • Task Processing Updates
  • Need to update Editor Functions:
    • Dedup tool – Add/Delete Function
    • Move tool – Copy Field Function
    • RDA Helper – 040 $b language
    • Edit Shortcuts – generate paired ISBN-13
    • Replace Function – Exact word match
    • Extract/Delete Selected Records – Exact word match
  • Connect the search dropdown:
    • Add to the MARC Tools Window
    • Add to the MarcEditor Window
    • Connect to the Main Window
    • Update Configuration information
  • XML Profiler
  • Linked Data File Editor
  • Startup Wizard

Rather than hold the update till these elements are completed, I’m making the MarcEdit MacOS version available now so that users can test and interact with the tooling, and I’ll finish adding the remaining elements to the application. Once completed, all versions of MarcEdit will share the same functionality, save for elements that rely on technology or practices tied to a specific operating system. Updated UI: MarcEdit MacOS 3 introduces a new UI. While the UI is still reflective of MacOS best practices, it also shares many of the design elements developed as part of MarcEdit 7.
This includes new elements like the StartUp wizard with the Fluffy install agent. The Setup Wizard provides users the ability to customize various application settings, as well as import previous settings from earlier versions of MarcEdit. [Screenshots: Updates to the UI; New Clustering tools] MarcEdit MacOS 3 provides MacOS users more tools, more help, more speed…it gives you more, so you can do more. Downloading: Download the latest version of MarcEdit MacOS 3 from the downloads page at: -tr[...]

Library of Congress: The Signal: Digital Scholarship Resource Guide: Making Digital Resources, Part 2 of 7

Wed, 10 Jan 2018 22:25:47 +0000

This is part two in a seven part resource guide for digital scholarship by Samantha Herron, our 2017 Junior Fellow. Part one is available here, and the full guide is available as a PDF download. Creating Digital Documents [Photo caption: Internet Archive staff members such as Fran Akers scan books from the Library’s General Collections that were printed before 1923. The high-resolution digital books are made available online within 72 hours of scanning.] The first step in creating an electronic copy of an analog (non-digital) document is usually scanning it to create a digitized image (for example, a .pdf or a .jpg). Scanning a document is like taking an electronic photograph of it–now it’s in a file format that can be saved to a computer, uploaded to the Internet, or shared in an e-mail. In some cases, such as when you are digitizing a film photograph, a high-quality digital image is all you need. But in the case of textual documents, a digital image is often insufficient, or at least inconvenient. At this stage, we only have an image of the text; the text isn’t yet in a format that can be searched or manipulated by the computer (think: trying to copy & paste text from a picture you took on your camera–it’s not possible). Optical Character Recognition (OCR) is an automated process that extracts text from a digital image of a document to make it readable by a computer. The computer scans through an image of text, attempts to identify the characters (letters, numbers, symbols), and stores them as a separate “layer” of text on the image. Example: Here is a digitized copy of Alice in Wonderland in the Internet Archive. Notice that though this ebook is made up of scanned images of a physical copy, you can search the full text contents in the search bar. The OCRed text is “under” this image, and can be accessed if you select “FULL TEXT” from the Download Options menu. Notice that you can also download a .pdf, .epub, or many other formats of the digitized book.
Though the success of OCR depends on the quality of the software and the quality of the photograph–even sophisticated OCR has trouble navigating images with stray ink blots or faded type–these programs are what allow digital archives users to not only search through catalog metadata, but through the full contents of scanned newspapers (as in Chronicling America) and books (as in most digitized books available from libraries and archives). [Screenshot: ABBYY FineReader, an OCR software package.] As noted, the automated OCR text often needs to be “cleaned” by a human reader. Especially with older, typeset texts that have faded or mildewed or are otherwise irregular, the software may mistake characters or character combinations for others (e.g. the computer might take “rn” to b[...]
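The character confusions described above can also be attacked automatically before a human pass. A toy sketch of dictionary-guided post-correction (the confusion pairs and the tiny word list are illustrative assumptions; real pipelines use full dictionaries and language models):

```python
# Illustrative post-OCR cleanup: try common OCR character confusions
# ("m" misread where "rn" was printed, "1" for "l", "0" for "o") and
# accept a substitution only if it yields a known word. The tiny
# KNOWN_WORDS set stands in for a real dictionary.

KNOWN_WORDS = {"modern", "learn", "the", "quick", "morning"}

# (misread_in_ocr_output, likely_original) pairs
CONFUSIONS = [("m", "rn"), ("rn", "m"), ("1", "l"), ("0", "o")]

def correct_token(token: str) -> str:
    """Return the token unchanged if it is a known word; otherwise try
    each confusion substitution and keep the first known-word result."""
    low = token.lower()
    if low in KNOWN_WORDS:
        return token
    for wrong, right in CONFUSIONS:
        candidate = low.replace(wrong, right)
        if candidate in KNOWN_WORDS:
            return candidate
    return token  # no plausible correction found; leave for a human

print(correct_token("leam"))     # "rn" was OCRed as "m" -> learn
print(correct_token("morning"))  # already a known word, left alone
```

A real cleaner would substitute occurrences individually and score candidates against context rather than replacing globally, but the dictionary-check idea is the same.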

LITA: Jobs in Information Technology: January 10, 2018

Wed, 10 Jan 2018 20:09:16 +0000

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

University of Arkansas, Assistant Head of Special Collections, Fayetteville, AR

West Chester University, Electronic Resources Librarian, West Chester, PA

Miami University Libraries, Web Services Librarian, Oxford, OH

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Peter Murray: Anxious Anger – or: why does my profession want to become a closed club

Wed, 10 Jan 2018 18:38:00 +0000

I’m in the Austin, Texas, airport – having just left the closing session of the Re-Think It conference – and I’m wondering what the heck is happening to my chosen profession. When did we turn into an exclusive members-only club with unrealistic demands on professionalism and a secret handshake? The closing keynote featured current president of the American Library Association (ALA) Jim Neal and past president Julie Todaro on the topic Library Leadership in a Period of Transformation. The pair were to address questions like “What trends are provoking new thinking about the 21st century library?” and “Do 20th century visions and skills still matter?” I expected to be uplifted and inspired. Instead, I came away feeling anxious and angry about their view of the library profession and the premier library association, ALA. To start with a bit of imposter syndrome exposure: I’ve been working in and around libraries for 25 years, but I don’t follow the internal workings and the politics of the principal librarian professional organization(s) in the United States. I read about the profession — enough to know that primary school librarians are under constant threat of elimination in many school districts and that usage of public libraries, particularly public libraries that are taking an expansive view of their role in the community, is through the roof. I hear the grumbles about how library schools are not preparing graduates of master’s programs for “real world” librarianship, but in my own personal experience, I am indebted to the faculty at Simmons College for the education I received there. The pay inequity sucks. The appointment of a professional African American librarian to head the Library of Congress is to be celebrated, and the general lack of diversity in the professional ranks is a point to be worked on.
My impression of ALA is of an unnecessarily large and bureaucratic organization with some seriously important bright spots (the ALA Washington Office for example), and that the governance of ALA is plodding and cliquish, but for which some close colleagues find professional satisfaction for their extra energies. I’m pretty much hands off ALA, particularly in the last 15 years, and view it (in the words of Douglas Adams) as Mostly Harmless. So anxious and angry are unexpected feelings for this closing keynote. I don’t think there is a recording of Jim’s and Julie’s remarks, so here in the airport, the only thing I have to go on are my notes. I started taking notes at the beginning of their talks expecting there would be uplifting ideas and quotes that I could attribute to them as I talk with others about the aspirations of the FOLIO project (a crucial part[...]