Author Archives: Maura A. Smale

Computers and Crowds: Unexpected Authors and Their Impact on Scholarly Research

On Friday, May 17, nearly 50 librarians from CUNY and other New York City libraries gathered at the CUNY Graduate School of Journalism to participate in a program about new models for content production. This exciting program was jointly organized by the LACUNY Emerging Technologies Committee, the LACUNY Scholarly Communications Roundtable, LILAC, and the Office of Library Services.

The morning began with a lively presentation from Kate Peterson, Information Literacy Librarian at the University of Minnesota-Twin Cities, and Paul Zenke, DesignLab/Digital Humanities Initiative Project Assistant at the University of Wisconsin-Madison. In their presentation, titled “Hats, Farms, and Bubbles: How Emerging Marketing & Content Production Models are Making Research More Difficult (And What You and Your Students Can Do About It),” Kate and Paul discussed five initiatives that currently affect content creation and propagation on the internet: search engine optimization (SEO), filter bubbles, content farms, algorithm-created content, and crowdsourcing (see their slides from this talk in the Program Materials section below).

The session began with an active poll in which attendees were asked to walk to labeled parts of the room to show their familiarity with each of the five concepts. The activity revealed a wide range of prior knowledge among attendees, and made it clear that everyone had something to learn from the presentation.

The first item discussed was SEO: techniques used to increase the visibility of a website to search engines. Paul noted that while all website owners want their sites to be found, practitioners of “black hat” SEO typically use content spam (hiding or manipulating text) or link spam (artificially increasing the number of links to a website) to try to trick search engine ranking algorithms into ranking their sites highly. Some search engines have tried to mitigate the effects of black hat SEO: in 2012 Google launched Penguin, an algorithm update that penalizes websites that violate Google’s guidelines for webmasters.

Next Kate explained the concept of a filter bubble, a term that describes the potential for different search engine results when two identical searches are performed on two different computers (remember those Google ads that highlighted personalized searching for a beetle – the bug vs. the car?). The term filter bubble was coined by Eli Pariser in his book of the same name; we watched a brief clip of Pariser’s TED talk in which he explained the dangers of filter bubbles. When search engine algorithms increasingly tailor search engine results to our interests – which they equate with whatever content we click on while web surfing – we aren’t seeing the full range of information available on the internet. Facebook uses similar techniques to display content based on our friends’ interests. By creating these filter bubbles, internet corporations are restricting the opportunities for us to encounter information that may be new or challenging to us, or present a different point of view from our own.

Most academic librarians are familiar with content farms: websites such as About.com and eHow.com that pay very low wages to freelancers to write large volumes of low-quality articles. Often the article topics are drawn from algorithmic analysis of search data that suggests the titles and keywords that are most profitable for advertisers. Unlike journalism, this model of content creation starts with consumer demand. Paul noted that Google has also come out with a strategy to attempt to stem the tide of low-quality content from content farm websites; in 2011 it debuted Google Panda and downgraded 11% of content it indexed that year. While it’s useful to us, as librarians, when Google addresses the content farm problem, it’s also somewhat troubling to realize that Google is developing algorithms for evaluating information sources.

Perhaps one of the most surprising topics discussed was content created by machine, or algorithm-generated content. Algorithms have already been implemented to synthesize large data sets into an accessible narrative. They are popular in areas like sports writing or business news where there is an emphasis on statistics and identifying trends or patterns. But algorithms are also already being used to write content such as restaurant reviews or haikus. These algorithms can even be programmed to generate a certain tone within the article, or make different types of articles for different situations using the same data. In academic settings, algorithms may also be used to give students feedback on their preparation for tests like the SAT or ACT. One point of discussion during the event was the labor issues raised by algorithms (or the lack thereof): using algorithms to create content eliminates the need to pay any person (no author is paid even a small amount, as with content farms, because essentially there is no author). A question from the audience brought up the dying art of fact checking in journalism today. Kate pointed out, interestingly, that although these articles are not written by a person, they need very little fact-checking, since they rely so heavily upon the direct import of factual data.

Crowdsourcing was also discussed as an emerging way content is created or supported through the work of the masses. Paul briefly discussed content created through crowdsourcing on websites like Carnegie Mellon’s Eterna and the site Foldit, where contributors play a game involving protein folding. He also focused on crowdsourcing for fundraising using websites like Kickstarter and Indiegogo. There are implications for what people decide to fund and not to fund. What does this mean, especially in these times of federal austerity?

During a subsequent breakout session the crowdsourcing topic was explored further. Examples of user-supplied content included Wikipedia, and participants noted Wikipedia Editathons such as the one held at NYPL to increase access to the NYPL theater and performing arts collection. MOOCs became a part of the discussion on crowdsourcing, since examples of student solutions to problems have been integrated into courses as illustrations. Readersourcing.org, though it was never launched, was an attempt to crowdsource peer review. There was also an extended discussion about crowdsourcing as a news gathering technique. Twitter has surfaced as a way to gather information about events as they happen. Of concern for librarians, always interested in the accuracy of information, is whether or not information gathered through Twitter can be trusted. Daren C. Brabham’s research on the ethics of using the crowd as a source of free labor was also discussed. According to Brabham, a myth is perpetuated about the amateur nature of crowd contributors, when in reality many who contribute as anything from citizen scientist to citizen graphic designer are professionals who deserve compensation.

Kate and Paul ended by suggesting strategies that we can use to mitigate potentially negative effects of new content production, both for us (as librarians and as internet users) and for the students and faculty with whom we work. And indeed, academic librarians are well-placed to implement these recommendations as we work to educate ourselves and our patrons. We must continue to teach students to evaluate their sources, and perhaps expand to evaluating the possible filters they experience as well. Looking for opportunities to create more chance encounters with information could help burst those bubbles. Many of us already clear our web browser history and cookies regularly; can we also demand more transparency from our vendors about the information they collect from users? Finally, Kate and Paul challenged us to think about ways that we can put students into the role of creator, rather than simply consumer, to raise their awareness about these issues surrounding content production and increase their data literacy and information literacy.

After Kate and Paul’s presentation, participants broke up into three discussion groups: content farms (led by Paul), algorithms (led by Kate) and crowdsourcing (led by Prof. Beth Evans). Participants explored the implications of each of these topics for work in the library, and also discussed other issues surrounding research and the internet.

Awareness of all of these issues might help to ensure that librarians and researchers (and the students we teach at the reference desk and in the classroom) don’t get stuck in the filter bubble, surrounded by thin information that was written by bots!

— by Maura Smale (City Tech), Alycia Sellie (Brooklyn College), and Beth Evans (Brooklyn College)

Program Materials:

Hats, Farms, and Bubbles slides

Videos shown during the presentation:

Epipheo. (2013). Why the News Isn’t Really the News. YouTube. http://www.youtube.com/watch?v=YoZNJsp3Kik

ExLibrisLtd. (2011). Primo ScholarRank plain and simple. YouTube. http://www.youtube.com/watch?v=YDly9qPpPYQ

TED. (2011). Eli Pariser: Beware online “filter bubbles.” http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html

Additional materials mentioned during the presentation:

On The Media. (2013). Ads vs. Ad-Blockers. http://www.onthemedia.org/2013/may/10/ads-vs-ad-blockers

  • In response to the question about how modifying your web browser through extensions like ad blockers can have unintended consequences like hurting independent publishers.

This American Life. (2012). Forgive us our press passes. 468: Switcheroo. http://www.thisamericanlife.org/radio-archives/episode/468/switcheroo?act=2

  • Although we didn’t mention Journatic.com during our presentation, it’s another version of a content farm. Instead of using SEO techniques to attract web traffic from a general audience, Journatic.com works with newspapers to outsource hyper-local articles to writers abroad, who often publish under fake bylines.

Recommended Readings:

1) Content Farms

NOTE: Notice the use of SEO in the web address; the article is NOT about ESPN.

2) Algorithm-written Content

3) Crowdsourcing

Crowdsourcing Site Screenshots, by Beth Evans (Brooklyn): http://www.slideshare.net/myspacelibrarian/crowdsourcing-site-screenshots

Unexpected Authors and Their Impact on Scholarly Research

The LACUNY Emerging Technologies Committee, the LACUNY Scholarly Communications Roundtable, LILAC, and the Office of Library Services are delighted to announce our Spring program:

Computers and Crowds:
Unexpected Authors and Their Impact on Scholarly Research

Friday, May 17th; 9:30am – 12:30pm
Graduate School of Journalism, Room 308

Register online.

Please join us for an exciting half-day session that begins with an introduction to new content production models and ends with moderated breakout discussions of specific topics in the field.

Part 1:
Hats, farms, and bubbles: How emerging marketing & content production models are making research more difficult (and what you and your students can do about it)

Google and other search engines have made tremendous progress organizing the world’s knowledge. However, accessing that knowledge is becoming increasingly difficult because of emerging marketing and content production models utilized by high-ranking sites like eHow.com and ExpertVillage.com. Search Engine Optimization (SEO), “content farms” and Google’s increasingly personalized search algorithms are making search engines less effective as academic research tools. As a result, students are exposed to more shallow, low-quality results than ever before. In this session, learn more about the technologies behind these emerging marketing and content production models. Learn strategies faculty, students, and librarians can use to respond to the new information environment.

Kate Peterson
Information Literacy Librarian, University of Minnesota-Twin Cities

Paul Zenke
DesignLab/Digital Humanities Initiative Project Assistant, University of Wisconsin-Madison

Part 2:
Three concurrent breakout conversations on content farms, algorithm-written content, and crowdsourcing. Recommended readings will be made available in advance on the Academic Commons.

Refreshments will be served!

Evaluating Strategies for Evaluating Sources

Many faculty members in the library and beyond strive to help students learn to evaluate the information sources they use, whether in print, or on websites, or presented as images, audio, or video. Evaluating sources is a core competency of information literacy, and is highlighted by the Association of College and Research Libraries in ACRL Information Literacy Standard 3:

The information literate student evaluates information and its sources critically and incorporates selected information into his or her knowledge base and value system.

I’ll be honest: Standard 3 has always been my favorite of the ACRL standards, and I spend lots of my instructional brainstorming time on ways to incorporate more discussion of evaluating sources into my teaching. One of the first things I read on the topic when I first became a librarian was Marc Meola’s article in portal: Libraries and the Academy called Chucking the Checklist. Meola suggests that librarians stop using checklists of evaluation criteria — often accuracy, expertise, currency, relevance, etc. — to teach students to evaluate websites. Instead, we can approach instruction on evaluating sources as an opportunity to discuss the library’s vetted resources like article databases, and to use comparison and corroboration to contrast websites and library resources.

I enjoy Meola’s article and agree that the checklist approach is simplistic; however, there’s often not enough time in our instructional sessions with students to delve as deeply into a discussion of the differences between information sources as Meola suggests. So I confess that I do use checklists, though I try to contextualize and discuss the criteria with students, either individually or as a group, while they search. I also like to frame this as source interrogation: what questions can students ask about the source, and what do the answers tell us?

At City Tech we started out (following the lead of many other academic libraries) by using a set of questions for evaluating sources that was created by the Meriam Library at California State University, Chico. This list of questions is called the CRAAP Test (guaranteed to get a giggle out of even the sleepiest class), which stands for currency, relevance, authority, accuracy, and purpose. Each criterion includes several questions to ask about the source. It’s a long and thorough list, and it’s deservedly popular in academic libraries.

Last week I followed a Twitter link that led me to another set of criteria for evaluating sources, this one called the SMELL Test. This guide from PBS.org’s MediaShift website urges readers to consider the source, motivation, evidence, logic, and what was left out of the information they read about online. Since it’s presented in article form it’s not a checklist per se, but I think the SMELL Test would make an interesting article for students to read and discuss.

Finally, I thoroughly enjoyed this 13-minute TED Talk from journalist Markham Nolan on How to Separate Fact and Fiction Online. In it, Nolan details the tools and strategies that journalists use to check sources and verify information in images and video even as a news story is developing. For example, he discusses how photos of Hurricane Sandy were fact-checked. I think students often forget that they should evaluate their image, audio, and video sources as well as text-based sources, and I think this video can help us make that case.

Do you have strategies or materials that you use to help students learn to think critically about information sources? Share them in the comments if so!

RefWorks webinar for CUNY

RefWorks, the citation management software, recorded a webinar for Hunter on using the application. Since it’s tailored to a CUNY audience we thought it’d be great to share the links here.

Here’s the streaming recording.

And here’s the downloadable recording.


CUNYwide Research Management Workshop

This Thursday, May 5th, all CUNY library faculty, faculty, and staff are invited to attend a day of workshops at Hunter College on the new Web of Knowledge, EndNote, and other research management tools from Thomson Reuters.

Information Literacy Resources on the Wiki

The Library Information Literacy Advisory Committee is pleased to share resources from across CUNY for teaching information literacy. We’ve compiled these on a page on the Commons wiki that’s inventively titled Information Literacy Resources @ CUNY. Look here for great tutorials and guides that anyone can use.

Here’s a direct link to the page:


Because this list is on the wiki, any Commons member can edit it, so please feel free to add your favorite IL resources, too! And if there are IL resources from other colleges and universities that you admire, let us know in the comments below. Thanks!