<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>iNode</title>
	<atom:link href="http://inodeblog.com/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>https://inodeblog.com</link>
	<description>Digital Programs and Systems</description>
	<lastBuildDate>Tue, 08 Jun 2021 17:28:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.4.3</generator>

<image>
	<url>https://inodeblog.com/wp-content/uploads/2017/03/cropped-cropped-funwick4-2-1-32x32.png</url>
	<title>iNode</title>
	<link>https://inodeblog.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>We Need To Talk about OER Discovery</title>
		<link>https://inodeblog.com/?p=1918</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Tue, 08 Jun 2021 13:50:37 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1918</guid>

					<description><![CDATA[Last November I was part of an Open Education Conference 2020 panel entitled &#8220;We Need To Talk about OER Discovery.&#8221; Six questions focused the discussion, with each panel member contributing... <a href="https://inodeblog.com/?p=1918">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[
<p>Last November I was part of an Open Education Conference 2020 panel entitled &#8220;We Need To Talk about OER Discovery.&#8221;    Six questions focused the discussion, with each panel member contributing their thoughts.  Here are my responses: </p>



<h2 class="wp-block-heading"><strong>How would you describe the current state of OER discovery?</strong></h2>



<p>Let me start by saying that six or seven years ago, surveys routinely showed that simply&nbsp;<strong>“finding”</strong>&nbsp;OER content was the most significant barrier to adoption. For several years after, you could count on seeing some mention of&nbsp;<strong>“difficulty finding OERs”</strong>&nbsp;in articles and reports.<br><br>I’m happy to say that in recent years–assuming you stay away from commercial publishers’ brochures–you rarely see issues around OER discovery getting those 30-point headlines.<br><br>But what I do still see, three years after we launched our <a href="https://mom.gmu.edu/" target="_blank" rel="noreferrer noopener">Mason OER Metafinder</a>, is that while it’s true we don’t talk so much about&nbsp;<strong><em>simple</em></strong>&nbsp;discovery anymore…it seems we’re all thinking more than ever about the need for&nbsp;<strong><em>more efficient</em></strong>&nbsp;discovery.&nbsp;<br><br>What do I mean by&nbsp;<strong><em>more efficient</em></strong>?</p>



<ul><li>higher signal-to-noise ratio in retrieval</li><li>less duplication of content in our search results</li><li>unambiguous usage rights for every item retrieved</li><li>and a way to quickly assess the pedagogical “fit”</li></ul>



<p>If we had these last two bits of information–usage rights and pedagogical fit–we could easily slice and dice our result sets via facets.</p>



<p>If we ignore for a moment the traditional discovery solution–wherein the searcher dives in and out of various silos looking for appropriate material–there are two approaches to solving the discovery problem across multiple content providers.&nbsp;</p>



<ul><li>use a <strong><em>just-in-case</em></strong> system like MERLOT or SUNY’s OASIS. The “just-in-case” tag comes from supply chain management…you store a lot of inventory just in case someone needs it. Here, the inventory is the metadata that’s harvested from various OER content providers. That metadata is normalized to some degree…then indexed…and it’s that index that you’re searching. Results are limited to items the search system knows about.</li><li>or use a <strong><em>just-in-time</em></strong> system like the Mason OER Metafinder. Ours is a “just-in-time” system because, again drawing an analogy to supply chain management, we maintain no stock but rely on prompt delivery from our suppliers: OER content silos. Instead of searching a pre-built index, when you submit your query the Metafinder launches <strong>up to</strong> 21 real-time parallel searches, one for each of the <strong>up to</strong> 21 sites that you have asked to search. It then collects, dedupes and ranks the <strong>top 100</strong> results from each of these sites, combining all into a single, faceted results set.</li></ul>
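<p>To make the just-in-time idea concrete, here is a minimal Python sketch. It is not the Metafinder’s actual code: the two site functions, their result fields and the title-based dedupe key are all hypothetical stand-ins chosen for illustration.</p>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real OER sites: each "site" is a function that
# takes a query and returns a list of result dicts with a relevance score.
def search_site_a(query):
    return [{"title": "Intro to Statistics", "url": "https://a.example/stats", "score": 0.9}]

def search_site_b(query):
    return [
        {"title": "Intro to Statistics", "url": "https://b.example/intro-stats", "score": 0.8},
        {"title": "Statistics Workbook", "url": "https://b.example/workbook", "score": 0.6},
    ]

def federated_search(query, sites, per_site_limit=100):
    """Query every site in parallel, then merge, dedupe and rank the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        per_site = list(pool.map(lambda site: site(query)[:per_site_limit], sites))

    merged = {}
    for results in per_site:
        for item in results:
            key = item["title"].strip().lower()  # naive dedupe key (assumption)
            best = merged.get(key)
            if best is None or item["score"] > best["score"]:
                merged[key] = item  # keep the highest-scoring copy
    return sorted(merged.values(), key=lambda item: item["score"], reverse=True)

results = federated_search("statistics", [search_site_a, search_site_b])
```

<p>Nothing is stored between queries in this sketch, which is the point of the supply-chain analogy: every search goes back to the suppliers.</p>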



<p>For now, I’ll conclude by pointing out that each of these approaches has advantages and disadvantages and as you might expect, each poses dramatically different maintenance requirements. To help bridge the gap between these approaches, we include metadata aggregations like MERLOT and OASIS along with content providers as search targets in our Metafinder.&nbsp;</p>



<h2 class="wp-block-heading"><strong>What are the main challenges/specific needs you can identify at this time?</strong></h2>



<p>We have several interesting issues in the OER content world that complicate discovery.&nbsp;</p>



<ul><li>First, there’s very little standardization of metadata beyond Author, Title and publication date. And across repositories, even those simple and seemingly straightforward metadata elements tend to drift a bit.</li><li>Then, there’s <em>willful</em> duplication of content across repositories. That redundancy is useful I suppose in a world where repository sustainability is always a concern…but once you open up cross-repository searching, it poses complications. For example, looking at results in the OER Metafinder, I’ve noticed that sometimes the same content is in two or more repositories but there will be slight variations in the author/title metadata on each site. It’s hard to teach a machine to unravel that duplication or to select the most appropriate copy.</li></ul>
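<p>As a rough illustration of why those small variations matter, here is a Python sketch of one possible approach: build a normalized author/title key and group records that share it. The sample records and the normalization rules are assumptions made up for this example. Real metadata drifts in ways this simple key will not catch.</p>

```python
import re
import unicodedata

def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def dedupe_key(record):
    # Sort the author's name tokens so "Smith, Jane" and "Jane Smith" agree.
    author = " ".join(sorted(normalize(record["author"]).split()))
    return (author, normalize(record["title"]))

# Hypothetical records from two repositories, A and B.
records = [
    {"author": "Smith, Jane", "title": "Open Chemistry: An Introduction", "repo": "A"},
    {"author": "Jane Smith", "title": "Open chemistry - an introduction", "repo": "B"},
    {"author": "Lee, Sam", "title": "World History", "repo": "A"},
]

# Group repositories by normalized key; a group with two entries is a likely duplicate.
groups = {}
for record in records:
    groups.setdefault(dedupe_key(record), []).append(record["repo"])
```

<p>Grouping only flags likely duplicates. Choosing the most appropriate copy from each group is a separate decision that this sketch does not attempt.</p>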



<h2 class="wp-block-heading"><strong>What approach(es) do you think would best address these needs?</strong></h2>



<p>So from my vantage point–which is trying to offer a search engine that increases search efficiency – the key to fixing many of these issues is <strong><em>standardizing on a particular metadata schema for OER content</em></strong>…and then devoting time to enriching that descriptive metadata.</p>



<p>If I could just issue a decree, it would be that the community settle on a metadata schema that suits at least the basic needs of all interested parties. By that I mean let’s not follow our natural librarian impulse to over-engineer the solution before we deploy it but let’s focus on figuring out the minimum that improves on the current state of affairs but also offers an extensible design that can evolve and improve as we work with it. That simplicity will also speed adoption of the schema.</p>



<p>Then I’d give preference to those repositories and content providers that utilize the schema.&nbsp;</p>



<h2 class="wp-block-heading"><strong>What, if any, success stories do you know of?</strong></h2>



<p>I think one success I’ve noticed is the growing worldwide reach of OERs and the inter-connectedness of the world when it comes to OER discovery.</p>



<p>I try to track any library, libguide or webpage that provides a searchbox or link to our OER Metafinder. I post a list of those sites on our “About the Metafinder” page…and from the list link back to the page that links to our service. What that has turned into is a quick spot to view more than 400 OER-related libguides, websites or services. More than once I’ve heard from people who appreciate being able to so quickly find OER advocacy materials from literally around the world. For example, this past month, among the top 25 sites sending traffic to our Metafinder were sites in South Africa, Australia, Canada, Kenya, Taiwan and the Netherlands.</p>



<h2 class="wp-block-heading"><strong>How can we work to reduce silos in OER discovery initiatives?</strong></h2>



<p>In the absence of any sort of cross-repository search mechanism, it’s absolutely true that the problem of discovery becomes more difficult as the number of silos increases. There’s a limit to a searcher’s energy and diving in and out of silos all day can be exhausting. If, however, we have a more standardized way of surfacing the relevant content of each silo then the number of them is more a computer scaling issue than it is a burden to searchers. So we’re at something of a fork in the discovery road. Do we think about how to reduce the number of places you have to look or do we think about how we might build a single virtual OER database out of the many siloed repositories?</p>



<h2 class="wp-block-heading"><strong>What role, if any, do accessibility and equitable data play in OER discovery?</strong></h2>



<p>Equitable data, or Data Equity, is all about finding and eliminating the ways that bias, assumptions, unfairness and prejudice can slip into a data project. I suppose you can find traces of those problematic impulses somewhere in the OER discovery universe but my sense of things is that the OER movement in general is already quite far ahead of many other activities in valuing the open and equitable. So are there areas where we can improve? I can think of one, and that would be making sure bias and prejudices are not reflected in the metadata we develop in hopes of improving discovery. Thinking about equitable access, I think we might also work to ensure that our discovery platforms and our content delivery sources support the simplest, least-expensive device capable of reasonable function–rather than requiring an expensive computer to enjoy the best experience.</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>HathiTrust ETAS Activity</title>
		<link>https://inodeblog.com/?p=1890</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Wed, 23 Sep 2020 18:15:00 +0000</pubDate>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[General]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1890</guid>

					<description><![CDATA[The Emergency Temporary Access Service provided to qualifying HathiTrust members has proven quite popular with George Mason University users.  Here’s a chart showing item circulation by month, beginning April, 2020... <a href="https://inodeblog.com/?p=1890">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><a href="https://inodeblog.com/wp-content/uploads/2021/03/image.png"><img fetchpriority="high" decoding="async" width="1024" height="600" src="https://inodeblog.com/wp-content/uploads/2021/03/image.png" alt="" class="wp-image-1913" srcset="https://inodeblog.com/wp-content/uploads/2021/03/image.png 1024w, https://inodeblog.com/wp-content/uploads/2021/03/image-300x176.png 300w, https://inodeblog.com/wp-content/uploads/2021/03/image-768x450.png 768w, https://inodeblog.com/wp-content/uploads/2021/03/image-624x366.png 624w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>


<p>The <a href="https://www.hathitrust.org/ETAS-Description">Emergency Temporary Access Service</a> provided to qualifying HathiTrust members has proven quite popular with George Mason University users.  Here’s a chart showing item circulation by month, beginning April 2020 and running through March 2021.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Final Data for our DNS Query project</title>
		<link>https://inodeblog.com/?p=1815</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Tue, 10 Dec 2019 11:10:00 +0000</pubDate>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Copyright]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1815</guid>

					<description><![CDATA[Here’s the final set of numbers on our analysis of DNS query logs (detailed in an earlier post). Graph covers activity between July 3rd and December 9, 2019. This reflects... <a href="https://inodeblog.com/?p=1815">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>Here’s the final set of numbers on our analysis of DNS query logs (<a href="https://inodeblog.com/?p=1757">detailed in an earlier post</a>).</p>
<p>Graph covers activity between July 3rd and December 9, 2019.</p>
<p>This reflects only the DNS query activity on the campus network.  Mason affiliates using off-campus networks are not included in this chart.</p>
<p>We can assume, I think, that most activity going to Sci-Hub is about finding content.   ResearchGate also fills this role but, as a social networking platform, it draws traffic for other reasons as well.   We can see that ResearchGate and Google Scholar are heavily visited.   What we can’t readily see is the degree to which they serve as an alternative source for otherwise restricted content.</p>
<p><br /><img decoding="async" style="display: block; margin-left: auto; margin-right: auto;" title="2019-12-19_11-41-37.png" src="https://inodeblog.com/wp-content/uploads/2019/12/2019-12-19_11-41-37.png" alt="Final Count for DNS Queries" width="711" height="380" border="0" /></p>
<p>You can download our dataset here (roughly 8 MB):    <a href="https://tinybox.gmu.edu/files/DNS-QUERY-Final.txt">https://tinybox.gmu.edu/files/DNS-QUERY-Final.txt</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Fun with our Traffic Counter&#8217;s API</title>
		<link>https://inodeblog.com/?p=1807</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Wed, 31 Jul 2019 01:34:17 +0000</pubDate>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Library Tech]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1807</guid>

					<description><![CDATA[We installed counters over all the entrances in Fenwick Library a while back.  Smart little devices that offer an API as well.  Click the image to check out this particular... <a href="https://inodeblog.com/?p=1807">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>We installed counters over all the entrances in Fenwick Library a while back.  Smart little devices that offer an API as well.  Click the image to check out this particular experiment.</p>
<p><a href="https://apis.gmu.edu/vea"><img decoding="async" style="float: left;" title="2019-07-30_21-10-39.png" src="https://inodeblog.com/wp-content/uploads/2019/07/2019-07-30_21-10-39.png" alt="2019 07 30 21 10 39" width="281" height="308" border="0" /></a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What does local use of Sci-Hub look like?</title>
		<link>https://inodeblog.com/?p=1757</link>
		
		<dc:creator><![CDATA[Wally Grotophorst]]></dc:creator>
		<pubDate>Mon, 15 Jul 2019 12:25:07 +0000</pubDate>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Library Tech]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1757</guid>

					<description><![CDATA[Product bundling thrives in markets with few competitive options.  Your cable company knows that and so do large academic publishers.   For years, they&#8217;ve sold collections of e-journals at a... <a href="https://inodeblog.com/?p=1757">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>Product bundling thrives in markets with few competitive options.  Your cable company knows that and so do large academic publishers.   For years, they&#8217;ve sold collections of e-journals at a discount over what you&#8217;d pay to subscribe to each individual title in the bundle (though you probably wouldn&#8217;t if given the choice).  That’s a big deal, right?</p>
<p>But as the cost of these bundles has risen&#8211;far outpacing inflation&#8211;libraries have begun looking for alternatives. Some are letting their &#8220;big deals&#8221; expire while others are developing strategies to help inform those looming (and often fraught) renewal decisions.</p>
<p><a href="https://inodeblog.com/wp-content/uploads/2019/07/BSWAvywCMAA0ymB.png-large.png"><img loading="lazy" decoding="async" class="alignleft size-medium wp-image-1756" style="float: left;" src="https://inodeblog.com/wp-content/uploads/2019/07/BSWAvywCMAA0ymB.png-large-300x250.png" alt="" width="300" height="250" srcset="https://inodeblog.com/wp-content/uploads/2019/07/BSWAvywCMAA0ymB.png-large-300x250.png 300w, https://inodeblog.com/wp-content/uploads/2019/07/BSWAvywCMAA0ymB.png-large.png 598w" sizes="(max-width: 300px) 100vw, 300px" /></a> <a href="https://sparcopen.org">SPARC</a> (the Scholarly Publishing and Academic Resources Coalition) has been carefully <a href="https://sparcopen.org/our-work/big-deal-cancellation-tracking/">tracking this activity</a> and their work provides an easy way to keep up-to-date on most aspects of this issue.</p>
<p>But one question I&#8217;ve had for some time is what sort of gravitational pull are sites like Sci-Hub or ResearchGate exerting on the already disrupted orbits of users, libraries and publishers?  Put another way, if researchers are satisfying their content needs outside the library/publisher channel, shouldn’t we factor that into our strategy around these big deals?</p>
<p>I realize I’m not the first to ask who’s using Sci-Hub.  Here are just a few of the many articles that get at this topic:</p>
<ul>
<li><a href="https://www.sciencemag.org/news/2016/04/whos-downloading-pirated-papers-everyone">Who’s downloading pirated papers? EVERYONE</a> (<em>Science</em>, April 2016)</li>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5832410/">Sci-Hub provides access to nearly all scholarly literature</a> (<em>eLife</em>, March 2018)</li>
<li><a href="http://eprints.rclis.org/30981/">Shadow Libraries and You: Sci-Hub Usage and the Future of ILL</a> (ACRL Conf Paper, 2017)</li>
</ul>
<p>Each talks about usage activity and traffic patterns but in a way that is little more than anecdotal background noise if you&#8217;re trying to fashion a local strategy and need to focus on what <strong><em>your</em></strong> local users are <strong><em>actually</em></strong> doing.  Simply asking who’s using these sites poses all sorts of problems.</p>
<p>I finally settled on analyzing <a href="https://www.cloudns.net/wiki/article/254/">DNS queries</a> to our campus nameservers as a reasonable metric.  When a user on our campus network points a browser at researchgate.net, our campus nameserver logs the transaction.  An imperfect measure to be sure (e.g., it ignores traffic to &#8220;shady&#8221; sites from off-campus affiliates using their ISP’s nameserver) but it does let me compare on-campus traffic to &#8220;pirate&#8221; sites with on-campus traffic to sites provided via our library&#8217;s subscriptions.</p>
<p>Mindful of privacy issues, I asked a friend in campus IT to take a list of 6 or 7 domains and derive an extract file from the DNS query logs, providing just date, time and query string for anything that matched the domain information I provided.  Here’s an excerpt of the result:</p>
<p><img loading="lazy" decoding="async" style="display: block; margin-left: auto; margin-right: auto;" title="2019-07-10_13-53-29.png" src="https://inodeblog.com/wp-content/uploads/2019/07/2019-07-10_13-53-29.png" alt="2019 07 10 13 53 29" width="662" height="342" border="0" /></p>
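<p>An extract with date, time and query string can be tallied per domain per day with a few lines of Python. This is a minimal sketch, not the script behind the graph: the line layout, the sample rows and the watch list are all hypothetical, since the real extract format may differ.</p>

```python
from collections import Counter

# Hypothetical extract lines: date, time, query string (real layout may differ).
sample_extract = """\
2019-07-03 09:14:02 www.researchgate.net
2019-07-03 09:15:40 sci-hub.example
2019-07-03 10:01:11 researchgate.net
2019-07-04 08:30:00 scholar.google.com
"""

# Hypothetical watch list; the post does not name every domain that was tracked.
watched = ["researchgate.net", "sci-hub.example", "scholar.google.com"]

def tally(lines, domains):
    """Count queries per (date, domain), matching subdomains as well."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        date, _time, query = parts
        query = query.lower().rstrip(".")
        for domain in domains:
            if query == domain or query.endswith("." + domain):
                counts[(date, domain)] += 1
                break
    return counts

counts = tally(sample_extract.splitlines(), watched)
```

<p>Because the extract carries no user identifiers, the tally only ever reports volume per site, which keeps the privacy posture described above.</p>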
<p>Producing this extract is now part of a weekly cron job so I’ll be able to monitor the relative use of these sites over the coming months.  In this one particular instance, I can&#8217;t wait for the Fall term to begin… <em> [ update:  You can see subsequent months <a href="https://inodeblog.com/?p=1815">here</a> ]</em></p>
<p>So what did I find by monitoring DNS queries between July 3rd and July 13th?</p>
<p>The graph shows activity for users <strong>on the campus network</strong>.  A better name for this post might be, &#8220;What does local use of ResearchGate look like?&#8221;</p>
<p><a href="https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1.png"><img loading="lazy" decoding="async" class=" wp-image-1801 aligncenter" src="https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1-1024x247.png" alt="" width="840" height="203" srcset="https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1-1024x247.png 1024w, https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1-300x72.png 300w, https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1-768x185.png 768w, https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1-624x150.png 624w, https://inodeblog.com/wp-content/uploads/2019/07/2019-07-16_12-09-03-1-1.png 1660w" sizes="(max-width: 840px) 100vw, 840px" /></a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>You don&#8217;t always have to write code</title>
		<link>https://inodeblog.com/?p=1710</link>
		
		<dc:creator><![CDATA[Wally Grotophorst]]></dc:creator>
		<pubDate>Tue, 11 Jun 2019 01:06:55 +0000</pubDate>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Library Tech]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1710</guid>

					<description><![CDATA[Sometimes what appears to be a programming task doesn’t actually require firing up your editor. Consider this problem: Two fixed-length text files, one has 42,000 lines while the other has... <a href="https://inodeblog.com/?p=1710">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>Sometimes what appears to be a programming task doesn’t actually require firing up your editor.</p>
<p>Consider this problem: Two fixed-length text files, one has 42,000 lines while the other has 13,000.  In each file, a single line represents information about a particular user. If a person with status ‘<em>X</em>’ in the first file also appears in the 2nd file with a status of ‘<em>Y</em>’, we need to keep the line in the first file and delete the line for that user from the 2nd file. We can match up the user between files via the person’s ID# field which appears in both files.</p>
<p>Real-world example: We have two different fixed-length files that we receive weekly from the campus computer center. For years we just sent those files (in Voyager SIF format) directly into our Voyager system to update patrons (students with one file, faculty and staff with the other). Moving to Alma we decided the best course for our accelerated implementation was to let the computer center continue producing those SIF files and we&#8217;d take on the task of converting the information into the XML form that Alma was expecting. That did require a bit of code but it’s been working pretty well.  But not perfectly&#8230;</p>
<p><span id="more-1710"></span></p>
<p>Some people appear in both files and the last file loaded sets the borrower status for that individual.  For example, the grad student who also works in a department somewhere on campus ends up having the same borrowing privileges as a wages worker in the food court once we overwrite the student status from the Student Information System (SIS) file with the same user&#8217;s record in the file we get from HR that shows a status of “wages.”</p>
<p>Understandably, a student complained about this the other day and ever since I’ve been trying to think of a programmatic fix.  But before walking through my solution (and to make what I came up with a bit easier to follow), here’s a sample line from each of the two files:</p>
<p><em>From the SIS file (42,000 lines)</em></p>
<p>00000 ro medm 1 2019.06.102020  03.2 54 G00009705 Jones Seneca T. 2205 Any Street, Anytown, Virginia [line continues…]</p>
<p><em>From the HR file (13,000 lines)</em></p>
<p>00000 po ireg 1 2019.06.102020 03.03 2803 G00223705  Everyman, Bob E. 1422 Stafford Street, Apt 7, Arlington, VA 22203 [line continues…]</p>
<p>There must be many ways to solve this problem. My first thought was a Perl program but my brute force algorithm (look for a grad student in the first file, then scan the 2nd file for a match) seemed too slow (stupid?) to pursue.  I also considered <a href="http://openrefine.org">OpenRefine</a> but that quickly got very fiddly and I was looking for a server-side solution I could add into the script that handles the conversion to XML prior to sending the data to Alma.  Finally, I thought, &#8220;what about a shell script where awk and grep do the heavy lifting?&#8221;</p>
<p>My algorithm ended up looking something like this:</p>
<p>1. Examine each line of the Student Information System file and where we find “medm” on a line in the 3rd column (a code for grad students), extract the Mason ID number (in column 8), and write just that ID# to the output file.</p>
<p>2. Examine each line of the HR file and write an output file that contains just the Mason ID number for anyone with the status of “wages” (matching records have “ireg” in the 3rd column).</p>
<p>3. Go through both files and where the Mason ID number in file one matches a Mason ID number in the 2nd file, add that ID# to a third file (contains only ID numbers that are in both original files).</p>
<p>4. Finally, grep the original HR file using an inverted match (e.g., -vf), writing out lines from the HR file that <strong><em>are not in</em></strong> the file we created in step 3 (where the ID appeared in both original files).  This is equivalent to deleting lines from the HR.dat file for the wages person who appeared in the SIS file as a grad student. Voilà! The grad status record doesn’t get overwritten when the processed HR file (hr_final.out) is loaded into Alma.</p>
<p>Four-step solution looks like this:</p>
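<pre lang="bash"># Step 1: Mason IDs (field 8) of grad students (field 3 is "medm") in the SIS file
awk '$3 == "medm" { print $8 }' SIS.dat > grad_IDs.txt

# Step 2: Mason IDs of wages workers (field 3 is "ireg") in the HR file
awk '$3 == "ireg" { print $8 }' HR.dat > wages_IDs.txt

# Step 3: IDs that appear in both lists
awk -F\<strong>|</strong> 'NR==FNR{a[$1]++;next}a[$1]' grad_IDs.txt wages_IDs.txt > both.txt

# Step 4: keep HR lines that contain none of those IDs (-F treats each ID as a fixed string)
grep -vFf both.txt HR.dat > hr_final.out
</pre>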
<p>Note: That third command has a pipe character after the -F\, not a capital &#8220;I&#8221; &#8230; and yes, for that third command I stand on the shoulders of giants who post answers on <a href="https://stackoverflow.com/questions/4780203/deleting-lines-from-one-file-which-are-in-another-file">Stack Overflow</a> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>How long does it take this to run on a rather old RedHat 6.5 server?  1.5 seconds</p>
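<p>For comparison, the same four steps can be sketched in Python with sets. This is an alternative I did not use in production, and the sample lines (including the “ugrd” status code) are invented for illustration. One difference worth noting: the grep step drops any HR line that contains a matching ID anywhere, while this sketch compares only the ID field.</p>

```python
def field(line, index):
    """Return the whitespace-delimited field at a 1-based index, or None."""
    parts = line.split()
    return parts[index - 1] if len(parts) >= index else None

def drop_wages_rows_for_grads(sis_lines, hr_lines):
    # Steps 1-3: IDs that are grad students in SIS and wages workers in HR.
    grad_ids = {field(line, 8) for line in sis_lines if field(line, 3) == "medm"}
    wages_ids = {field(line, 8) for line in hr_lines if field(line, 3) == "ireg"}
    both = grad_ids.intersection(wages_ids)
    both.discard(None)  # ignore short lines that have no ID field
    # Step 4: keep only HR lines whose ID field is not in that set.
    return [line for line in hr_lines if field(line, 8) not in both]

# Invented sample data, modeled loosely on the layout shown above.
sis = [
    "00000 ro medm 1 2019.06.102020 03.2 54 G00000001 Doe Jane 1 Main St",
    "00000 ro ugrd 1 2019.06.102020 03.2 54 G00000002 Roe Rick 2 Main St",
]
hr = [
    "00000 po ireg 1 2019.06.102020 03.03 2803 G00000001 Doe, Jane 1 Main St",
    "00000 po ireg 1 2019.06.102020 03.03 2803 G00000003 Poe, Pat 3 Main St",
]

hr_final = drop_wages_rows_for_grads(sis, hr)
```

<p>Set lookups avoid the brute-force rescan of the second file that made my first Perl idea look slow.</p>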
<p>Here’s a nice awk cheatsheet:</p>
<p><a href="https://www.cheatography.com/tme520/cheat-sheets/awk-english/pdf/">https://www.cheatography.com/tme520/cheat-sheets/awk-english/pdf/</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>E-Content Usage Update for Fall 2017</title>
		<link>https://inodeblog.com/?p=1651</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Mon, 05 Feb 2018 16:15:43 +0000</pubDate>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Desktop Software]]></category>
		<guid isPermaLink="false">https://inodeblog.com/?p=1651</guid>

					<description><![CDATA[We have no perfect way of assessing e-content usage by our students even though we’re now spending 75% or more of our collections budget on this sort of material.  We... <a href="https://inodeblog.com/?p=1651">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>We have no perfect way of assessing e-content usage by our students even though we’re now spending 75% or more of our collections budget on this sort of material.  We do receive and analyze <a href="https://www.projectcounter.org/">COUNTER</a> statistics but COUNTER stats focus on <strong><em>what’s</em></strong> being used and collapse all activity by students, faculty and staff into a single number for each source.  Fine as far as it goes, but I&#8217;m also interested in <strong><em>who’s</em></strong> using content.   Not down to the individual (I value the library’s reputation for privacy) but at least to some meaningful though suitably-anonymous aggregation.  Until I get a better tool, here&#8217;s how I go about answering a question like “how do the different majors use our e-content collections?”</p>
<p><span id="more-1651"></span></p>
<p>The basic idea is this:  each line of our proxy server’s log file contains the user’s netID, the time of the transaction and the link that was proxied.   Another dataset I have access to&#8211;the student directory—includes both the netID and the declared major for each student.  Joining these two datasets on the netID, I can get a count of e-content activity for each major.</p>
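<p>Here is a minimal Python sketch of that join. It is only an illustration: my actual workflow loads the log into an SQL database, and the sample log tuples and directory entries below are hypothetical.</p>

```python
from collections import Counter

# Hypothetical, simplified inputs; real proxy logs and directory exports differ.
proxy_log = [
    ("jdoe", "2017-09-05T10:01:00", "https://db.example/article/1"),
    ("jdoe", "2017-09-05T10:02:00", "https://db.example/article/2"),
    ("asmith", "2017-09-06T14:30:00", "https://db.example/article/3"),
    ("staff1", "2017-09-06T15:00:00", "https://db.example/article/4"),
]
directory = {"jdoe": "Psychology", "asmith": "Public Policy"}

def activity_by_major(log, majors_by_netid):
    """Join log lines to majors on netID and count lines per major."""
    counts = Counter()
    for netid, _timestamp, _url in log:
        major = majors_by_netid.get(netid)
        if major is not None:  # skip users with no declared major
            counts[major] += 1
    return counts

usage = activity_by_major(proxy_log, directory)
```

<p>Only the per-major totals survive the join, so the result says nothing about any individual netID.</p>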
<p>This works but obviously it’s not a very precise tool.  For example, I’m only analyzing traffic that goes through our proxy server (basically our off-campus users) and ignoring open-access e-content altogether (we don’t use our proxy server to provide access to those sorts of resources).     Six thousand of our 34,000 students live on campus so I’m comfortable assuming that off-campus use is a reasonable proxy for all usage but I realize this is a very blunt instrument.</p>
<p>To explain the graph below, that 986,024 figure for Psychology majors means that in the semester-long activity log file on the proxy server, there were 986,024 lines generated by content coming through the server bound for a declared Psychology major.</p>
<p>The method I use to convert the proxy server log to an SQL database hasn’t changed much from <a href="https://inodeblog.com/?p=1158">the how-to I posted</a> a few years ago.  I did, however, make a small, simple and obvious refinement this year: before computing the per-major count, I eliminated lines from the log file that referenced things like JavaScript or CSS files and also tossed out glyph or icon entries.  Doing so removed a bit of noise from my results since some databases have much more graphically-rich interfaces and all those widgets, icons and glyphs skew my count.  Here’s a graphic representation of the top 50 or so e-content using majors:</p>
<p><a href="https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017.png"><img loading="lazy" decoding="async" height="1074" alt="" class="aligncenter size-full wp-image-1693" src="https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017.png" width="1043" srcset="https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017.png 1043w, https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017-291x300.png 291w, https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017-768x791.png 768w, https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017-994x1024.png 994w, https://inodeblog.com/wp-content/uploads/2018/02/Activity_By_Major_Fall_2017-624x643.png 624w" sizes="(max-width: 1043px) 100vw, 1043px" /></a></p>
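<p>The noise filter mentioned above can be sketched in Python as follows. The list of file extensions is my assumption for this example, not the exact set of patterns used to produce the chart.</p>

```python
from urllib.parse import urlparse

# Assumed extensions to treat as interface noise (scripts, styles, icons, fonts).
STATIC_SUFFIXES = (".js", ".css", ".ico", ".png", ".gif", ".svg", ".woff", ".woff2", ".ttf")

def is_content_request(url):
    """Return False for requests that look like interface assets."""
    path = urlparse(url).path.lower()  # the query string is ignored
    return not path.endswith(STATIC_SUFFIXES)

# Hypothetical proxied URLs.
urls = [
    "https://db.example/article/42",
    "https://db.example/static/app.js?v=3",
    "https://db.example/css/site.css",
    "https://db.example/fonts/glyphicons.woff",
]
content_urls = [u for u in urls if is_content_request(u)]
```

<p>A suffix list this blunt could also discard genuine image content, so it is worth spot-checking what gets dropped.</p>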
<p>This next chart shows the number of students in each major.    Comparing this graph with the first one, you can see that some majors generate far higher levels of use than their numbers might indicate (e.g., Public Policy ranks 40th in the number of student majors but 15th in e-content usage).  <a href="https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017.png"> <img loading="lazy" decoding="async" height="1186" style="display: block; margin-left: auto; margin-right: auto;" alt="" class="aligncenter size-large wp-image-1668" src="https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017-703x1024.png" width="814" srcset="https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017-703x1024.png 703w, https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017-206x300.png 206w, https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017-768x1119.png 768w, https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017-624x909.png 624w, https://inodeblog.com/wp-content/uploads/2018/02/NumberPerMajorFall2017.png 897w" sizes="(max-width: 814px) 100vw, 814px" /></a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The OER Metafinder Origin Story</title>
		<link>https://inodeblog.com/?p=1521</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Wed, 06 Dec 2017 16:11:16 +0000</pubDate>
				<category><![CDATA[Library Tech]]></category>
		<guid isPermaLink="false">http://inodeblog.com/?p=1521</guid>

					<description><![CDATA[I am a relative newcomer to the topic of OERs (Open Educational Resources).  Not unaware of the topic—our Mason Publishing Group has been working with faculty interested in affordable educational materials for... <a href="https://inodeblog.com/?p=1521">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p><a href="https://mason.deepwebaccess.com/mason__MasonLibrariesOpenEducationResources_5f4/desktop/en/search.html"><img loading="lazy" decoding="async" height="150" alt="" class="alignleft wp-image-1585 size-thumbnail" src="https://inodeblog.com/wp-content/uploads/2017/12/2017-12-19_14-07-32-150x150.png" width="150" srcset="https://inodeblog.com/wp-content/uploads/2017/12/2017-12-19_14-07-32-150x150.png 150w, https://inodeblog.com/wp-content/uploads/2017/12/2017-12-19_14-07-32-180x180.png 180w, https://inodeblog.com/wp-content/uploads/2017/12/2017-12-19_14-07-32-60x60.png 60w" sizes="(max-width: 150px) 100vw, 150px" /></a> I am a relative newcomer to the topic of <a href="https://wiki.creativecommons.org/wiki/What_is_OER%3F">OERs</a> (Open Educational Resources). Not that I was unaware of it (our <a href="https://publishing.gmu.edu">Mason Publishing Group</a> has been working with faculty interested in affordable educational materials for some time now), but until recently I hadn’t been much involved in those efforts.</p>
<p>That changed one afternoon this summer as I grabbed my laptop and tagged along with them to a meeting with the Associate Provost for Undergraduate Education to talk about OERs.</p>
<p style="text-align: left;">As the meeting progressed (and moved ever further from my area of expertise) I started stealing moments to jump in and out of various OER aggregation sites, curious to see the sorts of resources already available on the net.</p>
<p>If you’ve spent much time with OERs, you won’t be surprised to hear that I discovered:</p>
<ul>
<li><em>many dissimilar aggregations of content;</em></li>
<li><em>so many wildly-different interfaces;</em></li>
<li><em>so much duplication across these aggregations;</em></li>
<li><em>and such inconsistent metadata.</em></li>
</ul>
<p style="text-align: left;">As I poked around, I could easily envision a faculty member, excited by the idea of OERs, feeling the enthusiasm drain away as she dove in and out of the various content silos. Soon I found myself thinking much less about OERs and far more about how to improve their discoverability as a way to improve OER adoption…</p>
<p><span id="more-1521"></span></p>
<p>Already familiar with Deep Web Technologies <a href="https://www.deepwebtech.com/products/explorit-everywhere/">Explorit</a> software (we use it for a number of local search engines like our popular <a href="https://mason.deepwebaccess.com/mason__MasonLibrariesScienceFinder_b/desktop/en/search.html">Sciences Metafinder</a>), I realized that discovery of OER materials presents a problem that’s tailor-made for a federated search solution.   Looking across OER sources we find:</p>
<ul>
<li>a large number of search targets (<em>a federated search would save hundreds of clicks</em>),</li>
<li>and fortunately each site is more-or-less focused (<em>that&#8217;s good, minimizes noise in retrieval sets</em>).</li>
<li>We see redundant content across many of these sources <em>(de-duping retrieval would be a huge win, too bad eccentric metadata makes that difficult)</em>, and</li>
<li>there&#8217;s a vertigo-inducing variety of search interfaces (<em>distilling</em> <em>that to one would be great, wouldn’t it?</em>).</li>
</ul>
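<p>To make the federated idea concrete, here is a toy Python sketch of the pattern the list above describes: send one query to several sources, merge the hits, and drop duplicates. It is only an illustration and says nothing about how Explorit works internally. The two "repositories" are hypothetical stubs that ignore the query, and matching on a normalized title is a deliberately naive stand-in for real de-duplication, which inconsistent metadata makes much harder.</p>

```python
import re


def normalize_title(title):
    """Lowercase the title and keep only letters and digits, separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def federated_search(query, sources):
    """Send the query to every source and merge the hits.

    sources maps a source name to a search function. The first copy of each
    (normalized) title wins; later copies from other sources are skipped.
    """
    seen = set()
    merged = []
    for name, search in sources.items():
        for hit in search(query):
            key = normalize_title(hit["title"])
            if key in seen:
                continue
            seen.add(key)
            merged.append({"source": name, **hit})
    return merged


# Hypothetical stand-ins for real OER repositories; they return canned results.
def repo_a(query):
    return [{"title": "Introduction to Statistics"}, {"title": "College Physics"}]


def repo_b(query):
    return [{"title": "Introduction to statistics."}, {"title": "Open Chemistry"}]


results = federated_search("statistics", {"repo_a": repo_a, "repo_b": repo_b})
```

<p>Here the second copy of the statistics text is dropped even though its capitalization and punctuation differ, which is about as far as title matching alone can go.</p>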
<p>Beyond improving the discovery process, I realized building our own search engine would also give us the opportunity to take a more expansive view of what constitutes an OER&#8211;by searching the more common OER repositories but also hitting sites that offer quality, open educational content even if that isn’t their sole or even primary purpose. Sites like <a href="https://dp.la">DPLA</a>, <a href="https://www.hathitrust.org/">HathiTrust</a> (of particular value where the educator belongs to a HathiTrust member institution), <a href="https://archive.org/details/education">Internet Archive</a>, and <a href="https://www.wdl.org/">World Digital Library</a>, to name a few.</p>
<p>I pitched the idea to <a href="https://www.deepwebtech.com/">Deep Web Technologies</a>. Abe Lederman, the CEO, was very enthusiastic and offered to help us turn the idea around quickly. True to his word, within just a few weeks we had a powerful OER discovery service ready to go.</p>
<p>The <a href="https://mom.gmu.edu">Mason OER Metafinder</a> launches&#8230;</p>
<p>I tweeted once or twice about the Metafinder’s existence and shared the link with a few people&#8230;but outside Mason we weren’t seeing much interest or activity. Some weeks later, Deep Web posted a story about the OER Metafinder on their <a href="https://www.deepwebtech.com/2017/10/george-mason-university-launches-search-engine-for-open-source-textbooks/">tech blog</a>. That was picked up by <a href="http://newsbreaks.infotoday.com/Digest/Mason-OER-Metafinder-Searches-for-Open-Resources-121631.asp">Information Today</a> and other corporate PR news sites, and the Metafinder began to build a small audience, but still nothing dramatic. I didn&#8217;t feel we had failed exactly&#8211;the Metafinder was impressing local audiences&#8211;but I was certainly disappointed. We couldn&#8217;t seem to get what I considered a great tool more widely used.</p>
<p>Three or four weeks passed, then one morning I saw a message from Abe Lederman in my inbox, subject &#8220;OER Metafinder going viral?&#8221; Abe said they had seen a spike, hundreds of hits, on the Metafinder in the past two hours&#8211;mostly from the Northeast US. He asked, &#8220;Any idea why?&#8221;</p>
<p>I didn&#8217;t have a clue, but later that afternoon I got an email mentioning this post on the SPARC Libraries and OER Forum earlier in the day:</p>
<blockquote>
<p><em>I don’t remember seeing this announcement on any of our OER lists last month, or at OpenEd, but one of our library liaisons just forwarded it to me. It’s the announcement of an aggregated OER search engine created by George Mason and a web tech company, which looks, on the face of it, to be a “Google for OER”. It searches many open archival/book repositories (DPLA, HathiTrust, Internet Archive) as well as the standard OER ones (Merlot, OTL, OpenStax, etc.) and has some great limiters to narrow down results. Congrats GMU!</em></p>
</blockquote>
<p>This unsolicited mention on a listserv aimed at precisely the right group of people proved catalytic.  Within two days, I searched the <a href="https://community.libguides.com">LibGuides community site</a> and found 28 institutions already linking to our OER Metafinder.   Noticing that it was catching on with LibGuides users, I added sample <a href="https://publishing.gmu.edu/the-mason-oer-metafinder-widget/">search widget code</a> to our “About the Metafinder” page.   Searching <a href="https://community.libguides.com">https://community.libguides.com</a> and crawling the web with <a href="http://www.devontechnologies.com/products/devonagent/devonagent-pro.html">DEVONagent Pro</a> (a great personal research app for Mac, by the way), I see that just a few weeks after Abe&#8217;s initial &#8220;gone viral” email there are already <a href="https://publishing.gmu.edu/whos-using-the-mason-oer-metafinder">more than 100 sites</a> linking to our <a href="https://mom.gmu.edu">Metafinder</a>.</p>
<p><em><strong>7/2019 update:</strong></em> We now have 348+ sites linking to the Mason OER Metafinder (MOM).</p>
<p><a href="https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36.png"><img loading="lazy" decoding="async" height="356" style="display: block; margin-left: auto; margin-right: auto;" alt="" class="aligncenter size-large wp-image-1617" src="https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36-1024x583.png" width="625" srcset="https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36-1024x583.png 1024w, https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36-300x171.png 300w, https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36-768x438.png 768w, https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36-624x356.png 624w, https://inodeblog.com/wp-content/uploads/2017/12/2017-12-20_11-34-36.png 1269w" sizes="(max-width: 625px) 100vw, 625px" /></a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Dashboarding Google Analytics</title>
		<link>https://inodeblog.com/?p=1515</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Mon, 10 Apr 2017 15:05:59 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">http://inodeblog.com/?p=1515</guid>

					<description><![CDATA[One of our skunkworks projects involves taking real-time Google Analytics data and building a visually interesting dashboard to report out activity on various library sites. Click the image below to... <a href="https://inodeblog.com/?p=1515">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>One of our skunkworks projects involves taking real-time Google Analytics data and building a visually interesting dashboard to report out activity on various library sites.</p>
<p>Click the image below to take a peek at our ever-evolving sandbox:</p>
<p><a href="http://wizwah.gmu.edu/dashboard"><img loading="lazy" decoding="async" class="size-medium wp-image-1516 aligncenter" src="https://inodeblog.com/wp-content/uploads/2017/04/2017-04-10_11-04-06-300x237.png" alt="" width="300" height="237" srcset="https://inodeblog.com/wp-content/uploads/2017/04/2017-04-10_11-04-06-300x237.png 300w, https://inodeblog.com/wp-content/uploads/2017/04/2017-04-10_11-04-06-624x492.png 624w, https://inodeblog.com/wp-content/uploads/2017/04/2017-04-10_11-04-06.png 640w" sizes="(max-width: 300px) 100vw, 300px" /></a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Three Social Media Library Services</title>
		<link>https://inodeblog.com/?p=1472</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Fri, 31 Mar 2017 14:50:04 +0000</pubDate>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Library Tech]]></category>
		<guid isPermaLink="false">http://inodeblog.com/?p=1472</guid>

					<description><![CDATA[A local library made news in 2010, announcing that it would archive every tweet ever posted.  With Twitter generating 500 million tweets a day, can we really be surprised that it&#8217;s... <a href="https://inodeblog.com/?p=1472">Read more &#187;</a>]]></description>
										<content:encoded><![CDATA[<p>A local library made news in 2010, announcing that it would archive every tweet ever posted.  With Twitter <a href="http://www.internetlivestats.com/twitter-statistics/">generating 500 million tweets</a> a day, can we really be surprised that it&#8217;s <a href="https://www.theatlantic.com/technology/archive/2016/08/can-twitter-fit-inside-the-library-of-congress/494339/">proving to be a challenge</a>?</p>
<p>Of course, that doesn&#8217;t mean there aren&#8217;t a host of smaller services we can build around social media. By way of example, here are three social media services we offer the Mason community. One&#8217;s pretty simple while the other two require a bit more infrastructure.</p>
<p><a style="font-size: 1rem;" href="http://tweet.gmu.edu" target="_blank" rel="noopener"><img loading="lazy" decoding="async" class="alignleft size-medium wp-image-1469" src="https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-14-25-284x300.png" alt="" width="284" height="300" srcset="https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-14-25-284x300.png 284w, https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-14-25-768x811.png 768w, https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-14-25-624x659.png 624w, https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-14-25.png 897w" sizes="(max-width: 284px) 100vw, 284px" /></a></p>
<p>&nbsp;</p>
<p><b>Mason Tweets</b> (<em><a href="http://tweet.gmu.edu">http://tweet.gmu.edu</a></em>)</p>
<p>This curated feed from &#8220;official&#8221; and &#8220;near-official&#8221; Twitter accounts from across the university offers a quick and easy way to take the &#8220;Mason Nation&#8221; pulse.</p>
<p>To produce this service, we created a MasonTweeter account on Twitter to follow Mason-related feeds.  The web presence is simply a <a href="https://dev.twitter.com/web/embedded-timelines">page that embeds the MasonTweeter timeline</a>.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><a href="http://preztweets.gmu.edu" target="_blank" rel="noopener"><img loading="lazy" decoding="async" class="alignright size-medium wp-image-1470" src="https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-15-38-277x300.png" alt="" width="277" height="300" srcset="https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-15-38-277x300.png 277w, https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-15-38-768x833.png 768w, https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-15-38-624x677.png 624w, https://inodeblog.com/wp-content/uploads/2017/03/2017-03-31_09-15-38.png 861w" sizes="(max-width: 277px) 100vw, 277px" /></a></p>
<p><b>Preztweets</b> (<em><a href="http://preztweets.gmu.edu">http://preztweets.gmu.edu</a></em>)</p>
<p>An archive of every tweet from Mason&#8217;s President, Ángel Cabrera.</p>
<p>This service stems from a discussion I had with Dr. Cabrera a few years ago.  At that time, Twitter did not offer users an archive of their tweets (<a href="https://support.twitter.com/articles/20170160">they do now</a>), so we were looking into how we might save his tweets for future university historians.  We settled on a method that offers a searchable database of tweets stored locally in a MySQL database (suitable for future archiving).  Thanks to Andrew M. Whalen <a href="https://github.com/amwhalen/archive-my-tweets">for the code</a> that helped build this LAMP-based archiving service.</p>
<p>&nbsp;</p>
<p><a href="https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08.png"><img loading="lazy" decoding="async" class="alignleft size-medium wp-image-1509" src="https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08-262x300.png" alt="" width="262" height="300" border="2" srcset="https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08-262x300.png 262w, https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08-768x880.png 768w, https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08-893x1024.png 893w, https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08-624x715.png 624w, https://inodeblog.com/wp-content/uploads/2017/03/2017-04-06_10-38-08.png 991w" sizes="(max-width: 262px) 100vw, 262px" /></a></p>
<p><b>Social Feed Manager (SFM)</b> (<em><a href="https://gwu-libraries.github.io/sfm-ui/">https://gwu-libraries.github.io/sfm-ui/</a></em>)</p>
<p>Just the other day, I set up our most ambitious social media service yet: Social Feed Manager.</p>
<p>SFM is a Django application developed by George Washington University Libraries to collect social media data from Twitter. It connects to Twitter&#8217;s approved API to collect data in bulk and makes it possible for scholars, students, and librarians to identify, select, collect, and preserve Twitter data for research purposes. We&#8217;re running SFM in a Docker container (using <a href="https://www.docker.com/docker-mac">Docker for Mac</a>) which simplifies installation and abstracts away much of the underlying complexity.</p>
<p>We have added Social Feed Manager to the suite of data services we offer out of the new <a href="http://dsc.gmu.edu">Digital Scholarship Center</a> we&#8217;ve been shaking down in beta since late January.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
