<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://justin.kelly.au/feed.xml" rel="self" type="application/atom+xml" /><link href="https://justin.kelly.au/" rel="alternate" type="text/html" /><updated>2025-09-03T19:34:15+10:00</updated><id>https://justin.kelly.au/feed.xml</id><title type="html">Justin Kelly</title><subtitle>Justin Kelly</subtitle><entry><title type="html">PowerBI for Librarians 2023</title><link href="https://justin.kelly.au/powerbi-for-librarians-2023/" rel="alternate" type="text/html" title="PowerBI for Librarians 2023" /><published>2023-07-16T00:00:00+10:00</published><updated>2023-07-16T00:00:00+10:00</updated><id>https://justin.kelly.au/powerbi-for-librarians-2023</id><content type="html" xml:base="https://justin.kelly.au/powerbi-for-librarians-2023/"><![CDATA[<p><em>This page details the notes for the VALA 2023 TechCamp PowerBI Workshop</em></p>

<p>All files are available on GitHub at:</p>
<ul>
  <li><a href="https://github.com/justinkelly/vala_2023_techcamp_powerbi">https://github.com/justinkelly/vala_2023_techcamp_powerbi</a></li>
</ul>

<h3 id="introduction">Introduction</h3>

<ul>
  <li>About me</li>
  <li>What is PowerBI</li>
  <li>What the focus is today
    <ul>
      <li>80% will be about visualisation and dashboard creation</li>
    </ul>
  </li>
  <li>Links
    <ul>
      <li>Files that will be used in the workshop: bit.ly/vala23-powerbi-files</li>
      <li>Detailed notes for the workshop: bit.ly/vala23-powerbi-notes</li>
      <li>Slides: bit.ly/vala23-powerbi-slides</li>
    </ul>
  </li>
</ul>

<h3 id="what-is-powerbi-and-why-would-i-use-it">What is PowerBI and why would I use it?</h3>

<p>So we can go from Excel and CSV files, to a system that can produce meaningfull dashboards and reports, can used to to analyse data, create insights and are easily shareable.</p>

<p><img src="/img/2023-vala-dashboard.png" alt="2023-vala-dashboard.png" /></p>

<p><img src="/img/2021-powerbi-1-branch.png" alt="2021-powerbi-1-branch.png" /></p>

<p>Discuss the difference between Excel and PowerBi and why use 1 or the other</p>

<h3 id="why-are-we-doing-this">Why are we doing this?</h3>

<ul>
  <li>Why PowerBI?
    <ul>
      <li>Desktop is free, and its</li>
      <li>Microsoft makes available the professional and enterprise versions of PowerBI very cheaply to public institutions - meaning most universities will already have PowerBI available for library staff to use - or at least request access to</li>
    </ul>
  </li>
  <li>Why visualisation?
    <ul>
      <li>Librarians know the stories of how your libraries are operating and evolving</li>
      <li>But if we can’t present data to support our stories and feelings, we can’t have useful discussions with senior management etc. about the future of your libraries</li>
      <li>Give you tools and knowledge so you can have a positive impact for your libraries</li>
    </ul>
  </li>
  <li>Access to data is the main problem</li>
</ul>

<h3 id="links">Links</h3>

<ul>
  <li>This document: <a href="https://justin.kelly.org.au/powerbi-for-librarians-2023/">https://justin.kelly.org.au/powerbi-for-librarians-2023/</a></li>
  <li>Presentation: <a href="https://bit.ly/vala23-powerbi-slides">https://bit.ly/vala23-powerbi-slides</a></li>
  <li>Files
    <ul>
      <li><a href="https://github.com/justinkelly/vala_2023_techcamp_powerbi">https://github.com/justinkelly/vala_2023_techcamp_powerbi</a></li>
    </ul>
  </li>
</ul>

<p><strong>What we are going to do:</strong></p>

<p>Turn an excel/csv file into a maintainable, share-able and accurate dashboard, that allows you to answer complex queries simply using Power BI.</p>

<p>We’ll use the loans data that data.gov.uk has made available. In this case, loans for 98 Northern Ireland public libraries</p>

<ul>
  <li><a href="https://data.world/datagov-uk/47ee1f4e-861b-46d0-85ab-33be2cf6bc17">https://data.world/datagov-uk/47ee1f4e-861b-46d0-85ab-33be2cf6bc17</a></li>
</ul>

<h2 id="section-1">Section 1</h2>

<p><em>What is PowerBI, intro to its interface, and import a CSV</em></p>

<p><img src="/img/2021-powerbi2-1-home.png" alt="2021-powerbi2-1-home.png" /></p>

<h3 id="import-the-data">Import the data</h3>

<p>Import a CSV file using the <code class="language-plaintext highlighter-rouge">Text/CSV</code> option from the <code class="language-plaintext highlighter-rouge">Get Data</code> menu</p>

<p><img src="/img/2023-vala-import-csv-menu.png" alt="2023-vala-import-csv-menu.png" /></p>

<p>Import the Loans file <code class="language-plaintext highlighter-rouge">libraries-ni-annual-loans-summary-2021-2022.csv</code></p>

<p><img src="/img/2023-vala-import-csv.png" alt="2023-vala-import-csv.png" /></p>

<p>Review contents and select <code class="language-plaintext highlighter-rouge">Load</code></p>

<p>This will load into PowerBI the contents of the <code class="language-plaintext highlighter-rouge">libraries-ni-annual-loans-summary-2021-2022.csv</code> file, and will create a table in PowerBI called <code class="language-plaintext highlighter-rouge">libraries-ni-annual-loans-summary-2021-2022</code></p>

<p><img src="/img/2023-vala-import-csv-load.png" alt="2023-vala-import-csv-load.png" /></p>

<p>Save the file</p>

<p>If you’ve managed to get this far - congratulations - you’ve just completed one of the hardest probably - just getting the data and imported into PowerBI</p>

<h2 id="section-2">Section 2</h2>
<p><em>Create some reports</em></p>

<p>Go into <code class="language-plaintext highlighter-rouge">Report</code> view</p>

<p><img src="/img/2021-powerbi-14-section1-9.png" alt="2021-powerbi-14-section1-9.png" /></p>

<p>And now we’ll cover the most use PowerBI visualisations</p>

<p><img src="/img/2023-power-bi-visualizations.png" alt="2023-power-bi-visualizations.png" /></p>

<p>Click on the first visualisation in the top left of the <code class="language-plaintext highlighter-rouge">Visualizations</code> pane. This will add a blank <code class="language-plaintext highlighter-rouge">stacked bar chart</code> to the page</p>

<p><img src="/img/2023-vala-blank.png" alt="2023-vala-blank.png" /></p>

<p>Now click some data elements from the <code class="language-plaintext highlighter-rouge">Data</code> pane, such as <code class="language-plaintext highlighter-rouge">Item Category</code>, <code class="language-plaintext highlighter-rouge">Total</code>, and <code class="language-plaintext highlighter-rouge">Year</code>.</p>

<p>PowerBI will automatically place the different elements into the <code class="language-plaintext highlighter-rouge">Y-axis</code>, <code class="language-plaintext highlighter-rouge">X-axis</code> and <code class="language-plaintext highlighter-rouge">Legend</code> section of the <code class="language-plaintext highlighter-rouge">Visualizations</code> pane.</p>

<p>Try moving the different data elements into the different axis and legend sections to test the results.</p>

<p>Once you have data displaying in your visual, go through each of the visualisation options and adjust the data elements to experiment.</p>

<p>Stacked bar chart
<img src="/img/2023-vala-stacked-bar-chart.png" alt="2023-vala-stacked-bar-chart.png" /></p>

<p>Stacked column chart
<img src="/img/2023-vala-stacked-column-chart.png" alt="2023-vala-stacked-column-chart.png" /></p>

<p>Clustered bar chart
<img src="/img/2023-vala-clustered-bar-chart.png" alt="2023-vala-clustered-bar-chart.png" /></p>

<p>Clustered column chart
<img src="/img/2023-vala-clustered-column-chart.png" alt="2023-vala-clustered-column-chart.png" /></p>

<p>100% Stacked bar chart
<img src="/img/2023-vala-100-percent-stacked-bar-chart.png" alt="2023-vala-100-percent-stacked-bar-chart.png" /></p>

<p>100% Stacked column chart
<img src="/img/2023-vala-100-percent-stacked-column-chart.png" alt="2023-vala-100-percent-stacked-column-chart.png" /></p>

<p>Line chart
<img src="/img/2023-vala-line-chart.png" alt="2023-vala-line-chart.png" /></p>

<p>Pie charts
<img src="/img/2023-vala-pie.png" alt="2023-vala-pie.png" /></p>

<p>Donut charts
<img src="/img/2023-vala-donut.png" alt="2023-vala-donut.png" /></p>

<p>Ribbon
<img src="/img/2023-vala-ribbon.png" alt="2023-vala-ribbon.png" /></p>

<p>Treemap
<img src="/img/2023-vala-treemap.png" alt="2023-vala-treemap.png" /></p>

<p>Waterfall
<img src="/img/2023-vala-waterfall.png" alt="2023-vala-waterfall.png" /></p>

<p>Decomposition Tree
<img src="/img/2023-vala-decomposition-tree.png" alt="2023-vala-decomposition-tree.png" /></p>

<p>Card
<img src="/img/2023-vala-card.png" alt="2023-vala-card.png" /></p>

<p>Multi card
<img src="/img/2023-vala-multi-card-row.png" alt="2023-vala-multi-card-row.png" /></p>

<p>Save the file</p>

<h2 id="section-3">Section 3</h2>
<p><em>Dashboard creation</em></p>

<p>We have covered the most used visualisations in PowerBI, now we’ll start to use them to create dashboards.</p>

<p>Dashboards are pages in PowerBI that can contain 1 or more visualisation - such as in the image below</p>

<p><img src="/img/2023-vala-dashboard-filter-cat.png" alt="2023-vala-dashboard-filter-cat.png" /></p>

<p>To make a new dashboard, click on the + icon at the bottom of the page - refer image below</p>

<p><img src="/img/2023-vala-dashboard-new.png" alt="2023-vala-dashboard-new.png" /></p>

<p>And we’ll bring in 3 different visuals into the 1 page</p>

<p>For the first visual, click the <code class="language-plaintext highlighter-rouge">Stacked column chart</code> icon and put <code class="language-plaintext highlighter-rouge">Year</code> in the <code class="language-plaintext highlighter-rouge">X-axis</code> and <code class="language-plaintext highlighter-rouge">Total</code> in the <code class="language-plaintext highlighter-rouge">Y-axis</code>.</p>

<p><img src="/img/2023-vala-dashboard-visual-1.png" alt="2023-vala-dashboard-visual-1.png" /></p>

<p>Next, click the <code class="language-plaintext highlighter-rouge">Stacked bar chart</code> icon and drag it below (or anywhere you prefer) the first visual and put <code class="language-plaintext highlighter-rouge">Static_Library_Name</code> in the <code class="language-plaintext highlighter-rouge">Y-axis</code>, <code class="language-plaintext highlighter-rouge">Total</code> in the <code class="language-plaintext highlighter-rouge">Y-axis</code>, and <code class="language-plaintext highlighter-rouge">Static_Library_Name</code> in the <code class="language-plaintext highlighter-rouge">Legend</code>.</p>

<p><img src="/img/2023-vala-dashboard-visual-2.png" alt="2023-vala-dashboard-visual-2.png" /></p>

<p>Next, click the <code class="language-plaintext highlighter-rouge">Stacked bar chart</code> icon again and drag it to the left (or anywhere you prefer) of the first visual and expand the height to the full length of the dashboard. Put <code class="language-plaintext highlighter-rouge">Item_Category</code> in the <code class="language-plaintext highlighter-rouge">Y-axis</code>, <code class="language-plaintext highlighter-rouge">Total</code> in the <code class="language-plaintext highlighter-rouge">Y-axis</code>, and <code class="language-plaintext highlighter-rouge">Item_Category</code> in the <code class="language-plaintext highlighter-rouge">Legend</code>.</p>

<p><img src="/img/2023-vala-dashboard-visual-3.png" alt="2023-vala-dashboard-visual-3.png" /></p>

<p>Done, you’ve created your first dashboard!</p>

<h2 id="section-4">Section 4</h2>
<p><em>Make it pretty and add filters and slicers</em></p>

<p>Now we have the base dashboard created, the next step is to format the dashboard to improve the usability and visuals</p>

<p><strong>Change the title of a visualisation:</strong></p>

<ul>
  <li>Click the visualisation</li>
  <li>In the <code class="language-plaintext highlighter-rouge">Visualizations</code> pane, select the <code class="language-plaintext highlighter-rouge">format</code> section</li>
  <li>Open the <code class="language-plaintext highlighter-rouge">Title</code> section and change the <code class="language-plaintext highlighter-rouge">Text</code> to the desired title you want</li>
</ul>

<p><img src="/img/2023-vala-step1-title.png" alt="2023-vala-step1-title.png" /></p>

<p><strong>Filter a visualisation to top 5 items:</strong></p>

<ul>
  <li>Click on the visualisation</li>
  <li>In the <code class="language-plaintext highlighter-rouge">Fitlers</code> pane select <code class="language-plaintext highlighter-rouge">Filters on this visual</code></li>
  <li>Change <code class="language-plaintext highlighter-rouge">Filter type</code> to <code class="language-plaintext highlighter-rouge">Top N</code></li>
  <li>Enter <code class="language-plaintext highlighter-rouge">5</code> in the <code class="language-plaintext highlighter-rouge">Show items</code> section</li>
  <li>In <code class="language-plaintext highlighter-rouge">By value</code> drag in <code class="language-plaintext highlighter-rouge">Total</code> from the data pane</li>
</ul>

<p><img src="/img/2023-vala-step1-filter-top5.png" alt="2023-vala-step1-filter-top5.png" /></p>

<p><strong>Add your logo to the top of the page</strong></p>

<p>Download the logo if you don’t have it from <a href="https://github.com/justinkelly/vala_2023_techcamp_powerbi/blob/main/vala-logo.png">https://github.com/justinkelly/vala_2023_techcamp_powerbi/blob/main/vala-logo.png</a></p>

<p><img src="/img/vala-logo-.png" alt="vala-logo-.png" /></p>

<p>And from the <code class="language-plaintext highlighter-rouge">Insert</code> menu, select <code class="language-plaintext highlighter-rouge">Image</code> refer image below
<img src="/img/2023-vala-insert-image.png" alt="2023-vala-insert-image.png" /></p>

<p>Select your logo, then resize and drag it to your preferred location</p>

<p><strong>Change the background of the dashboard</strong></p>

<p>Click on an empty space in your dashboard, select the <code class="language-plaintext highlighter-rouge">Visualizations</code> pane, then the <code class="language-plaintext highlighter-rouge">Format page</code> tab, then expand the <code class="language-plaintext highlighter-rouge">Wallpaper</code> section. Here you can select a colour, or image, and the dashboard background.</p>

<p><img src="/img/2023-vala-insert-wallpaper.png" alt="2023-vala-insert-wallpaper.png" /></p>

<p><strong>Add rounded borders to the reports</strong></p>

<p>Click on a visualisation in your dashboard, then click <code class="language-plaintext highlighter-rouge">Format visual</code> from the <code class="language-plaintext highlighter-rouge">Visualizations</code> pane. Next, select the <code class="language-plaintext highlighter-rouge">General</code> tab, and expand the <code class="language-plaintext highlighter-rouge">Effects</code> section. Here, expand <code class="language-plaintext highlighter-rouge">Visual board</code> drag the slider below <code class="language-plaintext highlighter-rouge">Rounded corners</code> to adjust the size of the visuals rounded borders.</p>

<p><img src="/img/2023-vala-insert-rounded-borders.png" alt="2023-vala-insert-rounded-borders.png" /></p>

<p><strong>Add slicers</strong></p>

<p><img src="/img/2023-vala-slicer.png" alt="2023-vala-slicer.png" /></p>

<p>Slicers are very similar to filters, except they have a much easier to use and look much better on your dashboard.</p>

<p>To add a slicer to your dashboard, click on the <code class="language-plaintext highlighter-rouge">Slicer</code> icon in the visualisations list, then drag the data element you want to filter all reports on into the <code class="language-plaintext highlighter-rouge">Field</code> section. In our example, we’ll drag <code class="language-plaintext highlighter-rouge">Year</code> into the <code class="language-plaintext highlighter-rouge">Field</code> section.</p>

<p><img src="/img/2023-vala-insert-slicer.png" alt="2023-vala-insert-slicer.png" /></p>

<p><strong>Bring it all together</strong></p>

<p>Once these formatting updates have been done, the dashboard should be easier to understand, nicer looking, and easier to use.</p>

<p><img src="/img/2023-vala-dashboard.png" alt="2023-vala-dashboard.png" /></p>

<h2 id="section-5">Section 5</h2>
<p><em>Data manipulation</em></p>

<p>Review the data and notice the pattern in the Item Category names. How can we make all these categories simpler and easier to read?</p>

<p>To automatically group the item categories into groups of similar items, we can take the first word from each item category to create.</p>

<p>Such as group <code class="language-plaintext highlighter-rouge">Junior Fiction</code>, <code class="language-plaintext highlighter-rouge">Junior Fiction Spoken Word</code> and <code class="language-plaintext highlighter-rouge">Junior Non-Ficton</code> under a parent category of <code class="language-plaintext highlighter-rouge">Junior</code>.</p>

<p>To do this, from the <code class="language-plaintext highlighter-rouge">Home</code> menu, select <code class="language-plaintext highlighter-rouge">Transform data</code> and then <code class="language-plaintext highlighter-rouge">Transform data</code> from the sub-menu.</p>

<p><img src="/img/2023-vala-transform.png" alt="2023-vala-transform.png" /></p>

<p>Right click on <code class="language-plaintext highlighter-rouge">Item_Category</code> and select <code class="language-plaintext highlighter-rouge">Duplicate Column</code>. This will enable us to edit a copy of this column to create the new parent group.</p>

<p><img src="/img/2023-vala-duplicate-column.png" alt="2023-vala-duplicate-column.png" /></p>

<p>To grad just the first word of the item category, select <code class="language-plaintext highlighter-rouge">Split Column</code> from the menu, and then <code class="language-plaintext highlighter-rouge">By Delimiter</code></p>

<p><img src="/img/2023-vala-split-delim.png" alt="2023-vala-split-delim.png" /></p>

<p>Choose <code class="language-plaintext highlighter-rouge">Space</code> as the delimiter and select <code class="language-plaintext highlighter-rouge">Left-most delimiter</code> in the <code class="language-plaintext highlighter-rouge">Split at</code> section, then click OK</p>

<p><img src="/img/2023-vala-split-delim-space.png" alt="2023-vala-split-delim-space.png" /></p>

<p>This will create a new column which has only the first work of then Item Category. To clean up the data, delete the duplicate item category which has the full item category</p>

<p><img src="/img/2023-vala-split-delim-space-remove.png" alt="2023-vala-split-delim-space-remove.png" /></p>

<p>The result will be a new field with a parent category automatically created, with the original field - now as a child category</p>

<p><img src="/img/2023-vala-split-delim-space-save.png" alt="2023-vala-split-delim-space-save.png" /></p>

<p>Then Close &amp; Apply, and Save the file.</p>

<p><img src="/img/2021-powerbi-11-section1-6.png" alt="2021-powerbi-11-section1-6.png" /></p>

<h2 id="section-6">Section 6</h2>
<p><em>Final changes</em></p>

<p>To use the new Parent Item Category - review the visual in the dashboard and adjust the <code class="language-plaintext highlighter-rouge">Item Categories</code> visual to use the new parent item category.</p>

<p><img src="/img/2023-vala-dashboard-parent.png" alt="2023-vala-dashboard-parent.png" /></p>

<p>Then test adding item category and the parent item category into the same visual to further analyse the results.</p>

<p><img src="/img/2023-vala-dashboard-parent-item-visual.png" alt="2023-vala-dashboard-parent-item-visual.png" /></p>

<p><img src="/img/2023-vala-dashboard-parent-item.png" alt="2023-vala-dashboard-parent-item.png" /></p>

<p>Congratulations - You have finished creating and customising a dashboard in PowerBI!!</p>

<h2 id="section-7">Section 7</h2>
<p><em>Extra</em></p>

<p>If you want to test yourself further, try and import the <code class="language-plaintext highlighter-rouge">emily-bronte-records-british-library.csv</code> file from  <a href="https://github.com/justinkelly/vala_2023_techcamp_powerbi/blob/main/emily-bronte-records-british-library.csv">https://github.com/justinkelly/vala_2023_techcamp_powerbi/blob/main/emily-bronte-records-british-library.csv</a> into PowerBI and see if you can figure out how to count the distinct number of ISBNs in this dataset.</p>

<p>Noting that there can be multiple ISBNs in a single field</p>

<p>Refer below for steps on how to do this</p>

<p><img src="/img/2023-vala-emily-isbn.png" alt="2023-vala-emily-isbn.png" /></p>

<p><img src="/img/2023-vala-emily-split.png" alt="2023-vala-emily-split.png" /></p>

<p><img src="/img/2023-vala-emily-count.png" alt="2023-vala-emily-count.png" /></p>]]></content><author><name></name></author><category term="vala," /><category term="powerbi" /><summary type="html"><![CDATA[This page details the notes for the VALA 2023 TechCamp PowerBI Workshop All files are available on GitHub at: https://github.com/justinkelly/vala_2023_techcamp_powerbi Introduction About me What is PowerBI What the focus is today 80% will be about visualisation and dashboard creation Links Files that will be used in the workshop: bit.ly/vala23-powerbi-files Detailed notes for the workshop: bit.ly/vala23-powerbi-notes Slides: bit.ly/vala23-powerbi-slides What is PowerBI and why would I use it? So we can go from Excel and CSV files, to a system that can produce meaningfull dashboards and reports, can used to to analyse data, create insights and are easily shareable. Discuss the difference between Excel and PowerBi and why use 1 or the other Why are we doing this? Why PowerBI? Desktop is free, and its Microsoft makes available the professional and enterprise versions of PowerBI very cheaply to public institutions - meaning most universities will already have PowerBI available for library staff to use - or at least request access to Why visualisation? Librarians know the stories of how your libraries are operating and evolving But if we can’t present data to support our stories and feelings, we can’t have useful discussions with senior management etc. about the future of your libraries Give you tools and knowledge so you can have a positive impact for your libraries Access to data is the main problem Links This document: https://justin.kelly.org.au/powerbi-for-librarians-2023/ Presentation: https://bit.ly/vala23-powerbi-slides Files https://github.com/justinkelly/vala_2023_techcamp_powerbi What we are going to do: Turn an excel/csv file into a maintainable, share-able and accurate dashboard, that allows you to answer complex queries simply using Power BI. We’ll use the loans data that data.gov.uk has made available. In this case, loans for 98 Northern Ireland public libraries https://data.world/datagov-uk/47ee1f4e-861b-46d0-85ab-33be2cf6bc17 Section 1 What is PowerBI, intro to its interface, and import a CSV Import the data Import a CSV file using the Text/CSV option from the Get Data menu Import the Loans file libraries-ni-annual-loans-summary-2021-2022.csv Review contents and select Load This will load into PowerBI the contents of the libraries-ni-annual-loans-summary-2021-2022.csv file, and will create a table in PowerBI called libraries-ni-annual-loans-summary-2021-2022 Save the file If you’ve managed to get this far - congratulations - you’ve just completed one of the hardest probably - just getting the data and imported into PowerBI Section 2 Create some reports Go into Report view And now we’ll cover the most use PowerBI visualisations Click on the first visualisation in the top left of the Visualizations pane. This will add a blank stacked bar chart to the page Now click some data elements from the Data pane, such as Item Category, Total, and Year. PowerBI will automatically place the different elements into the Y-axis, X-axis and Legend section of the Visualizations pane. Try moving the different data elements into the different axis and legend sections to test the results. Once you have data displaying in your visual, go through each of the visualisation options and adjust the data elements to experiment. Stacked bar chart Stacked column chart Clustered bar chart Clustered column chart 100% Stacked bar chart 100% Stacked column chart Line chart Pie charts Donut charts Ribbon Treemap Waterfall Decomposition Tree Card Multi card Save the file Section 3 Dashboard creation We have covered the most used visualisations in PowerBI, now we’ll start to use them to create dashboards. Dashboards are pages in PowerBI that can contain 1 or more visualisation - such as in the image below To make a new dashboard, click on the + icon at the bottom of the page - refer image below And we’ll bring in 3 different visuals into the 1 page For the first visual, click the Stacked column chart icon and put Year in the X-axis and Total in the Y-axis. Next, click the Stacked bar chart icon and drag it below (or anywhere you prefer) the first visual and put Static_Library_Name in the Y-axis, Total in the Y-axis, and Static_Library_Name in the Legend. Next, click the Stacked bar chart icon again and drag it to the left (or anywhere you prefer) of the first visual and expand the height to the full length of the dashboard. Put Item_Category in the Y-axis, Total in the Y-axis, and Item_Category in the Legend. Done, you’ve created your first dashboard! Section 4 Make it pretty and add filters and slicers Now we have the base dashboard created, the next step is to format the dashboard to improve the usability and visuals Change the title of a visualisation: Click the visualisation In the Visualizations pane, select the format section Open the Title section and change the Text to the desired title you want Filter a visualisation to top 5 items: Click on the visualisation In the Fitlers pane select Filters on this visual Change Filter type to Top N Enter 5 in the Show items section In By value drag in Total from the data pane Add your logo to the top of the page Download the logo if you don’t have it from https://github.com/justinkelly/vala_2023_techcamp_powerbi/blob/main/vala-logo.png And from the Insert menu, select Image refer image below Select your logo, then resize and drag it to your preferred location Change the background of the dashboard Click on an empty space in your dashboard, select the Visualizations pane, then the Format page tab, then expand the Wallpaper section. Here you can select a colour, or image, and the dashboard background. Add rounded borders to the reports Click on a visualisation in your dashboard, then click Format visual from the Visualizations pane. Next, select the General tab, and expand the Effects section. Here, expand Visual board drag the slider below Rounded corners to adjust the size of the visuals rounded borders. Add slicers Slicers are very similar to filters, except they have a much easier to use and look much better on your dashboard. To add a slicer to your dashboard, click on the Slicer icon in the visualisations list, then drag the data element you want to filter all reports on into the Field section. In our example, we’ll drag Year into the Field section. Bring it all together Once these formatting updates have been done, the dashboard should be easier to understand, nicer looking, and easier to use. Section 5 Data manipulation Review the data and notice the pattern in the Item Category names. How can we make all these categories simpler and easier to read? To automatically group the item categories into groups of similar items, we can take the first word from each item category to create. Such as group Junior Fiction, Junior Fiction Spoken Word and Junior Non-Ficton under a parent category of Junior. To do this, from the Home menu, select Transform data and then Transform data from the sub-menu. Right click on Item_Category and select Duplicate Column. This will enable us to edit a copy of this column to create the new parent group. To grad just the first word of the item category, select Split Column from the menu, and then By Delimiter Choose Space as the delimiter and select Left-most delimiter in the Split at section, then click OK This will create a new column which has only the first work of then Item Category. To clean up the data, delete the duplicate item category which has the full item category The result will be a new field with a parent category automatically created, with the original field - now as a child category Then Close &amp; Apply, and Save the file. Section 6 Final changes To use the new Parent Item Category - review the visual in the dashboard and adjust the Item Categories visual to use the new parent item category. Then test adding item category and the parent item category into the same visual to further analyse the results. Congratulations - You have finished creating and customising a dashboard in PowerBI!! Section 7 Extra If you want to test yourself further, try and import the emily-bronte-records-british-library.csv file from https://github.com/justinkelly/vala_2023_techcamp_powerbi/blob/main/emily-bronte-records-british-library.csv into PowerBI and see if you can figure out how to count the distinct number of ISBNs in this dataset. Noting that there can be multiple ISBNs in a single field Refer below for steps on how to do this]]></summary></entry><entry><title type="html">Transcribing &amp;amp; enriching WW1 handwritten diaries from the State Library of Victoria with AI</title><link href="https://justin.kelly.au/transcribing-enriching-ww1-handwritten-diaries-from-the-state-library-of-victoria-with-ai/" rel="alternate" type="text/html" title="Transcribing &amp;amp; enriching WW1 handwritten diaries from the State Library of Victoria with AI" /><published>2021-11-04T00:00:00+11:00</published><updated>2021-11-04T00:00:00+11:00</updated><id>https://justin.kelly.au/transcribing-enriching-ww1-handwritten-diaries-from-the-state-library-of-victoria-with-ai</id><content type="html" xml:base="https://justin.kelly.au/transcribing-enriching-ww1-handwritten-diaries-from-the-state-library-of-victoria-with-ai/"><![CDATA[<h2 id="summary">Summary</h2>

<ul>
  <li>For legible pages from World War 1 handwritten diaries held at the State Library of Victoria, AI services are able to correctly transcribe them at an level between 10% to 49% accuracy.</li>
  <li>For pages with a low level of accuracy, sections transcribed correctly add to the narrative of the diaries and further our understanding of the authors experience.</li>
  <li>Furthermore, even with a low level of accuracy, text analysis AI is able to enrich the transcription with very useful geographic location and entity recognition information.</li>
  <li>At a little over 1 cent (AUD) per page, AI based transcription and enrichment is a very cost efficient way to improve the accessibility of a library’s digital collection.</li>
</ul>

<h2 id="abstract">Abstract</h2>

<p>Investigate the cost and accuracy of using AI to automatically transcribe and enrich handwritten World War 1 diaries</p>

<h2 id="background">Background</h2>

<p>In 2019, I received the <a href="https://www.slv.vic.gov.au/interact-with-us/fellowships/digital-fellowship/2019-fellow-justin-kelly">SLV Digital Fellowship</a> to investigate computer vision and related AI technologists to automatically enrich the metadata for SLV digital records.</p>

<p>The result of this fellowship is <a href="http://biblio.ai/">Biblio.ai</a> - an open source web service that connects to libraries digital repositories, extracts the available digital objects and uses AI and 
computer vision to automatically enhance the metadata of the records - with the primary focus on accessibility.</p>

<h2 id="introduction">Introduction</h2>

<p>State Library of Victoria has a collection of scanned handwritten diaries from Australian soldiers in World War 1. These records all have detailed bibliographic summaries - but none have been transcribed. In this blog post we’ll detail 3 selected pages from the 80 page diary and review and score the AI transcribe and enrichment.</p>

<p>Thanks to <a href="https://www.linkedin.com/in/shona-dewar-47b65074">Shona Dewar</a> from SLV for sending through a selection of diaries! In this blog post the diary to be analysed is of Private Edgar Atheling Davis, who served in two periods between 1915 and 1919. Available online at <a href="http://handle.slv.vic.gov.au/10381/311907">http://handle.slv.vic.gov.au/10381/311907</a></p>

<h2 id="first-selected-page---file-25-of-83">First selected page - file 25 of 83</h2>

<p><img src="/img/2021-11-file-1.png" alt="2021-11-file-1.png" /></p>

<p>Source record:</p>

<p><a href="https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637141">https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637141</a></p>

<p>AI results:</p>

<p><a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637141">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637141</a></p>

<p>Text transcription component of AI results:</p>

<p><img src="/img/2021-11-file-1-text.png" alt="2021-11-file-1-text.png" /></p>

<p><strong>Score</strong></p>

<ul>
  <li>Words correct: 8</li>
  <li>Words total: 80</li>
  <li>Percent correct: 10%</li>
</ul>

<p><strong>Summary</strong></p>

<p>Even with the low level of accuracy - being able to transcribe the key section <code class="language-plaintext highlighter-rouge">occasional artillery duels</code> adds to the narrative of the document. The focus of <a href="http://Biblio.ai">Biblio.ai</a> is on accessibility - any untranscribed documents are 100% inaccessible to the visually impaired - so I count any correct automated transcription a positive for users with accessibility limitations.</p>

<h2 id="2nd-selected-page---file-38-of-83">2nd selected page - file 38 of 83</h2>

<p><img src="/img/2021-11-file-2.png" alt="2021-11-file-2.png" /></p>

<p>Source image:</p>

<p><a href="https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637103">https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637103</a></p>

<p>AI results:</p>

<p><a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637103">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637103</a></p>

<p>Text transcription component of AI results:</p>

<p><img src="/img/2021-11-file-2-text.png" alt="2021-11-file-2-text.png" /></p>

<p><strong>Score:</strong></p>

<ul>
  <li>Words correct: 11</li>
  <li>Words total: 75</li>
  <li>Percent correct: 15%</li>
</ul>

<p><strong>Enrichment</strong></p>

<p>On this page there was a number of references to what looks like locations</p>

<p><img src="/img/2021-11-file-2-locations.png" alt="2021-11-file-2-locations.png" /></p>

<p><code class="language-plaintext highlighter-rouge">Charleroi</code> (highlighted in red above) was automatically identified as a geographic location - while <code class="language-plaintext highlighter-rouge">La Harve</code> was identified just from the text <code class="language-plaintext highlighter-rouge">Harve</code> (highlighted in blue above) in the document</p>

<p>AI text analysis - locations</p>

<p><img src="/img/2021-11-file-2-locations-ai.png" alt="2021-11-file-2-locations-ai.png" /></p>

<p>What about the text highlighted in yellow? This was transcribed as <code class="language-plaintext highlighter-rouge">Havefleur</code> - which is not a known location.</p>

<p>With the automatically identified location information and some analysis on Google Maps we can try and find <code class="language-plaintext highlighter-rouge">Havefleur</code>  and other possible locations mentioned in the diary page.</p>

<p>What was transcribed as <code class="language-plaintext highlighter-rouge">Havefleur</code> is most likely the French town of <code class="language-plaintext highlighter-rouge">Harfleur</code></p>

<p><img src="/img/2021-11-file-2-locations-map.png" alt="2021-11-file-2-locations-map.png" /></p>

<p><strong>Summary</strong></p>

<p>Text analysis AI with automatic geographic location recognition proves crucial in furthering our understanding of the document. Even when only a small percent of words are transcribed correctly the extra information AI can add brings improved levels of comprehension with the contents of the document.</p>

<p>We must also remembers not only visually impaired users, but users from non English speaking backgrounds and younger learners - who may not be familiar with historical cursive script and European geography. Enabling a document to be enriched with information such as Google Map links for mentioned locations improved the user experience and bring a new depth to the digital image.</p>

<h2 id="3rd-selected-page---file-22-of-83">3rd selected page - file 22 of 83</h2>

<p><img src="/img/2021-11-file-3.png" alt="2021-11-file-3.png" /></p>

<p>Source image: <a href="https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637169">https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637169</a></p>

<p>AI results: <a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637169">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637169</a></p>

<p>Text transcription component of AI results:</p>

<p><img src="/img/2021-11-file-3-text.png" alt="2021-11-file-3-text.png" /></p>

<p><strong>Score</strong></p>

<ul>
  <li>Words correct: 41</li>
  <li>Words total: 84</li>
  <li>Percent correct: 49%</li>
</ul>

<p><strong>Summary</strong></p>

<p>While this page had a high level of accuracy of 49% - for me the most important piece of  information of automatic identification of <code class="language-plaintext highlighter-rouge">Salisbury</code> as a geographic location. With that information, we can make educated guessed at the incorrectly transcribed word - such as <code class="language-plaintext highlighter-rouge">White house</code> refers the the ancient White horse symbol in Salisbury and <code class="language-plaintext highlighter-rouge">bathedral</code> is actually refers to the Salisbury cathedral.</p>

<h2 id="cost">Cost</h2>

<p>Using <a href="http://Biblio.ai">Biblio.ai</a> and the underlying MS Azure services the cost to automatically transribe and enrich the entire 83 pages of Private Edgar Atheling Davis WW1 diary is 86 cents AUD. A little over 1 cent AUD per page.</p>

<p><img src="/img/2021-11-cost.png" alt="2021-11-cost.png" /></p>

<h2 id="conclusion">Conclusion</h2>

<p>With the focus on improving the accessiblity of a libraries digital collection and enriching metdata automatically (and not on 100% perfect word-for-word transcription) -  AI services enabled by Biblio.AI can play an important role in improving the library experience - especially for members with visaully impairments, young learners and people from non-Enlighs speaking backgrounds.</p>

<p>The diary analysised in this blog post, most likely would never be manually transcribed. Manual transcription of historical documents such as this diary is a very time consuiming and expensive process - and normally limited to historical important documents - or works by important public figures.</p>

<p>By using AI services like Biblio.AI, a library can automatically transcribe (at less than perfect levels of accuracy) their entire digital collection for just over 1 cent AUD per page. From the examples provided above I hope to present the clear picture that when focused on acessibility - any automated transcription and enrichment - no matter the level of accuracy - is better than none.</p>]]></content><author><name></name></author><category term="biblo.ai," /><category term="slv," /><category term="AI" /><summary type="html"><![CDATA[Summary For legible pages from World War 1 handwritten diaries held at the State Library of Victoria, AI services are able to correctly transcribe them at an level between 10% to 49% accuracy. For pages with a low level of accuracy, sections transcribed correctly add to the narrative of the diaries and further our understanding of the authors experience. Furthermore, even with a low level of accuracy, text analysis AI is able to enrich the transcription with very useful geographic location and entity recognition information. At a little over 1 cent (AUD) per page, AI based transcription and enrichment is a very cost efficient way to improve the accessibility of a library’s digital collection. Abstract Investigate the cost and accuracy of using AI to automatically transcribe and enrich handwritten World War 1 diaries Background In 2019, I received the SLV Digital Fellowship to investigate computer vision and related AI technologists to automatically enrich the metadata for SLV digital records. The result of this fellowship is Biblio.ai - an open source web service that connects to libraries digital repositories, extracts the available digital objects and uses AI and computer vision to automatically enhance the metadata of the records - with the primary focus on accessibility. Introduction State Library of Victoria has a collection of scanned handwritten diaries from Australian soldiers in World War 1. These records all have detailed bibliographic summaries - but none have been transcribed. In this blog post we’ll detail 3 selected pages from the 80 page diary and review and score the AI transcribe and enrichment. Thanks to Shona Dewar from SLV for sending through a selection of diaries! In this blog post the diary to be analysed is of Private Edgar Atheling Davis, who served in two periods between 1915 and 1919. Available online at http://handle.slv.vic.gov.au/10381/311907 First selected page - file 25 of 83 Source record: https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637141 AI results: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637141 Text transcription component of AI results: Score Words correct: 8 Words total: 80 Percent correct: 10% Summary Even with the low level of accuracy - being able to transcribe the key section occasional artillery duels adds to the narrative of the document. The focus of Biblio.ai is on accessibility - any untranscribed documents are 100% inaccessible to the visually impaired - so I count any correct automated transcription a positive for users with accessibility limitations. 2nd selected page - file 38 of 83 Source image: https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637103 AI results: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637103 Text transcription component of AI results: Score: Words correct: 11 Words total: 75 Percent correct: 15% Enrichment On this page there was a number of references to what looks like locations Charleroi (highlighted in red above) was automatically identified as a geographic location - while La Harve was identified just from the text Harve (highlighted in blue above) in the document AI text analysis - locations What about the text highlighted in yellow? This was transcribed as Havefleur - which is not a known location. With the automatically identified location information and some analysis on Google Maps we can try and find Havefleur and other possible locations mentioned in the diary page. What was transcribed as Havefleur is most likely the French town of Harfleur Summary Text analysis AI with automatic geographic location recognition proves crucial in furthering our understanding of the document. Even when only a small percent of words are transcribed correctly the extra information AI can add brings improved levels of comprehension with the contents of the document. We must also remembers not only visually impaired users, but users from non English speaking backgrounds and younger learners - who may not be familiar with historical cursive script and European geography. Enabling a document to be enriched with information such as Google Map links for mentioned locations improved the user experience and bring a new depth to the digital image. 3rd selected page - file 22 of 83 Source image: https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=FL19637169 AI results: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL19637169 Text transcription component of AI results: Score Words correct: 41 Words total: 84 Percent correct: 49% Summary While this page had a high level of accuracy of 49% - for me the most important piece of information of automatic identification of Salisbury as a geographic location. With that information, we can make educated guessed at the incorrectly transcribed word - such as White house refers the the ancient White horse symbol in Salisbury and bathedral is actually refers to the Salisbury cathedral. Cost Using Biblio.ai and the underlying MS Azure services the cost to automatically transribe and enrich the entire 83 pages of Private Edgar Atheling Davis WW1 diary is 86 cents AUD. A little over 1 cent AUD per page. Conclusion With the focus on improving the accessiblity of a libraries digital collection and enriching metdata automatically (and not on 100% perfect word-for-word transcription) - AI services enabled by Biblio.AI can play an important role in improving the library experience - especially for members with visaully impairments, young learners and people from non-Enlighs speaking backgrounds. The diary analysised in this blog post, most likely would never be manually transcribed. Manual transcription of historical documents such as this diary is a very time consuiming and expensive process - and normally limited to historical important documents - or works by important public figures. By using AI services like Biblio.AI, a library can automatically transcribe (at less than perfect levels of accuracy) their entire digital collection for just over 1 cent AUD per page. From the examples provided above I hope to present the clear picture that when focused on acessibility - any automated transcription and enrichment - no matter the level of accuracy - is better than none.]]></summary></entry><entry><title type="html">Using AI to automatically transcribe 1890s handwritten Jenkins Diaries from the State Library of Victoria</title><link href="https://justin.kelly.au/using-ai-to-automatically-transcribe-1890s-handwritten-jenkins-diaries-from-the-state-library-of-victoria/" rel="alternate" type="text/html" title="Using AI to automatically transcribe 1890s handwritten Jenkins Diaries from the State Library of Victoria" /><published>2021-09-20T00:00:00+10:00</published><updated>2021-09-20T00:00:00+10:00</updated><id>https://justin.kelly.au/using-ai-to-automatically-transcribe-1890s-handwritten-jenkins-diaries-from-the-state-library-of-victoria</id><content type="html" xml:base="https://justin.kelly.au/using-ai-to-automatically-transcribe-1890s-handwritten-jenkins-diaries-from-the-state-library-of-victoria/"><![CDATA[<h2 id="summary">Summary</h2>

<ul>
  <li>21 of the 25 volumes of the Jenkins diaries have been transcribed at near perfect level of accuracy over an approximate <strong>13 years</strong> by State Library of Victoria (SLV) staff and volunteers.</li>
  <li>3 volumes where transcribes by the AI based web service <a href="https://biblio.ai">Biblio.ai</a> in approximately <strong>5 hours</strong> at a level of accuracy of between 40% and 83% (excluding impossible to read, totally damaged or extremely hard to read pages) - at a cost of under $10 Australian Dollars.</li>
  <li>For the purpose of accessibility and quantity rather than accuracy and quality, AI based handwriting to text presents a new solutions to complement manual transcriptions of historical documents.</li>
</ul>

<h2 id="abstract">Abstract</h2>

<p>Investigation of the purpose, speed, cost and accuracy of using AI to automatically transcript 1890s era handwritten diaries of an itinerant labourer in gold rush era Central Victoria.</p>

<h2 id="introduction">Introduction</h2>

<p>In 1997, the State Library of Victoria (SLV) acquired 25 volumes of the handwritten diaries of Joseph Jenkins - itinerant labourer in gold rush era Central Victoria. In 2018, an internal SLV project was started by staff and volunteers to manually transcribe the Jenkins diaries. In the last 13 years, 21 of the 25 volumes have been transcribed - with an accuracy of around 99% of words being transcribed.</p>

<p>In 2019, I received the SLV Digital Fellowship ( <a href="https://www.slv.vic.gov.au/interact-with-us/fellowships/digital-fellowship/2019-fellow-justin-kelly">https://www.slv.vic.gov.au/interact-with-us/fellowships/digital-fellowship/2019-fellow-justin-kelly</a> ) to look at using computer vision and related AI technologists to automatically enrich the metadata for SLV digital records.</p>

<p>The result of this fellowship is <a href="http://biblio.ai">Biblio.ai</a> - an open source web service that connects to libraries digital repositories, extracts the available digital objects and uses AI and computer vision to automatically enhance the metadata of the records - with the primary focus on accessilibity.</p>

<p>One of the many features of <a href="http://biblio.ai">Biblio.a</a> is the ability to convert handwriting to text, using Microsoft Azure Cognitive Services. Given the importance of the Jenkins diaries to this history of Victoria - I though these diaries would be a prefect test subject to document, analyse and measure the affectiveness of AI based transcriptions of historical handwritten doucments.</p>

<p>4 pages of handwritting where chosen as test records, ranging from easy (cursive, historical and clear) to very hard to read (cursive, historical, words joined together, and squashed)</p>

<h2 id="easy">Easy</h2>

<p>First up is a letter from Volume 1 - to Joseph Jenkins</p>

<p>Original document: <a href="https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371699&amp;mode=browse">https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371699&amp;mode=browse</a></p>

<p><img src="/img/2021-09-jenkins-page1.png" alt="2021-09-jenkins-page1.png" /></p>

<p>AI API result: <a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371699">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371699</a></p>

<h3 id="ai-transcription">AI transcription</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Pheola Victoria
Aug 184 1592
just a few lines to 
let you know S received your kind 
letter to-day after waiting for a long 
time to hear from you. I' was 
asthonished to tear of My Brother 
- in law to blame me wrongful for  
Squandering my Brother property 
what I did notget. Jagn surprised 
that they would not to me Swrote  
home as I told you before and gave  
them full particulars and ( got  
no answer I told them their was  
a will lift but being concealed so  
long it would cost Ignore to get it 
</code></pre></div></div>

<h3 id="results">Results</h3>

<ul>
  <li>Original document -
    <ul>
      <li>Words: 102</li>
    </ul>
  </li>
  <li>AI transcription:
    <ul>
      <li>Words correct: 85</li>
      <li>Percent correct: 83%</li>
    </ul>
  </li>
</ul>

<h2 id="medium">Medium</h2>

<p>Next up are letters and diary entries by Joseph with increasing level of difficulty to read</p>

<p>Original document: <a href="https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371649&amp;mode=browse">https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371649&amp;mode=browse</a></p>

<p><img src="/img/2021-09-jenkins-page2.png" alt="2021-09-jenkins-page2.png" /></p>

<p>AI API result: <a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371649">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371649</a></p>

<h3 id="api-results">API results</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The Parapet have now been build higher than ever 
War young flock. for this year tourists of to lamb. 
10 paluss of colts and another wheely to come 
I loves as far, the winter has been moist 
durm, thay, and color. this mouth has not
in little warmer. Hto have forty anche 
wherethe heart, all been Down, the hohate 
Blades are very backward. 
The Diary I send two papers off with the 
</code></pre></div></div>

<h3 id="results-1">Results</h3>

<ul>
  <li>Original document:
    <ul>
      <li>Words: 94</li>
    </ul>
  </li>
  <li>AI transcription:
    <ul>
      <li>Words correct: 49</li>
      <li>Percent correct: 52%</li>
    </ul>
  </li>
</ul>

<h2 id="hard">Hard</h2>

<p>Original document: <a href="https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371576&amp;mode=browse">https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371576&amp;mode=browse</a></p>

<p><img src="/img/2021-09-jenkins-page3.png" alt="2021-09-jenkins-page3.png" /></p>

<p>AI API result: <a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371576">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371576</a></p>

<h3 id="ai-transcrtiption">AI transcrtiption</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sie, Inving on the b the of fuly last year  
ligh a certain quorumcript in your 
Care whoon the consider ation than ! 
would Either call for them after wounds 
Or lund Marups for you to return 
there . Many times I called and ual 
a huepenper for them, but would wuts 
five you haw fam peing to 
Return. they was intended for monitors 
"lie the Leader " at forth and were
that he loos anceable with balls 
Content, bat too lengthy and hudded 
in too many subjects for any 
newspaper, and that it would 
take the lashole hace in the seadue 
It would fill nearly 3' columns 
</code></pre></div></div>

<h3 id="results-2">Results</h3>

<ul>
  <li>Original document:
    <ul>
      <li>Words: 127</li>
    </ul>
  </li>
  <li>AI transcription:
    <ul>
      <li>Correct words: 58</li>
      <li>Percent correct: 46%</li>
    </ul>
  </li>
</ul>

<h2 id="very-hard">Very hard</h2>

<p><a href="https://viewer.slv.vic.gov.au/?entity=IE20353400&amp;file=FL20371855&amp;mode=browse">https://viewer.slv.vic.gov.au/?entity=IE20353400&amp;file=FL20371855&amp;mode=browse</a></p>

<p><img src="/img/2021-09-jenkins-page4.png" alt="2021-09-jenkins-page4.png" /></p>

<p>AI API result: <a href="https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371855">https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371855</a></p>

<h3 id="ai-transcription-1">AI transcription</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I have to get oh to attend to onature. I slapher gain lentils b a - one five 
fell and gloomy. we had a towerat T so I had an Excuse feetbe late  
She wh work the became live and pleasant before croon ) came home 
the for dinner and took full two hever to cook and lak any  
we. diviner Ido not feel thong enough to walk down to the transtion 
abo to I took a heaver job that wanted to be scouronly rum 
an I came homeEarly and Inade myself Ready for hash 

</code></pre></div></div>

<h3 id="results-3">Results</h3>

<ul>
  <li>Original document:
    <ul>
      <li>Words: 110</li>
    </ul>
  </li>
  <li>AI transcription:
    <ul>
      <li>Words correct: 58</li>
      <li>Percent correct : 53%</li>
    </ul>
  </li>
</ul>

<h2 id="cost">Cost</h2>

<p>3 full volumes of the Jenkins diaries plus two thousand other SLV images were processed at a total cost of $8.11 Australian Dollars.</p>

<p><a href="https://biblio.ai">Biblio.ai</a> - Open source web service create by me to connect libraries, APIs, and AI services: $0</p>

<p><a href="https://azure.microsoft.com/en-us/services/cognitive-services/#overview">Micrsoft Azure Congitive Services</a> to do a number of different requests per image: $8.11 Australian Dollars</p>

<p><img src="/img/2021-09-jenkins-cost.png" alt="2021-09-jenkins-cost.png" /></p>

<p>Database and API services provided the open source web server <a href="https://supabase.io/">Supabase.io</a>: $0</p>

<p>Serverless functions - the Biblio.ai API results - from <a href="https://www.netlify.com/">Netlify</a>: $0</p>

<h2 id="quantity-or-quality">Quantity or Quality?</h2>

<p>My main focus of my SLV Digital Fellowship is how increase the accessiblity of digital items and collections - particulary focused on people with visual, hearing or learning difficultities - using AI and realted technologies.</p>

<p>As a result of my reaserch in this area and the development of <a href="http://biblio.ai">biblio.ai</a> - it is my personal opinion that anything that can be done (no matter if it’s not perfect) to improve accessibility is a positive result.</p>

<p>For people with visual or learning difficulttiues, a Library having historical handwritten documents with no transcriptions, is the same is having no documents at atll.</p>

<p>Any transcription - with appropriate caveats/notices - is better than none.</p>

<p>Many Librarians are hesitant to make public something that is not perfect, when focused around items of limited quantity - this makes sense, but when a library such as SLV has hundreds of thousands of items undescribed, a more pragmatic is recommended.</p>

<h2 id="conclusion">Conclusion</h2>

<p>For the purpose of accessibility and quantity rather than accuracy and quality, AI based handwriting to text presents a new solutions to complement manual transcriptions of historical documents.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Summary 21 of the 25 volumes of the Jenkins diaries have been transcribed at near perfect level of accuracy over an approximate 13 years by State Library of Victoria (SLV) staff and volunteers. 3 volumes where transcribes by the AI based web service Biblio.ai in approximately 5 hours at a level of accuracy of between 40% and 83% (excluding impossible to read, totally damaged or extremely hard to read pages) - at a cost of under $10 Australian Dollars. For the purpose of accessibility and quantity rather than accuracy and quality, AI based handwriting to text presents a new solutions to complement manual transcriptions of historical documents. Abstract Investigation of the purpose, speed, cost and accuracy of using AI to automatically transcript 1890s era handwritten diaries of an itinerant labourer in gold rush era Central Victoria. Introduction In 1997, the State Library of Victoria (SLV) acquired 25 volumes of the handwritten diaries of Joseph Jenkins - itinerant labourer in gold rush era Central Victoria. In 2018, an internal SLV project was started by staff and volunteers to manually transcribe the Jenkins diaries. In the last 13 years, 21 of the 25 volumes have been transcribed - with an accuracy of around 99% of words being transcribed. In 2019, I received the SLV Digital Fellowship ( https://www.slv.vic.gov.au/interact-with-us/fellowships/digital-fellowship/2019-fellow-justin-kelly ) to look at using computer vision and related AI technologists to automatically enrich the metadata for SLV digital records. The result of this fellowship is Biblio.ai - an open source web service that connects to libraries digital repositories, extracts the available digital objects and uses AI and computer vision to automatically enhance the metadata of the records - with the primary focus on accessilibity. One of the many features of Biblio.a is the ability to convert handwriting to text, using Microsoft Azure Cognitive Services. Given the importance of the Jenkins diaries to this history of Victoria - I though these diaries would be a prefect test subject to document, analyse and measure the affectiveness of AI based transcriptions of historical handwritten doucments. 4 pages of handwritting where chosen as test records, ranging from easy (cursive, historical and clear) to very hard to read (cursive, historical, words joined together, and squashed) Easy First up is a letter from Volume 1 - to Joseph Jenkins Original document: https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371699&amp;mode=browse AI API result: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371699 AI transcription Pheola Victoria Aug 184 1592 just a few lines to let you know S received your kind letter to-day after waiting for a long time to hear from you. I' was asthonished to tear of My Brother - in law to blame me wrongful for Squandering my Brother property what I did notget. Jagn surprised that they would not to me Swrote home as I told you before and gave them full particulars and ( got no answer I told them their was a will lift but being concealed so long it would cost Ignore to get it Results Original document - Words: 102 AI transcription: Words correct: 85 Percent correct: 83% Medium Next up are letters and diary entries by Joseph with increasing level of difficulty to read Original document: https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371649&amp;mode=browse AI API result: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371649 API results The Parapet have now been build higher than ever War young flock. for this year tourists of to lamb. 10 paluss of colts and another wheely to come I loves as far, the winter has been moist durm, thay, and color. this mouth has not in little warmer. Hto have forty anche wherethe heart, all been Down, the hohate Blades are very backward. The Diary I send two papers off with the Results Original document: Words: 94 AI transcription: Words correct: 49 Percent correct: 52% Hard Original document: https://viewer.slv.vic.gov.au/?entity=IE20347212&amp;file=FL20371576&amp;mode=browse AI API result: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371576 AI transcrtiption Sie, Inving on the b the of fuly last year ligh a certain quorumcript in your Care whoon the consider ation than ! would Either call for them after wounds Or lund Marups for you to return there . Many times I called and ual a huepenper for them, but would wuts five you haw fam peing to Return. they was intended for monitors "lie the Leader " at forth and were that he loos anceable with balls Content, bat too lengthy and hudded in too many subjects for any newspaper, and that it would take the lashole hace in the seadue It would fill nearly 3' columns Results Original document: Words: 127 AI transcription: Correct words: 58 Percent correct: 46% Very hard https://viewer.slv.vic.gov.au/?entity=IE20353400&amp;file=FL20371855&amp;mode=browse AI API result: https://api.biblio.ai/.netlify/functions/data-html?metadata_key=metadata_identifier_file_id&amp;metadata_value=FL20371855 AI transcription I have to get oh to attend to onature. I slapher gain lentils b a - one five fell and gloomy. we had a towerat T so I had an Excuse feetbe late She wh work the became live and pleasant before croon ) came home the for dinner and took full two hever to cook and lak any we. diviner Ido not feel thong enough to walk down to the transtion abo to I took a heaver job that wanted to be scouronly rum an I came homeEarly and Inade myself Ready for hash Results Original document: Words: 110 AI transcription: Words correct: 58 Percent correct : 53% Cost 3 full volumes of the Jenkins diaries plus two thousand other SLV images were processed at a total cost of $8.11 Australian Dollars. Biblio.ai - Open source web service create by me to connect libraries, APIs, and AI services: $0 Micrsoft Azure Congitive Services to do a number of different requests per image: $8.11 Australian Dollars Database and API services provided the open source web server Supabase.io: $0 Serverless functions - the Biblio.ai API results - from Netlify: $0 Quantity or Quality? My main focus of my SLV Digital Fellowship is how increase the accessiblity of digital items and collections - particulary focused on people with visual, hearing or learning difficultities - using AI and realted technologies. As a result of my reaserch in this area and the development of biblio.ai - it is my personal opinion that anything that can be done (no matter if it’s not perfect) to improve accessibility is a positive result. For people with visual or learning difficulttiues, a Library having historical handwritten documents with no transcriptions, is the same is having no documents at atll. Any transcription - with appropriate caveats/notices - is better than none. Many Librarians are hesitant to make public something that is not perfect, when focused around items of limited quantity - this makes sense, but when a library such as SLV has hundreds of thousands of items undescribed, a more pragmatic is recommended. Conclusion For the purpose of accessibility and quantity rather than accuracy and quality, AI based handwriting to text presents a new solutions to complement manual transcriptions of historical documents.]]></summary></entry><entry><title type="html">How to set the default shell in Linux</title><link href="https://justin.kelly.au/how-to-set-the-default-shell-in-linux/" rel="alternate" type="text/html" title="How to set the default shell in Linux" /><published>2021-05-15T00:00:00+10:00</published><updated>2021-05-15T00:00:00+10:00</updated><id>https://justin.kelly.au/how-to-set-the-default-shell-in-linux</id><content type="html" xml:base="https://justin.kelly.au/how-to-set-the-default-shell-in-linux/"><![CDATA[<p>One thing I always forget when logging into a new Linux server is is how to set the default shell.</p>

<p>The below command sets the default to bash</p>

<p><code class="language-plaintext highlighter-rouge">chsh -s /bin/bash</code></p>

<p>From the man page for chsh</p>

<blockquote>
  <p>The chsh command changes the user login shell. This determines the name of the users initial login command. A normal user may only change the login shell for her own account, the superuser may change the login shell for any account</p>
</blockquote>

<p>Replace <code class="language-plaintext highlighter-rouge">/bin/bash</code> with the path to your preferred shell</p>]]></content><author><name></name></author><category term="linux" /><summary type="html"><![CDATA[One thing I always forget when logging into a new Linux server is is how to set the default shell. The below command sets the default to bash chsh -s /bin/bash From the man page for chsh The chsh command changes the user login shell. This determines the name of the users initial login command. A normal user may only change the login shell for her own account, the superuser may change the login shell for any account Replace /bin/bash with the path to your preferred shell]]></summary></entry><entry><title type="html">Introduction to PowerBI for librarians</title><link href="https://justin.kelly.au/introduction-to-powerbi-for-librarians/" rel="alternate" type="text/html" title="Introduction to PowerBI for librarians" /><published>2021-04-10T00:00:00+10:00</published><updated>2021-04-10T00:00:00+10:00</updated><id>https://justin.kelly.au/introduction-to-powerbi-for-librarians</id><content type="html" xml:base="https://justin.kelly.au/introduction-to-powerbi-for-librarians/"><![CDATA[<p><em>This page details the notes for the VALA21 TechCamp PowerBI Workshop</em></p>

<p>All files are available on GitHub at:</p>
<ul>
  <li><a href="https://github.com/justinkelly/vala_2021_techcamp_powerbi">https://github.com/justinkelly/vala_2021_techcamp_powerbi</a></li>
</ul>

<h3 id="introduction">Introduction</h3>

<ul>
  <li>About me</li>
  <li>What is PowerBI</li>
  <li><a href="https://bit.ly/vala21_powerbi_pres">https://bit.ly/vala21_powerbi_pres</a></li>
  <li>What the focus is today
    <ul>
      <li>80% will be about data, data formats and data structure</li>
      <li>Thinking about what’s in the files, how do we connect them, and if there is no current way to join them - how to we make a way, and what are the limitations</li>
    </ul>
  </li>
</ul>

<h3 id="what-is-powerbi-and-why-would-i-use-it">What is PowerBI and why would I use it?</h3>

<p>So we can go from a disparate bunch of Excel and CSV files, to a connected system/database</p>

<p><img src="/img/2021-powerbi2-0-why.png" alt="2021-powerbi2-0-why.png" /></p>

<p>That can produce meaningfull reports and dashboards</p>

<p><img src="/img/2021-powerbi-1-branch.png" alt="2021-powerbi-1-branch.png" /></p>

<p>Discuss difference between Excel and PowerBi and why use 1 or the other</p>

<h3 id="why-are-we-doing-this">Why are we doing this?</h3>

<ul>
  <li>Librarians know the stories of how your libraries are operating and evolving</li>
  <li>But if we can’t present data to support our stories and feelings we can’t have useful discussions with senior management etc. about the future of your libraries</li>
  <li>Give you tools and knowledge so you can have a positive impact for your libraries</li>
</ul>

<h3 id="links">Links</h3>

<ul>
  <li>This document: <a href="https://justin.kelly.org.au/vala21-techcamp-powerbi-workshop/">https://justin.kelly.org.au/vala21-techcamp-powerbi-workshop/</a></li>
  <li>Presentation: <a href="https://bit.ly/vala21_powerbi_pres">https://bit.ly/vala21_powerbi_pres</a></li>
  <li>Files
    <ul>
      <li><a href="https://bit.ly/vala21_powerbi">https://bit.ly/vala21_powerbi</a></li>
      <li><a href="https://github.com/justinkelly/vala_2021_techcamp_powerbi">https://github.com/justinkelly/vala_2021_techcamp_powerbi</a></li>
    </ul>
  </li>
</ul>

<h3 id="problem">Problem:</h3>

<p>In many cases we in libraries want to do some data analysis on our library data but for 1 of many reasons we can’t - such as:</p>

<ul>
  <li>The LMS has a limited reporting functionality</li>
  <li>Only specific people allowed to use the LMS admin or reporting functionality</li>
  <li>LMS reporting/data is siloed - where you can analyses data separately - but not linked
    <ul>
      <li>ie - can view all the bibliographic data - but not join this with the loans or the purchase info</li>
      <li>So can’t get a full view of whats happening in your library</li>
    </ul>
  </li>
  <li>LMS admins/system people have said that to do reporting you need to do SQL and need permission from IT
    <ul>
      <li>SQL is to hard and you have to know to much about that data</li>
      <li>IT says you can’t get permission for SQL access and should talk to the LMS admins/reporting - going round in circles not getting anywhere</li>
    </ul>
  </li>
</ul>

<p>In many cases we resort to using the only tool we have available (MS Excel) and trying to stitch together lots of different files together (or copy and paste large xls file contents to manual append data)  into one big hot mess</p>

<h3 id="purpose">Purpose:</h3>

<p>We can overcome the obstacles that are in the way of a librarian from doing the data work required in their role.</p>

<h3 id="scenario">Scenario</h3>

<p>Can’t get access to useful library reporting systems - but the LMS admins have been nice enough to give you a bunch of seemly random data extracts covering catalogue structure, bibliographic info, loans and purchase data</p>

<p>We know that the data in the different files are related but when you open the files the data formats are all different and the info seems all mixed up</p>

<p>How do we unscramble the mess of data we’ve been given and turn it into a system that we can derive insights and analysis.</p>

<p>When we do the analysis we’ll do some examples using possible roles within a library such as subject librarian, branch librarian and university librarian.</p>

<p>The data we are going to use today is a mix of</p>

<ul>
  <li>Real bibliographic info based on Newcastle (UK) Libraries catalogue titles
    <ul>
      <li><a href="https://data.world/datagov-uk/364b7c8f-3917-49bb-aae2-8e504a6fde11">https://data.world/datagov-uk/364b7c8f-3917-49bb-aae2-8e504a6fde11</a></li>
    </ul>
  </li>
  <li>Dewey Decimal System
    <ul>
      <li><a href="https://github.com/Alacritous/DDS_directory_structure">https://github.com/Alacritous/DDS_directory_structure</a></li>
    </ul>
  </li>
  <li>Randomised loans, prices and quantity data</li>
  <li>Selected Melbourne University library locations</li>
  <li>This data has some limitations that can’t be solved - what do we do in this case?</li>
</ul>

<p><strong>What we are going to do:</strong></p>

<p>Turn a bunch of excel and csv files into a maintainable, share-able and accurate dashboard, that allows you to answer complex queries simply using Power BI.</p>

<p>The important aspects are not PowerBI and dashboards - 80% of what we will cover today is data related and can be used with other tools and environments</p>

<p><strong>Branch librarian for the Baillieu Library</strong>
<em>want to see the change in collection use for my branch</em></p>

<p><img src="/img/2021-powerbi-1-branch.png" alt="2021-powerbi-1-branch.png" /></p>

<p><strong>University Librarian</strong>
<em>I want to look at collection mix over all locations</em></p>

<p><img src="/img/2021-powerbi-2-unilib.png" alt="2021-powerbi-2-unilib.png" /></p>

<p><strong>Subject Librarian for Physics at Melbourne University</strong>
<em>I want to see the detailed analysis within Physics for each branch</em></p>

<p><img src="/img/2021-powerbi-3-subject.png" alt="2021-powerbi-3-subject.png" /></p>

<p><strong>Files we have</strong></p>

<p>1 file that describe the structure/hierarchy of the collection</p>

<p>This this example we are using the 3 level of Dewey - but you can also think of similar internal structure that your organisation may use to define collections, parent collections and sub-collection</p>

<p>List of branch locations</p>

<p>2 files that list loans, purchase cost, a holding quantity, per year for each location - 1 file per year</p>

<h2 id="section-1">Section 1</h2>

<p><em>What is PowerBI and intro to it’s interface</em></p>

<p><img src="/img/2021-powerbi2-1-home.png" alt="2021-powerbi2-1-home.png" /></p>

<p><em>Create the library data hierarchy using the Dewey system</em></p>

<p>1 files <code class="language-plaintext highlighter-rouge">dewey.csv</code></p>

<p><img src="/img/2021-powerbi2-1-dewey_file.png" alt="2021-powerbi2-1-dewey_file.png" /></p>

<h3 id="import-the-data">Import the data</h3>

<p>Import the Dewey file <code class="language-plaintext highlighter-rouge">dewey.csv</code></p>

<p><img src="/img/2021-powerbi2-1-import_csv.png" alt="2021-powerbi2-1-import_csv.png" /></p>

<p>Review contents and select <code class="language-plaintext highlighter-rouge">Load</code></p>

<p>This will load into PowerBI the contents of the <code class="language-plaintext highlighter-rouge">dewey.csv</code> file, and will create a table in PowerBI called <code class="language-plaintext highlighter-rouge">dewey</code></p>

<p><img src="/img/2021-powerbi2-1-dewey_table.png" alt="2021-powerbi2-1-dewey_table.png" /></p>

<p>Intro to Transform Data</p>

<p>Within the Model view in PowerBI select the <code class="language-plaintext highlighter-rouge">Transform Data</code> menu from the top ribbon menu</p>

<p><img src="/img/2021-powerbi-31-section3-1.png" alt="2021-powerbi-31-section3-1.png" /></p>

<p><em>Discuss issue with the dewey column, and how and why will be fix this. Note that the change is done once but is now automated and repeatable</em></p>

<p>We now need to create a new column that correctly formats the dewey code to 3 characters. Name this new column <code class="language-plaintext highlighter-rouge">dewey_code</code></p>

<p>From the <code class="language-plaintext highlighter-rouge">Add columns</code> menu select <code class="language-plaintext highlighter-rouge">Column From Examples</code>
<img src="/img/2021-powerbi2-2-add_example.png" alt="2021-powerbi2-2-add_example.png" /></p>

<p><em>discuss what this is, and why is usefull and how is alternative to programming</em></p>

<p>Go through steps to format into 3 character dewet</p>

<p><strong>Why and how?</strong></p>

<p>We need a way to link the dwey file</p>

<p><img src="/img/2021-powerbi2-1-dewey.png" alt="2021-powerbi2-1-dewey.png" /></p>

<p>Save the file</p>

<p>Then Close &amp; Apply</p>

<p><img src="/img/2021-powerbi-11-section1-6.png" alt="2021-powerbi-11-section1-6.png" /></p>

<p>Save file</p>

<p>Go into <code class="language-plaintext highlighter-rouge">Report</code> view</p>

<p><img src="/img/2021-powerbi-14-section1-9.png" alt="2021-powerbi-14-section1-9.png" /></p>

<p>From the Visualizations section select <code class="language-plaintext highlighter-rouge">Table</code></p>

<p><img src="/img/2021-powerbi-15-section1-10.png" alt="2021-powerbi-15-section1-10.png" /></p>

<p>We will now drag into the Table the columns codes and names from the dewey table</p>

<p><img src="/img/2021-powerbi-17-section1-12.png" alt="2021-powerbi-17-section1-12.png" /></p>

<p>Section 1 - Create the library data hierarchy using the Dewey system is now done</p>

<h2 id="section-2">Section 2</h2>
<p><em>Import the loans files and link them to the dewey hierarchy</em></p>

<p>Import the 2020 loans xlsx file</p>

<p><img src="/img/2021-powerbi2-2-import-2020.png" alt="2021-powerbi2-2-import-2020.png" /></p>

<p>Name the new table <code class="language-plaintext highlighter-rouge">2020</code></p>

<p>Repeat the process and import the loans 2021 xlsx file</p>

<p>Name the new table <code class="language-plaintext highlighter-rouge">2021</code></p>

<p><img src="/img/2021-powerbi2-3-loan_tables.png" alt="2021-powerbi2-3-loan_tables.png" /></p>

<p><em>Discuss what happened, file content not changed, what will happen next and why</em></p>

<h2 id="section-3">Section 3</h2>
<p><em>Merging files - one of the higlight featuers of PowerBI</em></p>

<p>Discuss why this is key and very important</p>

<p>Merge the 2 loans files together using <code class="language-plaintext highlighter-rouge">Append Queries as New</code></p>

<p><img src="/img/2021-powerbi-19-section2-1.png" alt="2021-powerbi-19-section2-1.png" /></p>

<p><img src="/img/2021-powerbi2-2-append.png" alt="2021-powerbi2-2-append.png" /></p>

<p>Name the new table <code class="language-plaintext highlighter-rouge">loans</code> and go into the Transforn Data section</p>

<h2 id="section-4">Section 4</h2>
<p><em>Data programming</em></p>

<p>Add a new column to format the call number to 3 digit Dewey format, using <code class="language-plaintext highlighter-rouge">Add Column</code> and then <code class="language-plaintext highlighter-rouge">Columns from examples</code></p>

<p><img src="/img/2021-powerbi2-2-add_example.png" alt="2021-powerbi2-2-add_example.png" /></p>

<p><img src="/img/2021-powerbi2-2-loan_fields.png" alt="2021-powerbi2-2-loan_fields.png" /></p>

<p><em>Discuss what happened and why this is important!</em></p>

<p>Rename the new columns <code class="language-plaintext highlighter-rouge">dewey_code</code></p>

<p><em>Discuss why</em></p>

<p>Rename any other columns and change <code class="language-plaintext highlighter-rouge">year</code> to Date format</p>
<ul>
  <li><em>Date needs to be changed from a Number to Text and then to Date</em></li>
</ul>

<p>Right click on <code class="language-plaintext highlighter-rouge">year</code> and slect <code class="language-plaintext highlighter-rouge">Date</code> from the <code class="language-plaintext highlighter-rouge">Change Type</code> menu</p>

<p><img src="/img/2021-powerbi2-2-date_fields.png" alt="2021-powerbi2-2-date_fields.png" /></p>

<p><em>Discuss why</em></p>

<p>Save</p>

<p>Close and Apply</p>

<h2 id="section-5">Section 5</h2>
<p><em>Link the files</em></p>

<p><em>Discuss what we are about to do and why, and how this is diffrerent from Excel</em></p>

<p>Go into the <code class="language-plaintext highlighter-rouge">Model</code> section</p>

<p><img src="/img/2021-powerbi-22-section2-4.png" alt="2021-powerbi-22-section2-4.png" /></p>

<p>And link the <code class="language-plaintext highlighter-rouge">loans</code> table to the <code class="language-plaintext highlighter-rouge">dewey</code> table using <code class="language-plaintext highlighter-rouge">dewey_code</code> field in both tables. Noting that if we have namesd the fielkds correctly PowerBI may have already done this for us.</p>

<p><img src="/img/2021-powerbi2-3-loans-link.png" alt="2021-powerbi2-3-loans-link.png" /></p>

<p>Make the <code class="language-plaintext highlighter-rouge">2020</code> and <code class="language-plaintext highlighter-rouge">2021</code> tables hidden in the reports</p>

<p><em>discuss how and why</em></p>

<p><img src="/img/2021-powerbi2-2-hides.png" alt="2021-powerbi2-2-hides.png" /></p>

<p><em>Discuss what just happened and again why it’s important</em></p>

<p>Save</p>

<h2 id="section-6">Section 6</h2>
<p><em>Visualisation</em></p>

<p>Replicate each of the 3 graphs from the intro/presentation</p>

<p>Go into <code class="language-plaintext highlighter-rouge">Report</code> view</p>

<p><img src="/img/2021-powerbi-23-section2-5.png" alt="2021-powerbi-23-section2-5.png" /></p>

<p>From the <code class="language-plaintext highlighter-rouge">Visualizations</code> section select <code class="language-plaintext highlighter-rouge">Table</code></p>

<p>Discuss filters and level of fitlers</p>

<p><img src="/img/2021-powerbi-18-section1-fitlers-13.png" alt="2021-powerbi-18-section1-fitlers-13.png" /></p>

<p>Discuss grouping, what and why and possible better solution in step 8</p>

<p>Right click on Location in the fields section and select New Grouping</p>

<p><img src="/img/2021-powerbi-2-section-6-grouping.png" alt="2021-powerbi-2-section-6-grouping.png" /></p>

<h2 id="section-7">Section 7</h2>
<p><em>Calculations</em></p>

<p><em>This step is optional - can be ignored if taking to long or to hard</em></p>

<p>Add calculations for turnover and cost per use</p>

<p>Create calculations for Total Loans, Loan Ration and Price Per Use</p>

<p><img src="/img/2021-powerbi-29-section2-11.png" alt="2021-powerbi-29-section2-11.png" /></p>

<p><img src="/img/2021-powerbi-30-section2-12.png" alt="2021-powerbi-30-section2-12.png" /></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Turnover = sum(loans[Loans])/SUM(loans[Copies])
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PricePerUse = sum(loans[Price])/sum(loans[Loans])
</code></pre></div></div>

<p>Questions</p>

<ul>
  <li>What right about this</li>
  <li>What’s wrong about this</li>
  <li>What’s the problems</li>
  <li>How can we fix the problems</li>
  <li>Delete calculations</li>
</ul>

<p>Do data visualistation and discuss limitation of CostPU and Turnover given underlying data issues</p>

<h2 id="section-8">Section 8</h2>
<p><em>Add library locations</em></p>

<p><em>This step is optional - can be ignored if taking to long or to hard</em></p>

<p>Import the <code class="language-plaintext highlighter-rouge">libraries.csv</code> file</p>

<p>Create a new column that we will use to link locations to loans</p>

<p><img src="/img/2021-powerbi-2-section-8-location.png" alt="2021-powerbi-2-section-8-location.png" /></p>

<p>In the model view link the libraries to the loans</p>

<p><img src="/img/2021-powerbi2-3-libraries_model.png" alt="2021-powerbi2-3-libraries_model.png" /></p>]]></content><author><name></name></author><category term="vala," /><category term="powerbi," /><category term="power_bi" /><summary type="html"><![CDATA[This page details the notes for the VALA21 TechCamp PowerBI Workshop All files are available on GitHub at: https://github.com/justinkelly/vala_2021_techcamp_powerbi Introduction About me What is PowerBI https://bit.ly/vala21_powerbi_pres What the focus is today 80% will be about data, data formats and data structure Thinking about what’s in the files, how do we connect them, and if there is no current way to join them - how to we make a way, and what are the limitations What is PowerBI and why would I use it? So we can go from a disparate bunch of Excel and CSV files, to a connected system/database That can produce meaningfull reports and dashboards Discuss difference between Excel and PowerBi and why use 1 or the other Why are we doing this? Librarians know the stories of how your libraries are operating and evolving But if we can’t present data to support our stories and feelings we can’t have useful discussions with senior management etc. about the future of your libraries Give you tools and knowledge so you can have a positive impact for your libraries Links This document: https://justin.kelly.org.au/vala21-techcamp-powerbi-workshop/ Presentation: https://bit.ly/vala21_powerbi_pres Files https://bit.ly/vala21_powerbi https://github.com/justinkelly/vala_2021_techcamp_powerbi Problem: In many cases we in libraries want to do some data analysis on our library data but for 1 of many reasons we can’t - such as: The LMS has a limited reporting functionality Only specific people allowed to use the LMS admin or reporting functionality LMS reporting/data is siloed - where you can analyses data separately - but not linked ie - can view all the bibliographic data - but not join this with the loans or the purchase info So can’t get a full view of whats happening in your library LMS admins/system people have said that to do reporting you need to do SQL and need permission from IT SQL is to hard and you have to know to much about that data IT says you can’t get permission for SQL access and should talk to the LMS admins/reporting - going round in circles not getting anywhere In many cases we resort to using the only tool we have available (MS Excel) and trying to stitch together lots of different files together (or copy and paste large xls file contents to manual append data) into one big hot mess Purpose: We can overcome the obstacles that are in the way of a librarian from doing the data work required in their role. Scenario Can’t get access to useful library reporting systems - but the LMS admins have been nice enough to give you a bunch of seemly random data extracts covering catalogue structure, bibliographic info, loans and purchase data We know that the data in the different files are related but when you open the files the data formats are all different and the info seems all mixed up How do we unscramble the mess of data we’ve been given and turn it into a system that we can derive insights and analysis. When we do the analysis we’ll do some examples using possible roles within a library such as subject librarian, branch librarian and university librarian. The data we are going to use today is a mix of Real bibliographic info based on Newcastle (UK) Libraries catalogue titles https://data.world/datagov-uk/364b7c8f-3917-49bb-aae2-8e504a6fde11 Dewey Decimal System https://github.com/Alacritous/DDS_directory_structure Randomised loans, prices and quantity data Selected Melbourne University library locations This data has some limitations that can’t be solved - what do we do in this case? What we are going to do: Turn a bunch of excel and csv files into a maintainable, share-able and accurate dashboard, that allows you to answer complex queries simply using Power BI. The important aspects are not PowerBI and dashboards - 80% of what we will cover today is data related and can be used with other tools and environments Branch librarian for the Baillieu Library want to see the change in collection use for my branch University Librarian I want to look at collection mix over all locations Subject Librarian for Physics at Melbourne University I want to see the detailed analysis within Physics for each branch Files we have 1 file that describe the structure/hierarchy of the collection This this example we are using the 3 level of Dewey - but you can also think of similar internal structure that your organisation may use to define collections, parent collections and sub-collection List of branch locations 2 files that list loans, purchase cost, a holding quantity, per year for each location - 1 file per year Section 1 What is PowerBI and intro to it’s interface Create the library data hierarchy using the Dewey system 1 files dewey.csv Import the data Import the Dewey file dewey.csv Review contents and select Load This will load into PowerBI the contents of the dewey.csv file, and will create a table in PowerBI called dewey Intro to Transform Data Within the Model view in PowerBI select the Transform Data menu from the top ribbon menu Discuss issue with the dewey column, and how and why will be fix this. Note that the change is done once but is now automated and repeatable We now need to create a new column that correctly formats the dewey code to 3 characters. Name this new column dewey_code From the Add columns menu select Column From Examples discuss what this is, and why is usefull and how is alternative to programming Go through steps to format into 3 character dewet Why and how? We need a way to link the dwey file Save the file Then Close &amp; Apply Save file Go into Report view From the Visualizations section select Table We will now drag into the Table the columns codes and names from the dewey table Section 1 - Create the library data hierarchy using the Dewey system is now done Section 2 Import the loans files and link them to the dewey hierarchy Import the 2020 loans xlsx file Name the new table 2020 Repeat the process and import the loans 2021 xlsx file Name the new table 2021 Discuss what happened, file content not changed, what will happen next and why Section 3 Merging files - one of the higlight featuers of PowerBI Discuss why this is key and very important Merge the 2 loans files together using Append Queries as New Name the new table loans and go into the Transforn Data section Section 4 Data programming Add a new column to format the call number to 3 digit Dewey format, using Add Column and then Columns from examples Discuss what happened and why this is important! Rename the new columns dewey_code Discuss why Rename any other columns and change year to Date format Date needs to be changed from a Number to Text and then to Date Right click on year and slect Date from the Change Type menu Discuss why Save Close and Apply Section 5 Link the files Discuss what we are about to do and why, and how this is diffrerent from Excel Go into the Model section And link the loans table to the dewey table using dewey_code field in both tables. Noting that if we have namesd the fielkds correctly PowerBI may have already done this for us. Make the 2020 and 2021 tables hidden in the reports discuss how and why Discuss what just happened and again why it’s important Save Section 6 Visualisation Replicate each of the 3 graphs from the intro/presentation Go into Report view From the Visualizations section select Table Discuss filters and level of fitlers Discuss grouping, what and why and possible better solution in step 8 Right click on Location in the fields section and select New Grouping Section 7 Calculations This step is optional - can be ignored if taking to long or to hard Add calculations for turnover and cost per use Create calculations for Total Loans, Loan Ration and Price Per Use Turnover = sum(loans[Loans])/SUM(loans[Copies]) PricePerUse = sum(loans[Price])/sum(loans[Loans]) Questions What right about this What’s wrong about this What’s the problems How can we fix the problems Delete calculations Do data visualistation and discuss limitation of CostPU and Turnover given underlying data issues Section 8 Add library locations This step is optional - can be ignored if taking to long or to hard Import the libraries.csv file Create a new column that we will use to link locations to loans In the model view link the libraries to the loans]]></summary></entry><entry><title type="html">VALA21 TechCamp PowerBI Workshop</title><link href="https://justin.kelly.au/vala21-techcamp-powerbi-workshop/" rel="alternate" type="text/html" title="VALA21 TechCamp PowerBI Workshop" /><published>2021-04-09T00:00:00+10:00</published><updated>2021-04-09T00:00:00+10:00</updated><id>https://justin.kelly.au/vala21-techcamp-powerbi-workshop</id><content type="html" xml:base="https://justin.kelly.au/vala21-techcamp-powerbi-workshop/"><![CDATA[<meta http-equiv="refresh" content="0; URL=https://justin.kelly.org.au/introduction-to-powerbi-for-librarians/" />]]></content><author><name></name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Making audio recording accessible to people with hearing impairments.</title><link href="https://justin.kelly.au/making-audio-recording-accessible-to-people-with-hearing-impairments/" rel="alternate" type="text/html" title="Making audio recording accessible to people with hearing impairments." /><published>2021-02-13T00:00:00+11:00</published><updated>2021-02-13T00:00:00+11:00</updated><id>https://justin.kelly.au/making-audio-recording-accessible-to-people-with-hearing-impairments</id><content type="html" xml:base="https://justin.kelly.au/making-audio-recording-accessible-to-people-with-hearing-impairments/"><![CDATA[<p><img src="/img/2021-NYPL.png" alt="NYPL" /></p>

<h2 id="summary">Summary</h2>

<p>For automated transcription of audio recordings to text, with the focusing on accessibility and historical recording - Azure from Microsoft currently is the best performing cloud service.</p>

<p>For state and national libraies - automated transcriptions are now accurate enough that they can be used as is (for search results and in the user interface)</p>

<h2 id="introduction">Introduction</h2>

<p>As part of my State Library of Victoria Digital Fellowship - <a href="https://biblio.ai">https://biblio.ai</a> I’ve been working on using the latest cloud based AI technologies to make to a libraries digital items (images, scanned historical documents, video, etc) accessible to people with various disabilities - such as automated transcription of historical handwritten documents to text - to enable visually impaired people to use/understand/access documents that were previously stored as images and currently transcribed - thus totally unusable for anyone with a visual impairment.</p>

<p>My previous blog post goes into details to compare various handwriting o text services available: <a href="https://justin.kelly.org.au/benchmarking-computer-vision-transcription-of-historical-handwritten-documents/">https://justin.kelly.org.au/benchmarking-computer-vision-transcription-of-historical-handwritten-documents/</a></p>

<p>My current focus is how to make audio recordings (such as historical recordings or oral histories - that state and national libraries collect) accessible to people with hearing impairments. Prior work in this area by the <a href="http://transcribe.oralhistory.nypl.org">New Your Public Library</a> uses speech-to-text software software to convert audio to text, and provide an interface for a user to updated transcription.</p>

<p>The original NYPL project started from a grant in 2015 from the <a href="https://knightfoundation.org/grants/together-we-listen">Knight Foundation</a>. Given that even in the last year there has been big improvements in live audio to text (such as MS Teams and Zoom automated live captions) it’s as great time to:</p>

<ol>
  <li>Review the accuracy of the current best audio to text services</li>
  <li>If they are accurate enough, can libraries not run crowdsourced editing projects - as just use the automated transcripts as is.</li>
</ol>

<p>The current leaders in audio to text are:</p>

<ul>
  <li>Azure from Microsoft (which powers the MS Team living caption feature</li>
  <li><a href="http://otter.ai">Otter.ai</a> (heavily used and integrated with Zoom)</li>
  <li>and Google Cloud Platform (GCP).</li>
</ul>

<p>Taking a randomly selected audio recording from the State Library of South Australia - Oral History collection - which happened to be an interview with Leonard Cohen - details below:</p>

<ul>
  <li>“Interview with Leonard Cohen [sound recording] Interviewer: Nigel Russell [Excerpt]”</li>
  <li><a href="https://digital.collections.slsa.sa.gov.au/nodes/view/6368?keywords=">https://digital.collections.slsa.sa.gov.au/nodes/view/6368?keywords=</a></li>
</ul>

<p><a href="https://digital.collections.slsa.sa.gov.au/nodes/view/6368?keywords="><img src="/img/2021-SLSA.png" alt="SLSA" /></a></p>

<p>Lets see how accurate each service is (as of 1 Feb 2021) in automatically transcribing the above audio recording to text</p>

<h2 id="azure">Azure</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Well, this is a lyric that is based on an old Chinese text worked on the translation with a lesson Oriental scholar. It's originally a training manual for monks and the metaphor that's used as an ox herder who is lossed his ox and I translated into a western. Cowboy style. Goes like this. Say a prayer for the cowboy. His mirrors run away and he'll walk till he finds her his darling, his stray. But the rivers in flood and the roads are awash and the bridges break up in the panic of loss and there's nothing to follow. There's nowhere to go she's gone like the summer she's gone like this snow and the crickets are breaking his heart with their song as the day caves in and the night is all wrong. Did he dream? Was it she who went galloping past and bent down the Fern and broke open the grass and printed the mud with the iron and the gold that he nailed to her feet when he was the Lord? And although she goes grazing a minute away, he tracks for all night and he tracks her all day. Oh, blind, her presence, except to compare his injury here with her punishment there. 
</code></pre></div></div>

<ul>
  <li>Mistakes = 4 words out of 214</li>
  <li>Accuracy = 98% on spoken American English</li>
</ul>

<h3 id="otterai">Otter.ai</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>This is a lyric that is based on an old Chinese text. Worked on the translation with a    an oriental scholar. It's originally a training manual for monks. And the metaphor that's used is an ox herder who has lost his ox and I translate it into a Western cowboy style. goes like this. Say a prayer for the cowboy. His mirrors are in a way and they'll walk till he finds her. His darling is stray. But the river is in flood and the roads are awash and the bridges break up in the Panic of loss. And there's nothing to follow. There's nowhere to go. She's gone like the summer. She's gone like the snow and the crickets are breaking as hard with their song as the day caved in. And the night is all wrong. Did he dream was it she who went galloping past and bent down the fern and broke open the grass and painted the mud with the iron and the gold that he nailed to her feet when he was the Lord. And although she goes grazing a minute away, he tracks for all night and he tracks all day or blind   her presence except to compare his injury here with her punishment there.
</code></pre></div></div>

<ul>
  <li>Mistakes = 8 words out of 214</li>
  <li>Accuracy = 96% on spoken American English</li>
</ul>

<h3 id="google-gcp">Google GCP</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>all this is a Lyrica that is based on an old Chinese text I worked on the translation with a o
Sun Oriental scholar it's originally a training manual for monks and the metaphor that used is an ox herder who has lost his ox-eye translated into a a western cowboy style.    goes like this. say a prayer for the Cowboys mirrors run away and I walked Ali finds her his darling his stray but the rivers in flood in the roads are the wash and the bridge is break up in the Panic of loss and there's nothing to follow there's nowhere to go she's gone like the summer she's gone like the snow and the Crickets are breaking his heart with their song as the day caves in and the night is all wrong
did he dream was it she   went Galloping past and bent down the fern and broke open the grass and printed the mud with the iron in the gold that he nailed to her feet when he was the Lord
and although she goes grazing a minute away he tracks her all night and he tracks for all day o blind and her presence except to compare his injury here with their punishment there"
</code></pre></div></div>

<ul>
  <li>Mistakes = 14 mistaken words out of 214</li>
  <li>Accuracy = 93% on spoken American English</li>
</ul>

<h3 id="manually">Manually</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ah, this is a lyric that is based on an old Chinese text, ah worked on the translation with a with a oriental scholar. It's originally a training manual for monks and the metaphor that's used is an ox herder who has lost his ox and I translated into a western cowboy style ah it goes like this. Say a prayer for the cowboy his mirrors run away and he'll walk till he find her, his darling, his stray, but the rivers in flood and the roads are awash and the bridges break up and the panic of loss  and there's nothing to follow, there's nowhere to go, she's gone like the summer, shes gone like the snow, and crickets are breaking his heart with their song, as the day caves in and the night is all wrong. Did he dream? was it she who went galloping past and bent down the fern and broke open the grass and printed the mud with the iron and the gold, that he nailed to her feet when he was the lord. And although she goes grazing a minute away, he tracks her all night and he tracks her all day. Oh, blind to her presence except to compare his injury here with her punishment there.
</code></pre></div></div>

<p>It took me about 30 minutes to go through the audio a number of times on super slow speed to transcribe the audio manually to text :( and I actually had to refer the the MS Azure transcription  to check a few words I had difficulties with.</p>

<h2 id="conclusion">Conclusion</h2>

<p>All 3 services can do real-time transcriptions - meaning its took them about 1:30 minutes to transcribe(including grammar) the 1:30 of audio reording and each performed at above 90% accuracy.</p>

<p>Given that we are looking to see if these can be used as is (with no or minimal human review) Azure from Microsoft performed amazingly well at 98% correct words and most of the grammar correct.</p>

<p>These results are done on a single recording, in clear spoken American English - and are not representive of a full audio collection of a Library - even with this limitation I would happily recommend imperfect Azure transcription compared to having none transcripted audio records.</p>

<p>Does a Library still need do run a crowdsourced editing project for audio recordings?</p>

<p>If 100% perfect transcription is required with audio comments and speaker splitting etc. and their is funding available - then yes an editing project will still be required.</p>

<p>If a library wants to make as much digital content as accessible as possible to as wide an audience as possible for minimal expenditure - then No - just use Azure transcripts as in with notes added into any interface to highlight they these are auto generated and it’ll be a huge accessibility improvement for audio recordings.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Summary For automated transcription of audio recordings to text, with the focusing on accessibility and historical recording - Azure from Microsoft currently is the best performing cloud service. For state and national libraies - automated transcriptions are now accurate enough that they can be used as is (for search results and in the user interface) Introduction As part of my State Library of Victoria Digital Fellowship - https://biblio.ai I’ve been working on using the latest cloud based AI technologies to make to a libraries digital items (images, scanned historical documents, video, etc) accessible to people with various disabilities - such as automated transcription of historical handwritten documents to text - to enable visually impaired people to use/understand/access documents that were previously stored as images and currently transcribed - thus totally unusable for anyone with a visual impairment. My previous blog post goes into details to compare various handwriting o text services available: https://justin.kelly.org.au/benchmarking-computer-vision-transcription-of-historical-handwritten-documents/ My current focus is how to make audio recordings (such as historical recordings or oral histories - that state and national libraries collect) accessible to people with hearing impairments. Prior work in this area by the New Your Public Library uses speech-to-text software software to convert audio to text, and provide an interface for a user to updated transcription. The original NYPL project started from a grant in 2015 from the Knight Foundation. Given that even in the last year there has been big improvements in live audio to text (such as MS Teams and Zoom automated live captions) it’s as great time to: Review the accuracy of the current best audio to text services If they are accurate enough, can libraries not run crowdsourced editing projects - as just use the automated transcripts as is. The current leaders in audio to text are: Azure from Microsoft (which powers the MS Team living caption feature Otter.ai (heavily used and integrated with Zoom) and Google Cloud Platform (GCP). Taking a randomly selected audio recording from the State Library of South Australia - Oral History collection - which happened to be an interview with Leonard Cohen - details below: “Interview with Leonard Cohen [sound recording] Interviewer: Nigel Russell [Excerpt]” https://digital.collections.slsa.sa.gov.au/nodes/view/6368?keywords= Lets see how accurate each service is (as of 1 Feb 2021) in automatically transcribing the above audio recording to text Azure Well, this is a lyric that is based on an old Chinese text worked on the translation with a lesson Oriental scholar. It's originally a training manual for monks and the metaphor that's used as an ox herder who is lossed his ox and I translated into a western. Cowboy style. Goes like this. Say a prayer for the cowboy. His mirrors run away and he'll walk till he finds her his darling, his stray. But the rivers in flood and the roads are awash and the bridges break up in the panic of loss and there's nothing to follow. There's nowhere to go she's gone like the summer she's gone like this snow and the crickets are breaking his heart with their song as the day caves in and the night is all wrong. Did he dream? Was it she who went galloping past and bent down the Fern and broke open the grass and printed the mud with the iron and the gold that he nailed to her feet when he was the Lord? And although she goes grazing a minute away, he tracks for all night and he tracks her all day. Oh, blind, her presence, except to compare his injury here with her punishment there. Mistakes = 4 words out of 214 Accuracy = 98% on spoken American English Otter.ai This is a lyric that is based on an old Chinese text. Worked on the translation with a an oriental scholar. It's originally a training manual for monks. And the metaphor that's used is an ox herder who has lost his ox and I translate it into a Western cowboy style. goes like this. Say a prayer for the cowboy. His mirrors are in a way and they'll walk till he finds her. His darling is stray. But the river is in flood and the roads are awash and the bridges break up in the Panic of loss. And there's nothing to follow. There's nowhere to go. She's gone like the summer. She's gone like the snow and the crickets are breaking as hard with their song as the day caved in. And the night is all wrong. Did he dream was it she who went galloping past and bent down the fern and broke open the grass and painted the mud with the iron and the gold that he nailed to her feet when he was the Lord. And although she goes grazing a minute away, he tracks for all night and he tracks all day or blind her presence except to compare his injury here with her punishment there. Mistakes = 8 words out of 214 Accuracy = 96% on spoken American English Google GCP all this is a Lyrica that is based on an old Chinese text I worked on the translation with a o Sun Oriental scholar it's originally a training manual for monks and the metaphor that used is an ox herder who has lost his ox-eye translated into a a western cowboy style. goes like this. say a prayer for the Cowboys mirrors run away and I walked Ali finds her his darling his stray but the rivers in flood in the roads are the wash and the bridge is break up in the Panic of loss and there's nothing to follow there's nowhere to go she's gone like the summer she's gone like the snow and the Crickets are breaking his heart with their song as the day caves in and the night is all wrong did he dream was it she went Galloping past and bent down the fern and broke open the grass and printed the mud with the iron in the gold that he nailed to her feet when he was the Lord and although she goes grazing a minute away he tracks her all night and he tracks for all day o blind and her presence except to compare his injury here with their punishment there" Mistakes = 14 mistaken words out of 214 Accuracy = 93% on spoken American English Manually Ah, this is a lyric that is based on an old Chinese text, ah worked on the translation with a with a oriental scholar. It's originally a training manual for monks and the metaphor that's used is an ox herder who has lost his ox and I translated into a western cowboy style ah it goes like this. Say a prayer for the cowboy his mirrors run away and he'll walk till he find her, his darling, his stray, but the rivers in flood and the roads are awash and the bridges break up and the panic of loss and there's nothing to follow, there's nowhere to go, she's gone like the summer, shes gone like the snow, and crickets are breaking his heart with their song, as the day caves in and the night is all wrong. Did he dream? was it she who went galloping past and bent down the fern and broke open the grass and printed the mud with the iron and the gold, that he nailed to her feet when he was the lord. And although she goes grazing a minute away, he tracks her all night and he tracks her all day. Oh, blind to her presence except to compare his injury here with her punishment there. It took me about 30 minutes to go through the audio a number of times on super slow speed to transcribe the audio manually to text :( and I actually had to refer the the MS Azure transcription to check a few words I had difficulties with. Conclusion All 3 services can do real-time transcriptions - meaning its took them about 1:30 minutes to transcribe(including grammar) the 1:30 of audio reording and each performed at above 90% accuracy. Given that we are looking to see if these can be used as is (with no or minimal human review) Azure from Microsoft performed amazingly well at 98% correct words and most of the grammar correct. These results are done on a single recording, in clear spoken American English - and are not representive of a full audio collection of a Library - even with this limitation I would happily recommend imperfect Azure transcription compared to having none transcripted audio records. Does a Library still need do run a crowdsourced editing project for audio recordings? If 100% perfect transcription is required with audio comments and speaker splitting etc. and their is funding available - then yes an editing project will still be required. If a library wants to make as much digital content as accessible as possible to as wide an audience as possible for minimal expenditure - then No - just use Azure transcripts as in with notes added into any interface to highlight they these are auto generated and it’ll be a huge accessibility improvement for audio recordings.]]></summary></entry><entry><title type="html">Benchmarking computer vision transcription of historical handwritten documents</title><link href="https://justin.kelly.au/benchmarking-computer-vision-transcription-of-historical-handwritten-documents/" rel="alternate" type="text/html" title="Benchmarking computer vision transcription of historical handwritten documents" /><published>2020-01-07T00:00:00+11:00</published><updated>2020-01-07T00:00:00+11:00</updated><id>https://justin.kelly.au/benchmarking-computer-vision-transcription-of-historical-handwritten-documents</id><content type="html" xml:base="https://justin.kelly.au/benchmarking-computer-vision-transcription-of-historical-handwritten-documents/"><![CDATA[<h2 id="summary">Summary</h2>

<p>For automatic transcription of most historical handwritten cursive script documents, Microsoft Azure is the most accurate computer vision provider, with the exception for very hard to read documents where Google Cloud leads.</p>

<p><a href="#benchmark-results">Scaled scores</a>:</p>

<ul>
  <li>MS Azure: 91</li>
  <li>Google Cloud: 89</li>
  <li>Amazon AWS: 11</li>
</ul>

<p><img src="/img/george_swinburne_transcript.png" width="600" alt="" />
<em>Sample of the MS Azure transcription of the George Swinburne diary</em></p>

<h3 id="table-of-contents">Table of contents</h3>

<ul>
  <li><a href="#introduciton">Introduction</a></li>
  <li><a href="#benchmarking">Benchmarking</a></li>
  <li><a href="#benchmark-results">Results</a></li>
  <li><a href="#images">Images</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
  <li><a href="#next-steps">Next steps</a></li>
  <li><a href="#notes">Notes and acknowledgements</a></li>
</ul>

<p><a id="introduction"></a></p>
<h2 id="introduction">Introduction</h2>

<p>Thanks to the generous funding of the State Library of Victoria’s (SLV) <a href="https://www.slv.vic.gov.au/about-us/fellowships/digital-fellowship/current-fellow-justin-kelly">Digital Fellowship program</a>, I have been able to start work on the development of an open source service ( <a href="https://biblio.ai">biblio.ai</a> ) for institutional libraries to use the latest in computer vision technologies to automatically enrich the metadata of their image collections.</p>

<p>The 3 main areas of automatic metadata enrichment will be:</p>

<ul>
  <li>Transcription of handwriting and text in images ( handwritten notes, typed text, even words in signs,etc.. )</li>
  <li>Object detection for tagging/labeling (pick out the main objects in an image - enable users to browser a collection via tags etc..)</li>
  <li>Automatic descriptions of images (adding human readable descriptions in bulk to an image collection, where existing records may have little or no metadata/cataloguing)</li>
</ul>

<p>The purpose of metadata enrichment with computer vision is:</p>

<ul>
  <li>Discoverability. Enable the discoverability and re-use of a libraries digital collections by a much greater audience.</li>
  <li>Accessibility. Without rich metadata a libraries image collection is inaccessible to vision impaired users.</li>
  <li>Automate the hard stuff for libraries. Institutional libraries such as the SLV have millions of scanned images, but only a small fraction of these (the really important works) get transcribed (it’s expensive and very time consuming)</li>
</ul>

<p>An example of how time consuming transcription can be. The SLV’s volunteer run project to transcribe the 25 volumes of the diaries of Joseph Jenkins <em>(an itinerant labourer in gold rush era Central Victoria)</em>, has so far taken 10 years to get through 15 volumes. The <a href="https://www.youtube.com/watch?v=Olil-MGljA0">YouTube video</a> and document linked below have some great details on this transcription project.</p>

<ul>
  <li><a href="https://www.youtube.com/watch?v=Olil-MGljA0">https://www.youtube.com/watch?v=Olil-MGljA0</a></li>
  <li><a href="http://handle.slv.vic.gov.au/10381/32073">http://handle.slv.vic.gov.au/10381/32073</a></li>
</ul>

<p>The first step in this project is to benchmark the available computer vision technologies to identify which the most suitable for the various types of collections and institutions.</p>

<p>By far, the most useful thing computer vision can do for institutions libraries with image collections is the transcription of words (specifically cursive handwriting) in images to text. Making huge collections of images and documents searchable and accessible.</p>

<p>Automatic transcription of typed text documents via standard OCR software has been available for years. It is only very recent that transcription of cursive handwritten images has started to move from purely academic space to being generally available from leading cloud providers.</p>

<p>For conversion of scanned handwritten documents to text there are currently (Jan 2020) 4 main providers.</p>

<ul>
  <li><a href="https://azure.microsoft.com/en-gb/services/cognitive-services/computer-vision/">Microsoft Azure Computer Vision</a></li>
  <li><a href="https://aws.amazon.com/rekognition/">Amazon AWS Rekognition</a></li>
  <li><a href="https://cloud.google.com/vision/">Google Cloud Vision</a></li>
  <li><a href="https://www.abbyy.com/en-au/cloud-ocr-sdk/">ABBYY Cloud OCR</a></li>
</ul>

<p>Notes: unfortunately ABBYY is suitable only for form style block letter handwriting conversion and the leading open source OCR solution <a href="https://github.com/tesseract-ocr/tesseract">Tesseract</a> is currently <a href="https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-use-tesseract-for-handwriting-recognition">not easily suitable for general handwriting recognition</a>. Both of these options have been excluded in the following tests.</p>

<p><a id="benchmarking"></a></p>
<h2 id="benchmarking">Benchmarking</h2>

<p>Images of various degrees of difficulty to read/transcribe were selected from SLVs and Swinburne Universities image collections. If the image had an existing transcription - this was used as the base to compare the computer vision transcriptions against. If a transcription didn’t exist I manually transcribe the image to use as the base.</p>

<p>Each automated transciption was analysed for the number of words there were correctly transcribed. Correctly being defined as all letters in a word matching the base transcription.</p>

<p>The number of words correct per transciption was then converted to a percentage and then scaled based on the perceived difficulty of the text.</p>

<p><a id="benchmark-results"></a></p>
<h2 id="benchmark-results">Benchmark results</h2>

<div class="table-responsive">

  <table class="table table-striped table-sm">
    <thead>
      <tr>
        <th style="text-align: left">Document</th>
        <th style="text-align: left">Age (years)</th>
        <th style="text-align: left">Difficulty</th>
        <th style="text-align: left">MS Azure</th>
        <th style="text-align: left">Google Cloud</th>
        <th style="text-align: left">Amazon AWS</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="text-align: left"><a href="#swin-diary-a">Swinburne: George Swinburne’s diary, 1882 page 4</a></td>
        <td style="text-align: left">138</td>
        <td style="text-align: left">Hard</td>
        <td style="text-align: left">79%</td>
        <td style="text-align: left">62%</td>
        <td style="text-align: left">7%</td>
      </tr>
      <tr>
        <td style="text-align: left"><a href="#swin-letter-a">Swinburne: Letter from aboard S.S. Shropshire</a></td>
        <td style="text-align: left">100</td>
        <td style="text-align: left">Medium</td>
        <td style="text-align: left">78%</td>
        <td style="text-align: left">66%</td>
        <td style="text-align: left">4%</td>
      </tr>
      <tr>
        <td style="text-align: left"><a href="#slv-diary-a">SLV: Diary of Charles Evans, 1853 - page 7</a></td>
        <td style="text-align: left">167</td>
        <td style="text-align: left">Very Hard</td>
        <td style="text-align: left">60%</td>
        <td style="text-align: left">75%</td>
        <td style="text-align: left">3%</td>
      </tr>
      <tr>
        <td style="text-align: left"><a href="#slv-postcard-a">SLV: Christmas 1888. A page from an illustrated diary</a></td>
        <td style="text-align: left">132</td>
        <td style="text-align: left">Easy</td>
        <td style="text-align: left">95%</td>
        <td style="text-align: left">92%</td>
        <td style="text-align: left">49%</td>
      </tr>
      <tr>
        <td style="text-align: left">Average</td>
        <td style="text-align: left"> </td>
        <td style="text-align: left"> </td>
        <td style="text-align: left">78%</td>
        <td style="text-align: left">74%</td>
        <td style="text-align: left">16%</td>
      </tr>
    </tbody>
  </table>

</div>

<p>Scaled for difficulty: Easy = .5, Medium = 1, Hard = 1.5, Very Hard = 2</p>

<ul>
  <li>MS Azure: 91</li>
  <li>Google Cloud: 89</li>
  <li>Amazon AWS: 11</li>
</ul>

<p><a id="swin-diary-a"></a><a id="images"></a></p>
<h3 id="george-swinburnes-diary-1882-to-1886-page-4">George Swinburne’s diary, 1882 to 1886 page 4</h3>

<ul>
  <li>Age: 138 years</li>
  <li>Perceived difficulty: Hard (cursive, clear writing on clean paper)</li>
  <li>URL: <a href="https://commons.swinburne.edu.au/items/0903ff8d-ddc3-41f9-96b6-928eee4d1bde/1/">https://commons.swinburne.edu.au/items/0903ff8d-ddc3-41f9-96b6-928eee4d1bde/1/</a></li>
</ul>

<p>Preview:
<img src="/img/george_swinburne_diary.png" width="400" alt="" /></p>

<h4 id="human-transcription">Human transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3rd Feb 1882

This is my 21st birthday! I am a man! very significant that; as father says, I will have to think and act for myself now. Well, with God's help I will endeavour to do right I live a thruthtful, righteous &amp; sober life. It is my aim ever to better myself, although I sometimes miserably, oh! miserably fail. I've got a temper, which wants conquering, &amp; a heart, that often harbours most wicked &amp; vile thoughts, cleansing. God strengthen me in my warfare &amp; help me to live the life I have to live nobly &amp; well, always prepared to die.

I have been a Sunday School Teacher for about 7 years, that has done me a power of good, &amp; my feeble efforts at working for Christ on the Sunday, have often fortified me for the next week. One needs spiritual armour on the Quayside. I see more of what people term 'life' on the Quay, than I expected when I first went, but I am fortunate in being a good office, where I have been for 7 years (come 18th May). The best way to conquer evil thoughts is to work hard, at least,
</code></pre></div></div>

<p>193 Words</p>

<h4 id="ms-azure-transcription">MS Azure transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>god Feb 1882   
This is my bit birthday! I'm aman   
very significant that, as father days, Swill have
to think and act for myself now. Well, with 
God's help I will endeavour to do right Hive a 
truthful, righteous asober life. It is my aim 
ever to bett mapelf, although Isorhetimes 
miserably, oh miserably foul. lie got a 
temper, which wants conquering, Dad heart that 
often harbours most wicked Write thoughts, 
cleansing. God strengthen me in my warfare 
thell she to live the life I have to live nobly twell, 
always prepared to did. 

Is have been a Sunday School Teacher for   
about 4 years, that has done me a power of  
good, Imy feeble efforts at working for Christ 
low the Sunday, have often fortified me for 
the next weekd. ones needdo spirituall armour oh the 
Wayside. Isee more of what people term life on 
the away, than Sexplated when I first went, but 
Sam fortunate in being in a good office , where 
I have been for yyears / come 18th may). 
The best way to conquer edit thoughts is to work hard, at least, 
</code></pre></div></div>
<p>152 words correct = 79%</p>

<h4 id="google-cloud-transcription">Google Cloud transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grd Feb. 1982   
This is my 214 birthday! So am   
very significant thatli as father day, will have  
Etichoke land act for mopelfmou. Well, with  
Gods heff I will endeavour/t dorught Mive a  
truthful, righteous tsober life. It is my aim  
ever te letter mpelf, although Seorhetines  
miserables, oh! miserably fail. I've gota   
tempe, Thich warto coslabering, Mal heart that 
often harbouro most wicked Bila thought, 
dansing. Lod shengthens me in my was fase  
Shelf she to live the life I have to like no hd, Puell 

I have been a Sunday School Teacher for 
about y years, that has done me a powerlof 
good, ding feeble effato at working for Christ  
on the Sunday, have often fortified me for 
the next weeld. Onei neelde Sikituallarmour on the 
Luapide. Isee mme of what people term 'life on 
the way, than Iexpleted when I first went, but 
I am fortunate in being in a good office, where  
Shave been for Y yearo come 1st May). \"The best way 
to conquer evil thought in to work hard, at least, 
</code></pre></div></div>

<p>119 words correct = 62%</p>

<h4 id="aws-rekognition-transcription">AWS Rekognition transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grd Hef 1882
This 2/2 Lirthday! Sim
thal is my La ther vill aman. Have
y t thile uign ft as dayi
Lod and act for mou Mell wth
hel will endeawour t doreg ht rlive a
LuLh ighleos soher e L Itisa aime
ever l Le ttod mspey althhughe aohe times
serafla
</code></pre></div></div>
<p>14 words correct = 7%</p>

<p><a id="swin-letter-a"></a></p>
<h3 id="swinburne--letter-from-aboard-ss-shropshire">Swinburne : Letter from aboard S.S. Shropshire</h3>

<ul>
  <li>Age: 100 years</li>
  <li>Perceived difficulty: Medium (cursive script, well spaced, pretty clear)</li>
  <li>Image URL: <a href="https://commons.swinburne.edu.au/items/cd53e247-3e39-458e-8582-9fa0a2a2e120/1/">https://commons.swinburne.edu.au/items/cd53e247-3e39-458e-8582-9fa0a2a2e120/1/</a></li>
</ul>

<p>Preview:
<img src="/img/swinburne-letter.jpg" width="400" alt="" /></p>

<p><em>Note: Image is of undetermined copyright status and is provided here for educational purposes</em></p>

<h4 id="human-transcription-1">Human transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S.S. Shropshire
At Sea
4th March 19120

Mr F.W. Green.
Dear Sir,

We are off to Freemantle for another 40,000 Boxes of apples after loading 110,00 at Hobart. We did not call at Melbourne again. We will land 6,000 Boxes of Butter at Freemantle from Sydney.

Will you please remember me to Mr Waters &amp; to my fiends at My College and with best wishes to yourself I remain

Yours faithfully
John Duncan
</code></pre></div></div>
<p>Word count: 77</p>

<h4 id="ms-azure-transcription-1">MS Azure transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S.S.  
Shropshire
my. N. Green.  
Aff March. 19120, 
Dear Dien We are off- to Freemantle 
for another 40.000 Boxes of apples after 
loading 1 10.000, at Nobrash, wel did not call at 
Melbourne, again. We will land 6.000. Boxes 
of Butter atfreemantle from Sydney . 
Will you please remember me 
to Mr Waters &amp; bol my friends at Me  
College. and with best wishes to 
yourself 
I pemain
yours faithfully 
John. Duncan 
</code></pre></div></div>

<p>60 words correct = 78%</p>

<h4 id="google-cloud-transcription-1">Google Cloud transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S.s. Ohropshire  
lat sea  
4 Wareh. 19120, 
Mr J. W. Green. 
Dear Sire We are off to freemantle 
for another 40,000 Bosces of apples after 
Loading 110000, at Hobart, wobel did not eallah 
Melbfturne, again. He will land 6.000. Boces 
of Buter and Freemande from Sydney. 
Will you fleebe femember me 
to Me Waters e to my friends at the 
College and with Post wishes to : 
yourself 
Yours faithfully 
seman
can
</code></pre></div></div>

<p>51 words correct = 66%</p>

<h4 id="aws-rekognition-transcription-1">AWS Rekognition transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S.S. ehrefiuhi ea
Woes areh. 191 20.
Greeen
We al f b Freemanl
or anothe 40:0 Noba: Boeis
ples olidl plo aplen oallo
Hoarding Melo frne 1/10000, al
gain. Me u'llland 6.00D. Bocoo
BBuMer a antle ylny.
Iill fe bpe me
</code></pre></div></div>
<p>3 words correct = 4%</p>

<p><a id="slv-postcard-a"></a></p>
<h3 id="slv-christmas-1888-a-page-from-an-illustrated-diary">SLV: Christmas 1888. A page from an illustrated diary</h3>

<ul>
  <li>Age: 132 years</li>
  <li>Perceived difficulty: Easy - not cursive script</li>
  <li>URL: <a href="http://handle.slv.vic.gov.au/10381/119196">http://handle.slv.vic.gov.au/10381/119196</a></li>
</ul>

<p>Preview:
<img src="/img/slv_postcard.jpg" width="400" alt="" /></p>

<h4 id="human-transcript-guessed-words-mark-in-">Human transcript (guessed words mark in “”):</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Christmas. 1888. A page from "my" illustrated diary. Drawn by Uncle Charlie. Belgrave. Victoria. Australia. for Christopher Fry.
With every good wish for a h"app"y and prosperous future.
Out party from S"t. Mary's Fie"lds passing through Belgrave. Leicestershire to a meet of the Pychley "Hound"s.
</code></pre></div></div>
<p>Word cout of translatable words: 39</p>

<h4 id="ms-azure-transcription-2">MS Azure transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Christmas. 1888. A page from T illustrated diary. Drawn by Uncle Charlie. Belgrave. Victoria. Australia. for Chridapher Fry.
With every good wish for a har y and prosperous future.
Our Party from Sri Lids passing Through Betgrave. Leicestershire to a meet of the Pychley
</code></pre></div></div>
<p>37 words correct = 95%</p>

<h4 id="google-cloud-transcription-2">Google Cloud transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Christmas. 1888. A page from illustrated diary.   7
Drawn by Uncle Charlie. Belgrave. Victoria. Australia. for Chruida pher Fry,  9
With every good wish for a har y and prosperous fuhre,  8
Our Party from Sm i ds passing through Belgrave Leicestershire to a meet of the Py chley  12
</code></pre></div></div>

<p>36 words correct = 92%</p>

<h4 id="aws-rekognition-transcription-2">AWS Rekognition transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chrishmas. 1888. pace from illustrated diary. Drawn by uincle Chaplie. Belgrave. Victoria Australia. for Chridlapher Fry.
With every dood wish and prosperous fuhre.      
Our parly jrom passing Through graveleiceshershire te mectof Hhe Pychlay     
</code></pre></div></div>

<p>39 words correct = 49%</p>

<p><a id="slv-diary-a"></a></p>
<h3 id="slv--diary-of-charles-evans-1853-september-24--1855-january-21---page-7">SLV : Diary of Charles Evans, 1853 September 24 -1855 January 21 - page 7</h3>

<ul>
  <li>Age: 167 years</li>
  <li>Perceived difficulty: Very hard cursive, messy on paper with background ‘noise’</li>
  <li>URL: <a href="http://handle.slv.vic.gov.au/10381/236326">http://handle.slv.vic.gov.au/10381/236326</a></li>
</ul>

<p>For more information of the UNESCO listed Charles Evans diary:
<a href="https://blogs.slv.vic.gov.au/news/unesco-adds-gold-rush-diaries-to-australian-memory-of-the-world-register/">https://blogs.slv.vic.gov.au/news/unesco-adds-gold-rush-diaries-to-australian-memory-of-the-world-register/</a></p>

<p>Preview:
<img src="/img/slv_diary.jpg" width="400" alt="" /></p>

<h4 id="transcript">Transcript</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Saturday Sep 24th 1853

         Very busy all day preparing our establish-
         ment previous to it being transferred to
         two young men who will probably make
         more of it than we have as they are quite
         conversant with the business. They came
         up in the evening and took possession
         after paying us 100£ in Cash and 3 months
         bill for 50£ for the stock tents good
         will &amp;c - As for the "good will" they
         have that as far as I am concerned
         in every sense of the term, for am heartily
         sick of the fumes of the oven and dispensing
         lolly-sticks and comfits to the colonial
         small fry. We celebrated the event with
         a glass and a song, and made the night
         of our parting with the old tent as merry
         a one as any we had passed in it.

Sunday Sep 25th
         Andrew Martin and two of his friends
         paid us a visit.
</code></pre></div></div>

<p>153 words</p>

<h4 id="ms-azure-transcription-3">MS Azure transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I wesday defy Saturday def 14- 
the 185 3
very busy all day prepare our bolablinds 
two young new who will probably make
more of it than we have, as they are quite
conversant with the bushup. they carne 
chop in the evening aunt took badsession 
after paying u. 108 I in Cash and 3 month. 
will for fot for the Mock teats good 
will to - he for the good well" they 
has that and far and i am concernis 
In every devise of he term, for Aam heart  
sick of the farmer of the oven and disforcing  
lolly- sticks and coutilt to the &amp; colonial 
small fry.. We celebrates the event with 
a glass and a song, and moises the might  
It our cartry with the ots tent at Marry 
hair us a visit. 
</code></pre></div></div>

<p>92 words correct = 60%</p>

<h4 id="google-cloud-transcription-3">Google Cloud transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tuesday Sess Saturday Ach 24th 1853 

Veg betyr all day prepare our bestabbits 
Mente previous t ih beung transferred to 
two young men who will prosrably make 
Mone of it than we have, as they are quite 
conversant with the basiese – They came 
ich in the evening and took possession 
after payung us 100 L in Cash And 3 month, 
fill for 50th for the slock that good 
will fe. – Act for the good well\" They 
Lind that as far as I am concerned a 
in every sense of the teren, for all am hearts 
sick of the frames of the oven and dispersing
lolly-sticks and confit to the &amp; colonial
small fry.. We celebrated he event with
a glass and a song, and made the night
of our partag with the old tent at Merry
a one as aug me had passed in it

Andrew Martin and two de le friend 
paid usa vitit.  1
</code></pre></div></div>

<p>115 words correct = 75%</p>

<h4 id="aws-rekognition-transcription-3">AWS Rekognition transcription</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>K 1833
Lasblay sf Satu' ay ode! 24- bsabhlk
rry ndy ale Pay bhef ary houfere our
arut pnevrro! eny prake,
Fur men wl toll rfrohl
Mo Y 1 Han wve Lave bs a Haey one ga
cone nsan wl e looN hub. dbussion Hy cone
uf hy Hh om
</code></pre></div></div>

<p>4 words correct = 3%</p>

<p><a id="conclusion"></a></p>
<h2 id="conclusion">Conclusion</h2>

<p>For automatic transcription of most historical handwritten cursive script documents, Microsoft Azure is the most accurate computer vision provider, with the exception for very hard to read documents where Google Cloud leads.</p>

<p>For more information on the project behind this post please refer to <a href="https://biblio.ai">biblio.ai</a>. Feel free to contact me if you have any queries or comments.</p>

<p><a id="next-steps"></a></p>
<h2 id="next-steps">Next steps</h2>

<p>Given how well Google Cloud performed with the very hard to read Charles Evans diary I’ll take some samples from the super hard Jenkins diaries for analysis. Anything above 0% accuracy for alot of these pages would be an amzaing result.</p>

<p>Text extraction from historical maps the SLV have will then be the next area for benchmarking.</p>

<p><a id="notes"></a></p>
<h2 id="notes-and-acknowledgements">Notes and acknowledgements</h2>

<p>Thanks to the team at the State Library of Victoria and Swinburne University Library for sourcing and assitance with the images!</p>

<p>Note that the copyright of some of these images may be undetermined status and is provided here for educational purposes.</p>

<p>Source code of this page is available on <a href="https://github.com/justinkelly/justinkelly.github.io/blob/master/_posts/2020-01-07-benchmarking-computer-vision-transcription-of-historical-handwritten-documents.md">GitHub</a>. If you spot any issues with this post either post in the comments or in GitHub.</p>

<p>Sourse code used to do the transcriptions for the various cloud providers will be available:  <a href="https://github.com/justinkelly/biblio.ai">https://github.com/justinkelly/biblio.ai</a></p>]]></content><author><name></name></author><category term="biblio.ai" /><summary type="html"><![CDATA[Summary For automatic transcription of most historical handwritten cursive script documents, Microsoft Azure is the most accurate computer vision provider, with the exception for very hard to read documents where Google Cloud leads. Scaled scores: MS Azure: 91 Google Cloud: 89 Amazon AWS: 11 Sample of the MS Azure transcription of the George Swinburne diary Table of contents Introduction Benchmarking Results Images Conclusion Next steps Notes and acknowledgements Introduction Thanks to the generous funding of the State Library of Victoria’s (SLV) Digital Fellowship program, I have been able to start work on the development of an open source service ( biblio.ai ) for institutional libraries to use the latest in computer vision technologies to automatically enrich the metadata of their image collections. The 3 main areas of automatic metadata enrichment will be: Transcription of handwriting and text in images ( handwritten notes, typed text, even words in signs,etc.. ) Object detection for tagging/labeling (pick out the main objects in an image - enable users to browser a collection via tags etc..) Automatic descriptions of images (adding human readable descriptions in bulk to an image collection, where existing records may have little or no metadata/cataloguing) The purpose of metadata enrichment with computer vision is: Discoverability. Enable the discoverability and re-use of a libraries digital collections by a much greater audience. Accessibility. Without rich metadata a libraries image collection is inaccessible to vision impaired users. Automate the hard stuff for libraries. Institutional libraries such as the SLV have millions of scanned images, but only a small fraction of these (the really important works) get transcribed (it’s expensive and very time consuming) An example of how time consuming transcription can be. The SLV’s volunteer run project to transcribe the 25 volumes of the diaries of Joseph Jenkins (an itinerant labourer in gold rush era Central Victoria), has so far taken 10 years to get through 15 volumes. The YouTube video and document linked below have some great details on this transcription project. https://www.youtube.com/watch?v=Olil-MGljA0 http://handle.slv.vic.gov.au/10381/32073 The first step in this project is to benchmark the available computer vision technologies to identify which the most suitable for the various types of collections and institutions. By far, the most useful thing computer vision can do for institutions libraries with image collections is the transcription of words (specifically cursive handwriting) in images to text. Making huge collections of images and documents searchable and accessible. Automatic transcription of typed text documents via standard OCR software has been available for years. It is only very recent that transcription of cursive handwritten images has started to move from purely academic space to being generally available from leading cloud providers. For conversion of scanned handwritten documents to text there are currently (Jan 2020) 4 main providers. Microsoft Azure Computer Vision Amazon AWS Rekognition Google Cloud Vision ABBYY Cloud OCR Notes: unfortunately ABBYY is suitable only for form style block letter handwriting conversion and the leading open source OCR solution Tesseract is currently not easily suitable for general handwriting recognition. Both of these options have been excluded in the following tests. Benchmarking Images of various degrees of difficulty to read/transcribe were selected from SLVs and Swinburne Universities image collections. If the image had an existing transcription - this was used as the base to compare the computer vision transcriptions against. If a transcription didn’t exist I manually transcribe the image to use as the base. Each automated transciption was analysed for the number of words there were correctly transcribed. Correctly being defined as all letters in a word matching the base transcription. The number of words correct per transciption was then converted to a percentage and then scaled based on the perceived difficulty of the text. Benchmark results Document Age (years) Difficulty MS Azure Google Cloud Amazon AWS Swinburne: George Swinburne’s diary, 1882 page 4 138 Hard 79% 62% 7% Swinburne: Letter from aboard S.S. Shropshire 100 Medium 78% 66% 4% SLV: Diary of Charles Evans, 1853 - page 7 167 Very Hard 60% 75% 3% SLV: Christmas 1888. A page from an illustrated diary 132 Easy 95% 92% 49% Average     78% 74% 16% Scaled for difficulty: Easy = .5, Medium = 1, Hard = 1.5, Very Hard = 2 MS Azure: 91 Google Cloud: 89 Amazon AWS: 11 George Swinburne’s diary, 1882 to 1886 page 4 Age: 138 years Perceived difficulty: Hard (cursive, clear writing on clean paper) URL: https://commons.swinburne.edu.au/items/0903ff8d-ddc3-41f9-96b6-928eee4d1bde/1/ Preview: Human transcription 3rd Feb 1882 This is my 21st birthday! I am a man! very significant that; as father says, I will have to think and act for myself now. Well, with God's help I will endeavour to do right I live a thruthtful, righteous &amp; sober life. It is my aim ever to better myself, although I sometimes miserably, oh! miserably fail. I've got a temper, which wants conquering, &amp; a heart, that often harbours most wicked &amp; vile thoughts, cleansing. God strengthen me in my warfare &amp; help me to live the life I have to live nobly &amp; well, always prepared to die. I have been a Sunday School Teacher for about 7 years, that has done me a power of good, &amp; my feeble efforts at working for Christ on the Sunday, have often fortified me for the next week. One needs spiritual armour on the Quayside. I see more of what people term 'life' on the Quay, than I expected when I first went, but I am fortunate in being a good office, where I have been for 7 years (come 18th May). The best way to conquer evil thoughts is to work hard, at least, 193 Words MS Azure transcription god Feb 1882 This is my bit birthday! I'm aman very significant that, as father days, Swill have to think and act for myself now. Well, with God's help I will endeavour to do right Hive a truthful, righteous asober life. It is my aim ever to bett mapelf, although Isorhetimes miserably, oh miserably foul. lie got a temper, which wants conquering, Dad heart that often harbours most wicked Write thoughts, cleansing. God strengthen me in my warfare thell she to live the life I have to live nobly twell, always prepared to did. Is have been a Sunday School Teacher for about 4 years, that has done me a power of good, Imy feeble efforts at working for Christ low the Sunday, have often fortified me for the next weekd. ones needdo spirituall armour oh the Wayside. Isee more of what people term life on the away, than Sexplated when I first went, but Sam fortunate in being in a good office , where I have been for yyears / come 18th may). The best way to conquer edit thoughts is to work hard, at least, 152 words correct = 79% Google Cloud transcription grd Feb. 1982 This is my 214 birthday! So am very significant thatli as father day, will have Etichoke land act for mopelfmou. Well, with Gods heff I will endeavour/t dorught Mive a truthful, righteous tsober life. It is my aim ever te letter mpelf, although Seorhetines miserables, oh! miserably fail. I've gota tempe, Thich warto coslabering, Mal heart that often harbouro most wicked Bila thought, dansing. Lod shengthens me in my was fase Shelf she to live the life I have to like no hd, Puell I have been a Sunday School Teacher for about y years, that has done me a powerlof good, ding feeble effato at working for Christ on the Sunday, have often fortified me for the next weeld. Onei neelde Sikituallarmour on the Luapide. Isee mme of what people term 'life on the way, than Iexpleted when I first went, but I am fortunate in being in a good office, where Shave been for Y yearo come 1st May). \"The best way to conquer evil thought in to work hard, at least, 119 words correct = 62% AWS Rekognition transcription grd Hef 1882 This 2/2 Lirthday! Sim thal is my La ther vill aman. Have y t thile uign ft as dayi Lod and act for mou Mell wth hel will endeawour t doreg ht rlive a LuLh ighleos soher e L Itisa aime ever l Le ttod mspey althhughe aohe times serafla 14 words correct = 7% Swinburne : Letter from aboard S.S. Shropshire Age: 100 years Perceived difficulty: Medium (cursive script, well spaced, pretty clear) Image URL: https://commons.swinburne.edu.au/items/cd53e247-3e39-458e-8582-9fa0a2a2e120/1/ Preview: Note: Image is of undetermined copyright status and is provided here for educational purposes Human transcription S.S. Shropshire At Sea 4th March 19120 Mr F.W. Green. Dear Sir, We are off to Freemantle for another 40,000 Boxes of apples after loading 110,00 at Hobart. We did not call at Melbourne again. We will land 6,000 Boxes of Butter at Freemantle from Sydney. Will you please remember me to Mr Waters &amp; to my fiends at My College and with best wishes to yourself I remain Yours faithfully John Duncan Word count: 77 MS Azure transcription S.S. Shropshire my. N. Green. Aff March. 19120, Dear Dien We are off- to Freemantle for another 40.000 Boxes of apples after loading 1 10.000, at Nobrash, wel did not call at Melbourne, again. We will land 6.000. Boxes of Butter atfreemantle from Sydney . Will you please remember me to Mr Waters &amp; bol my friends at Me College. and with best wishes to yourself I pemain yours faithfully John. Duncan 60 words correct = 78% Google Cloud transcription S.s. Ohropshire lat sea 4 Wareh. 19120, Mr J. W. Green. Dear Sire We are off to freemantle for another 40,000 Bosces of apples after Loading 110000, at Hobart, wobel did not eallah Melbfturne, again. He will land 6.000. Boces of Buter and Freemande from Sydney. Will you fleebe femember me to Me Waters e to my friends at the College and with Post wishes to : yourself Yours faithfully seman can 51 words correct = 66% AWS Rekognition transcription S.S. ehrefiuhi ea Woes areh. 191 20. Greeen We al f b Freemanl or anothe 40:0 Noba: Boeis ples olidl plo aplen oallo Hoarding Melo frne 1/10000, al gain. Me u'llland 6.00D. Bocoo BBuMer a antle ylny. Iill fe bpe me 3 words correct = 4% SLV: Christmas 1888. A page from an illustrated diary Age: 132 years Perceived difficulty: Easy - not cursive script URL: http://handle.slv.vic.gov.au/10381/119196 Preview: Human transcript (guessed words mark in “”): Christmas. 1888. A page from "my" illustrated diary. Drawn by Uncle Charlie. Belgrave. Victoria. Australia. for Christopher Fry. With every good wish for a h"app"y and prosperous future. Out party from S"t. Mary's Fie"lds passing through Belgrave. Leicestershire to a meet of the Pychley "Hound"s. Word cout of translatable words: 39 MS Azure transcription Christmas. 1888. A page from T illustrated diary. Drawn by Uncle Charlie. Belgrave. Victoria. Australia. for Chridapher Fry. With every good wish for a har y and prosperous future. Our Party from Sri Lids passing Through Betgrave. Leicestershire to a meet of the Pychley 37 words correct = 95% Google Cloud transcription Christmas. 1888. A page from illustrated diary. 7 Drawn by Uncle Charlie. Belgrave. Victoria. Australia. for Chruida pher Fry, 9 With every good wish for a har y and prosperous fuhre, 8 Our Party from Sm i ds passing through Belgrave Leicestershire to a meet of the Py chley 12 36 words correct = 92% AWS Rekognition transcription Chrishmas. 1888. pace from illustrated diary. Drawn by uincle Chaplie. Belgrave. Victoria Australia. for Chridlapher Fry. With every dood wish and prosperous fuhre. Our parly jrom passing Through graveleiceshershire te mectof Hhe Pychlay 39 words correct = 49% SLV : Diary of Charles Evans, 1853 September 24 -1855 January 21 - page 7 Age: 167 years Perceived difficulty: Very hard cursive, messy on paper with background ‘noise’ URL: http://handle.slv.vic.gov.au/10381/236326 For more information of the UNESCO listed Charles Evans diary: https://blogs.slv.vic.gov.au/news/unesco-adds-gold-rush-diaries-to-australian-memory-of-the-world-register/ Preview: Transcript Saturday Sep 24th 1853 Very busy all day preparing our establish- ment previous to it being transferred to two young men who will probably make more of it than we have as they are quite conversant with the business. They came up in the evening and took possession after paying us 100£ in Cash and 3 months bill for 50£ for the stock tents good will &amp;c - As for the "good will" they have that as far as I am concerned in every sense of the term, for am heartily sick of the fumes of the oven and dispensing lolly-sticks and comfits to the colonial small fry. We celebrated the event with a glass and a song, and made the night of our parting with the old tent as merry a one as any we had passed in it. Sunday Sep 25th Andrew Martin and two of his friends paid us a visit. 153 words MS Azure transcription I wesday defy Saturday def 14- the 185 3 very busy all day prepare our bolablinds two young new who will probably make more of it than we have, as they are quite conversant with the bushup. they carne chop in the evening aunt took badsession after paying u. 108 I in Cash and 3 month. will for fot for the Mock teats good will to - he for the good well" they has that and far and i am concernis In every devise of he term, for Aam heart sick of the farmer of the oven and disforcing lolly- sticks and coutilt to the &amp; colonial small fry.. We celebrates the event with a glass and a song, and moises the might It our cartry with the ots tent at Marry hair us a visit. 92 words correct = 60% Google Cloud transcription Tuesday Sess Saturday Ach 24th 1853 Veg betyr all day prepare our bestabbits Mente previous t ih beung transferred to two young men who will prosrably make Mone of it than we have, as they are quite conversant with the basiese – They came ich in the evening and took possession after payung us 100 L in Cash And 3 month, fill for 50th for the slock that good will fe. – Act for the good well\" They Lind that as far as I am concerned a in every sense of the teren, for all am hearts sick of the frames of the oven and dispersing lolly-sticks and confit to the &amp; colonial small fry.. We celebrated he event with a glass and a song, and made the night of our partag with the old tent at Merry a one as aug me had passed in it Andrew Martin and two de le friend paid usa vitit. 1 115 words correct = 75% AWS Rekognition transcription K 1833 Lasblay sf Satu' ay ode! 24- bsabhlk rry ndy ale Pay bhef ary houfere our arut pnevrro! eny prake, Fur men wl toll rfrohl Mo Y 1 Han wve Lave bs a Haey one ga cone nsan wl e looN hub. dbussion Hy cone uf hy Hh om 4 words correct = 3% Conclusion For automatic transcription of most historical handwritten cursive script documents, Microsoft Azure is the most accurate computer vision provider, with the exception for very hard to read documents where Google Cloud leads. For more information on the project behind this post please refer to biblio.ai. Feel free to contact me if you have any queries or comments. Next steps Given how well Google Cloud performed with the very hard to read Charles Evans diary I’ll take some samples from the super hard Jenkins diaries for analysis. Anything above 0% accuracy for alot of these pages would be an amzaing result. Text extraction from historical maps the SLV have will then be the next area for benchmarking. Notes and acknowledgements Thanks to the team at the State Library of Victoria and Swinburne University Library for sourcing and assitance with the images! Note that the copyright of some of these images may be undetermined status and is provided here for educational purposes. Source code of this page is available on GitHub. If you spot any issues with this post either post in the comments or in GitHub. Sourse code used to do the transcriptions for the various cloud providers will be available: https://github.com/justinkelly/biblio.ai]]></summary></entry><entry><title type="html">Australian Marraige Law data by electoral region</title><link href="https://justin.kelly.au/marriage-law/" rel="alternate" type="text/html" title="Australian Marraige Law data by electoral region" /><published>2017-11-15T00:00:00+11:00</published><updated>2017-11-15T00:00:00+11:00</updated><id>https://justin.kelly.au/marriage-law</id><content type="html" xml:base="https://justin.kelly.au/marriage-law/"><![CDATA[<p>The Australian Bureau of Statistics (ABS) published their data set of the 2017 Australian Marriage Law vote today in Excel format.</p>

<p>So far I’ve only seen the results of this election by State. Personally I find the results by federal electoral region more interesting.</p>

<p>Thankfully importing ABS data and Australian Electoral Commision(AEC) map files into Tableau is straight forward so we can easily analysis the YES vote per federal electoral</p>

<p>Here is the Australia wide view - you can zoom in to a specifc area to get more details, further below are the zoomed in Melbourne and Sydney maps for comparison.</p>

<div class="tableauPlaceholder" id="viz1510709030958" style="position: relative"><noscript><a href="#"><img alt=" " src="https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;20&#47;2017AustralianMarriageLawvote&#47;Australia&#47;1_rss.png" style="border: none" /></a></noscript><object class="tableauViz" style="display:none;"><param name="host_url" value="https%3A%2F%2Fpublic.tableau.com%2F" /> <param name="embed_code_version" value="3" /> <param name="path" value="views&#47;2017AustralianMarriageLawvote&#47;Australia?:embed=y&amp;:display_count=y&amp;publish=yes" /> <param name="toolbar" value="yes" /><param name="static_image" value="https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;20&#47;2017AustralianMarriageLawvote&#47;Australia&#47;1.png" /> <param name="animate_transition" value="yes" /><param name="display_static_image" value="yes" /><param name="display_spinner" value="yes" /><param name="display_overlay" value="yes" /><param name="display_count" value="yes" /><param name="filter" value="publish=yes" /></object></div>
<script type="text/javascript"> var divElement = document.getElementById('viz1510709030958'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='1000px';vizElement.style.height='850px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>

<h2 id="melbourne">Melbourne</h2>

<div class="tableauPlaceholder" id="viz1510708804922" style="position: relative"><noscript><a href="#"><img alt=" " src="https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;20&#47;2017AustralianMarriageLawvote&#47;Melbourne&#47;1_rss.png" style="border: none" /></a></noscript><object class="tableauViz" style="display:none;"><param name="host_url" value="https%3A%2F%2Fpublic.tableau.com%2F" /> <param name="embed_code_version" value="3" /> <param name="site_root" value="" /><param name="name" value="2017AustralianMarriageLawvote&#47;Melbourne" /><param name="tabs" value="yes" /><param name="toolbar" value="yes" /><param name="static_image" value="https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;20&#47;2017AustralianMarriageLawvote&#47;Melbourne&#47;1.png" /> <param name="animate_transition" value="yes" /><param name="display_static_image" value="yes" /><param name="display_spinner" value="yes" /><param name="display_overlay" value="yes" /><param name="display_count" value="yes" /><param name="filter" value="publish=yes" /></object></div>
<script type="text/javascript"> var divElement = document.getElementById('viz1510708804922'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='1000px';vizElement.style.height='850px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>

<h2 id="sydney">Sydney</h2>

<div class="tableauPlaceholder" id="viz1510709010961" style="position: relative"><noscript><a href="#"><img alt=" " src="https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;20&#47;2017AustralianMarriageLawvote&#47;Sydney&#47;1_rss.png" style="border: none" /></a></noscript><object class="tableauViz" style="display:none;"><param name="host_url" value="https%3A%2F%2Fpublic.tableau.com%2F" /> <param name="embed_code_version" value="3" /> <param name="path" value="views&#47;2017AustralianMarriageLawvote&#47;Sydney?:embed=y&amp;:display_count=y&amp;publish=yes" /> <param name="toolbar" value="yes" /><param name="static_image" value="https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;20&#47;2017AustralianMarriageLawvote&#47;Sydney&#47;1.png" /> <param name="animate_transition" value="yes" /><param name="display_static_image" value="yes" /><param name="display_spinner" value="yes" /><param name="display_overlay" value="yes" /><param name="display_count" value="yes" /><param name="filter" value="publish=yes" /></object></div>
<script type="text/javascript"> var divElement = document.getElementById('viz1510709010961'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='1000px';vizElement.style.height='850px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>

<h2 id="data">Data</h2>

<ul>
  <li>ABS marraige vote data in csv format
    <ul>
      <li><a href="https://justin.kelly.org.au/assets/2017_ABS_Marriage_Law.csv">https://justin.kelly.org.au/assets/2017_ABS_Marriage_Law.csv</a></li>
    </ul>
  </li>
  <li>AEC map file for Tableau
    <ul>
      <li><a href="https://justin.kelly.org.au/assets/national-midmif-09052016.zip">https://justin.kelly.org.au/assets/national-midmif-09052016.zip</a></li>
    </ul>
  </li>
  <li>Tableau Public files
    <ul>
      <li><a href="https://public.tableau.com/profile/justin.kelly#!/vizhome/2017AustralianMarriageLawvote/Australia">https://public.tableau.com/profile/justin.kelly#!/vizhome/2017AustralianMarriageLawvote/Australia</a></li>
    </ul>
  </li>
</ul>

<p>In my next post I’ll got into the details of how to create these maps in Tableau with ABS and AEC data</p>]]></content><author><name></name></author><category term="tableau" /><summary type="html"><![CDATA[The Australian Bureau of Statistics (ABS) published their data set of the 2017 Australian Marriage Law vote today in Excel format. So far I’ve only seen the results of this election by State. Personally I find the results by federal electoral region more interesting. Thankfully importing ABS data and Australian Electoral Commision(AEC) map files into Tableau is straight forward so we can easily analysis the YES vote per federal electoral Here is the Australia wide view - you can zoom in to a specifc area to get more details, further below are the zoomed in Melbourne and Sydney maps for comparison. Melbourne Sydney Data ABS marraige vote data in csv format https://justin.kelly.org.au/assets/2017_ABS_Marriage_Law.csv AEC map file for Tableau https://justin.kelly.org.au/assets/national-midmif-09052016.zip Tableau Public files https://public.tableau.com/profile/justin.kelly#!/vizhome/2017AustralianMarriageLawvote/Australia In my next post I’ll got into the details of how to create these maps in Tableau with ABS and AEC data]]></summary></entry><entry><title type="html">AWS RedShift - How to create a schema and grant access</title><link href="https://justin.kelly.au/aws-redshift-how-to-create-a-schema-and-grant-access/" rel="alternate" type="text/html" title="AWS RedShift - How to create a schema and grant access" /><published>2017-09-08T00:00:00+10:00</published><updated>2017-09-08T00:00:00+10:00</updated><id>https://justin.kelly.au/aws-redshift-how-to-create-a-schema-and-grant-access</id><content type="html" xml:base="https://justin.kelly.au/aws-redshift-how-to-create-a-schema-and-grant-access/"><![CDATA[<p>If you are new to the AWS RedShift database and need to create schemas and grant access you can use the below SQL to manage this process</p>

<h2 id="schema-creation">Schema creation</h2>

<p>To create a schema in your existing database run the below SQL and replace</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">my_schema_name</code> with your schema name</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span><span class="p">;</span> 
</code></pre></div></div>

<p>If you need to adjust the ownership of the schema to another user - such as a specific db admin user run the below SQL and replace</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">my_schema_name</code> with your schema name</li>
  <li><code class="language-plaintext highlighter-rouge">my_user_name</code> with the name of the user that needs access</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">OWNER</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
</code></pre></div></div>

<h2 id="permissions">Permissions</h2>

<p>Now to allow only <code class="language-plaintext highlighter-rouge">SELECT</code> access to the new <code class="language-plaintext highlighter-rouge">my_schema_name</code> schema to the user <code class="language-plaintext highlighter-rouge">my_user_name</code> run the below SQL and replace</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">my_schema_name</code> with your schema name</li>
  <li><code class="language-plaintext highlighter-rouge">my_user_name</code> with the name of the user that needs access</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">GRANT</span> <span class="k">USAGE</span> <span class="k">ON</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="k">SELECT</span> <span class="k">ON</span> <span class="k">ALL</span> <span class="n">TABLES</span> <span class="k">IN</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">DEFAULT</span> <span class="k">PRIVILEGES</span> <span class="k">IN</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">GRANT</span> <span class="k">SELECT</span> <span class="k">ON</span> <span class="n">TABLES</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
</code></pre></div></div>

<p>To allow <code class="language-plaintext highlighter-rouge">ALL</code> access to the new <code class="language-plaintext highlighter-rouge">my_schema_name</code> schema to the user <code class="language-plaintext highlighter-rouge">my_user_name</code> run the below SQL and replace</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">my_schema_name</code> with your schema name</li>
  <li><code class="language-plaintext highlighter-rouge">my_user_name</code> with the name of the user that needs access</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="k">ALL</span> <span class="n">TABLES</span> <span class="k">IN</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">DEFAULT</span> <span class="k">PRIVILEGES</span> <span class="k">IN</span> <span class="k">SCHEMA</span> <span class="n">my_schema_name</span> <span class="k">GRANT</span> <span class="k">ALL</span> <span class="k">ON</span> <span class="n">TABLES</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
</code></pre></div></div>

<p>If the user <code class="language-plaintext highlighter-rouge">my_user_name</code> does not already have access rights to the database that the schema belongs to run the below SQL and replace</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">my_databasea_name</code> with your database name</li>
  <li><code class="language-plaintext highlighter-rouge">my_user_name</code> with the name of the user that needs access</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">GRANT</span> <span class="k">CONNECT</span> <span class="k">ON</span> <span class="k">DATABASE</span> <span class="n">my_database_name</span> <span class="k">TO</span> <span class="n">my_user_name</span><span class="p">;</span>
</code></pre></div></div>

<h2 id="group-access">Group access</h2>

<p>To assign permissions to a user group rather than individual user in RedShift change the abover queriers from <code class="language-plaintext highlighter-rouge">TO my_user_name</code> to <code class="language-plaintext highlighter-rouge">TO GROUP my_user_group</code>.  Replacing <code class="language-plaintext highlighter-rouge">my_user_group</code> with the name of your RedShift user group.</p>]]></content><author><name></name></author><category term="aws" /><category term="redshift" /><category term="postgres" /><summary type="html"><![CDATA[If you are new to the AWS RedShift database and need to create schemas and grant access you can use the below SQL to manage this process Schema creation To create a schema in your existing database run the below SQL and replace my_schema_name with your schema name CREATE SCHEMA my_schema_name; If you need to adjust the ownership of the schema to another user - such as a specific db admin user run the below SQL and replace my_schema_name with your schema name my_user_name with the name of the user that needs access ALTER SCHEMA my_schema_name OWNER TO my_user_name; Permissions Now to allow only SELECT access to the new my_schema_name schema to the user my_user_name run the below SQL and replace my_schema_name with your schema name my_user_name with the name of the user that needs access GRANT USAGE ON SCHEMA my_schema_name TO my_user_name; GRANT SELECT ON ALL TABLES IN SCHEMA my_schema_name TO my_user_name; ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema_name GRANT SELECT ON TABLES TO my_user_name; To allow ALL access to the new my_schema_name schema to the user my_user_name run the below SQL and replace my_schema_name with your schema name my_user_name with the name of the user that needs access GRANT ALL ON SCHEMA my_schema_name TO my_user_name; GRANT ALL ON ALL TABLES IN SCHEMA my_schema_name TO my_user_name; ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema_name GRANT ALL ON TABLES TO my_user_name; If the user my_user_name does not already have access rights to the database that the schema belongs to run the below SQL and replace my_databasea_name with your database name my_user_name with the name of the user that needs access GRANT CONNECT ON DATABASE my_database_name TO my_user_name; Group access To assign permissions to a user group rather than individual user in RedShift change the abover queriers from TO my_user_name to TO GROUP my_user_group. Replacing my_user_group with the name of your RedShift user group.]]></summary></entry></feed>