Progress
Thus far, we have scanned three college catalogs for each college type:
- For liberal arts colleges, we scanned Carleton College, Lewis and Clark College, and Reed College
- For public colleges, we scanned UCLA, Georgia State University, and San Francisco State University
- For HBCUs, we scanned Howard University, Hampton University, and Tuskegee University
- For community colleges, we scanned Compton Community College, Tri-County Community College, and Windward Community College
For each college, we separated our scans into five sections:
- Scans containing general information on the college
- Scans containing information on the Art curriculum
- Scans containing information on the Chemistry curriculum
- Scans containing information on the English curriculum
- Scans containing information on the Political Science curriculum
After scanning all of the colleges, we performed OCR using Adobe Acrobat. We are now verifying that the text from our scans has been adequately recognized so that we can use it for sentiment analysis.
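One quick way to spot-check OCR quality is to compare the OCR output for a page against a manual transcription of the same page. The sketch below uses Python's standard `difflib` for a rough similarity ratio; the sample strings and the 90% threshold are invented placeholders, not project data.

```python
# A minimal sketch of OCR spot-checking: compare OCR output against a
# hand-made transcription of the same page. Sample text and the ~0.90
# threshold are hypothetical illustrations.
import difflib

def ocr_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Return a rough similarity ratio (0.0 to 1.0) between OCR output
    and a manual transcription, ignoring case and extra whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return difflib.SequenceMatcher(
        None, normalize(ocr_text), normalize(ground_truth)
    ).ratio()

ocr = "Carle+on College was founded in 1866,"
truth = "Carleton College was founded in 1866."
score = ocr_accuracy(ocr, truth)
print(f"similarity: {score:.2f}")  # pages below ~0.90 could be flagged for re-scanning
```

Pages that score well below the threshold would be candidates for re-scanning or manual transcription.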
Problems
Originally, we had planned to scan the box of microfiche located in the digital archives which contained information on gifts given to the college. After discussing with Nat Wilson, director of the digital archives, it became apparent that we wouldn’t be able to remediate the contents of the fiche due to privacy restrictions.
We went back to the drawing board and decided to compile microfiche containing college catalogs. Our intention is to collect catalogs from a variety of college types (liberal arts colleges, public colleges, HBCUs, and community colleges) and compare sentiment analysis results across the types.
For the emulator, we wanted to use Forge.js to create the hotspots, but it proved too confusing, so we are using HTML's image map functionality instead.
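The image map approach pairs an `<img>` element with a `<map>` whose `<area>` elements define clickable hotspot rectangles. The snippet below generates that markup with a small Python helper so it can be tested; the image name, map name, frame pages, and coordinates are made-up examples, not our actual emulator layout.

```python
# A rough sketch of the HTML image map approach: an <img> with a usemap
# attribute pointing at a <map> whose <area> elements define clickable
# hotspot rectangles. All names and coordinates here are hypothetical.
def hotspot_area(href: str, x1: int, y1: int, x2: int, y2: int) -> str:
    """Build one rectangular <area> hotspot for an image map."""
    return (f'<area shape="rect" coords="{x1},{y1},{x2},{y2}" '
            f'href="{href}" alt="{href}">')

areas = [
    hotspot_area("frame01.html", 0, 0, 120, 90),    # first catalog frame
    hotspot_area("frame02.html", 120, 0, 240, 90),  # second catalog frame
]
image_map = (
    '<img src="fiche-grid.png" usemap="#fiche">\n'
    '<map name="fiche">\n' + "\n".join(areas) + "\n</map>"
)
print(image_map)
```

Clicking a hotspot then loads the page for that frame of the fiche, which is all the emulator needs without any JavaScript framework.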
Tools and Techniques
- We have used Crazy Frog, the machine on 4th libe, to scan our microfiche. Crazy Frog is hooked up to a Windows computer in the library which visualizes the fiche using ScanPro 2000 software.
- We used Adobe Acrobat DC Pro to perform OCR on our scans.
- We are building the emulator from scratch with HTML and CSS.
Deliverables
- Thursday, 3/3:
    - Verify scans are adequate for OCR
    - Identify sentiment analysis software to use
        - A potential candidate is the Python package lingmotif
    - Make tutorial for class
        - Idea: how to use Crazy Frog to scan microfiche
    - Create function skeletons for emulator operations
        - Print (download), Rotate, Zoom In/Out
- Tuesday, 3/8:
    - Finish scanning and running OCR on all catalogs
    - Complete the Print, Rotate, and Zoom functions
    - Manually transcribe the introductions from Carleton, Howard, UCLA, and Windward
    - Test OCR using Tesseract, ABBYY, and OmniPage
- Thursday, 3/10:
    - Upload OCR analysis to the website
    - Upload transcriptions to the website
- End of term:
    - Complete final writeup of research
    - Add finishing touches to the UI
References:
- Microfiche: https://en.wikipedia.org/wiki/Microform
- Adobe: https://helpx.adobe.com/document-cloud/help/using-ocr-exportpdf.html
- Tesseract: https://programminghistorian.org/en/lessons/retired/OCR-with-Tesseract-and-ScanTailor
- ABBYY: https://www.abbyy.com/
- Omnipage: https://www.kofax.com/products/omnipage
- Forge.js: https://forgejs.org/
Team Fiche, this is a solid plan that shows adaptability and an inspiring interest in tackling the medium from multiple angles. I love that you’ll have an emulator alongside the remediated text and subsequent analysis to give readers a sense of the whole analog to digital process.
As you move to the analysis stage, try to nail down hypotheses for each step. What do you expect sentiment analysis of the different institutions’ introductions to show? Different departments? Be open to other forms of text analysis if sentiment falls short. Simple word clouds could be enlightening.