Reference list of email processing resources; focus on analysis, PII handling, preservation, and access.
Apache Pony Mail: Apache Pony Mail is a web-based mail archive browser built to scale to millions of archived messages with hundreds of requests per second. It allows you to browse, search, and interact with mailing lists, including composing replies to mailing list threads.
https://ponymail.incubator.apache.org/
ePADD: ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
https://github.com/ePADD/epadd
Email4n6: A simple cross-platform forensic application for processing email files
https://github.com/Marten4n6/Email4n6
imapfw: imapfw is a simple and powerful framework to work with mails.
https://github.com/OfflineIMAP/imapfw
libpff: Library and tools to access the Personal Folder File (PFF) and the Offline Folder File (OFF) format
https://github.com/libyal/libpff
libpst: Library for reading Microsoft Outlook PST files
http://hg.five-ten-sg.com/libpst/
maildir2mbox.py: Convert maildirs (including subfolders) to mbox format
https://gist.github.com/nyergler/1709069
mbox: Package mbox parses the mbox file format into messages and formats messages into mbox files.
https://github.com/blabber/mbox
mstor: A JavaMail provider supporting the unofficial mbox mail storage format
https://github.com/benfortuna/mstor
Muse: Revive Precious Memories Using Email
https://mobisocial.stanford.edu/muse/
OfflineIMAP: Read/sync your IMAP mailboxes
https://github.com/OfflineIMAP/offlineimap
DArcMail: Digital Archiving of eMail
CERP descriptive link: https://siarchives.si.edu/what-we-do/digital-curation/email-preservation-cerp
Direct download link: https://siarchives.si.edu/sites/default/files/DArcMail/DArcMail-v1.2-2018-03-07.zip
TOMES: Transforming Online Mail with Embedded Semantics
https://github.com/StateArchivesOfNorthCarolina?utf8=%E2%9C%93&q=tomes&type=public&language=
Avocado Research Email Collection
https://catalog.ldc.upenn.edu/LDC2015T03
https://github.com/ic4f/pluto
PST Indexer using libpff (simple example from Learning Python for Forensics)
https://github.com/PacktPublishing/Learning-Python-for-Forensics/blob/master/Chapter%2010/pst_indexer.py
Forensic Email Visualization
https://www.cs1.tf.fau.de/research/archive/forensic-email-visualization/
Sotera Newman: Email analysis and visualization
https://github.com/Sotera/newman
Node.js PST tool
https://github.com/epfromer/pst-extractor
Apache Software Foundation Public Mail Archives
https://aws.amazon.com/datasets/apache-software-foundation-public-mail-archives/
Email Research Data Sets
https://sites.google.com/site/emailresearchorg/datasets
Enron / CALO project
https://www.cs.cmu.edu/~./enron/
Enron / Nuix set v1.3
http://info.nuix.com/EnronDownload2013.html
Jeb Bush's Gubernatorial Email Archive
https://ab21www.s3.amazonaws.com/JebBushEmails-Text.7z
Labeled training and test data for email intent machine learning (for sentence-level speech acts)
https://github.com/ParakweetLabs/EmailIntentDataSet
[MS-PST]: Outlook Personal Folders (.pst) File Format
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst
[pstviewtool]: Microsoft's open source tool for viewing PST structure
https://archive.codeplex.com/?p=pstviewtool
[pstsdk]: Microsoft's cross-platform, header-only C++ library for reading PST files
https://archive.codeplex.com/?p=pstsdk
How MAPI tables work
http://www.dimastr.com/redemption/mapitable.htm
Digital Preservation Coalition (Portal link for articles on email)
https://www.dpconline.org/knowledge-base/preservation-lifecycle/email
Strategies for Preserving Institutional and Researcher Email
https://www.cni.org/wp-content/uploads/2018/09/CNI-email-preservation-ERreport-Spring18.pdf
The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives
https://www.clir.org/pubs/reports/pub175/
Office 365: PII Guidelines ("What the sensitive information types look for")
https://docs.microsoft.com/en-us/office365/securitycompliance/what-the-sensitive-information-types-look-for
Office 365: Overview of retention policies
https://docs.microsoft.com/en-us/office365/securitycompliance/retention-policies
DArcMail Users Guide
https://siarchives.si.edu/sites/default/files/forum-pdfs/SIA_DArcMail_UsersGuide.pdf
A Forensic Email Analysis Tool Using Dynamic Visualization
https://commons.erau.edu/jdfsl/vol12/iss1/6/
A Comprehensive Gold Standard for the Enron Organizational Hierarchy
http://www.aclweb.org/anthology/P12-2032
Machine Learning for email insight
https://towardsdatascience.com/how-i-used-machine-learning-to-classify-emails-and-turn-them-into-insights-efed37c1e66
Network Analysis with the Enron Email Corpus
https://arxiv.org/pdf/1410.2759.pdf
Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora
https://pdfs.semanticscholar.org/d103/24c0a31845cb29e6d0157b60fb1130f89624.pdf
A Content-Based Approach to Email Triage Action Prediction: Exploration and Evaluation
https://www.groundai.com/project/a-content-based-approach-to-email-triage-action-prediction-exploration-and-evaluation/