Joël de Bruijn

  • 1 Post
  • 52 Comments
Joined 1 year ago
Cake day: June 23rd, 2023




  • I don’t know.

    • I don’t need formatting but it doesn’t get in the way either. So I am not bothered by it.
    • Also, PDF, and especially the PDF/A standard, is widely used for archiving and for complying with regulations on archival and preservation.
    • If you want plain text, the same tactic applies: just export in bulk to txt instead of PDF.

    My main point is: why would you want a mail-specific stack of hosting, storage, indexing and frontends? If it’s all plain text anyway, the regular storage solutions for files go a long way.

    There is an entire industry (which has its own disadvantages) built around getting communication artefacts out of those systems and putting them into document management systems or other forms of file-based archival.
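
    The bulk export to txt mentioned above could look roughly like this. It is only a sketch using Python's standard library, and it assumes the mail is available as an mbox file. The function names and the filename pattern (date, sender, subject, running number) are my own illustration, not a fixed convention.

```python
import mailbox
import re
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime
from pathlib import Path


def safe(text: str, limit: int = 80) -> str:
    """Replace characters that are awkward in filenames and trim the length."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]+', " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()[:limit] or "untitled"


def body_text(msg) -> str:
    """Return the first text/plain part of a message, or an empty string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the mail
                return payload.decode("utf-8", errors="replace")
    return ""


def export_mbox(mbox_path: str, out_dir: str) -> list[Path]:
    """Write every message in an mbox file to its own .txt file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, msg in enumerate(mailbox.mbox(mbox_path)):
        subject = str(make_header(decode_header(msg.get("Subject", "") or "")))
        sender = str(make_header(decode_header(msg.get("From", "") or "")))
        try:
            day = parsedate_to_datetime(msg["Date"]).strftime("%Y%m%d")
        except (TypeError, ValueError):
            day = "00000000"  # missing or unparseable Date header
        # Hypothetical naming pattern; the running number keeps names unique.
        name = f"{day} - {safe(sender, 40)} - {safe(subject)} - {index:06d}.txt"
        target = out / name
        header = f"From: {sender}\nDate: {msg['Date']}\nSubject: {subject}\n\n"
        target.write_text(header + body_text(msg), encoding="utf-8")
        written.append(target)
    return written
```

    Attachments and HTML-only mails are ignored here, so this only covers the "I just want the text" case.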


  • I had roughly the same goals (archive and search 2 decades of mail) but approached it completely differently: I export every mail to PDF with a strict naming convention.

    • Backend: No mailserver, just storage and backup for files.
    • Search: based on filenames, using FSearch and voidtools Everything. I could also add local indexing of the PDF content.
    • Frontend: a pdf viewer.
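
    To show why filename search is enough once the naming is strict, here is a minimal sketch in Python. It is not how FSearch or Everything work internally; `search_archive` and the example folder are hypothetical names for illustration.

```python
from pathlib import Path


def search_archive(root: str, *terms: str) -> list[Path]:
    """Return PDF files under root whose name contains every term.

    Matching is case-insensitive and looks at the filename only,
    never at the PDF content.
    """
    wanted = [t.lower() for t in terms]
    hits = [
        p for p in Path(root).rglob("*.pdf")
        if all(t in p.name.lower() for t in wanted)
    ]
    # Names that start with a date sort chronologically within a folder.
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical archive location.
    for path in search_archive("mail-archive", "bank", "invoice"):
        print(path)
```

    Backup and storage stay plain file operations, and any PDF viewer opens the result.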





  • Also, I’m very cautious about them for anything browsing-related. I discovered (as others had before me) that they let the search result pages inside a shop get indexed.

    Meaning I could go to Caterpillar, search for “Wabtec is better”, and then this search URL (with 0 products) would turn up in Google searches, and that URL persisted, text and all.

    Basically one could spray-paint and tag sites with this graffiti. Shop admins didn’t even have means to remove it.

    The problem was ignored and stayed this way for months.





  • Don’t know if it’s illegitimate otherwise 😉

    But my user story is like this:

    I want to preserve and archive information I used because it’s a reflection of the things I did, learned and studied throughout life.

    Then my use cases are:

    • Orientation about “events”: places to visit on day trips or holidays (museums, nature, parks, campsites), looking for practical information and background as well.
    • Gather a “dossier”: info to help make a decision (buying expensive things, how to do home improvement, etc.)
    • Building a personal knowledge database: interesting articles and blogs.

    My current workflow:

    • Browse
    • Bookmark extensively
    • Download pdf or other content (maps, routes, images) when provided.
    • Open bookmarks.
    • Fireshot every webpage to pdf and png
    • Save everything with a consistent filename (YYYYMMDD - Source - Title)

    I would like to automate the last 3 steps of my workflow.
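
    A rough idea of what automating those last 3 steps could look like, as a sketch and not a finished tool. It swaps Fireshot for a headless Chromium call, assumes the bookmarks are exported to the usual HTML bookmark file, and assumes a `chromium` binary on the PATH (the binary name differs per system). Using the site's hostname as "Source" is also my assumption. Note that the headless screenshot captures only the viewport, not the full page like Fireshot does.

```python
import re
import subprocess
from datetime import date
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from an exported HTML bookmark file."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lowercases tag and attribute names
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_stem(url, title, day=None):
    """Build 'YYYYMMDD - Source - Title' with filename-unsafe characters removed."""
    day = day or date.today()
    source = urlparse(url).netloc.removeprefix("www.")  # assumption: host = Source
    clean = re.sub(r'[\\/:*?"<>|\s]+', " ", title).strip()[:100] or "untitled"
    return f"{day:%Y%m%d} - {source} - {clean}"


def capture_commands(url, stem, browser="chromium"):
    """Return the two headless browser invocations (PDF and PNG) for one page."""
    return [
        [browser, "--headless", f"--print-to-pdf={stem}.pdf", url],
        [browser, "--headless", f"--screenshot={stem}.png", url],
    ]


def archive(bookmark_file, browser="chromium"):
    """Open every bookmark and save it as PDF and PNG in the current folder."""
    parser = BookmarkParser()
    parser.feed(Path(bookmark_file).read_text(encoding="utf-8"))
    for url, title in parser.bookmarks:
        for cmd in capture_commands(url, build_stem(url, title), browser):
            subprocess.run(cmd, check=True)
```

    Calling `archive("bookmarks.html")` would then walk the whole export. Pages behind a login or with cookie walls will not come out the same as in a normal browser session, so those would still need the manual route.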