Hi, I requested my data from Reddit and got 16 MB of CSVs… which is a considerable amount. Does anyone know of any tool to process / visualize / search the data? I assume the format is the same for everyone, so maybe someone has already built something like that.
EDIT: the problem is not performance; with files under 5 MB I can search with Notepad++ in milliseconds. What I’m looking for is a user-friendly interface (ideally with thumbnail images, links and such).
The problem with searching for “reddit export data visualizer” is that Google returns Reddit posts about visualizing generic data.
Thanks.
Firstly, can you even open the CSVs? If you can, then Power BI Desktop by Microsoft is the emerging go-to for data visualisation.
Yes, no problem reading the CSVs, sorry if that wasn’t clear.
I was looking for something more specific. Ideally something like a local web app that renders the posts, comments, etc. in a web page, with thumbnails and links to the Reddit elements.
But that’s probably asking too much :).
Thanks for the suggestion!
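For what it's worth, the "render the CSV rows as a web page with links" idea can be roughed out in a few lines of Python. This is only a minimal sketch, not an existing tool: the column names `permalink`, `body` and `title` are assumptions for illustration and would need to be adjusted to whatever headers the export files actually use, and thumbnails are left out.

```python
import csv
import html


def render_rows(lines, link_col="permalink", body_col="body", title_col="title"):
    """Render CSV rows as a simple HTML list.

    `lines` is any iterable of text lines (an open file works).
    The column names are hypothetical; change them to match the real headers.
    """
    items = []
    for row in csv.DictReader(lines):
        # Escape everything, since post text may contain < > & and quotes.
        link = html.escape(row.get(link_col) or "", quote=True)
        text = html.escape(row.get(body_col) or row.get(title_col) or "(empty)")
        if link:
            items.append(f'<li><a href="{link}">{text}</a></li>')
        else:
            items.append(f"<li>{text}</li>")
    return "<!doctype html>\n<ul>\n" + "\n".join(items) + "\n</ul>\n"


# Example use (file names are placeholders):
# with open("comments.csv", newline="", encoding="utf-8") as src:
#     page = render_rows(src)
# with open("comments.html", "w", encoding="utf-8") as dst:
#     dst.write(page)
```

Opening the resulting HTML file in a browser gives a clickable list; anything fancier (thumbnails, search, threading) would have to be built on top of that.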
If you find one, let me know pretty please…
I found a UI for my Hangouts data a while back, and I skim through those old chats once in a while. It’s nice to have a tool that visualises data request files in a user-friendly way.
I’m searching GitHub for the different CSV filenames, and I found a couple of projects that may be relevant:
humandataincome/hudi-packages-connectors: hudi-packages-connectors is a library that provides a toolset to parse and extract relevant information from the personal data sources provided by major websites or social networks.
d4data-official/archive-lib: Standardize GDPR data archives from various providers.
EDIT: This one also looks interesting:
I’m still trying to figure out how to use them.
Those first two look interesting - thanks!
Power Query can search line by line through a file much bigger than your RAM, without loading the whole thing.
The links you posted are weird:
https://pixeldrain.com/u/KfgV7bqn: It offers to download a file with the name “Antimutt in r-Excel ultra.paq8o”, which I have no idea what it is for.
https://the-eye.eu/redarcs: It says “This Reddit Community Has Been Archived”
The first is the result of extracting all lines with my nick in them from the CSV, stored with the best compression around. The second is where to get the CSV - and a lot of communities have been archived there, like it says.
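The same kind of extraction (pull out every line that mentions a nick, without loading the whole file into memory) can be sketched in Python as well. This is an illustration under assumptions, not the Power Query approach itself: it matches on raw text lines, so a CSV field containing an embedded newline would be treated as separate lines, and the file names in the usage comment are placeholders.

```python
def matching_lines(lines, needle):
    """Yield each line that contains `needle`, ignoring case.

    `lines` is any iterable of text lines. Passing an open file keeps
    memory use low, because the file is read one line at a time.
    """
    needle = needle.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


# Example use (placeholder file names):
# with open("big_dump.csv", encoding="utf-8", errors="replace") as src, \
#         open("my_lines.csv", "w", encoding="utf-8") as dst:
#     for hit in matching_lines(src, "Antimutt"):
#         dst.write(hit + "\n")
```

Because it is a generator, only one line is held in memory at a time, so file size is limited by disk rather than RAM.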
Just to confirm I understand: you are talking about Power Query vs. Power BI for dealing with huge datasets, right?
Because in my case, with 16 MB, I don’t see the need for anything especially powerful. My problem is not performance, but convenience.
Thanks for the input.