I'm a researcher in the social sciences, working on a project that requires me to scrape a large amount of text and then use NLP to compute metrics like sentiment, LSM (language style matching) compatibility, and other linguistic measures for different subsections of that content.
The issue: after weeks of work, I've scraped all this data (a few GBs' worth) and begun analyzing it with a mix of Node, Python, and bash scripts. To generate all the necessary combinations of groups (Groups A, B, and C together, A & C, A & B, etc.), I've produced an unwieldy number of text files (the script wrote > 50 GB before filling up my pitiful MBP's hard drive), which is clearly not sustainable.
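For context, the blowup is combinatorial. A simplified sketch of what my scripts are effectively doing (group labels here are placeholders):

```python
from itertools import combinations

groups = ["A", "B", "C"]  # placeholder group labels

# Every non-empty subset of groups I want to analyze together
subsets = [
    combo
    for r in range(1, len(groups) + 1)
    for combo in combinations(groups, r)
]
print(subsets)
# 7 subsets for 3 groups; 2**n - 1 in general, which is why
# materializing each subset as its own set of text files eats disk fast
```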
The easiest way forward is loading this all into a database I can query to analyze different permutations of populations. I don't have much experience with SQL, but it seems to fit here.
So how do I get all these .txt files into a SQL or NoSQL database? Are there any tools I could use to visualize the data (IntelliJ, my editor, keeps breaking)? And where should I do all this work? I'm now thinking either an external hard drive or a VPS I can tunnel into.
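For concreteness, here's the rough setup I'm imagining, in case it helps frame answers. This is a sketch only: SQLite via Python's stdlib, with a made-up `data/<group>/*.txt` directory layout and a hypothetical `documents` table.

```python
import sqlite3
from pathlib import Path

conn = sqlite3.connect("corpus.db")  # hypothetical database file
conn.execute(
    "CREATE TABLE IF NOT EXISTS documents ("
    "  id INTEGER PRIMARY KEY,"
    "  grp TEXT,"   # which population the text came from
    "  body TEXT"
    ")"
)

# Load every .txt file from a per-group layout like data/A/*.txt (made up)
for path in Path("data").glob("*/*.txt"):
    conn.execute(
        "INSERT INTO documents (grp, body) VALUES (?, ?)",
        (path.parent.name, path.read_text(errors="replace")),
    )
conn.commit()

# A group "combination" then becomes a query instead of 50 GB of files
rows = conn.execute(
    "SELECT body FROM documents WHERE grp IN ('A', 'C')"
).fetchall()
```

The appeal, as I understand it: the raw text is stored once, and each population subset is a WHERE clause rather than a copy on disk.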
Thanks in advance for your advice HN!