ISSN : 2583-2646

Mining Reddit for Market Moves: NLP-Driven Stock Prediction with ML and Deep Learning

ESP Journal of Engineering & Technology Advancements
© 2025 by ESP JETA
Volume 5, Issue 2
Year of Publication : 2025
Authors : Manan Buddhadev, Virtee Parekh
DOI : 10.56472/25832646/JETA-V5I2P109

Citation:

Manan Buddhadev, Virtee Parekh, 2025. "Mining Reddit for Market Moves: NLP-Driven Stock Prediction with ML and Deep Learning", ESP Journal of Engineering & Technology Advancements, 5(2): 77-89.

Abstract:

For many years, attempts to forecast stock market behavior have fascinated scholars and analysts. Consistent accuracy has remained elusive, however, because the many determinants of price movements are interconnected and intricate. These determinants influence one another in a tangled web of effects, which complicates predictive modeling. In the internet age, though, the sheer quantity of data available online has opened fresh avenues for study. This data includes opinions from leading specialists, reputable news organizations, and investing blogs. In addition, social networking sites are places where individuals freely share their thoughts and feelings about market trends. Together, these sources form an immense reservoir of potentially useful unstructured data that can enhance traditional predictive models and inspire new methods for predicting market trends.

Keywords:

Deep Learning; Machine Learning; Natural Language Processing; Stock Markets.