| ESP Journal of Engineering & Technology Advancements |
| © 2025 by ESP JETA |
| Volume 5 Issue 4 |
| Year of Publication : 2025 |
| Authors : Vamshi Krishna Pamula |
| DOI : 10.56472/25832646/JETA-V5I4P104 |
Vamshi Krishna Pamula, 2025. "Building a Real-time Data Ingestion Platform for Web Log Analytics using GCP Pub/Sub and Dataflow", ESP Journal of Engineering & Technology Advancements 5(4): 19-22.
This paper proposes a scalable, low-latency, fault-tolerant architecture for real-time web log analytics built on the native stream processing services of Google Cloud Platform. The main contribution is an end-to-end system design that uses Pub/Sub for high-volume ingestion and a custom Dataflow (Apache Beam) pipeline to process high-throughput unstructured log streams, together with details of custom parsing, real-time enrichment via the Beam Enrichment transform, and event-time-based aggregation. Most importantly, the paper presents an empirical analysis of the performance trade-offs between exactly-once and at-least-once stream processing semantics, aimed at optimizing both latency and operating cost; the optimized setting reduces latency in demanding web analytics workloads by a large factor. In its current version, the system writes its output to BigQuery, where partitioning and clustering keep the data available for direct querying at minimal cost.
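The parsing and event-time aggregation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the paper's Dataflow pipeline: the Common Log Format pattern, the 60-second fixed window, and the helper names `parse_line`, `window_start`, and `count_by_window` are all assumptions made for illustration. The point it shows is that records are grouped by the timestamp embedded in the log line (event time) rather than by arrival order, and that malformed lines are set aside instead of failing the run.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: Common Log Format. The paper does not specify its log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one raw log line; return None for malformed input."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event_time = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": match.group("ip"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "event_ts": event_time.timestamp(),  # seconds since the Unix epoch
    }


def window_start(event_ts, size_s=60):
    """Start (epoch seconds) of the fixed window containing event_ts."""
    return int(event_ts // size_s) * size_s


def count_by_window(lines, size_s=60):
    """Count hits per (window start, path) using event time, not arrival order."""
    counts = Counter()
    rejected = []  # malformed lines, kept for separate inspection
    for line in lines:
        record = parse_line(line)
        if record is None:
            rejected.append(line)
            continue
        counts[(window_start(record["event_ts"], size_s), record["path"])] += 1
    return counts, rejected


# Hypothetical sample input: the third line arrives after the second
# but carries an earlier event timestamp.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2025:13:56:02 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2025:13:55:59 +0000] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]
window_counts, rejected_lines = count_by_window(sample_lines)
```

In a streaming runner, the same grouping would typically be expressed with fixed windows over element timestamps, with watermarks and allowed lateness deciding how long a window waits for late records; this sketch sidesteps that by processing a finite list.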
Cloud Computing, Stream Processing, Apache Beam, Dataflow, Web Log Analytics, Low Latency, Event Time Semantics.