Stack Overflow Data Analysis Using Hadoop and Spark – HDFS, YARN, Spark, HBase, Python
Developed a big data project using HDFS, YARN, Spark, Python, and HBase to extract and analyze large XML datasets. Implemented pre-processing to extract a smaller subset of the data and convert it to Parquet format for efficient analysis. Deployed the project in three modes: standalone, core cluster, and one-click cluster.
Attendance System with Liveness Detection – Python, TensorFlow, Kubernetes
Developed an attendance system using Python and CNN models. Implemented liveness detection to confirm that a live person, not a photo or recording, is in front of the camera. Built a user-friendly interface with Flask and Tkinter. Used Docker to standardize the environment and ensure compatibility across platforms. Constructed a custom dataset of approximately 660 images of 220 students under varied lighting to train the CNN face recognition model.