- In-depth understanding of Hadoop and Spark architecture and RDD transformations.
- Proven experience developing solutions with Spark and PySpark for data engineering pipelines, transforming and aggregating data from a variety of sources into a data lake.
- At least 3 years of relevant experience developing PySpark programs using its APIs.
- Expertise with different file formats such as Parquet and ORC.
- Experience troubleshooting and fine-tuning Spark and Python-based applications for scalability and performance.
- Experience designing Hive tables to handle high-velocity, high-variety, and high-volume data.
- Experience ingesting, processing, and analyzing data from disparate sources using Spark SQL.
- Knowledge of spark-submit and the Spark UI.
- Experience creating Spark RDDs and performing operations on them (see the sketch after this list).
- Experience creating Spark DataFrames from RDDs, Hive tables, and Parquet files, and performing joins and aggregations on those DataFrames.
- Experience processing data using Python and other API modules.
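
As a minimal sketch of the RDD and DataFrame work described in the list above, the PySpark snippet below creates an RDD, applies transformations, builds DataFrames from the RDD, a Hive table, and Parquet files, and then performs a join and an aggregation. All paths, table names, and column names (`events.csv`, `/data/lake/orders`, `warehouse.customers`, `customer_id`, `order_amount`) are hypothetical placeholders, not part of any specific project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark session with Hive support so spark.table() can read Hive tables
spark = (SparkSession.builder
         .appName("rdd-dataframe-sketch")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

# RDD: create from a text file (hypothetical path) and apply transformations
raw = sc.textFile("events.csv")
pairs = raw.map(lambda line: line.split(",")).map(lambda fields: (fields[0], 1))
event_counts = pairs.reduceByKey(lambda a, b: a + b)

# DataFrame from the RDD
event_counts_df = event_counts.toDF(["customer_id", "event_count"])

# DataFrames from Parquet files and a Hive table (hypothetical locations)
orders_df = spark.read.parquet("/data/lake/orders")
customers_df = spark.table("warehouse.customers")

# Join and aggregate
result = (orders_df
          .join(customers_df, on="customer_id", how="inner")
          .groupBy("customer_id")
          .agg(F.sum("order_amount").alias("total_amount"),
               F.count("*").alias("order_count")))

# Write back to the data lake as Parquet
result.write.mode("overwrite").parquet("/data/lake/customer_totals")
```

A job of this kind would typically be packaged and launched with spark-submit, with progress and stage-level metrics inspected in the Spark UI.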
Hadoop, Spark architecture, Python, Spark RDD