ROLE: BIG DATA ENGINEER
Location: Fully Remote
Contract: 8+ months
NOTE: W2 ONLY
Spark Streaming: will use Spark specifically for streaming, with data sent in batches. Will be building a Spark framework for internal Spark customers so they don't need to write code.
Kafka: will work in conjunction with Spark.
Java: will be coding within Spark; ideally also have Scala. Will be working through algorithms, data structures, etc.
NoSQL: will be pushing Spark/Kafka streams and batches into NoSQL-based databases.
Nice to Have:
Cloud: any cloud platform is fine; candidates can even come from an on-prem background.
Responsibilities:
Design and build large-scale data processing systems (real-time and batch) to address the growing AI/ML and data needs of a Fortune 500 company.
Build a product to process large amounts of data/events for AI/ML and data consumption.
Automate test coverage (90%+) for data pipelines, applying best practices and frameworks for unit, functional, and integration tests.
Automate CI and deployment processes and best practices for the production data pipelines.
Build an AI/ML model-based alerting mechanism and anomaly detection system for the product; the goal is to have a self-healing product.
Requirements:
10+ years of overall experience in software development, with 5 or more years of relevant experience designing, developing, deploying, and operating large data processing pipelines at scale.
3 or more years of experience with Apache Spark for streaming and batch processing.
Good knowledge of Apache Kafka.
Strong background in programming (Scala/Java)
Experience building reusable data frameworks/modules
Experience with the Airflow scheduler
Experience with containers, Kubernetes, and elastic scaling
Strong background in algorithms and data structures
Strong analytical and problem solving skills
Strong bent toward engineering solutions that increase the productivity of data consumers
Strong bent toward completely automated code deployment/testing (DevOps, CI/CD)
Passion for data engineering and for enabling others by making their data easier to access.
Some experience working with and operating workflow or orchestration frameworks, including open-source tools such as Activiti, Spring Boot, Airflow, and Luigi, or commercial enterprise tools.
Excellent communication skills (writing, conversation, presentation); a consensus builder
Demonstrated ability to tackle tough coding challenges independently and work closely with others on a highly productive coding team
Must-have skills: Apache Spark Streaming, Apache Kafka, Scala/Java, NoSQL databases, Elasticsearch & Kibana, Kubernetes, Docker containers
Nice to have: Knowledge of API Development, Apache Flink experience, Cloud experience, DevOps skills, Any other streaming technologies/tools experience
Thanks & Regards,
Mohan Sai | Technical Recruiter
Thoughtwave Software and Solutions
1444 N. Farnsworth Ave, Suite 302, Aurora, IL 60505