• Thoughtwave Software & Solutions
  • 60 Lispenard St, New York, NY 10013, USA
  • Nov 09, 2020
[Information Technology]

Job Description

Location: Fully Remote
Contract: 8+ months


Spark Streaming: Will be using Spark specifically for streaming, with data sent in batches. Will be building a Spark framework for internal Spark customers so they don't need to write code.

Kafka: Will work in conjunction with Spark.

Java: Will be coding within Spark; ideally also have Scala. Will be working through algorithms, data structures, etc.

NoSQL: Will be pushing Spark/Kafka streams and batches into NoSQL-based databases.
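The flow described above (Kafka-fed micro-batches transformed by Spark and written to a NoSQL store) can be sketched, without any Spark or Kafka dependencies, as a generic pluggable-stage pipeline. This is a hypothetical, stdlib-only illustration of the "framework so customers don't write code" idea — the names `Stage`, `runBatch`, and `MicroBatchPipeline` are illustrative, not part of any real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a reusable micro-batch pipeline: each Stage is a
// pluggable transformation, so internal "customers" could compose pipelines
// from configuration instead of writing Spark code themselves.
public class MicroBatchPipeline {

    // One pluggable transformation step (stand-in for a Spark map stage).
    public interface Stage<I, O> {
        O apply(I record);
    }

    // Run a single micro-batch through one stage, collecting the results
    // (stand-in for writing the transformed batch to a NoSQL sink).
    public static <I, O> List<O> runBatch(List<I> batch, Stage<I, O> stage) {
        List<O> out = new ArrayList<>();
        for (I record : batch) {
            out.add(stage.apply(record));
        }
        return out;
    }

    public static void main(String[] args) {
        // Example micro-batch: double each value, as a trivial transformation.
        List<Integer> result = runBatch(List.of(1, 2, 3), x -> x * 2);
        System.out.println(result); // prints [2, 4, 6]
    }
}
```

In a real deployment the `Stage` implementations would be Spark transformations selected by configuration, and the collected output would be a write to the NoSQL sink rather than an in-memory list.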

Nice to Have:


Cloud: Any cloud platform is fine; candidates can even come from an on-prem background.

DevOps Skills


Primary Responsibilities:

Design and build large-scale data processing systems (real-time and batch) to address the growing AI/ML and data needs of a Fortune 500 company

Build a product to process large amounts of data/events for AI/ML and data consumption

Automate test coverage (90%+) for data pipelines; establish best practices and frameworks for unit, functional, and integration tests.

Automate CI and deployment processes, and establish best practices for the production data pipelines.

Build an AI/ML-model-based alerting mechanism and anomaly detection system for the product. The goal is to have a self-healing product.
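The alerting responsibility above can be illustrated with the simplest possible anomaly detector such a system might start from: flag values more than k standard deviations from the mean of a reference window. This is a hedged, stdlib-only sketch — `ZScoreDetector` and `isAnomaly` are hypothetical names, and a production system would use far richer models.

```java
import java.util.List;

// Hypothetical sketch: a minimal z-score anomaly detector, the kind of
// baseline a pipeline alerting mechanism might begin with before moving
// to learned models (all names here are illustrative).
public class ZScoreDetector {

    // True if `value` lies more than `k` standard deviations from the
    // mean of the reference window of recent observations.
    public static boolean isAnomaly(List<Double> window, double value, double k) {
        double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double variance = window.stream()
                .mapToDouble(x -> (x - mean) * (x - mean))
                .average().orElse(0.0);
        double std = Math.sqrt(variance);
        // Guard against a flat window where the deviation is zero.
        if (std == 0.0) return value != mean;
        return Math.abs(value - mean) / std > k;
    }

    public static void main(String[] args) {
        List<Double> window = List.of(10.0, 11.0, 9.0, 10.5, 10.2);
        System.out.println(isAnomaly(window, 10.4, 3.0)); // a normal reading
        System.out.println(isAnomaly(window, 50.0, 3.0)); // a clear outlier
    }
}
```

In practice the detector's decision would feed the alerting mechanism, and the "self-healing" goal means alerts would trigger automated remediation rather than only paging a human.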

Required Skills/Experience

10+ years of overall experience in software development, with 5 or more years of relevant experience designing, developing, deploying, and operating large-scale data processing pipelines.

3 or more years of experience with Apache Spark for streaming and batch processing

Good knowledge of Apache Kafka

Strong background in programming (Scala/Java)

Experience building reusable data frameworks/modules

Experience with the Airflow scheduler

Experience with Containers, Kubernetes and scaling elastically
Strong background in algorithms and data structures
Strong analytical and problem solving skills
Strong bent towards engineering solutions which increase productivity of data consumers
Strong bent toward completely automated code deployment/testing (DevOps, CI/CD)

Passion for data engineering and for enabling others by making their data easier to access.

Some experience working with and operating workflow or orchestration frameworks, including open-source tools like Activiti, Spring Boot, Airflow, and Luigi, or commercial enterprise tools.

Excellent communication skills (writing, conversation, presentation); a consensus builder

Demonstrated ability to tackle tough coding challenges independently and work closely with others on a highly productive coding team

Must-have skills: Apache Spark Streaming, Apache Kafka, Scala/Java, NoSQL databases, Elasticsearch & Kibana, Kubernetes, Docker containers

Nice to have: knowledge of API development, Apache Flink experience, cloud experience, DevOps skills, experience with any other streaming technologies/tools

Thanks & Regards,
Mohan Sai|Technical Recruiter
Thoughtwave Software and Solutions
1444 N, Farnsworth Ave Suite 302, Aurora, IL , 60505