Job posting has expired

Big Data SRE - Observability and Reliability Engineering

Apple, Inc.
United States, Texas, Austin
July 01, 2022
Summary
Posted: Apr 6, 2022
Weekly Hours: 40
Role Number: 200361978

At Apple, great ideas have a way of quickly becoming great products, services, and customer experiences. Bring passion and dedication to your job and there's no telling what you could accomplish here. Do you want to be part of a team that builds cutting-edge software services, a team that is continually innovating and is proud of making a difference? If so, bring your passion and talent and join us to be part of something big and amazing. Apple's AML (Applied Machine Learning) team is looking for highly motivated and talented Big Data Site Reliability Engineers (SREs) to build and operate the next generation of platforms, frameworks, and software services that power several mission-critical applications.

Key Qualifications
- Hadoop SRE experience: managing HDFS, YARN, and Hive.
- Experience managing Kafka-based data pipelines, Spark jobs, Airflow DAGs, and Jupyter notebooks.
- Experience managing and deploying applications on thousands of nodes across multiple data centers using a configuration management tool (such as SaltStack or Ansible).
- Expert-level, hands-on experience with one of the tooling languages: Python, Golang, Java, or other JVM languages.
- Excellent troubleshooting, problem-solving, critical-thinking, and communication skills.
- Good understanding of Unix/Linux-based operating systems.
- Proficiency in Linux, command-line tools, and general system debugging.

Description
We work on Apple-scale opportunities and challenges. We are engineers at heart. We like solving technical problems. We believe that a good SRE must be a good software engineer who can code anything that has logic and a pattern to it. We believe a good engineer has the curiosity to dig into the inner workings of technology and is always experimenting, reading, and learning.
If you are a software engineer with a passion for coding and for digging deeper into any technology, who loves knowing the internals and is fascinated by distributed systems architecture, we want to hear from you. The person should be able to deftly handle multiple competing priorities and deliver solutions in a timely manner. The person will participate in a 12x7 on-call rotation and resolve production incidents in a timely manner. The person should be able to understand complex architectures and be comfortable working with multiple teams.

Education & Experience
BS in computer science or a related degree with 7+ years of experience, or MS with 5+ years of experience, or related experience.

Additional Requirements
- Experience with Kubernetes or another container orchestration framework.
- Experience managing Hadoop clusters with hundreds of terabytes of data.
- Experience in capacity management on multi-tenant Hadoop clusters.