Systems Engineer, Site Reliability Engineering Job Listing at Google in Los Angeles, CA (Job ID googleus-46089)

Google

Location: Los Angeles, CA
Posted: 06/18/2013
Refreshed: 06/18/2013
Application deadline: None
Type: Not specified
Career Level: Not specified
Salary Range: Not specified
Number of Jobs: 1
Relocation Available: No
Industries
Media / Publishing, Business Services
Description
As a Systems Engineer working on Google's critical production applications and
infrastructure, your mission will be to ensure Google is always fast,
available, scalable and engineered to withstand unparalleled demand. You will
design and develop the systems which run Google Search, Gmail, YouTube, Maps,
Docs, Ads, Blogger, AppEngine, Google+ and more. You'll own the production
services which comprise *.google.com, as well as key infrastructure like GFS,
BigTable, MapReduce, Chubby and large-scale 'cloud computing' clusters.

You will also be driving performance and reliability from software and
infrastructure at massive scale, where even the 0.01% case must be considered.
You will encounter challenging, novel situations every day, and work with just
about every other engineering and operations team at Google. Fellow engineers
will look to you as an expert and advocate on design and reliability
trade-offs in running large-scale services and in engineering complex systems
that fail gracefully and transparently to users.

The most successful candidates for this role will have strong analytical and
troubleshooting skills; fluency in coding, algorithms, and systems design;
solid communication skills; and a desire to solve complex problems of scale
that are unique to Google. We are particularly interested in software
engineers, systems administrators, and UNIX programmers familiar with aspects
of running web services at scale. Depth in networking technologies and
UNIX/Linux internals is a strong plus.

Responsibilities

* Manage availability, latency, scalability and efficiency of Google services by engineering reliability into software and systems
* Respond to and resolve emergent service problems; build tools and automation to prevent problem recurrence
* Review and influence new and evolving design, architecture, standards, and methods for operating services and systems
* Participate in software and system performance analysis and tuning, service capacity planning and demand forecasting
* Perform periodic on-call duty as part of a global team

Minimum qualifications

* BA/BS degree in Computer Science or a related field (or, in lieu of a degree, 4 years of relevant work experience)
* 3 years of relevant work experience, including with UNIX/Linux systems requiring the use of languages like Python, C, C++, Java, Perl, Shell or PHP
* Technical troubleshooting and performance tuning experience

Preferred qualifications

* 6 years of relevant work experience, including in a high-volume or critical production service environment as well as experience leading short projects involving outside teams
* Experience coordinating or leading small cross-team technical projects
* Experience in OSes and systems (e.g. UNIX internals, device drivers, FreeBSD), open source tools (e.g. dtrace, ktrace), web service components (e.g. load balancing, LAMP stack), storage and clustering (e.g. column stores, Hadoop), scripting and programming languages (e.g. Erlang, Haskell, Scala or Scheme)
* Strong written and spoken English language skills
