Principal / Sr. Staff HA Software Engineer
San Francisco Bay Area (South Bay), USA
Type: Permanent or Contract
Our company is a leading global information and communications technology (ICT) solutions provider. Through our dedication to customer-centric innovation and strong partnerships, we have established end-to-end advantages in telecom networks, devices and cloud computing. We are committed to creating maximum value for telecom operators, enterprises and consumers by providing competitive solutions and services. Our products and solutions have been deployed in over 140 countries, serving more than one third of the world’s population.
As the leading global supplier of telecom equipment solutions, we work with our partners to provide end-to-end cloud computing solution capabilities, with the goal of fulfilling the cloud computing needs of our users in the IT and CT sectors. Our cloud computing products and solutions fall into two types: cloud data center solutions and cloud application solutions. Our enterprise products and solutions are a new business area that includes IT, CT and related solutions for industries.
Our R&D Common Competence Center is responsible for improving company-level product design competence and enabling more efficient R&D product development. Its scope includes architecture and system design, design for X, UCD, CPI, the product project model, and design, development and testing methods and tools.
The Principal HA (High Availability) engineer acts as technical leader of the local HA Competence Center in South Bay, which is the central technical competence center for all our business groups. His/her main responsibilities are HA technology planning, HA best-practice studies, HA technical research and development, and HA system design and prototype development. The role is also involved in HA design for our mainstream products, including cloud storage, servers, IP products and Service Provider products. The main critical HA technology currently under development is proactive self-healing for software.
· Build HA architecture for our IT domain platform and various solutions
· Perform research and design on proactive system failure analysis and system self-healing. Key areas may include:
o Predictive fault monitoring methodology and integrated tools
o HA-related performance index monitoring methodology and integrated tools
o Root cause tracing tools and methodology
o Proactive fault location and fault recovery methodology, which may include fine-grained recovery for the OS, middleware, applications and OSS
o Disaster recovery, fault tolerance and service load balancing
· Guide the design team on availability prototype implementation and collaborate with the test team on availability prototype verification.
· Guide the design team in deploying HA solutions in products and solutions.
· Propose HA strategy and roadmap planning for our platform and various products based on advanced technology research, industry benchmarks and customer analysis.
· Participate in related international standards organizations and forums.
• Over 8 years of software development experience in the ICT (information and communications technology) industry
• Large-system or subsystem-level software architecture design experience
• Hands-on experience with HA (High Availability), Self Healing, PFA (Proactive Fault Avoidance) or FT (Fault Tolerance)
• Strong programming skills in C/C++
• Knowledge of and experience with Linux, Unix, or OpenSolaris is strongly desirable
• Solid understanding of and experience with distributed system design is preferred
• Solid understanding of and experience with virtualization design is preferred
• Excellent communication and interpersonal skills, as well as the ability to work within a multicultural team