Uber's open source technology stack
Uber, a taxi aggregation platform, operates around the globe. It started as a monolithic application and was later re-architected into a set of microservices, which gives it scalability. Uber uses a lot of open source tools and has contributed quite a few projects back to the community. This article analyzes Uber's open source technology stack.
Programming Language:
Uber initially started with Python and later switched to Node.js. They also use Java, Go, and C++. Java and Go help build scalable, high-performance, concurrent applications.
Database:
Riak and Cassandra are used to meet high-availability and low-latency requirements.
Schemaless, an in-house product built on top of MySQL, is used to store trips. Schemaless is an append-only, sparse, three-dimensional persistent hash map, very similar to Google's Bigtable. The smallest data entity in Schemaless is called a cell and is immutable; once written, it cannot be overwritten or deleted. A cell is a JSON blob referenced by a row key, a column name, and a reference key called the ref key. The row key is a UUID, the column name is a string, and the ref key is an integer. Schemaless is not open source.
JanusGraph - Scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster.
LevelDB - Fast key-value storage library.
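The Schemaless cell model described above (row key, column name, ref key, immutable JSON cell) can be illustrated with a small in-memory sketch. This is not Uber's implementation, which is not open source; the class name CellStore, its method names, and the rule that the highest ref key counts as the latest cell are all assumptions made for illustration.

```python
import json
import uuid


class CellStore:
    """Toy in-memory, append-only store of immutable cells (illustrative only)."""

    def __init__(self):
        # Maps (row_key, column_name) -> {ref_key: JSON string}.
        self._cells = {}

    def put(self, row_key: uuid.UUID, column: str, ref_key: int, body: dict) -> None:
        """Append a new cell; an existing cell can never be overwritten."""
        versions = self._cells.setdefault((row_key, column), {})
        if ref_key in versions:
            raise ValueError("cell already exists and is immutable")
        versions[ref_key] = json.dumps(body)

    def get(self, row_key: uuid.UUID, column: str, ref_key: int):
        """Return the cell body for an exact (row, column, ref key), or None."""
        blob = self._cells.get((row_key, column), {}).get(ref_key)
        return None if blob is None else json.loads(blob)

    def get_latest(self, row_key: uuid.UUID, column: str):
        """Return the cell with the highest ref key (assumed to be the latest)."""
        versions = self._cells.get((row_key, column))
        if not versions:
            return None
        return json.loads(versions[max(versions)])


# Hypothetical usage: a trip's state is updated by appending a new cell.
store = CellStore()
trip_id = uuid.uuid4()
store.put(trip_id, "BASE", 1, {"status": "requested"})
store.put(trip_id, "BASE", 2, {"status": "completed"})
print(store.get_latest(trip_id, "BASE"))
```

Because cells are never modified in place, an "update" is simply a new cell with a higher ref key, and older versions remain readable through get.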
Caching:
Redis is used for caching.
Twemproxy is used as a proxy for Redis, primarily to reduce the number of connections to the caching servers on the backend.
Load balancer, Service routing and discovery
With many microservices, the services need to communicate with each other, be discoverable, and stay highly available. A single API call may internally call several other services, so coordination and high availability are required.
Nginx - Web server that also proxies requests to backend servers.
HAProxy is used to provide high availability.
TChannel - Network multiplexing and framing protocol for RPC, with clients in various programming languages.
Ringpop - Builds cooperation and coordination between applications. It provides resilient, fault-tolerant, client-agnostic sharding.
Thrift and Protobuf are used as interface definition languages (IDLs) and help serialize data between RPC clients and servers.
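The sharding that Ringpop provides is based on a consistent hash ring: each key is mapped to the node that owns the next point on the ring. The sketch below shows the general idea only. It does not use Ringpop's API, membership protocol, or hash function; the HashRing class, the SHA-256 hash, the replica count, and the node addresses are assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # SHA-256 is a stand-in; any well-distributed hash would do.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        """Place several virtual points for the node on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Drop all of the node's points; other nodes' points are untouched."""
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


# Hypothetical usage: route a request key to one of three application nodes.
ring = HashRing(["10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"])
print(ring.lookup("rider-42"))
```

The point of the ring is that when a node leaves, only the keys it owned are reassigned; keys owned by the remaining nodes keep their owner.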
Development:
Phabricator powers a lot of internal operations, from code review to documentation to process automation.
GitHub is used for issue tracking and code review for open source projects.
Jenkins handles continuous integration.
Puppet manages system configuration.
React is used to build user interfaces for data visualization.
Radium - A set of tools to manage inline styles on React elements.
Express.js - The Bedrock web server is built on top of this framework.
D3.js is used for visualization
Mapbox - Interactive maps.
Picasso - A powerful image downloading and caching library for Android.
Gulp - Build system
Gradle - Build system for Android
Deployment
Production instances run Linux (Debian Jessie).
Docker containers on Mesos are used to run microservices.
Apache Aurora is used for long-running services and cron jobs.
Clusto - Cluster management tool. It helps keep track of inventory of the infrastructure.
Logging
Logtron is used to log to disk and also push the events to Kafka.
Kafka is used to store events from various services.
ELK stack (ELK stands for Elasticsearch, Logstash and Kibana) is used to index and analyze logs.
Site reliability
Nagios is used for monitoring, tied to an alerting system for notifications.
Grafana - Metric analytics & dashboards
Apache Storm and Spark crunch data streams into useful business metrics.
References:
https://github.com/uber-common