How to Design a POI (Point of Interest) Service

How to design a POI (Point of Interest) Service

Functional Requirements

  1. Searching nearby locations
  2. Viewing details
  3. Owners can edit / delete

Non-functional Requirements

  1. Latency: 1s
  2. Freshness: 1 day -> 1 hour
  3. Scalability
  4. Fault Tolerance

Assumptions

  1. 1B users -> 50% -> 500M DAU
  2. 200M businesses
  3. Storage: NoSQL, SQL, in memory

Database Selection

  1. In-memory: using multiple machines to shard the data, and the latency requirement is 1s, it’s expensive to use this
  2. SQL: more expensive
  3. NoSQL: cheaper, fast querying, self-sharding and optimization, suitable

How to scale searching

  • Master-Slave Model, separate reading and writing
  • This system is suitable for using this model because the need for reading is much more than writing
  • Not using whole table scaning here, given longtitude and latitude
  • Composite Index can not speed up the searching here because it follows the Leftmost Prefix Rule, we can only search the range of the latitude once the longitude is matched, so it equals whole table scaning
  • Two other effective searching ways will be introduced in the following content

GeoHash

  • GeoHash is a spatial hashing algorithm that continuously divides latitude and longitude ranges in half (using a Z-order curve), converting a (latitude, longitude) pair into a Base32 string that represents the geographic grid cell in which the location falls
  • It is a value stored in the database
  • The longer the string, the higher the precision of the geographical location

Quad Tree

  • A Quad Tree is a tree data structure that recursively partitions a two-dimensional space into four quadrants (subregions). Each node in the tree has up to four children representing the top-left, top-right, bottom-left, and bottom-right regions
  • It is a memory structure
  • Quad Tree can be used as indicies to speed up the searching (k values), assume each node contains about 100 businesses
  1. Can it fits into memory? (business_id, latitude, longitute, short_brief) < 50B (200M / 100) * (100 * 50B) = 10GB, which can be stored in memory, meets the latency requirement in 1 second

  2. How to build this tree? Need a starting point, stored on disk as a seed value

  3. How to update this tree? Live update / eventually consistecy

  4. Adding a cache to store the latitude, longitude, and their corresponding business list, extra cost.

  5. Construct the Quad Tree according to the deployment (west, middle, east USA), extra cost, availability

System Architecture