Availability

Engineered for 99.99% availability

The CloudIP platform is built layer by layer to stay online when something goes wrong. This page describes the engineering posture: the targets, the architecture behind them, and the proof we run every day.

Uptime: 99.99%
Recovery Point Objective (RPO): ≤ 60 seconds
Recovery Time Objective (RTO): ≤ 5 minutes
Backup durability: 11 nines
Encryption: AES-256 + TLS 1.3
DR drills: Quarterly, public
Engineering targets for the CloudIP platform.
Targets

What we engineer for

A single source of truth for the numbers we hold ourselves to. Marketing copy reads the same values as the runbooks.

Metric                            Target
Uptime                            99.99%
Read latency (p95, US)            < 50 ms
Write latency (p95, US)           < 150 ms
Recovery Point Objective (RPO)    ≤ 60 seconds
Recovery Time Objective (RTO)     ≤ 5 minutes
Hot backup retention              30 days
Cold archive retention            7 years
Backup durability                 11 nines
US edge presence                  30+ POPs
DDoS / WAF                        Always on
Encryption                        AES-256 + TLS 1.3
Admin MFA                         Mandatory
DR drills                         Quarterly, public
Public post-mortems               Incidents with ≥ 15 minutes of impact
Targets describe the engineering posture of the CloudIP platform during the current development phase. They are stated as engineering goals rather than as a contractual service-level agreement. Customers requiring binding SLAs, custom RPO/RTO guarantees, dedicated infrastructure, or cross-cloud cold backups should contact CloudIP Professional Services for a custom engagement.
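To make the uptime target concrete, the short sketch below converts an availability fraction into a downtime budget. It is plain arithmetic rather than anything CloudIP-specific; the function name `downtimeBudgetMinutes` and the 30-day and 365-day windows are assumptions chosen for illustration.

```typescript
// Downtime allowed by an availability target over a given window.
// Illustrative arithmetic only; the name and window lengths are assumptions.
function downtimeBudgetMinutes(availability: number, periodDays: number): number {
  if (availability < 0 || availability > 1) {
    throw new RangeError("availability must be a fraction between 0 and 1");
  }
  const minutesInPeriod = periodDays * 24 * 60;
  return (1 - availability) * minutesInPeriod;
}

// 99.99% leaves roughly 4.32 minutes per 30 days and 52.56 minutes per 365 days.
const budgetPer30Days = downtimeBudgetMinutes(0.9999, 30);
const budgetPer365Days = downtimeBudgetMinutes(0.9999, 365);
console.log(budgetPer30Days.toFixed(2), budgetPer365Days.toFixed(2));
```

On a 30-day window the budget works out to a little under the 5-minute RTO, so the two targets are best read together when planning a recovery.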
How we get there

Six pillars behind the numbers

Every target on this page maps to a specific engineering practice. None of these is theoretical; each is a system that runs in production.

Global edge, U.S. focus

Compute and cache run on 30+ U.S. points of presence on Cloudflare’s anycast network. Requests are served from the location closest to the user, with automatic failover between POPs when a region misbehaves.

Five-database isolation

The platform runs on five logically isolated databases — auth, tenant operations, business records, communications, and audit. A schema change or hot table in one cannot stop the others.

Cross-region data replication

Backups, blob storage, and database snapshots are replicated to a second U.S. region within seconds of being written. Object lock prevents tampering even by the customer.

Canary deploys with auto-rollback

Every release ships to 5% of traffic, then 25%, then 100%, with synthetic checks watching the error rate. A bad deploy rolls itself back automatically within 90 seconds.
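The staged rollout described here can be sketched as a loop over traffic percentages that stops at the first unhealthy stage. This is a minimal illustration, not the platform's deploy tooling: `runCanary`, the `observeErrorRate` callback, and the 1% error threshold are all hypothetical stand-ins for whatever the synthetic checks report.

```typescript
// Staged rollout sketch: 5% -> 25% -> 100%, rolling back on a bad error rate.
// observeErrorRate and the threshold are hypothetical; real checks would
// sample live traffic for a period of time at each stage.
type RolloutOutcome =
  | { status: "promoted" }
  | { status: "rolled_back"; atPercent: number; errorRate: number };

const STAGES: readonly number[] = [5, 25, 100];

function runCanary(
  observeErrorRate: (trafficPercent: number) => number,
  maxErrorRate: number,
  stages: readonly number[] = STAGES,
): RolloutOutcome {
  for (const percent of stages) {
    const errorRate = observeErrorRate(percent);
    if (errorRate > maxErrorRate) {
      // Stop widening the rollout and report where it failed.
      return { status: "rolled_back", atPercent: percent, errorRate };
    }
  }
  return { status: "promoted" };
}

// A healthy release passes every stage; a regression that only shows up
// under more traffic is caught at the 25% stage.
const healthy = runCanary(() => 0.001, 0.01);
const regressed = runCanary((percent) => (percent >= 25 ? 0.08 : 0.001), 0.01);
```

Keeping the observation behind a callback makes the promotion logic easy to test without real traffic.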

Graceful degradation by design

Each external dependency — cards, email, AI, shipping — sits behind a circuit breaker with a graceful fallback. A vendor outage shows up as a queue, not a broken page.
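A circuit breaker with a queue fallback might look roughly like the sketch below. It is an assumption-laden illustration: the class name, the failure threshold, the cooldown, and the in-memory `queued` array are hypothetical, and `send` is synchronous only to keep the example short (a real vendor call would be awaited).

```typescript
// Minimal circuit breaker: after repeated failures the vendor is skipped and
// work is queued instead. All names and thresholds here are hypothetical.
type BreakerState = "closed" | "open" | "half_open";

class CircuitBreaker<Job> {
  readonly queued: Job[] = [];
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  call(job: Job, send: (job: Job) => void, nowMs: number): "sent" | "queued" {
    if (this.state === "open") {
      if (nowMs - this.openedAtMs < this.cooldownMs) {
        this.queued.push(job); // vendor is presumed down: do not call it
        return "queued";
      }
      this.state = "half_open"; // cooldown elapsed: allow one trial call
    }
    const isTrial = this.state === "half_open";
    try {
      send(job);
      this.state = "closed";
      this.consecutiveFailures = 0;
      return "sent";
    } catch {
      this.consecutiveFailures += 1;
      if (isTrial || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "open";
        this.openedAtMs = nowMs;
      }
      this.queued.push(job); // fall back to the queue instead of failing the page
      return "queued";
    }
  }
}
```

Passing `nowMs` in explicitly keeps the breaker deterministic in tests; production code would more likely read a clock internally.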

Always-on edge protection

Layer 3, 4, and 7 DDoS protection, a managed WAF, and per-tenant rate limits run in front of every request. A noisy integration cannot drain capacity for the rest of the platform.
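Per-tenant rate limiting is often implemented with a token bucket per tenant; the sketch below shows that technique under assumed parameters. It is not a description of CloudIP's limiter: the class name, the capacity, and the refill rate are hypothetical, and state is held in memory rather than at the edge.

```typescript
// Token-bucket rate limiter keyed by tenant. Each tenant has its own bucket,
// so one tenant exhausting its tokens does not affect another.
// Capacity and refill rate are illustrative assumptions.
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TenantRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  allow(tenantId: string, nowMs: number): boolean {
    // New tenants start with a full bucket.
    const bucket = this.buckets.get(tenantId) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    const elapsedSeconds = Math.max(0, nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.lastRefillMs = nowMs;

    const allowed = bucket.tokens >= 1;
    if (allowed) {
      bucket.tokens -= 1; // spend one token for this request
    }
    this.buckets.set(tenantId, bucket);
    return allowed;
  }
}
```

With a capacity of 2 and a refill of one token per second, a tenant can burst two requests, is refused a third, and is allowed again one second later, while other tenants are unaffected.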

Layered architecture

What lives where

Resilience comes from making each layer fail independently. Here is the layered map of the CloudIP platform.

Layer

Edge

Cloudflare anycast network with 30+ U.S. POPs. Static pages, marketing sites, and storefronts are cached at the edge so they continue to serve traffic during platform maintenance.

Layer

Compute

Workers run as stateless V8 isolates that auto-scale to demand. The main app is split into sibling Workers for OG image generation, billing, AI, and real-time communications, so a bug in one cannot stop the others.

Layer

State

Stateful coordination runs on Durable Objects — one per tenant for collaboration, one per resource for booking, and one per provider for circuit-breaker state. Failures stay isolated to a single instance.

Layer

Data

Five logically split D1 databases with regional read replicas via the Sessions API. Daily Time-Travel restores are validated against invariants in a sandbox.

Layer

Backup

Snapshots land on R2 with eleven nines of durability and object-lock retention enforced at the storage layer. Cross-region replication runs continuously.

Layer

Queues

Cloudflare Queues with retries, dead-letter queues, and replay tooling. Asynchronous work survives transient failures without operator action.
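The retry and dead-letter behavior can be illustrated with a small, platform-independent sketch. It does not use the Cloudflare Queues API; `Delivery`, `processBatch`, and the `maxRetries` value are hypothetical names that model the same idea: failed messages are retried up to a limit, then parked for later replay.

```typescript
// Retry / dead-letter sketch. `attempts` counts failed deliveries so far.
// This models the idea only; it is not the Cloudflare Queues consumer API.
interface Delivery<T> {
  body: T;
  attempts: number;
}

interface BatchResult<T> {
  done: number;
  retry: Delivery<T>[];
  deadLetter: Delivery<T>[];
}

function processBatch<T>(
  batch: Delivery<T>[],
  handle: (body: T) => void,
  maxRetries: number,
): BatchResult<T> {
  const result: BatchResult<T> = { done: 0, retry: [], deadLetter: [] };
  for (const message of batch) {
    try {
      handle(message.body);
      result.done += 1;
    } catch {
      const failed: Delivery<T> = { body: message.body, attempts: message.attempts + 1 };
      if (failed.attempts > maxRetries) {
        result.deadLetter.push(failed); // retries exhausted: keep for replay
      } else {
        result.retry.push(failed); // transient failure: deliver again later
      }
    }
  }
  return result;
}
```

With `maxRetries` set to 3, a message is attempted four times in total (the first delivery plus three retries) before it lands in the dead-letter list.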

Layer

Observability

Every request carries a propagated request ID. Logs and metrics feed Workers Analytics Engine. Synthetic Health-Check probes measure every Worker every 60 seconds from three U.S. regions.

Continuous proof

The platform tests itself

A reliability story you cannot verify is marketing. These are the jobs that run on a schedule, and the artifacts they produce.

Synthetic probes every minute

Three U.S. regions hit every Worker’s health endpoint every 60 seconds. Failures roll up into the public status page.
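One way probe results could roll up into a per-Worker status is sketched below. The roll-up rule (all regions passing is operational, all failing is down, anything in between is degraded) is an assumption made for illustration, as are the type and function names; the actual status page logic is not described in this document.

```typescript
// Roll probe results from several regions up into one status per Worker.
// The three-level rule below is a hypothetical policy, not CloudIP's.
interface ProbeResult {
  worker: string;
  region: string;
  healthy: boolean;
}

type WorkerStatus = "operational" | "degraded" | "down";

function rollUpStatus(results: ProbeResult[]): Map<string, WorkerStatus> {
  const tally = new Map<string, { pass: number; fail: number }>();
  for (const result of results) {
    const counts = tally.get(result.worker) ?? { pass: 0, fail: 0 };
    if (result.healthy) {
      counts.pass += 1;
    } else {
      counts.fail += 1;
    }
    tally.set(result.worker, counts);
  }

  const statuses = new Map<string, WorkerStatus>();
  tally.forEach((counts, worker) => {
    if (counts.fail === 0) {
      statuses.set(worker, "operational");
    } else if (counts.pass === 0) {
      statuses.set(worker, "down");
    } else {
      statuses.set(worker, "degraded");
    }
  });
  return statuses;
}
```

Distinguishing "degraded" from "down" helps separate a regional network problem from a Worker that is failing everywhere.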

Nightly Time-Travel restore

A randomly chosen tenant is restored to its state from one hour ago, validated against invariants, and discarded. Failures page on-call.
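Validating a restored tenant against invariants can be sketched as a list of named checks run over the restored data. The data shape and the two invariants below (unique invoice IDs, totals equal to the sum of line items) are hypothetical examples; the platform's real invariants are not listed in this document.

```typescript
// Restore validation sketch: pick a tenant at random, then run named
// invariants over its restored data. Shapes and invariants are hypothetical.
interface RestoredTenant {
  tenantId: string;
  invoices: { id: string; totalCents: number; lineCents: number[] }[];
}

interface Invariant {
  name: string;
  holds: (tenant: RestoredTenant) => boolean;
}

const INVARIANTS: Invariant[] = [
  {
    name: "invoice ids are unique",
    holds: (t) => new Set(t.invoices.map((i) => i.id)).size === t.invoices.length,
  },
  {
    name: "invoice total equals sum of lines",
    holds: (t) =>
      t.invoices.every((i) => i.lineCents.reduce((sum, c) => sum + c, 0) === i.totalCents),
  },
];

// `random` is injectable so the selection can be made deterministic in tests.
function pickTenant(tenantIds: string[], random: () => number = Math.random): string {
  if (tenantIds.length === 0) {
    throw new Error("no tenants to sample");
  }
  return tenantIds[Math.floor(random() * tenantIds.length)];
}

// Returns the names of every invariant that failed; an empty list means the
// restore passed and the sandbox copy can be discarded.
function failedInvariants(tenant: RestoredTenant, invariants: Invariant[] = INVARIANTS): string[] {
  return invariants.filter((inv) => !inv.holds(tenant)).map((inv) => inv.name);
}
```

Returning the names of failed checks, rather than a single boolean, gives the on-call engineer something specific to investigate.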

Quarterly DR drill

A full sandbox tenant is restored from cold backup once a quarter. Observed RTO and RPO are published in the changelog.
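Observed RPO and RTO can be derived from three timestamps recorded during a drill and compared with the targets of 60 seconds and 5 minutes. The sketch below assumes those timestamps are available in milliseconds; the field and function names are hypothetical.

```typescript
// Compute observed RPO/RTO for a drill and compare them with the targets.
// Field names are hypothetical; timestamps are in milliseconds.
interface DrillTimeline {
  lastRecoverableWriteMs: number; // newest write present in the restored data
  outageStartMs: number;          // moment the simulated failure began
  serviceRestoredMs: number;      // moment the restored tenant served traffic
}

interface DrillReport {
  rpoSeconds: number;
  rtoSeconds: number;
  rpoMet: boolean;
  rtoMet: boolean;
}

const RPO_TARGET_SECONDS = 60;  // from the targets table
const RTO_TARGET_SECONDS = 300; // 5 minutes

function evaluateDrill(timeline: DrillTimeline): DrillReport {
  // RPO: how much recent data was lost. RTO: how long recovery took.
  const rpoSeconds = (timeline.outageStartMs - timeline.lastRecoverableWriteMs) / 1000;
  const rtoSeconds = (timeline.serviceRestoredMs - timeline.outageStartMs) / 1000;
  return {
    rpoSeconds,
    rtoSeconds,
    rpoMet: rpoSeconds <= RPO_TARGET_SECONDS,
    rtoMet: rtoSeconds <= RTO_TARGET_SECONDS,
  };
}
```

For example, a drill that loses 45 seconds of writes and restores service in 240 seconds meets both targets.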

FAQ

Common questions about availability

Specific answers, not marketing fluff.

During the current development phase, 99.99% is an engineering target rather than a contractual service-level agreement. Customers requiring binding SLAs, custom RPO and RTO guarantees, or dedicated infrastructure can engage CloudIP Professional Services for a custom contract.

Need a custom HA contract?

Dedicated infrastructure, binding SLAs, custom RPO and RTO targets, and cross-cloud cold backup are available through CloudIP Professional Services.