Your team is developing a payment-processing system using Kubernetes. The system needs to be resilient, ensuring that failures of individual services or nodes do not result in downtime. You also want to ensure that the system can recover gracefully from failures. Which architectural strategy should you implement to improve fault tolerance?