Engineering Principles & Design
This document outlines the key engineering principles, architectural decisions, and trade-offs made during the development of EarnKit. The primary goal was to build a robust, developer-friendly, and reliable monetization toolkit, even within the constraints of a take-home assignment.
Part 1: Core SDK Philosophy & Best Practices
We started with a clear philosophy for the earnkit-sdk
: it should be lightweight, flexible, and completely focused on providing a best-in-class developer experience (DX).
A Focus on Developer Experience (DX)
-
Lightweight and Dependency-Free: A core tenet of the SDK’s design is its minimal footprint. It is intentionally built with zero production dependencies, ensuring it is lean, easy to integrate, and will not bloat a developer’s application. The final production bundle is a mere ~15 KB (and ~4 KB gzipped), making its impact on load times negligible.
-
Decoupled & Wallet-Agnostic: To maximize compatibility across the web3 ecosystem, the SDK is completely wallet-agnostic. Instead of bundling a specific wallet connector, it requires the developer to pass in the walletAddress. This crucial design choice grants developers the freedom to use any wallet connection library they prefer (e.g., Privy, RainbowKit, Web3Modal).
-
Instant Agent Configuration Change: The SDK is designed in such a way that the developer doesn’t need to redeploy their application to change the agent configuration.
-
Performant and Reliable: The EarnKit Backend is designed to be extremely performant and reliable, making sure the developer’s application is not hindered by the backend. Database and query optimizations like indexing, transactions,
Promise.all
,select
queries are implemented. -
Clear & Actionable Errors: The SDK features a set of custom error classes (
EarnKitInitializationError
,EarnKitInputError
,EarnKitApiError
). This empowers developers to move beyond generic error handling and build robust, programmatic responses to specific failures, such as prompting a user to top up when anEarnKitApiError
with a status of 402 is caught. -
Resilience Through Automatic Retries: The SDK is built to be resilient against transient network and server issues. It incorporates a built-in retry mechanism with exponential backoff. If a request fails with a server-side error (5xx), the SDK will automatically retry the request up to two more times, preventing temporary glitches from impacting the user experience.
-
Configurable & Transparent:
- To maintain a clean developer console, all internal SDK logging is disabled by default and can be enabled via a
debug: true
flag. - To handle varying network conditions, the global API request timeout is configurable (
requestTimeoutMs
), making the developer’s application more resilient.
- To maintain a clean developer console, all internal SDK logging is disabled by default and can be enabled via a
Adherence to Software Design Principles
-
Single Responsibility Principle (SRP): The SDK’s sole responsibility is to act as a secure and efficient messenger to the EarnKit backend. All business logic (fee calculations, balance updates) resides on the server. This is a paramount design choice, as it allows for pricing models and features to be updated without requiring developers to update their installed SDK package.
-
Support for Multiple Instances: The SDK was explicitly designed as a standard class, not a singleton. This allows developers to instantiate multiple EarnKit clients in a single application, each configured for a different agent. This is essential for applications that need to manage multiple distinct AI agents simultaneously.
-
Clean & Maintainable Code (DRY): The SDK’s internal structure adheres to the “Don’t Repeat Yourself” principle. For example, common logic like input validation is abstracted into private helper methods (
_validateString
). This reduces code duplication, improves maintainability, and ensures that validation rules are applied consistently across the entire SDK. -
High-Level Helpers for Common Workflows: Beyond providing core API methods, the SDK includes high-level utility functions to simplify complex tasks. The
pollForBalanceUpdate
method, for instance, encapsulates the entire asynchronous workflow of checking for a balance change after a top-up, saving the developer from writing boilerplate polling code.
Part 2: Architectural Deep Dive
These core principles directly informed the key architectural decisions of the platform.
1. The Hybrid Billing Model: On-Chain vs. Off-Chain
- The Challenge: How do you process potentially high-frequency, low-value transactions for each AI interaction without forcing users to pay gas fees every single time and wait for every single transaction to be confirmed?
- The Decision: We implemented a hybrid model.
- Off-Chain (Micro-transactions): Per-prompt fees and credit deductions are handled as atomic updates in our own database.
- On-Chain (Macro-transactions): User top-ups are standard blockchain transactions (e.g., sending ETH on Base Sepolia) directly to the developer’s wallet.
- The Trade-off: This provides an optimal UX with instant, gas-free interactions. The trade-off is that the off-chain balance relies on a centralized, trusted system. For a production environment, this database would require rigorous security, auditing, and disaster recovery protocols.
2. The track/capture/release
Transaction Lifecycle
- The Challenge: How do you guarantee a user is never charged for a failed AI call and make sure the developer is credited for a successful AI call?
- The Decision: We implemented a two-step transaction flow.
track()
is called before the AI operation to place a temporary hold on the user’s balance.capture()
is called only after a successful AI response to finalize the charge. If the AI call fails,release()
is called to cancel the hold and refund the user. - The Trade-off: This pattern adds a minor complexity to the SDK integration for the developer (they must remember to call
capture
orrelease
). However, this is a necessary trade-off for achieving a fair and reliable billing system, which is fundamental to user and developer trust.
3. Asynchronous Top-Up Confirmation
- The Challenge: On-chain transactions can take time to confirm. How do we avoid forcing the user to wait on a loading screen?
- The Decision: The top-up flow is fully asynchronous. The client submits the
txHash
to the backend and the UI responds immediately. A separate, simulated background process on the server monitors the blockchain for confirmation and updates the user’s balance once verified. - The Trade-off: This creates a seamless, non-blocking user experience. The architectural cost is the need for a background worker system. In this assignment, this is simulated with
setTimeout
, but a production system would use a robust job queue (e.g., BullMQ, Celery) for this task.
4. Idempotency to Prevent Double-Charging
- The Challenge: What happens if a developer’s request to
track
a charge times out on the network, and they retry? They might accidentally charge the user twice for the same operation. - The Decision: The
track
endpoint accepts an optionalidempotencyKey
. If a developer provides this unique key (e.g., a UUID generated for the operation), our backend can recognize and discard any duplicate requests, guaranteeing that the user is only charged once.
5. Robust & Performant Backend Design
- The Challenge: A backend handling financial state must be both resilient to errors and fast enough to provide a good user experience.
- The Decision: We adopted several key backend and database best practices:
- Atomicity with Transactions: All multi-step database operations, such as deleting an agent and its related records or refunding a charge, are wrapped in a
prisma.$transaction
. This guarantees that the operations complete as a single, atomic unit—they either all succeed or all fail, ensuring the database is never left in an inconsistent state. - Efficient & Safe Queries: We make extensive use of Prisma’s
select
option to explicitly query for only the data needed by the API endpoint. This prevents accidental data leakage and reduces database load. Furthermore, wherever possible, independent database queries are executed in parallel usingPromise.all
to minimize response latency. - Optimized for Scale with Indexing: We’ve added database indexes to frequently queried columns, such as foreign keys (
developerId
) and fields used in WHERE clauses (likestatus
anduserWalletAddress
). This ensures that database lookups remain fast and efficient, even as the data grows. - Race-Condition Prevention with
upsert
: For creating or updating user balances, we use theupsert
operation. This atomic command elegantly handles both cases in a single database roundtrip, simplifying the logic and preventing race conditions that can occur with a manual “read-then-write” approach.
- Atomicity with Transactions: All multi-step database operations, such as deleting an agent and its related records or refunding a charge, are wrapped in a
- The Trade-off: There are no direct trade-offs to these decisions. Implementing these patterns requires a deeper understanding of database and backend architecture, but the payoff is a significantly more robust, secure, and reliable system.
Part 3: Production Considerations & Future Improvements
The current implementation serves as a robust proof-of-concept. To evolve this into a production-grade service, we would focus on the following key areas:
1. Achieving Production-Grade Reliability & Scale
-
Dedicated Blockchain Monitoring: We would integrate with a dedicated node provider like Alchemy or Infura. This provides guaranteed uptime, higher rate limits, and access to archival data, which is essential for a reliable financial service that must never miss a transaction.
-
Database Read Replicas for Performance: As the platform grows, read-heavy operations (like fetching agent logs and analytics for the dashboard) could slow down critical write operations (like
track
andcapture
). To solve this, we would implement database read replicas.- The Benefit: This architecture isolates the read workload from the write workload, ensuring that intensive analytical queries never block time-sensitive financial transactions, allowing the platform to scale smoothly.
2. Implementing Full-Stack Observability
- Structured Logging: All
console.log
statements would be replaced with a structured logger like Pino/Winston. Logs would be formatted as JSON, including context likeagentId
andtraceId
. These structured logs would be shipped to a centralized service (e.g., Datadog, Logtail) for powerful searching, filtering, and analysis.
3. Hardening Security
- SDK Authentication: Currently, the SDK authenticates with just a public
agentId
. In a production system, this is insufficient. The ideal approach would be a dual-SDK model:- Client-Side SDK (current): For safe, read-only operations like fetching top-up options.
- Server-Side SDK: A separate package for a developer’s backend, authenticated with a secret key. Critical operations like
track
,capture
, andrelease
would be restricted to this server-to-server communication channel with API key authentication.
- CORS Policy: The backend would implement a strict CORS policy to only allow requests from whitelisted domains associated with a developer’s account.
4. Platform Business Model
- Direct vs. Escrow: The current model has users pay the developer’s wallet directly. This is simple but limits business model flexibility.
- Future Vision: A more scalable model would involve a central, secure platform wallet that acts as an escrow. Funds would be paid out to developers on a schedule (e.g., monthly), allowing EarnKit to take a small platform fee (e.g., 5%) as its revenue model, similar to Stripe. This would require a more complex and highly secure payout and accounting system.
5. Application Monitoring & Alerting
- We would integrate an application performance monitoring (APM) tool like Sentry for error tracking and Grafana (with Prometheus or OpenTelemetry) for metrics. We would define and monitor key indicators (e.g., API error rates, request latency) and configure automated alerts to notify an on-call engineer if a critical threshold is breached.
6. Enhancing the Developer & User Experience
-
Real-time Updates via WebSockets & Webhooks: The current polling for balance updates is effective but inefficient. In production, this would be replaced by a push-based system.
- WebSockets: For the
example-agent
, the backend could use a WebSocket to instantly notify the client’s UI when a top-up is confirmed. - Signed Webhooks: For developers building their own backends, we would provide signed webhooks. EarnKit would send a cryptographically signed request to a developer’s registered endpoint upon events like a confirmed top-up, ensuring secure and reliable server-to-server communication.
- WebSockets: For the
-
Dedicated Test Environments: To provide a safe and professional development workflow, we would introduce a full Test Mode. This would include a separate set of test API keys and a dashboard that points to test data. This is a critical feature for any financial API, allowing developers to build and test their integration thoroughly without risking real funds.