Keeping track of expenses sounds simple, until the time spent doing it becomes visible.
What began as a neatly structured Excel file slowly turned into a tedious ritual of manually typing long receipts, duplicating cell values, and questioning whether all that effort was still justified. This is the story of how that frustration evolved into a fully serverless, GenAI-powered expense bot, without abandoning Excel as the source of truth.
The Legacy Data Layer: Why Excel Stayed
Long before automation entered the stack, there was Excel. It wasn’t a placeholder; it was a battle-tested analytical tool.
The spreadsheet already had everything it needed:
- Granular Categorization: Pre-defined monthly tabs and sub-categories.
- Built-in Logic: Native anomaly detection and historical trend analysis.
- Zero Overhead: No database maintenance or UI to manage.
- Mobile Access: Readily available on my smartphone via OneDrive, and already a collaborative environment shared with family members.
It worked, and it worked well. The blocker was never storage or analysis; it was ingestion. A big grocery session meant a kilometer-long receipt, dozens of line items, and re-entering the same category and beneficiary values over and over. That was a massive tax on my time.
The Core Trade-Off: Solving the Ingestion Problem
Two questions shaped the technical direction of the solution.
1. Input design: Schema vs. Natural Language
How should expense data be entered? The primary friction point was the UI:
- Structured table: precise, but slow and high-friction.
- Unstructured free text: fast, but needs an additional layer to parse it into a schema.
Typing “Bread, milk, vegetables, paid with card” felt far more natural than filling in multiple spreadsheet columns, provided something could reliably interpret it. The goal was instant-messaging speed with the highest possible accuracy.
2. Storage: Migration vs. Integration
Where should the data be stored?
The second decision was the data layer. Migrating to a dedicated database (PostgreSQL or DynamoDB) offered more flexibility, but it also introduced significant overhead:
- Migrating all existing expense data to the database
- Managing a new front-end just to view the data.
- Losing the "zero-cost" simplicity of a versioned Excel file in exchange for maintaining a database.
The conclusion was a classic engineering pivot: Keep Excel as the “Source of Truth,” but build a modern API wrapper around it.
Why a Chatbot Became the Obvious Interface
If the goal is to reduce friction, the UI must meet the user where they already are. A mobile-first, chat-based interface was the logical conclusion for real-time expense tracking.
The Comparison: Why Telegram Won
I evaluated several “Natural Language” ingestion points before settling on the stack:
- WhatsApp: Dismissed due to the friction of Business API approvals, complex onboarding, and opaque pricing models.
- Custom Puppeteer Wrappers: Scraping web interfaces is notoriously fragile; any UI update on the host side would break the integration.
- Telegram Bot API: This emerged as the clear winner. It offers a developer-friendly API, near-instant setup, and, crucially, zero operational cost.
The transition from “project idea” to a live echo bot took almost no time. While Telegram bots are technically public-facing, this trade-off was easily mitigated through user ID filtering and authentication middleware in the backend logic.
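That filtering can be sketched as a small allow-list check in front of the handler. This is a minimal, hypothetical version; the function names and the allow-list constant are illustrative, not taken from the actual bot:

```python
# Hypothetical allow-list middleware for a public-facing Telegram webhook.
ALLOWED_USER_IDS = {123456789}  # your own Telegram user ID(s); illustrative value

def is_authorized(update: dict) -> bool:
    """Return True only if the update comes from an allow-listed user."""
    user_id = update.get("message", {}).get("from", {}).get("id")
    return user_id in ALLOWED_USER_IDS

def handle_update(update: dict) -> dict:
    if not is_authorized(update):
        # Acknowledge with 200 so Telegram does not retry, but do nothing.
        return {"statusCode": 200, "body": "ignored"}
    # ... normal expense-parsing logic would run here ...
    return {"statusCode": 200, "body": "ok"}
```

Returning 200 even for rejected users keeps strangers from triggering retries while the bot stays effectively private.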
The Serverless Lifecycle
Once Telegram was selected, serverless architecture followed naturally.
Rather than maintaining an “always-on” EC2 instance or container that sits idle 99% of the day, the bot follows a purely Event-Driven Lifecycle:
- Ingress: Telegram sends a POST request to the API Gateway endpoint.
- Trigger: AWS API Gateway receives the request and passes the payload to the Lambda function.
- Execution: The function "wakes up," processes the input (logic + GenAI), and responds.
- Suspension: The compute environment shuts down, incurring zero further costs.
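The lifecycle above boils down to a single entry point. A minimal sketch, with illustrative field handling (the real parsing and GenAI calls are omitted):

```python
import json

def lambda_handler(event, context):
    """Minimal API Gateway -> Lambda entry point for a Telegram webhook."""
    # API Gateway delivers the Telegram update as a JSON string in "body".
    update = json.loads(event.get("body") or "{}")
    text = update.get("message", {}).get("text", "")
    # ... here: parse the expense, call Gemini, stage the result ...
    reply = f"Received: {text}" if text else "Nothing to process."
    # Always return 200 so Telegram considers the update delivered.
    return {"statusCode": 200, "body": json.dumps({"ok": True, "reply": reply})}
```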
One technical detail: Telegram expects an HTTP 200 OK response. If the backend fails to respond with a successful status, or if it times out, Telegram will retry the request, potentially causing duplicate entries in the spreadsheet.
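One way to defuse those retries is to key on the `update_id` that Telegram attaches to every update and skip anything already seen. A sketch with an in-memory set; in the serverless setup this bookkeeping would live in a durable store such as DynamoDB:

```python
# Seen-update registry; in production this would be a conditional write
# against a durable store, since Lambda environments are recycled.
processed_ids = set()

def is_duplicate(update: dict) -> bool:
    """Telegram attaches an update_id to every delivery; remembering the
    IDs we have processed turns retried deliveries into no-ops."""
    uid = update.get("update_id")
    if uid in processed_ids:
        return True
    processed_ids.add(uid)
    return False
```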
The Parsing Engine: Gemini as a Deterministic Logic Layer
Free text only works if something understands it.
Structured Parsing
The bot uses Google’s Gemini Flash model not for conversation, but for structured parsing. By applying strict “Prompt Discipline,” the model is constrained to a specific execution pipeline:
- Entity Extraction: Identifying line items, quantities, and unit prices.
- Metadata Tagging: Automatically assigning beneficiaries and expense categories.
- Schema Enforcement: Forcing the output into a JSON format for direct ingestion by the downstream logic.
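The pipeline can be sketched in two parts: a prompt that demands JSON, and a validator that refuses anything off-schema before it reaches the spreadsheet logic. The prompt wording, field names, and SDK usage below are illustrative assumptions, not the bot's actual code:

```python
import json

# Illustrative prompt; the real one encodes the household's categories.
SYSTEM_PROMPT = (
    "Extract every line item from the user's message. Respond ONLY with JSON: "
    '{"items": [{"name": str, "qty": number, "price": number, '
    '"category": str, "beneficiary": str}]}'
)

def parse_model_reply(raw: str) -> list[dict]:
    """Schema enforcement: validate the model's JSON before ingestion."""
    items = json.loads(raw)["items"]
    required = {"name", "price", "category", "beneficiary"}
    for item in items:
        if not required <= item.keys():
            raise ValueError(f"Malformed item: {item}")
    return items

# The call itself, following the google-generativeai SDK pattern (sketch):
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash")
# reply = model.generate_content(
#     [SYSTEM_PROMPT, user_text],
#     generation_config={"response_mime_type": "application/json"},
# )
# expenses = parse_model_reply(reply.text)
```

Rejecting malformed output at this boundary is what lets the LLM act as a deterministic-enough parser rather than a free-form chat partner.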
Multimodal Ingestion
Because Gemini Flash is natively multimodal, the pipeline isn’t limited to text. By passing the same prompt instructions alongside a raw image byte-stream, the bot can “read” a physical grocery receipt and extract the same structured JSON as it would from a text message. Reliability comes from prompt discipline, not blind trust.
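For the image path, the only change is how the request is packaged: the same prompt travels alongside inline image bytes. A small sketch of that packaging, assuming the inline-data shape the google-generativeai SDK accepts (the function name is illustrative):

```python
def build_receipt_request(prompt: str, image_bytes: bytes,
                          mime: str = "image/jpeg") -> list:
    """Pair the shared parsing prompt with raw receipt bytes in the
    inline-data shape used for multimodal Gemini calls."""
    return [prompt, {"mime_type": mime, "data": image_bytes}]

# model.generate_content(build_receipt_request(SYSTEM_PROMPT, photo_bytes))
# would then yield the same structured JSON as the text-only path.
```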
Designing the Interaction Flow
The first version worked. Once. Each new message was treated as unrelated. There was no concept of follow-up corrections or additions.
To solve this, I redesigned the interaction into a Stateful Validation Loop. The bot no longer assumes its output is correct; instead, it operates on a “Propose and Confirm” model:
- Extraction: The user sends raw input; the bot returns a structured summary.
- Staging: The parsed data is held temporarily, but not yet written to Excel.
- Refinement: The user can provide follow-up corrections (e.g., "Actually, the milk was $x, not $y").
- Commit: Only an explicit confirmation message initiates the write to the spreadsheet.
Ambiguous responses are treated as negative, a deliberate safety choice.
Preserving State in a Stateless Runtime
By design, AWS Lambda functions are stateless. Once a request is fulfilled, the execution environment is destroyed. However, human conversation is inherently stateful, especially when a user needs to confirm or edit parsed data before it is committed to the “Source of Truth”.
To preserve context between messages, the system temporarily stores extracted expenses:
- Ephemeral State with DynamoDB: I used a DynamoDB table as a high-speed, serverless cache. This stores "pending" expenses indexed by the user’s Telegram ID, allowing the bot to remember what it just parsed when the user sends a follow-up "Yes" or "Edit."
- User Isolation: Each Telegram user receives an isolated state partition, ensuring data integrity across concurrent sessions.
- Configuration Management: Instead of hardcoding API keys or using the more expensive AWS Secrets Manager, I used AWS Systems Manager Parameter Store. It offers a cost-effective, secure way to inject environment variables and rotated tokens into the Lambda environment.
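The pending-expense record itself is simple: it is keyed by the Telegram user ID and carries a TTL so abandoned sessions expire on their own. A sketch with illustrative attribute and table names, alongside the (commented) boto3 calls such a setup would use:

```python
import time

def pending_item(user_id: int, expenses: list[dict],
                 ttl_seconds: int = 3600) -> dict:
    """Shape a 'pending' record for the ephemeral-state table.
    Attribute names are illustrative, not the bot's actual schema."""
    return {
        "user_id": user_id,       # partition key: one isolated session per user
        "expenses": expenses,     # staged items awaiting confirmation
        "expires_at": int(time.time()) + ttl_seconds,  # DynamoDB TTL attribute
    }

# import boto3
# table = boto3.resource("dynamodb").Table("pending-expenses")
# table.put_item(Item=pending_item(chat_id, parsed_items))
#
# Secrets come from Parameter Store instead of being hardcoded:
# ssm = boto3.client("ssm")
# token = ssm.get_parameter(
#     Name="/bot/telegram-token", WithDecryption=True
# )["Parameter"]["Value"]
```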
Why Writing to Excel Was the Hardest Part
Ironically, the most complex engineering challenge wasn’t the generative AI or the serverless logic – it was the integration with Microsoft Excel. Because the goal was a professional, hands-off automation, I had to move beyond simple “local file” scripts and interface with the Microsoft Graph API.
One-time setup:
- Azure app registration for OneDrive access
- Device-code authentication to obtain an initial access token, which the Lambda then rotates
Each programmatic write requires:
- Token rotation
- Session management
- Locating the next empty row
- Appending the new data to the appropriate worksheet and range
A long-lived refresh token is stored securely and rotated automatically, allowing Lambda to work without repeated manual authentication.
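The two Graph interactions at the heart of this, rotating the token and targeting a worksheet range, can be sketched as request builders. The endpoint shapes follow the Microsoft identity platform and Graph workbook API; the scope, tenant, and parameter values are assumptions for a personal-account, device-code setup:

```python
# Token endpoint for personal Microsoft accounts (assumed tenant "consumers").
TOKEN_URL = "https://login.microsoftonline.com/consumers/oauth2/v2.0/token"
GRAPH_ROOT = "https://graph.microsoft.com/v1.0"

def refresh_payload(client_id: str, refresh_token: str) -> dict:
    """Form body for rotating the long-lived refresh token
    (refresh_token grant of the OAuth 2.0 authorization flow)."""
    return {
        "client_id": client_id,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "scope": "Files.ReadWrite offline_access",
    }

def append_range_url(item_id: str, sheet: str, address: str) -> str:
    """Graph endpoint addressing the worksheet range the write targets."""
    return (f"{GRAPH_ROOT}/me/drive/items/{item_id}"
            f"/workbook/worksheets/{sheet}/range(address='{address}')")

# POSTing refresh_payload(...) to TOKEN_URL yields a fresh access_token plus a
# rotated refresh_token; a PATCH to append_range_url(...) with
# {"values": [[date, item, amount, category]]} then writes the row.
# Finding the next empty row (e.g. via the worksheet's usedRange) is omitted.
```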
Cost Reality Check
After everything was in place, the final question was cost.
In practice:
- Lambda, API Gateway, DynamoDB → well within the free tier
- Gemini Flash → free within its rate limits, which this use case never approaches
- Parameter Store → free due to very low usage levels
- Microsoft Graph API → free
The only charge came from Lambda deployment bundles stored in S3, amounting to just a few cents.
Engineering Takeaways: Beyond the Bot
This project was never about building a commercial product; it was an exercise in pragmatic engineering. By solving a personal pain point, the build demonstrated four key architectural truths:
- Interfaces Should Meet Users Where They Are: Chat-based ingestion (Telegram) eliminated the "data entry tax" that kills most habits.
- GenAI is a Parser, Not a Pilot: The system's reliability comes from using LLMs for structured extraction rather than open-ended decision-making.
- Serverless is the Ultimate Sandbox: The event-driven stack (Lambda/API Gateway) allowed for enterprise-grade experimentation with zero financial risk.
- Legacy systems aren't obsolete; they're just waiting for a modern interface: Excel remains a world-class analytical engine when you replace its manual UI with a modern API wrapper.
True UX design isn’t about the newest tech – it’s about identifying a specific pain point and building the shortest possible path to solve it.
Author: Software Engineer, Levi9 Romania