An architectural digital twin running on Next.js 16 App Router. Features semantic multi-repository ingestion (portfolio.json), stateless RAG pipeline embedding queries, and React 19 async streaming. (In Development)
An event-driven, decoupled Retrieval-Augmented Generation (RAG) agent and streaming chat interface acting as an interactive professional clone of Ylya Martchenko. The system implements a dynamic Git Submodule pattern to expose the bot as a standalone module inside a core Next.js host while retaining an independent repository structure. Leveraging an optimized PostgreSQL Remote Procedure Call (RPC), the bot matches recruiting queries with sub-10ms database latency, utilizing vector cosine similarity calculations to deliver high-precision context grounding. It features a robust 5-tier model fallback hierarchy with timezone-aware cookie exclusions, resolving Gemini API rate limits with zero downtime. Additionally, it integrates a production-grade secure real-time analytics dashboard monitoring conversation telemetry, model efficiency, and geographic request metrics.
Under 150ms time-to-first-token token streaming; sub-10ms PostgreSQL vector retrieval execution speed; sub-5ms stateless cryptographic JWT signature verification.
60 FPS responsive animations utilizing client-side stream reading hooks, rate-limiting guards, and strict layout containment supporting dynamic HTML5 semantic structures (<header>, <main>, <section>). Exclusively compiled via the native React 19 compiler with zero legacy hook overhead.
$0/month. Operates entirely within Google AI Studio, Supabase, and Upstash free-tier thresholds, dropping system maintenance overhead to zero.
Architected a Decoupled RAG Architecture using Git Submodules to host the YlyaBot agent independently, separating presentation layers from ingestion codebases while keeping global workspace configurations clean.
Engineered a robust 5-Tier prioritized Model Fallback Hierarchy (gemini-flash-latest -> gemini-3.5-flash -> gemini-2.5-flash -> gemini-3.1-flash-lite -> gemma-4) to guarantee consistent high-uptime LLM performance.
Implemented a Time-Zone Aware Cookie Exclusion system expiring at exactly midnight Pacific Time (America/Los_Angeles) to cache quota-exhausted endpoints, preventing redundant and failing API calls.
Designed a high-performance vector matching function inside Supabase using PL/pgSQL, computing multi-dimensional cosine distance metrics natively on the database hardware layer.
Exhaustive situation-action-result breakdowns showcasing problem-solving and architectural execution.
Free-tier Gemini API endpoints frequently return 429 Rate Limit Exceeded or Quota Exhausted errors under concurrent recruiter evaluation sessions, causing the AI bot to crash and return blank streams.
Engineered a prioritized 5-tier fallback model hierarchy coupled with a timezone-aware cookie exclusion engine. The Next.js Server Action synchronously attempts to resolve the first stream chunk from active models. If a model fails due to a quota exhaustion, the server catches the error, writes a cookie marking that model as exhausted until exactly midnight Pacific Time (America/Los_Angeles), and seamlessly falls back to the next model in the priority list.
Low-level component relationships, system boundaries, and runtime flows.
The engine matches input queries against vectorized segments of profile.json and individual repository records. High-fidelity embedding is achieved via gemini-embedding-2 compressed natively to 768 dimensions, stored and indexed in Supabase. Recruiter chat queries are processed by a typesafe React Server Action that validates inputs with Zod, checks Upstash Redis rate limits, filters models against active quota exclusion cookies, and streams Gemini tokens securely to the client. Detailed telemetry records are stored atomically inside Supabase, while aggregate session distributions (daily count history, models, countries) are resolved via Upstash Redis pipelines, all surfaced dynamically inside our secure metrics workspace.
Injecting structural key data (like languages or contact details) directly into system contexts guarantees deterministic truth, while keeping semantic RAG strictly for project implementation logs yields the highest conversational coherence.
Built a mobile-first, high-fidelity chat dashboard utilizing the Vercel AI SDK, optimizing response times to under 150ms time-to-first-token using React Server Actions and modern streamable value client iterators.
Developed a Secure Telemetry & Real-Time Analytics Dashboard (/ylya-bot/metrics) protected by stateless cryptographic JWT authentication, combining live PostgreSQL database logs with high-performance rolling Redis aggregate metrics.
Completely resolved free-tier API quota exhaustions with zero user downtime, maintaining 100% agent uptime while keeping monthly maintenance costs at absolute zero ($0.00).
Traditional RAG pipelines require continuous polling servers or expensive always-on extraction endpoints to index portfolio data changes.
Engineered an event-driven ingestion script executed purely within serverless ephemeral GitHub Actions runners, triggered instantly by code pushes.
Completely eliminated runtime compute overhead and reduced system operational maintenance costs to zero.
Building real-time dashboards in modern React frameworks regularly triggers hydration mismatches when server-rendered timestamps and client-rendered layouts conflict, especially on high-frequency mobile viewport states.
Adopted pure utility-first layouts, isolated dynamic locks inside deferred CSS micro-animation states rather than synchronous React render frames, migrated client-side telemetry queries to server-rendered Next.js Async Components, and replaced legacy useMemo hooks with native React 19 Compiler instructions.
Fully resolved layout shift flashes and hydration anomalies, creating a rock-solid, fully responsive mobile analytics console that compiles with zero console warnings.