zobic.io
DevSecOps · Security Engineering

Platform Security Overhaul

Project Date: 2026-01-16
Tech Stack
Kafka · Redis · OAuth2/OIDC · Hibernate · RBAC

#The Context

When our organization moved to a unified login system across our product suite, I assumed ownership of the security transition for our backend API. This involved a complete overhaul of our authentication layer, moving from a standard password flow to an OTP-based flow. The migration became a pivotal moment for the product's security posture: during the refactor, I identified and remediated several dormant vulnerabilities, hardening the application before connecting it to the central identity service.

Notice

The code examples provided herein are entirely synthetic and created solely for this case study. They do not represent or contain any proprietary source code from the actual project.

#OTP Migration & Security Hardening

The transition from a legacy password flow to a centralized OTP mechanism required a careful balance between user experience, security, and infrastructure health.

Migrating authentication methods on a live product—especially one dependent on mobile clients—presented a synchronization challenge. Due to the slow adoption rate of mobile application updates, a "hard switch" was impossible without forcing an update.

I architected a multi-stage deployment process to handle this transition gracefully. This allowed the backend to support legacy clients during the update window while enforcing the new security standards for updated clients, ensuring zero downtime for users.
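The staged rollout can be sketched as a simple version gate. This is a hypothetical illustration, not the actual rollout logic; the version cutoff, sunset timestamp, and class names are all invented for this example.

```java
// Hypothetical sketch of the staged cutover. Version numbers and the sunset
// timestamp are illustrative, not the real rollout values.
public class AuthFlowGate {
    public enum Flow { LEGACY_PASSWORD, OTP, UPGRADE_REQUIRED }

    static final int MIN_OTP_VERSION = 42;                        // first client build with OTP support
    static final long LEGACY_SUNSET_MILLIS = 1_700_000_000_000L;  // hard cutoff for legacy clients

    public static Flow select(int clientVersion, long nowMillis) {
        if (clientVersion >= MIN_OTP_VERSION) {
            return Flow.OTP; // updated clients get the new standard immediately
        }
        // During the update window, older clients keep working on the legacy flow...
        if (nowMillis < LEGACY_SUNSET_MILLIS) {
            return Flow.LEGACY_PASSWORD;
        }
        // ...after the sunset, they must update before authenticating again.
        return Flow.UPGRADE_REQUIRED;
    }
}
```

The key property is that the decision lives entirely on the backend, so the sunset date can be moved without shipping another mobile release.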

Reputation at risk

A major risk introduced by OTP is the potential for bot-driven abuse, which can destroy an organization's email sender reputation and lead to blacklisting.

Rate-limiting

To mitigate this, I implemented a sophisticated rate-limiting component. This system was designed to detect and block abusive request patterns without impacting genuine users. The solution proved robust enough that I subsequently expanded its scope, applying the same protection logic to secure other transactional email flows across the product.
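A minimal in-memory sketch of the sliding-window idea behind such a limiter follows. All names are illustrative; the production component tracked richer signals, and in a multi-instance deployment the counters would live in a shared store (such as the Redis already in the stack) rather than a local map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Minimal sliding-window limiter, keyed by any identifier (email, IP, ...).
// In-memory for illustration only; production would use a shared store so
// every instance sees the same counters.
public class OtpRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Deque<Long>> hits = new HashMap<>();

    public OtpRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean allow(String key, long nowMillis) {
        Deque<Long> q = hits.computeIfAbsent(key, k -> new ArrayDeque<>());
        // Drop timestamps that have aged out of the window.
        while (!q.isEmpty() && nowMillis - q.peekFirst() >= windowMillis) {
            q.pollFirst();
        }
        if (q.size() >= maxRequests) {
            return false; // over the limit: block the OTP send
        }
        q.addLast(nowMillis);
        return true;
    }
}
```

Because the limiter is keyed by an arbitrary string, the same component can guard any transactional email flow, which is what made the later scope expansion straightforward.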

Adhering to strict OpSec principles, a key requirement was to prevent attackers from mapping our user base.

I redesigned the API responses to prevent account enumeration. By standardizing feedback for both valid and invalid attempts, the system ensures that an attacker cannot distinguish between a registered user and a non-existent one based on error messages or response timing.
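The core of the anti-enumeration design can be shown in a few lines: the response is byte-for-byte identical whether or not the account exists, and only the side effect differs. Names and the response wording are hypothetical; the real service also normalizes response timing.

```java
// Sketch: identical response for existing and unknown accounts.
// Class, method, and message text are hypothetical.
public class OtpRequestHandler {
    public record Response(int status, String body) {}

    public Response requestOtp(String email, boolean accountExists) {
        if (accountExists) {
            // sendOtpEmail(email); // the only branch-dependent side effect
        }
        // Same status and message either way, so an attacker cannot tell
        // a registered user from a non-existent one.
        return new Response(202,
            "If an account exists for this address, a code has been sent.");
    }
}
```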

#Vulnerability Management & Detection

#The CI/CD False Negative

During a major infrastructure upgrade—updating Java, Spring Boot, and the Docker base images—I encountered a critical discrepancy in the security pipeline. When I ran the standard compliance check, the result was unexpectedly perfect:

```shell
trivy image --server https://trivy.zobic.io backend
# Result: 0 Vulnerabilities (Clean)
```

I was immediately suspicious. While the goal of the upgrade was to keep everything up to date, a completely "clean" scan on a complex stack is statistically improbable. Trusting my instinct that the report was a false negative, I bypassed the internal server cache to run a direct, standalone scan:

```shell
trivy image backend
# Result: Critical Vulnerabilities Detected
```

The standalone scan lit up "like a Christmas tree." The contrast confirmed that the centralized scanner instance was misconfigured. More critically, it meant these vulnerabilities were currently present in production undetected, as previous builds had been passing based on false data. I immediately escalated the infrastructure failure to the Platform Team to restore observability. Simultaneously, I took ownership of the immediate remediation, patching the problematic libraries to ensure the new deployment was genuinely secure.

#Frontend Supply Chain Visibility

I also identified a blind spot in the frontend security posture. Because the frontend application is compiled into static assets and served via a lightweight web server, standard container scanners were failing to detect vulnerabilities in the JavaScript dependency tree.

When the frontend app is built for production, the files are bundled and minified. The package.json and node_modules—which scanners rely on to identify versions—are often stripped from the final runtime image. This meant that vulnerabilities in client-side dependencies (e.g., XSS vectors in third-party libraries) were invisible to the scanner.

To bridge this gap, I implemented a Software Bill of Materials (SBOM) workflow:

  1. I added a CI/CD step to generate a CycloneDX artifact during the build process. This creates a formal inventory of all dependencies before they are bundled.
  2. This artifact is stored in the registry alongside the container image.
  3. Trivy uses this SBOM to scan the static frontend application with the same rigor as the backend services.
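The steps above can be sketched as a CI fragment. Commands are illustrative and assume an npm-based frontend and an OCI registry; adapt the image reference and attachment tooling to your pipeline.

```shell
# 1. Generate a CycloneDX SBOM from the lockfile, before bundling strips
#    package.json and node_modules from the runtime image.
npx @cyclonedx/cyclonedx-npm --output-file sbom.cdx.json

# 2. Store the SBOM in the registry alongside the image
#    (tooling varies; cosign shown as one option, image name hypothetical).
cosign attach sbom --sbom sbom.cdx.json registry.example.com/frontend:latest

# 3. Scan the SBOM itself, so minified assets no longer hide vulnerable packages.
trivy sbom sbom.cdx.json
```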

#SSO Overhaul & Identity Architecture

During the integration of the unified login system, I uncovered a critical flaw in our legacy "Custom SSO" implementation regarding Identity Assurance.

The legacy model implicitly trusted the Custom SSO provider with the same weight as global OIDC providers (like Google or Apple). Specifically, it allowed for automatic email verification. In the new unified ecosystem, this created a vector for Account Takeover (ATO): an attacker could register via the lower-security Custom SSO to claim an email address, automatically bypassing verification to gain access to a pre-existing target account.

The Fix:

Trust hierarchy

I categorized Custom SSO as "Untrusted" compared to Google or Apple. This forces an extra verification step, preventing custom logins from compromising established accounts.
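The tier decision reduces to a small lookup. This is a simplified sketch with invented names; the real implementation sits inside the identity integration layer.

```java
// Illustrative trust tiers. Provider identifiers and class names are
// hypothetical; the essential rule is that only globally trusted OIDC
// providers get automatic email verification.
public class ProviderTrust {
    public enum Tier { TRUSTED, UNTRUSTED }

    public static Tier tierOf(String provider) {
        return switch (provider) {
            case "google", "apple" -> Tier.TRUSTED; // global OIDC providers
            default -> Tier.UNTRUSTED;              // custom SSO and anything else
        };
    }

    // Untrusted logins must complete an explicit verification step before
    // they can be linked to an existing account, closing the ATO vector.
    public static boolean emailAutoVerified(String provider) {
        return tierOf(provider) == Tier.TRUSTED;
    }
}
```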

We faced a data integrity challenge where users would initiate registration but fail to fulfill all requirements (e.g., missing profile data), leading to "hanging" or "zombie" records in the database.

The Strategy: Token-Based State Container

To eliminate the complexity of managing "partial" user states in the database, I architected a stateless registration flow:

JWT registration

Replaced premature database writes with short-lived JWTs to stage user data. This enforces strict atomic persistence, ensuring only fully valid accounts enter the database.
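A minimal sketch of the staging idea follows. The real flow used standard JWTs via a JWT library; this example only demonstrates the principle with a raw HMAC over the staged payload, and every name, key, and format here is invented for illustration.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: partially-completed registration data travels in a signed,
// short-lived token instead of a database row. Only when the final step
// presents a valid, unexpired token is the account persisted atomically.
public class RegistrationToken {
    private static final byte[] KEY = "demo-secret-key".getBytes(StandardCharsets.UTF_8);

    static String hmac(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(KEY, "HmacSHA256"));
            return Base64.getUrlEncoder().withoutPadding()
                         .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Issue: payload carries the staged registration data plus an expiry.
    public static String issue(String payload, long expiresAtMillis) {
        String body = payload + "|" + expiresAtMillis;
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(body.getBytes(StandardCharsets.UTF_8)) + "." + hmac(body);
    }

    // Verify: reject tampered or expired tokens; no "zombie" row ever exists.
    public static boolean verify(String token, long nowMillis) {
        int dot = token.lastIndexOf('.');
        if (dot < 0) return false;
        String body = new String(Base64.getUrlDecoder().decode(token.substring(0, dot)),
                                 StandardCharsets.UTF_8);
        if (!hmac(body).equals(token.substring(dot + 1))) return false;
        long expiresAt = Long.parseLong(body.substring(body.lastIndexOf('|') + 1));
        return nowMillis < expiresAt;
    }
}
```

Because the token is the only place partial state lives, an abandoned registration simply expires; nothing needs to be cleaned out of the database.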

#Context-Blind Caching

While Hibernate isolated our database, the caching layer bypassed these checks. Standard cache keys (e.g., resource::101) ignored the user's context, allowing Tenant B to retrieve data previously cached by Tenant A.

Composite cache key

I implemented a custom Cache Key Generator that overrides the default logic. It now composites the tenantId from the security context directly into the cache key, ensuring strict data segregation in the cache.
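The essential change can be shown in one function. Names are hypothetical; the production version plugged into the caching framework's key-generator extension point, but the core idea is the same either way.

```java
// Sketch of a tenant-aware cache key. Composing the tenant into every key
// means "resource::101" for Tenant A and Tenant B can never collide:
// "tenantA::resource::101" vs "tenantB::resource::101".
public class TenantCacheKey {
    public static String compose(String tenantId, String cacheName, Object id) {
        return tenantId + "::" + cacheName + "::" + id;
    }
}
```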

#Secondary Win

As an added win, I fortified the checkout against automated "Card Testing." This preserved our processor standing and prevented dispute costs while maintaining a smooth experience for legitimate customers.

#Summary & Reflection

I’ve always had an instinct for Security and Architecture, so taking ownership of this migration felt like a natural fit. Playing a pivotal role in protecting user data gave me a sense of purpose and clarified exactly where I want to apply my engineering skills.

Filip Zobic

Velocity without vulnerability?

Security shouldn't be a bottleneck, and speed shouldn't come at the cost of safety. I help engineering teams embed security into their foundation so they can ship new features with total confidence.