I built Unraid Deck to deliver the ultimate native Unraid management experience on mobile. But of all its features, the one that caused me the most agony and the most rewrites was the seemingly inconspicuous “Push Notification” service.
Balancing user experience and privacy, this small feature went through three different versions and cost me days of hard work. Every refactor was a compromise that forced me to rebuild on a different underlying technology. Today, I want to document this “torturous” journey, hoping the result is finally the ultimate solution.
V1: The “Local Polling” Rhapsody Under Extreme Privacy (A Day of Wasted Effort)
In the initial design blueprint, I was an absolute “Serverless Fundamentalist”.
My core philosophy was Privacy First. I didn’t want any of the user’s sensitive data passing through any third-party relay servers.
To carry out this philosophy, my first version had no concept of a backend at all. I attempted to use iOS’s background task mechanism to let the app silently wake up periodically in a pure local network environment (or under VPN) to fetch Unraid’s local notifications and trigger mobile alerts.
I spent a whole day wrestling with iOS background execution code. When I finally finished the first version and tested it late at night, the reality was terribly disappointing: the results were completely uncontrollable! Because of Apple’s extremely strict power management, the timing of background wake-ups is a mystery. The gap can range from a few minutes to several hours, or the task may be killed outright without fetching anything. Imagine that a disk in your Unraid array failed, or a Docker container was killed for running out of memory (OOM), and the app “randomly” notified you several hours later. The damage would already have been done.
This failure drove one point home: if I want a truly usable notification experience with production-grade real-time delivery (within seconds), a stable relay server must be introduced. A purely local solution was a wasted effort.
V2: Leveraging the Cloud, Enter Cloudflare KV (A 2-Day Advancement)
After compromising, I started looking for a relay solution that could guarantee high availability while maximizing privacy protection. I finally settled on Cloudflare Workers: lightweight and deployed on global edge nodes. My idea was this: the App reports the Device Token issued by Apple, along with the NAS it wants to monitor, to the Worker, which saves both in Cloudflare KV (a key-value store). As soon as the NAS reports an error, it fires a Webhook at the Worker, and the Worker finds the matching phone in its records and sends an Apple Push.
It took me two full days to complete this KV-based relay system. The test results were stunning: with millisecond-level response, the path from Unraid to the iPhone was finally open, and real-time performance was maxed out.
However, at this seemingly “mission accomplished” stage, as I thought through more complex usage scenarios, the inherent flaws of the KV store began to show.
The Nightmare of KV: Eventual Consistency and the Disaster of Complex Relationships
- Concurrent Multi-Device Vulnerability: When a user switches devices rapidly and registers several Tokens in a very short time, the Worker often reads stale data, because KV is eventually consistent and a write can take up to 60 seconds to propagate across global nodes. The superseded Tokens generated in between slip past the cleanup routine and gradually become “orphan garbage data” that keeps piling up.
- Multi-Device Unbinding Disputes: When an old phone is retired (app uninstalled), the server entries bound to that device need to be deleted precisely. Because everything lives in nested JSON blobs under flat keys, every update requires a painful “full pull -> loop comparison -> exclusion -> overwrite write” cycle.
I suddenly realized that with KV, it would be very difficult to handle more complex scenarios elegantly in the future.
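To make the “full pull -> loop comparison -> exclusion -> overwrite write” cycle concrete, here is a minimal sketch in Python. A plain dict stands in for the KV namespace, and the key format `hook:<token>` and the field names are my own assumptions for illustration, not the actual V2 layout. The second half shows how two requests that read the same snapshot silently undo each other's work.

```python
import json

# Hypothetical stand-in for a KV namespace: string keys, JSON string values.
kv = {}

def kv_get(key):
    raw = kv.get(key)
    return json.loads(raw) if raw is not None else None

def kv_put(key, value):
    kv[key] = json.dumps(value)

def unbind_device(webhook_token, device_token):
    """Full pull -> loop comparison -> exclusion -> overwrite write."""
    key = f"hook:{webhook_token}"
    record = kv_get(key) or {"devices": []}           # full pull
    kept = [d for d in record["devices"]              # loop comparison
            if d["device_token"] != device_token]     # exclusion
    record["devices"] = kept
    kv_put(key, record)                               # overwrite write
    return len(kept)

# Lost update: two requests read the same snapshot before either writes back.
kv_put("hook:abc", {"devices": [{"device_token": "iphone"},
                                {"device_token": "ipad"}]})
snapshot_a = kv_get("hook:abc")   # request A wants to remove "iphone"
snapshot_b = kv_get("hook:abc")   # request B wants to add "watch"
snapshot_a["devices"] = [d for d in snapshot_a["devices"]
                         if d["device_token"] != "iphone"]
snapshot_b["devices"].append({"device_token": "watch"})
kv_put("hook:abc", snapshot_a)
kv_put("hook:abc", snapshot_b)    # overwrites A's removal

remaining = [d["device_token"] for d in kv_get("hook:abc")["devices"]]
```

After the second write, the removed device is back in the list, which is exactly the kind of orphan entry described above.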
V3: The Final Form Driven by Scenarios, D1 Relational Database’s Dimensional Strike
What really made me decide to rebuild from scratch for the third time were several highly important advanced scenarios that aligned more closely with user intuition.
Consideration 1: “One Voice, Many Listeners” — The Mass Mailing Scenario of a Shared Webhook
I envisioned: What if I have two devices (a primary iPhone and a bedside iPad), and they are both bound to the same main NAS (Tower)?
- In the logic of V2 (KV version): Each device has to randomly generate its own dedicated Webhook URL and insert it into this NAS. The NAS ends up bristling with Webhooks like a hedgehog and has to call two or three different links for a single alert, which is extremely unreasonable.
- The truly reasonable logic should be: This NAS should carry exactly one unique Cloudflare Webhook Token! All devices (no matter how many come later) share and subscribe to this one Token. The NAS only needs to shout outwards once, and the Worker backend is responsible for finding every device behind this Token and pushing to all of them concurrently!
Consideration 2: “A Thousand Faces” — Localized Server Naming
If everyone shares one Token to receive messages, what should the server name in the notification popup be?
The iPhone user might name the NAS Home Server, while the iPad user might name it Test Server. If the Worker had to look up a per-device name for every recipient during the fan-out, it would not only perform terribly but also easily cause conflicts.
The Ultimate Solution: The server becomes completely “amnesiac”. It records no names at all and pushes only a cold Server ID (flashGuid) to Apple. On the phone, an iOS Notification Service Extension intercepts the push before the notification is displayed, reads the ID from the payload, looks up that device’s own custom name in a local dictionary, and overwrites the displayed text. Absolutely beautiful!
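The lookup itself is tiny. The sketch below shows only the logic, in Python for consistency with the other examples; on iOS this would live in Swift inside a Notification Service Extension, which only runs when the push payload sets `mutable-content` to 1. The payload key `server_id` and the function name are assumptions made for illustration, not the real payload format.

```python
import copy

def localize_title(payload, local_names, fallback="Unraid"):
    """Return a copy of the payload whose alert title is this device's own
    name for the server, looked up by the server ID carried in the push."""
    out = copy.deepcopy(payload)
    server_id = out.get("server_id")  # hypothetical key holding the flashGuid
    out["aps"]["alert"]["title"] = local_names.get(server_id, fallback)
    return out

# The relay sends the same nameless payload to every subscribed device.
payload = {
    "aps": {
        "alert": {"title": "Unraid", "body": "Disk 3 reported errors"},
        "mutable-content": 1,
    },
    "server_id": "GUID-1234",
}

# Each device keeps its own dictionary of custom names.
iphone_view = localize_title(payload, {"GUID-1234": "Home Server"})
ipad_view = localize_title(payload, {"GUID-1234": "Test Server"})
unknown_view = localize_title(payload, {})
```

The relay never learns either name, and a device that has no entry for the ID simply falls back to a generic title.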
Consideration 3: “Decluttering” — The Exit Rule of Reference Counting
If the Token is shared, the iPhone can tap “Unbind” at any time. How can it unbind without affecting the iPad that is still monitoring this NAS? Implementing the rule “check whether anyone else still holds the Token, and truly delete it only once the last holder is gone” on top of KV requires a large amount of highly error-prone lookup and debounce code.
Consideration 4: “Clear Accounts” — Precise Dashboard and Single Machine Statistics
In the KV era, accurately counting active users and push volumes was very difficult. The flat data structure forced us to rely on fuzzy key matching and manually accumulated counters, which were extremely prone to miscounting.
The Ultimate Solution: We introduced a standard normalized three-table design. Besides the devices table for recording phones and the subscriptions table for recording subscription mappings, we split out a separate servers table. Every successful push can not only increment the dashboard total but also precisely record the push volume of a single server, which makes fine-grained statistics easy.
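A three-table layout along these lines might look like the sketch below. It uses Python's built-in `sqlite3` as a local stand-in for D1. Only the table names and `deck_id` come from the design described here; every other column, including the `push_count` counter, is an assumption for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # stock SQLite needs this per connection
db.executescript("""
CREATE TABLE devices (
    deck_id    TEXT PRIMARY KEY,
    apns_token TEXT NOT NULL
);
CREATE TABLE servers (
    server_id     TEXT PRIMARY KEY,            -- e.g. the flashGuid
    webhook_token TEXT NOT NULL UNIQUE,
    push_count    INTEGER NOT NULL DEFAULT 0   -- hypothetical per-server counter
);
CREATE TABLE subscriptions (
    deck_id   TEXT NOT NULL REFERENCES devices(deck_id)   ON DELETE CASCADE,
    server_id TEXT NOT NULL REFERENCES servers(server_id) ON DELETE CASCADE,
    PRIMARY KEY (deck_id, server_id)
);
""")

db.executemany("INSERT INTO devices VALUES (?, ?)",
               [("iphone", "apns-iphone"), ("ipad", "apns-ipad")])
db.executemany("INSERT INTO servers (server_id, webhook_token) VALUES (?, ?)",
               [("tower", "hook-tower"), ("backup", "hook-backup")])
db.executemany("INSERT INTO subscriptions VALUES (?, ?)",
               [("iphone", "tower"), ("ipad", "tower"), ("iphone", "backup")])

def record_push(db, server_id, delivered):
    """Add the number of successful deliveries to one server's counter."""
    db.execute("UPDATE servers SET push_count = push_count + ? WHERE server_id = ?",
               (delivered, server_id))

record_push(db, "tower", 2)
record_push(db, "backup", 1)
record_push(db, "tower", 2)

per_server = dict(db.execute("SELECT server_id, push_count FROM servers"))
total_pushes = db.execute("SELECT SUM(push_count) FROM servers").fetchone()[0]
active_devices = db.execute(
    "SELECT COUNT(DISTINCT deck_id) FROM subscriptions").fetchone()[0]
```

The dashboard total and the per-server figures come from the same rows, so they cannot drift apart the way separately maintained KV counters can.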
Consideration 5: “Zero Zombie Data” — Cascading Deletes and Device Status Sync
As users change phones or uninstall the App, promptly cleaning up invalid subscription mappings became a difficult problem. Manually looping through cleanups in KV not only consumes computing power but is also prone to accidental deletions.
The Ultimate Solution: We directly use SQLite’s ON DELETE CASCADE feature. When the Worker sends a push and Apple answers 410 Gone (the device token is no longer active, typically because the App was uninstalled), it immediately deletes that device from the devices table. The underlying D1 database then automatically removes every subscription mapping associated with it, achieving garbage collection with zero extra code.
After peeling back these layers one by one, I realized that the only thing that could save this grand architecture was a true relational database.
Thus, V3 was born: a full migration to Cloudflare D1 (Serverless SQLite). In D1, all of these pain points were resolved:
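Here is a small sketch of that cleanup path, again with Python's `sqlite3` standing in for D1. The schema is trimmed to two tables, and `fake_apns_send` is a hypothetical stub that returns an HTTP status instead of calling Apple, so everything apart from the 410 handling is an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # required in stock SQLite for cascades
db.executescript("""
CREATE TABLE devices (
    deck_id    TEXT PRIMARY KEY,
    apns_token TEXT NOT NULL
);
CREATE TABLE subscriptions (
    deck_id   TEXT NOT NULL REFERENCES devices(deck_id) ON DELETE CASCADE,
    server_id TEXT NOT NULL,
    PRIMARY KEY (deck_id, server_id)
);
""")
db.executemany("INSERT INTO devices VALUES (?, ?)",
               [("old-phone", "token-old-phone"), ("new-phone", "token-new-phone")])
db.executemany("INSERT INTO subscriptions VALUES (?, ?)",
               [("old-phone", "tower"), ("old-phone", "backup"), ("new-phone", "tower")])

def fake_apns_send(apns_token):
    """Hypothetical stub: pretend the old phone's token is no longer active."""
    return 410 if apns_token == "token-old-phone" else 200

def push_to_all(db, server_id, send=fake_apns_send):
    """Push to every subscriber of a server and drop devices that answer 410."""
    rows = db.execute(
        "SELECT d.deck_id, d.apns_token FROM devices d "
        "JOIN subscriptions s ON s.deck_id = d.deck_id "
        "WHERE s.server_id = ?", (server_id,)).fetchall()
    delivered = 0
    for deck_id, apns_token in rows:
        status = send(apns_token)
        if status == 410:
            # One DELETE; the cascade removes all of this device's subscriptions.
            db.execute("DELETE FROM devices WHERE deck_id = ?", (deck_id,))
        elif status == 200:
            delivered += 1
    return delivered

delivered = push_to_all(db, "tower")
remaining = db.execute(
    "SELECT deck_id, server_id FROM subscriptions ORDER BY deck_id").fetchall()
```

Note that the old phone's subscription to the second server disappears too, even though the push that triggered the cleanup was for a different one.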
- Want to mass mail? A simple SELECT with a JOIN pulls out all phone IDs bound to this Token, and send!
- Want to clean up garbage associations? A simple DELETE FROM subscriptions WHERE deck_id = Phone.
Thanks to the atomic writes and strong consistency of a relational database, no matter which phone unbinds, it deletes only its own “row” of records. If no one in the family is monitoring this NAS anymore and all of their rows are gone, the Token has no subscribers left and naturally dies. There is no need to write any complex garbage collection code to make that judgment.
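Both operations can be tried locally with Python's `sqlite3` as a stand-in for D1. This is a sketch under assumptions: the column names other than `deck_id` are hypothetical, and the unbind statement adds a `server_id` condition so that a phone watching several servers drops only one of them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE devices (deck_id TEXT PRIMARY KEY, apns_token TEXT NOT NULL);
CREATE TABLE servers (server_id TEXT PRIMARY KEY, webhook_token TEXT NOT NULL UNIQUE);
CREATE TABLE subscriptions (
    deck_id   TEXT NOT NULL,
    server_id TEXT NOT NULL,
    PRIMARY KEY (deck_id, server_id)
);
""")
db.executemany("INSERT INTO devices VALUES (?, ?)",
               [("iphone", "apns-iphone"), ("ipad", "apns-ipad")])
db.execute("INSERT INTO servers VALUES ('tower', 'hook-tower')")
db.executemany("INSERT INTO subscriptions VALUES (?, ?)",
               [("iphone", "tower"), ("ipad", "tower")])

def recipients(db, webhook_token):
    """Fan-out lookup: every push token subscribed to the server behind a webhook."""
    rows = db.execute(
        "SELECT d.apns_token FROM servers sv "
        "JOIN subscriptions s ON s.server_id = sv.server_id "
        "JOIN devices d ON d.deck_id = s.deck_id "
        "WHERE sv.webhook_token = ? ORDER BY d.deck_id", (webhook_token,))
    return [token for (token,) in rows]

def unbind(db, deck_id, server_id):
    """Remove only this device's own subscription row."""
    db.execute("DELETE FROM subscriptions WHERE deck_id = ? AND server_id = ?",
               (deck_id, server_id))

before = recipients(db, "hook-tower")
unbind(db, "iphone", "tower")
after_iphone_left = recipients(db, "hook-tower")
unbind(db, "ipad", "tower")
after_everyone_left = recipients(db, "hook-tower")
```

Once the last row is gone the lookup returns an empty list, so a webhook call for that Token reaches no one without any explicit reference counting.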
Conclusion: A Long Road Ahead
From the failure of V1’s local polling, to tasting success with the KV-based V2, to finally yielding to complex real-world scenarios and rebuilding the core relay mechanism on D1 in V3: this little companion “relay mini-system” alone has gone through three major versions from start to finish.
All this tossing and turning is not just to guard against malicious external pushes (we even considered how to issue Tokens safely from the cloud and how to revoke a leaked Token at any time to stop abuse), but even more to balance a “seamless experience where users don’t need to configure anything” with an “extremely strict privacy design for the backend infrastructure”.
This should probably be the final implementation plan.
At present, I have finished developing both the backend Worker and the native Unraid plugin, closing the loop so that Webhook URLs are assembled fully automatically between the cloud and the Unraid system. So that everyone can see exactly how their messages are sent and relayed, the code for both projects is 100% open source.
You can find the source code here:
- Cloudflare Worker Relay Service: unraid-push-worker
- Unraid Native Plugin: unraid-deck-agent
There are no shortcuts to improving experience, only repeatedly tearing down and starting over. I hope this messaging system, bearing countless compromises and hard work, can bring everyone a truly usable Unraid management experience.