Crystaria Online is a classic text-based browser MMORPG built with PHP, MySQL, and vanilla JS from the whole AJAX-fillDiv lineage. Last week a player told me the thing every solo dev dreads to hear: "the game's super laggy for me," and a single Network-tab screenshot turned that into a concrete fix.
The symptom that didn't add up
The server looked fine. Database queries were indexed and fast, and the tables were tiny at 31 accounts. So I asked the player to open DevTools Network and just walk around.
The screenshot was the whole case. The chatroom refresh, on a 4-second poll, normally took 55ms but was spiking to 15.6s. Movement normally took about 250ms but was spiking to 8.8s. Maintenance Mode checks, which hit a trivial JSON endpoint that doesn't even open a session, were taking 11 seconds, alongside a storm of "(cancelled)" requests.
When a session-less endpoint takes 11 seconds, it's not slow code. It's queueing. Something was eating the worker pool, and requests that should be instant were sitting in line.
The culprit: PHP's session lock
PHP's default session handler takes an exclusive file lock from session_start() until the script ends, or until you call session_write_close(). Only one request per session runs at a time. That's invisible until you have a game that does a heavy render per action and polls in the background.
move.php held the lock through its entire render: load character, update position, then include the big tile and encounter renderer. That meant about 300 to 900ms with the lock held. Walk quickly and each step queues behind the last.
The 4-second chat poll had to grab the same lock, so a poll that landed mid-sprint waited behind the whole backlog of queued moves. Seventeen moves × 800ms is about 13.6 seconds of queue, which is how a 55ms poll becomes a 15-second chat message. While those requests sat blocked, they occupied workers, so even the lock-free maintenance ping starved.
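A rough model of that queueing, not the game's code: it treats the session lock as a strict first-come-first-served queue (a simplification, since real file locks don't promise ordering), and the arrival times and hold durations are made-up numbers chosen to resemble a sprint.

```javascript
// Model: one session lock, requests are served strictly one at a time.
// Each request is { name, arriveMs, holdMs }; holdMs is how long it keeps the lock.
function simulateSessionQueue(requests) {
  const sorted = [...requests].sort((a, b) => a.arriveMs - b.arriveMs);
  let lockFreeAt = 0;
  return sorted.map((req) => {
    // A request starts when it has arrived AND the lock is free.
    const startMs = Math.max(req.arriveMs, lockFreeAt);
    const endMs = startMs + req.holdMs;
    lockFreeAt = endMs;
    return {
      name: req.name,
      waitedMs: startMs - req.arriveMs,
      totalMs: endMs - req.arriveMs,
    };
  });
}

// Hypothetical sprint: 17 moves fired 100ms apart, then a chat poll
// (55ms of real work) arriving right behind them on the same session.
function sprintScenario(moveHoldMs) {
  const moves = Array.from({ length: 17 }, (_, i) => ({
    name: `move${i + 1}`,
    arriveMs: i * 100,
    holdMs: moveHoldMs,
  }));
  const chat = { name: "chat", arriveMs: 1700, holdMs: 55 };
  return simulateSessionQueue([...moves, chat]).find((r) => r.name === "chat");
}

console.log(sprintScenario(800)); // lock held through the whole render
console.log(sprintScenario(50));  // lock held only briefly
```

With 800ms holds, the chat poll in this model waits 11.9 seconds before it even starts. With 50ms holds, it doesn't wait at all. The chat endpoint's own cost never changes; only the time spent in line does.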
Two fixes, because there were two bugs
The client-side poll loop
The client was making it worse. The chat poll fired a blind setTimeout(readChatroom, 4000) on a shared XHR object. When the server was slow, the next timer fired before the previous poll returned, and calling .open() on a busy XHR aborts it, which caused the "(cancelled)" storm.
So the client kept hammering a server it was already overwhelming. The fix was a self-chaining poll on its own dedicated XHR, scheduling the next request only after the current one finished. The abort storm vanished and chat dropped from 15s back to about 60ms.
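A minimal sketch of a self-chaining poll, assuming a callback-style request function. The names startChainedPoll and makeChatRequest, the URL, and the renderChat handler are illustrative, not taken from the game's source.

```javascript
// Self-chaining poll: the next request is scheduled only from the completion
// callback of the current one, so at most one poll is ever in flight.
// `request(onDone)` must call onDone exactly once, on success OR failure.
function startChainedPoll(request, delayMs, schedule = setTimeout) {
  let stopped = false;
  function tick() {
    if (stopped) return;
    request(function onDone() {
      if (!stopped) schedule(tick, delayMs);
    });
  }
  tick();
  return function stop() {
    stopped = true;
  };
}

// Browser-side request on its own XHR object, so nothing else can call
// .open() on it and abort a poll that is still in flight.
function makeChatRequest(url, onMessages) {
  return function request(onDone) {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.onload = function () {
      onMessages(xhr.responseText);
    };
    xhr.onloadend = onDone; // fires after load, error, abort, or timeout
    xhr.send();
  };
}

// Hypothetical usage in the page:
// startChainedPoll(makeChatRequest("readChatroom.php", renderChat), 4000);
```

The 4-second delay now measures the gap between the end of one poll and the start of the next, so a slow server automatically gets polled less often instead of more.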
The session-lock fix
The real cure was releasing the session lock before the heavy render with session_write_close() in move.php. That was simple in theory, except the renderer leaned on $_SESSION throttle flags for once-per-session schema checks, per-session spawn timers, and a world-boss spawn cooldown.
Close the session before those and they silently fail to persist, so they re-fire on every single move. The fix was moving those throttles to a tiny file-backed store. As a bonus they became global, with one world-boss check per minute for the whole realm instead of once-per-session-per-player.
Now move.php does its authoritative writes, releases the lock, and renders lock-free. Sprinting moves stay sub-400ms and stop blocking the chat poll.
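The file-backed throttle idea can be sketched roughly as follows. The game's store is PHP and its details aren't shown here, so this is a JavaScript (Node) illustration of the same pattern; throttleAllows, the file naming, and the key names are all assumptions.

```javascript
const fs = require("fs");
const path = require("path");

// Returns true at most once per intervalMs for a given key, across every
// request that shares `dir`. State lives in a small file rather than in the
// session, so it still persists after the session has been closed.
// Assumes keys are simple identifiers that are safe to use in a filename.
function throttleAllows(dir, key, intervalMs, nowMs = Date.now()) {
  const file = path.join(dir, `throttle_${key}.stamp`);
  let last = null;
  try {
    const parsed = Number(fs.readFileSync(file, "utf8"));
    if (Number.isFinite(parsed)) last = parsed;
  } catch (err) {
    if (err.code !== "ENOENT") throw err; // a missing file means "never fired"
  }
  if (last !== null && nowMs - last < intervalMs) return false;
  // Not atomic: two simultaneous requests could both pass the check.
  // That is acceptable for a throttle where a rare double-fire is harmless.
  fs.writeFileSync(file, String(nowMs));
  return true;
}

// Hypothetical usage: one world-boss spawn check per minute for the realm.
// if (throttleAllows("/tmp/game-throttles", "world_boss", 60000)) { ... }
```

Because the key carries no player or session id, the throttle is global by construction, which matches the once-per-realm behavior described above.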
The twist: fixing lag broke loot
A day later I got: "item drops are flagging invalid loot." It wasn't a new bug. It was a very old one that had simply never had the chance to fire.
Loot was looked up by its per-kill slot index, 1, 2, 3, with no owner filter. When two players had un-taken drops at the same moment, one player's Take matched the other player's row, failed the ownership check, and threw "Invalid loot."
It had been latent for as long as the code existed, but the game had never run smoothly enough for two people to be actively fighting at the same time. The performance fix raised concurrency, and concurrency surfaced the concurrency bug. The fix was one line: scope the lookup to the player.
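Here is an in-memory model of the bug and the fix. The real change was in the lookup query, which isn't shown in this post, so the row fields, player ids, and item names below are invented for illustration.

```javascript
// Stand-in for the loot table: two players each have an un-taken drop in slot 1.
const drops = [
  { id: 101, ownerId: 7, slot: 1, item: "Rusty Dagger" },
  { id: 102, ownerId: 9, slot: 1, item: "Wolf Pelt" },
];

// Buggy shape: the slot index alone picks the row; ownership is checked afterwards.
function takeLootUnscoped(rows, playerId, slot) {
  const row = rows.find((r) => r.slot === slot);
  if (!row || row.ownerId !== playerId) {
    return { ok: false, error: "Invalid loot" };
  }
  return { ok: true, item: row.item };
}

// Fixed shape: the lookup itself is scoped to the player.
function takeLootScoped(rows, playerId, slot) {
  const row = rows.find((r) => r.ownerId === playerId && r.slot === slot);
  if (!row) return { ok: false, error: "Invalid loot" };
  return { ok: true, item: row.item };
}

console.log(takeLootUnscoped(drops, 9, 1)); // matches player 7's row first
console.log(takeLootScoped(drops, 9, 1));   // finds player 9's own row
```

With only one player holding drops, both versions behave identically, which is why the unscoped lookup went unnoticed until two players were fighting at once.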
Performance work changes your concurrency profile, and that's where the dormant multiplayer bugs live.
Lessons and release notes
- Close the session early. Call session_write_close() as soon as authoritative writes are done. For a polling game, that's not an optimization; it's a requirement.
- Chain polling requests. Never fire a blind polling timer on a shared connection. Schedule the next request from the previous one's completion.
- Move throttles out of $_SESSION. If you want to close the session early, keep them in a file, APCu, or the DB instead. A global throttle is usually the more correct design anyway.
- Re-test multiplayer edges after performance fixes. That's where the bugs that were always there but never had a witness show up.
The same release also shipped fully rebindable keybinds for movement and action keys, a mobile fix where combat dropdowns were opening off-screen, and a World Boss re-tune so the realm's titans actually land their hits.
Crystaria runs smooth under load now, which mostly means more of you are online at once.