Friday, September 4, 2015

We Are at War

Greetings,

It has been a year since I ventured into the modding scene, and I am very glad to see the game coming together nicely. But there is one big issue that has been lingering since the introduction of the Reborn client: the late-game spike caused by a memory leak.

I have been getting a lot of complaints about this, and I feel for all of you. Being a 9-year veteran, I know how crucial it is to have a lag-free environment where you can actually see the approaching 3k nuke of death and react appropriately. With this post, I would like to clear up some misunderstandings and introduce a possible solution.


Why is it so difficult to fix?

1. The main client and the Tools client are not identical

Not everything that works fine on the developer platform works in the main game. The most recent example is the hero selection bug on 9/1: I was shocked to get a report saying all Servants were unpickable on the main client just as I was testing Rider (4th) on the Tools client. There have easily been more than 100 inconsistencies like this over the year, and a few still remain, including the cosmetic API.

2. The beta traps are everywhere

Ever since the introduction of Dota 2 Reborn, a Tinker main had been getting insane lag as his games progressed. He tried reinstalling Dota 2, did a clean install of his OS, and tried just about anything a Dota 2 addict could possibly imagine. Nothing worked.

Later, he found out that a particle attached to the Tinker Immortal was causing a memory leak.

If you think that's funny, here's another. Some custom games, especially the ones that revolve around terrain modification, were suffering from lag. Those games were not even close to being resource-intensive and had never had such a problem back in the Source 1 days.

It was fixed as of July 23, after Valve found out that a function that gets all trees around a certain point was causing a memory leak.

There are more, but just keep in mind that we are still in alpha from a custom game developer's perspective. The Source 2 library packs some really nasty APIs that can make your game suffer as a whole for the simplest task.


Proposed Solution

With all that said, you may ask: Alright Dun, tough luck, but does that mean we should just sit and watch until Valve perfects their engine?

To be honest, this may well be a sloppy job on my end, and that is the worst part of the issue: I do not know whose fault it is. Given that I have no prior experience with code or particle optimization, it is quite possible that Valve is innocent and I am just making silly excuses. Knowing that Valve is unlikely to spoon-feed me information, it is about time I took more serious action.

As mentioned above, we do not know what is causing the memory leak yet, so the first approach should be a process of elimination, with a more focused investigation to follow. Over the next few days I will be hosting test games, each with an exclusion rule (no items, no abilities, etc.). Once we find a house rule under which no spike occurs, we will narrow the exclusion within that rule, banning only part of the category at a time, until we find the culprit.
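For the curious, the two-stage search described above can be sketched roughly as follows. This is only an illustration in Python, not the actual test setup: the feature names, the `spikes` callback (which stands in for hosting one test game and watching for the spike) and the assumption that a single feature is responsible are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# One "test game": takes the set of excluded features and returns True
# if the late-game spike still occurred. (Hypothetical stand-in.)
SpikeTest = Callable[[FrozenSet[str]], bool]


def find_clean_category(categories: Dict[str, List[str]],
                        spikes: SpikeTest) -> Optional[str]:
    """Stage 1: exclude one whole category per test game.

    Returns the first category whose exclusion makes the spike disappear,
    or None if the spike survives every single-category exclusion.
    """
    for name, features in categories.items():
        if not spikes(frozenset(features)):
            return name
    return None


def narrow_down(features: List[str], spikes: SpikeTest) -> str:
    """Stage 2: halve the suspect list until one feature is left.

    Assumes exactly one feature in `features` causes the spike.
    """
    suspects = list(features)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if spikes(frozenset(half)):
            # Spike persists with `half` excluded: culprit is in the rest.
            suspects = suspects[len(half):]
        else:
            # Spike vanished: culprit is among the excluded half.
            suspects = half
    return suspects[0]


if __name__ == "__main__":
    # Made-up feature names, purely for demonstration.
    categories = {
        "items": ["item_a", "item_b", "item_c"],
        "abilities": ["ability_a", "ability_b", "ability_c", "ability_d"],
    }
    pretend_culprit = "ability_c"
    fake_test = lambda excluded: pretend_culprit not in excluded
    category = find_clean_category(categories, fake_test)
    if category is not None:
        print(category, narrow_down(categories[category], fake_test))
```

Halving the suspects means a category of N features needs roughly log2(N) test games instead of N, which matters when every "test" is a full lobby. If more than one thing leaks, or the leak only shows up through an interaction between features, this simple version will not be enough.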

I would very much appreciate it if everyone reading this post would cooperate and join the test lobby when you see one.





Here is some food for thought:

1. On the Tools client, I once ran a script for 11 hours that ordered all heroes to use their abilities whenever they came off cooldown, and had no issue with lag. This suggests the memory leak only happens on the main client and involves networking.

2. Investigation so far suggests that higher-skilled games are more likely to hit the spike at an earlier stage of the game.

3. A seemingly random crash is likely caused by the host's memory not being able to keep up with the memory leak.