Hi,
There are numerous "String number" fields in the CHK format, e.g. for "TRIG":
u32: String number for trigger text (0 means no string)
For some reason in "MRGN" it's a u16. I don't see how the index could be a different size between different sections.
u16: String number of the name of this location
Is this number the index in the "STR " string offsets table or is it the offset itself? If the former, why is it one indexed?
In either case, I see a problem when removing strings from "STR ":
1. Initialize "STR " to its default values (e.g. "Anywhere", "Untitled Scenario", etc.)
2. Add locations with string names, e.g. "Location Foobar", "Location Foof". Update "STR " with these new strings by updating the string count and the string offsets and appending to the string data. Replace the string references in the locations with the newly allocated string numbers in "STR ".
3. Add interpreted "TRIG" data that has strings in place of string numbers. Do the same as was done above for "MRGN" (also check for strings that already exist).
4. Remove unused strings from the "STR " section. Update the string count. Update the offsets: I have to shift every offset left or right by the length of the removed string, otherwise they won't point to the correct start of their strings.
But since the string number is an index, I potentially need to update EVERY STRING NUMBER in EVERY SECTION, since the indices could have shifted by 1!
Removing strings seems like a really "expensive" or at least "complex" operation that needs to potentially modify every other CHK section that uses strings. Is this correct? Or is there a much simpler way to remove strings?
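To make step 2 concrete, here is a minimal sketch that rebuilds the whole "STR " payload (u16 count, u16 offsets, then NUL-terminated character data) from a list of strings instead of patching offsets in place. The function name `build_str_section` and the latin-1 encoding are assumptions made for illustration, not something the spec or any editor prescribes.

```python
import struct

def build_str_section(strings):
    """Serialize strings into an STR payload (no 8-byte section header).

    Assumed layout: u16 string count, one u16 offset per string
    (relative to the start of the payload), then NUL-terminated data.
    """
    count = len(strings)
    table_size = 2 + 2 * count  # count field plus the offsets table
    offsets = []
    data = bytearray()
    for text in strings:
        offsets.append(table_size + len(data))
        # latin-1 is an assumption; any single-byte encoding works the same way
        data += text.encode("latin-1") + b"\x00"
    if any(offset > 0xFFFF for offset in offsets):
        raise ValueError("string data no longer addressable with u16 offsets")
    return struct.pack("<H%dH" % count, count, *offsets) + bytes(data)
```

Rebuilding from a list means the offsets never need shifting by hand; only the string numbers stored in other sections remain a concern.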
it's a u16. I don't see how the index could be a different size between different sections.
That's correct, the index is a different size in different sections, which is just poor planning by the Blizzard devs. Because of the limitations in the STR section, though, no index higher than 8192 can feasibly be used in a non-experimental context.
Removing strings seems like a really "expensive" or at least "complex" operation that needs to potentially modify every other CHK section that uses strings. Is this correct?
It is expensive depending on the process you follow and the level of abstraction you give to strings. If you operate on the section in raw form, then something like what you described, or something a little different, is indeed required. As far as StarEdit/SCMDraft/Chkdraft go, there are a few rules of thumb:
- Users can't effectively use the string section directly
- Strings only exist if they are used somewhere in the map, any other data present in the STR section is eligible for immediate overwrite or deletion
- When a string is removed or replaced in any section like MRGN, it triggers an immediate usage check: if the string is unused everywhere it can be removed from the string data, otherwise it stays
- A string being completely removed leaves fragmented stringIndexes. We don't update the other sections (TRIG, MRGN, etc.) to shift the indexes by 1 or anything like that; the offset for the removed stringIndex is set to zero, and the string character data/offsets for the rest of the strings get rebuilt/de-fragmented
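The last two rules can be sketched on an in-memory table where each slot holds the text of one string number, or `None` once it has been released. This is only an illustration of the idea, not Chkdraft's code: `release_if_unused`, `slots`, and `used_ids` are hypothetical names, and collecting `used_ids` from TRIG, MRGN, SPRP, and so on is left to the caller.

```python
def release_if_unused(slots, string_id, used_ids):
    """Free a string slot without renumbering anything else.

    slots[i] holds the text of string number i + 1, or None if unused.
    used_ids is the set of string numbers still referenced by any section.
    Returns True if the slot was released.
    """
    if string_id == 0 or string_id in used_ids:
        return False  # "no string", or still in use somewhere in the map
    slots[string_id - 1] = None  # leave a hole; other string numbers stay valid
    return True
```

Because the hole stays in place, no other section has to be touched; only the character data needs rebuilding the next time the section is written out.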
Post has been edited 1 time(s), last time on Mar 17 2019, 2:11 pm by jjf28.
TheNitesWhoSay - Clan Aura -
github
Reached the top of StarCraft theory crafting 2:12 AM CST, August 2nd, 2014.
That's correct, the index is a different size in different sections, which is just poor planning by the Blizzard devs. Because of the limitations in the STR section, though, no index higher than 8192 can feasibly be used in a non-experimental context.
Where does it say the limit to strings is 8192? The spec says any number can be used:
This section can contain more or less then 1024 string offsests and will work in Starcraft
It uses u16 for the sizes, so I'd expect the maximum number of strings to be around 65535 (2^16 - 1).
A string being completely removed leaves fragmented stringIndexes
So, to confirm, the string number is the index as in the string offsets table but starts at 1? For example, if my first string is "Untitled Scenario", in my TRG struct I would refer to "String ID 1" instead of String ID 0?
A string being completely removed leaves fragmented stringIndexes. We don't update the other sections (TRIG, MRGN, etc.) to shift the indexes by 1 or anything like that; the offset for the removed stringIndex is set to zero, and the string character data/offsets for the rest of the strings get rebuilt/de-fragmented
I guess the fragmentation isn't a problem unless the map really does use many unique strings in which case you will run out of space and be forced to defragment the string indices.
So I think this is incorrect behavior. I can imagine a scenario/edge case where a user will get this problem and have no idea why their map isn't working or getting empty strings. I think a complete editor would defragment the string indices, or at least if the limit of 8192 is going to be reached.
This behavior should be tested in one of your test cases too.
Also, does Blizzard's campaign editor not do this? Like if I stop using a string, does it stay in the "STR " and it's still fragmented?
Where does it say the limit to strings is 8192? The spec says any number can be used:
http://www.staredit.net/352316/
http://www.staredit.net/352320/
I do the math in these two posts; I guess it was 16384 (for the assumptions listed there).
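For a rough feel of where a figure in that range can come from, here is a back-of-the-envelope sketch. The assumptions are mine and not necessarily the ones in the linked posts: the whole payload must stay within the 65536 bytes a u16 offset can address, and every string costs one u16 offset plus at least one character and its NUL terminator. `max_string_count` is a hypothetical helper.

```python
def max_string_count(min_bytes_per_string=2, section_limit=0x10000):
    """Upper bound on distinct strings under the stated assumptions.

    Two bytes go to the count field; each string then needs a 2-byte
    offset entry plus min_bytes_per_string bytes of character data.
    """
    return (section_limit - 2) // (2 + min_bytes_per_string)

print(max_string_count())  # 16383
```

Longer strings lower the bound quickly, which is why the practical limit sits well below what a u16 count field alone would suggest.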
So, to confirm, the string number is the index as in the string offsets table but starts at 1? For example, if my first string is "Untitled Scenario", in my TRG struct I would refer to "String ID 1" instead of String ID 0?
The 0th index string is always the "No String" string, which is always a NUL character in the "STR " section (similar to how the Anywhere location is always at index 63) - think of strings as zero based, but effectively the first string you can use is the string at index 1 (maybe you could alter the string at index zero and use it in some places, but I would expect that to cause bugs in editors, confuse the mapper in sections like SPRP where string 0 has special behavior, and otherwise result in undocumented behavior in StarCraft).
So I think this is incorrect behavior. I can imagine a scenario/edge case where a user will get this problem and have no idea why their map isn't working or getting empty strings. I think a complete editor would defragment the string indices, or at least if the limit of 8192 is going to be reached.
I see it too, that's why I have a compress method that actually does de-fragment the indexes:
https://github.com/jjf28/Chkdraft/blob/master/Chkdraft/src/MappingCore/Scenario.cpp#L2253
But of course that goes through and makes changes to every section, and as such is very expensive, so it should only be tried after an out of space error is thrown by the regular string methods.
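The general shape of such a compress pass might look like the sketch below. It is not a transcription of the linked Chkdraft method; `compress`, `remap_ids`, and the slot list (text or `None` per string number) are hypothetical names used only to show why every section has to be revisited.

```python
def compress(slots):
    """Pack used strings to the front of the table.

    slots[i] holds the text of string number i + 1, or None if unused.
    Returns (new_slots, remap), where remap maps old string numbers to new ones.
    """
    new_slots = []
    remap = {0: 0}  # string number 0 always means "no string"
    for index, text in enumerate(slots):
        if text is not None:
            new_slots.append(text)
            remap[index + 1] = len(new_slots)
    return new_slots, remap

def remap_ids(string_ids, remap):
    """Rewrite the string numbers stored in one section using remap."""
    return [remap[string_id] for string_id in string_ids]
```

`remap_ids` would have to run over every field in every section that stores a string number, which is the expensive part.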
The behavior of the regular string methods is just to pick up one of the unused indices and use it.
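A minimal sketch of that "pick up an unused index" behavior, again on a hypothetical slot list (text or `None` per string number). `add_string`, `OutOfStringSlots`, and the `max_slots` default of 1024 are assumptions for illustration; the duplicate check mirrors the "check for strings that already exist" step mentioned earlier in the thread.

```python
class OutOfStringSlots(Exception):
    """Raised when no free string number is left (hypothetical error type)."""

def add_string(slots, text, max_slots=1024):
    """Return the string number for text, reusing holes before growing."""
    for index, existing in enumerate(slots):
        if existing == text:
            return index + 1  # already present: share the existing number
    for index, existing in enumerate(slots):
        if existing is None:
            slots[index] = text  # fill the first hole
            return index + 1
    if len(slots) >= max_slots:
        raise OutOfStringSlots("no free string number; a compress pass may help")
    slots.append(text)
    return len(slots)
```

Only when `OutOfStringSlots` is raised would a caller fall back to the expensive de-fragmenting pass.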
Post has been edited 3 time(s), last time on Mar 17 2019, 6:20 pm by jjf28.
The 0th index string is always the "No String" string, which is always a NUL character in the "STR " section (similar to how the Anywhere location is always at index 63) - think of strings as zero based, but effectively the first string you can use is the string at index 1 (maybe you could alter the string at index zero and use it in some places, but I would expect that to cause bugs in editors, confuse the mapper in sections like SPRP where string 0 has special behavior, and otherwise result in undocumented behavior in StarCraft).
Hmm. I must be parsing the STR section wrong, because the 0th offset does not point to a null character but the start of the very first string, "Untitled Scenario".
Here is my model for how strings are looked up. Can you explain how the 0th string resolves to a null character with my model?
1. Load table of offsets from STR section.
2. See a String ID of "X".
3. Get offset of X, i.e. offset = offsets[X]
4. Go to that offset in the strings data, i.e. STR[offset], the byte at that offset in the whole STR section.
5. Read characters until a null terminator is reached.
6. Return the string from the bytes read.
In my STR, the first entry in offsets (i.e. offsets[0]) is 4050. In the binary data of STR (not including headers), the first character at 4050 is "U". I read all the chars and get "Untitled Scenario". I don't have a null terminator followed by another null terminator or some shenanigans like that.
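Written out literally, the six steps above come to something like the following sketch. It works on the raw payload (no section header) and treats X as a zero-based index into the offsets table, exactly as the model describes; `read_string_at_index` is a hypothetical name and latin-1 is an assumed encoding.

```python
import struct

def read_string_at_index(str_payload, index):
    """Read the string whose offset is stored at offsets[index]."""
    (count,) = struct.unpack_from("<H", str_payload, 0)
    if not 0 <= index < count:
        raise IndexError("offset index out of range")
    # Steps 1-3: fetch the u16 offset for this index from the table.
    (offset,) = struct.unpack_from("<H", str_payload, 2 + 2 * index)
    # Steps 4-5: scan from that offset up to the NUL terminator.
    end = str_payload.index(b"\x00", offset)
    # Step 6: decode the bytes that were read.
    return str_payload[offset:end].decode("latin-1")
```

An unused entry whose offset points at a lone NUL byte comes back as an empty string with this reader.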
Here is my actual implementation for reference:
https://github.com/sethmachine/chkjson/blob/master/src/chkjson/section/chkstr.py
Oh I'm wrong...
The character data by default starts with a NUL character, but the first offset doesn't point to it; any unused strings in the "STR " section point to it.
I generated a map in StarEdit a moment ago, pulled out the scenario file, and checked in HDX:
u16 numStrings = 0x0400
u16* offset[0] = 0x0803
u16* offset[8+] = 0x0802
u8* sectionData[0x802] = 0x00
u8* sectionData[0x803+] = "Untitled Scenario"
String ID of "1" (as in SPRP first two bytes, for scenario title) corresponds to the zeroth offset (which points us to "Untitled Scenario").
Post has been edited 3 time(s), last time on Mar 17 2019, 6:25 pm by jjf28.
Basically, any string ID that references the map string table should be treated as: if 0 -> don't access the table; else: access table index stringID - 1.
I'm just lazy: when I load the map string table into my in-memory format I insert a dummy empty string at index 0 and shift all the other indices over by one.
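Both variants are tiny when the table is already loaded as a plain list of strings. The names `resolve_string_id` and `with_dummy_zero` are hypothetical; the sketch only restates the rule above.

```python
def resolve_string_id(strings, string_id):
    """Map a string ID from TRIG/MRGN/SPRP to its text.

    strings is the table in file order, so strings[0] is string ID 1.
    Returns None for string ID 0 ("no string").
    """
    if string_id == 0:
        return None
    return strings[string_id - 1]

def with_dummy_zero(strings):
    """Prepend an empty entry so string IDs can index the list directly."""
    return [""] + list(strings)
```

With the dummy entry in place, `with_dummy_zero(strings)[string_id]` gives the same text as `resolve_string_id(strings, string_id)` for every non-zero ID.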
Normally you would delete strings first and then add new ones, in which case you usually don't get much fragmentation, because adding strings reuses the unused indices near the start of the table.