Strings in CHK format
Mar 17 2019, 1:03 pm
By: sethmachine  

Mar 17 2019, 1:03 pm sethmachine Post #1



Hi,

There are numerous "String number" fields in the CHK format, e.g. for "TRIG":

Quote
u32: String number for trigger text (0 means no string)

For some reason in "MRGN" it's a u16. I don't see how the index could be a different size between different sections.
Quote
u16: String number of the name of this location

Is this number the index in the "STR " string offsets table or is it the offset itself? If the former, why is it one indexed?

In either case, I see a problem when removing strings from "STR ":

1. Initialize "STR " to its default values (e.g. "Anywhere", "Untitled Scenario", etc.)
2. Add locations with string names, e.g. "Location Foobar", "Location Foof". Update "STR " with these new strings by updating the string count and the string offsets and appending to the string data. Replace the string references in the locations with the newly allocated string numbers in "STR ".
3. Add interpreted "TRIG" data that has strings in place of string numbers. Do the same as above for "MRGN" (also check for strings that already exist).
4. Remove unused strings from the "STR " section. Update the string count. Update the offsets: I have to shift every later offset by the length of the removed string, otherwise it won't point to the correct start of its string.

But since the string number is an index, I potentially need to update EVERY STRING NUMBER in EVERY SECTION, since the indices could have shifted by 1!

Removing strings seems like a really "expensive" or at least "complex" operation that needs to potentially modify every other CHK section that uses strings. Is this correct? Or is there a much simpler way to remove strings?




Mar 17 2019, 2:06 pm jjf28 Post #2

Cartography Artisan

Quote
it's a u16. I don't see how the index could be a different size between different sections.

That's correct, the index is different sizes in different sections, which is just poor planning by blizzard DEVs, but because of the limitations in the STR section no index higher than 8192 can feasibly be used in a non-experimental context.

Quote
Removing strings seems like a really "expensive" or at least "complex" operation that needs to potentially modify every other CHK section that uses strings. Is this correct?

It is expensive depending on the process you follow and the level of abstraction you give to strings. If you operate on the section in raw form then something like you described or something a little different is indeed required. As far as Staredit/SCMDraft/Chkdraft go there's a few rules of thumb...

- Users can't effectively use the string section directly
- Strings only exist if they are used somewhere in the map, any other data present in the STR section is eligible for immediate overwrite or deletion
- When a string is removed or replaced in any section like MRGN it triggers an immediate usage check, if the string is unused everywhere it can be removed from the string data, else it stays
- A string being completely removed leaves fragmented stringIndexes, we don't update the other sections (TRIG, MRGN, etc.) to shift the indexes by 1 or anything like that, the offset for the stringIndex removed is set to zero, the string character data/offsets for the rest of the strings get rebuilt/de-fragmented
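
A minimal sketch of that last rule, assuming an in-memory list where the list position is the string's slot and `None` marks an unused slot. The names `serialize_str` and `remove_string` are hypothetical and are not Chkdraft's API. Two further assumptions of this sketch: an unused slot points at a single shared NUL byte so that it reads as empty, and character data is encoded as Latin-1.

```python
import struct
from typing import List, Optional


def serialize_str(strings: List[Optional[str]]) -> bytes:
    """Rebuild a "STR " payload from scratch; None marks an unused slot."""
    table_size = 2 + 2 * len(strings)   # u16 count + one u16 offset per slot
    nul_offset = table_size             # shared NUL byte used by unused slots
    data = bytearray(b"\x00")           # character data begins with that NUL
    offsets = []
    for text in strings:
        if text is None:
            offsets.append(nul_offset)
        else:
            offsets.append(table_size + len(data))
            data += text.encode("latin-1") + b"\x00"
    header = struct.pack(f"<H{len(offsets)}H", len(strings), *offsets)
    return header + bytes(data)


def remove_string(strings: List[Optional[str]], slot: int) -> None:
    """Free one slot; every other slot keeps its position."""
    strings[slot] = None


strings = ["Untitled Scenario", "Location Foobar", "Location Foof"]
remove_string(strings, 1)
payload = serialize_str(strings)  # character data is rebuilt without the hole
```

Because the freed slot stays in place, no other section has to be touched: references to the third string still resolve to "Location Foof".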




Rs_yes-im4real - Clan Aura - jjf28.net84.net

Reached the top of StarCraft theory crafting 2:12 AM CST, August 2nd, 2014.

Mar 17 2019, 4:12 pm sethmachine Post #3



Quote
That's correct, the index is different sizes in different sections, which is just poor planning by blizzard DEVs, but because of the limitations in the STR section no index higher than 8192 can feasibly be used in a non-experimental context.

Where does it say the limit to strings is 8192? The spec says any number can be used:

Quote
This section can contain more or less than 1024 string offsets and will work in Starcraft

It uses u16 as the sizes, so I'd expect the maximum strings to be around 65536 (2^16).

Quote
A string being completely removed leaves fragmented stringIndexes

So, to confirm, the string number is the index as in the string offsets table but starts at 1? For example, if my first string is "Untitled Scenario", in my TRG struct I would refer to "String ID 1" instead of String ID 0?

Quote
A string being completely removed leaves fragmented stringIndexes, we don't update the other sections (TRIG, MRGN, etc.) to shift the indexes by 1 or anything like that, the offset for the stringIndex removed is set to zero, the string character data/offsets for the rest of the strings get rebuilt/de-fragmented

I guess the fragmentation isn't a problem unless the map really does use many unique strings in which case you will run out of space and be forced to defragment the string indices.

So I think this is incorrect behavior. I can imagine a scenario/edge case where a user will get this problem and have no idea why their map isn't working or getting empty strings. I think a complete editor would defragment the string indices, or at least if the limit of 8192 is going to be reached.

This behavior should be tested in one of your test cases too.

Also, does Blizzard's campaign editor not do this? Like if I stop using a string, does it stay in the "STR " and it's still fragmented?




Mar 17 2019, 4:31 pm jjf28 Post #4

Cartography Artisan

Quote
Where does it say the limit to strings is 8192? The spec says any number can used:

http://www.staredit.net/352316/
http://www.staredit.net/352320/

I do the math in these two posts; I guess it was 16384 for the assumptions listed there.
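
As a rough illustration of where limits of this size can come from: the assumptions in the linked posts are not reproduced here, so the ones below are this sketch's own. It assumes the whole section must fit in the 65536 bytes a u16 offset can address, that the section starts with a 2-byte string count, and that each string costs one 2-byte offset plus its character data. `max_strings` is a hypothetical helper.

```python
def max_strings(bytes_per_string: int, section_limit: int = 65536) -> int:
    """Upper bound on the string count under the assumptions stated above.

    bytes_per_string is the character data per string, including its NUL.
    """
    return (section_limit - 2) // (2 + bytes_per_string)


print(max_strings(2))  # one character plus NUL per string: 16383
print(max_strings(6))  # five characters plus NUL per string: 8191
```

Under these assumptions the bound depends heavily on the average string length, which is why different sets of assumptions give figures near 16384 or near 8192.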

Quote
So, to confirm, the string number is the index as in the string offsets table but starts at 1? For example, if my first string is "Untitled Scenario", in my TRG struct I would refer to "String ID 1" instead of String ID 0?

The 0th index string is always the "No String" string, which is always a NUL character in the "STR " section (similar to how the Anywhere location is always at index 63) - think of strings as zero based, but effectively the first string you can use is the string at index 1 (maybe you could alter the string at index zero and use it in some places, but I would expect that to cause bugs in editors, confuse the mapper in sections like SPRP where string 0 has special behavior, and otherwise result in undocumented behavior in StarCraft).

Quote
So I think this is incorrect behavior. I can imagine a scenario/edge case where a user will get this problem and have no idea why their map isn't working or getting empty strings. I think a complete editor would defragment the string indices, or at least if the limit of 8192 is going to be reached.

I see it too; that's why I have a compress method that actually does de-fragment the indexes https://github.com/jjf28/Chkdraft/blob/master/Chkdraft/src/MappingCore/Scenario.cpp#L2253 but of course that goes through and makes changes to every section, so it is very expensive and should only be tried after an out-of-space error is thrown by the regular string methods.
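
A minimal sketch of what such a compress step involves. This is not the linked Chkdraft code. It assumes an in-memory list with `None` for unused slots and 1-based string IDs with 0 meaning "no string"; `compact`, `remap_ids` and the two ID lists are hypothetical stand-ins for the fields in sections such as MRGN and TRIG.

```python
from typing import Dict, List, Optional, Tuple


def compact(strings: List[Optional[str]]) -> Tuple[List[str], Dict[int, int]]:
    """Drop unused slots; return the packed list and an old-ID -> new-ID map."""
    packed: List[str] = []
    remap: Dict[int, int] = {0: 0}  # ID 0 ("no string") never moves
    for slot, text in enumerate(strings):
        if text is not None:
            packed.append(text)
            remap[slot + 1] = len(packed)
    return packed, remap


def remap_ids(ids: List[int], remap: Dict[int, int]) -> List[int]:
    """Rewrite the string IDs stored in one section."""
    return [remap[string_id] for string_id in ids]


strings = ["Untitled Scenario", None, "Location Foof", None, "Hello"]
location_name_ids = [3, 0]  # hypothetical MRGN string numbers
trigger_text_ids = [5, 1]   # hypothetical TRIG string numbers

packed, remap = compact(strings)
location_name_ids = remap_ids(location_name_ids, remap)
trigger_text_ids = remap_ids(trigger_text_ids, remap)
```

The expense comes from the second half: every section that stores a string ID has to be walked and rewritten with the same map.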

The regular string methods just pick up one of the unused indices and use it.
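
A minimal sketch of that behavior, assuming an in-memory list with `None` for unused slots and 1-based string IDs. `add_string` is a hypothetical name, and reusing an identical existing string is an extra assumption of this sketch.

```python
from typing import List, Optional


def add_string(strings: List[Optional[str]], text: str) -> int:
    """Return the 1-based string ID for text, filling a free slot if needed."""
    if text in strings:
        return strings.index(text) + 1  # string already present: reuse it
    for slot, existing in enumerate(strings):
        if existing is None:
            strings[slot] = text        # first unused slot wins
            return slot + 1
    strings.append(text)                # no free slot: grow the table
    return len(strings)


strings = ["Untitled Scenario", None, "Location Foof"]
new_id = add_string(strings, "Location Foobar")  # fills the hole at slot 1
same_id = add_string(strings, "Location Foof")   # already present
```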


Mar 17 2019, 5:35 pm sethmachine Post #5



Quote
The 0th index string is always the "No String" string, which is always a NUL character in the "STR " section (similar to how the Anywhere location is always at index 63) - think of strings as zero based, but effectively the first string you can use is the string at index 1 (maybe you could alter the string at index zero and use it in some places, but I would expect that to cause bugs in editors, confuse the mapper in sections like SPRP where string 0 has special behavior, and otherwise result in undocumented behavior in StarCraft).

Hmm. I must be parsing the STR section wrong, because the 0th offset does not point to a null character but the start of the very first string, "Untitled Scenario".

Here is my model for how strings are looked up. Can you explain how the 0th string resolves to a null character with my model?

1. Load table of offsets from STR section.
2. See a String ID of "X".
3. Get offset of X, i.e. offset = offsets[X]
4. Go to that offset in the strings data, i.e. STR[offset], the byte at that offset within the whole STR section.
5. Read characters until a null terminator is reached.
6. Return the string from the bytes read.
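
The steps above can be sketched as follows. This is an illustration of the model as described, not the linked implementation. `read_string` is a hypothetical name, `x` is taken as a 0-based position in the offsets table, offsets are measured from the start of the section data (after the chunk header), and Latin-1 decoding is an assumption.

```python
import struct


def read_string(str_section: bytes, x: int) -> str:
    """Look up entry x of the offsets table and read up to the NUL terminator."""
    (num_strings,) = struct.unpack_from("<H", str_section, 0)
    offsets = struct.unpack_from(f"<{num_strings}H", str_section, 2)  # step 1
    start = offsets[x]                                                # step 3
    end = str_section.index(b"\x00", start)                           # step 5
    return str_section[start:end].decode("latin-1")                   # step 6


# Tiny hand-built section: count 2, offsets 6 and 24, then the character data.
section = struct.pack("<3H", 2, 6, 24) + b"Untitled Scenario\x00Anywhere\x00"
print(read_string(section, 0))  # Untitled Scenario
print(read_string(section, 1))  # Anywhere
```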

In my STR, the first entry in offsets is 4050 (i.e. offsets[0]). In the binary data of STR (not including headers), the first character at 4050 is "U". I read all the chars and get "Untitled Scenario". I don't have a null terminator followed by another null terminator or some shenanigans like that.

Here is my actual implementation for reference: https://github.com/sethmachine/chkjson/blob/master/src/chkjson/section/chkstr.py




Mar 17 2019, 6:18 pm jjf28 Post #6

Cartography Artisan

Oh, I'm wrong... The character data by default starts with a NUL character, but the first offset doesn't point to it; the unused strings in the "STR " section point to it.

I generated a map in StarEdit a moment ago, pulled out the scenario file, and checked in HDX:
u16 numStrings = 0x0400
u16* offset[0] = 0x0803
u16* offset[8+] = 0x0802
u8* sectionData[0x802] = 0x00
u8* sectionData[0x803+] = "Untitled Scenario"

String ID of "1" (as in SPRP first two bytes, for scenario title) corresponds to the zeroth offset (which points us to "Untitled Scenario").
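
The numbers in that dump follow from the layout: a u16 count plus 1024 u16 offsets occupy 2 + 2 * 1024 = 2050 = 0x802 bytes, so the NUL sits directly after the offsets table and the first string starts one byte later. A minimal sketch that rebuilds such a section, with the simplification (an assumption of this sketch) that only the first slot is used:

```python
import struct

NUM_STRINGS = 0x0400                  # 1024 slots, as in the dump
TABLE_SIZE = 2 + 2 * NUM_STRINGS      # u16 count followed by 1024 u16 offsets
NUL_OFFSET = TABLE_SIZE               # character data starts with one NUL byte
FIRST_STRING_OFFSET = NUL_OFFSET + 1  # the first real string follows that NUL

# Simplification: only slot 0 is used; every other slot points at the NUL.
offsets = [FIRST_STRING_OFFSET] + [NUL_OFFSET] * (NUM_STRINGS - 1)
section = (
    struct.pack(f"<H{NUM_STRINGS}H", NUM_STRINGS, *offsets)
    + b"\x00"
    + b"Untitled Scenario\x00"
)

print(hex(NUL_OFFSET), hex(FIRST_STRING_OFFSET))  # 0x802 0x803
```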


Mar 17 2019, 9:04 pm Suicidal Insanity Post #7

I see you !

Basically, any string ID that references the map string table should be treated as: if 0 -> don't access the table; else: access table index stringID - 1.


I'm just lazy: when I load the map string table into my in-memory format, I insert a dummy empty string at index 0 and shift all the other indices up by one.
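
Both approaches can be sketched side by side. The names `resolve`, `raw_table` and `padded_table` are hypothetical, and the two strings are only illustrative table contents.

```python
from typing import List, Optional

raw_table = ["Untitled Scenario", "Anywhere"]  # strings in file order


def resolve(table: List[str], string_id: int) -> Optional[str]:
    """ID 0 means no string; any other ID reads table index string_id - 1."""
    if string_id == 0:
        return None
    return table[string_id - 1]


# Alternative: prepend a dummy entry so string IDs index the list directly.
padded_table = [""] + raw_table

for string_id in (1, 2):
    assert resolve(raw_table, string_id) == padded_table[string_id]
```

The padded table trades one wasted entry for not having to subtract 1 at every lookup site.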


Normally you would delete strings first and then add new ones, so you usually don't have much fragmentation, because adding strings reuses unused indices near the start of the table.



