Extended string table format discussion
May 15 2019, 8:19 pm
By: Suicidal Insanity  

May 15 2019, 8:19 pm Suicidal Insanity Post #1

I see you !

This is a suggestion for an extended string table format, in order to get around the 65k limit. Blizzard may implement a common suggestion in the future.

Goals are simplicity of changes and backwards compatibility.

To that end I propose the following format:

- Table chunk name: STRx, instead of STR. If the STRx chunk is present it has priority over the STR chunk.
- Requires the 1.22 map format IDs.
- All strings are encoded using UTF-8 (For consistency with SC:R).
- String offsets changed to uint32 from uint16.
- String count changed to uint32 (necessitates 32bit string IDs within SC - this I am not sure about. If 32bit string IDs are not possible, we limit the valid IDs to 16bit)

Other side effects of 32bit string IDs: Strings that are referenced within sections which store 16bit string indices must be stored in the first 65k strings. (Pointed out by jjf28). We can deal with this by reserving a certain string range for these strings in case we are in danger of overflowing into 32bit indices. (E.g. the last 2K indices are skipped when adding strings unless it is a unit name string). String recycling also needs to keep this in mind.

EUD triggers that are added via EUD compilers should be added to the end of the STRx section instead of the STR section - the rest should function as normal.

Before support is added to SC, editors can write both STR and STRx, keeping editor-only strings in STRx while leaving the corresponding indices empty in STR to save space.

I want to keep this simpler than the KSTR section, to minimize the changes required when updating existing tools and especially the game, in the hope of getting this sooner. If string parameters are required they can be stored in a second section, or between the end-of-string NUL byte and the next string. (Although the latter may lead to encoding/decoding errors.)

Thoughts?



-------------------------------------------- UPDATE 2019.08.07 -----------------------------

Current thinking is this:

Code
u32 numEntries; // Number of strings in the section (Default: 1024, in-game indices are limited to 65535, stored as uint32 for alignment and indices above 65k may be used by the editor)
u32[numEntries] stringOffsets; // 1 integer for each string specifying the offset (the spot where the string starts in the section from the start of it).
void * stringData; // All strings in the map, one after another, each NUL terminated, by default starts with one NUL character which all unused stringOffsets point to. Should store string data as UTF-8 encoded, game will likely attempt Korean autodetection regardless but by forcing UTF-8 we don't need to emulate that in the tools.


This chunk requires 'VER ' to be set to one of the remastered formats
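
A minimal reader sketch for the layout above, assuming little-endian integers (as elsewhere in scenario.chk) and that offsets count from the first byte of the chunk body. The name `parse_strx` is hypothetical and not taken from any existing tool.

```python
import struct

def parse_strx(chunk: bytes) -> list[str]:
    """Decode an STRx chunk body into its strings, in table order.

    Assumptions (not confirmed by the game): little-endian u32 fields,
    offsets relative to the start of the chunk body, UTF-8 text.
    """
    (num_entries,) = struct.unpack_from("<I", chunk, 0)
    offsets = struct.unpack_from(f"<{num_entries}I", chunk, 4)
    strings = []
    for off in offsets:
        end = chunk.index(b"\x00", off)  # each string is NUL terminated
        strings.append(chunk[off:end].decode("utf-8"))
    return strings

# Hand-built chunk with two entries: one real string, one unused entry
# that points at the shared NUL byte at the start of the string data.
table_size = 4 + 4 * 2
data = b"\x00" + "Hello".encode("utf-8") + b"\x00"
sample = struct.pack("<I2I", 2, table_size + 1, table_size) + data
print(parse_strx(sample))  # ['Hello', '']
```

Unused entries decode to the empty string because they all point at the single leading NUL.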

Post has been edited 3 time(s), last time on Aug 7 2019, 9:38 pm by Suicidal Insanity.




May 15 2019, 9:47 pm jjf28 Post #2

Cartography Artisan

1.) This should have a data spec:
"STR " (current official STR section)


"STRx" (32bit stringIds, 32bit stringOffsets)


"STRx" (16bit stringIds, 32bit stringOffsets)


2.) For quick reference, here's every CHK section and their usage of strings...
CHK STR Usage


Of particular interest, while many of these would easily support 32-bit stringIds, MRGN, SPRP, FORC, UNIS, and UNIx are all 16-bit stringIds.


3.) The main design gap I feel we need to address is whether we use 16-bit or 32-bit stringIds...

Quote
We can deal with this by reserving a certain string range for these strings in case we are in danger of overflowing into 32bit ids. (E.g. the last 2K ids are skipped when adding strings unless it is a unit name string). String recycling also needs to keep this in mind
MRGN: Up to 255 strings
SPRP: Up to 2 strings
FORC: Up to 4 strings
UNIS: Up to 228 strings
UNIx: Up to 228 strings
Total: 717 strings

I would prefer we make 32-bit stringIds usable, and I like your suggestion, but I want to make it very specific and say stringId 63488-65535 (inclusive, 2048 total) are always reserved for strings from the 16-bit stringId sections, and disallow any string that can use a 32bit id from being used here.
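
One possible reading of this reservation rule, as an editor-side sketch. The function name, the 1-based ID assumption, and the "lowest free ID first" policy are all assumptions for illustration; only the range 63488-65535 and the section counts come from the post.

```python
RESERVED_FIRST = 63488  # first ID reserved for the 16-bit sections
RESERVED_LAST = 65535   # last ID reserved (inclusive)

# Worst-case demand from the 16-bit sections listed above.
SIXTEEN_BIT_DEMAND = {"MRGN": 255, "SPRP": 2, "FORC": 4, "UNIS": 228, "UNIx": 228}

def next_free_id(used: set[int], needs_16bit: bool) -> int:
    """Return the lowest free string ID the caller may take.

    Strings referenced from 16-bit sections may use any ID up to 65535,
    including the reserved range. All other strings skip the reserved
    range and continue at 65536. IDs are assumed to start at 1.
    """
    candidate = 1
    while True:
        if needs_16bit and candidate > RESERVED_LAST:
            raise OverflowError("no 16-bit string ID left")
        in_reserved = RESERVED_FIRST <= candidate <= RESERVED_LAST
        if candidate not in used and (needs_16bit or not in_reserved):
            return candidate
        candidate += 1

used = set(range(1, RESERVED_FIRST))      # every ordinary low ID is taken
print(sum(SIXTEEN_BIT_DEMAND.values()))   # 717
print(RESERVED_LAST - RESERVED_FIRST + 1) # 2048
print(next_free_id(used, needs_16bit=False))  # 65536
print(next_free_id(used, needs_16bit=True))   # 63488
```

With 717 strings at most and 2048 reserved IDs, the 16-bit sections can always be satisfied from the reserved range alone.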

4.) The KSTR Section was designed on the assumption that no changes to StarCraft were being made and that we just wanted to enhance developers ability to document things in the map, not change the amount of loadable strings in the game; if something like this STRx section gets built into StarCraft I would deprecate the KSTR section and put out a one-time migration tool (maybe separate from Chkdraft).

5.)
Quote
- String offsets changed to uint32 from uint16.
For clarity you should write this "- String offsets changed from uint16 to uint32."

Post has been edited 2 time(s), last time on May 15 2019, 10:13 pm by jjf28.



TheNitesWhoSay - Clan Aura - github

Reached the top of StarCraft theory crafting 2:12 AM CST, August 2nd, 2014.

May 15 2019, 10:25 pm Suicidal Insanity Post #3

I see you !

Thanks for the cleaner text.

Quote from jjf28
3.) The main design gap I feel we need to address is whether we use 16-bit or 32-bit stringIds...
I agree. If Blizzard uses 16bit IDs internally in the function calls, and they don't want to risk changes, we will be limited to 16 bit. Preferably I would go with 32bit just for future proofing.

Quote from jjf28
Quote
We can deal with this by reserving a certain string range for these strings in case we are in danger of overflowing into 32bit ids. (E.g. the last 2K ids are skipped when adding strings unless it is a unit name string). String recycling also needs to keep this in mind
MRGN: Up to 255 strings
SPRP: Up to 2 strings
FORC: Up to 4 strings
UNIS: Up to 228 strings
UNIx: Up to 228 strings
Total: 717 strings

I would prefer we make 32-bit stringIds usable, and I like your suggestion, but I want to make it very specific and say stringId 63488-65535 (inclusive, 2048 total) are always reserved for strings from the 16-bit stringId sections, and disallow any string that can use a 32bit id from being used here.
I meant it as a hard reservation - but I feel we can limit it to 1024 strings, and don't need the full 2k. That's an implementation detail on our editor side though.

Quote from jjf28
4.) The KSTR Section was designed on the assumption that no changes to StarCraft were being made and that we just wanted to enhance developers ability to document things in the map, not change the amount of loadable strings in the game; if something like this STRx section gets built into StarCraft I would deprecate the KSTR section and put out a one-time migration tool (maybe separate from Chkdraft).

What I was trying to get at is that we could store string metadata in a table before or after the string data, and Blizzard doesn't need to know about the format. That is actually what KSTR does, if the offsets are also relative to the start of the chunk.

EG:

Code
u32 numEntries; // Number of strings in the section (Default: 1024)
u32[numEntries] stringOffsets; // 1 integer for each string specifying the offset (the spot where the string starts in the section from the start of it).
u8[arbitrary data] // Store metadata here
void * stringData; // All strings in the map, one after another, each NUL terminated, by default starts with one NUL character which all unused stringOffsets point to


OR:

Code
u32 numEntries; // Number of strings in the section (Default: 1024)
u32[numEntries] stringOffsets; // 1 integer for each string specifying the offset (the spot where the string starts in the section from the start of it).
void * stringData; // All strings in the map, one after another, each NUL terminated, by default starts with one NUL character which all unused stringOffsets point to
u8[arbitrary data] // Store metadata here


OR:

Code
u32 numEntries; // Number of strings in the section (Default: 1024)
u32[numEntries] stringOffsets; // 1 integer for each string specifying the offset (the spot where the string starts in the section from the start of it).
void *stringData[0]
u8[arbitrary data] // Store metadata here
void *stringData[1]
u8[arbitrary data] // Store metadata here
etc
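
A sketch of the first layout (metadata between the offset table and the string data), to show why a reader that only follows offsets never sees the metadata. `build_strx` and `read_string` are hypothetical names, little-endian u32 fields are assumed, and `index` is the zero-based position in the offset table.

```python
import struct

def build_strx(strings: list[str], metadata: bytes = b"") -> bytes:
    """Serialize strings with an opaque blob placed before the string data."""
    table_size = 4 + 4 * len(strings)
    data_start = table_size + len(metadata)
    body = bytearray(b"\x00")  # shared NUL that unused entries point to
    offsets = []
    for s in strings:
        if not s:
            offsets.append(data_start)  # unused entry: point at shared NUL
            continue
        offsets.append(data_start + len(body))
        body += s.encode("utf-8") + b"\x00"
    header = struct.pack(f"<I{len(strings)}I", len(strings), *offsets)
    return header + metadata + bytes(body)

def read_string(chunk: bytes, index: int) -> str:
    """Follow one offset; anything not pointed to is never touched."""
    (off,) = struct.unpack_from("<I", chunk, 4 + 4 * index)
    return chunk[off:chunk.index(b"\x00", off)].decode("utf-8")

chunk = build_strx(["Force 1", "", "Location 0"], metadata=b"\x01\x02\x03")
print(read_string(chunk, 0))  # Force 1
print(read_string(chunk, 2))  # Location 0
```

The same reader works for the second layout (metadata after the string data); the third layout only works if every string's offset still points at its first byte.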





May 22 2019, 8:55 am Suicidal Insanity Post #4

I see you !

I think we should go with 32bit string count, but limit it to 16bit indices until we are sure that larger indices work - that way we have locked in the binary format but can change the meaning later on.




Jul 1 2019, 10:23 am T-warp Post #5

Unlimited N-word pass winner

We could pass 65k limit by extending offsets to u32 and leaving indexes u16. Having "only" 65k strings is not the issue (who would use more than 65k strings anyway). Addressing them is the issue.

"STRx" (16bit stringIds, 32bit stringOffsets)


This way it wouldn't require any changes outside the STR(x) section as the string IDs would remain u16, which I think is more reasonable.
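
The arithmetic behind this point, as a small sketch. It assumes the classic STR layout of a u16 count followed by u16 offsets, and the default of 1024 entries mentioned earlier in the thread; the helper names are made up for illustration.

```python
def addressable_bytes(offset_bits: int) -> int:
    """Number of distinct byte positions an offset field of this width can name."""
    return 1 << offset_bits

def table_bytes(count_bytes: int, offset_bytes: int, entries: int) -> int:
    """Size of the count field plus the offset table."""
    return count_bytes + offset_bytes * entries

old_table = table_bytes(2, 2, 1024)  # STR: u16 count, u16 offsets
new_table = table_bytes(4, 4, 1024)  # STRx: u32 count, u32 offsets

print(addressable_bytes(16))              # 65536
print(old_table)                          # 2050
print(addressable_bytes(16) - old_table)  # 63486 bytes left for all text in STR
print(new_table)                          # 4100
print(addressable_bytes(32))              # 4294967296
```

So with u16 offsets the text runs out of room long before the ID space does, which is why widening only the offsets already lifts the practical limit.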




Jul 1 2019, 10:38 am Suicidal Insanity Post #6

I see you !

So my second option - that is the one I think will be most likely to be acceptable to Blizzard in terms of changes required.




Jul 1 2019, 11:19 am T-warp Post #7

Unlimited N-word pass winner

Quote from Suicidal Insanity
So my second option - that is the one I think will be most likely to be acceptable to blizzard in terms of changes required.
You still need to keep the original STR section in memory for some EUD maps to work. If you want metadata, you should use an unreferenced string at the end of the STR section (which would most likely be removed by map compression/protection tools). Although, if StarCraft does use the STR section as the source for all output, there will be a tool that translates that section in real time by modifying offsets, and it will have more reason to use that unreferenced space than the editor does. Make yourself another STR section for your metadata (it should be ignored by StarCraft).




Jul 1 2019, 1:05 pm Suicidal Insanity Post #8

I see you !

I don't think StarEdit still needs to load the STR section. Either the external EUD compiler does not know about STRx - then it wouldn't work with maps that store data in STRx anyway. Or it does know about STRx and uses that to carry its hidden triggers.

So - STRx is an optional section in SC:R format maps, but if it is present it overwrites the STR section in game memory.




Jul 2 2019, 5:47 pm T-warp Post #9

Unlimited N-word pass winner

If it's in your power to propose new features, propose integer arithmetic. For example, a set of 2 addresses in memory, accessible through EUD, for each operation. Writing to those addresses would perform the operation. That could speed up useful things in maps of all kinds (workarounds exist but are way too complicated).




Jul 2 2019, 6:52 pm Suicidal Insanity Post #10

I see you !

That's third on my list of suggestions - but obviously no promises. Usefulness-wise it should be second, but it's got the lowest chance of implementation.

Post has been edited 1 time(s), last time on Jul 2 2019, 7:29 pm by Suicidal Insanity.




Jul 2 2019, 7:32 pm T-warp Post #11

Unlimited N-word pass winner

Quote from Suicidal Insanity
That's third on my list of suggestions - but obviously no promises. Usefulness wise it should be second but its got the lowest chance of implementation.
Lowest chance and yet the easiest to implement. Just out of curiosity, what's the first?




Aug 7 2019, 8:19 pm Suicidal Insanity Post #12

I see you !

Current thinking is this:

Code
u32 numEntries; // Number of strings in the section (Default: 1024, in-game indices are limited to 65535, stored as uint32 for alignment and indices above 65k may be used by the editor)
u32[numEntries] stringOffsets; // 1 integer for each string specifying the offset (the spot where the string starts in the section from the start of it).
void * stringData; // All strings in the map, one after another, each NUL terminated, by default starts with one NUL character which all unused stringOffsets point to. Should store string data as UTF-8 encoded, game will likely attempt Korean autodetection regardless but by forcing UTF-8 we don't need to emulate that in the tools.


This chunk requires 'VER ' to be set to one of the remastered formats

Post has been edited 2 time(s), last time on Aug 7 2019, 9:38 pm by Suicidal Insanity.




Aug 7 2019, 8:26 pm jjf28 Post #13

Cartography Artisan



On the editor side we could still use 32bit IDs as necessary for comments and such. Maybe make it clear that IDs 65536+ are not valid in game, so the game doesn't validate against that count.




Aug 7 2019, 8:42 pm Suicidal Insanity Post #14

I see you !

Indices above 65k would just be truncated by the game (even if used in a 32bit storage field), not be forbidden.
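
A tiny sketch of what truncation would mean for an editor, assuming it amounts to keeping the low 16 bits of the stored value (the exact behaviour inside the game is not confirmed in this thread). `game_visible_id` is a made-up name.

```python
def game_visible_id(string_id: int) -> int:
    """ID the game would end up using if it keeps only the low 16 bits."""
    return string_id & 0xFFFF

print(game_visible_id(65535))  # 65535
print(game_visible_id(65536))  # 0
print(game_visible_id(70000))  # 4464
```

Under that assumption, an editor-only ID such as 70000 placed in a game-read field would silently alias string 4464, so editors should keep such IDs out of fields the game reads.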




Sep 24 2019, 10:39 pm jjf28 Post #15

Cartography Artisan

As of today's update this should be the current STR section format if VER is set to a remastered version.

Code
u32 numEntries; // Number of strings in the section (Default: 1024, in-game indices are limited to 65535, stored as uint32 for alignment and indices above 65k may be used by the editor)
u32[numEntries] stringOffsets; // 1 integer for each string specifying the offset (the spot where the string starts in the section from the start of it).
void * stringData; // All strings in the map, one after another, each NUL terminated, by default starts with one NUL character which all unused stringOffsets point to. Should store string data as UTF-8 encoded, game will likely attempt Korean autodetection regardless but by forcing UTF-8 we don't need to emulate that in the tools.





Sep 24 2019, 11:04 pm Suicidal Insanity Post #16

I see you !

Not quite - in order to avoid breaking all existing maps, the way it works is it searches for 'STRx', and if found loads that data with the new format. Otherwise it loads 'STR ' with the old format. EUD trigger payloads should continue to work fine, they just need to be placed behind the 'STRx' data.
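
The lookup order described here could be sketched as follows for a tool. It assumes the usual scenario.chk layout of a 4-byte section name, a little-endian 4-byte size, then the body, and it ignores edge cases such as negative sizes; the function names are hypothetical.

```python
import struct

def iter_sections(chk: bytes):
    """Yield (name, body) for each section in a scenario.chk byte string."""
    pos = 0
    while pos + 8 <= len(chk):
        name = chk[pos:pos + 4]
        (size,) = struct.unpack_from("<I", chk, pos + 4)
        yield name, chk[pos + 8:pos + 8 + size]
        pos += 8 + size

def select_string_section(chk: bytes):
    """Prefer 'STRx' (new format); fall back to 'STR ' (old format)."""
    found = {}
    for name, body in iter_sections(chk):
        if name in (b"STR ", b"STRx"):
            found[name] = body  # assumption: a later duplicate wins
    if b"STRx" in found:
        return "STRx", found[b"STRx"]
    if b"STR " in found:
        return "STR ", found[b"STR "]
    return None

def section(name: bytes, body: bytes) -> bytes:
    """Helper for the example: wrap a body in a section header."""
    return name + struct.pack("<I", len(body)) + body

both = section(b"STR ", b"old") + section(b"STRx", b"new")
print(select_string_section(both))                       # ('STRx', b'new')
print(select_string_section(section(b"STR ", b"old")))   # ('STR ', b'old')
```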




Sep 25 2019, 2:06 am Arta(M) Post #17

Armoha

Quote from Suicidal Insanity
Not quite - in order to avoid breaking all existing maps, the way it works is it searches for 'STRx', and if found loads that data with the new format. Otherwise it loads 'STR ' with the old format. EUD trigger payloads should continue to work fine, they just need to be placed behind the 'STRx' data.

I'm thinking of updating eudplib/euddraft to support both EUD trigger payloads and the new 'STRx' feature. So if 'VER ' indicates an SC:R map and scenario.chk has an 'STRx' section, SC loads only 'STRx' and does not load 'STR ', even though both exist?



maintainer of euddraft and eudplib.
Armo#6637 at Discord :teehee:

Sep 25 2019, 10:09 am Suicidal Insanity Post #18

I see you !

Exactly.



