kion Posted September 28, 2019

Getting started with parsing models from psp2i. Right now I'm dealing with individual models, so the next step will be to take the file.fpb file as an input and locate the NMLL archives.
kion Posted September 29, 2019

Okay, some random thoughts/updates. I tried to take the file.fpb file as an argument in the page, and the browser made it part of the way through and then promptly crashed. I don't think Javascript is suited for loading an 829MB file, searching through it for offsets, and then creating a new buffer for each offset. I might need to try something like asm.js so I can be more strict about how memory is handled (potentially)? Or I might need to use a script that pre-processes file.fpb to get it down to more manageable pieces.

For the nbl archives, I have a bunch of questions about how to handle them. There are two kinds of nbl archives: one with 0x10 at offset 0x05 and one with 0x00 at offset 0x05. We're able to unpack the nbl archives with 0x00 at offset 0x05. I'm hoping that maybe Agrajag or JamRules can provide more information on this format. We can at least read the filenames from the NMLL header for both of these types, so right now I can write a function that takes an nbl archive as an input, reads the file header, checks whether the archive contains a .unj model file, and discards the nbl file if none are found; otherwise we attempt to decrypt and decompress the NMLL and TMLL body and store the archive.

The next thing I'm thinking about is grouping and how reliable the filenames are going to be. At first I was hoping that the developers created unique names for each asset, and that when names were re-used it was because the same asset was being cloned for a different scene. After seeing the number of duplicated filenames, though, it looks like they probably used the same name for different assets, but I'll have to do an md5 check to see if these are the same files or not. If the md5 sums match up, that means I can go ahead and extract everything out to the raw files; otherwise it might indicate that I'll have to group by nbl archive.

I think what I should be focusing on right now is to continue to work with nbl archive 8894, which contains the drops. I can set up my function to take the nbl file as an argument, extract and display the filenames, and then parse the textures and display the model on click.
Selvaria Posted September 29, 2019

awesome
kion Posted September 30, 2019

Small update: got blowfish ported to vanilla Javascript. It might not be an exciting picture to see some filenames in a console, but it's pretty exciting to see something like this working in vanilla Javascript, as there are a few quirks to working with these files in the browser. Since the browser can't access the file system directly, you need to have the user select the file and then read it with the FileReader API to get the raw data as an ArrayBuffer. For blowfish specifically, it relies on the properties of the uint32_t type, which doesn't behave the same as Javascript's built-in number type. To get around this you can declare a Uint32Array, which makes members of the array act the way they would in a native programming language. And lastly there is the DataView API, which allows for reading and writing different data types to an ArrayBuffer. The result is being able to parse files directly in the browser. Right now I just have the header; the next step will be to decompress and save the files.

Edit: PRS is working too. Time to store the files to be used.
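For reference, the browser-side flow looks roughly like this (a minimal sketch; the element id and the header field read here are just placeholders for illustration):

// Minimal sketch of the FileReader -> ArrayBuffer -> DataView flow described above.
// The element id and the header field read here are placeholders.
document.getElementById('file-input').addEventListener('change', (event) => {
  const file = event.target.files[0];
  const reader = new FileReader();
  reader.onload = () => {
    const buffer = reader.result;               // ArrayBuffer with the raw bytes
    const view = new DataView(buffer);
    const magic = view.getUint32(0x00, true);   // little-endian read
    console.log('magic:', magic.toString(16));

    // Uint32Array members wrap and truncate like a native uint32_t,
    // which is what the blowfish port relies on.
    const pbox = new Uint32Array(18);
    pbox[0] = 0xffffffff + 2;                   // stored as a 32-bit value (1)
  };
  reader.readAsArrayBuffer(file);
});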
kion Posted September 30, 2019

Okay, so now files are unpacked from the nbl archive, and the question is what the best way to store and sort them is. I added a few properties to try and make it easier to find things: extension, basename, a shared unique id for files in an nbl archive, and an md5. So what I think I'll do is set filename + md5 as the primary key. And then to find the models I can search for .unj on the extension. And since I'm only interested in models, what I might do is filter out files from the nmll section that don't match .unj, .unt, .unv, or .unm.
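Something along these lines is what I have in mind (a sketch; the md5() helper and the exact property names are placeholders, not the final code):

// Rough sketch of the bookkeeping described above. md5() and the file object
// shape ({ name, data }) are assumed helpers, not actual code from the tool.
const MODEL_EXTENSIONS = ['.unj', '.unt', '.unv', '.unm'];

function indexFiles(unpackedFiles, archiveId) {
  const db = new Map();
  for (const file of unpackedFiles) {
    const ext = file.name.slice(file.name.lastIndexOf('.'));
    if (!MODEL_EXTENSIONS.includes(ext)) continue;   // only keep model-related files
    const hash = md5(file.data);                     // assumed digest helper
    db.set(`${file.name}:${hash}`, {                 // filename + md5 as primary key
      basename: file.name.slice(0, -ext.length),
      extension: ext,
      archiveId,                                     // shared id per nbl archive
      data: file.data,
    });
  }
  return db;
}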
kion Posted October 1, 2019

Okay, so managing the file.fpb as an input might be an over-reach, but it looks like I can now access nbl files for uploading. Which means I can either figure out a better way to implement fpb without crashing, or take a list of nbl files as input, which would require running a script to pre-separate them, but would otherwise work. For the viewer side of things, to keep the list of models from being too long, I split the models into groups by the first two letters of the model name. So far it seems to work for the limited test cases I've been working with. We'll have to see if the approach holds up. Right now the model parsing side is still pretty limited. I guess I'll start with the simple drop models and work from there. Though before working on models I think I'll go ahead and implement textures to get those out of the way first, because from what I've seen so far, the texture implementation is surprisingly straightforward.
kion Posted October 2, 2019

Okay, so now that we can import assets and store them to be referenced later, the next step is to start working on parsing the files. The way I have it now is that unj files are split into categories and then listed on the left. When a unj file is clicked, we can then load it and parse it.

I did some testing with nodejs as far as converting textures to pngs, and it looks like the texture format is pretty straightforward. So before we get into the models, we can take a second to make sure the textures are working, and that should help us later as far as identifying which assets are what for the models, which can help with debugging. To be able to know which texture needs to be read for what model, we need to reference the texture list, which will tell us which textures need to be linked with what materials. And thankfully they re-used the NJTL ninja texture list format from the Dreamcast days. Possibly with a slight modification, but looking at the file, it's pretty easy to trace through.

So first the magic number NUTL, followed by the length, followed by the pointer to the header, which is 0x24 in this case. Then the header is the number of textures, followed by a pointer to the first texture struct, which is 0x10. 0x10 and 0x14 are both zero, and this is because these values are used for attributes after the texture has been loaded into RAM, but not before. And that just leaves a pointer to the start of the texture name, which is 0x2c.

So this is only for a .unt file with one texture. To get the exact format of the header, we'll have to find a texture list that references more than one texture, but for now we have something that works. We can get the name of the texture, load it, parse it, and then have it ready to be paired with the model geometry when that gets loaded.
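Traced in code it looks something like this (a sketch; the per-texture struct layout and the 0x0c stride are assumptions taken from this one single-texture file, so the multi-texture case may well differ):

// Sketch of walking the .unt texture list traced above. The per-texture struct
// layout (two runtime attribute dwords, then the name pointer) and the 0x0c
// stride are assumptions from this single-texture example.
function readTextureList(buffer) {
  const view = new DataView(buffer);
  const magic = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3));
  if (magic !== 'NUTL') {
    throw new Error('not a texture list');
  }

  const headerOfs = view.getUint32(0x08, true);           // 0x24 in this file
  const texCount = view.getUint32(headerOfs, true);
  let structOfs = view.getUint32(headerOfs + 4, true);    // 0x10 in this file

  const names = [];
  for (let i = 0; i < texCount; i++) {
    const nameOfs = view.getUint32(structOfs + 8, true);  // 0x2c in this file
    let name = '';
    for (let p = nameOfs; view.getUint8(p) !== 0; p++) {
      name += String.fromCharCode(view.getUint8(p));
    }
    names.push(name);
    structOfs += 0x0c;                                    // assumed stride
  }
  return names;
}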
kion Posted October 4, 2019

Small update. Textures have been added. Right now I'm throwing the textures somewhere on the page (outlined in red) to check that the texture has been parsed correctly. I should probably add another window to the UI that can be used for viewing textures that have been loaded in for a model. To save on time, I might make a white box and stick it under the 3d viewport, and then move on to setting the textures to the diffuse map uv.
kion Posted October 4, 2019

We're back to something that resembles the first post, with a few improvements. The main improvements are that we're no longer unpacking and then storing the files, but can take archives and read the files directly from them. Textures have also been implemented, and I fixed the vertex winding order for the faces. Next step is to test the meshes that are included in the basic archive and then start scaling up from there.
kion Posted October 5, 2019

Oh look, a different model. Now that we are able to read archives and textures, the next step is to track down different vertex lists to work out which flags do what. The game seems to group each vertex in the order of diffuse map uv, color, and position, and it also seems to use both 32-bit floats and 16-bit ints to define values. So the flag defines what types of attributes are in the vertex list and what format they are in. Basically we find a model, debug the vertex list, find another model and debug again. To start out we have the calorie mate item, which I figured would be a good place to start with the 16-bit values. We have 16-bit uv values and 16-bit vertex positions. I don't think I have the exact scale for the vertices and uv values, but I'm surprised with how close I got by plugging in likely values.
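To make "plugging in likely values" concrete, the per-vertex read currently looks roughly like this (the attribute offsets and divisors are guesses plugged in by eye, not confirmed values from the format):

// Rough sketch of decoding one 16-bit vertex in the uv / color / position order
// described above. The offsets and divisors are guesses, not confirmed values.
function readShortVertex(view, ofs, posScale) {
  const u = view.getInt16(ofs + 0, true) / 0x8000;            // assumed uv divisor
  const v = view.getInt16(ofs + 2, true) / 0x8000;
  const color = view.getUint32(ofs + 4, true);                // packed vertex color
  const x = (view.getInt16(ofs + 8, true) / 0x7fff) * posScale;
  const y = (view.getInt16(ofs + 10, true) / 0x7fff) * posScale;
  const z = (view.getInt16(ofs + 12, true) / 0x7fff) * posScale;
  return { uv: [u, v], color, position: [x, y, z] };
}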
kion Posted October 6, 2019

Grabbing more files from the common models archive. Time to start looking into other archives for test cases. Two notes: nmll-09283.nbl is broken, and nmll-08904.nbl might be a good test candidate for the next set of vertex flag testing.
kion Posted November 9, 2019

I realized what I was missing in life: tables. I'm not very knowledgeable about PSU file formats, as I haven't had a chance to dig into them at all, but I was able to ask someone who knows a lot more than me, Agrajag, about the file formats. It looks like the types of files used boil down to a couple of key differences. For models, PSU uses the .xja format and PSP uses the .unj format. I think Agrajag also mentioned that the model animations, texture animations, bone names and texture names have different extensions for PSU and PSP, but the file formats are the same for all of them.

And then come the differences in archives. All of the titles seem to use the .nbl archive format, which I think stands for "new-bml" if anyone is familiar with the PSO .bml archive type. It seems to have a similar role and structure, with an NMLL section for the model-related files and a TMLL section for the texture files. The texture files have some variation: .xvr on PSU, .gim on PSP1 and .uvr on PSP2(i). And then for the file system, PSU uses the computer's folders directly, so all of the game's files are in the /DATA folder. The PSP titles all have a 'file.fpb' file, which is around 800MB and has all of the assets in the game stored as a list of .nbl archives.

As for the differences between the PSP titles, the basic files such as the models and animations seem to have remained the same. The main difference in the raw files is between the .gim files and .uvr files for the textures. It looks like PSP2(i) contains a few .gim files that were likely re-used from PSP1, but aside from that the vast majority are in .uvr format. Otherwise the main difference between PSP1 and PSP2(i) is obfuscation: the fpb and nbl file types were changed in an attempt to make them more difficult to access. So in terms of the portable titles, the list of file types breaks down into .unj, .unm, .gim, .uvr, .una, .unt, .nbl and .fpb.

For fpb files, it's kind of frustrating that I don't have a better method than the "slice" approach of looking for the NMLL header and writing out the space in-between for each match. Having some more information on how the game manages the header and how it stores the file names would definitely make navigating the file system on the portable titles a lot easier. And right now I'm running into an issue where my extractor hits an error on PSP1, so I'll have to track down the reason for that to see what's going on. After that I can get back to focusing on the model format and taking more notes.

Edit: Okay, the reason the nbl file wasn't working was an empty nbl file that had a header but no content. So that means I can get back to looking at the basic file types. As a side note, being able to get the filenames from fpb files and type 2 nbl files would make working with the portable series a lot easier.
kion Posted November 13, 2019

Right now I don't have too much to write about in terms of the FPB tarball, since I'm simply using the "look for magic numbers and slice" method. It seems like there should be a cleaner approach that involves reading the file header and then finding the offsets in the file. I've been taking a second look at NBL, though, and I think I can write some notes about that.

The figure above shows the outline of the structure of an NBL archive. For NBL I'm guessing the extension stands for "New BML", as it has a similar function and approach to PSO's BML archive format, but a lot has been changed in the implementation. In terms of function it acts the same: it groups a list of models, the files associated with those models such as animations, and the textures for those models, all in a single compressed file. In terms of structure, the way it does this is somewhat different.

NBL groups information into two main sections. First is the NMLL section, which contains any kind of information that isn't textures, so the content is anything from models to animations and texture lists. The second section is the TMLL section, which contains the textures. Each section has its own respective header and body, which are essentially the same format. The NMLL header has 0x30 bytes of metadata at the top followed by the file list, while the TMLL header has 0x20 bytes followed by the file list. The body is the same for both sections: it can be compressed and it can be encrypted, in that order. So the body is decompressed if it's compressed (with PRS for type 0) and decrypted if it's encrypted (with blowfish). Once decompressed and decrypted, the files are stacked end to end, and the respective offsets and lengths are provided in the respective headers.

The main difference between the two sections is that texture data is likely copied directly into the framebuffer, so the archive only needs to unpack the texture files and copy the information straight over. The NMLL section of the archive gets copied as a block directly into memory, and the archive is laid out so that all pointers in the NMLL files are zero-relative to the start of the body. Following the body is a list of pointer offsets inside the body, the reasoning being that it allows all of the pointers to be easily adjusted to the location they were copied to in memory.

In terms of extracting the files from an NBL, the approach is a little messy. First read the metadata in the header to find the number of files in each respective section, the length of each header in bytes, the compressed size of each body and the decompressed size of each body. Then get the key and initialize blowfish on the condition that the files are encrypted. The next step is to seek through the file to find where each segment is and slice it into its own section of information. The reason for using slicing is kind of two-fold. The main reason is that it's unclear if there is a pattern for the amount of padding between each group of information. After the NMLL header there is zero padding before the body, and the start of the body is the first non-zero byte following; that seems to hold true for all of the segments. After the NMLL body is an unknown amount of zero padding followed by the pointer list. And then the TMLL section is optional and doesn't exist in a lot of cases. When it does exist, its position needs to be found by looking for the TMLL magic number, which is followed by more padding and then the TMLL body. So most of the difficulty in handling this format generally comes from handling the different conditions and seeking through to find the different segments.

Once each segment has been isolated, you decrypt and read the header, then decompress and decrypt the body, and then in the case of NMLL adjust the pointers and pull out the files. Seems like something that could be addressed by adding a few offset pointers in the headers, but at best I can only speculate at the potential reasons behind this. The best reason I can think of is that the archive is intended to be more of a streaming format, where the reader is intended to start at the beginning and seek through the file, copying information into memory as it goes along. Though having a few more offsets declared in the header would make this format much easier to work with.
kion Posted November 13, 2019

Continuing on with NBL archives. I've been trying to think of ways to simplify the code and get around the lack of offsets declared in the header. Though I guess whether they're actually needed is a little debatable, since after one segment has been read, the next section is found by scanning instead of seeking. I think it's really the combination of having to scan, paired with not knowing the exact format of what's contained in the archive. Maybe writing out an approach will help me simplify the code that I have.

To start out, it seems like a good idea to read 0x00 as a quick sanity check for the NMLL magic and return if it's not found. Then we're probably going to want to read 0x04 for the compression format and 0x1c to get the seed for blowfish, so we'll have those parameters for working with the rest of the archive. Next we need 0x08 to get the byte length of the NMLL header and 0x0c for the number of files in the NMLL section. With that we can decrypt the NMLL header to get the filenames, offsets and lengths of all of the files contained.

Then we start from the end of the NMLL header and seek until the first non-zero byte. This will be the NMLL body, and if we can't find this section, we return. If there is a body, we decompress and decrypt it if needed. Then we continue to seek to the first non-zero byte after the body, which will be the pointer table. Once we have the pointer table and the raw NMLL body, we can go ahead and unpack those files and add them to the files to be returned.

Once we've hit the end of the pointer table, we seek until we find the TMLL header and return if it doesn't exist. If it does exist, we read 0x20 to get the header size and 0x2c to get the number of textures in the section. We then decrypt the header and get the filenames and offsets. Then we seek to the next non-zero byte after the header, get the body, and decompress it. We then unpack and add the textures to the files to be returned, and return the unpacked files from the archive.
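Written out as code, the first part of that walkthrough looks roughly like this (a sketch; the decrypt and decompress steps are left as comments since those live in the separate blowfish and PRS code):

// Sketch of reading the NBL metadata in the order described above. The later
// steps are listed as comments; the offsets follow the walkthrough in this post.
function readNblMeta(buffer) {
  const view = new DataView(buffer);
  const magic = view.getUint32(0x00, true);
  if (magic !== 0x4c4c4d4e) return null;              // 'NMLL' read little-endian

  const meta = {
    compression: view.getUint32(0x04, true),          // compression format
    nmllHeaderLen: view.getUint32(0x08, true),        // NMLL header length in bytes
    nmllFileCount: view.getUint32(0x0c, true),        // files in the NMLL section
    blowfishSeed: view.getUint32(0x1c, true),         // key seed for the headers/body
  };

  // Next steps, per the walkthrough:
  //  1. decrypt the NMLL header with the seed -> filenames, offsets, lengths
  //  2. seek to the first non-zero byte after the header -> NMLL body
  //  3. decompress (PRS) / decrypt (blowfish) the body as needed
  //  4. seek to the next non-zero byte -> pointer table, rebase pointers, split files
  //  5. scan for 'TMLL'; if found, read header size at +0x20 and texture count
  //     at +0x2c, then repeat the header/body handling for the textures
  return meta;
}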
kion Posted November 19, 2019

Spent some time refactoring my Nbl extractor, and while I can say that I'm now more comfortable with the format, I can't say for sure that I've covered all of the use cases. In terms of approach, it seems easier to almost treat NMLL and TMLL as their own separate files. Each one has its own header, file list and body. And then assuming that texture names are unique, and that TMLL is only ever compressed and not encrypted, it seems safe to treat them separately, assuming the focus is only to unpack without the need to repack.

For NMLL: read the header, get the compression format and the blowfish key. Read the file list, read the pointer list. Read the body, decrypt the body, decompress the body. Read the files. Fix the pointer orientation in the files.

For NMLL, one issue I'm running into is that the decrypt method works 8 bytes at a time. I'm not sure if that means I need to decrypt up to the last full eight bytes, pad to the nearest eight bytes, or simply stop decryption when I run out of bytes at the end of the file. Right now I'm padding to the nearest eight bytes, and that seems to work (maybe?). So that seems like something that needs trial and error.

And then for TMLL, as far as I've seen, this segment seems to only use compression, and doesn't need the pointers to be adjusted. But the header list can be encrypted, so I guess it needs the key listed in the NMLL header, which would mean that these two segments do need to be paired together to be able to extract the contents.

So I guess that mainly means I need to do some more testing with the NMLL body to find out what the best approach for decryption and decompression is, and then look at the resulting files to make sure they look correct. On a side note, I still have my ugly non-refactored code that seemed to work for the use cases I threw at it. So I might go back to my ugly code on the condition that it works and that I've at least found an approach to make it more readable. I'll try more tests on the NMLL body, and then try the original code for contrast. From there I'll have to start looking into how the files are organized.
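For the 8-byte question, the padding approach I'm trying looks roughly like this (whether padding is actually the right call, versus leaving a trailing partial block untouched, is still an open question, so treat this as a guess):

// Current guess for the 8-byte block question: pad the encrypted body up to the
// next multiple of 8 before running blowfish, then trim back to the original
// length. blowfish.decrypt() here stands in for the vanilla JS port mentioned
// earlier and is assumed to work in place on 8-byte blocks.
function decryptPadded(blowfish, body) {
  const padded = new Uint8Array(Math.ceil(body.length / 8) * 8);
  padded.set(body);                          // zero padding after the real data
  blowfish.decrypt(padded);
  return padded.subarray(0, body.length);    // drop the padding again
}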
kion Posted December 1, 2019

I've been spending too much time messing with the tarball and archive formats. I've gotten completely trolled trying to import them to work with. The FPB file is so big that trying to load it into memory and then extract all of the NBL files generally leads to memory leaks and the program freezing partway through. And there are so many NBL files that trying to mass import them seems to fail as well. So any file management is going to need to happen outside of the tool. I think I'll get back to working with command line tools. First step is an FPB-to-NBL slicer. Then I can attempt to write a tool that works with both NBL formats. And once we have that, we can start looking at the individual files.
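The slicer itself doesn't need to be anything fancy; something along these lines, scanning the fpb in chunks with Node so the whole 800MB never has to sit in memory at once (a sketch; the chunk size and the nmll-xxxxx.nbl output naming are just placeholders):

// Sketch of the "look for the magic number and slice" splitter, reading the fpb
// in fixed-size chunks. Chunk size and output naming are placeholders.
const fs = require('fs');

function sliceFpb(fpbPath, outDir) {
  const fd = fs.openSync(fpbPath, 'r');
  const size = fs.fstatSync(fd).size;
  const MAGIC = Buffer.from('NMLL', 'ascii');
  const CHUNK = 1 << 20;
  const starts = [];

  // Pass 1: record every offset where the NMLL magic appears.
  const buf = Buffer.alloc(CHUNK + MAGIC.length - 1);
  for (let ofs = 0; ofs < size; ofs += CHUNK) {
    const len = fs.readSync(fd, buf, 0, buf.length, ofs);
    const window = buf.slice(0, len);
    let hit = window.indexOf(MAGIC);
    while (hit !== -1 && hit < CHUNK) {      // hits in the overlap belong to the next chunk
      starts.push(ofs + hit);
      hit = window.indexOf(MAGIC, hit + 1);
    }
  }
  starts.push(size);

  // Pass 2: write out the space between each pair of magic numbers.
  for (let i = 0; i < starts.length - 1; i++) {
    const len = starts[i + 1] - starts[i];
    const out = Buffer.alloc(len);
    fs.readSync(fd, out, 0, len, starts[i]);
    fs.writeFileSync(`${outDir}/nmll-${String(i).padStart(5, '0')}.nbl`, out);
  }
  fs.closeSync(fd);
}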
kion Posted December 7, 2019

For NBL I've started to come closer to testing more cases. For PSP1, it looks like there are 18,821 archives in the file.fpb tarball. Of that number, 18,810 archives are being successfully exported. I'm comparing the extracted files against the expected full size defined in the archive header, so if that number matches, it should generally indicate that the files were extracted correctly. That leaves 9 archives where the full size of the files doesn't match the expected full size listed in the header, and then I have 2 files where some issue is happening when the pointers are updated to be relative to the file and not relative to the archive. So I need to look through these edge cases, try and assess what the issue is, and then see if those changes work with the rest of the cases that currently work.
kion Posted December 10, 2019

Managed to narrow down the test cases that weren't working in the PSP1 file.fpb tarball. The 9 archives that weren't being decompressed failed because I was miscalculating the start of the NMLL payload. The nbl archive aligns segments to whole offsets of 0x800 bytes; when a segment ends exactly on a 0x800 boundary, the next segment starts immediately after it. So the code for seeking to the start of the next data ends up being something like:

if (ofs % 0x800) {
  ofs += 0x800;
}
ofs = 0xfffff800 & ofs;

And this seemed to work for all of the PSP1 archives. For the 2 files that had wrong pointer offsets, it looks like those were parameter files that pointed to locations in other files inside the same archive (and not relative to the same file). Since I'm not too worried about parameter files, I think those can be skipped without issue. Next step will be to look into the TMLL sections, and then see what can be done about the compression format that was added in psp2i.

Files:
https://pastebin.com/5KbKYZaG nbl.js
https://pastebin.com/s3fEGPeE fakefish.js
https://pastebin.com/sNZcYt8S arkive.js
kion Posted December 14, 2019

So I guess we're (finally) getting into breaking down the .UNJ file format, which seems to be the "Universe Ninja Format" used in Phantasy Star Portable for the PSP. As far as I can tell the format doesn't have any direct relationship to the .NJ format from the Dreamcast and PC versions of Phantasy Star Online. The .NJ format is a node-tree format where each bone is declared as a transformation matrix relative to its parent, and vertices that are weighted to that bone are declared alongside the bone. And for NJ the models are likely drawn with a CPU-bound algorithm of pushing each transformation matrix onto a stack, drawing the faces, and then popping the matrices once the node has run out of child bones.

After the Xbox, the industry seems to have moved away from the push-pop matrix algorithm to a system where the mesh of the character is stored in a T-pose and bones are assigned and weighted to the mesh. The bones and animations are stored as the pre-multiplied result of the push-pop stack, so the bones are relative to the origin. The general effect of this is that it both reduces the number of calculations that need to be done per frame (as everything is already pre-calculated for the bone positions), and it allows for mesh deformations to be calculated on the GPU side. So rather than having a 64-bone model where you need to recursively iterate over the file and multiply the bones as you go along, you have a flat vertex list, a list of bones (that could potentially still be in a tree), a list of assigned bones, and materials. And then each material will likely be a single draw call. So in the case of Phantasy Star Portable, where most of the models I've seen so far have about 1-4 materials, that means probably 1-4 draw calls on the GPU side per model.

Rambling aside, let's get started with the tracing. At the top of the file we have the magic number and file length. The magic number for .UNJ files is NUOB, followed by the length of the data minus 8 bytes for the magic number and length. So the full file size is 0x5f70 bytes and the length is declared as 0x5f68 bytes. Following that we have the offset to the file header, and then we seem to have the constant number 03, which I'm not sure what that is for. Technically the magic number and length, and then the header pointer and the three, are two different structs, but I'll go ahead and declare them as one, mainly because it reduces the number of colors I need to use for mapping out the file.

struct unj_magic_t {
  uint32_t magic_number;
  uint32_t data_length;
  uint32_t header_ofs;
  uint32_t constant_three; // always 3?
}
kion Posted December 16, 2019

The pointer to the header is declared at byte 0x08, and it points to the end of the file, with the actual data sandwiched in-between. The figure below shows this relation.

So a quick note about 3d model files. What 3d models boil down to is a list of triangles. Triangles can be declared from a vertex list and indices (indexed buffer), or directly from a list of vertices (buffer). That gives us our mesh. To describe what color each triangle will be, we have materials, which can describe the diffuse, specular or emissive color for a list of faces, and we can also define textures to map the diffuse color from the texture to the face. And then to deform the mesh, we can have a list of bones and weights. So when we look at the header, we're basically looking at a table of contents that describes where these attributes are defined inside the file. Let's go ahead and post a picture of the actual data, and then define the struct.

struct unj_header_t {
  float center_x;
  float center_y;
  float center_z;
  float bounding_radius;
  uint32_t material_count;
  uint32_t material_ofs;
  uint32_t vertex_group_count;
  uint32_t vertex_group_ofs;
  uint32_t index_group_count;
  uint32_t index_group_ofs;
  uint32_t bone_count;
  uint32_t bone_tree_depth;
  uint32_t bone_unknown;
  uint32_t main_group_count;
  uint32_t main_group_ofs;
  uint32_t main_group_unknown;
}
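In Javascript the two structs so far read something like this (a sketch; the assumption is that the header is exactly the sixteen dwords above, laid out in order and little-endian):

// Rough sketch: read unj_magic_t, then jump to header_ofs and read unj_header_t.
// Field names follow the structs above; everything is assumed little-endian.
function readUnjHeader(buffer) {
  const view = new DataView(buffer);
  const magic = {
    magic_number: view.getUint32(0x00, true),     // 'NUOB'
    data_length: view.getUint32(0x04, true),      // file size minus 8
    header_ofs: view.getUint32(0x08, true),
    constant_three: view.getUint32(0x0c, true),   // always 3 so far
  };

  const o = magic.header_ofs;
  const header = {
    center: [
      view.getFloat32(o + 0x00, true),
      view.getFloat32(o + 0x04, true),
      view.getFloat32(o + 0x08, true),
    ],
    bounding_radius: view.getFloat32(o + 0x0c, true),
    material_count: view.getUint32(o + 0x10, true),
    material_ofs: view.getUint32(o + 0x14, true),
    vertex_group_count: view.getUint32(o + 0x18, true),
    vertex_group_ofs: view.getUint32(o + 0x1c, true),
    index_group_count: view.getUint32(o + 0x20, true),
    index_group_ofs: view.getUint32(o + 0x24, true),
    bone_count: view.getUint32(o + 0x28, true),
    bone_tree_depth: view.getUint32(o + 0x2c, true),
    bone_unknown: view.getUint32(o + 0x30, true),
    main_group_count: view.getUint32(o + 0x34, true),
    main_group_ofs: view.getUint32(o + 0x38, true),
    main_group_unknown: view.getUint32(o + 0x3c, true),
  };
  return { magic, header };
}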
Kryslin Posted December 18, 2019

It's not that the stack-of-matrices, recursive algorithm isn't fast - it is exceedingly fast - it's just that the ability to do all the math on the GPU was starting to hit around that time, and they could use simpler, less efficient algorithms to get the job done. I know the .NJ format took advantage of specific processor instructions on the Dreamcast's CPU that were basically a hardware matrix multiply in 4 instructions. With the ability to do the math on the GPU, though, the exporters for the specific model formats are far less complicated, meaning the tool chain could be a lot less finicky. The fact that the PS2 hung in there as long as it did with PSU is a testament to the skill of the software guys wringing every last bit of efficiency out of the CPU and coprocessor they could. Not bad for an aging system with very limited RAM.
kion Posted December 20, 2019

On 12/19/2019 at 7:47 AM, Kryslin said:

It's not that the stack-of-matrices, recursive algorithm isn't fast - it is exceedingly fast - it's just that the ability to do all the math on the GPU was starting to hit around that time, and they could use simpler, less efficient algorithms to get the job done. I know the .NJ format took advantage of specific processor instructions on the Dreamcast's CPU that were basically a hardware matrix multiply in 4 instructions. With the ability to do the math on the GPU, though, the exporters for the specific model formats are far less complicated, meaning the tool chain could be a lot less finicky. The fact that the PS2 hung in there as long as it did with PSU is a testament to the skill of the software guys wringing every last bit of efficiency out of the CPU and coprocessor they could. Not bad for an aging system with very limited RAM.

I guess matrix multiplication is a comparatively cheap operation. Though one thing that's always kind of confused me is how much effect the CPU draw call has on how the developers managed the materials. Trying to export stages like the PSO forest was a nightmare, because there are tons of materials that need to be rendered in a specific order, and the stage ends up taking around 300-something draw calls on the GPU.

Which I guess transitions me into the materials for the unj format. Materials are generally something I come back to after looking at the geometry, since you kind of need something to apply the materials to for testing, but I might as well take notes of what I have available now to be able to refer to. So the unj_header_t struct points to 0x0e30 for the materials, with a count of two. We can go ahead and fill that in with orange, then make notes of the pointers to 0x0c80 and 0x0d60 respectively and mark those out. For the material description we have a struct that's 8 bytes.

struct unj_matlist_t {
  uint8_t tex_count;
  uint8_t alt_tex_count;
  uint16_t nop;
  uint32_t material_ofs;
}

Since we're not defining the material directly, but describing the texture count before describing the material, we can go ahead and call this structure unj_matlist_t. The first two bytes are for the number of textures and the number of alternative textures, then two bytes for padding, and then a four byte pointer to the actual material.

And now we can get to the material. Above we have the bytes for a single material isolated. In purple we have 8 unknown bytes, which are separated into two shorts and one int, but it's not known what they do. For blue, each texture has a mappingType, which is one dword defined by the second byte: if the byte value is 1, then env mapping is used, otherwise uv mapping. Next we have the green section, which describes the colors for each texture. What we have is six dwords for each texture. Each dword is three bytes that represent values between 0 and 1 (divided by 255), followed by an opcode that defines the property.

#define GE_CMD_MATERIALEMISSIVE     0x54
#define GE_CMD_MATERIALAMBIENT      0x55
#define GE_CMD_MATERIALDIFFUSE      0x56
#define GE_CMD_MATERIALSPECULAR     0x57
#define GE_CMD_MATERIALALPHA        0x58
#define GE_CMD_MATERIALSPECULARCOEF 0x5B

Following that we come to the orange section, which defines the texture flags. The method works the same as the materials with opcodes on the end, with the values being provided by the lower three bytes. This section will continue to define properties for the texture until the opcode 0x0b (stop) is encountered.

#define GE_CMD_ALPHABLENDENABLE  0x21 //(disable lighting?)
#define GE_CMD_ALPHATESTENABLE   0x22
#define GE_CMD_ZTESTENABLE       0x23
#define GE_CMD_STENCILTESTENABLE 0x24
#define GE_CMD_COLORTESTENABLE   0x27
#define GE_CMD_LOGICOPENABLE     0x28
#define GE_CMD_TEXSCALEU         0x48
#define GE_CMD_TEXSCALEV         0x49
#define GE_CMD_TEXOFFSETU        0x4A
#define GE_CMD_TEXOFFSETV        0x4B
#define GE_CMD_TEXWRAP           0xC7
#define GE_CMD_TEXLEVEL          0xC8
#define GE_CMD_TEXFUNC           0xC9
#define GE_CMD_TEXENVCOLOR       0xCA
#define GE_CMD_ALPHATEST         0xDB
#define GE_CMD_STENCILTEST       0xDC
#define GE_CMD_STENCILOP         0xDD
#define GE_CMD_ZTEST             0xDE
#define GE_CMD_BLENDMODE         0xDF
#define GE_CMD_BLENDFIXEDA       0xE0
#define GE_CMD_BLENDFIXEDB       0xE1
#define GE_CMD_DITH0             0xE2
#define GE_CMD_LOGICOP           0xE6
#define GE_CMD_ZWRITEDISABLE     0xE7
#define GE_CMD_MASKRGB           0xE8
#define GE_CMD_MASKALPHA         0xE9

And then we come to the light blue, which is the texture id, one dword for each texture. The last thing to mention is that this specific material only has one texture, so I'm not entirely sure how the seeking works with more. If these were fixed size structs, then it would make sense for them to be defined together for each texture, but because these properties are not fixed length, I think each one of these segments is defined in the current order for each texture. What that means is that there is a mapping dword for each texture at the beginning, followed by a 0x18-length color definition for each texture. Following that are the texture flags for each texture, and once those are finished, the texture id for each texture. I'm not sure what the best approach for seeking is; it seems like the cleanest approach would be to read each set of properties for each texture before moving on to the next one.

Appendix for PSP opcodes:
https://github.com/hrydgard/ppsspp/blob/master/GPU/ge_constants.h
https://github.com/hrydgard/ppsspp/blob/master/GPU/GeDisasm.cpp
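To keep that ordering straight, the way I'm reading the color dwords and the flag stream for a single texture looks roughly like this (how these sections interleave when there is more than one texture is still an assumption, as mentioned above):

// Rough sketch for a single texture: six color dwords (three value bytes plus a
// GE_CMD_MATERIAL* opcode in the top byte), then flag dwords until the stop
// opcode 0x0b, then the texture id. Multi-texture interleaving is unconfirmed.
function readMaterialTexture(view, ofs) {
  const colors = {};
  for (let i = 0; i < 6; i++) {
    const dword = view.getUint32(ofs, true);
    const op = dword >>> 24;                        // e.g. 0x56 = diffuse
    colors[op] = [
      (dword & 0xff) / 255,
      ((dword >>> 8) & 0xff) / 255,
      ((dword >>> 16) & 0xff) / 255,
    ];
    ofs += 4;
  }

  const flags = [];
  let op = 0;
  do {
    const dword = view.getUint32(ofs, true);
    op = dword >>> 24;
    flags.push({ op, value: dword & 0x00ffffff });  // lower three bytes carry the value
    ofs += 4;
  } while (op !== 0x0b);                            // 0x0b = stop

  const textureId = view.getUint32(ofs, true);      // light blue dword
  return { colors, flags, textureId, nextOfs: ofs + 4 };
}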
kion Posted December 21, 2019

Going in order, the next set of properties from the header is the vertex groups. In this case we have 15 groups (0x0f) at offset 0x5bf4. We can go over to the vertex group listings and take an image. For each group we have a struct of 8 bytes, which comes down to two dwords.

struct unj_vertex_group_t {
  uint32_t uv_count;
  uint32_t group_ofs;
}

We have the uv count, and then the offset to the actual group. To take a quick side rant about bad format design, this seems like a good example, because this pointer is entirely redundant. We have a pointer from the header, to a list of pointers, to a series of fixed size structs. If the vertex group description weren't a fixed struct, these pointers would make more sense, but the group descriptions themselves are fixed size structs. So in general stupid, but overall not that bad. We trace out each one of the pointers from the group list, and we find that each vertex group has a fixed struct size of 0x30. So they could have put in the uv count, made the struct size 0x34 and not needed the pointer prior to this. Who knows, maybe the person who designed this format thought they were being clever by adding a needless pointer in the file. That being said, the unnecessary pointer doesn't annoy me as much in practice; it's more that it makes it harder to assign descriptive names to the respective structs.

struct unj_vertex_list_t {
  uint32_t unknown_1;
  uint32_t vertex_format;
  uint32_t unknown_2;
  uint8_t unknown_3;
  uint8_t vertex_length;
  uint16_t nop;
  uint32_t vertex_count_ofs;
  uint32_t vertex_list_ofs;
  uint32_t bone_binding_ofs;
  uint32_t bone_binding_count;
  uint32_t total_vertex_count;
  uint32_t unknown_4;
  uint32_t unknown_5;
  float vertex_scale;
}

Oh right, another stupid thing to mention is that rather than having the vertex list count in the struct, they have a 4 byte pointer to a single 4 byte value for the length of the vertex list. And I'm not sure if that value is any different from the total_vertex_count.
kion Posted December 22, 2019

I'll continue the vertex group definition in this post. At the bottom in orange is the list of vertex groups. The first outlined struct in the orange section is a single unj_vertex_list_t struct. So, three things to mention about the pointers used in this struct.

The first is the red outlined dwords. The top red outlined dword is the flag that defines the vertex type. There are a ton of different vertex options: whether vertex color exists and which 16-bit color type it uses, whether vertex normals exist or not, the position format, and likely the uv format. So that's probably going to need its own post or table or something to track the different options. The bottom red dword is the pointer to the vertex list data itself, which comes up to right under the material definition and takes up nearly the entire file across the various vertex lists.

In blue we have the pointer to the vertex count, which is labeled with the blue 'a'. We can see the vertex count is 0x50, and then we see that the total vertex count included in the parent struct is also 0x50. So I'm really not sure why they did it this way; it's strange at best and also pretty stupid.

And then in green we have the bone influences on the vertex list, with the pointer labeled with the green 'b' and the count of influences following it. So for the first group we have 4 influences: 0x00, 0x01, 0x02, and 0x0e.
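Put together, reading one of these 0x30-byte entries looks something like this (a sketch; the bone indices are assumed to be one dword each, and the vertex_format bits still need their own write-up):

// Rough sketch of reading one unj_vertex_list_t entry (0x30 bytes), following the
// struct in the previous post. Bone indices are assumed to be one dword each.
function readVertexList(view, ofs) {
  const list = {
    vertex_format: view.getUint32(ofs + 0x04, true),
    vertex_length: view.getUint8(ofs + 0x0d),               // stride in bytes
    vertex_count_ofs: view.getUint32(ofs + 0x10, true),     // pointer to a single count dword
    vertex_list_ofs: view.getUint32(ofs + 0x14, true),
    bone_binding_ofs: view.getUint32(ofs + 0x18, true),
    bone_binding_count: view.getUint32(ofs + 0x1c, true),
    total_vertex_count: view.getUint32(ofs + 0x20, true),
    vertex_scale: view.getFloat32(ofs + 0x2c, true),
  };

  list.vertex_count = view.getUint32(list.vertex_count_ofs, true);   // 0x50 in this example
  list.bones = [];
  for (let i = 0; i < list.bone_binding_count; i++) {
    list.bones.push(view.getUint32(list.bone_binding_ofs + i * 4, true));
  }
  return list;
}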
kion Posted December 22, 2019

Going back to the header, the next properties we need to track down are the index groups. I'm guessing this is used in xnj (PSU, PC) and not unj (PSP1/2/2i), because it looks like it always points to a list of zero values. In the header we have 0xf index values at 0x5c6c, and tracing out other pointers, we can find where this section ends. It ends up all being zeros, and as far as I can tell it's always zeros in every file I've seen. These could be indices for indexed buffer geometry, but I think unj uses buffer geometry directly, so this is either unused or not needed in unj. The red values starting at 0x5c6c and ending at 0x5ce4 are the index values, but it looks more like the field where I grow my fucks. So I guess that means it's probably unused. Which means that in the next post we can start looking into bones.