About kion

  1. So I guess we'll (finally) be getting into breaking down the .UNJ file format, which seems to be the "Universe Ninja Format" used in Phantasy Star Portable for the PSP. As far as I can tell the format doesn't have any direct relationship to the .NJ format from the Dreamcast and PC versions of Phantasy Star Online. The .NJ format is a node-tree format where each bone is declared as a transformation matrix relative to its parent, and vertices that are weighted to that bone are declared alongside the bone. For NJ the models are likely drawn with a CPU-bound algorithm of pushing each transformation matrix onto a stack, drawing the faces, and then popping the matrices once the node has run out of child bones.

After the Xbox, the industry seems to have moved from working with the push-pop matrix algorithm to a system where the mesh of the character is stored in a T-pose and bones are assigned and weighted to the mesh. The bones and animations are stored as the result of the pre-multiplied equations of the push-pop stack, so the bones are relative to the origin. The general effect of this is that it both reduces the number of calculations that need to be done per frame (since the bone positions are already pre-calculated), and it allows for mesh deformations to be calculated on the GPU side. So rather than having a 64-bone model where you need to recursively iterate over the file and multiply the bones as you go along, you have a flat vertex list, a list of bones (that could potentially still be in a tree), a list of assigned bone weights, and materials. Each material will likely be a single draw call, so in the case of Phantasy Star Portable, where most of the models I've seen so far have about 1-4 materials, that means probably 1-4 draw calls on the GPU side per model.

Rambling aside, let's get started with the tracing. At the top of the file we have the magic number and file length.
So the magic number for .UNJ files is NUOB, followed by the length of the data minus 8 bytes for the magic number and length fields. So the full file size is 0x5f70 bytes and the length is declared as 0x5f68 bytes. Following that we have the offset to the file header, followed by the constant number 03, which I'm not sure what that is for. Technically the magic number and length, and then the pointer to the header and the three, are two different structs, but I'll go ahead and declare them as one, mainly because it reduces the number of colors I need to use for mapping out the file.

struct unj_magic_t {
    uint32_t magic_number;
    uint32_t data_length;
    uint32_t header_ofs;
    uint32_t constant_three; // always 3?
};
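As a sanity check on the layout, here's a minimal sketch of reading this preamble with Node's Buffer API; the field names just mirror the struct above, and the PSP is little-endian:

```javascript
// Read the 16-byte .UNJ preamble: magic, data length, header offset,
// and the unexplained trailing constant (always 3 so far).
function readUnjMagic(buf) {
  return {
    magic: buf.toString("ascii", 0, 4),   // expected "NUOB"
    dataLength: buf.readUInt32LE(4),      // full file size minus 8 bytes
    headerOfs: buf.readUInt32LE(8),       // offset to the file header
    constantThree: buf.readUInt32LE(12),  // always 3?
  };
}
```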
  2. Managed to narrow down the test cases that weren't working in the PSP1 file.fpb tarball. The 9 archives that weren't being decompressed failed because I was miscalculating the start of the NMLL payload. The nbl archive aligns segments to whole offsets of 0x800 bytes, and when a segment ends at a clean multiple of 0x800, the next segment starts immediately after it. So the code for seeking to the start of the next data ends up being something like:

if (ofs % 0x800) { ofs += 0x800; }
ofs = 0xfffff800 & ofs;

And this seemed to work for all of the PSP1 archives. For the 2 files that had wrong pointer offsets, it looks like those were parameter files that pointed to locations inside other files in the same archive (and not relative to the same file). Since I'm not too worried about parameter files, I think those can be skipped without issue. Next step will be to look into the TMLL sections, and then see what can be done about the compression format that was added in psp2i.

Files:
https://pastebin.com/5KbKYZaG nbl.js
https://pastebin.com/s3fEGPeE fakefish.js
https://pastebin.com/sNZcYt8S arkive.js
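The alignment rule above can be wrapped into a small helper; this is just the inline snippet from the post as a function, with `>>> 0` added to keep the masked result unsigned in JavaScript:

```javascript
// Round an offset up to the next 0x800 boundary. Offsets already on a
// boundary stay put, since the next segment starts immediately there.
function alignTo0x800(ofs) {
  if (ofs % 0x800) {
    ofs += 0x800;
  }
  return (ofs & 0xfffff800) >>> 0;
}
```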
  3. For NBL I've gotten closer to testing more cases. For PSP1, it looks like there are 18,821 archives in the file.fpb tarball, and of that number 18,810 archives are being successfully exported. I'm comparing the extracted files against the expected full size defined in the archive header, so if that number matches it should generally indicate that the files were extracted correctly. That leaves 9 archives where the full size of the files doesn't match the expected full size listed in the header, and then I have 2 files where some issue is happening when the pointers are updated to be relative to the file and not relative to the archive. So I need to look through these edge cases, try to assess what the issue is, and then see if those changes still work with the rest of the cases that currently pass.
  4. I've been spending too much time messing with the tarball and archive formats, and I've gotten completely trolled trying to import them to work with. The FPB file is so big that trying to load it into memory and then extract all of the NBL files generally leads to memory leaks and the program freezing partway through. And there are so many NBL files that trying to mass import them seems to fail as well. So any file management is going to need to happen outside of the tool, and I think I'll get back to working with command line tools. First step is an FPB to NBL slicer. Then I can attempt to write a tool that works with both NBL formats. And once we have that we can start looking at the individual files.
  5. Spent some time refactoring my Nbl extractor, and while I can say that I'm now more comfortable with the format, I can't say for sure that I've covered all of the use cases. In terms of approach it seems easier to almost treat NMLL and TMLL as their own separate files. Each one has its own header, file list and body. And then assuming that texture names are unique, and that TMLL is only ever compressed and not encrypted, it seems safe to treat them separately, assuming the focus is only to unpack without the need to repack. For NMLL: read the header, get the compression format and blowfish key. Read the file list, read the pointer list. Read the body, decrypt the body, decompress the body. Read the files. Fix the pointer orientation in the files.

For NMLL, one issue that I'm running into is that the decrypt method decrypts 8 bytes at a time. I'm not sure if that means I need to decrypt up to the last full eight bytes, add padding to the nearest eight bytes, or simply stop decryption when I run out of bytes at the end of the file. Right now I'm adding padding to the nearest eight bytes, and that seems to work (maybe?). So that seems like something that needs trial and error.

And then for TMLL, as far as I've seen, this segment seems to only use compression and doesn't need the pointers to be adjusted. But the header list can be encrypted, so I guess it needs the key listed in the NMLL header, which would mean that these two segments do need to be paired together to be able to extract the contents. So I guess that mainly means I need to do some more testing with the NMLL body to find out what the best approach for decryption and decompression is, and then look at the resulting files to make sure they look correct.

On a side note, I still have my ugly non-refactored code that seemed to work for the use cases I threw at it. So I might go back to my ugly code on the condition that it works and that I've at least found an approach to make it more readable. I'll try more tests on the NMLL body, and then try the original code for contrast. From there I'll have to start looking into how the files are organized.
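For the padding experiment described above, a minimal sketch in Node.js might look like this; whether zero-byte padding is actually what the cipher expects is exactly the part that still needs trial and error:

```javascript
// Pad a buffer with zero bytes up to the next multiple of the cipher's
// block size (8 bytes for blowfish), so the decrypt loop never runs out
// of a full block at the end of the body.
function padToBlock(buf, blockSize = 8) {
  const rem = buf.length % blockSize;
  if (rem === 0) return buf;
  return Buffer.concat([buf, Buffer.alloc(blockSize - rem)]);
}
```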
  6. Continuing on with NBL archives. I've been trying to think of ways to simplify the code and get around the lack of offsets declared in the header. Whether they're actually needed is a little debatable, since after one segment has been read, the next section is found by scanning instead of seeking. I think it's really the combination of having to scan paired with not knowing the exact format of what's contained in the archive. Maybe writing out an approach will help me simplify the code that I have.

To start out, it seems like a good idea to read 0x00 as a quick sanity check for the NMLL magic and return if it's not found. Then we're probably going to want to read 0x04 for the compression format and 0x1c to get the seed for blowfish, so we'll have those parameters for working with the rest of the archive. Next we're going to need 0x08 to get the byte length of the NMLL header and 0x0c for the number of files in the NMLL section. With that we can decrypt the NMLL header to get the filenames, offsets and lengths of all of the files contained.

Then we can start from the end of the NMLL header and seek until the first non-zero byte. This will be the NMLL body, and if we can't find this section, we return. If there is a body then we decrypt and decompress as needed. Then we continue to seek to the first non-zero byte after the body. This will be the pointer table. Once we have the pointer table and the raw NMLL body, we can go ahead and unpack those files and add them to the files to be returned.

Once we've hit the end of the pointer table we seek until we find the TMLL header, and return if it doesn't exist. If it does exist then we read 0x20 to get the header size and 0x2c to get the number of textures in the section. We then decrypt the header and get the filenames and offsets. Then we seek to the next non-zero byte after the header, get the body, and decompress. And then we unpack and add the textures to the files to be returned, and return the unpacked files from the archive.
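The repeated "seek until the first non-zero byte" step in the walkthrough above can be sketched as a tiny helper (the name is my own, not from any existing tool):

```javascript
// Advance from `ofs` past zero padding to the first non-zero byte,
// which marks the start of the next segment. Returns -1 when only
// padding remains (e.g. an archive with no TMLL section).
function seekNonZero(buf, ofs) {
  while (ofs < buf.length && buf[ofs] === 0) {
    ofs++;
  }
  return ofs < buf.length ? ofs : -1;
}
```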
  7. Right now I don't have too much to write about in terms of the FPB tarball since I'm simply using the "look for magic numbers and slice" method. It seems like there should be a cleaner approach that involves reading the file header and then finding the offsets in the file. I've been taking a second look at NBL, and I think I can write some notes about that. The figure above shows the outline of the structure of an NBL archive.

For NBL I'm guessing the extension stands for "New BML", as it has a similar function and approach to PSO's BML archive format, though a lot has changed in implementation. In terms of function it acts the same, grouping a list of models and the files associated with them, such as animations, and then including the textures for those models in a single grouped compressed file. In terms of structure the way it does this is somewhat different. NBL groups information into two main sections. First is the NMLL section, which contains any kind of information that isn't textures, so anything from models to animations and texture lists. The second is the TMLL section, which contains the textures. Each section has its own respective header and body, which are essentially the same format. The NMLL header has 0x30 bytes of metadata at the top followed by the file list, while the TMLL header has 0x20 bytes followed by the file list. The body is the same for both sections: it can be compressed, and it can be encrypted. So if the body is encrypted then decrypt (with blowfish), and then decompress if compressed (with PRS for type 0). Once decrypted and decompressed, the files are stacked end to end, and the respective offsets and lengths are provided in the respective headers. The main difference between the two sections is that texture data is likely copied directly into the framebuffer, so the archive only needs to unpack the files and then copy the information straight over.

The NMLL section of the archive gets copied as a block directly into memory, and the archive is implemented so that all pointers in the NMLL files are zero-relative to the start of the body. Following the body is a list of pointer offsets inside the body, with the reasoning being that this allows all of the pointers to be easily adjusted to the location copied into memory.

In terms of extracting the files from an NBL, the approach is a little messy. First read the metadata in the header to find the number of files in each respective section, the length of each header in bytes, the compressed size of each body and the decompressed size of each body. Then get the key and initialize blowfish on the condition the files are encrypted. The next step is to seek through the file to find where each segment is and slice it into its own section of information. The reason for using slicing is kind of two-fold. The main reason is that it's unclear if there is a pattern for the amount of padding between each group of information. After the NMLL header there is zero padding before the body, and the start of the body is the first non-zero byte following; that seems to hold true for all of the segments. After the NMLL body is an unknown amount of zero padding followed by the pointer list. The TMLL section is optional and doesn't exist in a lot of cases, and when it does exist its position needs to be found by looking for the TMLL magic number, which is then followed by more padding and then the TMLL body. So most of the difficulty in handling this function comes from handling the different conditions and seeking through to find the different segments.

Once each segment has been isolated, you decrypt and read the header, then decrypt and decompress the body, and then in the case of NMLL adjust the pointers and pull out the files. Seems like something that could be addressed by adding a few offset pointers in the headers, but at best I can only speculate at the potential reasons behind this. The best reason that I can think of is that the archive is intended to be more of a streaming format, where the reader starts at the beginning and seeks through the file copying information into memory as it goes along. Though having a few more offsets declared in the header would make this format much easier to work with.
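The pointer adjustment described above can be sketched as a single pass over the pointer list. Function and parameter names here are my own, not from the format; the idea is just that each table entry is the offset (within the body) of a 32-bit pointer that is itself zero-relative to the start of the body, so rebasing is one addition per pointer:

```javascript
// Rebase every pointer in an NMLL body in place. `pointerOfsList` holds
// the offsets of the 32-bit pointers inside `body`; `delta` is the shift
// to apply (a load address in-game, or a negative file-start offset when
// extracting individual files). `>>> 0` keeps the stored value unsigned.
function rebasePointers(body, pointerOfsList, delta) {
  for (const ptrOfs of pointerOfsList) {
    const ptr = body.readUInt32LE(ptrOfs);
    body.writeUInt32LE((ptr + delta) >>> 0, ptrOfs);
  }
}
```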
  8. I realized what I was missing in life: tables. So I'm not very knowledgeable about PSU file formats, as I haven't had a chance to dig into them at all. But I was able to ask someone who knows a lot more than me, Agrajag, about the file formats. It looks like the types of files used boil down to a couple of key differences. For models PSU uses the .xja format and PSP uses the .unj format. I think Agrajag also mentioned that the model animations, texture animations, bone names and texture names have different extensions for PSU and PSP, but the file formats are the same for all of them.

And then come the differences in archives. All of the titles seem to use the .nbl archive format, which I think stands for "new-bml" if anyone is familiar with the PSO .bml archive type. It seems to have a similar role and structure, with an NMLL section for the model-related files and a TMLL section for the texture files. The texture files seem to have some variation, with .xvr on PSU, .gim on PSP1 and .uvr on PSP2(i). And then for the file system, PSU uses the computer's folders directly, so all of the game's files are in the /DATA folder. The PSP titles all have a 'file.fpb' file, which is around 800MB and has all of the assets in the game stored as a list of .nbl archives. As for the differences between the PSP titles, the basic files such as the models and animations seem to have remained the same. The main difference in the raw files is between the .gim files and .uvr files for the textures; it looks like PSP2(i) contains a few .gim files that were likely re-used from PSP1, but aside from that the vast majority are .uvr format. Otherwise the main difference between PSP1 and PSP2(i) is obfuscation: the fpb and nbl file types were changed in an attempt to make them more difficult to access. So in terms of the portable titles, the list of file types seems to break down into .unj, .unm, .gim, .uvr, .una, .unt, .nbl and .fpb.

For fpb files, it's kind of frustrating that I don't have a better method than the "slice" method of looking for the NMLL header and writing out the space in-between for each write. Having more information about how the game manages the header and how it stores the file names would definitely make navigating the file system on the portable titles a lot easier. Right now my extractor is running into an error on PSP1, so I'll have to track down the reason for that to see what's going on. After that I can get back to focusing on the model format and taking more notes.

Edit: Okay, the reason the nbl file wasn't working was an empty nbl file that had a header but no content. So that means I can get back to looking at the basic file types. As a side note, being able to get the filenames from fpb files and type 2 nbl files would make working with the portable series a lot easier.
  9. Grabbing more files from the common models archive. Time to start looking into other archives for test cases. Two notes: nmll-09283.nbl is broken, and nmll-08904.nbl might be a good test candidate for the next set of vertex flag testing.
  10. Oh look, a different model. Now that we are able to read archives and textures, the next step is to track down different vertex lists to figure out which flags do what. The game seems to group each vertex in the order of diffuse map uv, color, and position, and it also seems to use 32-bit floats and 16-bit ints to define values. So the flag defines which attributes are in the vertex list and what format they are in. Basically we find a model, debug the vertex list, find another model and debug again. To start out we have the calorie mate item, which I figured would be a good place to start with the 16-bit values. We have 16-bit uv values and 16 vertices. I don't think I have the exact scale for the vertices and uv values, but I'm surprised with how close I got by plugging in likely values.
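As a rough sketch of the debugging approach, reading one 16-bit vertex might look like this; the attribute order follows the uv/color/position grouping above with color omitted, and the scale divisor is a placeholder "likely value", not a confirmed constant from the format:

```javascript
// Hypothetical 16-bit vertex layout: int16 u,v then int16 x,y,z.
// SCALE is a guess used to normalize int16 values into float range.
const SCALE = 0x8000; // assumption, not confirmed
function readVertex16(buf, ofs) {
  return {
    u: buf.readInt16LE(ofs) / SCALE,
    v: buf.readInt16LE(ofs + 2) / SCALE,
    x: buf.readInt16LE(ofs + 4) / SCALE,
    y: buf.readInt16LE(ofs + 6) / SCALE,
    z: buf.readInt16LE(ofs + 8) / SCALE,
  };
}
```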
  11. We're back to something that resembles the first post, with a few improvements. The main improvements are that we're no longer unpacking and then storing the files first; we can take archives and read the files directly from there. Textures have also been implemented, and I fixed the vertex winding order for the faces. Next step is to test the meshes that are included in the basic archive and then start scaling up from there.
  12. Small update. Textures have been added. Right now I'm throwing the textures somewhere visible (outlined in red) to check that the texture has been parsed correctly. I should probably add another window to the UI that can be used for viewing textures that have been loaded in for a model. To save on time, I might make a white box and stick it under the 3d viewport, and then move on to applying the textures to the diffuse uv map.
  13. Okay, so now that we can import assets and store them to be referenced later, the next step is to start working on parsing the files. The way I have it now is that unj files are split into categories and then listed on the left. When a unj file is clicked, we can load it and parse it. I did some testing with nodejs as far as converting textures to pngs, and it looks like the texture format is pretty straightforward. So before we get into the models, we can take a second to make sure the textures are working, which should help us later with identifying which assets are what for the models, which can help with debugging.

To be able to know which texture needs to be read for what model, we need to reference the texture list, which will tell us which textures need to be linked with which materials. And thankfully they re-used the NJTL ninja texture list format from the Dreamcast days. Possibly with a slight modification, but looking at the file, it's pretty easy to trace through. First the magic number NUTL, followed by the length, followed by the pointer to the header, which is at 0x24 in this case. The header is the number of textures, followed by a pointer to the first texture struct, which is at 0x10. 0x10 and 0x14 are both zero, because these values are used for attributes after the texture has been loaded into RAM, but not before. And that just leaves a pointer to the start of the texture name, which is at 0x2c. This is only for a .unt file with one texture, so to get the exact format of the header we'll have to find a texture list that references more than one texture, but for now we have something that works. We can get the name of the texture, load it, parse it, and then have it ready to be paired with the model geometry when that gets loaded.
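The trace above can be turned into a rough Node.js reader. The 12-byte entry stride and the field order (two runtime dwords, then the name pointer) are guesses that have only been checked against the single-texture case described, so treat this as a sketch:

```javascript
// Read a NUTL texture list: magic, then a pointer to the header at 0x08;
// the header holds a texture count and a pointer to the first entry.
// Each entry is assumed to be 12 bytes: two zero (runtime) dwords, then
// a pointer to a NUL-terminated texture name.
function readNutl(buf) {
  const magic = buf.toString("ascii", 0, 4); // expected "NUTL"
  const headerOfs = buf.readUInt32LE(8);
  const texCount = buf.readUInt32LE(headerOfs);
  const texListOfs = buf.readUInt32LE(headerOfs + 4);
  const names = [];
  for (let i = 0; i < texCount; i++) {
    const entry = texListOfs + i * 12; // stride is an assumption
    const nameOfs = buf.readUInt32LE(entry + 8);
    let end = nameOfs;
    while (buf[end] !== 0) end++; // scan to the NUL terminator
    names.push(buf.toString("ascii", nameOfs, end));
  }
  return { magic, names };
}
```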
  14. Okay, so managing the file.fpb as an input might be an over-reach, but it looks like I can now accept nbl files for uploading. Which means I can either figure out a better way to implement fpb without crashing, or take a list of nbl files as input, which would require running a script to pre-separate them, but would otherwise work. For the viewer side of things, to keep the list of models from being too long, I split the models into groups by the first two letters of the model name. So far that seems to work for the limited test cases I've been working with; we'll have to see if the approach holds up. Right now the model parsing side is still pretty limited. I guess I'll start with the simple drop models and work from there. Though before working on models I think I'll go ahead and implement textures to get those out of the way first, because from what I've seen so far, the texture implementation is surprisingly straightforward.
  15. Okay, so now files are unpacked from the nbl archive, and the question is what the best way to store and sort them is. I added a few properties to try and make it easier to find things, like extension and basename, added a shared unique id for files in an nbl archive, and added an md5. What I think I will do is set filename + md5 as the primary key. Then to find the models I can search for .unj on the extension. And since I'm only interested in models, I might filter out files from the nmll section that don't match .unj, .unt, .unv, or .unm.
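The extension filter mentioned above is a one-liner over the unpacked file list, assuming each entry carries the `extension` property added earlier:

```javascript
// Keep only the file types of interest from an unpacked NMLL section.
const KEEP_EXTS = new Set([".unj", ".unt", ".unv", ".unm"]);

function filterModelFiles(files) {
  return files.filter((f) => KEEP_EXTS.has(f.extension));
}
```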