DC-SWAT Forum

Full version: "Emulate async read" parameter
I'm trying to understand the impact of the advanced parameter "Emulate async read" when loading games with Dreamshell.
(Note: on Retrodream, this parameter seems to be (mis?)named SYNC.)

On Dreamshell, allowed values are "none", 1, 2, 3, 4, 5, 6, 7, 8 and 16.
On Retrodream, they are called "TRUE", 1, 2, 3, 4, 5, 6, 7, 8 and 16.
So basically the same, just a name change.

When testing different parameters to load a game, one can find many (different!) pieces of advice here and there about which value to select, and these values vary per title.
It's unclear to me why 8 should be preferable to 16, or 2, or "none".

For reference, here is the information I have collected so far; note that some of it could be wrong:
- Higher async values are supposed to improve read performance
- "none" is supposed to be the best for compatibility, but worse for performance.
- Unfortunately, performance itself can be a problem for compatibility, so "none" is not a sure-win.
- Higher async values are supposed to be "worse" for compatibility, though I'm not sure I understand in which ways.
- I also don't know if this value impacts the resident memory size of the loader, which could then impact compatibility. For example, does a higher async value mean a larger look-ahead buffer?
- It's also unclear to me if storage fragmentation can be a problem for this setting.
- The impact of this setting is said to vary depending on loader version (?)

Based on the explanations above, it seems it would be better to target 16 whenever possible, for better performance, and scale down from there whenever it seems to hinder compatibility?

Rather than relying on urban legends, I would prefer to read a good explanation of the parameter here, to get a good sense of it. Would it be possible to explain the expected outcome of this parameter?
In DreamShell, the "none" parameter can be replaced by "true" if you use an optimized image (ISO or GDI) and an IDE device.
If you are playing with this value, it seems you are an SD card user? For IDE devices, just use DMA with "true async" and optimized images, and it should work in most cases.
For an SD device, you can play with the emu async value to get smoother gameplay if the game loads resources during gameplay. If not, use the max value for faster loading. It is also automatically disabled if the game tries to load more than 100 sectors in one request; I detect this as a "loading state" in the game.
For some games it is better to use the lowest value: for Shenmue and Crazy Taxi, use "1+" for emu async. Loading is slow, but gameplay is smooth.

Emu async emulates DMA (which works asynchronously) using PIO reads. It splits the requested read into parts of N sectors (N is the emu async value).
Lowest value: smooth gameplay, but slow loading.
Highest value: lag during gameplay, but faster loading.
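
To make the splitting concrete, here is a minimal sketch in Python. It is only a model of the idea described above, not the actual DreamShell loader code; the function name `split_request` and the use of `None` to stand for "none" are assumptions made for illustration.

```python
def split_request(total_sectors, emu_async):
    """Split one read request into parts of at most `emu_async` sectors.

    emu_async=None models the "none" setting (no splitting), which is an
    assumption of this sketch, not the loader's real representation.
    """
    if total_sectors < 0:
        raise ValueError("total_sectors must be non-negative")
    if emu_async is None:
        # "none": the whole request is read in one blocking pass.
        return [total_sectors] if total_sectors else []
    parts = []
    remaining = total_sectors
    while remaining > 0:
        part = min(emu_async, remaining)  # last part may be shorter
        parts.append(part)
        remaining -= part
    return parts


print(split_request(20, 8))     # three parts: 8, 8 and 4 sectors
print(split_request(20, None))  # one part holding all 20 sectors
```

A lower value gives more, shorter parts (smooth but slow); a higher value gives fewer, longer parts (faster but each one blocks longer).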

For CDDA games just use the highest value, because these games don't load resources during gameplay.

You can't play some games on the SD card because it's a PIO-only device: videos lag, and CDDA is also more difficult. So I would recommend getting an IDE mod; then you can forget about emu async and other limitations in most cases.
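
The auto-disable rule mentioned in this post (requests larger than 100 sectors are treated as a "loading state") could be modeled as below. This is a hedged sketch: the names `LOADING_THRESHOLD_SECTORS` and `effective_part_size` are hypothetical, and it assumes that "disabled" means the whole request is read in one blocking pass.

```python
# Threshold taken from the post: more than 100 sectors in one request
# is detected as a "loading state". The constant name is hypothetical.
LOADING_THRESHOLD_SECTORS = 100


def effective_part_size(request_sectors, emu_async):
    """Return how many sectors are read per blocking pass.

    emu_async=None models "none". Assumption of this sketch: when emu
    async is disabled, the request is read whole.
    """
    if emu_async is None or request_sectors > LOADING_THRESHOLD_SECTORS:
        return request_sectors
    return min(emu_async, request_sectors)


print(effective_part_size(500, 4))  # large request: splitting is bypassed
print(effective_part_size(40, 4))   # gameplay-sized request: split by 4
```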
Thanks for these great answers @SWAT.
Let me provide additional details below :

(13.04.2023 13:50) SWAT wrote: If you are playing with this value, it seems you are an SD card user?

Yes, I should have mentioned it: I'm using an SD card reader on the serial port.
I'm aware that this setup offers limited performance compared to an IDE mod.
However, it also allows me to keep the Dreamcast in pristine (original) condition,
and that's what I'm after for this unit.

Do I understand correctly that your comment implies that the "async" parameter is actually only useful for the SD Card scenario ?
And that, rule of thumb, higher values generally offer higher performance ?

May I ask why 16 is the maximum possible value? Why not, for example, 32?

(13.04.2023 13:50) SWAT wrote: For some games it is better to use the lowest value: for Shenmue and Crazy Taxi, use "1+" for emu async. Loading is slow, but gameplay is smooth.

This part is very interesting.
So there are some drawbacks to larger values, and you imply that large values are not good when games load data during gameplay, or, said differently, small values offer smoother gameplay.
That's very useful to know, because many games indeed load resources during gameplay.
One big example comes to mind, Dead or Alive 2.

Another example I'm not sure how to categorize : when there is a video cutscene, what's preferable ? High async values, or small ones ?

But it's still unclear to me how a smaller value leads to "smoother gameplay".
So I'm still trying to get my mind around how this works.

You also mention that the "async" parameter "emulates DMA".
So let me rephrase it, to check if my understanding is correct :

1) In full SYNC mode, the program asks for one sector, the program blocks until it receives the sector, and once this is done, execution resumes.

2) In "async" mode, the program asks for a first sector, the request is triggered but execution immediately comes back to the program, and it may request a second sector before the first has arrived. And it can chain these requests, until it reaches a threshold, which is the "async" number. At which point, execution will finally block, and wait for some data to be delivered before resuming.

I'm not sure if this is the correct way to see the system working, because:
- The ability to request more data seems to be application driven, which must also reserve enough memory space to receive data from so many sectors.
- At some point, execution must still be interrupted, because the cpu is fully involved in the receiving of data from the SD card.
- I'm not sure I understand how a smaller async queue leads to "smoother gameplay".

So it's unclear if I've understood the underlying mechanism.
(14.04.2023 02:48) sundance2 wrote: it also allows me to keep the Dreamcast in pristine (original) condition,
and that's what I'm after for this unit.

Buy another one for this ;) They are not expensive.
All modifications can be made so that it still looks pristine from the outside.

(14.04.2023 02:48) sundance2 wrote: Do I understand correctly that your comment implies that the "async" parameter is actually only useful for the SD Card scenario ?
And that, rule of thumb, higher values generally offer higher performance ?

This is for PIO mode and non-optimized images.
The SD card is a PIO-only device.

(14.04.2023 02:48) sundance2 wrote: May I ask why 16 is the maximum possible value? Why not, for example, 32?

It would make no sense: if you need more, just choose "none" to get the maximum.

(14.04.2023 02:48) sundance2 wrote:
(13.04.2023 13:50) SWAT wrote: For some games it is better to use the lowest value: for Shenmue and Crazy Taxi, use "1+" for emu async. Loading is slow, but gameplay is smooth.

This part is very interesting.
So there are some drawbacks to larger values, and you imply that large values are not good when games load data during gameplay, or, said differently, small values offer smoother gameplay.
That's very useful to know, because many games indeed load resources during gameplay.
One big example comes to mind, Dead or Alive 2.

Exactly.

(14.04.2023 02:48) sundance2 wrote: Another example I'm not sure how to categorize : when there is a video cutscene, what's preferable ? High async values, or small ones ?

8/16. Here the main problem is the low connection speed of this device.
It is not enough to play video in full quality.

(14.04.2023 02:48) sundance2 wrote: But it's still unclear to me how a smaller value leads to "smoother gameplay".
So I'm still trying to get my mind around how this works.

You also mention that the "async" parameter "emulates DMA".
So let me rephrase it, to check if my understanding is correct :

1) In full SYNC mode, the program asks for one sector, the program blocks until it receives the sector, and once this is done, execution resumes.

2) In "async" mode, the program asks for a first sector, the request is triggered but execution immediately comes back to the program, and it may request a second sector before the first has arrived. And it can chain these requests, until it reaches a threshold, which is the "async" number. At which point, execution will finally block, and wait for some data to be delivered before resuming.

I'm not sure if this is the correct way to see the system working, because:
- The ability to request more data seems to be application driven, which must also reserve enough memory space to receive data from so many sectors.
- At some point, execution must still be interrupted, because the cpu is fully involved in the receiving of data from the SD card.
- I'm not sure I understand how a smaller async queue leads to "smoother gameplay".

So it's unclear if I've understood the underlying mechanism.

An application can request one sector or many. For example, it can load a megabyte in one request, which is more than 500 CD sectors (fewer during gameplay, of course, but still a lot).
In real DMA mode, the application makes a request and gets CPU control back immediately. The application then checks the status through syscalls every frame (KATANA behavior) and/or waits for an IRQ (WinCE behavior).
If we load 500 sectors in PIO mode, it blocks the CPU for a long time and you get lag in the game. To reduce this effect, I divide the read into parts (the emu async value) and read one part each frame, to avoid blocking the CPU for a long time. The frame rate still drops, but it happens smoothly rather than jerkily. It also stretches the overall load time, since we hand control of the CPU back to the application after each part.
For these reasons the SD card is a very limited device: it not only has a slow connection, it also cannot read data asynchronously, again because of the connection method.
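
The trade-off described above (shorter stalls per frame versus a longer total load) can be illustrated with a small model. This is a rough sketch under stated assumptions, not a measurement: the `simulate` function is hypothetical, it assumes exactly one part is read per frame, and the default of 1.0 ms per sector is a made-up figure chosen only to make the numbers easy to read.

```python
import math


def simulate(total_sectors, emu_async, sector_ms=1.0):
    """Model a PIO read split into one part per frame.

    Returns (frames_needed, worst_stall_ms), where worst_stall_ms is the
    longest time the CPU is blocked within a single frame.
    emu_async=None models "none" (whole request in one blocking pass).
    sector_ms is a hypothetical per-sector read cost, not a measured one.
    """
    if total_sectors == 0:
        return 0, 0.0
    part = total_sectors if emu_async is None else min(emu_async, total_sectors)
    frames_needed = math.ceil(total_sectors / part)
    worst_stall_ms = part * sector_ms
    return frames_needed, worst_stall_ms


# A 64-sector request during gameplay, under the assumed 1 ms per sector:
for value in (1, 4, 16, None):
    frames, stall = simulate(64, value)
    print(f"emu async {value}: {frames} frames, {stall} ms blocked per frame")
```

With these assumed numbers, a value of 1 spreads the read over 64 frames with a tiny stall in each, while "none" finishes in a single pass but blocks the CPU for the whole read, which is what shows up as a visible hitch.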
Thanks @SWAT ! I think this is much clearer now !

So, when instructed to load N sectors,
your loader layer intentionally cuts the request into smaller batches of size [1-16]
in order to give back control from time to time to the application,
so that it can do something useful before blocking again for next batch read.

And the option "none" essentially means "no limit" : if the application requests N sectors, it will block until it receives N sectors.

I suspect that what Retrodream calls "TRUE" is essentially the same as Dreamshell's "none", since it's just an overlay.

I believe what I got wrong in my mental model is that I associated "none" to `0`, because in the UI it seems positioned "before" the `1`,
while it should rather be associated with "infinity", and therefore could be considered "beyond" the `16`.
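
That corrected mental model can be written down as a tiny sketch: treat "none" as an unlimited part size, so it sorts after 16 rather than before 1. The helper name `sectors_per_part` is hypothetical and this is only an illustration of the ordering, not how the loader stores the setting.

```python
import math


def sectors_per_part(setting):
    """Map a UI setting to the maximum sectors read per blocking pass."""
    if setting == "none":
        return math.inf  # no limit: the whole request is read at once
    return int(setting)


# Values as listed in the DreamShell UI, in their displayed order.
ui_order = ["none", "1", "2", "3", "4", "5", "6", "7", "8", "16"]

# Ordered by how long a single blocking pass can get.
by_part_size = sorted(ui_order, key=sectors_per_part)
print(by_part_size)
```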
Just a minor follow-up:

Now that I've got a better understanding of how SYNC/async works,
I went back to older titles featuring choppy in-game animations,
and proceeded to reduce the async value parameter.
And sure enough, on reaching low enough values, the choppiness disappears!
The effect was especially pronounced in DoA2.
Note that the game can run slower at times, but it remains smooth, which is miles better than the previous choppiness.

Thanks @SWAT!
Yes, now it seems you understand everything correctly. I hope this will be useful to someone else who uses the SD adapter.
As for the UI, there is always something to work on, but unfortunately there is not enough time for everything.
I can make a quick fix and change "none" to "off" if that would be better.