Ryujinx

Author	SHA1	Message	Date
TSR Berry	cee7121058	Move solution and projects to src	2023-04-27 23:51:14 +02:00
gdkchan	9f12e50a54	Refactor attribute handling on the shader generator (#4565 ) * Refactor attribute handling on the shader generator * Implement gl_ViewportMask[] * Add back the Intel FrontFacing bug workaround * Fix GLSL transform feedback outputs mistmatch with fragment stage * Shader cache version bump * Fix geometry shader recognition * PR feedback * Delete GetOperandDef and GetOperandUse * Remove replacements that are no longer needed on GLSL compilation on Vulkan * Fix incorrect load for per-patch outputs * Fix build	2023-04-25 19:51:07 -03:00
riperiperi	8d9d508dc7	Shader: Bias textureGather instructions on AMD/Intel (#4703 ) * Experimental (GLSL, forced) * SPIR-V attempt * Add capability * Fix pCount == 1 on glsl * Fix typo	2023-04-22 18:02:39 -03:00
riperiperi	a64fee29dc	Vulkan: add situational "Fast Flush" mode (#4667 ) * Flush in the middle of long command buffers. * Vulkan: add situational "Fast Flush" mode The AutoFlushCounter class was added to periodically flush Vulkan command buffers throughout a frame, which reduces latency to the GPU as commands are submitted and processed much sooner. This was done by allowing command buffers to flush when framebuffer attachments changed. However, some games have incredibly long render passes with a large number of draws, and really aggressive data access that forces GPU sync. The Vulkan backend could potentially end up building a single command buffer for 4-5ms if a pass has enough draws, such as in BOTW. In the scenario where sync is waited on immediately after submission, this would have to wait for the completion of a much longer command buffer than usual. The solution is to force command buffer submission periodically in a "fast flush" mode. This will end up splitting render passes, but it will only enable if sync is aggressive enough. This should improve performance in GPU limited scenarios, or in games that aggressively wait on synchronization. In some games, it may only kick in when res scaling. It won't trigger in games like SMO where sync is not an issue. Improves performance in Pokemon Scarlet/Violet (res scaled) and BOTW (in general). * Add conversions in milliseconds next to flush timers.	2023-04-11 09:23:41 +02:00
Mary	c95be55091	vulkan: Cleanup PhysicalDevice and Instance querying (#4632 ) * vulkan: Move most of the properties enumeration to VulkanPhysicalDevice That clean up a bit of duplicate logic. Also move to use an hashset for device extensions. * vulkan: Move instance querying to VulkanInstance Also cleanup code to use span when possible instead of unsafe pointers. * Address gdkchan's comments	2023-04-05 14:48:38 -03:00
Mary	f5a6f45b27	vulkan: Separate debug utils logic from VulkanInitialization (#4609 ) * vulkan: Separate debug utils logic from VulkanInitialization Also checks for VK_EXT_debug_utils existence instead of force enabling it and allow possible error during messenger init * Address gdkchan's comment * Use CreateDebugUtilsMessenger Span variant	2023-04-01 08:05:02 +00:00
Mary	f0a3dff136	vulkan: Remove CreateCommandBufferPool from VulkanInitialization (#4606 ) It was only called in one place, that can be simplified.	2023-03-27 02:16:31 +00:00
Mary	f659dcb9d8	vulkan: fix broken "VK_EXT_subgroup_size_control" support check (#4607 ) Not sure since when it was broken...	2023-03-26 19:01:30 +02:00
riperiperi	9f1cf6458c	Vulkan: Migrate buffers between memory types to improve GPU performance (#4540 ) * Initial implementation of migration between memory heaps - Missing OOM handling - Missing `_map` data safety when remapping - Copy may not have completed yet (needs some kind of fence) - Map may be unmapped before it is done being used. (needs scoped access) - SSBO accesses are all "writes" - maybe pass info in another way. - Missing keeping map type when resizing buffers (should this be done?) * Ensure migrated data is in place before flushing. * Fix issue where old waitable would be signalled. - There is a real issue where existing Auto<> references need to be replaced. * Swap bound Auto<> instances when swapping buffer backing * Fix conversion buffers * Don't try move buffers if the host has shared memory. * Make GPU methods return PinnedSpan with scope * Storage Hint * Fix stupidity * Fix rebase * Tweak rules Attempt to sidestep BOTW slowdown * Remove line * Migrate only when command buffers flush * Change backing swap log to debug * Address some feedback * Disallow backing swap when the flush lock is held by the current thread * Make PinnedSpan from ReadOnlySpan explicitly unsafe * Fix some small issues - Index buffer swap fixed - Allocate DeviceLocal buffers using a separate block list to images. * Remove alternative flags * Address feedback	2023-03-19 17:56:48 -03:00
gdkchan	5d85468302	Vulkan: Support list topology primitive restart (#4483 )	2023-02-26 19:19:00 -03:00
gdkchan	cedd200745	Move gl_Layer to vertex shader if geometry is not supported (#4368 ) * Set gl_Layer on vertex shader if it's set on the geometry shader and it does nothing else * Shader cache version bump * PR feedback * Fix typo	2023-02-25 10:39:51 +00:00
gdkchan	7aa430f1a5	Add support for advanced blend (part 1/2) (#2801 ) * Add blend microcode registers * Add advanced blend support using host extension * Remove debug message * Use pre-generated table for blend functions * XML docs * Rename AdvancedBlendMode to AdvancedBlendOp for consistency * Remove redundant code * Fix some advanced blend related issues on Vulkan * Formatting	2023-02-19 22:37:37 -03:00
Mary	17078ad929	vulkan: Respect VK_KHR_portability_subset vertex stride alignment (#4419 ) * vulkan: Respect VK_KHR_portability_subset vertex stride alignment We were hardcoding alignment to 4, but by specs it can be any values that is a power of 2. This also enable VK_KHR_portability_subset if present as per specs requirements. * address gdkchan's comment * Make NeedsVertexBufferAlignment internal	2023-02-15 08:41:48 +00:00
Mary	32450d45de	vulkan: Clean up MemoryAllocator (#4418 ) This started as an attempt to remove vkGetPhysicalDeviceMemoryProperties in FindSuitableMemoryTypeIndex (As this could have some overhead and shouldn't change at runtime) and turned in a little bigger cleanup.	2023-02-15 07:50:26 +01:00
gdkchan	7528f94536	Implement safe depth-stencil blit using stencil export extension (#4356 ) * Implement safe depth-stencil blit using stencil export extension * Delete depth-stencil blit with buffer path	2023-02-06 00:19:31 -03:00
gdkchan	296c4a3d01	Relax Vulkan requirements (#4282 ) * Relax Vulkan requirements * Fix MaxColorAttachmentIndex * Fix ColorBlendAttachmentStateCount value mismatch for background pipelines * Change query capability check to check for pipeline statistics query rather than geometry shader support	2023-01-26 18:34:35 -03:00
riperiperi	e7cf4e6eaf	Vulkan: Reset queries on same command buffer (#4329 ) * Reset queries on same command buffer Vulkan seems to complain when the queries are reset on another command buffer. No idea why, the spec really could be written better in this regard. This fixes complaints, and hopefully any implementations that care extensively about them. This change _guesses_ how many queries need to be reset and resets as many as possible at the same time to avoid splitting render passes. If it resets too many queries, we didn't waste too much time - if it runs out of resets it will batch reset 10 more. The number of queries reset is the maximum number of queries in the last 3 frames. This has been worked into the AutoFlushCounter so that it only resets up to 32 if it is yet to force a command buffer submission in this attachment. This is only done for samples passed queries right now, as they have by far the most resets. * Address Feedback	2023-01-24 13:32:56 -03:00
riperiperi	de3134adbe	Vulkan: Explicitly enable precise occlusion queries (#4292 ) The only guarantee of the occlusion query type in Vulkan is that it will be zero when no samples pass, and non-zero when any samples pass. Of course, most GPUs implement this by just placing the # of samples in the result and calling it a day. However, this lax restriction means that GPUs could just report a boolean (1/0) or report a value after one is recorded, but before all samples have been counted. MoltenVK falls in the first category - by default it only reports 1/0 for occlusion queries. Thankfully, there is a feature and flag that you can use to force compatible drivers to provide a "precise" query result, that being the real # of samples passed. Should fix ink collision in Splatoon 2/3 on MoltenVK.	2023-01-19 00:30:42 +00:00
Ac_K	85faa9d8fa	Revert "Relax Vulkan requirements (#4228 )" (#4279 ) This reverts commit `dca5b14493`.	2023-01-13 06:04:59 +00:00
gdkchan	dca5b14493	Relax Vulkan requirements (#4228 )	2023-01-13 06:09:48 +01:00
riperiperi	8fa248ceb4	Vulkan: Add workarounds for MoltenVK (#4202 ) * Add MVK basics. * Use appropriate output attribute types * 4kb vertex alignment, bunch of fixes * Add reduced shader precision mode for mvk. * Disable ASTC on MVK for now * Only request robustnes2 when it is available. * It's just the one feature actually * Add triangle fan conversion * Allow NullDescriptor on MVK for some reason. * Force safe blit on MoltenVK * Use ASTC only when formats are all available. * Disable multilevel 3d texture views * Filter duplicate render targets (on backend) * Add Automatic MoltenVK Configuration * Do not create color attachment views with formats that are not RT compatible * Make sure that the host format matches the vertex shader input types for invalid/unknown guest formats * FIx rebase for Vertex Attrib State * Fix 4b alignment for vertex * Use asynchronous queue submits for MVK * Ensure color clear shader has correct output type * Update MoltenVK config * Always use MoltenVK workarounds on MacOS * Make MVK supersede all vendors * Fix rebase * Various fixes on rebase * Get portability flags from extension * Fix some minor rebasing issues * Style change * Use LibraryImport for MVKConfiguration * Rename MoltenVK vendor to Apple Intel and AMD GPUs on moltenvk report with the those vendors - only apple silicon reports with vendor 0x106B. * Fix features2 rebase conflict * Rename fragment output type * Add missing check for fragment output types Might have caused the crash in MK8 * Only do fragment output specialization on MoltenVK * Avoid copy when passing capabilities * Self feedback * Address feedback Co-authored-by: gdk <gab.dark.100@gmail.com> Co-authored-by: nastys <nastys@users.noreply.github.com>	2023-01-13 01:31:21 +01:00
riperiperi	e20abbf9cc	Vulkan: Don't flush commands when creating most sync (#4087 ) * Vulkan: Don't flush commands when creating most sync When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which result in this happening an unbounded number of times depending on how many times they run compute. Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool. This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on. Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading. OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there. Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here) * Add strict argument This is technically a separate concern from whether the sync is a host syncpoint. * Remove _interrupted variable * Actually wait for the invoke This is required by AMD GPUs, and also may have caused some issues on other GPUs. * Remove unused using. * I don't know why it added these ones. * Address Feedback * Fix typo	2022-12-29 15:39:04 +01:00
riperiperi	470be03c2f	GPU: Add fallback when 16-bit formats are not supported (#4108 ) * Add conversion for 16 bit RGBA formats (not supported in Rosetta) * Rebase fix Rebase fix * Forgot to remove this * Fix RGBA16 format conversion * Add RGBA4 -> RGBA8 conversion * Handle host stride alignment * Address Feedback Part 1 * Can't count * Don't zero out rgb when alpha is 0 * Separate RGBA4 and 5-bit component formats Not sure of a better way to name them... * Add A1B5G5R5 conversion * Put this in the right place. * Make format naming consistent for capabilities * Change method names	2022-12-26 15:50:27 -03:00
gdkchan	f906eb06c2	Implement a software ETC2 texture decoder (#4121 ) * Implement a software ETC2 texture decoder * Fix output size calculation for non-2D textures * Address PR feedback	2022-12-21 20:39:58 -03:00
Georg Lehmann	0f50de72be	Vulkan: enable VK_EXT_custom_border_color features (#4116 ) * Vulkan: enable VK_EXT_custom_border_color features radv only create the border color bo if this feature is enabled, so it crashed when creating samplers with custom border colors Fixes #4072 Fixes #3993 * Address gdkchan's comment Co-authored-by: Mary <mary@mary.zone>	2022-12-14 20:53:33 -03:00
gdkchan	17a1cab5d2	Allow SNorm buffer texture formats on Vulkan (#3957 ) * Allow SNorm buffer texture formats on Vulkan * Shader cache version bump	2022-12-04 15:36:03 -03:00
gdkchan	73aed239c3	Implement non-MS to MS copies with draws (#3958 ) * Implement non-MS to MS copies with draws, simplify MS to non-MS copies and supports any host sample count * Remove unused program	2022-12-04 15:07:11 -03:00
Andrey Sukharev	3868a00206	Use source generated regular expressions (#4005 )	2022-12-04 00:43:23 +00:00
Mary-nyan	ce92e8cd04	chore: Update Silk.NET to 2.16.0 (#3953 )	2022-12-01 19:11:56 +01:00
riperiperi	458452279c	GPU: Track buffer migrations and flush source on incomplete copy (#3952 ) * Track buffer migrations and flush source on incomplete copy Makes sure that the modified range list is always from the latest iteration of the buffer, and flushes earlier iterations of a buffer if the data has not been migrated yet. * Cleanup 1 * Reduce cost for redundant signal checks on Vulkan * Only inherit the range list if there are pending ranges. * Fix OpenGL * Address Feedback * Whoops	2022-12-01 16:30:13 +01:00
Ac_K	a1ddaa2736	ui: Fixes disposing on GTK/Avalonia and Firmware Messages on Avalonia (#3885 ) * ui: Only wait on _exitEvent when MainLoop is active under GTK This fixes a dispose issue under Horizon/GTK, we don't check if the ApplicationClient is null so it throw NCE. We don't check if the main loop is active and waiting an event which is set in the main loop... So that could lead to a freeze. Everything works fine in GTK now. Related issue: https://github.com/Ryujinx/Ryujinx/issues/3873 As a side note, same kind of issue appear in Avalonia UI too. Firmware's popup doesn't show anything and the emulator just freeze. * TSRBerry's change Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Fix Avalonia crashing/freezing * Add Avalonia OpenGL fixes * Fix firmware popup on windows * Fixes everything * Add _initialized bool to VulkanRenderer and OpenGL Window Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>	2022-11-24 15:08:27 +01:00
gdkchan	2e43d01d36	Move gl_Layer from vertex to geometry if GPU does not support it on vertex (#3866 ) * Move gl_Layer from vertex to geometry if GPU does not support it on vertex * Shader cache version bump * PR feedback	2022-11-18 23:27:54 -03:00
gdkchan	f1d1670b0b	Implement HLE macro for DrawElementsIndirect (#3748 ) * Implement HLE macro for DrawElementsIndirect * Shader cache version bump * Use GL_ARB_shader_draw_parameters extension on OpenGL * Fix DrawIndexedIndirectCount on Vulkan when extension is not supported * Implement DrawIndex * Alignment * Fix some validation errors * Rename BaseIds to DrawParameters * Fix incorrect index buffer and vertex buffer size in some cases * Add HLE macros for DrawArraysInstanced and DrawElementsInstanced * Perform a regular draw when indirect data is not modified * Use non-indirect draw methods if indirect buffer was not GPU modified * Only check if draw parameters match if the shader actually uses them * Expose Macro HLE setting on GUI * Reset FirstVertex and FirstInstance after draw * Update shader cache version again since some people already tested this * PR feedback Co-authored-by: riperiperi <rhy3756547@hotmail.com>	2022-11-16 14:53:04 -03:00
gdkchan	f82309fa2d	Vulkan: Implement multisample <-> non-multisample copies and depth-stencil resolve (#3723 ) * Vulkan: Implement multisample <-> non-multisample copies and depth-stencil resolve * FramebufferParams is no longer required there * Implement Specialization Constants and merge CopyMS Shaders (#15) * Vulkan: Initial Specialization Constants * Replace with specialized helper shader * Reimplement everything Fix nonexistant interaction with Ryu pipeline caching Decouple specialization info from data and relocate them Generalize mapping and add type enum to better match spv types Use local fixed scopes instead of global unmanaged allocs * Fix misses in initial implementation Use correct info variable in Create2DLayerView Add ShaderStorageImageMultisample to required feature set * Use texture for source image * No point in using ReadOnlyMemory * Apply formatting feedback Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Apply formatting suggestions on shader source Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Support conversion with samples count that does not match the requested count, other minor changes Co-authored-by: mageven <62494521+mageven@users.noreply.github.com>	2022-11-02 18:17:19 -03:00
Wunk	3fe3598d41	Vulkan: Replace `VK_EXT_debug_report` usage with `VK_EXT_debug_utils` (#3802 ) * Vulkan: Replace `VK_EXT_debug_report` usage with `VK_EXT_debug_utils` [VK_EXT_debug_report](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_debug_report.html) has been depreciated for quite some time now in favor of the much more featureful [VK_EXT_debug_utils](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_debug_utils.html) extension. This PR converts our debug-report-callback into the newer debug-messenger pattern. `VK_EXT_debug_utils` adds some additional diagnostic tooling for marking debug-label scopes for queue-operations, command-buffers, and assigning name-labels to vulkan objects to aid in debugging(for a later PR). * Vulkan: Fix `DebugMessenger` severity-flag classification Extension bits between the two flags, for reference: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDebugUtilsMessageSeverityFlagBitsEXT.html https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDebugReportFlagBitsEXT.html	2022-10-29 14:09:25 -03:00
riperiperi	6e92b7a378	Dispose Vulkan TextureStorage when views hit 0 instead of immediately (#3738 ) Due to the `using` statement being scoped to the `CreateTextureView` method, `TextureStorage` would be disposed as soon as the view was returned. This was largely fine as the TextureStorage resources were being kept alive by the views holding their own references to them, but it also meant that dispose is only called as soon as the texture is created. Aliased Storages are TextureStorages created with the same allocation as another TextureStorage, if they have to be aliased as another format. We keep track of a TextureStorage's `_aliasedStorages` as they are created, and dispose them when the TextureStorage is disposed... ...except it is disposed immediately, before any aliased storages are even created. The aliased storages added after this will never be disposed. This PR attempts to fix this by disposing TextureStorage when its view count reaches 0. The other use of texture storage - the D32S8 blit - still manually disposes the storage, but regular uses created via the GAL are now disposed by the view count. I think this makes the most sense, as otherwise in the future this behaviour might be forgotton and more things could be added to the Dispose() method that don't work due to it not actually calling at the right time. This should improve memory leaks in Super Mario Odyssey, most noticeable when resolution scaling. The memory usage of the game is still wildly unpredictable due to how it interacts with the texture cache, but now it shouldn't get considerably longer as you play... I hope. I've seen it typically recover back to the same level occasionally, though it can spike significantly. Please test a bunch of games on multiple GPUs to make sure this doesn't break anything.	2022-10-18 23:52:08 +00:00
riperiperi	4c0eb91d7e	Convert Quads to Triangles in Vulkan (#3715 ) * Add Index Buffer conversion for quads to Vulkan Also adds a reusable repeating pattern index buffer to use for non-indexed draws, and generalizes the conversion cache for buffers. * Fix some issues * End render pass before conversion * Resume transform feedback after we ensure we're in a pass. * Always generate UInt32 type indices for topology conversion * No it's not. * Remove unused code * Rely on TopologyRemap to convert quads to tris. * Remove double newline * Ensure render pass ends before stride or I8 conversion	2022-09-20 18:38:48 -03:00
Emmanuel Hansen	6f0395538b	Avalonia - Use embedded window for avalonia (#3674 ) * wip * use embedded window * fix race condition on opengl Windows * fix glx issues on prime nvidia * fix mouse support win32 * clean up * addressed review * addressed review * fix warnings * fix sotware keyboard dialog * Update Ryujinx.Ava/Ui/Applet/SwkbdAppletDialog.axaml.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> * remove double semi Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-09-19 15:05:26 -03:00
gdkchan	619ac86bd0	Do not output ViewportIndex on SPIR-V if GPU does not support it (#3644 ) * Do not output ViewportIndex on SPIR-V if GPU does not support it * Bump shader cache version	2022-09-10 13:20:23 +00:00
riperiperi	c6d82209ab	Restride vertex buffer when stride causes attributes to misalign in Vulkan. (#3679 ) * Vertex Buffer Alignment part 1 * Update CacheByRange * Add Stride Change compute shader, fix storage buffers in helpers * An AMD exclusive * Reword * Change rules - stride conversion when attrs misalign * Fix stupid mistake * Fix background pipeline compile * Improve a few things. * Fix some feedback * Address Feedback (the shader binary didn't change when i changed the source to use the subgroup size) * Fix bug where rewritten buffer would be disposed instantly.	2022-09-08 20:30:19 -03:00
gdkchan	88a0e720cb	Use RGBA16 vertex format if RGB16 is not supported on Vulkan (#3552 ) * Use RGBA16 vertex format if RGB16 is not supported on Vulkan * Catch all shader compilation exceptions	2022-08-20 16:20:27 -03:00
gdkchan	2232e4ae87	Vulkan backend (#2518 ) * WIP Vulkan implementation * No need to initialize attributes on the SPIR-V backend anymore * Allow multithreading shaderc and vkCreateShaderModule You'll only really see the benefit here with threaded-gal or parallel shader cache compile. Fix shaderc multithreaded changes Thread safety for shaderc Options constructor Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes. * Support multiple levels/layers for blit. Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now. * TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use * New depth-stencil blit method for AMD * Workaround for AMD driver bug * Fix some tessellation related issues (still doesn't work?) * Submit command buffer before Texture GetData. (UE4 fix) * DrawTexture support * Fix BGRA on OpenGL backend * Fix rebase build break * Support format aliasing on SetImage * Fix uniform buffers being lost when bindings are out of order * Fix storage buffers being lost when bindings are out of order (also avoid allocations when changing bindings) * Use current command buffer for unscaled copy (perf) Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies. * Update to .net6 * Update Silk.NET to version 2.10.1 Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before. * Fix PrimitivesGenerated query, disable Transform Feedback queries for now Lets Splatoon 2 work on nvidia. (mostly) * Update counter queue to be similar to the OGL one Fixes softlocks when games had to flush counters. * Don't throw when ending conditional rendering for now This should be re-enabled when conditional rendering is enabled on nvidia etc. * Update findMSB/findLSB to match master's instruction enum * Fix triangle overlay on SMO, Captain Toad, maybe others? * Don't make Intel Mesa pay for Intel Windows bugs * Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders) * Update Spv.Generator * Add alpha test emulation on shader (but no shader specialisation yet...) * Fix R4G4B4A4Unorm texture format permutation * Validation layers should be enabled for any log level other than None * Add barriers around vkCmdCopyImage Write->Read barrier for src image (we want to wait for a write to read it) Write->Read barrier for dst image (we want to wait for the copy to complete before use) * Be a bit more careful with texture access flags, since it can be used for anything * Device local mapping for all buffers May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?) Also some performance things and fixes issues with opengl games loading textures weird. * Cleanup, disable device local buffers for now. * Add single queue support Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment. * Fix some validation errors around extended dynamic state * Remove Intel bug workaround, it was fixed on the latest driver * Use circular queue for checking consumption on command buffers Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once. * Use SupportBufferUpdater, add single layer flush * Fix counter queue leak when game decides to use host conditional rendering * Force device local storage for textures (fixes linux performance) * Port #3019 * Insert barriers around vkCmdBlitImage (may fix some amd flicker) * Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers * Don't pause transform feedback for multi draw * Fix draw outside of render pass and missing capability * Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more) * Better workaround for AMD vertex buffer size alignment issue * More instructions + fixes on SPIR-V backend * Allow custom aspect ratio on Vulkan * Correct GTK UI status bar positions * SPIR-V: Functions must always end with a return * SPIR-V: Fix ImageQuerySizeLod * SPIR-V: Set DepthReplacing execution mode when FragDepth is modified * SPIR-V: Implement LoopContinue IR instruction * SPIR-V: Geometry shader support * SPIR-V: Use correct binding number on storage buffers array * Reduce allocations for Spir-v serialization Passes BinaryWriter instead of the stream to Write and WriteOperand - Removes creation of BinaryWriter for each instruction - Removes allocations for literal string * Some optimizations to Spv.Generator - Dictionary for lookups of type declarations, constants, extinst - LiteralInteger internal data format -> ushort - Deterministic HashCode implementation to avoid spirv result not being the same between runs - Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost) TODO: improve instruction allocation, structured program creator, ssa? * Pool Spv.Generator resources, cache delegates, spv opts - Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module. - NewInstruction is called instead of new Instruction() - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually. - Estimate code size when creating the output MemoryStream - LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast. Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call. TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads. * LocalDefMap for Ssa Rewriter Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator. * SPIR-V: Transform feedback support * SPIR-V: Fragment shader interlock support (and image coherency) * SPIR-V: Add early fragment tests support * SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta * Don't pass depth clip state right now (fix decals) Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted. * Multisampling support * Multisampling: Use resolve if src samples count > dst samples count * Multisampling: We can only resolve for unscaled copies * SPIR-V: Only add FSI exec mode if used. * SPIR-V: Use ConstantComposite for Texture Offset Vector Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are. * SPIR-V: Don't OpReturn if we already OpExit'ed Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist) * SPIR-V: Only use input attribute type for input attributes Output vertex attributes should always be of type float. * Multithreaded Pipeline Compilation * Address some feedback * Make this 32 * Update topology with GpuAccessorState * Cleanup for merge (note: disables spir-v) * Make more robust to shader compilation failure - Don't freeze when GLSL compilation fails - Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure. * Fix Multisampling * Only update fragment scale count if a vertex texture needs a scale. Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage. This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now. * Use a bitmap to do granular tracking for buffer uploads. This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful. Avoids ending render passes to update buffer data (not all the time) - 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI) - Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI) As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory) Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here. * Copy query results after RP ends, rather than ending to copy We need to end the render pass to get the data (submit command buffer) anyways... Reduces render passes created in games that use queries. * Rework Query stuff a bit to avoid render pass end Tries to reset returned queries in background when possible, rather than ending the render pass. Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?) * Remove unnecessary lines Was for testing * Fix validation error for query reset Need to think of a better way to do this. * SPIR-V: Fix SwizzleAdd and some validation errors * SPIR-V: Implement attribute indexing and StoreAttribute * SPIR-V: Fix TextureSize for MS and Buffer sampler types * Fix relaunch issues * SPIR-V: Implement LogicalExclusiveOr * SPIR-V: Constant buffer indexing support * Ignore unsupported attributes rather than throwing (matches current GLSL behaviour) * SPIR-V: Implement tessellation support * SPIR-V: Geometry shader passthrough support * SPIR-V: Implement StoreShader8/16 and StoreStorage8/16 * SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug * SPIR-V: Fix field index for scale count * SPIR-V: Fix another case of wrong field index * SPIRV/GLSL: More scaling related fixes * SPIR-V: Fix ImageLoad CompositeExtract component type * SPIR-V: Workaround for Intel FrontFacing bug * Enable SPIR-V backend by default * Allow null samplers (samplers are not required when only using texelFetch to access the texture) * Fix some validation errors related to texel block view usage flag and invalid image barrier base level * Use explicit subgroup size if we can (might fix some block flickering on AMD) * Take componentMask and scissor into account when clearing framebuffer attachments * Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA) * Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight) Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different * Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc) * Check if the subgroup size is supported before passing a explicit size * Only enable ShaderFloat64 if the GPU supports it * We don't need to recompile shaders if alpha test state changed but alpha test is disabled * Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before) * Fix pipeline state saving before it is updated. This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache. * Allow null samplers on OpenGL backend * _unit0Sampler should be set only for binding 0 * Remove unused PipelineConverter format variable (was causing IOR) * Raise textures limit to 64 on Vulkan * No need to pack the shader binaries if shader cache is disabled * Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL * Do not clear unbound framebuffer color attachments * Geometry shader passthrough emulation * Consolidate UpdateDepthMode and GetDepthMode implementation * Fix A1B5G5R5 texture format and support R4G4 on Vulkan * Add barrier before use of some modified images * Report 32 bit query result on AMD windows (smo issue) * Add texture recompression support (disabled for now) It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures * Do not report R4G4 format as supported on Vulkan It was causing mario head to become white on Super Mario 64 (???) * Improvements to -1 to 1 depth mode. - Transformation is only applied on the last stage in the vertex pipeline. - Should fix some issues with geometry and tessellation (hopefully) - Reading back FragCoord Z on fragment will transform back to -1 to 1. * Geometry Shader index count from ThreadsPerInputPrimitive Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL * Remove gl_FragDepth scaling This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade. * Add Gl StencilOp enum values to Vulkan * Update guest cache to v1.1 (due to specialization state changes) This will explode your shader cache from earlier vulkan build, but it must be done. 😔 * Vulkan/SPIR-V support for viewport inverse * Fix typo * Don't create query pools for unsupported query types * Return of the Vector Indexing Bug One day, everyone will get this right. * Check for transform feedback query support Sometimes transform feedback is supported without the query type. * Fix gl_FragCoord.z transformation FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs. Fixes Pokemon Sword/Shield, possibly others. * Fix Avalonia Rebase Vulkan is currently not available on Avalonia, but the build does work and you can use opengl. * Fix headless build * Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host * Fix BCn 4/5 conversion, GetTextureTarget BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect. GetTextureTarget was not creating a view with the replacement format. * Fix dependency * Fix inverse viewport transform vector type on SPIR-V * Do not require null descriptors support * If MultiViewport is not supported, do not try to set more than one viewport/scissor * Bounds check on bitmap add. * Flush queries on attachment change rather than program change Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending. Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets. * Add support for avalonia (#6) * add avalonia support * only lock around skia flush * addressed review * cleanup * add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check * fix getting window handle on linux. skip render is size is 0 * Combine non-buffer with buffer image descriptor sets * Support multisample texture copy with automatic resolve on Vulkan * Remove old CompileShader methods from the Vulkan backend * Add minimal pipeline layouts that only contains used bindings They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders * Pre-compile helper shader as SPIR-V, and some fixes * Remove pre-compiled shaderc binary for Windows as its no longer needed by default * Workaround RADV crash Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist". * Use RobustBufferAccess on NVIDIA gpus Avoids the SMO waterfall triangle on older NVIDIA gpus. * Implement GPU selector and expose texture recompression on the UI and config * Fix and enable background compute shader compilation Also disables warnings from shader cache pipeline misses. * Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added * If S8D24 is not supported, use D32FS8 * Ensure all fences are destroyed on dispose * Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks * Add missing clear layer parameter after rebase * Use selected gpu from config for avalonia (#7) * use configured device * address review * Fix D32S8 copy workaround (AMD) Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things. * Use push descriptors for uniform buffer updates (disabled for now) * Push descriptor support check, buffer redundancy checks Should make push descriptors faster, needs more testing though. * Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs * Adjust bindings array sizes * Force submit command buffers if memory in use by its resources is high * Add workaround for AMD GCN cubemap view sins `ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue. This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry) Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there. * Add mobile, non-RX variants to the GCN regex. Also make sure that the 3 digit ones only include numbers starting with 7 or 8. * Increase image limit per stage from 8 to 16 Xenoblade Chronicles 2 was hiting the limit of 8 * Minor code cleanup * Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer * Add gpu selector to Avalonia (#8) * Add gpu selector to avalonia settings * show backend label on window * some fixes * address review * Minor changes to the Avalonia UI * Update graphics window UI and locales. (#9) * Update xaml and update locales * locale updates Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish? * Fix locales with more (?) correct translations. * add separator to render widget * fix spanish and portuguese * Add new IdList, replaces buffer list that could not remove elements and had unbounded growth * Don't crash the settings window if Vulkan is not supported * Fix Actions menu not being clickable on GTK UI after relaunch * Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer * Fix IdList and make it not thread safe * Revert useless OpenGL format table changes * Fix headless project build * List throws ArgumentOutOfRangeException * SPIR-V: Fix tessellation * Increase shader cache version due to tessellation fix * Reduce number of Sync objects created (improves perf in some specific titles) * Fix vulkan validation errors for NPOT compressed upload and GCN workaround. * Add timestamp to the shader cache and force rebuild if host cache is outdated * Prefer Mail box present mode for popups (#11) * Prefer Mail box present mode * fix debug * switch present mode when vsync is toggled * only disable vsync on the main window * SPIR-V: Fix geometry shader input load with transform feedback * BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0 * Fix Avalonia build * Address initial PR feedback * Only set transform feedback outputs on last vertex stage * Address riperiperi PR feedback * Remove outdated comment * Remove unused constructor * Only throw for negative results * Throw for QueueSubmit and other errors No point in delaying the inevitable * Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler * Fix some resolution scale issues * No need for two UpdateScale calls * Fix comments on SPIR-V generator project * Try to fix shader local memory size On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region? * Remove RectangleF that is now unused * Fix ImageGather with multiple offsets Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct` * Address PR feedback from jD in all projects except Avalonia * Address most of jD PR feedback on Avalonia * Remove unsafe * Fix VulkanSkiaGpu * move present mode request out of Create Swapchain method * split more parts of create swapchain * addressed reviews * addressed review * Address second batch of jD PR feedback * Fix buffer <-> image copy row length and height alignment AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes * Better fix for NPOT alignment issue * Use switch expressions on Vulkan EnumConversion Thanks jD * Fix Avalonia build * Add Vulkan selection prompt on startup * Grammar fixes on Vulkan prompt message * Add missing Vulkan migration flag Co-authored-by: riperiperi <rhy3756547@hotmail.com> Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com> Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>	2022-07-31 18:26:06 -03:00

42 commits