Topic: FXAA in FSO

The E · **on:** May 17, 2011, 09:01:29 am

Okay, as some of you may be aware, it is not possible in FSO to have both antialiasing and post-processing active at the same time.
However, in recent months, techniques have surfaced that turn antialiasing into a post-processing stage. While ATI has chosen to implement one such technique (called MLAA, Morphological AntiAliasing), NVidia has countered with something they call FXAA (Fast Approximate Antialiasing). NV's solution has the advantage of being a) a bit faster and b) a hell of a lot easier to implement. So that is what I did. Thus, I proudly present FXAA for FSO. Test builds will be available soon; for now, here's the code patch to enable it as well as the necessary shaders.

One note: Even though this technique was developed by NVidia, it is usable on ATI/AMD cards as well.

EDIT: Code patch removed. Code has been committed to trunk.

These are the shaders you will need to use:

fxaa-v.sdr:

Code:

#extension GL_EXT_gpu_shader4 : enable

noperspective varying vec2 pos;
uniform float rt_w;
uniform float rt_h;
varying vec2 rcpFrame;

void main() { 
	gl_Position = gl_Vertex; 
	
	rcpFrame = vec2(1.0/rt_w, 1.0/rt_h);
	
	pos = gl_Vertex.xy*0.5 + 0.5; 
}
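
For anyone reading along, the vertex shader above does two small things: it remaps the full-screen quad from clip space (-1 to 1) to texture coordinates (0 to 1), and it passes on the size of one texel. Here is a minimal CPU-side sketch of that arithmetic in Python. The function names are made up for illustration and are not part of FSO, and the 1024x768 render target is just an example size.

```python
def clip_to_uv(x, y):
    """Mirror of `pos = gl_Vertex.xy*0.5 + 0.5`: clip space [-1, 1] -> UV [0, 1]."""
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)


def rcp_frame(rt_w, rt_h):
    """Mirror of `rcpFrame`: the size of one texel in UV units."""
    return (1.0 / rt_w, 1.0 / rt_h)


# The corners of the full-screen quad land on the corners of the texture.
print(clip_to_uv(-1.0, -1.0))
print(clip_to_uv(1.0, 1.0))
# Hypothetical 1024x768 render target (example size only).
print(rcp_frame(1024, 768))
```

The fragment shader uses `rcpFrame` to step exactly one pixel at a time when it walks along an edge.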

fxaa-f.sdr:

Code:

#extension GL_EXT_gpu_shader4 : enable

// Copyright for FXAA Source
//
// Copyright (c) 2010 NVIDIA Corporation. All rights reserved.
//
// TO  THE MAXIMUM  EXTENT PERMITTED  BY APPLICABLE  LAW, THIS SOFTWARE  IS PROVIDED
// *AS IS*  AND NVIDIA AND  ITS SUPPLIERS DISCLAIM  ALL WARRANTIES,  EITHER  EXPRESS
// OR IMPLIED, INCLUDING, BUT NOT LIMITED  TO, IMPLIED WARRANTIES OF MERCHANTABILITY
// AND FITNESS FOR A PARTICULAR PURPOSE.  IN NO EVENT SHALL  NVIDIA OR ITS SUPPLIERS
// BE  LIABLE  FOR  ANY  SPECIAL,  INCIDENTAL,  INDIRECT,  OR  CONSEQUENTIAL DAMAGES
// WHATSOEVER (INCLUDING, WITHOUT LIMITATION,  DAMAGES FOR LOSS OF BUSINESS PROFITS,
// BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR ANY OTHER PECUNIARY LOSS)
// ARISING OUT OF THE  USE OF OR INABILITY  TO USE THIS SOFTWARE, EVEN IF NVIDIA HAS
// BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

#define FXAA_GLSL_120 1

/*============================================================================
                                    FXAA                                 
============================================================================*/
 
/*============================================================================
                                 API PORTING
============================================================================*/
#ifndef     FXAA_GLSL_120
    #define FXAA_GLSL_120 0
#endif
#ifndef     FXAA_GLSL_130
    #define FXAA_GLSL_130 0
#endif

#define int2 ivec2
#define float2 vec2
#define float3 vec3
#define float4 vec4
#define FxaaBool3 bvec3
#define FxaaInt2 ivec2
#define FxaaFloat2 vec2
#define FxaaFloat3 vec3
#define FxaaFloat4 vec4
#define FxaaBool2Float(a) mix(0.0, 1.0, (a))
#define FxaaPow3(x, y) pow(x, y)
#define FxaaSel3(f, t, b) mix((f), (t), (b))
#define FxaaTex sampler2D
/*--------------------------------------------------------------------------*/
#if FXAA_GLSL_120
    // Requires "#version 120" or better
    #define FxaaTexLod0(t, p) texture2DLod(t, p, 0.0)
    #define FxaaTexOff(t, p, o, r) texture2DLodOffset(t, p, 0.0, o)
#endif
/*--------------------------------------------------------------------------*/
#if FXAA_GLSL_130
    // Requires "#version 130" or better
    #define FxaaTexLod0(t, p) textureLod(t, p, 0.0)
    #define FxaaTexOff(t, p, o, r) textureLodOffset(t, p, 0.0, o)
#endif
/*--------------------------------------------------------------------------*/

#define FxaaToFloat3(a) FxaaFloat3((a), (a), (a))
float4 FxaaTexGrad(FxaaTex tex, float2 pos, float2 grad) {
    #if FXAA_GLSL_120
        return texture2DGrad(tex, pos.xy, grad, grad);
    #endif
    #if FXAA_GLSL_130
        return textureGrad(tex, pos.xy, grad, grad);
    #endif
}

/*============================================================================
                                 SRGB KNOBS
------------------------------------------------------------------------------
FXAA_SRGB_ROP - Set to 1 when applying FXAA to an sRGB back buffer (DX10/11).
                This will do the sRGB to linear transform, 
                as ROP will expect linear color from this shader,
                and this shader works in non-linear color.
============================================================================*/
#define FXAA_SRGB_ROP 0

/*============================================================================
                                DEBUG KNOBS
------------------------------------------------------------------------------
All debug knobs draw FXAA-untouched pixels in FXAA computed luma (monochrome).
 
FXAA_DEBUG_PASSTHROUGH - Red for pixels which are filtered by FXAA with a
                         yellow tint on sub-pixel aliasing filtered by FXAA.
FXAA_DEBUG_HORZVERT    - Blue for horizontal edges, gold for vertical edges. 
FXAA_DEBUG_PAIR        - Blue/green for the 2 pixel pair choice. 
FXAA_DEBUG_NEGPOS      - Red/blue for which side of center of span.
FXAA_DEBUG_OFFSET      - Red/blue for -/+ x, gold/skyblue for -/+ y.
============================================================================*/
#ifndef     FXAA_DEBUG_PASSTHROUGH
    #define FXAA_DEBUG_PASSTHROUGH 0
#endif    
#ifndef     FXAA_DEBUG_HORZVERT
    #define FXAA_DEBUG_HORZVERT    0
#endif    
#ifndef     FXAA_DEBUG_PAIR   
    #define FXAA_DEBUG_PAIR        0
#endif    
#ifndef     FXAA_DEBUG_NEGPOS
    #define FXAA_DEBUG_NEGPOS      0
#endif
#ifndef     FXAA_DEBUG_OFFSET
    #define FXAA_DEBUG_OFFSET      0
#endif    
/*--------------------------------------------------------------------------*/
#if FXAA_DEBUG_PASSTHROUGH || FXAA_DEBUG_HORZVERT || FXAA_DEBUG_PAIR
    #define FXAA_DEBUG 1
#endif    
#if FXAA_DEBUG_NEGPOS || FXAA_DEBUG_OFFSET
    #define FXAA_DEBUG 1
#endif
#ifndef FXAA_DEBUG
    #define FXAA_DEBUG 0
#endif
  
/*============================================================================
                              COMPILE-IN KNOBS
------------------------------------------------------------------------------
fxaa_preset - Choose knob preset 0-6 at run time (see FXAA_set_preset below).
------------------------------------------------------------------------------
FXAA_EDGE_THRESHOLD - The minimum amount of local contrast required 
                      to apply algorithm.
                      1.0/3.0  - too little
                      1.0/4.0  - good start
                      1.0/8.0  - applies to more edges
                      1.0/16.0 - overkill
------------------------------------------------------------------------------
FXAA_EDGE_THRESHOLD_MIN - Trims the algorithm from processing darks.
                          Perf optimization.
                          1.0/32.0 - visible limit (smaller isn't visible)
                          1.0/16.0 - good compromise
                          1.0/12.0 - upper limit (seeing artifacts)
------------------------------------------------------------------------------
FXAA_SEARCH_STEPS - Maximum number of search steps for end of span.
------------------------------------------------------------------------------
FXAA_SEARCH_ACCELERATION - How much to accelerate search,
                           1 - no acceleration
                           2 - skip by 2 pixels
                           3 - skip by 3 pixels
                           4 - skip by 4 pixels
------------------------------------------------------------------------------
FXAA_SEARCH_THRESHOLD - Controls when to stop searching.
                        1.0/4.0 - seems to be the best quality wise
------------------------------------------------------------------------------
FXAA_SUBPIX_FASTER - Turn on lower quality but faster subpix path.
                     Not recommended, but used in preset 0.
------------------------------------------------------------------------------
FXAA_SUBPIX - Toggle subpix filtering.
              0 - turn off
              1 - turn on
              2 - turn on full (ignores FXAA_SUBPIX_TRIM and CAP)
------------------------------------------------------------------------------
FXAA_SUBPIX_TRIM - Controls sub-pixel aliasing removal.
                   1.0/2.0 - low removal
                   1.0/3.0 - medium removal
                   1.0/4.0 - default removal
                   1.0/8.0 - high removal
                   0.0 - complete removal
------------------------------------------------------------------------------
FXAA_SUBPIX_CAP - Ensures fine detail is not completely removed.
                  This is important for the transition of sub-pixel detail,
                  like fences and wires.
                  3.0/4.0 - default (medium amount of filtering)
                  7.0/8.0 - high amount of filtering
                  1.0 - no capping of sub-pixel aliasing removal
============================================================================*/

float FXAA_EDGE_THRESHOLD		= (1.0/8.0);
float FXAA_EDGE_THRESHOLD_MIN	= (1.0/24.0);
int   FXAA_SEARCH_STEPS			= 16;  
int   FXAA_SEARCH_ACCELERATION	= 1;
float FXAA_SEARCH_THRESHOLD		= (1.0/4.0);
int   FXAA_SUBPIX				= 1;
int   FXAA_SUBPIX_FASTER		= 0;
float FXAA_SUBPIX_CAP			= (3.0/4.0);
float FXAA_SUBPIX_TRIM			= (1.0/4.0);
float FXAA_SUBPIX_TRIM_SCALE	= (1.0/0.75);    

void FXAA_set_preset(int preset) {

	if (preset > 6)
		preset = 6;
/*--------------------------------------------------------------------------*/
	if  (preset == 0) {
		FXAA_EDGE_THRESHOLD      = (1.0/4.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/12.0);
		FXAA_SEARCH_STEPS        = 2;
		FXAA_SEARCH_ACCELERATION = 4;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 1;
		FXAA_SUBPIX_CAP          = (2.0/3.0);
		FXAA_SUBPIX_TRIM         = (1.0/4.0);
	}
	/*--------------------------------------------------------------------------*/
	else if  (preset == 1) {
		FXAA_EDGE_THRESHOLD      = (1.0/8.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/16.0);
		FXAA_SEARCH_STEPS        = 4;
		FXAA_SEARCH_ACCELERATION = 3;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 0;
		FXAA_SUBPIX_CAP          = (3.0/4.0);
		FXAA_SUBPIX_TRIM         = (1.0/4.0);
	}
	/*--------------------------------------------------------------------------*/
	else if  (preset == 2) {
		FXAA_EDGE_THRESHOLD      = (1.0/8.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/24.0);
		FXAA_SEARCH_STEPS        = 8;
		FXAA_SEARCH_ACCELERATION = 2;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 0;
		FXAA_SUBPIX_CAP          = (3.0/4.0);
		FXAA_SUBPIX_TRIM         = (1.0/4.0);
	}
	/*--------------------------------------------------------------------------*/
	else if  (preset == 3) {
		FXAA_EDGE_THRESHOLD      = (1.0/8.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/24.0);
		FXAA_SEARCH_STEPS        = 16;
		FXAA_SEARCH_ACCELERATION = 1;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 0;
		FXAA_SUBPIX_CAP          = (3.0/4.0);
		FXAA_SUBPIX_TRIM         = (1.0/4.0);
	}
	/*--------------------------------------------------------------------------*/
	else if  (preset == 4) {
		FXAA_EDGE_THRESHOLD      = (1.0/8.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/24.0);
		FXAA_SEARCH_STEPS        = 24;
		FXAA_SEARCH_ACCELERATION = 1;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 0;
		FXAA_SUBPIX_CAP          = (3.0/4.0);
		FXAA_SUBPIX_TRIM         = (1.0/4.0);
	}
	/*--------------------------------------------------------------------------*/
	else if  (preset == 5) {
		FXAA_EDGE_THRESHOLD      = (1.0/8.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/24.0);
		FXAA_SEARCH_STEPS        = 32;
		FXAA_SEARCH_ACCELERATION = 1;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 0;
		FXAA_SUBPIX_CAP          = (7.0/8.0);
		FXAA_SUBPIX_TRIM         = (1.0/8.0);
	}
	/*--------------------------------------------------------------------------*/
	else if  (preset == 6) {
		FXAA_EDGE_THRESHOLD      = (1.0/12.0);
		FXAA_EDGE_THRESHOLD_MIN  = (1.0/24.0);
		FXAA_SEARCH_STEPS        = 32;
		FXAA_SEARCH_ACCELERATION = 1;
		FXAA_SEARCH_THRESHOLD    = (1.0/4.0);
		FXAA_SUBPIX              = 1;
		FXAA_SUBPIX_FASTER       = 0;
		FXAA_SUBPIX_CAP          = (1.0);
		FXAA_SUBPIX_TRIM         = (0.0);
	}
	/*--------------------------------------------------------------------------*/
	FXAA_SUBPIX_TRIM_SCALE = (1.0/(1.0 - FXAA_SUBPIX_TRIM));

}

/*============================================================================
                                   HELPERS
============================================================================*/
// Return the luma, the estimation of luminance from rgb inputs.
// This approximates luma using one FMA instruction,
// skipping normalization and tossing out blue.
// FxaaLuma() will range 0.0 to 2.963210702.
float FxaaLuma(float3 rgb) {
    return rgb.y * (0.587/0.299) + rgb.x; } 
/*--------------------------------------------------------------------------*/
float3 FxaaLerp3(float3 a, float3 b, float amountOfA) {
    return (FxaaToFloat3(-amountOfA) * b) + 
        ((a * FxaaToFloat3(amountOfA)) + b); } 
/*--------------------------------------------------------------------------*/
// Support any extra filtering before returning color.
float3 FxaaFilterReturn(float3 rgb) {
    #if FXAA_SRGB_ROP
        // Do sRGB encoded value to linear conversion.
        return FxaaSel3(
            rgb * FxaaToFloat3(1.0/12.92), 
            FxaaPow3(
                rgb * FxaaToFloat3(1.0/1.055) + FxaaToFloat3(0.055/1.055), 
                FxaaToFloat3(2.4)),
            rgb > FxaaToFloat3(0.04045)); 
    #else
        return rgb;
    #endif
}
 
/*============================================================================
                                PIXEL SHADER
============================================================================*/
float3 FxaaPixelShader(
// Output of FxaaVertexShader interpolated across screen.
//  xy -> actual texture position {0.0 to 1.0}
float2 pos,
// Input texture.
FxaaTex tex,
// RCPFRAME SHOULD BE PIXEL SHADER CONSTANTS!
// {1.0/frameWidth, 1.0/frameHeight}
float2 rcpFrame) {
    
/*----------------------------------------------------------------------------
            EARLY EXIT IF LOCAL CONTRAST BELOW EDGE DETECT LIMIT
------------------------------------------------------------------------------
Majority of pixels of a typical image do not require filtering, 
often pixels are grouped into blocks which could benefit from early exit 
right at the beginning of the algorithm. 
Given the following neighborhood, 
 
      N   
    W M E
      S   
    
If the difference in local maximum and minimum luma (contrast "range") 
is lower than a threshold proportional to the maximum local luma ("rangeMax"), 
then the shader early exits (no visible aliasing). 
This threshold is clamped at a minimum value ("FXAA_EDGE_THRESHOLD_MIN")
to avoid processing in really dark areas.    
----------------------------------------------------------------------------*/
    float3 rgbN = FxaaTexOff(tex, pos.xy, FxaaInt2( 0,-1), rcpFrame).xyz;
    float3 rgbW = FxaaTexOff(tex, pos.xy, FxaaInt2(-1, 0), rcpFrame).xyz;
    float3 rgbM = FxaaTexOff(tex, pos.xy, FxaaInt2( 0, 0), rcpFrame).xyz;
    float3 rgbE = FxaaTexOff(tex, pos.xy, FxaaInt2( 1, 0), rcpFrame).xyz;
    float3 rgbS = FxaaTexOff(tex, pos.xy, FxaaInt2( 0, 1), rcpFrame).xyz;
    float lumaN = FxaaLuma(rgbN);
    float lumaW = FxaaLuma(rgbW);
    float lumaM = FxaaLuma(rgbM);
    float lumaE = FxaaLuma(rgbE);
    float lumaS = FxaaLuma(rgbS);
    float rangeMin = min(lumaM, min(min(lumaN, lumaW), min(lumaS, lumaE)));
    float rangeMax = max(lumaM, max(max(lumaN, lumaW), max(lumaS, lumaE)));
    float range = rangeMax - rangeMin;
    #if FXAA_DEBUG
        float lumaO = lumaM / (1.0 + (0.587/0.299));
    #endif        
    if(range < max(FXAA_EDGE_THRESHOLD_MIN, rangeMax * FXAA_EDGE_THRESHOLD)) {
        #if FXAA_DEBUG
            return FxaaFilterReturn(FxaaToFloat3(lumaO));
        #endif
        return FxaaFilterReturn(rgbM); 
		
	}
	
	float3 rgbL = rgbN + rgbW + rgbM + rgbE + rgbS;
	
    if (FXAA_SUBPIX > 0) {
        if (FXAA_SUBPIX_FASTER != 0) {
            rgbL *= FxaaToFloat3(1.0/5.0);
		}
    }      
    
/*----------------------------------------------------------------------------
                               COMPUTE LOWPASS
------------------------------------------------------------------------------
FXAA computes a local neighborhood lowpass value as follows,
 
  (N + W + E + S)/4
  
Then uses the ratio of the contrast range of the lowpass 
and the range found in the early exit check, 
as a sub-pixel aliasing detection filter. 
When FXAA detects sub-pixel aliasing (such as single pixel dots), 
it later blends in "blendL" amount 
of a lowpass value (computed in the next section) to the final result.
----------------------------------------------------------------------------*/
    
	float blendL = 0.0;
	
	if (FXAA_SUBPIX > 0) {
        float lumaL = (lumaN + lumaW + lumaE + lumaS) * 0.25;
        float rangeL = abs(lumaL - lumaM);
        
		if (FXAA_SUBPIX == 1) {
			blendL = max(0.0, 
				(rangeL / range) - FXAA_SUBPIX_TRIM) * FXAA_SUBPIX_TRIM_SCALE; 
			blendL = min(FXAA_SUBPIX_CAP, blendL);
		}
		if (FXAA_SUBPIX == 2) {
			blendL = rangeL / range; 
		}
		#if FXAA_DEBUG_PASSTHROUGH
			if (FXAA_SUBPIX == 0) {
				blendL = 0.0;
			}
			return FxaaFilterReturn(
				FxaaFloat3(1.0, blendL/FXAA_SUBPIX_CAP, 0.0));
		#endif   
	}	
    
/*----------------------------------------------------------------------------
                    CHOOSE VERTICAL OR HORIZONTAL SEARCH
------------------------------------------------------------------------------
FXAA uses the following local neighborhood,
 
    NW N NE
    W  M  E
    SW S SE
    
To compute an edge amount for both vertical and horizontal directions.
Note edge detect filters like Sobel fail on single pixel lines through M.
FXAA takes the weighted average magnitude of the high-pass values 
for rows and columns as an indication of local edge amount.
 
A lowpass value for anti-sub-pixel-aliasing is computed as 
    (N+W+E+S+M+NW+NE+SW+SE)/9.
This full box pattern has higher quality than other options.
 
Note following this block, both vertical and horizontal cases
flow in parallel (reusing the horizontal variables).
----------------------------------------------------------------------------*/
    float3 rgbNW = FxaaTexOff(tex, pos.xy, FxaaInt2(-1,-1), rcpFrame).xyz;
    float3 rgbNE = FxaaTexOff(tex, pos.xy, FxaaInt2( 1,-1), rcpFrame).xyz;
    float3 rgbSW = FxaaTexOff(tex, pos.xy, FxaaInt2(-1, 1), rcpFrame).xyz;
    float3 rgbSE = FxaaTexOff(tex, pos.xy, FxaaInt2( 1, 1), rcpFrame).xyz;
    if ((FXAA_SUBPIX_FASTER == 0) && (FXAA_SUBPIX > 0)) {
        rgbL += (rgbNW + rgbNE + rgbSW + rgbSE);
        rgbL *= FxaaToFloat3(1.0/9.0);
    }
    float lumaNW = FxaaLuma(rgbNW);
    float lumaNE = FxaaLuma(rgbNE);
    float lumaSW = FxaaLuma(rgbSW);
    float lumaSE = FxaaLuma(rgbSE);
    float edgeVert = 
        abs((0.25 * lumaNW) + (-0.5 * lumaN) + (0.25 * lumaNE)) +
        abs((0.50 * lumaW ) + (-1.0 * lumaM) + (0.50 * lumaE )) +
        abs((0.25 * lumaSW) + (-0.5 * lumaS) + (0.25 * lumaSE));
    float edgeHorz = 
        abs((0.25 * lumaNW) + (-0.5 * lumaW) + (0.25 * lumaSW)) +
        abs((0.50 * lumaN ) + (-1.0 * lumaM) + (0.50 * lumaS )) +
        abs((0.25 * lumaNE) + (-0.5 * lumaE) + (0.25 * lumaSE));
    bool horzSpan = edgeHorz >= edgeVert;
    #if FXAA_DEBUG_HORZVERT
        if(horzSpan) return FxaaFilterReturn(FxaaFloat3(1.0, 0.75, 0.0));
        else         return FxaaFilterReturn(FxaaFloat3(0.0, 0.50, 1.0));
    #endif
    float lengthSign = horzSpan ? -rcpFrame.y : -rcpFrame.x;
    if(!horzSpan) lumaN = lumaW;
    if(!horzSpan) lumaS = lumaE;
    float gradientN = abs(lumaN - lumaM);
    float gradientS = abs(lumaS - lumaM);
    lumaN = (lumaN + lumaM) * 0.5;
    lumaS = (lumaS + lumaM) * 0.5;
    
/*----------------------------------------------------------------------------
                CHOOSE SIDE OF PIXEL WHERE GRADIENT IS HIGHEST
------------------------------------------------------------------------------
This chooses a pixel pair. 
For "horzSpan == true" this will be a vertical pair,
 
    [N]     N
    [M] or [M]
     S     [S]
 
Note following this block, both {N,M} and {S,M} cases
flow in parallel (reusing the {N,M} variables).
 
This pair of image rows or columns is searched below
in the positive and negative direction 
until edge status changes 
(or the maximum number of search steps is reached).
----------------------------------------------------------------------------*/    
    bool pairN = gradientN >= gradientS;
    #if FXAA_DEBUG_PAIR
        if(pairN) return FxaaFilterReturn(FxaaFloat3(0.0, 0.0, 1.0));
        else      return FxaaFilterReturn(FxaaFloat3(0.0, 1.0, 0.0));
    #endif
    if(!pairN) lumaN = lumaS;
    if(!pairN) gradientN = gradientS;
    if(!pairN) lengthSign *= -1.0;
    float2 posN;
    posN.x = pos.x + (horzSpan ? 0.0 : lengthSign * 0.5);
    posN.y = pos.y + (horzSpan ? lengthSign * 0.5 : 0.0);
    
/*----------------------------------------------------------------------------
                         CHOOSE SEARCH LIMITING VALUES
------------------------------------------------------------------------------
Search limit (+/- gradientN) is a function of local gradient.
----------------------------------------------------------------------------*/
    gradientN *= FXAA_SEARCH_THRESHOLD;
    
/*----------------------------------------------------------------------------
    SEARCH IN BOTH DIRECTIONS UNTIL FIND LUMA PAIR AVERAGE IS OUT OF RANGE
------------------------------------------------------------------------------
This loop searches either in vertical or horizontal directions,
and in both the negative and positive direction in parallel.
This loop fusion is faster than searching separately.
 
The search is accelerated using FXAA_SEARCH_ACCELERATION length box filter
via anisotropic filtering with specified texture gradients.
----------------------------------------------------------------------------*/
    float2 posP = posN;
    float2 offNP = horzSpan ? 
        FxaaFloat2(rcpFrame.x, 0.0) :
        FxaaFloat2(0.0, rcpFrame.y); // no 'f' suffix: it is not valid before GLSL 1.20
    float lumaEndN = lumaN;
    float lumaEndP = lumaN;
    bool doneN = false;
    bool doneP = false;
    if (FXAA_SEARCH_ACCELERATION == 1) {
        posN += offNP * FxaaFloat2(-1.0, -1.0);
        posP += offNP * FxaaFloat2( 1.0,  1.0);
    }
    if (FXAA_SEARCH_ACCELERATION == 2 ) {
        posN += offNP * FxaaFloat2(-1.5, -1.5);
        posP += offNP * FxaaFloat2( 1.5,  1.5);
        offNP *= FxaaFloat2(2.0, 2.0);
    }
    if (FXAA_SEARCH_ACCELERATION == 3) {
        posN += offNP * FxaaFloat2(-2.0, -2.0);
        posP += offNP * FxaaFloat2( 2.0,  2.0);
        offNP *= FxaaFloat2(3.0, 3.0);
    }
    if (FXAA_SEARCH_ACCELERATION == 4) {
        posN += offNP * FxaaFloat2(-2.5, -2.5);
        posP += offNP * FxaaFloat2( 2.5,  2.5);
        offNP *= FxaaFloat2(4.0, 4.0);
    }
    for(int i = 0; i < FXAA_SEARCH_STEPS; i++) {
        if (FXAA_SEARCH_ACCELERATION == 1) {
            if(!doneN) lumaEndN = 
                FxaaLuma(FxaaTexLod0(tex, posN.xy).xyz);
            if(!doneP) lumaEndP = 
                FxaaLuma(FxaaTexLod0(tex, posP.xy).xyz);
        } else {
            if(!doneN) lumaEndN = 
                FxaaLuma(FxaaTexGrad(tex, posN.xy, offNP).xyz);
            if(!doneP) lumaEndP = 
                FxaaLuma(FxaaTexGrad(tex, posP.xy, offNP).xyz);
        }
        doneN = doneN || (abs(lumaEndN - lumaN) >= gradientN);
        doneP = doneP || (abs(lumaEndP - lumaN) >= gradientN);
        if(doneN && doneP) break;
        if(!doneN) posN -= offNP;
        if(!doneP) posP += offNP; }
    
/*----------------------------------------------------------------------------
               HANDLE IF CENTER IS ON POSITIVE OR NEGATIVE SIDE 
------------------------------------------------------------------------------
FXAA uses the pixel's position in the span 
in combination with the values (lumaEnd*) at the ends of the span,
to determine filtering.
 
This step computes which side of the span the pixel is on. 
On negative side if dstN < dstP,
 
     posN        pos                      posP
      |-----------|------|------------------|
      |           |      |                  | 
      |<--dstN--->|<---------dstP---------->|
                         |
                    span center
                    
----------------------------------------------------------------------------*/
    float dstN = horzSpan ? pos.x - posN.x : pos.y - posN.y;
    float dstP = horzSpan ? posP.x - pos.x : posP.y - pos.y;
    bool directionN = dstN < dstP;
    #if FXAA_DEBUG_NEGPOS
        if(directionN) return FxaaFilterReturn(FxaaFloat3(1.0, 0.0, 0.0));
        else           return FxaaFilterReturn(FxaaFloat3(0.0, 0.0, 1.0));
    #endif
    lumaEndN = directionN ? lumaEndN : lumaEndP;
    
/*----------------------------------------------------------------------------
         CHECK IF PIXEL IS IN SECTION OF SPAN WHICH GETS NO FILTERING
------------------------------------------------------------------------------
If both the pair luma at the end of the span (lumaEndN) 
and middle pixel luma (lumaM)
are on the same side of the middle pair average luma (lumaN),
then don't filter.
 
Cases,
 
(1.) "L",
  
               lumaM
                 |
                 V    XXXXXXXX <- other line averaged
         XXXXXXX[X]XXXXXXXXXXX <- source pixel line
        |      .      | 
    --------------------------                    
       [ ]xxxxxx[x]xx[X]XXXXXX <- pair average
    --------------------------           
        ^      ^ ^    ^
        |      | |    |
        .      |<---->|<---------- no filter region
        .      | |    |
        . center |    |
        .        |  lumaEndN 
        .        |    .
        .      lumaN  .
        .             .
        |<--- span -->|
        
                        
(2.) "^" and "-",
  
                               <- other line averaged
          XXXXX[X]XXX          <- source pixel line
         |     |     | 
    --------------------------                    
        [ ]xxxx[x]xx[ ]        <- pair average
    --------------------------           
         |     |     |
         |<--->|<--->|<---------- filter both sides
 
 
(3.) "v" and inverse of "-",
  
    XXXXXX           XXXXXXXXX <- other line averaged
    XXXXXXXXXXX[X]XXXXXXXXXXXX <- source pixel line
         |     |     |
    --------------------------                    
    XXXX[X]xxxx[x]xx[X]XXXXXXX <- pair average
    --------------------------           
         |     |     |
         |<--->|<--->|<---------- don't filter both!
 
         
Note the "v" case for FXAA requires no filtering.
This is because the inverse of the "-" case is the "v".
Filtering "v" case turns open spans like this,
 
    XXXXXXXXX
    
Into this (which is not desired),
 
    x+.   .+x
    XXXXXXXXX
 
----------------------------------------------------------------------------*/
    if(((lumaM - lumaN) < 0.0) == ((lumaEndN - lumaN) < 0.0)) 
        lengthSign = 0.0;
 
/*----------------------------------------------------------------------------
                COMPUTE SUB-PIXEL OFFSET AND FILTER SPAN
------------------------------------------------------------------------------
FXAA filters using a bilinear texture fetch offset 
from the middle pixel M towards the center of the pair (NM below).
Maximum filtering will be half way between pair.
Reminder, at this point in the code, 
the {N,M} pair is also reused for all cases: {S,M}, {W,M}, and {E,M}.
 
    +-------+
    |       |    0.5 offset
    |   N   |     |
    |       |     V
    +-------+....---
    |       |
    |   M...|....---
    |       |     ^
    +-------+     |
    .       .    0.0 offset
    .   S   .
    .       .
    .........
 
Position on span is used to compute sub-pixel filter offset using simple ramp,
 
             posN           posP
              |\             |<------- 0.5 pixel offset into pair pixel
              | \            |
              |  \           |
    ---.......|...\..........|<------- 0.25 pixel offset into pair pixel
     ^        |   ^\         |
     |        |   | \        |
     V        |   |  \       |
    ---.......|===|==========|<------- 0.0 pixel offset (ie M pixel)
     ^        .   |   ^      .
     |        .  pos  |      .
     |        .   .   |      .
     |        .   . center   .
     |        .   .          .
     |        |<->|<---------.-------- dstN
     |        .   .          .    
     |        .   |<-------->|<------- dstP    
     |        .             .
     |        |<------------>|<------- spanLength    
     |
    subPixelOffset
    
----------------------------------------------------------------------------*/
    float spanLength = (dstP + dstN);
    dstN = directionN ? dstN : dstP;
    float subPixelOffset = (0.5 + (dstN * (-1.0/spanLength))) * lengthSign;
    #if FXAA_DEBUG_OFFSET
        float ox = horzSpan ? 0.0 : subPixelOffset*2.0/rcpFrame.x;
        float oy = horzSpan ? subPixelOffset*2.0/rcpFrame.y : 0.0;
        if(ox < 0.0) return FxaaFilterReturn(
            FxaaLerp3(FxaaToFloat3(lumaO), 
                      FxaaFloat3(1.0, 0.0, 0.0), -ox));
        if(ox > 0.0) return FxaaFilterReturn(
            FxaaLerp3(FxaaToFloat3(lumaO), 
                      FxaaFloat3(0.0, 0.0, 1.0),  ox));
        if(oy < 0.0) return FxaaFilterReturn(
            FxaaLerp3(FxaaToFloat3(lumaO), 
                      FxaaFloat3(1.0, 0.6, 0.2), -oy));
        if(oy > 0.0) return FxaaFilterReturn(
            FxaaLerp3(FxaaToFloat3(lumaO), 
                      FxaaFloat3(0.2, 0.6, 1.0),  oy));
        return FxaaFilterReturn(FxaaFloat3(lumaO, lumaO, lumaO));
    #endif
    float3 rgbF = FxaaTexLod0(tex, FxaaFloat2(
        pos.x + (horzSpan ? 0.0 : subPixelOffset),
        pos.y + (horzSpan ? subPixelOffset : 0.0))).xyz;
    if (FXAA_SUBPIX == 0) {
        return FxaaFilterReturn(rgbF); 
    } else {    
        return FxaaFilterReturn(FxaaLerp3(rgbL, rgbF, blendL)); 
    }
}

uniform sampler2D tex0;
uniform int fxaa_preset;
varying vec2 rcpFrame;
noperspective varying vec2 pos;

void main() {
	FXAA_set_preset(fxaa_preset);
	// Write alpha explicitly; assigning only .xyz leaves gl_FragColor.w undefined.
	gl_FragColor = vec4(FxaaPixelShader(pos, tex0, rcpFrame), 1.0);
}
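
If the comment blocks in the shader are hard to follow, here is a rough CPU-side sketch in Python of the first two decisions it makes per pixel: the early exit on low contrast, and the sub-pixel blend amount for the `FXAA_SUBPIX == 1` path. The function names are invented for this sketch, the default arguments are the shader's initial values (the same as preset 3), and it works on already-computed lumas rather than on a real texture.

```python
def fxaa_luma(rgb):
    """Mirror of FxaaLuma(): green scaled by 0.587/0.299 plus red, blue ignored."""
    r, g, _b = rgb
    return g * (0.587 / 0.299) + r


def early_exit(n, w, m, e, s, edge_threshold=1.0 / 8.0,
               edge_threshold_min=1.0 / 24.0):
    """True when local contrast is too low for the pixel to be filtered."""
    lumas = (n, w, m, e, s)
    range_max = max(lumas)
    contrast = range_max - min(lumas)
    return contrast < max(edge_threshold_min, range_max * edge_threshold)


def subpix_blend(n, w, m, e, s, trim=1.0 / 4.0, cap=3.0 / 4.0):
    """Mirror of the FXAA_SUBPIX == 1 path.

    Only meaningful when early_exit() is False, because the contrast
    is used as a divisor.
    """
    lumas = (n, w, m, e, s)
    contrast = max(lumas) - min(lumas)
    luma_l = (n + w + e + s) * 0.25      # lowpass of the four neighbours
    range_l = abs(luma_l - m)            # how far M sits from that lowpass
    trim_scale = 1.0 / (1.0 - trim)      # FXAA_SUBPIX_TRIM_SCALE
    blend = max(0.0, range_l / contrast - trim) * trim_scale
    return min(cap, blend)


white = fxaa_luma((1.0, 1.0, 1.0))
grey = fxaa_luma((0.5, 0.5, 0.5))

# A flat grey area is skipped, a white pixel next to black ones is not.
print(early_exit(grey, grey, grey, grey, grey))
print(early_exit(white, 0.0, 0.0, 0.0, 0.0))
# A lone white dot on black gets the maximum (capped) lowpass blend.
print(subpix_blend(0.0, 0.0, white, 0.0, 0.0))
```

The rest of the shader (edge direction, span search, sub-pixel offset) only runs for pixels that fail the early-exit test.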

As evidence of this working, I present the following screenies:

FXAA off:

FXAA on:

EDIT: To enable FXAA, you need to: 1. Activate post-processing, 2. Enable the "-fxaa" commandline setting (it can be found in the Launcher's "Graphics" feature tab).

EDIT AGAIN:
Further refinements have been made. You can now use the "-fxaa_preset" commandline argument to specify how strong FXAA will be. Currently, valid arguments go from 0 (fastest, but least amount of smoothing) to 6 (slowest, but a lot of smoothing), with 3 being the default value.
In addition, you can now go to the F3 lab and go through presets by pressing the number keys.
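
To make the preset numbers a bit more concrete, here is a small Python sketch of how the shader's `FXAA_set_preset()` treats the value it is given. The table only lists three of the knobs, with values copied from the shader above. `select_preset` and `PRESETS` are names invented for this sketch, and how the engine's commandline parser itself treats out-of-range numbers is not shown in this thread, so this mirrors the shader only.

```python
DEFAULT_PRESET = 3  # the default stated in the post

# Values copied from FXAA_set_preset() in fxaa-f.sdr.
# Only three of the knobs are listed to keep the sketch short.
PRESETS = {
    0: {"edge_threshold": 1.0 / 4.0, "search_steps": 2, "search_acceleration": 4},
    1: {"edge_threshold": 1.0 / 8.0, "search_steps": 4, "search_acceleration": 3},
    2: {"edge_threshold": 1.0 / 8.0, "search_steps": 8, "search_acceleration": 2},
    3: {"edge_threshold": 1.0 / 8.0, "search_steps": 16, "search_acceleration": 1},
    4: {"edge_threshold": 1.0 / 8.0, "search_steps": 24, "search_acceleration": 1},
    5: {"edge_threshold": 1.0 / 8.0, "search_steps": 32, "search_acceleration": 1},
    6: {"edge_threshold": 1.0 / 12.0, "search_steps": 32, "search_acceleration": 1},
}


def select_preset(requested):
    """Mirror the shader: clamp values above 6, otherwise fall back to the defaults."""
    if requested > 6:
        requested = 6
    # When no branch matches (e.g. a negative number) the shader keeps its
    # initial values, which are identical to preset 3.
    return PRESETS.get(requested, PRESETS[DEFAULT_PRESET])


for n in (0, 3, 6, 99):
    print(n, select_preset(n))
```

Higher presets search further along an edge (more `search_steps`, no acceleration), which is where the extra smoothing and the extra cost come from.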

EDIT0RED:
Since the code (and default shaders) have been added to FSO trunk, it is recommended to use recent (read: post revision 7201) nightly builds to test this. The custom test builds have been removed.

CommanderDJ · **Reply #1 on:** May 17, 2011, 09:03:58 am

All hail the glorious code masters. Will definitely be keeping a close eye on this.

General Battuta · **Reply #2 on:** May 17, 2011, 09:04:58 am

It's subtle but it's definitely working.

The E · **Reply #3 on:** May 17, 2011, 09:14:02 am

Builds added to post.

Zacam · **Reply #4 on:** May 17, 2011, 09:30:00 am

Plays well with Driver Side AA. When I un-limit the FPS in FSO and set driver side vsync, I get solid 60fps either on or off.

I set Driver side AA. Then played with the "Hide Post Processing" check-box in the lab.
With FXAA on, the results are just a little behind what one gets when the post processing is turned off, but it has definitely overall improved the existing driver side AA and is noticeable in the same fashion that 4x is noticeable from 8x. And with objects in motion, that there is -any- AA at all in conjunction with the post-processing makes this an invaluable WIN any way you slice it.

Sushi · **Reply #5 on:** May 17, 2011, 10:36:54 am

Confirmed awesome.

Kolgena · **Reply #6 on:** May 17, 2011, 12:05:33 pm

Screenshots show that the performance hit is at least 50%. That's kind of intimidating XD

Still, it's excellent that AA is finally working. I'm very happy about this development.

The E · **Reply #7 on:** May 17, 2011, 12:10:40 pm

The performance hit you see there is not at all representative. Yes, there is one. But ingame, it is almost unnoticeable. Reason being that the FXAA algorithm has a constant run-time, much like any of the other post effects.

Herra Tohtori · **Reply #8 on:** May 17, 2011, 12:13:18 pm

Quote from: Kolgena on May 17, 2011, 12:05:33 pm

Screenshots show that the performance hit is at least 50%. That's kind of intimidating XD

VSync always drops the frame rate to an integer fraction of the display's native refresh rate. If the display can do 120 Hz and VSync is enabled, but the computer is only capable of, let's say, 100 FPS, VSync throttles it down to 60 FPS: each frame is then shown for two of the display's refresh cycles.

This prevents frames from being swapped while the display is in the middle of drawing one, which would cause screen tearing: the image being drawn on the screen suddenly changes into another image.

This doesn't mean the performance drop is 50%...
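To put numbers on the throttling described above, here is a rough model. It assumes plain double-buffered VSync (triple buffering behaves differently) and a steady render rate; the symbol names are introduced here purely for illustration.

```latex
% Frame rate actually shown under double-buffered VSync (simplified model)
\[
  f_{\text{shown}} = \frac{f_{\text{refresh}}}{\left\lceil f_{\text{refresh}} / f_{\text{render}} \right\rceil}
\]

% 120 Hz display, machine capable of 100 FPS:
\[
  f_{\text{shown}} = \frac{120}{\lceil 120 / 100 \rceil} = \frac{120}{\lceil 1.2 \rceil} = \frac{120}{2} = 60\ \text{FPS}
\]

% 60 Hz display, machine capable of 59 FPS:
\[
  f_{\text{shown}} = \frac{60}{\lceil 60 / 59 \rceil} = \frac{60}{2} = 30\ \text{FPS}
\]
```

Under this model, a machine that only just misses the refresh rate shows a halved frame counter, even though its real rendering speed dropped by only a few percent.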

Kolgena · **Reply #9 on:** May 17, 2011, 12:41:37 pm

The_E never mentioned that his game was vsynced, so I assumed it wasn't. Also, 64 fps would be weird to cap at, instead of 60. I just know that the in-game FPS cap in ship lab is 120 fps. I have vsync off and it never lets me get higher than 120, which is fine because my screen is 60 Hz.

By constant runtime, do you mean that the performance load stays relatively constant, therefore fps jitter is minimized?

The E · **Reply #10 on:** May 17, 2011, 12:45:53 pm

Yes. The algorithm's runtime depends entirely on your resolution and GPU speed, no other factors play into it.

Rodo · **Reply #11 on:** May 17, 2011, 12:51:44 pm

Cool stuff, it looks quite a bit better in that pic... will be testing it

Quote from: Zacam on May 17, 2011, 09:30:00 am

When I un-limit the FPS in FSO

say wat?

The E · **Reply #12 on:** May 17, 2011, 04:30:13 pm

Ooookaaaay.

So.

The shaders posted above? Forget them. They were based on FXAA v2, aka the fast-and-dirty console version. For the really powerful v1 shaders, which are made for members of the PC GAMING MASTER RACE, look here:

EDIT: FILES REMOVED. Look at the first post in the thread for the current version.

DaBrain · **Reply #13 on:** May 17, 2011, 05:07:11 pm

This rocks!

Now Post Processing has no downsides anymore. I love it!

I'll try to run a complete test on this tomorrow.

Sushi · **Reply #14 on:** May 17, 2011, 06:01:29 pm

Quote from: The E on May 17, 2011, 04:30:13 pm

Ooookaaaay.

So.

The shaders posted above? Forget them. They were based on FXAA v2, aka the fast-and-dirty console version. For the really powerful v1 shaders, which are made for members of the PC GAMING MASTER RACE, look here:

So are these slower, but higher quality? Or faster AND higher quality?

Or just more pretentious? I'm cool with that too.

The E · **Reply #15 on:** May 17, 2011, 06:06:13 pm

They might be slower, but if I can't tell the difference on my machine, I'm not counting it. The main advantage these thingies have is that they are more configurable, which means that in the future, you'll be able to adjust the FXAA strength and other parameters on-the-fly.

Kolgena · **Reply #16 on:** May 17, 2011, 08:37:11 pm

Just curious: the result looks a heck of a lot like Crysis 2's AA filter. Is it the same thing?

(Also, I'd appreciate anything that could be done for performance. The current AA already drops my framerates by 10-20%, which is quite a bit when I can get as low as 22 fps without the FXAA)

ED: For curiosity's sake, I ran a benchmark to compare the hit of the ver1 FXAA shaders.

With FXAA: 30 fps
Same scene without FXAA: 77 fps

Ver2 FXAA shaders:

With FXAA: 58 fps
Same scene without FXAA: 76 fps

Core2Duo 2.4 GHz, Mobility 3650 HD @ 593/791. The hardware is ****tier than that found in a console, so I think I'll stick with my ver2 shaders, if at all
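FPS figures are easier to compare once converted to time per frame. A rough conversion of the numbers above, assuming the FXAA pass simply adds a fixed amount of time to each frame (the symbols are introduced here for illustration only):

```latex
% Frame time in milliseconds for a given frame rate
\[
  t = \frac{1000\ \text{ms}}{\text{FPS}}
\]

% Ver1 shaders: 30 fps with FXAA, 77 fps without
\[
  \Delta t_{\text{v1}} = \frac{1000}{30} - \frac{1000}{77} \approx 33.33 - 12.99 \approx 20.3\ \text{ms}
\]

% Ver2 shaders: 58 fps with FXAA, 76 fps without
\[
  \Delta t_{\text{v2}} = \frac{1000}{58} - \frac{1000}{76} \approx 17.24 - 13.16 \approx 4.1\ \text{ms}
\]
```

By this estimate, the ver1 pass costs roughly five times as much per frame as the ver2 pass on this hardware.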

rscaper1070 · **Reply #17 on:** May 17, 2011, 11:27:36 pm

I thought these were just for NVidia cards. Does this work with ATI as well?

General Battuta · **Reply #18 on:** May 17, 2011, 11:29:36 pm

Quote from: rscaper1070 on May 17, 2011, 11:27:36 pm

I thought these were just for NVidia cards. Does this work with ATI as well?

Yes, it does.

On an HD 5850, works beautifully for me. Slight degradation in texture definition but a definite antialiasing effect. More than worth it as far as I'm concerned. My FPS is capped at 60 and does not drop from 60 using FXAA.

Zacam · **Reply #19 on:** May 18, 2011, 02:19:01 am

Code: [Select]

0(3) : warning C7022: unrecognized profile specifier "noperspective"
0(3) : error C0502: syntax error at token "noperspective"

ERROR! Unable to create vertex shader!
Error while compiling FXAA shaders. FXAA will be unavailable.

Removing 'noperspective' from both the Fragment and Vertex shaders resolves this.

Changing "#define FXAA_GLSL_120 1" to "#define FXAA_GLSL_130 1" with 'noperspective' removed produces the following:

Code: [Select]

Fragment shader compiled with warnings:
0(45) : warning C7532: global function textureGrad requires "#version 130" or later
0(184) : warning C7532: global function textureLodOffset requires "#version 130" or later
0(311) : warning C7532: global function textureLod requires "#version 130" or later

Shaders still compile so long as "#version 120" is not set.

Setting "#version 130" with "#define FXAA_GLSL_130 1" and 'noperspective' removed produces the following:

Code: [Select]

Fragment shader compiled with warnings:
0(369) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(370) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(375) : warning C7533: global variable gl_FragColor is deprecated after version 120

Shader linked with warnings:
Fragment info
-------------
0(369) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(370) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(375) : warning C7533: global variable gl_FragColor is deprecated after version 120

Setting "#version 130" with "#define FXAA_GLSL_120 1" and 'noperspective' removed produces the following:

Code: [Select]

Fragment shader compiled with warnings:
0(311) : warning C7533: global function texture2DLod is deprecated after version 120
0(369) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(370) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(375) : warning C7533: global variable gl_FragColor is deprecated after version 120

Shader linked with warnings:
Fragment info
-------------
0(311) : warning C7533: global function texture2DLod is deprecated after version 120
0(369) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(370) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(375) : warning C7533: global variable gl_FragColor is deprecated after version 120

Setting "#version 130" with "#define FXAA_GLSL_120 1" and leaving 'noperspective' produces the following:

Code: [Select]

Vertex shader failed to compile:
0(4) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(4) : error C7560: OpenGL does not allow 'noperspective' with 'varying'
0(4) : error C7561: OpenGL requires 'in/out' with 'noperspective'
0(7) : warning C7555: 'varying' is deprecated, use 'in/out' instead
0(10) : warning C7533: global variable gl_Vertex is deprecated after version 120

ERROR! Unable to create vertex shader!
Error while compiling FXAA shaders. FXAA will be unavailable.

This concludes today's exercise
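The last log hints at a way to keep 'noperspective' under "#version 130": the compiler wants 'in/out' instead of 'varying'. Below is a minimal, untested sketch of fxaa-v.sdr adapted that way. It assumes the engine keeps supplying rt_w and rt_h as before; gl_Vertex is kept so nothing has to change on the engine side, so the C7533 deprecation warning will most likely remain.

```glsl
#version 130

// 'noperspective' is part of core GLSL 1.30, so the
// GL_EXT_gpu_shader4 extension line is no longer needed here.
// 'out' replaces 'varying' (see errors C7560 / C7561 in the log).
noperspective out vec2 pos;
out vec2 rcpFrame;

uniform float rt_w;   // render target width in pixels
uniform float rt_h;   // render target height in pixels

void main() {
	// gl_Vertex is deprecated after version 120; kept on the assumption
	// that the engine-side vertex setup should stay untouched.
	gl_Position = gl_Vertex;

	// Size of one pixel in texture coordinates.
	rcpFrame = vec2(1.0 / rt_w, 1.0 / rt_h);

	// Map clip space [-1, 1] to texture space [0, 1].
	pos = gl_Vertex.xy * 0.5 + 0.5;
}
```

The fragment shader would need the matching declarations, 'noperspective in vec2 pos;' and 'in vec2 rcpFrame;', since GLSL 1.30 expects the interpolation qualifiers to agree between the two stages.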
