iDCT and its impact on encoding times

  • UncasMS
    Super Moderator
    • Nov 2001
    • 9047

    iDCT and its impact on encoding times

    following questions on and discussions about the impact iDCT has on the conversion speed, i decided to test all available iDCTs in rebuilder (pro):



    i converted Van Helsing once more:
    126 min main movie + 35 min extras

    using an athlon64 @ 3500, 1gb ram with cce 270 in OPV (one pass variable bitrate) mode

    all conversions were done via batch-processing and of course the pc wasn't touched at all during the runs

    the times are ENCODING ONLY because rebuilding for example has absolutely nothing to do with idct and that's what i wanted to test

    Code:
    idct
    
    decoder default			66 min
    32 bit mmx			67 min 
    32 bit sse/mmx			67 min 
    64 bit floating point		81 min
    64 bit IEEE-1180 reference	83 min
    32 bit sse2/mmx			67 min
    32 bit ssemmx idct (skal)	64 min
    32 bit simple mmx (xvid)	61 min

    it is no wonder that the most precise modes (64 bit) take longest, but at 81-83 min against 61-67 min for the 32 bit modes they need roughly a fifth to a third more time, which is quite a lot; i guess i won't ever use them until i've got myself a dual-core 6ghz machine
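The relative cost of each mode can be recomputed from the table above. This is only a small sketch: the minutes are copied from the post, while the names `cce_minutes`, `slowdown_percent` and `ranking` are made up for illustration.

```python
# Encoding times in minutes, copied from the CCE table above.
cce_minutes = {
    "decoder default": 66,
    "32 bit mmx": 67,
    "32 bit sse/mmx": 67,
    "64 bit floating point": 81,
    "64 bit IEEE-1180 reference": 83,
    "32 bit sse2/mmx": 67,
    "32 bit ssemmx idct (skal)": 64,
    "32 bit simple mmx (xvid)": 61,
}


def slowdown_percent(minutes, baseline):
    """Extra encoding time of every mode relative to the baseline mode, in percent."""
    base = minutes[baseline]
    return {mode: 100.0 * (t - base) / base for mode, t in minutes.items()}


def ranking(minutes):
    """Modes ordered from fastest to slowest (ties keep table order)."""
    return sorted(minutes, key=minutes.get)


if __name__ == "__main__":
    vs_fastest = slowdown_percent(cce_minutes, "32 bit simple mmx (xvid)")
    for mode in ranking(cce_minutes):
        print(f"{mode:30s} {cce_minutes[mode]:3d} min  {vs_fastest[mode]:+5.1f}%")
```

Measured against the xvid idct, the IEEE-1180 reference comes out about 36% slower; measured against the decoder default, the 64 bit floating point mode is about 23% slower.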

    xvid idct being the winner is no surprise either - it is considered very fast but not as good/precise


    according to rockas the default setting is:
    "... the default value is SSE/MMX not SSE2/MMX... anyway...
    SSE2/MMX is compatible with Pentium IV and AMD64...
    SSE/MMX is compatible with those two plus Pentium III and Amd ATHLON."

    jdobbs on 32 bit ssemmx idct (skal):
    "The two SSE/MMX versions are just different algorithms that use the same precision levels, one was designed by SKAL."

    this one is runner-up in terms of speed; i cannot comment on its quality, as i won't spend time on differences that can only be found by means of image-comparison software and thus cannot be detected by the human eye


    i'm a little bit puzzled as to my encoding time using DEFAULT as it does not match ANY other time - according to what i cited above default equals sse/mmx and thus it should have been 67 min not 66 but anyway

    a little more surprising to me is the fact that sse2/mmx is NOT faster than mmx or sse/mmx, which some people say it outperforms


    taking into account that the default setting is #3 out of 8 when talking speed, i should think i will stay with this setting - it has provided good results before and it is not horribly slow compared to other settings

    old dog, new tricks you know

    of course results may differ with different cpus so in case anyone wants to test a couple of the idcts, please do so and report back
    Last edited by UncasMS; 15 Oct 2005, 08:52 PM.
  • UncasMS
    Super Moderator
    • Nov 2001
    • 9047

    #2
    idct & HC ENC test release 2

    i have repeated the test with hank315's encoder and opv mode

    final output size was also added this time
    again i can clearly state that sse2 is NOT the fastest mode like some people claim every now and then

    again the times stated are ENCODING ONLY times - no preparation, no rebuilding

    Code:
    HC 016 TR-2
    
    decoder default			111 min  4.434.924
    32 bit mmx			113 min  4.434.924
    32 bit sse mmx			113 min  4.435.880
    64 bit floating point		137 min  4.437.452
    64 bit IEEE-1180 reference	144 min  4.444.302
    32 bit sse2/mmx			113 min  4.434.924
    32 bit ssemmx idct (skal)	110 min  4.440.598
    32 bit simple mmx (xvid)	111 min  4.433.900
    Last edited by UncasMS; 15 Oct 2005, 08:29 PM.


    • UncasMS
      Super Moderator
      • Nov 2001
      • 9047

      #3
      HC testrelease 3 (10-26-2005) results OPV mode

      i have done another testrun

      this time i used the latest (not publicly available) testrelease 3 of HC Enc 016, in which Hank315 has made changes for OPV mode

      Code:
      HC 016 TR-3
      
      Default Matrix, default GOP size
      
      decoder default			132 min  4.516.914
      32 bit mmx			126 min  4.516.914
      32 bit sse mmx			126 min  4.516.914
      64 bit floating point		154 min  4.518.124
      64 bit IEEE-1180 reference	168 min  4.518.466
      32 bit sse2/mmx			146 min  4.517.962
      32 bit sse2/mmx (redone)	131 min  4.516.650
      32 bit ssemmx idct (skal)	142 min  4.518.272
      32 bit ssemmx idct (skal) (redone)	130 min  4.517.136
      32 bit simple mmx (xvid)	144 min  4.518.132
      32 bit simple mmx (xvid) (redone)	131 min  4.516.940
      
      EDIT:
      since i couldn't believe that the fastest idcts were now extremely slow i 
      redid skal & xvid and received different results this second time
      
      sse2/mmx redone as well
      
      now the speed is as it is to be expected, which is good
      the bad side is: i must have made some mistake and will try to find out 
      what went wrong/was done differently
      
      
      32 bit simple mmx (xvid) MATRIX: angelverylow	144 min  4.518.184

      a couple of things can be noticed with this release:

      1. due to changes in the OPV prediction/mode this version takes longer than TR 2

      2. the final output size is roughly 74-84 mb HIGHER than TR2 (going by the figures in the two tables) and it is much more consistent; differences between the idcts are ~1.5mb

      3. the speed is the odd thing out, however
      mmx and mmx/sse idct are fastest; skal and xvid idcts were fastest before and are much slower now
      3b. speed is back to normal now - the reason for the first results is still being looked for
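The size comparison between TR 2 and TR 3 can be recomputed from the two tables. A rough sketch follows: the figures are copied from the posts (first TR 3 runs only), the unit is never stated so KB is an assumption, and the names `tr2`, `tr3`, `size_deltas` and `spread` are made up for illustration.

```python
# Final output sizes from the TR 2 and TR 3 tables (first TR 3 runs only).
# Assumption: the posts never state the unit; KB is a guess.
tr2 = {
    "decoder default": 4_434_924,
    "32 bit mmx": 4_434_924,
    "32 bit sse mmx": 4_435_880,
    "64 bit floating point": 4_437_452,
    "64 bit IEEE-1180 reference": 4_444_302,
    "32 bit sse2/mmx": 4_434_924,
    "32 bit ssemmx idct (skal)": 4_440_598,
    "32 bit simple mmx (xvid)": 4_433_900,
}
tr3 = {
    "decoder default": 4_516_914,
    "32 bit mmx": 4_516_914,
    "32 bit sse mmx": 4_516_914,
    "64 bit floating point": 4_518_124,
    "64 bit IEEE-1180 reference": 4_518_466,
    "32 bit sse2/mmx": 4_517_962,
    "32 bit ssemmx idct (skal)": 4_518_272,
    "32 bit simple mmx (xvid)": 4_518_132,
}


def size_deltas(old, new):
    """Per-mode growth in output size from the old release to the new one."""
    return {mode: new[mode] - old[mode] for mode in old}


def spread(sizes):
    """Gap between the largest and the smallest output within one release."""
    return max(sizes.values()) - min(sizes.values())


if __name__ == "__main__":
    for mode, delta in size_deltas(tr2, tr3).items():
        print(f"{mode:30s} +{delta}")
    print("spread TR 2:", spread(tr2))
    print("spread TR 3:", spread(tr3))
```

With the posted figures the per-mode growth lies between 74.164 and 84.232, and the spread between idcts shrinks from 10.402 in TR 2 to 1.552 in TR 3.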

      let's wait what future releases might bring
      Last edited by UncasMS; 6 Nov 2005, 11:08 AM.


      • UncasMS
        Super Moderator
        • Nov 2001
        • 9047

        #4
        i think i have found the reason why the sse2/mmx, skal & xvid idcts were slower in one of the testruns and much faster when repeating the test

        for some reason dvdrebuilder "forgot" about the *half d1* setting, which i have used for all extras

        the slower encodings did NOT use half d1, the faster ones were made with the usual routine (i.e. half d1 activated)


        i repeated the conversion with skal idct and *half d1* NOT used and again it took 143 min, which is almost identical to the first result
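To put a number on that difference, the first and the redone TR 3 timings can be compared directly. A minimal sketch: the minutes are copied from the TR 3 table, the attribution to *half d1* follows the explanation in this post, and the names `runs` and `minutes_saved` are made up for illustration.

```python
# (first run, redone run) in minutes, copied from the TR 3 table.
# Per the post above, the first runs did NOT use half d1 for the extras,
# the redone runs did.
runs = {
    "32 bit sse2/mmx": (146, 131),
    "32 bit ssemmx idct (skal)": (142, 130),
    "32 bit simple mmx (xvid)": (144, 131),
}


def minutes_saved(timings):
    """Minutes saved per mode when going from the first run to the redone run."""
    return {mode: first - redone for mode, (first, redone) in timings.items()}


if __name__ == "__main__":
    for mode, saved in minutes_saved(runs).items():
        first = runs[mode][0]
        print(f"{mode:30s} {saved} min saved ({100.0 * saved / first:.1f}% of the first run)")
```

For these three idcts the redone runs are 12 to 15 minutes shorter than the first runs.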
        Last edited by UncasMS; 7 Apr 2006, 05:24 AM.
