✔ Problems with pow()

Rene42 · May 10, 2017

Hello everyone,

first of all, I'm new to C++ and will try not to ask stupid questions ;-). I use "(Code)" for my code; is there also a tag for C++?

I'm using the following code:

Code:

 // Number of digits of the number
 int Stellenzahl = 5;

 // Index used to walk through all digit positions
 int iStellen = 0;

 // Define an output string
 std::string Ausgabe;

 // Walk through all digit positions
 do
 {
     int Faktor = pow(10, iStellen);

     std::cout << "10^" << iStellen << " = " << Faktor << std::endl;

     // Move on to the next digit position
     iStellen++;
 }
 while (Stellenzahl > iStellen);

 return 42;

To my surprise, this produces the following output:
10^0=1
10^1=10
10^2=99
10^3=1000
10^4=9999

Surely some data type problem, but I can't figure it out...

cwriter · May 10, 2017

Rene42 said:
I use "(Code)" for my code; is there also a tag for C++?

[code=cpp][/code]

Rene42 said:
Surely some data type problem, but I can't figure it out...

http://www.cplusplus.com/reference/cmath/pow/

pow works with floating-point types and returns, by preference, a double.

Since you implicitly cast to int:

C++:

int Faktor = pow(10, iStellen);
// is the same as
int Faktor = (int)pow(10, iStellen);

and since this conversion only keeps the digits before the decimal point, the rest is cut off. Because the result of pow is not necessarily exact, 10^2 will probably come out as something like 99.9999..., and that is then simply truncated to 99.
Two possible solutions:
a) You make "Faktor" a double and get slightly "impure" but better results.
b) You use lrint() (requires C++11): http://www.cplusplus.com/reference/cmath/lrint/

C++:

int Faktor = lrint(pow(10, iStellen));

This also narrows from long int down to int, but in this range that shouldn't be noticeable.

/EDIT: The ordinary round() (http://www.cplusplus.com/reference/cmath/round/) should of course work as well.

Regards
cwriter

sheel · May 10, 2017

...or, for ints, you don't use pow at all and instead multiply the value yourself n times in a loop.

A bit more about what the actual problem is:

As you may know, every digit in a binary number stands for a power of two, e.g.
1101 binary is 13 decimal because 1*8 + 1*4 + 0*2 + 1*1
So the rightmost digit stands for 1, the next one for 2, then 4, 8, 16, 32 and so on.

For numbers with a fractional part, this continues to the right of the point:
11.11 is 1*2 + 1*1 + 1*0.5 + 1*0.25 and therefore 3.75 (and in particular 11.11 is not 3.3; the parts left and right of the point are not treated as separate numbers). After 0.5 and 0.25 comes 0.125 and so on, i.e. always half of the previous value.

That a number with infinitely many decimal places (e.g. pi) also has infinitely many places in binary is probably clear. Unfortunately, with the way binary digits are composed as described above, there are also many numbers that have finitely many places in decimal but infinitely many in binary. Just as an example, 0.3: in binary that would be 0.0100110011001... and so on, for as long as you like. With the available place values there is simply no other way to put it together. (The problem only exists to the right of the point; with 1, 2, 4 etc. on the left it always works out finitely.)

And since a double in the computer obviously cannot store infinitely many digits, 0.3 can never be stored exactly. At best it becomes something like 0.3000000000000000002349 or 0.299999999999999992340.

For a plain 0.3 this is not yet a real problem; the usual output functions already round a little themselves.

But if inexact numbers are combined with each other several times inside pow, the error can grow, and then you get e.g. 0.299999999123 instead of 0.299999999999999992340. And that is more than the output routines are willing to simply round away.

cwriter · May 10, 2017

sheel said:
But if inexact numbers are combined with each other several times inside pow, the error can grow, and then you get e.g. 0.299999999123 instead of 0.299999999999999992340. And that is more than the output routines are willing to simply round away.

But that is not the problem here: the code always starts from 10^i, not from pow(10, 1) * lastResult or the like.

sheel said:
...or, for ints, you don't use pow at all and instead multiply the value yourself n times in a loop.

True, that is often added.
I didn't mention it because it wasn't clear whether an integer base is always used (with FP numbers, multiplication would potentially be even less accurate than pow).
I'm not sure which of the two variants is more efficient. Multiplication is already quite expensive, so I could well imagine that pow has an advantage for large exponents (on the net you find comparisons for cubing, which is of course fairly useless for comparing performance, at least unless you want to look at one specific case). I'd have to write a test for that sometime...

sheel said:
(The problem only exists to the right of the point; with 1, 2, 4 etc. on the left it always works out finitely.)

Yes, but to clarify:
The more digits there are on the left, the fewer digits are possible on the right (or rather, the exponent shifts the mantissa). Likewise, a double is not infinitely large either; for integers it is even worse (= less precise/comprehensive) than an int64_t, which has the same size.
From a performance point of view, double is a horribly expensive data type, but even a multiplication already costs as much as 14 additions (at least on ARM; it will be similar on x86), so everything is relative.
As a rule of thumb: use double when you want to compute in the real numbers, and int as often as possible.

Regards
cwriter

sheel · May 10, 2017

cwriter said:
But that is not the problem here: the code always starts from 10^i, not from pow(10, 1) * lastResult or the like.

And inside pow...? That is what I actually meant.

On multiplications and performance: square-and-multiply

cwriter · May 10, 2017

sheel said:
And inside pow...? That is what I actually meant.

Hm. I don't know whether some tricks are used to avoid doing plain multiplications...

http://x86.renejeschke.de/html/file_module_x86_id_210.html
Lat: 18 cycles (?)

http://x86.renejeschke.de/html/file_module_x86_id_130.html
Lat: 250 cycles (?)

http://x86.renejeschke.de/html/file_module_x86_id_113.html
Lat: 60 cycles (?)

http://x86.renejeschke.de/html/file_module_x86_id_79.html
Lat: 200 cycles (?)

With plain addition: 510 cycles / 14 cycles =~ 37.

By this logic, using pow should pay off at exponent 37 at the latest.
Experiments with ^3, however, have already shown closer results.

http://stackoverflow.com/questions/...pow-to-square-or-just-multiply-it-with-itself

But it's late already, and there may be a flaw in my reasoning.
And yes, x87 is not the only architecture

Regards
cwriter

/EDIT: FMUL probably fits better here: http://x86.renejeschke.de/html/file_module_x86_id_104.html
Lat: 8.
^74. Yep. Pretty surely a flaw in my reasoning

sheel · May 10, 2017

cwriter said:
Hm. I don't know whether some tricks are used to avoid doing plain multiplications...

Of course... but FP error propagation also happens without plain multiplication loops... and even if there were a perfectly lossless CPU instruction, purely mathematically the difference between 9.999 and 10 has to be less dramatic than e.g. the difference between 9.999^5 and 10^5

cwriter · May 11, 2017

cwriter said:
But it's late already, and there may be a flaw in my reasoning.
And yes, x87 is not the only architecture

Well, now I guess I have to deliver

C++:

#include <stdio.h>
#include <stdlib.h>
#include <memory>
#include <iostream>
#include <chrono>
#include <random>
#include <cmath>
#include <assert.h>

using namespace std;
using namespace std::chrono;

#define GO_DOUBLE
#define DATA_TYPE double

//Here to make the compiler think I could use concurrent access.
volatile DATA_TYPE ret = (DATA_TYPE)(1);

int main(int argc, char* argv[])
{
    constexpr size_t testsize = 4096/* * 4096*/;
    constexpr size_t max_exp = 100;

    cout << "Starting test with a size of " << testsize << endl;
    //Allocate the test set (16M elements if the factor above is enabled)
    DATA_TYPE* arr = (DATA_TYPE*)malloc(testsize * sizeof(DATA_TYPE));
    if(arr == nullptr)
    {
        std::cout << "failed allocation" << endl;
        return -1;
    }

    cout << "Filling testarray..." << endl;
    //One engine for the whole array; re-creating it for every element
    //would produce the same value each time.
    default_random_engine gen;
    //Arbitrary range; this range should be representative enough
#ifdef GO_DOUBLE
    uniform_real_distribution<DATA_TYPE> dist(-1000,+1000);
#else
    uniform_int_distribution<DATA_TYPE> dist(-1000,+1000);
#endif
    for(size_t i = 0; i < testsize; i++)
    {
        arr[i] = dist(gen);  //random value in the range -1000..1000
    }
    //The filling should already page the values in the RAM. Hopefully, they won't be paged out...

    cout << "Starting test..." << endl;

    //Now loop over different exponents for both versions
    for(size_t exp = 0; exp <= max_exp; exp++)
    {
        cout << "For exponent " << exp << ":" << endl;

        //Time the pow() version
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
        for(size_t i = 0; i < testsize; i++)
        {
            ret = 1;
            ret *= pow(arr[i], exp);
        }
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
        auto ns = duration_cast<nanoseconds>(t2-t1);
        cout << "POW took " << ns.count() / testsize << "ns" << endl;

        //Time the raw multiplication version
        t1 = high_resolution_clock::now();
        for(size_t i = 0; i < testsize; i++)
        {
            ret = 1;
            for(size_t j = 0; j < exp; j++)
            {
                ret *= arr[i];
            }
        }
        t2 = high_resolution_clock::now();
        ns = duration_cast<nanoseconds>(t2-t1);
        cout << "MUL took " << ns.count() / testsize << "ns" << endl;
    }

    free(arr);
    return 0;
}

Compiled with "g++ testcpp.cpp -O2".
Result:

Code:

Starting test with a size of 4096
Filling testarray...
Starting test...
For exponent 0:
POW took 45ns
MUL took 2ns
For exponent 1:
POW took 30ns
MUL took 7ns
For exponent 2:
POW took 36ns
MUL took 28ns
For exponent 3:
POW took 445ns
MUL took 16ns
For exponent 4:
POW took 392ns
MUL took 21ns
For exponent 5:
POW took 443ns
MUL took 26ns
For exponent 6:
POW took 394ns
MUL took 32ns
For exponent 7:
POW took 1511ns
MUL took 39ns
For exponent 8:
POW took 393ns
MUL took 45ns
For exponent 9:
POW took 435ns
MUL took 33ns
For exponent 10:
POW took 262ns
MUL took 39ns
For exponent 11:
POW took 288ns
MUL took 50ns
For exponent 12:
POW took 265ns
MUL took 57ns
For exponent 13:
POW took 288ns
MUL took 63ns
For exponent 14:
POW took 265ns
MUL took 68ns
For exponent 15:
POW took 305ns
MUL took 74ns
For exponent 16:
POW took 262ns
MUL took 79ns
For exponent 17:
POW took 398ns
MUL took 69ns
For exponent 18:
POW took 196ns
MUL took 67ns
For exponent 19:
POW took 216ns
MUL took 76ns
For exponent 20:
POW took 202ns
MUL took 77ns
For exponent 21:
POW took 229ns
MUL took 81ns
For exponent 22:
POW took 196ns
MUL took 85ns
For exponent 23:
POW took 216ns
MUL took 89ns
For exponent 24:
POW took 216ns
MUL took 93ns
For exponent 25:
POW took 216ns
MUL took 98ns
For exponent 26:
POW took 225ns
MUL took 102ns
For exponent 27:
POW took 230ns
MUL took 107ns
For exponent 28:
POW took 206ns
MUL took 114ns
For exponent 29:
POW took 223ns
MUL took 117ns
For exponent 30:
POW took 209ns
MUL took 122ns
For exponent 31:
POW took 216ns
MUL took 138ns
For exponent 32:
POW took 201ns
MUL took 139ns
For exponent 33:
POW took 229ns
MUL took 140ns
For exponent 34:
POW took 208ns
MUL took 147ns
For exponent 35:
POW took 229ns
MUL took 166ns
For exponent 36:
POW took 196ns
MUL took 160ns
For exponent 37:
POW took 216ns
MUL took 167ns
For exponent 38:
POW took 214ns
MUL took 174ns
For exponent 39:
POW took 216ns
MUL took 181ns
For exponent 40:
POW took 196ns
MUL took 201ns
For exponent 41:
POW took 216ns
MUL took 195ns
For exponent 42:
POW took 196ns
MUL took 202ns
For exponent 43:
POW took 230ns
MUL took 209ns
For exponent 44:
POW took 196ns
MUL took 226ns
For exponent 45:
POW took 228ns
MUL took 223ns
For exponent 46:
POW took 196ns
MUL took 243ns
For exponent 47:
POW took 240ns
MUL took 241ns
For exponent 48:
POW took 208ns
MUL took 243ns
For exponent 49:
POW took 216ns
MUL took 264ns
For exponent 50:
POW took 196ns
MUL took 257ns
For exponent 51:
POW took 216ns
MUL took 288ns
For exponent 52:
POW took 196ns
MUL took 271ns
For exponent 53:
POW took 216ns
MUL took 290ns
For exponent 54:
POW took 196ns
MUL took 285ns
For exponent 55:
POW took 216ns
MUL took 305ns
For exponent 56:
POW took 196ns
MUL took 309ns
For exponent 57:
POW took 219ns
MUL took 305ns
For exponent 58:
POW took 196ns
MUL took 312ns
For exponent 59:
POW took 218ns
MUL took 329ns
For exponent 60:
POW took 196ns
MUL took 326ns
For exponent 61:
POW took 220ns
MUL took 333ns
For exponent 62:
POW took 196ns
MUL took 342ns
For exponent 63:
POW took 216ns
MUL took 347ns
For exponent 64:
POW took 225ns
MUL took 409ns
For exponent 65:
POW took 237ns
MUL took 512ns
For exponent 66:
POW took 204ns
MUL took 478ns
For exponent 67:
POW took 218ns
MUL took 470ns
For exponent 68:
POW took 196ns
MUL took 489ns
For exponent 69:
POW took 216ns
MUL took 497ns
For exponent 70:
POW took 199ns
MUL took 501ns
For exponent 71:
POW took 216ns
MUL took 510ns
For exponent 72:
POW took 196ns
MUL took 517ns
For exponent 73:
POW took 216ns
MUL took 511ns
For exponent 74:
POW took 200ns
MUL took 518ns
For exponent 75:
POW took 216ns
MUL took 527ns
For exponent 76:
POW took 207ns
MUL took 534ns
For exponent 77:
POW took 216ns
MUL took 549ns
For exponent 78:
POW took 200ns
MUL took 546ns
For exponent 79:
POW took 216ns
MUL took 555ns
For exponent 80:
POW took 196ns
MUL took 561ns
For exponent 81:
POW took 216ns
MUL took 581ns
For exponent 82:
POW took 196ns
MUL took 573ns
For exponent 83:
POW took 229ns
MUL took 580ns
For exponent 84:
POW took 196ns
MUL took 589ns
For exponent 85:
POW took 216ns
MUL took 598ns
For exponent 86:
POW took 196ns
MUL took 623ns
For exponent 87:
POW took 218ns
MUL took 610ns
For exponent 88:
POW took 207ns
MUL took 614ns
For exponent 89:
POW took 220ns
MUL took 621ns
For exponent 90:
POW took 198ns
MUL took 628ns
For exponent 91:
POW took 216ns
MUL took 648ns
For exponent 92:
POW took 196ns
MUL took 656ns
For exponent 93:
POW took 216ns
MUL took 651ns
For exponent 94:
POW took 196ns
MUL took 658ns
For exponent 95:
POW took 216ns
MUL took 666ns
For exponent 96:
POW took 210ns
MUL took 1014ns
For exponent 97:
POW took 259ns
MUL took 760ns
For exponent 98:
POW took 199ns
MUL took 737ns
For exponent 99:
POW took 248ns
MUL took 834ns
For exponent 100:
POW took 197ns
MUL took 726ns

My interpretation of this:
Up to exponent 40, the multiplication method is more or less clearly ahead.
From 40 to 50 they are about even.
From 50 onwards, pow() is clearly ahead.

Also: 0 is definitely an optimized code path.
(The test set is relatively small, but I'm on battery right now.)

For ints it is about the same (the higher times are probably due to general load on the machine).
Because of the additional casts, however, pow() only pays off here from exponent 60 onwards.

Code:

Starting test with a size of 4096
Filling testarray...
Starting test...
For exponent 0:
POW took 47ns
MUL took 4ns
For exponent 1:
POW took 31ns
MUL took 6ns
For exponent 2:
POW took 35ns
MUL took 9ns
For exponent 3:
POW took 444ns
MUL took 15ns
For exponent 4:
POW took 397ns
MUL took 19ns
For exponent 5:
POW took 434ns
MUL took 14ns
For exponent 6:
POW took 227ns
MUL took 16ns
For exponent 7:
POW took 933ns
MUL took 20ns
For exponent 8:
POW took 258ns
MUL took 22ns
For exponent 9:
POW took 250ns
MUL took 24ns
For exponent 10:
POW took 231ns
MUL took 27ns
For exponent 11:
POW took 249ns
MUL took 30ns
For exponent 12:
POW took 227ns
MUL took 36ns
For exponent 13:
POW took 249ns
MUL took 52ns
For exponent 14:
POW took 266ns
MUL took 51ns
For exponent 15:
POW took 292ns
MUL took 54ns
For exponent 16:
POW took 486ns
MUL took 59ns
For exponent 17:
POW took 291ns
MUL took 63ns
For exponent 18:
POW took 339ns
MUL took 67ns
For exponent 19:
POW took 291ns
MUL took 71ns
For exponent 20:
POW took 265ns
MUL took 108ns
For exponent 21:
POW took 301ns
MUL took 92ns
For exponent 22:
POW took 265ns
MUL took 85ns
For exponent 23:
POW took 292ns
MUL took 94ns
For exponent 24:
POW took 265ns
MUL took 95ns
For exponent 25:
POW took 291ns
MUL took 100ns
For exponent 26:
POW took 269ns
MUL took 105ns
For exponent 27:
POW took 291ns
MUL took 108ns
For exponent 28:
POW took 265ns
MUL took 122ns
For exponent 29:
POW took 306ns
MUL took 118ns
For exponent 30:
POW took 265ns
MUL took 124ns
For exponent 31:
POW took 294ns
MUL took 128ns
For exponent 32:
POW took 265ns
MUL took 135ns
For exponent 33:
POW took 294ns
MUL took 140ns
For exponent 34:
POW took 267ns
MUL took 147ns
For exponent 35:
POW took 292ns
MUL took 160ns
For exponent 36:
POW took 266ns
MUL took 160ns
For exponent 37:
POW took 293ns
MUL took 165ns
For exponent 38:
POW took 272ns
MUL took 172ns
For exponent 39:
POW took 379ns
MUL took 273ns
For exponent 40:
POW took 397ns
MUL took 287ns
For exponent 41:
POW took 444ns
MUL took 284ns
For exponent 42:
POW took 683ns
MUL took 298ns
For exponent 43:
POW took 442ns
MUL took 303ns
For exponent 44:
POW took 402ns
MUL took 328ns
For exponent 45:
POW took 437ns
MUL took 324ns
For exponent 46:
POW took 401ns
MUL took 338ns
For exponent 47:
POW took 441ns
MUL took 344ns
For exponent 48:
POW took 438ns
MUL took 363ns
For exponent 49:
POW took 445ns
MUL took 367ns
For exponent 50:
POW took 401ns
MUL took 379ns
For exponent 51:
POW took 440ns
MUL took 384ns
For exponent 52:
POW took 403ns
MUL took 399ns
For exponent 53:
POW took 441ns
MUL took 404ns
For exponent 54:
POW took 402ns
MUL took 423ns
For exponent 55:
POW took 439ns
MUL took 431ns
For exponent 56:
POW took 398ns
MUL took 444ns
For exponent 57:
POW took 438ns
MUL took 448ns
For exponent 58:
POW took 397ns
MUL took 465ns
For exponent 59:
POW took 445ns
MUL took 464ns
For exponent 60:
POW took 403ns
MUL took 479ns
For exponent 61:
POW took 444ns
MUL took 484ns
For exponent 62:
POW took 405ns
MUL took 499ns
For exponent 63:
POW took 458ns
MUL took 504ns
For exponent 64:
POW took 402ns
MUL took 520ns
For exponent 65:
POW took 443ns
MUL took 663ns
For exponent 66:
POW took 400ns
MUL took 673ns
For exponent 67:
POW took 445ns
MUL took 689ns
For exponent 68:
POW took 397ns
MUL took 697ns
For exponent 69:
POW took 438ns
MUL took 706ns
For exponent 70:
POW took 398ns
MUL took 870ns
For exponent 71:
POW took 443ns
MUL took 724ns
For exponent 72:
POW took 400ns
MUL took 740ns
For exponent 73:
POW took 438ns
MUL took 746ns
For exponent 74:
POW took 402ns
MUL took 756ns
For exponent 75:
POW took 438ns
MUL took 770ns
For exponent 76:
POW took 403ns
MUL took 773ns
For exponent 77:
POW took 443ns
MUL took 795ns
For exponent 78:
POW took 398ns
MUL took 797ns
For exponent 79:
POW took 437ns
MUL took 815ns
For exponent 80:
POW took 406ns
MUL took 820ns
For exponent 81:
POW took 441ns
MUL took 552ns
For exponent 82:
POW took 266ns
MUL took 555ns
For exponent 83:
POW took 300ns
MUL took 562ns
For exponent 84:
POW took 269ns
MUL took 569ns
For exponent 85:
POW took 292ns
MUL took 585ns
For exponent 86:
POW took 265ns
MUL took 588ns
For exponent 87:
POW took 291ns
MUL took 593ns
For exponent 88:
POW took 265ns
MUL took 695ns
For exponent 89:
POW took 438ns
MUL took 917ns
For exponent 90:
POW took 402ns
MUL took 917ns
For exponent 91:
POW took 438ns
MUL took 930ns
For exponent 92:
POW took 398ns
MUL took 943ns
For exponent 93:
POW took 440ns
MUL took 950ns
For exponent 94:
POW took 397ns
MUL took 960ns
For exponent 95:
POW took 440ns
MUL took 975ns
For exponent 96:
POW took 398ns
MUL took 977ns
For exponent 97:
POW took 442ns
MUL took 990ns
For exponent 98:
POW took 398ns
MUL took 1003ns
For exponent 99:
POW took 438ns
MUL took 1007ns
For exponent 100:
POW took 402ns
MUL took 1017ns

Regards
cwriter

/EDIT: Oh, nano got creative again.
I'm attaching the code file here.

Technipion · May 11, 2017

I'm just going to guess that pow internally uses logarithms to speed up the computation of powers, at least for large numbers. That would also explain why there is no ipow.

From the laws of logarithms it follows that x^p = e^( ln(x^p) ) = e^(p * ln(x)). If you have fast internal functions for computing logarithms and exponentials at hand, you could build something nice with that.

I quickly wrote a code snippet to compare an improvised ipow with a home-made pow. I get a relative deviation of the pow values from the ipow values of about 0.0025%. I reckon that with a little more effort you could really make something of it (if you also build in corrections etc.). Or rather, my guess is actually that the built-in pow does exactly that.

C++:

#include <cstdint>      // for uint64_t
#include <iostream>     // for std::cout
#include <cmath>        // for std::log, std::exp, std::abs


uint64_t pow_integral(uint64_t base, uint64_t exponent)
{
    uint64_t result = 1;

    for (uint64_t i = 0; i < exponent; i++)
        result *= base;

    return result;
}


uint64_t pow_logarithm(uint64_t base, uint64_t exponent)
{
    double new_exponent = exponent * std::log(base);
    return (uint64_t)std::exp(new_exponent);
}


int main(int argc, char *argv[])
{
    double deviations = 0.0;
    uint64_t iterations = 0;

    for (uint64_t b = 2; b < 65536; b++) {

        for (uint64_t e = 2; e < 4; e++) {

            auto integral = pow_integral(b, e);
            auto logarithm = pow_logarithm(b, e);

            deviations += std::abs(((double)(integral) - (double)(logarithm)) / integral);
            iterations += 1;
        }
    }

    std::cout << "Ran " << iterations << " cycles and found an average relative "
              << "deviation of " << deviations / iterations << "." << std::endl;

    return 0;
}

@Rene42: In the code snippet you can see, by the way, how to quickly put together an ipow yourself. But always keep in mind that with large powers you quickly run into overflow, even with uint64.

Of course, all of this is just speculation.

Regards, Technipion

Rene42 · May 11, 2017

First of all, many thanks for the many quick answers. I'm now doing it with round(). I had somehow gotten stuck, and then there was the "decimal point problem" on top.
