StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POWhy is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
primarykey
Id
18707984
data
AcceptedAnswerId
18735505
AnswerCount
4
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2013-09-09T22:23:00.770
FavoriteCount
1
LastActivityDate
2013-09-12T07:58:36.693
LastEditDate
2013-09-12T07:39:06.280
LastEditorUserId
1686013
OwnerUserId
1686013
ParentId
0
PostTypeId
1
Score
5
ViewCount
582
LastEditorDisplayName
text
Body
I want to perform polynomial calculus in Python. The <code>polynomial</code> package in <code>numpy</code> is not fast enough for me. Therefore I decided to rewrite several functions in Fortran and use <code>f2py</code> to create shared libraries which are easily imported into Python. Currently I am benchmarking my routines for univariate and bivariate polynomial evaluation against their <code>numpy</code> counterparts. In the univariate routine I use <a href="http://en.wikipedia.org/wiki/Horner%27s_method" rel="nofollow noreferrer">Horner's method</a> as does <code>numpy.polynomial.polynomial.polyval</code>. I have observed that the factor by which the Fortran routine is faster than the <code>numpy</code> counterpart increases as the order of the polynomial increases. In the bivariate routine I use Horner's method twice. First in y and then in x. Unfortunately I have observed that for increasing polynomial order, the <code>numpy</code> counterpart catches up and eventually surpasses my Fortran routine. As <code>numpy.polynomial.polynomial.polyval2d</code> uses an approach similar to mine, I consider this second observation to be strange. I am hoping that this result stems forth from my inexperience with Fortran and <code>f2py</code>. Might someone have any clue why the univariate routine always appears superior, while the bivariate routine is only superior for low order polynomials? EDIT Here is my latest updated code, benchmark script and performance plots: polynomial.f95 <pre><code>subroutine polyval(p, x, pval, nx) implicit none real(8), dimension(nx), intent(in) :: p real(8), intent(in) :: x real(8), intent(out) :: pval integer, intent(in) :: nx integer :: i pval = 0.0d0 do i = nx, 1, -1 pval = pval*x + p(i) end do end subroutine polyval subroutine polyval2(p, x, y, pval, nx, ny) implicit none real(8), dimension(nx, ny), intent(in) :: p real(8), intent(in) :: x, y real(8), intent(out) :: pval integer, intent(in) :: nx, ny real(8) :: tmp integer :: i, j pval = 0.0d0 do j = ny, 1, -1 tmp = 0.0d0 do i = nx, 1, -1 tmp = tmp*x + p(i, j) end do pval = pval*y + tmp end do end subroutine polyval2 subroutine polyval3(p, x, y, z, pval, nx, ny, nz) implicit none real(8), dimension(nx, ny, nz), intent(in) :: p real(8), intent(in) :: x, y, z real(8), intent(out) :: pval integer, intent(in) :: nx, ny, nz real(8) :: tmp, tmp2 integer :: i, j, k pval = 0.0d0 do k = nz, 1, -1 tmp2 = 0.0d0 do j = ny, 1, -1 tmp = 0.0d0 do i = nx, 1, -1 tmp = tmp*x + p(i, j, k) end do tmp2 = tmp2*y + tmp end do pval = pval*z + tmp2 end do end subroutine polyval3 </code></pre> benchmark.py (use this script to produce plots) <pre><code>import time import os import numpy as np import matplotlib.pyplot as plt # Compile and import Fortran module os.system('f2py -c polynomial.f95 --opt="-O3 -ffast-math" \ --f90exec="gfortran-4.8" -m polynomial') import polynomial # Create random x and y value x = np.random.rand() y = np.random.rand() z = np.random.rand() # Number of repetition repetition = 10 # Number of times to loop over a function run = 100 # Number of data points points = 26 # Max number of coefficients for univariate case n_uni_min = 4 n_uni_max = 100 # Max number of coefficients for bivariate case n_bi_min = 4 n_bi_max = 100 # Max number of coefficients for trivariate case n_tri_min = 4 n_tri_max = 100 # Case on/off switch case_on = [1, 1, 1] case_1_done = 0 case_2_done = 0 case_3_done = 0 #=================# # UNIVARIATE CASE # #=================# if case_on[0]: # Array containing the polynomial order + 1 for several univariate polynomials n_uni = np.array([int(x) for x in np.linspace(n_uni_min, n_uni_max, points)]) # Initialise arrays for storing timing results time_uni_numpy = np.zeros(n_uni.size) time_uni_fortran = np.zeros(n_uni.size) for i in xrange(len(n_uni)): # Create random univariate polynomial of order n - 1 p = np.random.rand(n_uni[i]) # Time evaluation of polynomial using NumPy dt = [] for j in xrange(repetition): t1 = time.time() for r in xrange(run): np.polynomial.polynomial.polyval(x, p) t2 = time.time() dt.append(t2 - t1) time_uni_numpy[i] = np.average(dt[2::]) # Time evaluation of polynomial using Fortran dt = [] for j in xrange(repetition): t1 = time.time() for r in xrange(run): polynomial.polyval(p, x) t2 = time.time() dt.append(t2 - t1) time_uni_fortran[i] = np.average(dt[2::]) # Speed-up factor factor_uni = time_uni_numpy / time_uni_fortran results_uni = np.zeros([len(n_uni), 4]) results_uni[:, 0] = n_uni results_uni[:, 1] = factor_uni results_uni[:, 2] = time_uni_numpy results_uni[:, 3] = time_uni_fortran print results_uni, '\n' plt.figure() plt.plot(n_uni, factor_uni) plt.title('Univariate comparison') plt.xlabel('# coefficients') plt.ylabel('Speed-up factor') plt.xlim(n_uni[0], n_uni[-1]) plt.ylim(0, max(factor_uni)) plt.grid(aa=True) case_1_done = 1 #================# # BIVARIATE CASE # #================# if case_on[1]: # Array containing the polynomial order + 1 for several bivariate polynomials n_bi = np.array([int(x) for x in np.linspace(n_bi_min, n_bi_max, points)]) # Initialise arrays for storing timing results time_bi_numpy = np.zeros(n_bi.size) time_bi_fortran = np.zeros(n_bi.size) for i in xrange(len(n_bi)): # Create random bivariate polynomial of order n - 1 in x and in y p = np.random.rand(n_bi[i], n_bi[i]) # Time evaluation of polynomial using NumPy dt = [] for j in xrange(repetition): t1 = time.time() for r in xrange(run): np.polynomial.polynomial.polyval2d(x, y, p) t2 = time.time() dt.append(t2 - t1) time_bi_numpy[i] = np.average(dt[2::]) # Time evaluation of polynomial using Fortran p = np.asfortranarray(p) dt = [] for j in xrange(repetition): t1 = time.time() for r in xrange(run): polynomial.polyval2(p, x, y) t2 = time.time() dt.append(t2 - t1) time_bi_fortran[i] = np.average(dt[2::]) # Speed-up factor factor_bi = time_bi_numpy / time_bi_fortran results_bi = np.zeros([len(n_bi), 4]) results_bi[:, 0] = n_bi results_bi[:, 1] = factor_bi results_bi[:, 2] = time_bi_numpy results_bi[:, 3] = time_bi_fortran print results_bi, '\n' plt.figure() plt.plot(n_bi, factor_bi) plt.title('Bivariate comparison') plt.xlabel('# coefficients') plt.ylabel('Speed-up factor') plt.xlim(n_bi[0], n_bi[-1]) plt.ylim(0, max(factor_bi)) plt.grid(aa=True) case_2_done = 1 #=================# # TRIVARIATE CASE # #=================# if case_on[2]: # Array containing the polynomial order + 1 for several bivariate polynomials n_tri = np.array([int(x) for x in np.linspace(n_tri_min, n_tri_max, points)]) # Initialise arrays for storing timing results time_tri_numpy = np.zeros(n_tri.size) time_tri_fortran = np.zeros(n_tri.size) for i in xrange(len(n_tri)): # Create random bivariate polynomial of order n - 1 in x and in y p = np.random.rand(n_tri[i], n_tri[i]) # Time evaluation of polynomial using NumPy dt = [] for j in xrange(repetition): t1 = time.time() for r in xrange(run): np.polynomial.polynomial.polyval3d(x, y, z, p) t2 = time.time() dt.append(t2 - t1) time_tri_numpy[i] = np.average(dt[2::]) # Time evaluation of polynomial using Fortran p = np.asfortranarray(p) dt = [] for j in xrange(repetition): t1 = time.time() for r in xrange(run): polynomial.polyval3(p, x, y, z) t2 = time.time() dt.append(t2 - t1) time_tri_fortran[i] = np.average(dt[2::]) # Speed-up factor factor_tri = time_tri_numpy / time_tri_fortran results_tri = np.zeros([len(n_tri), 4]) results_tri[:, 0] = n_tri results_tri[:, 1] = factor_tri results_tri[:, 2] = time_tri_numpy results_tri[:, 3] = time_tri_fortran print results_tri plt.figure() plt.plot(n_bi, factor_bi) plt.title('Trivariate comparison') plt.xlabel('# coefficients') plt.ylabel('Speed-up factor') plt.xlim(n_tri[0], n_tri[-1]) plt.ylim(0, max(factor_tri)) plt.grid(aa=True) print '\n' case_3_done = 1 #============================================================================== plt.show() </code></pre> Results <img src="https://i.stack.imgur.com/fLcVj.png" alt="enter image description here"> <img src="https://i.stack.imgur.com/nP60i.png" alt="enter image description here"> <img src="https://i.stack.imgur.com/15AJh.png" alt="enter image description here"> EDIT correction to the proposal of steabert <pre><code>subroutine polyval(p, x, pval, nx) implicit none real*8, dimension(nx), intent(in) :: p real*8, intent(in) :: x real*8, intent(out) :: pval integer, intent(in) :: nx integer, parameter :: simd = 8 real*8 :: tmp(simd), xpower(simd), maxpower integer :: i, j, k xpower(1) = x do i = 2, simd xpower(i) = xpower(i-1)*x end do maxpower = xpower(simd) tmp = 0.0d0 do i = nx+1, simd+2, -simd do j = 1, simd tmp(j) = tmp(j)*maxpower + p(i-j)*xpower(simd-j+1) end do end do k = mod(nx-1, simd) if (k == 0) then pval = sum(tmp) + p(1) else pval = sum(tmp) + p(k+1) do i = k, 1, -1 pval = pval*x + p(i) end do end if end subroutine polyval </code></pre> EDIT Test code to verify that the code directly above gives poor results for x > 1 <pre><code>import polynomial as P import numpy.polynomial.polynomial as PP import numpy as np for n in xrange(2,100): poly1n = np.random.rand(n) poly1f = np.asfortranarray(poly1n) x = 2 print np.linalg.norm(P.polyval(poly1f, x) - PP.polyval(x, poly1n)), '\n' </code></pre>
Tags
<python><arrays><numpy><fortran><f2py>
Title
Why is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USAeronaelius
UserOwnerUserId
1. USAeronaelius
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POWhy is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POWhy is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
 UserUserId
 USusethedeathstar
 VoteTypeVoteTypeId
 VTFavorite
3. VO
 singulars
 PostPostId
 POWhy is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COI have edited the code in `benchmark.py`. There was a small but significant error. I have uploaded also new result plots. The univariate result is now looking even better
 singulars
 PostPostId
 POWhy is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
 UserUserId
 USAeronaelius
2. COI wanted to note that ``pval`` and ``tmp`` seem to be used uninitialized in your fortran subroutines.
 singulars
 PostPostId
 POWhy is univariate Horner in Fortran faster than NumPy counterpart while bivariate Horner is not
 UserUserId
 USsteabert

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.