I'm trying to benchmark a piece of DSP code on a Raspberry Pi 4 using `std::chrono::steady_clock`, but the results I'm getting are peculiar. Because GNU profiling tools don't work on Raspberry Pi, I'm stuck with benchmarking to evaluate code optimizations, so this is rather a big deal.

Results for a ~6-second benchmark vary by ~10% between executions of the program. But the peculiar thing is that the variance seems to be sticky for a particular execution of the benchmark. I run the benchmark three times in a row each time the program is run, and get roughly the same results, +/- 1%. But when I re-run the program, the results of the three benchmarks vary by +/- 10% from the previous run, with each of the three results in the new run again consistent to +/- 1%. Results vary randomly, roughly between those two extremes, from run to run.

Details:

- The code under test is a machine-learning algorithm (LSTM -> Dense) using hand-optimized NEON intrinsics, used to generate real-time audio. The bulk of the execution (~90%) is matrix and vector arithmetic using those intrinsics.
- Data footprint is about 13 kB (fits comfortably in L1 d-cache). Code footprint is unknown, but may not fit in L1 i-cache. Most of the code pipelines beautifully, so the code may be running close to L1-cache bandwidth limits.
- All matrices, matrix rows, and vectors are 16-byte aligned (checked with asserts in debug compiles).
- The CPU scaling governors have been set to `performance`, and all CPUs are running at 1.8 GHz.
- I don't think it's related to cache competition between processes: htop indicates ~6% CPU use at idle when connected by VNC, and about 0.3% (wifi supplicant) when connected via ssh, and the pattern doesn't significantly change when connected via SSH.

So far, optimization has resulted in an improvement from ~0.18x realtime to ~0.093x realtime. The code under test gets executed three times, taking ~0.3x realtime in total, so further optimizations are in fact critical. I think there's probably another ~15% improvement available, but the timing inaccuracies are getting in the way at this point.

I'm an experienced programmer, so I get that benchmarks will vary somewhat. But the ~10% variance is unworkable for what I'm trying to do, and I am unable to come up with a reasonable theory as to why the variance changes from invocation to invocation.

What would cause performance to vary by ~10% between executions of the benchmark program, while remaining consistent +/- 1% when the same test is run multiple times in the same execution of the program?