Startup claims it can boost any processor's performance by 100X — Flow Computing introduces its 'CPU 2.0' architecture (2024)

Startup claims it can boost any processor's performance by 100X — Flow Computing introduces its 'CPU 2.0' architecture (1)

Flow Computing has exited stealth mode with some of the most explosive claims from an emerging technology company we have seen in a long time. A spinout from Finland’s acclaimed VTT Technical Research Center, Flow says its Parallel Processing Unit (PPU) can enable “100X Improved performance for any CPU architecture.” The startup has just secured €4M in pre-seed funding and its founders and backers are asking the tech world to brace for its self-titled era of "CPU 2.0."

According to Timo Valtonen, co-founder and CEO of Flow Computing, the new PPU was designed to break a decades-long stagnation in CPU performance. Valtonen says that this has made the CPU the weakest link in computing in recent years. “Flow intends to lead the SuperCPU revolution through its radical new Parallel Performance Unit (PPU) architecture, enabling up to 100X the performance of any CPU,” boldly claimed the Flow CEO.

Startup claims it can boost any processor's performance by 100X — Flow Computing introduces its 'CPU 2.0' architecture (2)

Flow’s PPU is claimed to be very broadly compatible with existing CPU architectures. It is claimed that any current Von Neumann architecture CPU can integrate a PPU. According to a press release we received, Flow has already prepared optimized PPU licenses targeting mobile, PC, and data center processors. Above, you can see a slide showing Flow PPUs configured with between 4 and 256 cores.

Another eyebrow-raising claim is that an integrated PPU can enable its astonishing 100x performance uplift “regardless of architecture and with full backward software compatibility.” Well-known CPU architectures/designs such as X86, Apple M-Series, Exynos, Arm, and RISC-V are name-checked. The company claims that despite the touted broad compatibility and boosting of existing parallel functionality in all existing software, there will still be worthwhile benefits gleaned from software recompilation for the PPU. In fact, recompilation will be necessary to reach the headlining 100X performance improvements. The company says existing code will run up to 2x faster, though.

Startup claims it can boost any processor's performance by 100X — Flow Computing introduces its 'CPU 2.0' architecture (3)

PC DIYers have traditionally balanced systems to their preferences by choosing and budgeting between CPU and GPU. However, the startup claims Flow “eliminates the need for expensive GPU acceleration of CPU instructions in performant applications.” Meanwhile, any existing co-processors like Matrix units, Vector units, NPUs, or indeed GPUs will benefit from a “far-more-capable CPU,” asserts the startup.

Flow explains the key differences between its PPU and a modern GPU in a FAQ document. “PPU is optimized for parallel processing, while the GPU is optimized for graphics processing,” contrasts the startup. “PPU is more closely integrated with the CPU, and you could think of it as a kind of a co-processor, whereas the GPU is an independent unit that is much more loosely connected to the CPU.” It also highlights the importance of the PPU not requiring a separate kernel and its variable parallelism width.

For now, we are taking the above statements with bucketloads of salt. The claims about 100x performance and ease/transparency of adding a PPU seem particularly bold. Flow says it will deliver more technical details about the PPU in H2 this year. Hopefully, that will be a deeper dive stuffed with benchmarks and relevant comparisons.

Stay On the Cutting Edge: Get the Tom's Hardware Newsletter

Get Tom's Hardware's best news and in-depth reviews, straight to your inbox.

The Helsinki-based startup hasn’t shared a commercialization timetable as yet but seems open to partnerships. It mentions the possibility of working with companies like AMD, Apple, Arm, Intel, Nvidia, Qualcomm, and Tenstorrent. Flow’s PR highlights its preference for an IP licensing model, similar to Arm, where customers embed its PPU for a fee.

Startup claims it can boost any processor's performance by 100X — Flow Computing introduces its 'CPU 2.0' architecture (7)

Mark Tyson

Freelance News Writer

Mark Tyson is a Freelance News Writer at Tom's Hardware US. He enjoys covering the full breadth of PC tech; from business and semiconductor design to products approaching the edge of reason.

See more CPUs News

More about cpus

Russian chipmaker unveils Raspberry Pi competitor — despite US sanction woes, Elbrus packs its homegrown chip into the smallest form factor yetIntel fires back at AMD's AI benchmarks, shares results claiming current-gen Xeon chips are faster at AI than AMD's next-gen 128-core EPYC Turin

Latest

EVGA's classified AMD motherboard has been declassified — EVGA X670E Classified prototype motherboard fetches over $1,300
See more latest►

34 CommentsComment from the forums

  • peachpuff

    Intel: buy'em out boys!

    Reply

  • rluker5

    Sounds suspiciously like an NPU.

    Reply

  • Notton

    I hope you guys get one to review before they fold or get bought out and never see the light of day again.

    Reply

  • Findecanor

    I have yet a few papers to read, but from my understanding so far this I think this approach is basically about taking SIMT technology that you'd typically find in a GPU and adapting it to run general-purpose code that you'd otherwise only run on a CPU.

    An NPU on the other hand, can only do one thing: matrix multiplication, and often does it at low precision.

    Reply

  • Metal Messiah.

    Tom's article seem to be short on details. I got some insight.

    https://techcrunch.com/2024/06/11/flow-claims-it-can-100x-any-cpus-power-with-its-companion-chip-and-some-elbow-grease/
    CPUs have gotten very fast, but even with nanosecond-level responsiveness, there’s a tremendous amount of waste in how instructions are carried out simply because of the basic limitation that one task needs to finish before the next one starts.

    What Flow claims to have done is remove this limitation, turning the CPU from a one-lane street into a multi-lane highway.

    The CPU is still limited to doing one task at a time, but Flow’s PPU, as they call it, essentially performs nanosecond-scale traffic management on-die to move tasks into and out of the processor faster than has previously been possible.

    Think of the CPU as a chef working in a kitchen.

    The chef can only work so fast, but what if that person had a superhuman assistant swapping knives and tools in and out of the chef’s hands, clearing the prepared food and putting in new ingredients, removing all tasks that aren’t actual chef stuff?

    The chef still only has two hands, but now the chef can work 10 times as fast. It’s not a perfect analogy, but it gives you an idea of what’s happening here, at least according to Flow’s internal tests and demos with the industry (and they are talking with everyone).

    The PPU doesn’t increase the clock frequency or push the system in other ways that would lead to extra heat or power; in other words, the chef is not being asked to chop twice as fast. It just more efficiently uses the CPU cycles that are already taking place.

    Flow’s big achievement, in other words, isn’t high-speed traffic management, but rather doing it without having to modify any code on any CPU or architecture that it has tested. It sounds kind of unhinged to say that arbitrary code can be executed twice as fast on any chip with no modification beyond integrating the PPU with the die.

    Therein lies the primary challenge to Flow’s success as a business: Unlike a software product, Flow’s tech needs to be included at the chip-design level, meaning it doesn’t work retroactively, and the first chip with a PPU would necessarily be quite a ways down the road.

    Chart showing improvements in an FPGA PPU-enhanced chip versus unmodified Intel chips. Increasing the number of PPU cores continually improves performance.

    https://techcrunch.com/wp-content/uploads/2024/06/flow-chart.png

    Reply

  • TerryLaze

    peachpuff said:

    Intel: buy'em out boys!

    Intel already failed at trying to do this some 10 years ago, this is just intel's xeon phi / knights landing but without intel's backing.

    Reply

  • Choiman1559

    There are already products on the market that are very specific to OOoE technology, such as the POWER architecture, which produces similar results to this chip. (Although not x86 compatible and not for consumer use!)

    Reply

  • Alvar "Miles" Udell

    It sounds good in theory, but it does sound suspiciously like a the NPU that Qualcomm, AMD, and Intel have themselves so it's impact may be limited to other companies, like Samsung if they don't license Qualcomm's.

    Reply

  • Murissokah

    Metal Messiah. said:

    What Flow claims to have done is remove this limitation, turning the CPU from a one-lane street into a multi-lane highway.

    Uberthreding? Seems like the kind of thing that is very powerful for some applications, but useless for othres.

    Reply

  • bit_user

    My guess is that it's basically like a tightly-integrated GPU. Like, if the compute cores of a GPU were integrated almost as tightly into a CPU as their FPU.

    Metal Messiah. said:

    CPUs have gotten very fast, but even with nanosecond-level responsiveness, there’s a tremendous amount of waste in how instructions are carried out simply because of the basic limitation that one task needs to finish before the next one starts.

    There are a couple things this could mean. They could be talking about the way that CPU cores can operate on only 1 or 2 threads at a time (although I think POWER has done up to 8-way SMT), but the main thing I think they're probably talking about is how CPUs (especially x86) have tight memory synchronization requirements. GPUs famously have very weak memory models. This can limit concurrency, in CPUs.

    Metal Messiah. said:

    Chart showing improvements in an FPGA PPU-enhanced chip versus unmodified Intel chips. Increasing the number of PPU cores continually improves performance.

    https://techcrunch.com/wp-content/uploads/2024/06/flow-chart.png

    This is super sketchy, if they don't even tell us which models of CPUs they're talking about or show us what code they ran, compiler + options, etc. I find it pretty funny they used a core i7 that was 10x as fast as a Xeon W. Did they compare the newest Raptor Lake i7 against the slowest and worst Skylake Xeon W??

    I'm guessing the code they used on the x86 CPUs was probably brain dead and got compiled to use all scalar operations.

    Reply

Most Popular
China only produces 1% of critical litho chipmaking tools, exposing it to harsh US sanctions — investment in domestic toolmakers looks to reverse the trend
Russian chipmaker unveils Raspberry Pi competitor — despite US sanction woes, Elbrus packs its homegrown chip into the smallest form factor yet
GPU-boosting GDDR7 memory coming this year — SK hynix retracts comments, launching in Q4
Microsoft recalls Recall: Controversial AI feature won't be in Copilot+ Windows build at launch
Snapdragon X Elite in the wild is allegedly slower than iPhone 12 — first benchmarks of Samsung Book4 Edge disappoint
Raspberry Pi powers creepy one-eyed, head-swiveling stuffed koala assistant with ChatGPT
Wells Fargo fires more than a dozen employees for faking work using mouse jigglers and keyboard simulation
Microsoft dev shares story behind iconic Windows screensavers — 3D Pipes, 3D Maze, and more
Sanctions-defying chip design tools that work on China chipmakers' Huawei and Phytium processors introduced by China-based firm
Intel fires back at AMD's AI benchmarks, shares results claiming current-gen Xeon chips are faster at AI than AMD's next-gen 128-core EPYC Turin
Samsung's new roadmap unveils its 2nm process nodes and outlines backside power delivery plans
Startup claims it can boost any processor's performance by 100X — Flow Computing introduces its 'CPU 2.0' architecture (2024)

References

Top Articles
Latest Posts
Article information

Author: Nicola Considine CPA

Last Updated:

Views: 5911

Rating: 4.9 / 5 (49 voted)

Reviews: 88% of readers found this page helpful

Author information

Name: Nicola Considine CPA

Birthday: 1993-02-26

Address: 3809 Clinton Inlet, East Aleisha, UT 46318-2392

Phone: +2681424145499

Job: Government Technician

Hobby: Calligraphy, Lego building, Worldbuilding, Shooting, Bird watching, Shopping, Cooking

Introduction: My name is Nicola Considine CPA, I am a determined, witty, powerful, brainy, open, smiling, proud person who loves writing and wants to share my knowledge and understanding with you.