schizoidman@lemm.ee to Technology@lemmy.world · English · 5 days ago
DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch (techcrunch.com)
Avid Amoeba@lemmy.ca · English · 5 days ago
It proved sqrt(2) irrational at 40 tok/sec on a 3090 here. The 32B R1 did it at 32 tok/sec, but it thought a lot longer.
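For context, the prompt both commenters used is the classic contradiction proof, which goes roughly:

```latex
Assume $\sqrt{2} = p/q$ with $p, q$ coprime integers and $q \neq 0$.
Then $p^2 = 2q^2$, so $p^2$ is even, hence $p$ is even: write $p = 2k$.
Substituting gives $4k^2 = 2q^2$, i.e. $q^2 = 2k^2$, so $q$ is even too,
contradicting the coprimality of $p$ and $q$. Hence $\sqrt{2}$ is irrational.
```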
Irdial@lemmy.sdf.org · English · edited · 5 days ago
On my Mac mini running LM Studio, it managed 1702 tokens at 17.19 tok/sec and thought for 1 minute. If accurate, high-performance models could run on consumer hardware more readily, I would use my 3060 as a dedicated inference device.
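If you want to reproduce these throughput numbers yourself, here is a rough sketch. It assumes an OpenAI-compatible local server of the kind LM Studio exposes (by default on localhost:1234); the endpoint URL and the model name are placeholders, so substitute whatever your server actually reports.

```python
import json
import time
import urllib.request


def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Decode throughput: tokens generated divided by wall-clock time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / seconds


def measure_local_completion(prompt: str,
                             url: str = "http://localhost:1234/v1/completions",
                             model: str = "deepseek-r1-distill-qwen-8b") -> float:
    """Time one completion against a local OpenAI-compatible server.

    The URL and model name above are illustrative placeholders, not
    guaranteed defaults; check your server's own model listing.
    """
    body = json.dumps({"model": model,
                       "prompt": prompt,
                       "max_tokens": 2048}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    elapsed = time.monotonic() - start
    # OpenAI-style responses report generated-token counts under "usage".
    return tokens_per_second(out["usage"]["completion_tokens"], elapsed)
```

As a sanity check on the figures above: 1702 tokens at 17.19 tok/sec implies roughly 99 seconds of total generation, of which about a minute was reasoning tokens.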