PyTorch’s Next Generation of Data Tooling

apply(conf) - May '22 - 10 minutes

An overview of, and look ahead at, our data efforts within PyTorch, including our new API extension points to support state-of-the-art ML data processing in both research and production. TorchData, an extensible library for constructing data loading graphs, and TorchArrow, a lightweight front-end for dispatchable data processing, will be introduced with examples.

Donny Greenberg

Product Management Lead, PyTorch


Donny Greenberg is the PyTorch PM lead at Meta, supporting the AI community across research, production, OSS, and enterprise. Recent projects include TorchRec, the open-sourcing of Meta’s large-scale recommendations infra; TorchArrow and TorchData, PyTorch’s next generation of data APIs; and unified training infrastructure at Meta. He previously worked at IBM as a Quantum Computing Researcher and Tech Lead for IBM’s open-source Quantum Algorithms library, and at Google on its Ad Exchange real-time bidding platform. He holds 9 patents and graduated from Binghamton University with bachelor’s degrees in Computer Science, Mathematics, and Finance.