VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models

Amber Xie, Ajay Jain, and Pieter Abbeel

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-61

May 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-61.pdf

Diffusion models have shown impressive results in text-to-image synthesis. Using massive datasets of captioned images, diffusion models learn to generate raster images of highly diverse objects and scenes. However, designers frequently use vector representations of images like Scalable Vector Graphics (SVGs) for digital icons or art. Vector graphics can be scaled to any size and are compact. We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. We do so without access to large datasets of captioned SVGs. By optimizing a differentiable vector graphics rasterizer, our method, VectorFusion, distills abstract semantic knowledge out of a pretrained diffusion model. Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling. To accelerate generation and improve fidelity, VectorFusion also initializes from an image sample. Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches.
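The optimization loop the abstract describes can be sketched roughly as follows. This is a minimal, hedged illustration of Score Distillation Sampling applied to rasterized vector parameters, not the report's implementation: the `rasterize` and `denoise` functions below are hypothetical stand-ins for a differentiable SVG rasterizer (e.g. DiffVG) and a pretrained text-conditioned diffusion model, and the timestep weighting is simplified to a constant.

```python
import torch

def rasterize(params):
    # Hypothetical differentiable rasterizer: maps flat SVG parameters
    # to an RGB image. A real pipeline would render Bezier paths here.
    return torch.sigmoid(params).reshape(1, 3, 8, 8)

def denoise(x_t, t):
    # Hypothetical epsilon-prediction network standing in for a
    # pretrained text-conditioned diffusion model.
    return 0.1 * x_t

def sds_step(params, alphas, t, lr=0.1):
    """One SDS update: render, add noise at timestep t, query the
    denoiser, and push the gradient (eps_hat - eps) through the
    rasterizer into the vector parameters."""
    x = rasterize(params)                       # differentiable render
    eps = torch.randn_like(x)                   # sampled noise
    a = alphas[t]
    x_t = a.sqrt() * x + (1 - a).sqrt() * eps   # forward diffusion
    with torch.no_grad():
        eps_hat = denoise(x_t, t)               # model's noise estimate
    w = 1.0                                     # simplified weighting w(t)
    grad = w * (eps_hat - eps)                  # SDS gradient w.r.t. x
    # Backpropagate only through the rasterizer (the denoiser is
    # treated as fixed), then take a gradient step on the parameters.
    x.backward(gradient=grad)
    with torch.no_grad():
        params -= lr * params.grad
        params.grad.zero_()

params = torch.zeros(3 * 8 * 8, requires_grad=True)
alphas = torch.linspace(0.99, 0.01, 1000)
for _ in range(10):
    sds_step(params, alphas, t=500)
```

In the report's actual method, `grad` comes from a pretrained text-conditioned model, so repeated updates pull the rendered SVG toward images the model considers consistent with the caption.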

Advisor: Pieter Abbeel


BibTeX citation:

@mastersthesis{Xie:EECS-2023-61,
    Author= {Xie, Amber and Jain, Ajay and Abbeel, Pieter},
    Title= {VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-61.html},
    Number= {UCB/EECS-2023-61},
    Abstract= {Diffusion models have shown impressive results in text-to-image synthesis. Using massive datasets of captioned images, diffusion models learn to generate raster images of highly diverse objects and scenes. However, designers frequently use vector representations of images like Scalable Vector Graphics (SVGs) for digital icons or art. Vector graphics can be scaled to any size and are compact. We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. We do so without access to large datasets of captioned SVGs. By optimizing a differentiable vector graphics rasterizer, our method, VectorFusion, distills abstract semantic knowledge out of a pretrained diffusion model. Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling. To accelerate generation and improve fidelity, VectorFusion also initializes from an image sample. Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches.},
}

EndNote citation:

%0 Thesis
%A Xie, Amber 
%A Jain, Ajay 
%A Abbeel, Pieter 
%T VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 1
%@ UCB/EECS-2023-61
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-61.html
%F Xie:EECS-2023-61