wordcloud.WordCloud

class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling='auto', regexp=None, collocations=True, colormap=None, normalize_plurals=True, contour_width=0, contour_color='black', repeat=False, include_numbers=False, min_word_length=0)

Word cloud object for generating and drawing.

Parameters
font_path: string

Font path to the font that will be used (OTF or TTF). Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don’t have this font, you need to adjust this path.

width: int (default=400)

Width of the canvas.

height: int (default=200)

Height of the canvas.

prefer_horizontal: float (default=0.90)

The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn’t fit. (There is currently no built-in way to get only vertical words.)

mask: nd-array or None (default=None)

If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered “masked out” while other entries will be free to draw on. [This changed in the most recent version!]
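As an illustration of the mask convention described above, the following sketch builds a circular mask with NumPy: pixels outside the circle are set to 255 (white, masked out) and pixels inside are 0 (free to draw on). The helper name `circular_mask` and the radius factor of 0.45 are illustrative choices, not part of the library.

```python
import numpy as np


def circular_mask(size=300):
    """Return a (size, size) uint8 array: 255 outside a centered circle, 0 inside."""
    y, x = np.ogrid[:size, :size]
    center = size / 2
    radius = 0.45 * size  # arbitrary radius chosen for this sketch
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


mask = circular_mask(300)
```

An array like this could then be passed as `WordCloud(mask=mask)`, in which case `width` and `height` are ignored in favor of the mask's shape.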

contour_width: float (default=0)

If mask is not None and contour_width > 0, draw the mask contour.

contour_color: color value (default="black")

Mask contour color.

scale: float (default=1)

Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.

min_font_size: int (default=4)

Smallest font size to use. Will stop when there is no more room in this size.

font_step: int (default=1)

Step size for the font. font_step > 1 might speed up computation but give a worse fit.

max_words: number (default=200)

The maximum number of words.

stopwords: set of strings or None

The words that will be eliminated. If None, the built-in STOPWORDS list will be used. Ignored if using generate_from_frequencies.

background_color: color value (default="black")

Background color for the word cloud image.

max_font_size: int or None (default=None)

Maximum font size for the largest word. If None, height of the image is used.

mode: string (default="RGB")

Transparent background will be generated when mode is “RGBA” and background_color is None.

relative_scaling: float (default='auto')

Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good. If ‘auto’ it will be set to 0.5 unless repeat is true, in which case it will be set to 0.
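One way to read this description is as a linear blend between rank-only sizing and frequency-proportional sizing. The sketch below is illustrative only and is not taken from the library's source; `shrink_factor` is a hypothetical helper that returns how much a word's font size would shrink relative to the previous, more frequent word.

```python
def shrink_factor(relative_scaling, freq, previous_freq):
    """Blend between rank-only sizing (factor 1) and frequency-proportional sizing."""
    ratio = freq / previous_freq
    return relative_scaling * ratio + (1 - relative_scaling)


# A word half as frequent as the previous one:
print(shrink_factor(1.0, 50, 100))  # fully proportional
print(shrink_factor(0.5, 50, 100))  # halfway blend
print(shrink_factor(0.0, 50, 100))  # frequency ignored, rank only
```

Under this reading, `relative_scaling=1` halves the size for a word that is half as frequent, while `relative_scaling=0` leaves the size to be decided by rank and available space alone.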

color_func: callable, default=None

Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites “colormap”. See colormap for specifying a matplotlib colormap instead. To create a word cloud with a single color, use color_func=lambda *args, **kwargs: "white". The single color can also be specified using RGB code. For example color_func=lambda *args, **kwargs: (255,0,0) sets color to red.
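A custom color function only needs to accept the parameters listed above and return a color that PIL understands. The following sketch returns red shades as HSL strings. The name `red_shades` and the lightness range are illustrative, and the code assumes `random_state`, when given, behaves like a `random.Random` instance.

```python
import random


def red_shades(word, font_size, position, orientation,
               font_path=None, random_state=None):
    """Return an HSL color string with a red hue and varying lightness."""
    # Assumption: random_state offers randint(), like random.Random.
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(30, 70)
    return "hsl(0, 80%, {}%)".format(lightness)


color = red_shades("example", 24, (0, 0), None, random_state=random.Random(0))
print(color)
```

It could then be supplied as `WordCloud(color_func=red_shades)`.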

regexp: string or None (optional)

Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used. Ignored if using generate_from_frequencies.
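To see what the pattern quoted above matches, it can be tried directly with the standard `re` module. This only demonstrates the pattern itself; that process_text applies it through `re.findall` in the same way is an assumption.

```python
import re

DEFAULT_PATTERN = r"\w[\w']+"  # pattern quoted in the parameter description

tokens = re.findall(DEFAULT_PATTERN, "Don't stop, I can't")
print(tokens)
```

The single-letter word "I" is not matched, because the pattern requires a word character followed by at least one more word character or apostrophe.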

collocations: bool, default=True

Whether to include collocations (bigrams) of two words. Ignored if using generate_from_frequencies.

colormap: string or matplotlib colormap, default="viridis"

Matplotlib colormap to randomly draw colors from for each word. Ignored if “color_func” is specified.

normalize_plurals: bool, default=True

Whether to remove trailing ‘s’ from words. If True and a word appears with and without a trailing ‘s’, the one with trailing ‘s’ is removed and its counts are added to the version without trailing ‘s’ – unless the word ends with ‘ss’. Ignored if using generate_from_frequencies.
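The merging rule described above can be sketched in plain Python. This is a simplified illustration of the documented behavior, not the library's implementation (it ignores letter case, for instance); `merge_plurals` is a hypothetical helper.

```python
def merge_plurals(counts):
    """Fold 'word' + 's' into 'word' when both forms occur; leave '...ss' words alone."""
    merged = dict(counts)
    for word, count in counts.items():
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                merged[singular] += count
                del merged[word]
    return merged


result = merge_plurals({"cat": 2, "cats": 3, "glass": 1, "dogs": 4})
print(result)
```

Here "cats" is folded into "cat", "glass" is untouched because it ends in "ss", and "dogs" is kept because "dog" never appears on its own.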

repeat: bool, default=False

Whether to repeat words and phrases until max_words or min_font_size is reached.

include_numbers: bool, default=False

Whether to include numbers as phrases or not.

min_word_length: int, default=0

Minimum number of letters a word must have to be included.

Notes

Larger canvases make the code significantly slower. If you need a large word cloud, try a smaller canvas size and set the scale parameter.

The algorithm might give more weight to the ranking of the words than their actual frequencies, depending on the max_font_size and the scaling heuristic.

Attributes
words_: dict of string to float

Word tokens with associated frequency.

layout_: list of tuples (string, int, (int, int), int, color)

Encodes the fitted word cloud. Encodes for each word the string, font size, position, orientation and color.

Methods

fit_words(self, frequencies)

Create a word_cloud from words and frequencies.

generate(self, text)

Generate wordcloud from text.

generate_from_frequencies(self, frequencies)

Create a word_cloud from words and frequencies.

generate_from_text(self, text)

Generate wordcloud from text.

process_text(self, text)

Splits a long text into words, eliminates the stopwords.

recolor(self[, random_state, color_func, …])

Recolor existing layout.

to_array(self)

Convert to numpy array.

to_file(self, filename)

Export to image file.

to_svg(self[, embed_font, …])

Export to SVG.

to_html

to_image

__init__(self, font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling='auto', regexp=None, collocations=True, colormap=None, normalize_plurals=True, contour_width=0, contour_color='black', repeat=False, include_numbers=False, min_word_length=0)

Initialize self. See help(type(self)) for accurate signature.

fit_words(self, frequencies)

Create a word_cloud from words and frequencies.

Alias to generate_from_frequencies.

Parameters
frequencies: dict from string to float

Contains words and their associated frequencies.

Returns
self
generate(self, text)

Generate wordcloud from text.

The input “text” is expected to be a natural text. If you pass a sorted list of words, words will appear in your output twice. To remove this duplication, set collocations=False.

Alias to generate_from_text.

Calls process_text and generate_from_frequencies.

Returns
self
generate_from_frequencies(self, frequencies, max_font_size=None)

Create a word_cloud from words and frequencies.

Parameters
frequencies: dict from string to float

Contains words and their associated frequencies.

max_font_size: int

Use this font size instead of self.max_font_size.

Returns
self
generate_from_text(self, text)

Generate wordcloud from text.

The input “text” is expected to be a natural text. If you pass a sorted list of words, words will appear in your output twice. To remove this duplication, set collocations=False.

Calls process_text and generate_from_frequencies.

Changed in version 1.2.2: The argument of generate_from_frequencies() is no longer the return value of process_text().

Returns
self
process_text(self, text)

Splits a long text into words, eliminates the stopwords.

Parameters
text: string

The text to be processed.

Returns
words: dict (string, int)

Word tokens with associated frequency.

Changed in version 1.2.2: Changed return type from list of tuples to dict.

Notes

There are better ways to do word tokenization, but I don’t want to include all those things.

recolor(self, random_state=None, color_func=None, colormap=None)

Recolor existing layout.

Applying a new coloring is much faster than generating the whole wordcloud.

Parameters
random_state: RandomState, int, or None, default=None

If not None, a fixed random state is used. If an int is given, this is used as seed for a random.Random state.

color_func: function or None, default=None

Function to generate new color from word count, font size, position and orientation. If None, self.color_func is used.

colormap: string or matplotlib colormap, default=None

Use this colormap to generate new colors. Ignored if color_func is specified. If None, self.color_func (or self.colormap) is used.

Returns
self
to_array(self)

Convert to numpy array.

Returns
image: nd-array of shape (height, width, 3)

Word cloud image as numpy matrix.

to_file(self, filename)

Export to image file.

Parameters
filename: string

Location to write to.

Returns
self
to_svg(self, embed_font=False, optimize_embedded_font=True, embed_image=False)

Export to SVG.

Font is assumed to be available to the SVG reader. Otherwise, text coordinates may produce artifacts when rendered with replacement font. It is also possible to include a subset of the original font in WOFF format using embed_font (requires fontTools).

Note that some renderers do not handle glyphs the same way, and may differ from to_image result. In particular, Complex Text Layout may not be supported. In this typesetting, the shape or positioning of a grapheme depends on its relation to other graphemes.

Pillow, since version 4.2.0, supports CTL using libraqm. However, due to dependencies, this feature is not always enabled. Hence, the same rendering differences may appear in to_image. As this rasterized output is used to compute the layout, this also affects the layout generation. Use PIL.features.check to test availability of raqm.
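The availability check mentioned above can be done as follows, assuming Pillow is installed. The variable name is illustrative.

```python
from PIL import features

# True when Pillow was built with libraqm (complex text layout support).
raqm_available = bool(features.check("raqm"))
print("raqm available:", raqm_available)
```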

Consistent rendering is therefore expected if both Pillow and the SVG renderer have the same support of CTL.

Contour drawing is not supported.

Parameters
embed_font: bool, default=False

Whether to include font inside resulting SVG file.

optimize_embedded_font: bool, default=True

Whether to be aggressive when embedding a font, to reduce size. In particular, hinting tables are dropped, which may introduce slight changes to character shapes (w.r.t. to_image baseline).

embed_image: bool, default=False

Whether to include rasterized image inside resulting SVG file. Useful for debugging.

Returns
content: string

Word cloud image as SVG string.