wordcloud.WordCloud

class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling='auto', regexp=None, collocations=True, colormap=None, normalize_plurals=True, contour_width=0, contour_color='black', repeat=False, include_numbers=False, min_word_length=0)

Word cloud object for generating and drawing.

Parameters
font_path: string

Font path to the font that will be used (OTF or TTF). Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don’t have this font, you need to adjust this path.

width: int (default=400)

Width of the canvas.

height: int (default=200)

Height of the canvas.

prefer_horizontal: float (default=0.90)

The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn’t fit. (There is currently no built-in way to get only vertical words.)

mask: nd-array or None (default=None)

If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered “masked out” while other entries will be free to draw on. [This changed in the most recent version!]
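
To make the mask convention concrete, here is a minimal sketch that builds a circular mask in plain Python, with 255 (white) marking the masked-out region and 0 marking the drawable region. The helper names `circular_mask` and `cloud_in_circle` are hypothetical, and the second function assumes that numpy and wordcloud are installed.

```python
def circular_mask(size, radius):
    """Return size x size rows of 0 (drawable) and 255 (masked out)."""
    centre = (size - 1) / 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            row.append(0 if inside else 255)
        rows.append(row)
    return rows


def cloud_in_circle(text, size=300):
    """Render text inside a circle (needs numpy and wordcloud installed)."""
    import numpy as np
    from wordcloud import WordCloud

    # Convert the nested lists into the nd-array form that mask expects.
    mask = np.array(circular_mask(size, radius=size * 0.45), dtype=np.uint8)
    return WordCloud(mask=mask, background_color="white").generate(text)
```

Because the mask's shape replaces width and height, the `size` argument here determines the canvas dimensions.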

contour_width: float (default=0)

If mask is not None and contour_width > 0, draw the mask contour.

contour_color: color value (default="black")

Mask contour color.

scale: float (default=1)

Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.

min_font_size: int (default=4)

Smallest font size to use. Will stop when there is no more room in this size.

font_step: int (default=1)

Step size for the font. font_step > 1 might speed up computation but give a worse fit.

max_words: number (default=200)

The maximum number of words.

stopwords: set of strings or None

The words that will be eliminated. If None, the built-in STOPWORDS list will be used. Ignored if using generate_from_frequencies.

background_color: color value (default="black")

Background color for the word cloud image.

max_font_size: int or None (default=None)

Maximum font size for the largest word. If None, height of the image is used.

mode: string (default="RGB")

Transparent background will be generated when mode is “RGBA” and background_color is None.

relative_scaling: float (default='auto')

Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good. If ‘auto’ it will be set to 0.5 unless repeat is true, in which case it will be set to 0.
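
The interpolation described above can be sketched as a blend between keeping the previous word's size (rank only) and scaling by the frequency ratio. The function name `next_font_size` and the rounding are assumptions made for illustration; the library's internal heuristic may differ in detail.

```python
def next_font_size(prev_size, prev_freq, freq, relative_scaling):
    """Blend rank-only sizing with frequency-proportional sizing.

    relative_scaling=0 keeps the previous size (only rank matters);
    relative_scaling=1 scales the size by freq / prev_freq.
    """
    ratio = freq / prev_freq
    blended = relative_scaling * ratio + (1 - relative_scaling)
    return int(round(blended * prev_size))


# A word half as frequent as the previous one, which was drawn at size 100.
for rs in (0, 0.5, 1):
    print(rs, next_font_size(100, prev_freq=10, freq=5, relative_scaling=rs))
```

In this sketch the three settings give sizes 100, 75 and 50. With relative_scaling=0 a word would only shrink further when it does not fit at the current size.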

color_func: callable, default=None

Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overrides “colormap”. See colormap for specifying a matplotlib colormap instead. To create a word cloud with a single color, use color_func=lambda *args, **kwargs: "white". The single color can also be specified as an RGB tuple. For example, color_func=lambda *args, **kwargs: (255,0,0) sets the color to red.
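
Beyond a single fixed color, a color function can use the arguments it receives. The sketch below maps font size to a grey level, so larger words come out darker, which suits a light background. The name `size_to_grey` and the 4..100 size range are illustrative assumptions. It returns an `hsl(...)` string, one of the color formats PIL understands.

```python
import random


def size_to_grey(word, font_size, position, orientation,
                 font_path=None, random_state=None):
    """Return a grey whose lightness depends on the font size."""
    rng = random_state if random_state is not None else random.Random()
    # Map font sizes of roughly 4..100 onto lightness 80%..20%.
    clamped = max(4, min(100, font_size))
    lightness = 80 - int((clamped - 4) / 96 * 60)
    # A little jitter avoids a completely uniform look.
    lightness += rng.randint(-5, 5)
    return "hsl(0, 0%, {}%)".format(lightness)
```

It would be passed as `WordCloud(color_func=size_to_grey, background_color="white")`.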

regexp: string or None (optional)

Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used. Ignored if using generate_from_frequencies.
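
To see what the default pattern quoted above matches, it can be tried directly with the standard `re` module. This is only a sketch of the tokenization step, not of everything process_text does. The second pattern, `HYPHEN_PATTERN`, is a hypothetical alternative shown purely for comparison.

```python
import re

DEFAULT_PATTERN = r"\w[\w']+"  # the default quoted above

text = "It's a dog's life, I say"
print(re.findall(DEFAULT_PATTERN, text))
# ["It's", "dog's", 'life', 'say']

# A hypothetical alternative that keeps hyphenated words together.
HYPHEN_PATTERN = r"\w[\w'-]+"
print(re.findall(HYPHEN_PATTERN, "a state-of-the-art model"))
# ['state-of-the-art', 'model']
```

Both patterns require at least two characters, so one-letter tokens such as "a" and "I" are dropped.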

collocations: bool, default=True

Whether to include collocations (bigrams) of two words. Ignored if using generate_from_frequencies.

colormap: string or matplotlib colormap, default="viridis"

Matplotlib colormap to randomly draw colors from for each word. Ignored if “color_func” is specified.

normalize_plurals: bool, default=True

Whether to remove trailing ‘s’ from words. If True and a word appears with and without a trailing ‘s’, the one with trailing ‘s’ is removed and its counts are added to the version without trailing ‘s’ – unless the word ends with ‘ss’. Ignored if using generate_from_frequencies.
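
The merging rule can be illustrated with a small standalone function. This is a simplified sketch of the behaviour described above, not the library's own implementation, and the name `merge_plurals` is hypothetical.

```python
def merge_plurals(counts):
    """Fold 'word' + 's' into 'word' when both forms are present."""
    merged = dict(counts)
    for word, n in counts.items():
        # Words ending in 'ss' (e.g. 'glass') are left alone.
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                merged[singular] += n
                del merged[word]
    return merged


counts = {"cat": 2, "cats": 3, "glass": 1, "dogs": 4}
print(merge_plurals(counts))
# {'cat': 5, 'glass': 1, 'dogs': 4}
```

Note that "dogs" is kept as is, because "dog" never appears on its own.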

repeat: bool, default=False

Whether to repeat words and phrases until max_words or min_font_size is reached.

include_numbers: bool, default=False

Whether to include numbers as phrases or not.

min_word_length: int, default=0

Minimum number of letters a word must have to be included.

Notes

Larger canvases make the code significantly slower. If you need a large word cloud, try a smaller canvas size and set the scale parameter.
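
As a rough sketch of that trade-off, the helper below assumes the rendered image is simply the canvas size multiplied by scale. The names `rendered_size` and `large_cloud` are hypothetical, and `large_cloud` needs wordcloud installed.

```python
def rendered_size(width, height, scale):
    """Pixel size of the final image, assuming canvas size times scale."""
    return int(width * scale), int(height * scale)


# Two ways to reach the same output size; the first lays words out
# on a much smaller canvas.
print(rendered_size(400, 200, 4))   # (1600, 800)
print(rendered_size(1600, 800, 1))  # (1600, 800)


def large_cloud(text):
    """Lay out on a 400x200 canvas, then draw at four times that size."""
    from wordcloud import WordCloud

    return WordCloud(width=400, height=200, scale=4).generate(text)
```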

The algorithm might give more weight to the ranking of the words than their actual frequencies, depending on the max_font_size and the scaling heuristic.

Attributes
words_: dict of string to float

Word tokens with associated frequency.

layout_: list of tuples (string, int, (int, int), int, color)

Encodes the fitted word cloud. Encodes for each word the string, font size, position, orientation and color.

Methods

fit_words(self, frequencies)

Create a word_cloud from words and frequencies.

generate(self, text)

Generate wordcloud from text.

generate_from_frequencies(self, frequencies)

Create a word_cloud from words and frequencies.

generate_from_text(self, text)

Generate wordcloud from text.

process_text(self, text)

Splits a long text into words, eliminates the stopwords.

recolor(self[, random_state, color_func, …])

Recolor existing layout.

to_array(self)

Convert to numpy array.

to_file(self, filename)

Export to image file.

to_html

to_image

__init__(self, font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling='auto', regexp=None, collocations=True, colormap=None, normalize_plurals=True, contour_width=0, contour_color='black', repeat=False, include_numbers=False, min_word_length=0)

Initialize self. See help(type(self)) for accurate signature.

fit_words(self, frequencies)

Create a word_cloud from words and frequencies.

Alias to generate_from_frequencies.

Parameters
frequencies: dict from string to float

A dict that contains words and their associated frequencies.

Returns
self

generate(self, text)

Generate wordcloud from text.

The input “text” is expected to be a natural text. If you pass a sorted list of words, words will appear in your output twice. To remove this duplication, set collocations=False.

Alias to generate_from_text.

Calls process_text and generate_from_frequencies.

Returns
self

generate_from_frequencies(self, frequencies, max_font_size=None)

Create a word_cloud from words and frequencies.

Parameters
frequencies: dict from string to float

A dict that contains words and their associated frequencies.

max_font_size: int

Use this font-size instead of self.max_font_size
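
When the counts are already known, the frequencies dict can be built with the standard library and passed in directly. The sketch below counts a list of words with `collections.Counter`. The helper names `count_words` and `cloud_from_counts` are hypothetical, and `cloud_from_counts` needs wordcloud installed.

```python
from collections import Counter


def count_words(words):
    """Case-insensitive word counts as a plain dict."""
    return dict(Counter(w.lower() for w in words))


frequencies = count_words(
    ["Python", "python", "numpy", "PIL", "numpy", "python"]
)
print(frequencies)  # {'python': 3, 'numpy': 2, 'pil': 1}


def cloud_from_counts(frequencies, max_font_size=None):
    """Build a cloud from precomputed counts."""
    from wordcloud import WordCloud

    return WordCloud().generate_from_frequencies(
        frequencies, max_font_size=max_font_size
    )
```

Since stopwords, regexp, collocations and normalize_plurals are ignored on this path, any such filtering has to happen before the dict is built.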

Returns
self

generate_from_text(self, text)

Generate wordcloud from text.

The input “text” is expected to be a natural text. If you pass a sorted list of words, words will appear in your output twice. To remove this duplication, set collocations=False.

Calls process_text and generate_from_frequencies.

Changed in version 1.2.2: The argument of generate_from_frequencies() is no longer the return value of process_text().

Returns
self

process_text(self, text)

Splits a long text into words, eliminates the stopwords.

Parameters
text: string

The text to be processed.

Returns
words: dict (string, int)

Word tokens with associated frequency.

Changed in version 1.2.2: Changed return type from list of tuples to dict.

Notes

There are better ways to do word tokenization, but I don’t want to include all those things.

recolor(self, random_state=None, color_func=None, colormap=None)

Recolor existing layout.

Applying a new coloring is much faster than generating the whole wordcloud.

Parameters
random_state: RandomState, int, or None, default=None

If not None, a fixed random state is used. If an int is given, this is used as seed for a random.Random state.

color_func: function or None, default=None

Function to generate new color from word count, font size, position and orientation. If None, self.color_func is used.

colormap: string or matplotlib colormap, default=None

Use this colormap to generate new colors. Ignored if color_func is specified. If None, self.color_func (or self.color_map) is used.
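
One common reason to recolor is to assign fixed colors to groups of words. The class below is a minimal sketch of such a color function. `GroupedColorFunc` is a hypothetical name, and it accepts and ignores the remaining arguments (font size, position and so on) through `**kwargs`.

```python
class GroupedColorFunc:
    """Color function that assigns a fixed color per word group."""

    def __init__(self, color_to_words, default_color):
        # Invert {color: [words]} into {word: color} for quick lookup.
        self.word_to_color = {
            word: color
            for color, words in color_to_words.items()
            for word in words
        }
        self.default_color = default_color

    def __call__(self, word, **kwargs):
        # Words outside every group fall back to the default color.
        return self.word_to_color.get(word, self.default_color)


color_func = GroupedColorFunc(
    {"green": ["beautiful", "explicit"], "red": ["ugly", "implicit"]},
    default_color="grey",
)
print(color_func("ugly", font_size=10))  # red
print(color_func("other"))               # grey
```

Given an already generated cloud `wc`, calling `wc.recolor(color_func=color_func)` would apply these colors without recomputing the layout.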

Returns
self

to_array(self)

Convert to numpy array.

Returns
image: nd-array of shape (height, width, 3)

Word cloud image as numpy matrix.

to_file(self, filename)

Export to image file.

Parameters
filename: string

Location to write to.

Returns
self