common_text_features_functions module

Module containing functions that are shared by several text feature extraction scripts.

common_text_features_functions.cut_xml(_x1, _y1, _x2, _y2, xml_file)

Returns the text contained in the specified area (a recognized rectangle in a newspaper) of the XML file.

Args:
_x1 (int) : Upper-left x coordinate
_y1 (int) : Upper-left y coordinate
_x2 (int) : Lower-right x coordinate
_y2 (int) : Lower-right y coordinate
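
The rectangle filter described above can be sketched as follows. This is only an illustration, not the module's actual code: the XML layout (`<word>` elements carrying `x1`, `y1`, `x2`, `y2` attributes), the name `cut_xml_sketch`, and the rule that a word must lie entirely inside the rectangle are all assumptions, since the real schema is not documented here.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real XML schema may differ.
SAMPLE = (
    '<page>'
    '<word x1="10" y1="10" x2="40" y2="20">Hello,</word>'
    '<word x1="50" y1="10" x2="90" y2="20">world!</word>'
    '<word x1="10" y1="200" x2="40" y2="210">outside</word>'
    '</page>'
)


def cut_xml_sketch(_x1, _y1, _x2, _y2, xml_file):
    """Return the text of all words lying fully inside the rectangle.

    xml_file may be a path or a file-like object (assumption).
    """
    root = ET.parse(xml_file).getroot()
    words = []
    # Assumed schema: <word x1=".." y1=".." x2=".." y2="..">text</word>
    for word in root.iter("word"):
        wx1, wy1 = int(word.get("x1")), int(word.get("y1"))
        wx2, wy2 = int(word.get("x2")), int(word.get("y2"))
        # Keep the word only if its box is contained in the requested area.
        if wx1 >= _x1 and wy1 >= _y1 and wx2 <= _x2 and wy2 <= _y2:
            words.append(word.text or "")
    return " ".join(words)


print(cut_xml_sketch(0, 0, 100, 50, io.StringIO(SAMPLE)))  # Hello, world!
```

Requiring full containment is one possible choice; a real implementation might instead keep words that merely overlap the rectangle.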
common_text_features_functions.get_punct_amount(words_list)

Returns the number of punctuation marks in the desired rectangle.

Args:
words_list (list) : list of words in which to count punctuation marks
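
A minimal sketch of such a count, assuming that "punctuation" means the ASCII characters in Python's `string.punctuation` and that every punctuation character in every word is counted. The name `get_punct_amount_sketch` is hypothetical, and the module's own definition of punctuation may differ.

```python
import string


def get_punct_amount_sketch(words_list):
    """Count punctuation characters across all words in the list."""
    # Assumption: only ASCII punctuation from string.punctuation is counted.
    return sum(
        1
        for word in words_list
        for ch in word
        if ch in string.punctuation
    )


print(get_punct_amount_sketch(["Hello,", "world!", "no-op"]))  # 3
```

Non-ASCII marks such as typographic quotes are not covered by `string.punctuation`, so scanned newspaper text may need a broader character set.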