'column' Object Is Not Callable With Regex And Pyspark
I need to extract only the integers from URL strings in the column 'Page URL' and append those extracted integers to a new column. I am using PySpark. My code below: from pyspark.s
Solution 1:
You may use
spark_df_url.withColumn("new_column", regexp_extract("Page URL", r"\d+", 0))
Specify the name of the string column as the first argument to regexp_extract, and make sure the third argument is set to 0, since your pattern has no capturing groups and you want the whole match as the result.
Note that when you specified 1 as the third argument, you got empty results. As the documentation for regexp_extract puts it:
If the regex did not match, or the specified group did not match, an empty string is returned.
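The meaning of the group index can be illustrated outside Spark with Python's standard re module. This is only an analogy: Spark evaluates the pattern with Java's regex engine, and the sample URL below is made up. Group 0 always refers to the whole match, while group 1 exists only if the pattern contains a pair of capturing parentheses.

```python
import re

url = "https://example.com/item/12345"  # hypothetical sample URL

# Pattern without capturing groups: only group 0 (the whole match) exists.
no_group = re.search(r"\d+", url)
whole_match = no_group.group(0)

# Pattern with one capturing group: group 1 is now available as well.
with_group = re.search(r"(\d+)", url)
first_group = with_group.group(1)

# Asking for group 1 when the pattern has no groups is an error in Python's re.
try:
    no_group.group(1)
    missing_group_raises = False
except IndexError:
    missing_group_raises = True
```

So an alternative to passing 0 would be to wrap the pattern in parentheses, as in (\d+), and keep 1 as the group index.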