Data Scientist

163k – 204k · San Francisco, CA · Hybrid · 4+ YOE · Jun 15

Summary

Early member of the data team defining how the company uses data to improve user security. Works across product insights, business strategy, data pipelines, and security research.

About the role

What you'll do

Contribute to specific data science projects and initiatives at Semgrep, discovering each department’s most pressing data problems and proactively identifying the most critical areas to focus your efforts
Bring your wide knowledge of data-science approaches to each problem you solve: the first day you might build a dashboard to track Board-level metrics for the Engineering team, the second you might apply multivariate regression to identify important product features, the third you might apply active-learning techniques to guide data collection and labeling
Iteratively tackle problems as a series of experiments, proving the value of your work with results that progress from proof of concept to ever more refined
Convince your peers of your conclusions with clear data visualizations and well-reasoned explanations
Help grow your team through the recruitment and hiring of top data talent

Example projects

Build a client-facing dashboard of scan-time metrics over time to show how the product is improving
Work together with Product leadership to identify the right north-star metrics for measuring Product usage and deciding which features to build next
Partner with the rule-writing team to identify, in real time, the most impactful rules and languages to focus on
Build out cleaned/medallion Silver and Gold tables in our Data Lakehouse for internal engineering and product teams to self-serve their analytics needs
Build an S3 → Snowflake data pipeline and processing engine to improve Repo contributor count metrics for the Billing team
Build a statistical model that analyzes pseudonymous usage data to recommend the next features built into the Semgrep open-source tool
Consume infrastructure observation metrics to identify and address potential Semgrep.dev registry outages before they occur
Combine varied and disjoint data into a “North Star” metric for the performance of the Semgrep open-source tool over time
Craft a security-rule-recommendation decision tree, using codebase features like languages, frameworks, code sentiment, and commit-message sentiment, to deliver targeted, high-value static-analysis rules to users

Requirements

4+ years of experience in data and strategy fields
Knowledge of data-science approaches; this may include machine-learning algorithms, optimization methods, or symbolic artificial intelligence, but should also include statistical methods and “good-enough” heuristics, and the taste to know when to use each
Experience clearly visualizing information and experimental results across the full company stack: Board-level, leadership team, and individual team leads
Sufficient familiarity with production data-processing pipelines to build them together with generalist infrastructure engineers; tools we use include S3, Fivetran, DBT, Snowflake, Metabase, Retool, SageMaker/Jupyter Notebook (Python)
Aptitude for delivering technical projects via rapid iterative development
Experience working on a small team in a fast-paced environment, and willingness to experiment with different approaches before settling on the best and most elegant solution given time constraints
Excellent, proactive communication, both verbal and written

Compensation

The estimated starting annual salary range for this position is $163,000 - $204,000 USD
In addition to base salary, total compensation may include equity, variable compensation, and benefits

Skills

Python, Machine Learning, Statistical Analysis, Data Visualization, Snowflake, DBT, S3, Fivetran, Metabase, SageMaker
